Aug 272009
 

The Snapshot Isolation Level and Read Committed Snapshot features were a big improvement when SQL Server 2005 came out – finally, SQL Server had optimistic locking as well as pessimistic locking!  This is a feature that allows writers not to block readers, and readers will not block writers – instead, the readers will look at the most recent row, and ignore the fact that it’s currently being written to.  This sounds great on first inspection – long running transactions won’t block other transactions! If you accidentally have a BEGIN TRANSACTION but you haven’t followed up with a COMMIT or a ROLLBACK, other users won’t be blocked.

It recently came up in discussion that you can just turn on Read Committed Snapshot, and a lot of your blocking will be a thing of the past.  While this is true, I strongly discourage turning on Read Committed Snapshot for existing applications unless you know exactly how your application will respond to it.

Read Committed Snapshot is a modification to the Read Committed Isolation level that uses row versioning to read the previous value.  The trouble comes when you realise that Read Committed Snapshot is the default isolation level, and once you’ve turned it on, every single read committed transaction will run as a snapshot. 

If the developers of your application expect that Read Committed will be the default (back when they wrote the application when SQL Server 2000 was the latest version, this was a reasonable assumption), then changing Read Committed’s behaviour can have unintended consequences.

Let’s look at an example.  We’ll create a new database, containing a single table, Inventory.  This is a simple stock control system, and we’ll be looking at the procedure to check out an item. It’s worth noting that there are better and safer methods of writing the stock checkout function to not have this issue (such as using Repeatable Read), but it’s very possible that the developers of your application used something similar, and it worked “good enough”.

CREATE DATABASE ReadCommittedSnapshotTest
GO

USE ReadCommittedSnapshotTest
GO

CREATE TABLE Inventory (
      ItemID int NOT NULL 
    , LocationID int NOT NULL
    , Qty int
    , PRIMARY KEY (ItemID, LocationID)
)
CREATE NONCLUSTERED INDEX ncixLocationItem ON Inventory(LocationID, ItemID)
GO

Next, we’ll insert some data.

SET NOCOUNT ON
INSERT INTO Inventory (ItemID, LocationID, Qty)
VALUES(rand() * 1000 + 1, 1, rand() * 40 + 1)
INSERT INTO Inventory (ItemID, LocationID, Qty)
VALUES(rand() * 1000 + 1, 2, rand() * 40 + 1)
INSERT INTO Inventory (ItemID, LocationID, Qty)
VALUES(rand() * 1000 + 1, 3, rand() * 40 + 1)
GO 2000
-- Ignore any key violation errors - we'll still get enough data
INSERT INTO Inventory (ItemID, LocationID, Qty) VALUES (796, 1, 5)

For the next part, we need to run the following query on two separate sessions at the same time.  I’ve added a 10-second delay in processing the transaction – you can imagine that there may be other processing required to complete the checkout, and this may take a second or two.  Ten seconds is a little unrealistic, but provides enough time to run the query in the other window.

The way this transaction works is that we begin a transaction, and read the number of items that are in stock at the moment.  If there are more than we want to take, we update the Inventory.

BEGIN TRANSACTION
DECLARE @QtyRequired int, @QtyRemain int
SELECT @QtyRequired = 4
SELECT @QtyRemain = SUM(QTY) FROM Inventory WHERE ItemID = 796 AND LocationID = 1
IF @QtyRemain - @QtyRequired >= 0 
BEGIN
    UPDATE Inventory SET Qty = Qty - @QtyRequired
    WHERE ItemID = 796 AND LocationID = 1
    -- Do other stuff in other tables or databases to check out the item
    WAITFOR DELAY '00:00:10'
    SELECT 'Checkout complete'
END
ELSE
    SELECT 'Not enough qty!'
COMMIT TRANSACTION

SELECT * FROM Inventory WHERE ItemID = 796

If you run the two queries together, you’ll notice that both queries take about 10 seconds to run, and the one that ran first will report “Checkout complete”, and the other will report “Not enough qty!”.  This is good – the second query was blocked until the first was finished.

Let’s turn on READ COMMITTED SNAPSHOT and see what the effect will be.  First we’ll replace the items so we can run the same query again.

-- Replace the inventory
UPDATE Inventory SET Qty = 5 WHERE ItemID = 796 AND LocationID = 1

-- Turn on READ_COMMITTED_SNAPSHOT
ALTER DATABASE ReadCommittedSnapshotTest
SET READ_COMMITTED_SNAPSHOT ON
-- You may need to disconnect the other session for this to complete.

Run the two queries again, side by side.  This time, it still takes 10 seconds to run both queries, but the second query returns a –3 quantity.  Oh dear.

As mentioned, using Repeatable Read would help solve this issue, as you would be guaranteed the data wouldn’t change between the initial stock count check, and the update.  As the two queries are VERY close together, you may never see this issue with the code above, but it still can happen.  As I mentioned, this is an example of “good enough” code.  However, the difference is that the problem only has a narrow window of a few milliseconds to occur with READ COMMITTED, but has 10 seconds to occur with READ COMMITTED SNAPSHOT.

The conclusion of this example is that your application may have unexpected results if you blindly turn on READ COMMITTED SNAPSHOT.  It is a great feature of SQL Server 2005, but unless you know exactly what the effects will be, I don’t recommend turning it on for existing systems.

  22 Responses to “The Potential Dangers of the Read Committed Snapshot Isolation Level”

  1. I cannot agree with your conclusion. SNAPSHOT will give you a correct point in time picture. READ COMMITTED is just as vulnerable to the problem you are demonstrating as READ COMMITTED SNAPSHOT is.

    Examples:

    http://www.simple-talk.com/sql/t-sql-programming/the-case-of-the-skewed-totals/

    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx

  2. Hi AK,

    Thanks for your comment!

    In the example, READ COMMITTED (RC) is safer than READ COMMITTED SNAPSHOT (RCS) as the RC UPDATE will lock the row, and the second session will be blocked until the update has finished. Whereas with RCS, they both read the same data without locks, and as the transaction takes a number of seconds to process, there’s a higher chance that the two will be in conflict.

    RC is vulnerable as well, but the chance of it occurring is a lot smaller, as the window is only between the time the of the the SELECT Qty and the UPDATE Qty. Whereas RCS allows the problem to creep in any time during the 10 second transaction.

    As I said, it wasn’t perfect code – but it’s code of a quality that I have seen before. The problem isn’t so much the code, as the effect of changing the entire database’s READ COMMITTED behavior from a pessimistic concurrency model to an optimistic concurrency model. It can change the entire behavior of the application.

  3. This could not have been a worse sample to blame it to the isolation level.
    There are many things wrong with your code.
    1. LocationId + ItemID was defined as the PK of the table so there can only be ONE Qty for the pair, then why SUM() ?
    2. You are suposed to use (QUERY HINT) locks if you are running this in separated statements even with read commited isolation. A simple with (UPDLOCK,ROWLOCK) will suffice on the select query to make the code work under any isolation level.
    3. IF you turn the Update as an update + Read operation it will go all smoothly into a single statement which will maintain all ACID properties and no hint is needed at all.

  4. The Potential Dangers of the Read Committed Snapshot Isolation Level…

    Kudos for a great Sql Server article – Trackback from SqlServerKudos…

  5. I must agree with AK here; the way your code is written you are vulnerable in either case. The only reason you are not seeing as frequent in your READ COMMITTED scenario is due to where you have the waitfor delay statement. Move the waitfor delay, up to after you are selecting the quantity – and you will see the same behavior with “normal” RC.

    If this is a scenario you are concerned about, then you should use an update locking hint in your select (but you know that already :) ).

    Niels

  6. I agree with all comments so far.

    I’m definitely not making excuses for this code – it’s terrible, but under a pessimistic concurrency model it works “well enough”. And if it works “well enough”, then there’s a possibility that someone working for your application vendor has written code like this without a thought as to concurrency models. In my horrible code sample, this “well enough” has gone from “good enough to work, but still a small chance of problems” to “guaranteed to break much more often” due to the longer available window the row is committed under RCS.

    The point of the post comes down to this: if you’re reading data from a table, and something happens to be modifying the data, and the original developer expects the code to wait until the lock occurs, then you’ll be in trouble if you turn on READ COMMITTED SNAPSHOT to try to reduce blocking.

    The idea of the post originally came from a forum post where a guy performed an BEGIN TRAN…UPDATE and didn’t commit or rollback, and wanted to know if there was a way to not block other users (well, apart from committing).

    noeldr, regarding your point #1, the SUM() was from an evolution of the code and can be removed. It took me a while to write code this bad! :)

  7. [...] There are known concerns with Read Committed Snapshot that mainly exist in older applications that were developed without this isolation level in mind.  For a more detailed description, I recommend checking out this blog post here.  [...]

  8. I must say that the example is not the best, but also I have to recognize that it is a true real life example of how bad people write queries. Obviously that we can write better, more consistent and efficient code ways, but the ugly true is that you will find always code like the example, how many developers knows the SQL hints? I think that the example provides maybe one of the most common and lame scenario that we can fine.

  9. This post is very interesting and quite intriging, but I got truly lost.

    How a good quality code performing the above shoold look like?

    How to do excactly what the code is doing in snapshot isolation and without it?

  10. Good example to demonstrate the negetive effect of Read commmited snapshot. subsequent Comment that suggested to place Updlock,rowlock is also good way around.Thank you.

  11. If you put WAITFOR DELAY before update statement, you get the same result also with default read committed isolation level.
    So, in fact, there is no difference at all. Read committed has only less chance for this to happen but has a lot more locking problems.
    While on the other hand, with snapshot you won’t have locking problems and you will found out the mistakes like this sooner and have chance to rewrite the query.
    Mistakes, which you don’t know they exists, are the most dangerous one.
    So, I vote for snapshot even if this happens.

  12. I’m sorry, works “well enough” is not “good enough” for me. I would not accept that code in my databases. For a “test and update” type of code, you have to use the correct locking using hints.

  13. Jim, despite all the comments on your code, I do agree that most developers do not understand enough of database concurrency issues to avoid them. Yes, the code example might not be typical for a script or a stored procedure. A skilled DBA should review this code and improve it before it goes in production.

    But in many applications this ‘first check and then modify’ pattern occures quite often. It works pretty well most of the times, and the time between the SELECT and UPDATE statement is so small that years may pass before a single errorneous UPDATE occures. I am both a .NET developer and a skilled SQL DBA but even for me it is hard to develop applications do not contain any concurrency issues. You cannot always forsee or imagine all possible scenarios.

    Microsoft does not place any emphasis on this aspect of snapshot isolation. Whatever other people may say, I am very grateful for this post. Read Commited Snapshot is no longer an option for databases accessed by third party software. I’d rather not take a change on it, and stick to the good old fashioned shared locks. Thanks for this post!!!

  14. I use read committed snapshot and I don’t have any concurrency problems.
    You must follow one of this rules:

    1. Never update based on the old select value like in this example. You should always check in update statement, for example if available quantity is still ok.

    2. If that is not possible put update row lock in select statement. I mostly use this approach.

    Since there is usually a lot of select statements without updates, you can only benefit from Read committed snapshot.

  15. What about Snapshot Isolation?
    You have enabled the Read Committed Snapshot but not the Snapshot Isolation level. Use

    ALTER DATABASE ReadCommittedSnapshotTest SET ALLOW_SNAPSHOT_ISOLATION ON

    This will enable the snapshot isolation and when the second transaction will try to update the table it will get the error and transaction will rollback. Although yes this will allow to read the value which is under modification.

  16. I think you are all missing the point.

    Jim isn’t saying that this is something a good developer that only runs their own programs has to worry about.

    When you are setting up a database for a client that is using applications made by other developers, this is something you need to worry about. They may have used a “works well enough” philosophy. Under these circumstances, turning on read committed snapshot could make a bad situation worse.

  17. Thanks Jim, you article has proved very useful at the moment

  18. I don’t know which side to come down on on this.

    On the one hand it’s a point well made that enabling snapshot is far from risk free for legacy code. On

    On the other hand this example demonstrates that you *can* create new databases with snapshot enabled. I say that because there are enough ways to have concurrency problems with standard read committed and the fictional creator of your code has been forced to consider these through trial and error just enough to pass muster.

    However, if you wrote that code from scratch in a snapshot enabled DB you’d get the advantages of snapshot plus it’d force you to write better code.

  19. I don’t know which side to come down on on this.

    On the one hand it’s a point well made that enabling snapshot is far from risk free for legacy code.

    On the other, this example demonstrates that you can create new databases with snapshot enabled. I say that because there are already enough ways to have concurrency problems with read committed and the fictional creator of your code has been forced to consider these problems through trial and error just enough to pass muster.

    However, if he had written that code from scratch in a snapshot enabled DB all would get the advantages of snapshot plus it’d force him to write better code.

  20. Great post. Thank you.

  21. [...] A complication is that the ANSI specification doesn’t capture everything that people commonly think things like dirty read, fuzzy/non-repeatable read, etc. mean in practice. And, there are anomalies (permitted by the ANSI definitions) that can occur under READ COMMITTED SNAPSHOT that cannot occur under READ COMMITTED. For an example, see http://www.jimmcleod.net/blog/index.php/2009/08/27/the-potential-dangers-of-the-read-committed-snaps…. [...]

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>