Thursday 12 July 2018

Optimistic SQL

One of the benefits of learning other programming languages is the way it teaches you about other paradigms and idioms. This is the premise behind the “Seven Xxx in Seven Weeks” range of books. Although I have the database one on my bookshelf I’ve only ever skimmed it as at the time I bought it I suddenly found myself leaving the world of the classic RDBMS behind and working with other types of DB for real; most notably the document-oriented kind.

Although some of these products like MongoDB and Couchbase have come a long way from their early beginnings as highly available key-value stores they often still lack the full-on transaction support of the old stalwarts like SQL Server and PostgreSQL. Coupled with a high-availability service you have to think differently about how you react to concurrency conflicts as explicit locking is almost certainly never the answer [1].

The impetus for this post was going back into the world of SQL databases and being slightly bemused by a stored procedure that appeared to implement an “upsert” (an UPDATE or INSERT depending on whether the row already exists) as I realised it wasn’t how I’d approach it these days.

The Existing SQL Approach

Initially I was somewhat flummoxed why it was even written the way it was as there appeared to be no concurrency issues in play at all, it was a single service doing the writing, but I later discovered that an accident of the implementation meant there were two writers internally competing and they chose to resolve this in the database rather than remove the root source of concurrency in the service.

The upsert was basically written like this:

  • Try SELECTing the existing row.
  • If it exists, UPDATE it.
  • If it doesn’t exist, INSERT it.

In the service code there were a number of comments describing why the transaction level was being bumped up to “serializable” – it was effectively to deal with the concurrency they had introduced within the code by creating two competing writers. On top of that the initial SELECT statement in the upsert applied a HOLDLOCK which also effectively makes the transaction serializable because it puts a range lock on the row’s key (even if that key doesn’t exist yet).

The Document DB Approach

The last few years away from the relational world meant that I was used to dealing with these kinds of conflicts at a slightly lower level. Also, dealing with document updates in the service rather than writing them as SQL mean that updates were done in a server-side loop rather than pushing the concurrency issue down into the database, hence it would look more like this:

  • Try selecting the document.
  • If it exists, update it and try writing it back.
  • If it doesn’t exist, try creating it.
  • If any write fails start over from the beginning.

Due to the lack of transactions and locking, write conflicts are commonly detected by using a version number attribute that gets used in the update predicate [2]. (A write failure, via a “document not found” error, means the predicate failed to match the specific document and version and therefore a conflict has occurred.)

Another SQL Approach

So what does all this have to do with upserts in SQL?

Well, what I found interesting was that my gut reaction was to question why there is the initial select there as I would have written it as:

  • Try to UPDATE the row.
  • If no rows were updated, then INSERT it.

This particular order makes an assumption that updates are more prevalent than inserts and as a I rule I’d say that checking @@ROWCOUNT to see if anything was written is far less ugly than adding a TRY…CATCH block in T-SQL and attempting to verify that the insert failed due to a primary key violation.

That all seemed fairly obvious but I had forgotten that with the document DB approach you tend to expect, and handle, write failures as part of handling concurrency, but in this case if two connections both attempted the insert concurrently it’s theoretically possible that they could both fail the UPDATE step and then one of the INSERTs would succeed and the other would fail resulting in a primary key violation. However the code in the service was not written to detect this and retry the operation (as you would with a document DB) which is why the initial SELECT is there – to lock the “unwritten row” up front which ensures that another transaction is blocked until the row is then inserted or updated. This way no client logic needs to handle the concurrency problem.

However I believe we can still achieve the same effect by adding the same HOLDLOCK hint to our initial UPDATE so that if the row does not exist other writers will be blocked by the range lock until the subsequent INSERT goes through. Hence the initial SELECT is, I believe, redundant.

The MERGE Approach

At this point I remembered that way back in the past SQL Server introduced the MERGE operation which effectively allows you to write an upsert with a single statement as you factor both the hit and miss logic into different branches of the statement. This caused me to go looking to see what the start of the art in upsert techniques were, possibly with performance comparisons to see how much faster this must be (given that SQL Server clearly has the potential to optimise the query plan as it better knows our intent).

I started digging and was somewhat surprised when I came across the page “Performance of the SQL MERGE vs. INSERT/UPDATE”. I was expecting to have my hypothesis validated but discovered that the answer was far from clear cut. Naturally I then googled “SQL Server upsert performance” to see what else had been written on the subject and I discovered this wasn’t an anomaly so much as a misunderstanding about what problem the MERGE statement is really intended to solve.

You should of course never take performance improvements at face value but “measure, measure, measure” yourself. I wasn’t avoiding doing that, I was looking to see if there might be any pitfalls I needed to be wary of when benchmarking the approach.

At this point I haven’t gone any further with this as it’s more of a personal investigation (there is no actual performance issue to solve) but it just goes to show that writing SQL is as much an art as it’s always been.

 

[1] Some document databases, such as Couchbase, do support locking of documents, but there are heavy restrictions so you tend to find another way.

[2] In the particular example I was looking at no version number was needed in the SQL predicate because the data had a total ordering independent of the write order (it was tracking the minimum and maximum of a value over the day).

No comments:

Post a Comment