Tuesday, 22 December 2015

Observable State versus Persisted State

A while back I was working on a replacement service that was intended to use one of those new-fangled document-oriented databases (Couchbase as it goes). During the sprint planning meeting we had a contentious story around persisting data and what it meant to handle multiple writes in a single “business transaction”. There was some consternation that, because there was no native transaction support (or locking) to ensure we got an atomic commit on success, or a rollback if a problem occurred somewhere, we couldn’t deliver the story on that technology stack.

Effectively we had reached the point where we were handling the stories around idempotency and the story had wording in it that assumed a classic relational all-or-nothing style of transactional writing which we naturally couldn’t have. The crux of the question was whether we could perform our writes in such a way that if an error occurred any invariants would still remain, and if the request was retried then we’d be able to complete it after being left temporarily in a potentially half-finished state.

Atomic Multi-Document Writes

The problem revolved around creating a number of child documents (e.g. Orders) for a root document (e.g. Customer). When using a traditional database the child records could just be written as-is because they will not be visible until the transaction is committed (ignoring dirty reads). If an error occurs at any point whilst writing, the whole lot are removed. If the database goes down before the commit is persisted it will roll back the transaction, if it needs to, on restart. Either way any invariants violated during the writes are invisible outside the transaction.

Non-Atomic Multi-Document Writes

Whilst writes are atomic at a document level, they are not when multiple documents (or many, separate writes to the same document) are involved. As such we need to perform each insert, update and delete in a way that assumes we might lose connectivity at that moment.

The first problem is ensuring that a failure after any single write cannot leave the data in a state where any invariants have been violated. For instance if the model says that there is a two-way relationship between two documents, then only having one-half of it is unacceptable because navigating the other way will generate an error.

As a consequence of partially written data being a possibility due to a lack of transactions, we likely have to adopt an error handling strategy that either unwinds the state or moves it forward to achieve the original desired outcome [1]. For this to happen we will almost certainly be looking at using idempotent writes where we can try the same action again and again and not incur any additional side-effects if it has already completed successfully (e.g. a counter is incremented once, and only once).
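As a rough illustration of the “once, and only once” counter, here is a minimal in-memory sketch. The `Counter` class, the `applied` set and the request IDs are all hypothetical names invented for the example; the assumption is that a real store could keep both fields in one document and so update them in a single atomic write.

```python
class Counter:
    """In-memory stand-in for a single document holding a counter."""

    def __init__(self):
        self.value = 0
        self.applied = set()  # hypothetical field: request IDs already applied

    def increment(self, request_id):
        # Both fields belong to the same document, so a real store could
        # persist them together in one atomic single-document write.
        if request_id in self.applied:
            return self.value  # retry of a completed request: no further effect
        self.applied.add(request_id)
        self.value += 1
        return self.value


counter = Counter()
counter.increment("req-1")
counter.increment("req-1")  # the same request retried, e.g. after a lost response
counter.increment("req-2")
```

Retrying `req-1` any number of times leaves the counter where the first successful attempt put it.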

The Observable Effects of Idempotency

And so we come back to the problem we encountered when discussing the story – what exactly does idempotency mean? The way it was worded in the story was that any failed business transaction must not leave any residual state behind. The way that the database works and the kind of business transaction we were trying to do meant that this was simply impossible to achieve. With an air of defeat the discussion turned to how we could switch back to using a traditional transactional database to meet this story.

However, I wanted clarification around what it meant for “no state” to be left within the database. What I thought the intent of that phrase really meant was “no observable state” should be left around if the transaction fails. If we consider the system as a black box, not a white one, then we can leave residual state lying around just so long as it is not visible outside the system. And as long as the system is only accessible via our public API we can control how temporary state can remain hidden.

But how? In this instance, if we order our writes carefully enough we can ensure that any invariants remain intact after every single write. We just need to be careful about how we define when a piece of data becomes visible through the public API.

Example: File-System Writes

To understand how this can be achieved, think about how a modern-day editor, such as MS Word, saves documents. It does not just open the file and start writing because, if it did and the machine failed, both the old and new documents would be lost. Instead it follows a sequence something like this, to minimise the loss of data:
  1. Write the new document to a temporary file.
  2. Rename the current backup file to a temporary name.
  3. Rename the old document to make it the backup.
  4. Rename the temporary file to the document’s name.
  5. Delete the old backup file.
In fact this pattern of file-system behaviour (write + rename) is so common that NTFS even recognises it to make sure the newly written document carries over the previous file’s creation date to make it appear as if it just updated the old file.

What makes this work is that the really dangerous work is done off to the side (i.e. writing the new version of the document) leaving just some file-system metadata changes (3 renames and a delete) to “commit” the change. I touched on this idea before in “Copy & Rename (Like Copy & Swap But For File-Systems)” after having to deal with torn files due to a badly written file transfer process.
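The five steps listed earlier can be sketched in a few lines of Python. This is only an approximation of the pattern, not what any particular editor does; the function name and the `.bak` and `.bak.old` suffixes are invented for the example.

```python
import os
import tempfile


def save_document(path, data):
    """Replace the file at `path` with `data`, keeping the old version as a backup."""
    directory = os.path.dirname(os.path.abspath(path))
    backup = path + ".bak"          # hypothetical naming scheme
    old_backup = path + ".bak.old"  # hypothetical naming scheme

    # 1. Write the new document to a temporary file in the same directory,
    #    so the later rename stays within one file-system.
    fd, temp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # the risky work is finished before any rename

    # 2. Rename the current backup (if any) to a temporary name.
    if os.path.exists(backup):
        os.replace(backup, old_backup)
    # 3. Rename the old document to make it the backup.
    if os.path.exists(path):
        os.replace(path, backup)
    # 4. Rename the temporary file to the document's name.
    os.replace(temp, path)
    # 5. Delete the old backup file.
    if os.path.exists(old_backup):
        os.remove(old_backup)
```

If the process dies during step 1 the original document is untouched, and after that point a complete copy of either the old or the new version is always on disk under one of the names.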

Idempotent Writes

The way to achieve the same effect in the database is also by writing in a particular way and by tagging each business transaction with a unique ID that we can use to replay or recover from after a failure.

In our example we split the writes up into two stages:
  1. First insert the child documents.
  2. Then update the parent document to refer to them.
It might seem as though the child documents would be visible after the initial write but they aren’t, because the public API only publishes the IDs of children that are referenced in the parent. As such there may be state persisted, but it is not observable until the single write of the parent document at the end, which is atomic.
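To make the two stages concrete, here is a minimal sketch using a dictionary-backed stand-in for the database. `Store`, `add_orders` and `list_orders` are hypothetical names, and the only assumption carried over from the real database is that each single-document write is atomic.

```python
class Store:
    """Dictionary-backed stand-in for a document database (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def put(self, key, doc):
        self.docs[key] = dict(doc)  # stands in for one atomic document write

    def get(self, key):
        doc = self.docs.get(key)
        return dict(doc) if doc is not None else None


def add_orders(store, customer_id, orders):
    # Stage 1: insert the child documents. They are persisted, but nothing
    # published by the API points at them yet.
    for order_id, order in orders.items():
        store.put(order_id, {**order, "customer": customer_id})
    # Stage 2: one atomic write to the parent makes them all visible.
    customer = store.get(customer_id)
    customer["orders"] = customer["orders"] + list(orders)
    store.put(customer_id, customer)


def list_orders(store, customer_id):
    """Public API: only children referenced by the parent are published."""
    customer = store.get(customer_id)
    return [store.get(order_id) for order_id in customer["orders"]]
```

A failure anywhere in stage 1 leaves orphaned order documents in the store, yet `list_orders` still returns exactly what it returned before the request started.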

The relationship is actually bidirectional (you can find a child and lookup its parent) which might seem like a loophole until you consider the previous point – the child is not publicly visible until the parent has been committed. You can’t ask for the child because you have no way of knowing of its existence via the public API.

The way the idempotent ID works is that it is logged against certain writes so that we can tell what has and hasn’t been performed already. So in our example above each child document is created (possibly with the idempotent ID [2]) and when we add the references into the parent we tag it with the idempotent ID so that we know we completed the transaction. If it fails at any point we can just discard the temporary child documents and recreate them. This does mean we have the potential for detritus to be left around on failures, but they should be rare and can be “garbage collected” in slow time using a background process [2].
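A sketch of the retry and clean-up behaviour might look like the following. It uses a plain dictionary as the database, and the `txns` field, the `fail_after` switch for simulating a dropped connection and the `collect_garbage` function are all assumptions made for the illustration rather than anything from the real system.

```python
import uuid


def add_orders(docs, customer_id, orders, txn_id, fail_after=None):
    """Roll-forward write; `fail_after` simulates losing the connection."""
    customer = docs[customer_id]
    if txn_id in customer["txns"]:
        return  # parent already tagged: the transaction completed earlier

    order_ids = []
    for n, order in enumerate(orders):
        if fail_after is not None and n >= fail_after:
            raise ConnectionError("simulated failure")
        order_id = str(uuid.uuid4())
        # The child carries the idempotent ID too, which helps clean-up later.
        docs[order_id] = {**order, "customer": customer_id, "txn": txn_id}
        order_ids.append(order_id)

    # Single atomic parent write: the references and the ID land together.
    docs[customer_id] = {
        "orders": customer["orders"] + order_ids,
        "txns": customer["txns"] + [txn_id],
    }


def collect_garbage(docs):
    """Background clean-up: delete children that no parent references.

    A real process would also need to skip recently written children that
    may belong to a transaction still in flight.
    """
    referenced = {oid for d in docs.values() for oid in d.get("orders", [])}
    orphans = [k for k, d in docs.items() if "customer" in d and k not in referenced]
    for key in orphans:
        del docs[key]
    return len(orphans)
```

Calling `add_orders` again with the same `txn_id` after a failure simply writes a fresh set of children and commits them; calling it after a success does nothing. The children abandoned by the failed attempt stay invisible until `collect_garbage` removes them.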

Scalability

This technique works for simple object models which is how I’ve used it. It can be extended to some degree if you are willing to add complexity to your model (and probably increase the number of I/Os) by creating more elaborate “invariants”. For example if the sender could have controlled the child document ID it might mean that the public API would have to navigate from child to parent to validate its existence (presence of the document alone not being enough).
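As a hedged illustration of that extra navigation, a lookup by child ID could confirm that the parent really refers to the child before admitting it exists. The `get_order` function and the document layout below are hypothetical, with a plain dictionary standing in for the database.

```python
def get_order(docs, order_id):
    """Public lookup that only admits to children their parent references."""
    order = docs.get(order_id)
    if order is None:
        return None
    parent = docs.get(order["customer"])
    if parent is None or order_id not in parent["orders"]:
        return None  # persisted but not yet committed: treat as absent
    return order


docs = {
    "cust-1": {"orders": ["order-1"]},
    "order-1": {"item": "book", "customer": "cust-1"},
    "order-2": {"item": "pen", "customer": "cust-1"},  # parent not yet updated
}
```

Here `order-2` is on disk but the lookup reports it as missing, at the cost of a second read for every request.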

Given the choice between using a classic transactional database and having to think really hard about this stuff, the latter is probably not worth it. But if you have a simple object model and are looking at alternatives for performance reasons, then you need to think a bit differently if you’re going to cope without transactions.


[1] Just ignoring a part-failed request and leaving the data in a valid, but unusual, state may be possible but is highly undesirable from a support perspective. It’s hard enough piecing together what’s happened without being plagued unnecessarily by zombie data.


[2] It’s not essential if you always re-submit and roll forward, but can help in the aftermath if cleaning up. It would probably be required though if you needed to roll back first, as it may be the only key you have to the document at that point.
