Tuesday 23 September 2014

Deferring the Database Choice

In my recent post “The Courage to Question” I described how the choice of database technology had been another “challenge” for the team. In a much earlier project we had been working with a NoSQL solution, which was the perfect fit for the very large data set, only to be told (as we geared up for production) that we must use a classic SQL technology instead. You can probably imagine our surprise and the ensuing debacle.

When the next project came up we were a little wiser, and the choice of database technology was far less critical, at least at face value. However, one of the touted features suggested to me that we might be better off with two different databases - one of each sort - as the usage was inherently different. Either way, we didn’t want to get embroiled in any “discussion” before we had a better idea of what it was we were actually building, and so we elected to go for no database at all.

No formal database product, that is. Our set of functional acceptance tests only involved a single web API service and so “an in-memory database” was all we needed. By “in-memory” I simply mean a dictionary mapping each key to a (typed) object. This also allowed us to begin designing the data storage layer without having to get a database into the build pipeline, etc.
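To make the idea concrete, here is a minimal sketch of such an “in-memory database”. It is written in Python purely for illustration, and every name in it (InMemoryStore, Customer, the "c-42" key) is hypothetical rather than taken from the project.

```python
from dataclasses import dataclass
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class InMemoryStore(Generic[T]):
    """A stand-in 'database': a dictionary of primary key to typed object."""

    def __init__(self) -> None:
        self._items: Dict[str, T] = {}

    def save(self, key: str, item: T) -> None:
        # Insert or overwrite the object stored under this key.
        self._items[key] = item

    def load(self, key: str) -> Optional[T]:
        # Returns None when nothing is stored under the key.
        return self._items.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._items.pop(key, None)


@dataclass
class Customer:
    """Hypothetical data model type, used only for this example."""
    customer_id: str
    name: str


store: InMemoryStore[Customer] = InMemoryStore()
store.save("c-42", Customer("c-42", "Ada"))
```

Because the rest of the service only talks to save / load / delete, the dictionary can later be swapped for something persistent without touching the callers. All state is, of course, lost whenever the process restarts.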

Reasonably soon into the project the demo system reached a level of maturity where the lack of persistent data became a chore for the non-development staff: we were deploying so frequently that the existing state kept getting wiped. It was time to consider whether we needed a real database.

We decided not to. In the background we had expressed which technology we’d prefer to use from a developer productivity perspective, but that appeared to be of no consequence. So, to minimise the disruption that would likely occur when adopting the real technology, we switched to a really dumb file-system based approach. Basically the persistence layer used the same JSON serializer we were using in the web API to turn the data model into JSON strings that we then persisted as files, where the filename was the object’s primary key.
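A rough sketch of that file-system approach might look like the following. Again this is illustrative Python rather than the project's code: FileStore and Customer are hypothetical names, the ".json" suffix on the filename is my own assumption, and the standard json module stands in for whatever serializer the web API used.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Customer:
    """Hypothetical data model type, used only for this example."""
    customer_id: str
    name: str


class FileStore:
    """Persists each object as one JSON file named after its primary key."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Assumes keys are safe to use as filenames (no path separators).
        return self._root / f"{key}.json"

    def save(self, customer: Customer) -> None:
        text = json.dumps(asdict(customer))
        self._path(customer.customer_id).write_text(text, encoding="utf-8")

    def load(self, key: str) -> Optional[Customer]:
        path = self._path(key)
        if not path.exists():
            return None
        return Customer(**json.loads(path.read_text(encoding="utf-8")))


# State now survives a "redeploy": a fresh store over the same folder
# can read back what an earlier instance wrote.
root = Path(tempfile.mkdtemp())
FileStore(root).save(Customer("c-42", "Ada"))
reloaded = FileStore(root).load("c-42")
```

There is no locking, indexing or querying here, which is rather the point: it does just enough to keep the demo data alive between deployments.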

This got us over the persistence hurdle, which helped during the demo-ing and also gave us time to develop the data model a little further before committing to a representation in a database. It was this latter procrastination around settling on a data model that had an interesting outcome. Where in the past the data model might have been modelled up front with a classical Customers / Orders / LineItems style hierarchical relationship, we actually ended up with lots of little data silos that happened to have a similar primary key (e.g. customer ID). However, the need to relate them formally with any sort of aggregate root (e.g. a Customer table) simply didn’t exist.

It also became apparent that one aspect of the data could even be viewed as an independent service, especially when looking further into the future to see what the strategic plans were for this part of the data. I’m loath to appear to be jumping on the micro-services bandwagon [1] but this felt like it was a very simple data service (e.g. an Address Book) that could be independently developed, deployed and serviced. The cost of splitting the services apart at that point in the project’s evolution would have been too high as we still had a lot of functionality to discover about the system we were building. But it was definitely firmly lodged in our minds that this might be desirable, and we even bandied around the idea of splitting the code into a separate assembly to at least enforce some extra partitioning for the time being.

It was an interesting exercise to live without a real database for as long as possible and it definitely had an enlightening effect on both the persistent data model and system architecture.


[1] Although I’ve read Lewis & Fowler’s blog post on the topic, I wouldn’t want to presume that I have understood the salient points.
