Monday, 16 September 2013

Overcoming the Relational Mindset

My current project is a bit of a departure for me as I’ve left behind the world of the classic SQL RDBMS for a moment and am working on one of those new-fangled NOSQL alternatives - MongoDB. Whilst I haven’t found any real difficulty adjusting to the document-centric world (thanks to too much XML) I have noticed myself slipping back into the relational mindset when making smaller changes to the schema. One such example happened just the other day…

Stop Extending Tables

Imagine you’re working for a retailer that has some form of loyalty card mechanism. Whenever you make a purchase you are told the number of points you have received for the current purchase, plus any accrued up to some date (notionally today). The initial part of the document schema might look something like this:-

LoyaltyBonus:

  CardNumber: “1234 5678”, 
  Points: 100
}

Now, the second part - the accrued points to date - has a slight twist in that the service required to obtain this data might not be available and so it’s not always possible to obtain it. Hence that part of the structure is optional. Slipping back into the relational mindset I automatically added 2 nullable attributes because what I saw was a need to extend the LoyaltyBonus “table” with two optional values like so:-

LoyaltyBonus

  CardNumber: “1234 5678”, 
  Points: 100, 
  BalancePoints: 999,        // Optional 
  BalanceDate: “2013-01-01”  // Optional
}

…and when the loyalty service is not available it might look like this:-

LoyaltyBonus:

  CardNumber: “1234 5678”, 
  Points: 100, 
  BalancePoints: null, 
  BalanceDate: null
}

Of course the null values can be elided in the actual BSON document but I’m showing them for example’s sake. The two attributes BalancePoints and BalanceDate are also tightly coupled, either they both exist or neither does. That might seem fairly obvious in this case, but it’s not always.

Documents, Not Columns

What I realised a little while later (after peer reviewing someone else’s changes!) was that I probably should have created a nested document for the two Balance related fields instead:-

LoyaltyBonus:

  CardNumber: “1234 5678”, 
  Points: 100, 
  Balance: 
  {  
    Points: 999, 
    Date: “2013-01-01” 
  }
}

Now the Balance sub-document exists in its entirety or not at all. Also the two values are essentially non-nullable because that’s handled at the sub-document level instead. The other clue, which in retrospect seemed blindingly obvious [1] was the use of the prefix “Balance” in the two attribute names.

 

[1] It’s never quite that simple in practice because you have probably already gone through a number of other refactorings before you got to this point. In a sense it’s a bit like going through the various Normal Forms in a relational schema - at each step you need to re-evaluate the schema and pull out any more sub-documents until you’ve factored out all the optional parts.

2 comments:

  1. Surely that's not a matter of abandoning the relational paradigm, but one of applying it fully? In the relational world, spotting two fields that are tightly coupled and either both set or both null would lead you to the entity/table "Balance" just as much as surely as it lead you to a sub document here.

    ReplyDelete
  2. @citychurches: Oh no, I'm not abandoning it, far from it. In the footnote I even allude to the use of the Normal Forms to decompose the model exactly as you suggest.

    In repreospect it's not entirely clear, but I was thinking more about the smaller elements of the model. Where in a RDBMS you can end up with many thin tables dominated by the primary key, the cost of factoring out the same data in a document model *appears* to be much cheaper.

    In reality this is the start of a NOSQL journey for me and I have no idea where the trade-offs are yet. I'm just glad I've got people like yourself watching my back :-).

    ReplyDelete