The other day whilst scanning the backlog of change requests in JIRA I came across the following entry:-
Replace the use of nchar(x) with nvarchar(x)
And that’s all it said[#]. Now I knew instantly which tables the author was referring to because, like many teams, we only use a fixed width char(x) column for flag type fields such as Y/N or true fixed width strings like currency codes. The reasons why and how these couple of table definition slipped past the informal code review process is somewhat irrelevant, but suffice to say that we were days away from release and chose to take the ‘hit’ instead.
The fact that variable width text was being stored in a fixed width column was not causing any problems within the database or the back-end server code. Yes it was annoying when you cut-and-pasted result values from SSMS (SQL Server Management Studio) as you got a load of trailing spaces that you just had to manually trim:-
...WHERE t.Column = ‘VALUE ‘
So I put this one aside mentally filing it with the other issues of Technical Debt as there was nothing to fix per-se – the mechanism was ugly but it worked. There was also no reason to suspect it may cause any performance problems either as the tables contain relatively few rows of data.
Then a conversation some weeks later with the change request submitter brought up the subject of raising Technical Debt style change requests and I opined that now we were live the database schema is potentially a whole lot harder to change because we have data to migrate as well[+]. In this instance it wasn’t even vaguely in the area of rocket science, but nevertheless it was non-trivial and ultimately had to be paid for. And then the real requirement came to the surface and brought with it a new perspective…
These fixed width columns were not causing the back-end any problems because we never consumed them – we only every compared them to string literals or ran support queries. The impending GUI on the other hand had controls that displayed some of these fields and apparently the UI framework would bloat the size of the control to allow room for what it presumably thought were always going to be long textual values. My gut reaction was abhorrence at the thought of changing the database schema of a base table just to fix a layout issue in the GUI; that is the kind of nasty coupling that causes change paralysis because managers get worried about what will break. I understood it was a pain having to manually deal with the fallout in the GUI layer but there should be other much simpler solutions that didn’t involve writing SQL scripts to fix the table schema and migrate the data – namely trimming the string somewhere in the data access layer or the SQL query used to fetch the data[*].
Hold on a second I thought. What are you doing directly accessing base tables anyway? The rest of the public interface to the database is via stored procedures and views – you’ll need to abstract that access away behind one of those before going to production. More importantly that gives us another option for remediating this behaviour without touching the schema and data, and better yet, that would be inside the database public interface and so allow the refactoring to occur without changes to any clients. Of course this then forces the issue of what the public interface to this mechanism should really be and at that point it seems prudent to design it properly if the existing implementation is going to leak out badly. But if it doesn’t we can save that job for another day.
The important thing though is that we have identified the root cause and can therefore now begin to assign some value to the potential choices and costs. When the request implies changing the schema of persistent data purely for the sake of it, the risk seems high and costly and the reward seems low. But when faced with a path that allows for incremental change so that the full cost and risk is distributed over smaller steps it seems much more appealing as you can put your spade down at any point.
So how would I have initially phrased that change request? Well a few weeks ago I would have gone for something like this (I can’t remember what the exact issue with the GUI was so it still seems somewhat lame I’m afraid):-
Laying out the XYZ controls has to be done manually because the ABC data has extra whitespace padding due to the use of nchar(x) instead of nvarchar(x) for the FGH columns. These columns should probably be nvarchar(x) anyway as they contain variable width text.
However a recent article by Allan Kelly in the ACCU magazine Overload (The Agile 10 Steps Model) contained a sidebar titled “What’s the Story?” about User Stories. This described a common format for stories:- “As a [Role] I can [Action] So that [Reason]” that felt akin to the use of “[action] [should] [when]” that I follow when writing test case names (itself a variation of the “[given] [when] [then]” style). I felt that this would allow those of us less natural writers to still construct a succinct requirement even if it does sound somewhat robotic, but more importantly it would help bring the value of the request to the forefront. User Stories is something that I’m vaguely aware of but have largely glossed over because my requirements are nearly always technical in nature and tend to come through very informal channels so any connection with ‘End Users’ is usually missing; it just hadn’t clicked that “User Stories” could just as easily be read as “Developer Stories” or “Support Stories”. So this is my attempt at writing it this way:-
As a [developer] I can [trim the whitespace from the ABC data] so that [the XYZ controls are not laid out manually].
OK, so this reads very unnaturally because the story format implies a repeatable action rather than a one off event. A small and hopefully obvious change to the wording and order though should make this more natural:-
As a developer it would save time if the XYZ controls were laid out automatically by ensuring the ABC data contains no excess whitespace.
Do you think that reads better or not? I think the value, while still somewhat intangible as all technical requirements like this are, is clearer because you are forced to justify the action rather than providing it as an afterthought. The beneficiary of the requirement is also clearly stated which helps easily separate the functional from the non-functional.
Of course what the original submitter would probably have written would still be something like this:-
As a developer I need to replace the nchar(x) columns with nvarchar(x) columns so that the GUI layout is automatically laid out.
…because the [action] would likely still be the same, but one would hope that when writing the [reason] alarm bells should start ringing as the tight coupling between the database schema and GUI becomes apparent.
[#] This is also a common failing of n00b posters on sites like Stack Overflow. They have already decided on what their solution is and are trying to elicit an answer that helps implement that solution rather than taking the time to explain the underlying problem in case the reason they can’t see the proverbial wood is because there are a load of trees in the way.
[+] As you will see later our well defined SQL public interface and set of SQL unit test means that it is actually quite easy to refactor the schema.
[*] Eventually an O/RM such as Entity Framework or NHibernate will no doubt be in place to tackle this, and so it will be interesting to see how you would go about dealing with this kind of problem when you don’t immediately have access to either the SQL or the code that binds the result values to the DTO members. I’m sure it is easy, but how much out of the way would you have to go?