Friday, 28 May 2010

Oldwood v4.0

Today saw the release of "Oldwood v4.0" after just 9 months in development. In accordance with the marketing department's wishes, it is to be re-badged the "Seth" edition. Screenshots will be available on the web site shortly, but once again no manual can be provided.

Sunday, 23 May 2010

Debug & Release Database Schemas

In the C++ world it’s common to build at least two versions of your code - one Debug and one Release. In the debug version you enable lots of extra checking, diagnostics and instrumentation to make it easier to track down bugs. This is often done at the expense of speed, which is one reason why you might switch these features off in your production Release[*]. For example, it’s all too easy to invoke ‘Undefined Behaviour’ in C++, such as by indexing into a vector with an invalid index. In a release build you won’t get notified, and you can easily get away with it if you’re only off-by-one, but in a debug build the code should complain loudly. This notion of Undefined Behaviour doesn’t seem to exist in the C# and SQL worlds - or at least it’s nowhere near as prevalent - so is there any need for a Debug C# build or a Debug Database Schema? I think the answer to both is yes, and this post focuses on the database side.

Why Have Separate Builds/Schemas?

How many times have you debugged a problem only to find that it was due to some garbage data in your database? In theory this kind of thing shouldn’t happen because databases have support for all manner of ways of ensuring your data is valid, such as Check Constraints, Primary Keys and Foreign Keys. Perhaps ‘valid’ is too strong a word and ‘consistent’ is more appropriate. But even then, such as when you’re mapping objects to tables, your foreign keys can only partly protect you as there may be parent/child table relationships that depend on a ‘type’ column for discrimination. And sometimes it’s just too easy to screw the data up by running a dodgy query - such as forgetting the WHERE clause when doing an UPDATE…

If the database provides all these wonderful features why do some developers avoid them? Putting ignorance to one side, it seems that performance is often a key reason. Some tables are hit very hard and you believe you just can’t afford to have a whole bucket load of triggers and constraints firing constantly to verify your data; especially when you know it’s sound because your extensive system testing ‘proved’ it. So let’s just reiterate that:- you’re avoiding error checking features in your database to improve performance… and how exactly is that different from the scenario I’ve just described above about C++ development?

Perhaps, instead of blindly disabling features that actually have value during development just to satisfy a production requirement, we can have our cake and eat it? Can we enable these features during development to aid testing, and then safely turn them off in production because they are free from side-effects and therefore have no impact on the observable behaviour?

Enable Everything by Default

I believe the default position is that you should use whatever features you can to ensure your code is correct and your data remains sane and consistent. The only reason to start removing error checking should be because you have measured the performance and can demonstrate the benefits of removing specific constraints. Intuition may tell you that with the volume of data you’re going to be dealing with in production you really can’t afford certain foreign key relationships or constraints; but remember that’s in production. During development, unit testing, integration testing and even initial end-to-end system testing you’ll likely be building small databases with only representative data of a much lower volume and so there are no performance concerns to worry about – only ones of correctness.
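
For illustration, here’s a hypothetical Customer table (the names and status codes are invented) with its integrity features switched on from day one; referencing tables would get their Foreign Keys in the same spirit:-

CREATE TABLE Customer
(
    CustomerID  int          NOT NULL,
    Name        varchar(100) NOT NULL,
    Status      int          NOT NULL,

    -- All the error checking enabled by default.
    CONSTRAINT Customer_PK        PRIMARY KEY (CustomerID),
    CONSTRAINT Customer_CK_Status CHECK (Status IN (1, 2, 3))
)
GO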

Utilising Debug Specific Code

The constraints I’ve most often heard of causing problems are Foreign Keys, and not just for performance reasons. You can also implement complex validation logic in Check Constraints and Triggers. It may seem that implementing such logic at the SQL level is far too inefficient or wasteful, especially if you’re going to be driving it with server code that has decent test coverage. And if you’re restricting yourself to using stored procedures for your public interface you may feel you have very tight control over what happens. However, your SQL unit tests could have holes that only show up during integration testing, when another team uses your API in ways you never anticipated.

Using self-contained constraints or triggers is one technique, but there is also the opportunity to add other kinds of debug logic to your stored procedures, such as Asserts. For example, you may specify that you cannot pass null as the input to an internal[+] stored procedure, and so to verify the contract you could add an assert at the start of the procedure:-

CREATE PROCEDURE FindCustomerByID (@CustomerID int)
AS
    -- OBJECT_NAME(@@PROCID) has to go via a variable because T-SQL
    -- doesn't allow expressions as stored procedure arguments.
    DECLARE @ProcName sysname
    SET @ProcName = OBJECT_NAME(@@PROCID)

    EXEC AssertIntegerIsNotNull @CustomerID, 'Customer ID', @ProcName
    . . .
GO
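
The implementation of AssertIntegerIsNotNull isn’t shown in this post, but a minimal sketch, assuming RAISERROR is used to report the failure, might look like this:-

CREATE PROCEDURE AssertIntegerIsNotNull (@Value     int,
                                         @ValueName varchar(100),
                                         @ProcName  sysname)
AS
    -- A purely illustrative message format; the real helper is not shown here.
    IF (@Value IS NULL)
        RAISERROR('%s: the %s argument must not be null.', 16, 1,
                  @ProcName, @ValueName)
GO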

Many SQL error messages can be incredibly terse, which makes debugging issues even trickier; especially when you’ve created the mother of all queries. The use of additional debug code like this allows you to verify more state up front and therefore generate far more developer-friendly error messages. I know that cursors are often frowned upon (again for performance reasons) but they can be used effectively to iterate rowsets and provide finer-grained debug messages, as in the sketch below.
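
As a sketch of that idea, a debug-only consistency check could use a cursor to report each offending row individually rather than failing with a single opaque set-based error. The Order table and its nullable CustomerID column are entirely hypothetical here:-

DECLARE @OrderID int

DECLARE OrderCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT OrderID FROM [Order] WHERE CustomerID IS NULL

OPEN OrderCursor
FETCH NEXT FROM OrderCursor INTO @OrderID

WHILE (@@FETCH_STATUS = 0)
BEGIN
    -- One friendly message per broken row instead of one terse error.
    RAISERROR('Order %d has no owning customer.', 16, 1, @OrderID)
    FETCH NEXT FROM OrderCursor INTO @OrderID
END

CLOSE OrderCursor
DEALLOCATE OrderCursor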

Isolating Debug Specific Code

So how do you control what goes into a Debug or Release build? If you’re used to defining your database schemas and objects by using visual tools like SQL Server Enterprise Manager this isn’t going to fly. You need to be able to build your database from scratch[^] using scripts stored in a VCS, just like you would your client & server code; ideally building a database should not be seen as any different from building your assemblies. See my previous post “xUnit Style Database Unit Testing” for another compelling reason why you might want to work this way.

This means you probably have scripts with filenames like “Order_FK_Customer.sql” in, say, a “Foreign Keys” folder. One option would be to name your files with an additional suffix that distinguishes the build type, e.g. “Order_FK_Customer.Debug.sql”. Alternatively you could create additional subfolders called “Debug” & “Release” for the build specific scripts. This relies on there being no dependency issues between scripts, which you can ensure by splitting your constraints out from your table creation scripts and using the ALTER TABLE construct.
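
So “Order_FK_Customer.sql” would contain just the one constraint, applied after both tables exist; the table and column names in this sketch are assumed:-

ALTER TABLE [Order]
    ADD CONSTRAINT Order_FK_Customer
        FOREIGN KEY (CustomerID) REFERENCES Customer (CustomerID)
GO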

Applying this technique where you just want to inject a little extra code into triggers and sprocs is not going to be maintainable if you have to keep two copies of each object, so an alternative would be to add a guard around the build specific code and use a function to control the flow of logic, e.g.

CREATE PROCEDURE FindCustomerByID (@CustomerID int)
AS
    -- The dbo prefix matters: SQL Server requires two-part names
    -- when invoking user-defined scalar functions.
    IF (dbo.IsDebugBuild() = 1)
    BEGIN
        DECLARE @ProcName sysname
        SET @ProcName = OBJECT_NAME(@@PROCID)

        EXEC AssertIntegerIsNotNull @CustomerID, 'Customer ID', @ProcName
    END
    . . .
GO

The implementation of the function IsDebugBuild() can be chosen at build time by applying either “IsDebugBuild.Debug.sql” or “IsDebugBuild.Release.sql”. I don’t know what the overhead of executing a simple function like IsDebugBuild() would be – it’s something I’ve yet to investigate – but I would hope it is dwarfed by the cost of whatever queries you end up executing. Another alternative would be to use a pre-processor, such as the C one, to strip code out, but then you lose the ability to write and test your code easily in a tool like SQL Server Management Studio, which I would personally consider far more valuable[#].
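
The two variants themselves are trivial; here’s a sketch of the Debug one (the Release version differs only in returning 0):-

-- IsDebugBuild.Debug.sql
CREATE FUNCTION dbo.IsDebugBuild ()
    RETURNS bit
AS
BEGIN
    RETURN 1    -- IsDebugBuild.Release.sql returns 0 instead
END
GO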

Use Schemas as Namespaces

To help separate and highlight debug or test functions you can use a separate schema, such as ‘debug’, e.g. you would invoke debug.AssertIntegerIsNotNull. If you prefix your scripts with the schema name (I think SSMS does this when exporting) you can use this to your advantage when building your database. For example, when we build a database for our Continuous Integration process that’s going to run the unit tests we’ll include all the files in the ‘test’ schema, but when we’re building an integration testing or production database we leave them out.
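
Creating the namespace itself is a one-liner, and the earlier assert helper would then simply move inside it:-

CREATE SCHEMA debug AUTHORIZATION dbo
GO

-- e.g. CREATE PROCEDURE debug.AssertIntegerIsNotNull ... as before, invoked as:-
--      EXEC debug.AssertIntegerIsNotNull @CustomerID, 'Customer ID', @ProcName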

Verifying a Release Build with Unit Tests

Earlier I said that it should be possible to remove all the debug code because it must be side-effect free. How can you verify that someone hasn’t inadvertently written code that violates this rule? Well, ideally, you should at least be able to run the unit tests. If you have good code coverage then that should give you a high degree of confidence. The problem is that your unit tests may need to mess about with the schema to get the job done, e.g. dropping an over-zealous delete trigger to allow you to clean up after each test is run. However, there is nothing to stop you rebuilding the database again afterwards - after all, that’s the point of automating all this stuff.

Caveat Emptor

Most of this stuff seems like common sense to me, but I’m a C++/C# developer by trade and it’s always possible that none/some/all of this advice goes against a number of best practices or won’t scale well as the system grows. My current (greenfield) project has not yet reached the stage where we have performance concerns that require us to re-evaluate the use of these techniques, but until that moment comes we’ll be utilising every one of them to ensure we continue to develop (and, more importantly, refactor) with speed.

 

[*] Let’s leave out the debate about whether you should or shouldn’t leave your debug code in at release time and assume that we want/need to take it out.

[+] It seems a common practice to implement behemoth stored procedures that take gazillions of arguments and fulfil a dizzying array of similar but different requirements, instead of writing a number of smaller, more focused procedures that use internal ‘helper’ functions and procedures.

[^] Yes, you may choose to deploy to production by patching your existing database, but once again the development and unit testing focus is better served by removing any residual effects.

[#] When I’m developing a stored procedure I like to have two query windows open, one with the stored procedure code and another with the unit tests. I can then edit the stored proc, apply it to my unit test database and then switch windows and run the unit tests to verify my changes. This makes for a nice fast feedback cycle. The unit test runner is the perfect example of debug specific code.

Wednesday, 19 May 2010

The Dying Art of RTFM

The classic stereotype of men is that they don’t read manuals and won’t ask for directions when lost. I’m pleased to say that I exhibit neither of these tendencies – I’ll take all the help I can get, thank you very much. In fact I find it incredibly frustrating when I see other developers (and let’s face it, rightly or wrongly, they’re mostly men) thrashing around trying to fix a bug or get some code to compile while ignoring the obvious sources of information – the manual (e.g. books/local MSDN) and Google/Bing/Whatever.

Here’s a recent example. A colleague was trying to work out why, at runtime, a particular C# method in a system class was throwing an exception that neither of us had encountered before, but which implied that the behaviour wasn’t implemented. After doing a quick rebuild to make sure the obvious was taken care of, I put the caret on the method name and hit F1. The MSDN help page for the method clearly stated that it was not supported in the scenario in which it was being used. Problem solved. Total time to resolve it: about 3 minutes.

Obviously not all cases are as simple as this, but there have been so many times when a fellow developer has done something, I’ve done a quick Google, and pointed out a page in the top 3 results that provides the solution. OK, so maybe I don’t get there first time because I don’t know the optimum search terms, but a quick read of a related page or two (especially if it’s a Stack Overflow page) and I can usually concoct the magic incantation required to find the relevant answer. Of course we’re all a little bit lazy sometimes and find it easier to ask the person next to us – I’ve done that too. But that doesn’t excuse not taking the most basic steps to solve your own problems.

I believe ‘Google-ability’ is one of the most important skills a developer can have. This sounds like a classic Grumpy Old Man moment, but I didn’t have the Internet as a resource when I started out programming. Fortunately the company I worked for at the time soon gained access to the MSDN on CD and had subscriptions to MSJ, DDJ etc. and access to other online services like CiX. Mining the MSDN quickly became a very useful skill, as it contained not only manuals but also white papers and previous articles from MSJ that may not have been directly relevant but did point you in new directions. Naturally this skill transferred directly to searching via AltaVista/Lycos/etc. (remember them…) and on to this day with Google.

But how do you discover a candidate’s ‘Google-ability’ at interview time? I usually ask interviewees about what they’ve read, as I find it’s a good indication of how passionate they are about the job and whether they have that desire for learning. The same goes for blogs and major online resources like Dr Dobbs. Not being able to name any is an ‘interview smell’, and I’ve yet to come across a good programmer who doesn’t read around the subject.

Books are a funny thing these days. I’ve probably bought more in the last couple of years than in the decade before. AbeBooks and Amazon’s second-hand marketplace mean I can easily pick up a copy of a book that I’m only vaguely interested in for just a few quid – and that includes shipping it across the Atlantic! I really can’t get on with e-books and reading large volumes of text on screen; blogs are about the limit of my screen-based reading. But then I’m a self-confessed toilet reader – my technical library is in the downstairs toilet, much to my wife’s chagrin. It’s like vinyl records; there is a ceremony involved in selecting a book from the bookshelf and flicking through the pages, just like removing the vinyl from the sleeve and carefully depositing it on the turntable.

Perhaps this is where that investment in moving Unconscious Incompetence to mere Conscious Incompetence, which I blogged about recently, pays off. Is my ability to easily find stuff in the MSDN and via Google down to the fact that I have lots of vague terms and concepts floating around in my noggin, making me more aware of what information is actually out there – whether in book form, in a magazine or on the web? Or is it just that I’m not a real bloke? Should I in fact be feeling inferior because I don’t ignore assistance and just slog on relentlessly instead…

Tuesday, 11 May 2010

Refactoring – Do You Tell Your Boss?

[This is one of those posts that I’ve thought long and hard about whether I should publish. I’ve edited it a few times to try and ensure that it comes across in the spirit that was intended – to illustrate that it’s a complex issue that often fosters an “Us and Them” style attitude, which in turn leads to resentment on the part of the programmer. Stack Overflow once again comes to my rescue with two beautifully titled questions:- “How do you justify Refactoring work to your penny-pinching boss?” and “How can I convince skeptical management and colleagues to allow refactoring of awful code?”]

I want to be as transparent as possible about my work; if I’m doing something I want that to happen with my boss’ approval. I hope that they trust me to do the right thing, but will also challenge me to ensure that I’m still being as effective as possible. Although I like to think of myself as being pretty pragmatic, I’m only human and so can get sidetracked or incorrectly prioritise tasks like anyone else.

When it comes to refactoring I’ve found this to be a double-edged sword. Those managers who have a technical background and/or a modern outlook on software development will probably already be aware of concepts like Technical Debt and be willing to allocate time for it in the schedule. But there are also those managers who are less technical, often described as ‘more aligned with the business’, and for them it’s a much harder sell. To a less technical manager refactoring either sounds like ‘rewriting’ or ‘gold-plating’ and may be dismissed as unnecessary because you are essentially taking something that works and making it continue to work - but in a different way. Given a choice between ‘New Feature X’ and a ‘Better Implemented Feature Y’ they’ll always pick the former - that’s only natural. I’m sure if we software developers were salesmen (much like the business people we develop for) we’d have no trouble getting it accepted as a necessary practice, but most of us aren’t, and that means we often resort to one of the following choices:-

Don’t do it at all

This isn’t really a choice. If your codebase is already getting you down, doing nothing will only cause you more pain if you care. In the end it’ll win and you’ll be forced to leave and find another job. Hopefully with a more understanding team.

Do it in Your Own Time

Using your own time to improve the codebase might be viable if it’s quite small, and could let you show your boss what difference it makes in the long run to adding new features. But even then they may still not ‘get it’. In a larger team and on a larger project this is just an arms race – can you clean up faster than your colleagues can make more mess? If you can’t then you’ll just resent the fact that it’s your own time you’re wasting, and in the end be forced to leave and find another job. Hopefully with a more understanding team.

Do it By Stealth

That just leaves doing it by stealth as part of your normal working day. But that means inflating your timescales and lying about what you’re actually doing. This may have the desired effect on the codebase but won’t make you feel at ease inside. You’ll also have to be careful what you email and say to avoid embarrassing questions. Without your teammates behind you, you can only survive like this for so long; eventually it’ll wear you down, as you’ll never get the time to tackle the really big issues. In the end you’ll be forced to leave and find another job. Hopefully with…

I’ve been fortunate enough to work mostly for managers that have trusted me to do the right thing; but I’ve also had occasions where I’ve had to hide some refactoring to do what I felt was fundamentally important but was always getting sidelined. Not surprisingly this seems to be more symptomatic of Big Corporations, at least according to the developers I’ve spoken to about this, but that’s not exactly a statistically valid sample.

One person I asked at the recent ACCU Conference was Roy Osherove, because he was speaking about team leading; he suggested that refactoring is a natural part of the development process, so why do I consider it such an issue? In principle I agree with this sentiment, but I think it misses a key point, which is the effect it can have on the project schedule - something I do not own. When implementing a feature there is often a quicker way that may incur Technical Debt, and a slower way that avoids any new debt or maybe even allows it to be reduced, plus other variations in between. This implies a sliding timescale for the answer to “How long will it take?”, and so that is not something I can always answer with a single figure. I can advise and provide options, but ultimately the choice must surely lie with the customer. Failing that (perhaps the customer isn’t in the right position to make such a decision or doesn’t care) it must be my boss or his boss etc., as they are the ones juggling the budget, schedule and resources balls.

So what is a poor programmer to do? Those people who I feel should make a judgement call on when the opportunity is ripe for refactoring (instead of always picking the ‘crowd pleasing’ items) aren’t inclined to do so; and although I’m probably empowered to do so I feel duty bound to give them first refusal on how I go about my job to ensure I’m acting responsibly and making pragmatic choices to the benefit of the customer.

Saturday, 8 May 2010

My ACCU Conference 2010 Lightning Talk

At this year’s ACCU 2010 Conference I decided to be brave and enter for a Lightning Talk. These are 5 minute presentations on pretty much anything you like – although it helps to have some relevance to the hosting event :-) If you so desire you can provide a few background slides, but with only 5 minutes of talking time you probably won’t need many.

The premise of my talk was a homage to the BBC 2 TV comedy show Room 101. Each week a different celebrity tries to convince the host (Paul Merton) that some things they dislike (physical, emotional or whatever) are truly appalling and should be banished from this Earth. Think “Crazy Frog” and you should get the general gist. For my talk I picked three of Microsoft’s products that have somehow managed to attain immortal status in the development world when they should have been retired years ago. The fact that they are all products from Microsoft should suggest nothing about my opinions of Microsoft per se (after all I work solely on their platform through my own choice) but more about the mentality of the companies that have managed to tie themselves so inextricably to the products in question. Funnily enough Udi Dahan sort of covered similar ground himself a few weeks ago – though in a far more intelligent fashion of course.

Given that the slides are going to be up on the ACCU web site I decided that it might be worth adding a little prose to go alongside them and to try and defuse any flames that might arise from my last slide – should the comment about jQuery be taken at face value. So, if you want to play along at home you can pull up the slides on the ACCU website here, or you can just imagine these bullet points are some slick PowerPoint presentation. Obviously reading the text below won’t have anywhere near the same impact as me standing in front of an audience, because what little intonation and comedy timing there was will be completely lost :-) I’m not sure exactly what I said, but it went something like this…

Visual SourceSafe

  • 10 Analyze.exe -F, 20 GOTO 10
  • More chatty than Alan Carr
  • Working Folder randomiser

My first choice is the proverbial “Punch Bag” of Version Control Systems. If you Google “Visual SourceSafe” you’ll quickly discover that the most common complaint is its ability to corrupt your source code database on a regular basis. One of the first automated tasks you learn to create is one that runs analyze.exe to find and fix any errors - that assumes, of course, you can persuade Visual Studio to leave the repository alone long enough to actually run the tool.

One sure-fire way to corrupt your database is to use VSS over a VPN. If you don’t manage to kill your network with the excessive SMB traffic caused by the file-system based architecture, you’ll eventually be greeted with the legendary “Invalid DOS handle” error message. At this point you’ll be executing the “20 GOTO 10” part of this slide as your entire team “downs tools” once more.

But not all weirdness is the result of repository corruption; sometimes tools like Visual Studio will try and “help you out” by explicitly setting the “Working Folder” for a project which you later moved to another, more suitable location. Now your code lives in one part of the hierarchy in VSS, but in the working copy it’s back where it started! So you start deleting all those .scc files, along with the usual .ncb and .suo files, in the hope that you’ll remove whatever it is that seems to link the project with its past life.

Visual C++ 6

  • Visual Studio 1998 (10 is the new 6)
  • #if _MSC_VER < 1300
  • It’s Visual C++

My second choice is Microsoft’s flagship C++ compiler. As you can see this was also known as Visual Studio 1998. Yes, that’s a year from another millennium! So, how many of you bought a copy of Andrei Alexandrescu’s seminal book “Modern C++ Design”? And how many of you put it straight back on the shelf when you discovered that you couldn’t even compile the first example because your compiler’s template support was so broken? This compiler was released before the current standard was finalised, and it’s interesting to note that some Microsoft bloggers used the phrase “10 is the new 6” when writing about the upcoming Visual Studio 2010. This was a reference to the supposed performance enhancements going into making VS2010 as nippy as VC6, but it seems more amusing to point out that they are about to make the same mistake again and release a major C++ compiler version just before the next C++ standard is finalised.

Many developers think that Boost is incredibly complex because it uses lots of clever Template Meta-Programming. It’s not. It’s because every class has to be implemented twice. One inside an “#if _MSC_VER < 1300” guard for Visual C++ 6 users and another for every other compiler to use.

It’s interesting to note that many recruitment agencies ask for Visual C++ experience, instead of just plain C++, as if somehow it’s a different language? Mind you, given the number of supposedly experienced C++ developers I’ve interviewed in recent years that still believe the default container type is CArray and not std::vector it makes me wonder if the recruitment consultants aren’t actually right on this one.

Internet Explorer 6

  • The Corporate Standard
  • MDI is so 90’s
  • jQuery is fanning the flames

My final choice is Internet Explorer 6. Despite what the cool kids think, this is still the standard web browser in many large corporations. Putting popup iframes on a web site to tell me I’m using a lame browser doesn’t help me feel any better either – if I could do something about it I would. In one organisation you get a virus warning if you try to install a more modern browser like Firefox.

I’ve come to believe that Tabbed Browsing is a right and not a privilege. How am I supposed to adhere to the company’s clean desk policy when IE litters my desktop with windows and my taskbar with little ‘e’ icons?

I partially lay the blame for the continued existence of IE6 at the feet of jQuery. It used to be hard to support because you needed to know how to code around all its little quirks and foibles. jQuery now makes it easy to support again. Please stop doing that! We want it to be hard so that we have a reason to get off this merry-go-round.

Epilogue

Sadly all three products look like they’re going to be with us for some time yet. So, would anyone like to wager a bet on which we’ll see first - the death of one of these atrocities or the first Liberal Democrat government? [*]

 

[*] Apparently politics was out of bounds, but I felt this incredibly cheap shot was far too good to pass up :-)