Friday, 7 December 2012

To Wait Or Not - Turning Deadlock to Livelock

On the system I’m currently working on we have always seen the occasional failure caused by the database going to sleep for longer than the timeout on a particular write. When I say occasional we were seeing this once every few months, maybe. There is some polling of the table, but effectively there should never be any overlap between the query used to retrieve data and one used to update it later[1]. This eventually started to become a little more frequent and so I decided to add a little extra code to make the specific write query a little more resilient by allowing the 30 second default timeout to be more more configurable. The update is very small and so the actual transaction, when given a chance to do its work is tiny, so the extended timeout didn’t feel too dangerous. I intended to extend it a little bit at a time to ensure I didn’t mask any other issues by accident.

As is always the way I never quite got the time to finish plumbing the changes in in one hit; despite their relative simplicity.

When I finally came back to it I noticed a colleague had decided to take a different route - the more classic back-off and retry. This made me feel a little uneasy because I felt this choice was more risky. The first, and probably least important reason was that the code was now more complex. Alright, so it wasn’t reams and reams of code, but it was far more than just the one line required to bump the timeout. Yes, it could have be simplified by factoring the retry logic using Execute Around Method. But, what concerned me more was the impact on SQL Server of the constant retries from the numerous calling processes there are...

Everyone knows what deadlock is because it’s a classic interview question. But many people I’ve interviewed are far less familiar with its alter ego - livelock. Where with deadlock nobody makes progress but everyone is standing around doing nothing, with livelock the opposite is happening, everyone is so busy trying to avoid one another they still make no progress. The canonical example I know is two people walking towards each other in a corridor but don’t manage to pass because they both keep sidestepping to avoid the path of the other. Change the two people to two processes which are both backing-off and retrying and you can hopefully see where I’m going.

If SQL Server was blocking the update whilst some other lengthy read was occurring[2] it felt better to me to wait for longer on the basis that SQL Server will require no more resources to achieve that. In contrast, backing off and retrying means that the client will timeout, reconnect, and re-execute the query which means a plan has to be re-generated, etc. Even with connection pools and query plan caching it must still add some overhead and therefore would be more costly than waiting longer. If you have many clients doing this you will start to approach the effects you see with livelock.

I wrote about similar effects to this in “Service Providers Are Interested In Your Timeouts Too” where the client of a web service would timeout and retry but with the service still suck executing the previous query it just created more contention. I’m sure SQL Server has been around more than long enough to deal with this admirably, but I’m sure the constant retries don’t come for free.

In this particular case I’m sure the level of contention was easily dealt with either way, but when you have a performance problem that you cannot definitely pin down it’s better to inch your way forward than to change too many things in one go.

 

[1] Use of the READPAST hint helps reduce the readers blocking each other.

[2] We had no obvious candidates for why this might be happening at first. As time went on and the table size grew into many millions of rows there was some correlation with a few support queries that were being run occasionally but nothing really concrete. If it had been a major problem we probably would have stuck the SQL profiler on and tried to force the issue somehow. Intuition is never a good substitute for hard data but sometimes it’s all you have to go on.

Thursday, 6 December 2012

Don’t Overload Stored Procedures Using Nullable Parameters

One common technique for minimising change within the database layer is to “overload” a stored procedure’s functionality by using null-defaulted parameters. For example, say you want to provide something in your public interface to set the status of a Task (which might be a table where you store the details of “tasks”). You know that you need to mark a task as “successful” or a “failure”, but then you get twitchy as you wonder what other states you might need, and what additional attributes you might need to store in the future about a competed task. The uncertainty could cause you to decide to create a “generic” stored procedure like this:-

create procedure dbo.SetTaskStatus
(
  @taskId     dbo.TaskId_t,
  @taskStatus dbo.TaskStatus_t
)
as
  . . .
go

By creating a nice “wide” API you can bend and mould it to all manner of new behaviour without disrupting your clients. But at what cost? You have effectively created a very thin veneer over the underlying tables and as a result you will end up with very little of an abstraction at all. The “generic” name and behaviour hardly conveys any meaning and so you need to understand the calling contexts to understand how it should be used.

The first change is almost certainly going to be to store some kind of error message, so you’ll end up adding 1 extra parameter and giving it a null value by default:-

create procedure dbo.SetTaskStatus
(
  @taskId        dbo.TaskId_t,
  @taskStatus    dbo.TaskStatus_t,
  @errroMesssage dbo.TextMessage_t = null
)
as
  . . .
go

But that new parameter only applies when @taskStatus = FAILED. What are you going to do when someone provides an error message on success? Or what about no error message on a failure? The logic in your procedure has just gone from a simple INSERT to something more complex - assuming that you are going to validate your inputs. If you don’t[1] then it could be head-scratching time for the poor developer that calls your API incorrectly by accident. As the number of arguments grows so does the permutations and yet only a few permutations may actually be valid.

I think it’s far more sensible to be explicit up front, just like you would in your back-end code. So I would have two procedures - one for success and another for failure:-

create procedure dbo.SetTaskToSucceeded
(
  @taskId dbo.TaskId_t
)
as
  . . .
go

create procedure dbo.SetTaskToFailed
(
  @taskId        dbo.TaskId_t,
  @errroMesssage dbo.TextMessage_t = null
)
as
  . . .
go

By creating two separate procedures we have removed one parameter and made it clearer exactly when an error message is expected. Any extra state you need can be added to one, the other or both, and it will still be obvious when you need it. It also means it should be easier to find what callers need updating because you can GREP for the specific “overload” rather than the general one.

In some cases you may be worried about duplicating functionality in both procedures. In which case you can always keep the two separate procedures as the publicly exposed API and internally just delegate to a common internal implementation:-

create procedure pub.SetTaskToSucceeded(. . .)
as
  declare @success pub.TaskStatus_t = 1;

  exec impl.SetTaskStatus @taskId, @success, . . .
go

create procedure pub.SetTaskToFailed(. . .)
as
  declare @failure pub.TaskStatus_t = 2;

  exec impl.SetTaskStatus @taskId, @failure, . . .
go

Don’t be afraid to refactor either. If you started with the overloaded case, such as with legacy code, and you find it spiralling out of control, then break it up and expose a new, more focused API. You can initially implement the new API in terms of the old, build up your set of tests and then switchover the clients. Once you have regained control of the API you then have the ability to refactor away the spaghetti with the barrage of tests to back you up.

 

[1] Performance may dictate that you can’t. Or perhaps you have such a shored-up Data Access Layer with excellent integration test coverage and tight control over all access to the database.

Wednesday, 5 December 2012

Interfaces Are Not the Only Abstract Fruit

A while back I came across some code that I personally thought was over-engineered[1] and subsequently transferred my ire via Twitter. I was going to leave it at that and slip away unnoticed but @willhains didn’t think I should be allowed to sound off without justifying my position. So, this is my attempt to keep what little credibility I already have intact. This is the third attempt at writing this blog post which has always been a “smell” that I was probably wrong, but writing “Primitive Domain Types” along with a Twitter exchange with @jasongorman this weekend convinces me that there is still some merit in my argument. What I said was this:-

“Designing for the future doesn’t just mean sticking interfaces everywhere in your code. It means considering the cost of change.”

With hindsight it probably should have been kept to just the first sentence, but even 140 characters is still far too much rope for some people it seems. I have addressed the latter part to some degree with my recent post “Putting the Cost of Change Into Perspective” but I’ll try and elucidate a little more specifically later (if you’re still awake by then).

IDo Not IAm

One habit I’ve noticed (and been guilty of myself) is that when creating a new class there is a knee-jerk reaction to instantly create an interface for it, and essentially with the same name as the class. The reason for this is presumably because you suspect you’re going to need to mock it in some tests, either now or the very near future. That statement alone should be enough to convince you that TDD will probably tell you whether that’s true or not. So, in practice what you end up coming across is designs like this:-

interface IThing
{ . . . }

public class Thing : IThing
{ . . . }

Take a step back. Does the String class implement IString? Of course not, it implements various behaviours defined in interfaces that are usually common to many types and are therefore non-type-specific in name[2]. Of course none of this is news and in fact has been written about by far more intelligent people than me a long time ago - namely in “Mock Roles, Not Objects”.

This particular practice does make some sense when you’re implementing a service[3], although even then I would be mindful because a File usually doesn’t implement IFile - it is more likely to implement separate Reader and Writer interfaces[4]. No, it’s value and simple entity types where this practice feels wrong by default.

Although I’ve never experienced the pre-generics era of C#, I have lived through the “generic” containers in C and C++ (pre-template era) and even contributed something nastily similar myself[5] in the distant past. The kind of design this leads to is interfaces with names like IKey, IValue, IIdentity, etc. that effectively have no salient attributes. This may be because you don’t have a second use case from which draw any commonality, or you feel the bare interface alone satisfy the needs for static type safety and that’s enough. The problem I have is that this makes the design hard to understand because you cannot see the true relationships between the types - an IDictionary<IKey, IValue> doesn’t say much[6].

This was exactly the scenario I found myself describing when writing about “Primitive Domain Types” last week. I left the advice as just this:-

“There might be a temptation to factor out an ISurrogateKey interface from SurrogateKey ... I suggest you resist it on the grounds that these are simple value types”

The Cost of Change

One reason to use interfaces in the first place is exactly to isolate code from change and is embodied in the Open/Closed Principle. As long as all your new types can satisfy the existing interface contract you can in theory just slot them in. But to do this the interface must be designed with this kind of extensibility in mind and that often involves gazing into a crystal ball. Get it wrong and you save yourself very little, if anything.

When it comes to published interfaces you’ve got no choice but to think really hard, but for internal interfaces does it really matter? If you’ve no idea what the future holds and you’ve no obvious need to mock the type, why bother creating an interface at all? Maybe the type is just a simple Abstract Data Type or perhaps the interface is really just a single callback function?

Programmers brought up on a diet of C# and Java would do well to digest the following tweet from John Carmack:-

“Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function.”

 

[1] Trying to remain faithful to The Simplest Thing That Could Possibly Work is hard, very hard. It’s even harder for older software engineers because we were brought up in a time when code was less malleable and “fixing” what wasn’t actually “broken” was heresy.

[2] The classic OO naming scheme is nouns for classes and verbs for interfaces. The example that always comes to my mind is a class called Thread but a behaviour called Runnable. The cheaper alternative to thinking seems to be to just stick the suffix “Service” onto both the class and interface.

[3] What a wonderfully overloaded term “service” is. In one sense you can be talking about the “services” a class provides through its interfaces and in another you might be referring to the “service” (or daemon) that is the process hosting the class.

[4] An aggregate interface like IFile (i.e. interface IFile : IReader, IWriter) can be useful, but should ideally be restricted to interfacing with legacy code. Also at system-test level you probably want to be able to read real data but redirect your writing to a test-specific location. I touched on the configuration side of this in “Testing Drives the Need for Flexible Configuration”.

[5] In C# this is realised with collections of type Object. In C and C++ it means holding values as void* pointers, or using structs in C that have a memory layout that simulates inheritance.

[6] The lack of typedef’s in C# is one my biggest bugbears coming from C++, at least as far as making code using Generics more readable.

Monday, 3 December 2012

Value Type or Reference Type? An Acid Test

Life was simple under C+++ - everything is a value type. As someone who grew up via Assembler and then C on their way to C++, the notion of pointers always made perfect sense, after all it’s just code and memory. When I moved to C# a few years ago I had to learn all about the differences between Reference Types and Value Types. Of course references are pretty close to pointers, but types like String that are implemented as a reference type, but masquerade as a value type seemed confusing at first. That’s probably why it’s such a common interview question. Closely related, though probably not obviously at first, is the subject of immutability. Once again, in C++ you have “const” which allows you to make instances read-only, but in C# it has to be at the type level.

So, the String type is implemented as a “reference type”, but has “value types” semantics. So far so easy, but now let’s move on to the classic Customer type so beloved by sample code writers everywhere:-

public class Customer
{
  public Identity Id { get; }
  public string Name { get; }
  public Address Address { get; }
}

Now imagine you want to write a unit test that checks that the type can be serialized correctly because you’re going to pass it over a WCF channel. How do you check to make sure that an instance of the type will been rehydrated correctly? One answer might be to see if the original and deserialised objects “are equal”:-

[Test]
public void CustomerCanBeSerialised()
{
  var input = new Customer(. . .);
  var output = deserialise(serialise(input));

  Assert.That(output, Is.EqualTo(input));
}

To makes this work means implementing Equals() on the Customer type. And the best practices state that you if you implement Equals then you almost certainly should be looking to override GetHashCode() too.

Hmmm, what a hole we’re digging for ourselves here. It seems logical to use an equality comparison here because we might also have some non-public state that we also need to verify has been serialised correctly. But what’s that got to do with comparing two arbitrary instances of the same type to see if they are equal? Surely in that case only the observable state and behaviour matters?

This is where parts of the codebase I’m working on at the moment have ended up. What I suspect has made this situation worse is the apparent ease with which these member functions can be cranked out with a tool like ReSharper[1]. Some blame might also be laid at the door of Unit Testing because the tests were written with good, honest intentions and it’s easy to be swept along by the desire to do The Right Thing.

Ultimately the question I seem to be asking myself when creating a new type is not “is this a value or reference type?”, but “do I need to implement Equals()?”. And yet when googling around the subject the two terms seem inexplicably linked and often create a circular argument:-

“Implement Equals() if you want your reference type to have value type semantics” vs “if your reference type implements Equals() then its like a value type”

How about another use case - detecting changes to an object. A common pattern for handling updates to objects involves code like this:-

string newValue = m_control.GetValue();
if (newValue != m_oldValue)
{
  // value changed do something…
}

Here we are again using equality as a means to detect change. Of course String is a simple value type, but why shouldn’t I use the same mechanism to decide if it’s worth sending my Customer object over the network to update the database?

var newCustomer = GetCustomerDetails();

if (newCustomer != m_oldCustomer)
  m_database.SaveChanges(newCustomer);

Ah, you say, but Customer is an entity, not a value. Thanks, as if getting your head around this whole idea isn’t hard enough someone’s just added another term to try and throw you off the scent. Using the the logic that an entity is not a value implies it’s a reference type then, no? Oh, so value vs entity is not the same as value vs reference - no wonder this programming lark is so hard!

So, my first rule of thumb seems to be to never implement Equals() or any sort of comparator by default. This makes perfect sense as there is no “natural ordering” for most custom types. Implementing Equals() as a means to another end (e.g. testing) is fraught with danger and will only confound and confuse the future generations of maintenance programmers. Plus, once you’ve used it, your one chance to override Equals() its gone for good.

But what of my original question - whether a type should be implemented as (or like) a value type or a reference type? The default answer must surely be “reference type”. Which is probably of no surprise to anybody. And what of the aforementioned acid test? What I have learnt is that pretty much the only time I probably should ever have actually implemented Equals()/GetHasCode() is because I wanted to use the type as a key in a HashMap, Set, Dictionary, etc. And in cases where I’ve felt inclined to store a Set of <T>, and therefore scratched my head over the lack of a comparator, I’ve realised that I should instead be creating a Dictionary<Key, T>, or to apply the Customer example, it would be Dictionary<Identity, Customer>. Who would have thought the choice of container could have such an effect on the way a type might be implemented?

This nicely allows me to extend my previous post about “Primitive Domain Types”. I suggested that all the “Identity” type needed was a constructor, but of course it was also missing the comparison support and that’s another burden when defining the type. There are quit a few unit tests that need to be written if you want to cover all the cases that the contract for overriding Equals() spells out. Sticking with the underlying types is looking so much more attractive than cranking out your own value types!

At the beginning I may have suggested that with C++ it’s all a bed of roses. Of course it’s not. The equality model does seem to be much simpler though, but that’s probably just 15 years of C++ experience talking. Although you can compare pointers as a form of reference equality you don’t have pointers masquerading as values. Just imagine what would happen if std::shared_ptr<> provided an == operator that compared the de-referenced values instead of the actual pointers. Hey, actually that could work, couldn’t it?

 

[1] This is clearly not ReSharper’s fault but it does illustrate how tooling (or lack thereof) can affect the implementation.