Monday 3 December 2012

Value Type or Reference Type? An Acid Test

Life was simple under C+++ - everything is a value type. As someone who grew up via Assembler and then C on their way to C++, the notion of pointers always made perfect sense, after all it’s just code and memory. When I moved to C# a few years ago I had to learn all about the differences between Reference Types and Value Types. Of course references are pretty close to pointers, but types like String that are implemented as a reference type, but masquerade as a value type seemed confusing at first. That’s probably why it’s such a common interview question. Closely related, though probably not obviously at first, is the subject of immutability. Once again, in C++ you have “const” which allows you to make instances read-only, but in C# it has to be at the type level.

So, the String type is implemented as a “reference type”, but has “value types” semantics. So far so easy, but now let’s move on to the classic Customer type so beloved by sample code writers everywhere:-

public class Customer
{
  public Identity Id { get; }
  public string Name { get; }
  public Address Address { get; }
}

Now imagine you want to write a unit test that checks that the type can be serialized correctly because you’re going to pass it over a WCF channel. How do you check to make sure that an instance of the type will been rehydrated correctly? One answer might be to see if the original and deserialised objects “are equal”:-

[Test]
public void CustomerCanBeSerialised()
{
  var input = new Customer(. . .);
  var output = deserialise(serialise(input));

  Assert.That(output, Is.EqualTo(input));
}

To makes this work means implementing Equals() on the Customer type. And the best practices state that you if you implement Equals then you almost certainly should be looking to override GetHashCode() too.

Hmmm, what a hole we’re digging for ourselves here. It seems logical to use an equality comparison here because we might also have some non-public state that we also need to verify has been serialised correctly. But what’s that got to do with comparing two arbitrary instances of the same type to see if they are equal? Surely in that case only the observable state and behaviour matters?

This is where parts of the codebase I’m working on at the moment have ended up. What I suspect has made this situation worse is the apparent ease with which these member functions can be cranked out with a tool like ReSharper[1]. Some blame might also be laid at the door of Unit Testing because the tests were written with good, honest intentions and it’s easy to be swept along by the desire to do The Right Thing.

Ultimately the question I seem to be asking myself when creating a new type is not “is this a value or reference type?”, but “do I need to implement Equals()?”. And yet when googling around the subject the two terms seem inexplicably linked and often create a circular argument:-

“Implement Equals() if you want your reference type to have value type semantics” vs “if your reference type implements Equals() then its like a value type”

How about another use case - detecting changes to an object. A common pattern for handling updates to objects involves code like this:-

string newValue = m_control.GetValue();
if (newValue != m_oldValue)
{
  // value changed do something…
}

Here we are again using equality as a means to detect change. Of course String is a simple value type, but why shouldn’t I use the same mechanism to decide if it’s worth sending my Customer object over the network to update the database?

var newCustomer = GetCustomerDetails();

if (newCustomer != m_oldCustomer)
  m_database.SaveChanges(newCustomer);

Ah, you say, but Customer is an entity, not a value. Thanks, as if getting your head around this whole idea isn’t hard enough someone’s just added another term to try and throw you off the scent. Using the the logic that an entity is not a value implies it’s a reference type then, no? Oh, so value vs entity is not the same as value vs reference - no wonder this programming lark is so hard!

So, my first rule of thumb seems to be to never implement Equals() or any sort of comparator by default. This makes perfect sense as there is no “natural ordering” for most custom types. Implementing Equals() as a means to another end (e.g. testing) is fraught with danger and will only confound and confuse the future generations of maintenance programmers. Plus, once you’ve used it, your one chance to override Equals() its gone for good.

But what of my original question - whether a type should be implemented as (or like) a value type or a reference type? The default answer must surely be “reference type”. Which is probably of no surprise to anybody. And what of the aforementioned acid test? What I have learnt is that pretty much the only time I probably should ever have actually implemented Equals()/GetHasCode() is because I wanted to use the type as a key in a HashMap, Set, Dictionary, etc. And in cases where I’ve felt inclined to store a Set of <T>, and therefore scratched my head over the lack of a comparator, I’ve realised that I should instead be creating a Dictionary<Key, T>, or to apply the Customer example, it would be Dictionary<Identity, Customer>. Who would have thought the choice of container could have such an effect on the way a type might be implemented?

This nicely allows me to extend my previous post about “Primitive Domain Types”. I suggested that all the “Identity” type needed was a constructor, but of course it was also missing the comparison support and that’s another burden when defining the type. There are quit a few unit tests that need to be written if you want to cover all the cases that the contract for overriding Equals() spells out. Sticking with the underlying types is looking so much more attractive than cranking out your own value types!

At the beginning I may have suggested that with C++ it’s all a bed of roses. Of course it’s not. The equality model does seem to be much simpler though, but that’s probably just 15 years of C++ experience talking. Although you can compare pointers as a form of reference equality you don’t have pointers masquerading as values. Just imagine what would happen if std::shared_ptr<> provided an == operator that compared the de-referenced values instead of the actual pointers. Hey, actually that could work, couldn’t it?

 

[1] This is clearly not ReSharper’s fault but it does illustrate how tooling (or lack thereof) can affect the implementation.

2 comments:

  1. @Anonymous: Thanks for the link but this only describes the technical differences of "pure" value types like ints, it does not address the "reference type with value semantics" types like a string.

    "Variables of value types directly contain their data, whereas variables of reference types store references to their data"

    A KeyValuePair<> fails the "value" test under this rule and yet it has value semantics if the pair of contained values has it, and yet they can be held as references, e.g. strings.

    ReplyDelete