Thursday 28 March 2013

Property vs Method - Ownership Semantics

I was recently doing a spot of refactoring and came across some code that used an object that was backed by a native resource in a buggy way. I can see how the situation came about but have not so far come across any description of a convention that suggests how it should be done. Perhaps it’s blindingly obvious to everyone else, but in case it’s not or my argument is flawed, this is my take on the matter...

The Bug

Imagine you have a C# managed type that is a wrapper around a native type which, because it can be very large and is native needs managing carefully in the C# world using the Dispose pattern. If you want something more concrete imagine you have a managed library that wraps a native library to allow you to load web pages and manipulate the Document Object Model (DOM).

public class DOM : IDisposable
{
  public static DOM LoadFile(string filename)
  {
    // Invoke native code to load file.
  }

  public void Dispose()
  {
    // Invoke native code to cleanup.
  }
}

So, given that this is a resource that generally needs careful management [1] the default stance will be to apply a Using block to ensure it’s correctly cleaned up at the end of use, whether that be through normal or exceptional circumstances:-

using (var dom = DOM.LoadFile(filename))
{
  // Manipulate the DOM to your hearts content…
}

Now, what about when you don’t create this resource directly but instead you acquire it from another object, perhaps via a property or a method. Absent any documentation or looking at the property/method implementation would you expect to take ownership or not? For the sake of the example code I’m going to suggest that the type from which we acquire this DOM is another C# managed class called WebPage [2].

var page = new WebPage(. . .);

// Access DOM from a property - taking ownership?
var dom = page.DOM;

// Access DOM from a method - taking ownership?
var dom = page.ToDOM();

The code I came across was written like this:-

using (var dom = page.DOM)
{
  // Manipulate the DOM and directly dispose it
}

But the implementation of the WebPage’s DOM property looked like this:-

public class WebPage
{
  private DOM m_dom = null;

  public DOM
  {
    get
    { 
      if (m_dom == null) 
        m_dom = CreateDom(); 

      return m_dom;
    }
  }
}

Naturally, being legacy code, the container class didn’t implement the Dispose pattern itself and so you could never have managed the resource properly even if you wanted to. But I’m in the middle of a refactoring and wandering what to do next. Plus, if possible, I want to lay down (or better yet promote an existing one) a convention about when ownership is likely to be transferred. And, when it isn’t what the containing class needs to do instead.

ToXxx() Method Passes Ownership

In the particular refactoring I was looking at the class was really acting like a Builder [2] and so I changed the property to a method and renamed it ToXxx(), where Xxx was the type. Essentially it was a type conversion, like ToString(), and that also meant ownership should be passed to the caller. This also meant I didn’t have to fix the code structure of the buggy caller; I only had to do the renaming.

Going back to my example above this leads to the following implementation and style of invocation:-

public class WebPage
{
  public DOM ToDom()
  {
    return CreateDom();
  }
}

// In the caller
using (var dom = page.DOM)
{
  // Manipulate the DOM and directly dispose it
}

Naturally this is the simplest implementation for both the container and caller.

Xxx Property Retains Ownership

In the past I’ve also found myself taking the property approach. A property is usually a class attribute held by containment and the obvious question might be why something this complex was being exposed directly in the first place. In the places I’ve used it’s because the containing class is essentially a Facade over an underlying complex object. In an idea world it would remain purely an implementation detail but legacy code (until refactored) often demands access to the underlying object in the mean time.

In these cases I have decided to retain ownership for performance reasons and expose the legacy object through a property. This is what the WebPage class in the example above was alluding too. But what it missed was the necessity of implementing the Dispose pattern itself so that it could delegate cleanup to the underlying object at the right time. So it should have looked like this:-

public class WebPage : IDisposable
{
  private DOM m_dom = null;

  public DOM
  {
    get
    { 
      if (m_dom == null) 
        m_dom = CreateDom(); 

      return m_dom;
    }
  }

  public void Dispose()
  {
    if (m_dom != null)
      m_dom.Dispose();
  }
}

// In the caller
using (var page = new WebPage(. . .))
{
  var dom = page.DOM;

  // Manipulate the DOM and indirectly dispose it
}

This is a little more effort for the callee and a fair bit more for the containing class. It probably seems a lot more effort to C# developers who are only used to managing simple resources like files and database connections in a limited scope. Annoyingly in C++ you have types like std::shared_ptr<> that just make this a nonissue, in fact after const it’s probably the second biggest thing I miss.

I would have hoped that a static analysis tool like ReSharper would have pointed out the need to implement IDisposable originally, but it doesn’t. I don’t remember FxCop picking it up either but I’ve not looked at FxCop recently and I suspect both would generate way too much noise for most normal users. Perhaps there is a setting I need to turn on instead? What I’m after (should a more knowledgeable person be reading this) is a warning when one type aggregates another (IDisposable) type and doesn’t implement IDisposable itself. In library and server-side code I assume that any type implementing IDisposable always requires its lifetime to be tightly controlled as you have no idea how tight or lapse the constraints are for the hosting process.

 

[1] Yes, in many cases where the objects are quite small we can let the garbage collector weave “it’s magic” and be blissfully unaware of the memory pressures created by the native goings on. But in many cases these babies are multi-megabyte monsters that have a serious impact on the (32-bit) process’s footprint.

[2] Try and ignore for the moment the possibly obvious (to you) thought that it looks like the Builder pattern and therefore ownership semantics are probably obvious. This is legacy code here, these rules don’t apply until after I’ve refactored it.

4 comments:

  1. This was the right and a good solution. I assume it's working well.

    WebPage has a Dispose method and is IDisposable. This is enough for the diligent user.

    Perhaps, because of legacy implications, and potential misuse it might be wise to implement a Destructor with a GC.SuppressFinalize to mitigate the drawbacks of defining a Destructor? I've never actually done this myself. I'm now wondering if I should have, and should now provide it when designing major resource harbouring classes.

    Example code here: http://msdn.microsoft.com/en-gb/library/system.gc.suppressfinalize.aspx

    ReplyDelete
  2. I don't see what benefits defining a Destructor (aka Finalizer) will bring? You should only free native resources via one as any managed members *may* have already been finalized by the time yours is run. And in any case the proverbial horse has already bolted by then - the point of IDisposable is to (manually) overcome the non-determinism of finalizers.

    ReplyDelete
  3. I formed the impression reading through Destructor documentation, that if Dispose was not called on a class, say your WebPage example class, that if it also implemented a Finalizer, at least the class might eventually be able run dispose at some undetermined point. I assumed this would be better than not ever.

    I gathered this offered the benefit that, hypothetically at least, an existing client app, that used the old WebPage could relink to your new version in the refactored library (assuming you maintained backward compatibility), and get some benefit, despite continuing not to call Dispose on it. You said that the Dom member wrapped a native type.

    I'll have to study it all more carefully, and try to gain a clearer understanding. Like I say I've never implemented a Finalizer myself but have seen it used by others.

    ReplyDelete
  4. The DOM type does need a finalizer, as without it native resources would be leaked if Dispose() was not called. However the WebPage type does not as it's entirely managed.

    ReplyDelete