Friday, 26 September 2014

Parameter Mutation or Result Aggregation?

As part of our normal development cycle I did a bit of code reviewing and came across something that I realised I would have done myself a while back, but these days would consider doing a different way. The code was a function that attempted to perform an action on a collection of items, and as part of the error handling would keep a note of the failures and then call itself recursively to process the remainder. Something like this:

public List<Failure> Write(List<Thing> things)

  var failures =  new List<Failure>();  

  Write(things, ref failures);  

  return failures;
}

private void Write(List<Thing> things,
                   ref List<Failure> failures)

  try 
  { 
    // Use bulk API to try to write all the “things”  
  } 
  catch (WriteException e) 
  {  
    failures.Add(new Failure(...)); 

    var unwrittenThings = things.Skip(e.FailedItem+1)
                                .ToList();  

    Write(unwrittenThings, ref failures);  
  }
}

As always the real code was far more complicated than this (mostly due to various degrees of exception handling) which meant that you couldn’t simply manually convert the recursion into a traditional loop. No, what I found I would have done differently was change the way that the “failures” list is passed into a recursive function where it is then mutated as part of the process. Essentially seeing the “ref” keyword set off my spider-senses, which is exactly what it’s supposed to do.

Although only the outer method is public, and so this is purely a discussion about the internal implementation of a method, I still questioned whether it would be preferable to avoid the mutation (i.e. side-effect). As I get more and more used to adopting immutable values and trying to be thread-agnostic I find the style of code above less appealing.

In my earlier C++ days (i.e. prior to the invention of move semantics) this would have (felt like) the most efficient way to do it, i.e. to mutate a std::vector style parameter passed by reference [1]. But in C#, where such costs are different because the collections are all reference types, it wouldn’t cross my mind not to return a potentially large collection via the return value.

This leads me to an alternate approach to the code above, which might be considered slightly more pure (aside from the huge side-effects of whatever the underlying operation is), which is to modify a local collection, aggregate the results from the recursive calls, and then return that:

public List<Failure> Write(List<Thing> things)

  List<Failure> failures = new List<Failure>(); 

  try 
  { 
    // Use bulk API to try to write all the “things” 
  } 
  catch (WriteException e) 
  {  
    failures.Add(new Failure(...)); 

    unwrittenThings = things.Skip(e.FailedItem+1)
                            .ToList();  

    var moreFailures = Write(unwrittenThings);

    faillures.AddRange(moreFailures);  
  }  

  return failures;
}

Note that we also only need a single method now.

The one aspect that would cause me to think twice is whether the recursion could get so deep that it might generate a stack overflow [2]. As someone who lived through the 16-bit DOS\Windows era I am all too aware how easy it could be to blow the stack with recursion. However a 1 MB default stack these days probably gives you plenty of room for normal use. Thinking more specifically about the example above, if the number of failures was so high that every write caused a failure, then a stack overflow exception may well act as a natural Circuit Breaker that highlights something is badly wrong elsewhere.

 

[1] The other C++ alternative of course (still prior to move semantics) might have been to use std::auto_ptr or a shared_ptr. Whilst auto_ptr was generally considered dangerous for use in collections it was suitable for tightly scoped use where objects were returned from factory functions.

[2] Support for tail recursion would alleviate this, but then you’d have to know that the tail recursion optimisation was being applied if you wanted to be absolutely sure of avoiding a stack overflow.

No comments:

Post a Comment