Monday, 19 November 2012

The File-System Is An Implementation Detail

Using a native file-system to store your data can be a double-edged sword. On the one-hand the great thing about a file-system is that the concept of files and folders is well understood and so there’s a low learning curve for new developers and a potential backdoor for support issues too. The downside of using the file-system is that it’s a well understood concept and there is a potential backdoor for the “time conscious” developer to use instead of any carefully designed “abstraction”.

The ability to manipulate the file-system with your everyday tools such as shell commands and scripts makes the allure of the “easy win” just too tempting some times. This is compounded by the fact that it may not be obvious whether the choice to use the file-system in such a way was a conscious one, exactly because of its flexibility, or imposed for some other reason[1], but not intended to be exploited. Once that power has been abused it can become the rationale for doing it again and again as people then follow its lead “to fit in with the existing code”. Then, any notion of “implementation detail” becomes obscured by the sheer volume of exploitative code.

But building an “abstraction” over the file-system doesn’t have to be some gigantic API with SQL like semantics to query and update data, it could be as simple as a bunch of static methods that take individual query parameters that you then turn into a path inside the abstraction. For example, say you have the following legacy code that loads customer data from a file in a value dated folder:-

public void ProcessCustomer(string name, DateTime date)
{
  string datedFolder = date.ToString(“YYYY-MM-DD”); 
  string path = Path.Combine(“\\server\share”, datedFolder); 
  path = Path.Combine(path, name); 

  Customer customer = Customer.Load(path);
  . . .

A big problem with this code is that it makes testing hard because you’re in the territory of having to mock the file-system API - notably some static methods in C#. Fortunately it’s reasonably easy to build a thin Facade[2] that can serve you well when testing your error recovery code too. But it is another barrier on the path (of least resistance) to a more wholesome codebase.

One objection you might immediately raise is that the Customer class takes a file-system path anyway and so you’re already tied to it, i.e. you’re doing no worse. What you’re missing is that the Customer class actually takes a string, the semantics of which happens to be a path. In future it could be a RESTful URL or some other “location”. The Load() method could also be overloaded to take different types of “location”. To support that agnosticism you’re best off getting the “location” to the class loader without needing to know yourself what it is, i.e. you want to start treating the “location” as an opaque type[3].

The simplest change is to just move the file handling code into a separate class:-

public static string FormatCustomerPath(string root, string name, DateTime date)
{
  string datedFolder = date.ToString(“YYYY-MM-DD”);
  string path = Path.Combine(root, datedFolder);
  path = Path.Combine(path, name); 
 
  return path;
}
. . .
public void ProcessCustomer(string name, DateTime date)
{
  string path = FS.FormatCustomerPath(“\\server\share”, name, date);

  Customer customer = Customer.Load(path);
  . . .
}

This very simple change already divorces the caller from many of the grungy details of where the data is stored. For starters you could change the order of the name and the date in the path. You could even insert other “random” elements into the path such as environment name, version number, etc. Or you could format the dates in DDMMMYYY instead of YYYY-MM-DD. The very fact that you’re passing the date as a rich DateTime type also makes the API less prone to error because you’re not juggling lots of raw string parameter[4]. Also, if your fellow developers have a habit of using format strings like “{0}\\{1}\\{2}” for paths then this simple change will improve portability too.

The root of the path (“\\server\share”) might be hard-coded but more likely is stored in a variable so the next step could be to extract that out entirely by storing it in a static member of the FS class instead. This can then be done once as part of the bootstrapping process in main(), leaving the client code virtually unaware of the existence of the file-system at all:-

public void ProcessCustomer(string name, DateTime date)
{
  string path = FS.FormatCustomerPath(name, date);

  Customer customer = Customer.Load(path);
  . . .
}

The final bare minimum change might be to add an extra level of indirection and hoist these two lines out into a separate class, or rename and change the responsibility of the FS facade to solely be about the file-system aspects of loading a variety of objects:-

public void ProcessCustomer(string name, DateTime date)
{
  Customer customer = ObjectLoader.LoadCustomer(name, date);
  . . .
}

I think all these steps were trivial and if you’ve got a mountain of code to refactor you can’t always get everything done in one hit. Dependencies often have a habit of getting in the way too so you can up either having to back out changes or keep on going to the bitter end. By initially sticking with a static facade you can improve things one piece at a time without making your testing position any worse. In fact you’ll still make it slightly easier.

Remember, I’m not suggesting the use of a static facade over a “proper service” that has a “proper mockable interface” and a “proper factory”. Static facades are still fairly unpleasant beasts because they spark thoughts of global variables and Singletons which are generally considered a bad smell. Yes, there is tooling available that will allow you to mock this kind of behaviour but you should probably be looking to a much higher level of abstraction anyway when you have any sort of persistence layer to deal with. Just don’t forget that sometimes you can make a big difference with a much smaller, localised set of changes.

 

[1] One system I worked on uses it because the manual mocks we built for integration testing ended up becoming the production implementations.

[2] Tim Barrass wrote about one way of doing this for C# in his post “Using static classes in tests”. The same technique could be used in C++ but substitute “static method” with “free function” and “delegate” with “function pointer”.

[3] I appreciate that I’m skating on thin ice with the suggestion that you can play games with the all-singing-all-dancing String type. Ultimately I’d expect to switch to some Abstract Data Type at some time later, but only if an even richer abstraction hasn’t come into play before that to mitigate all this.

[4] There are two very common types used for surrogate keys - int and string. It’s tempting to pass these around using their primitive types but that makes an API much harder to use correctly as it’s all too easy to switch parameters by accident and get some weird behaviour.

No comments:

Post a Comment