Monday, 26 January 2015

IIS Hosted Web API Not Caching Internal HTTP Responses

Executive Summary: Set “Load Profile” to true for the IIS application pool.

I had a fun afternoon last week trying to help a colleague work out why the responses to the internal HTTP requests from one of my team’s web APIs weren’t being cached by the .Net HTTP component. It appeared that this was all working fine in the pre-production environment, but not production itself.

Caching HTTP Responses

The code (which I’m not overly familiar with) uses one of the standard .Net HTTP request classes (WebRequest I believe) to make authentication calls to another internal, RESTful service. It was anticipated by the owner of this service that responses could be cached by the .Net framework thereby alleviating the need for our own service to manually implement a caching layer. Naturally the point of (briefly) caching the HTTP responses was to reduce load on the underlying authentication servers and they had made sure that their response headers were correctly configured to make this “all just work”.

In theory this was true, but in practice it turned out to be a little more complicated once the code was running under the real IIS worker process (w3wp.exe).
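For context, opting in to the framework’s HTTP caching from the client side needs little more than the right cache policy on the request. I don’t know exactly what the code in question looked like, so this is just a sketch (the URL is made up; the types live in System.Net and System.Net.Cache):

var request = (HttpWebRequest)WebRequest.Create("http://auth.example.local/user");

// Honour the standard HTTP caching rules (Cache-Control / Expires response
// headers); the cache itself is the WinINet "Temporary Internet Files" store.
request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.Default);

using (var response = (HttpWebResponse)request.GetResponse())
{
  // . . . read and use the (possibly cached) response
}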

Administrator Rights

My colleague had already done a fair bit of analysis and experimentation with a spare production server and the wonderful Process Monitor tool from Sysinternals. By watching the w3wp.exe process whilst he sent requests with CURL he could see that the process was getting Access Denied errors when accessing parts of the Registry and file-system.

He had also tried configuring the IIS application pool to run under another non-service account, i.e. a support user’s account, and that appeared to work. Comparing the behaviours with Process Monitor for the two accounts he could verify that the problem was one of permissions, or so it seemed.

A natural response when seeing Access Denied problems is to grant the underprivileged account admin rights, and that’s what he did - lo-and-behold, things started working. However further experimentation suggested that this didn’t always fix the problem, which was mightily confusing.

At this point I joined the party...

Verifying Admin Rights

Past experience has taught me that just adding an account to the Administrators group is not always enough - it’s best to actually check the security attributes of the process in question to double-check that it really has become elevated. Conversely when removing rights I make the same check to ensure they’ve been revoked correctly.

For this check I used another of the Sysinternals tools - Process Explorer. We hit an initial “Access Denied” for some process information [1] so I used the technique of starting it with “psexec -s -i”, which was probably overkill (see “Process Explorer Failed to Display DLLs or Handles” for when this might be necessary). After right-clicking on w3wp.exe to show the process’s Properties and then switching to the Security tab I could see that the process was indeed a member of BUILTIN\Administrators.

It later transpired that the w3wp.exe process not correctly acquiring or relinquishing its rights might have been the source of some of the earlier confusion. I too witnessed the account being removed from the local Administrators group and, after recycling the web site’s application pool, found that the w3wp.exe process still appeared to be a member. The process start time strongly suggested it had indeed been restarted when we recycled the app pool.

The Default Profile

We then switched back to Process Monitor and looked at the Registry and file-system access. The HTTP client was accessing various “Internet Settings” registry keys under the HKEY_USERS tree - HKU\.Default in particular. It was then reading and writing to an internet cache that was stored under “C:\Windows\Temp\Temporary Internet Files\...”.

I had always suspected that user profiles might have a role to play in this problem because I remembered back in the dim-and-distant past that the native WinInet library couldn’t (or perhaps shouldn’t) be used for making HTTP calls when running as a service. Microsoft later added another native component called WinHttp that could be used by services. IIRC the main problem back then was around configuring proxy settings, and whilst we weren’t suffering from that specific ailment this time, the registry keys it appeared to be accessing were pointing in the same direction.

We then revoked the admin rights from the service account, checked they had been revoked with Process Explorer, and then executed the test HTTP requests again so I could see whether the same registry and file-system access patterns occurred, and they did. A brief play with the REG command-line tool and DIR helped prove that, although the keys and folders existed, the unprivileged service account wouldn’t have access to them.

LoadProfile=true

This suggested to me that the problem was likely down to the fact that the service account was being used by the w3wp.exe process without a user profile being loaded. Lacking a profile of its own, it fell back to behaving as it would with just the default profile on hand, but it didn’t have the necessary rights to update the cache in the “C:\Windows\Temp\Temporary Internet Files” folders.

Changing the Application Pool setting for “Load Profile” to “true” seemed to fix the problem. We then applied this to another spare production server (after first proving that it was not behaving correctly) and that worked too.
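For the record the setting lives under the application pool’s process model, so if you ever needed to script the change, something along the lines of this sketch using the Microsoft.Web.Administration API ought to work (the pool name is made up):

using (var serverManager = new Microsoft.Web.Administration.ServerManager())
{
  var pool = serverManager.ApplicationPools["MyWebApiPool"];

  // Equivalent to setting "Load Profile" to true in the IIS admin UI.
  pool.ProcessModel.LoadUserProfile = true;

  serverManager.CommitChanges();
}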

Admin Rights Not Required

I think this tale highlights why so many deployment issues appear to be solved by making the service account a local administrator. The huge gain in power that right confers allows the process to read, and more importantly write, to so much of the system. Even if the correct answer is an entirely different setting (as in this case) taking the sledgehammer approach can get you running, but at a potentially disastrous cost [2].

Anyone doing service style deployments on Windows should know how to wield tools like NTRIGHTS and CACLS (or their more modern equivalents) to grant just the right amount of permissions to their processes.

 

[1] I’ve spent so many years supporting pre-Windows Vista systems that I still forget occasionally that I need to elevate admin tools. I usually start with a non-elevated command prompt for safety and then wonder later why a tool doesn’t work properly because I’ve forgotten that I’ve gone for safety first. Which of course is the whole point :-).

[2] If the service is compromised it might grant an intruder further access. Even if it’s an internal service putting it in a sandbox helps ensure it does what you think it should.

Thursday, 22 January 2015

Configuration Can Be Simpler

Every time a developer creates an XML based IoC container config file I’m sure not just one, but an entire litter of kittens die. And not just kittens, but build managers [1] and support staff too. I’ll admit that I have yet to work on a project where I’ve seen any clear reason to even use an IoC container. I’m sure such reasons exist; it’s just that where I’ve seen containers used it’s been as a workaround for some other problem, e.g. a weak build and deployment pipeline [2].

Configuration Files are a User Interface

In “Diagnostic & Support User Interfaces” and “Testing Drives the Need for Flexible Configuration” I tried to show how system configuration files are a form of Human Computer Interface (HCI). This is a view that is also promoted in Michael Nygard’s wonderful book “Release It! Design and Deploy Production-Ready Software”.

If that is the case, then a file with hundreds or thousands of lines of XML is not my idea of a pleasant UI to work with. One knee-jerk reaction might be to manage the complexity by adding an extra level of indirection and generating the XML config file(s) from a much simpler DSL written with, say, a scripting language. To me this just appears to be an attempt to overcome the XML “angle-bracket tax” instead of getting to the heart of the problem, which is using an overly-flexible and complex tool to solve what might be a much simpler problem.

Example: Configuring a Chain of Responsibility

To try and explain my position further I’m going to use a real-world example that has come up. Within a web API I worked on there was a component that handles the authentication aspects of an internal system. This component used the Chain of Responsibility design pattern to achieve its goals so that a client request could be fired into the head of the chain and somewhere along it one of the providers would answer the question of who the principal was, even if the eventual answer was “unknown”.

The section of an IoC container XML config file needed to describe this chain was not pretty - it was littered with fully qualified type names (assembly, namespace and class/interface). Whilst I grant you that it is possible to configure a Directed Acyclic Graph (DAG) of objects using the tool and its config file I would suggest that it is probably not desirable to. The example - a chain of responsibility - does not call for that amount of power (an arbitrary DAG) and I think we can do something much simpler and with very little coding effort.

Key/Value Configuration

I’ve always found the old fashioned .ini file style of configuration file one of the easiest to read and maintain (from a multi-environment perspective). One of the reasons for this, I believe, is that the configuration is generally flattened to a simple set of key/value pairs. Having one entry per line makes them really easy to process with traditional tools, and the constraint of one-setting-per-line forces you to think a bit harder about how best to break down and represent each “setting”. If a configuration value cannot be distilled down to something simple then perhaps a richer structure is needed (indicated by a simple setting such as a filename), but just for that one setting; it does not invite unnecessary complexity elsewhere.

Consequently I would look to try and express the structure of our example “chain” in a single key/value entry, albeit in this case as a .Net style “App Setting”. This is how I would express it:

<appSettings>
  . . .
  <add key="Security:AuthenticationChain"
       value="Token,IP,Default" />
  <!-- component specific settings below it -->
  . . .
</appSettings>

You’ll immediately notice that the name of the setting describes the structure we are trying to create - a chain - you don’t have to infer it from the structure of the XML or read an XML comment. You’ll also see that we haven’t included any implementation details in the setting’s value - each term is a shorthand (an abstraction if you will) for the component that rests in that position within the chain. This makes code refactoring a breeze as no implementation details have leaked outside the code and could therefore be hidden from a refactoring tool.

The Factory Method

If you’re thinking that all the complexity can’t have vanished, then you’d be right. But much of the accidental complexity has gone away simply by choosing to use a simpler and more restrictive representation. Parsing this representation only requires a handful of lines of code:

public static class Factory
{
  public static IProvider CreateChain(IConfiguration configuration)
  {
    var key = "Security:AuthenticationChain";
    var setting = configuration[key];
    var chain = setting.Split(',').Reverse();

    IProvider root = null;
    foreach (var type in chain)
      root = CreateProvider(type, configuration, root);

    return root;
  }

  private static IProvider CreateProvider(string type, IConfiguration configuration, IProvider nextProvider)
  {
    switch (type)
    {
      case "Token":
        return new TokenProvider(..., nextProvider);
      case "IP":
        return new IpProvider(..., nextProvider);
      case "Default":
        return new DefaultProvider(..., nextProvider);
    }

    throw new ArgumentException(String.Format("Unknown provider type '{0}'", type));
  }
}

If all the dependencies in this chain are purely functional (and they almost certainly should be for a Chain of Responsibility) then the factory and chain can be easily tested using component-level tests [3]. The only mock it requires is for the configuration object which is trivial to knock-up, even as a manual mock.
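As an illustration, a component-level test for the factory might look something like the sketch below. I’m assuming here that IConfiguration is little more than the string-keyed lookup used by CreateChain() above, so the manual mock really is just a dictionary in disguise:

internal class StubConfiguration : IConfiguration
{
  private readonly Dictionary<string, string> _settings =
                       new Dictionary<string, string>();

  // Assumes IConfiguration only exposes a read-only string indexer.
  public string this[string key]
  {
    get { return _settings[key]; }
  }

  public void Add(string key, string value)
  {
    _settings[key] = value;
  }
}

[Test]
public void chain_is_built_in_the_configured_order()
{
  var configuration = new StubConfiguration();
  configuration.Add("Security:AuthenticationChain", "Token,Default");

  var head = Factory.CreateChain(configuration);

  Assert.That(head, Is.InstanceOf<TokenProvider>());
}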

What About Adding New Providers?

One of the allures of IoC containers is the apparent ease with which you can “re-configure” your system and add support for a new component. Whilst that may be true for the deployment configuration step, I know that the development and testing cost of any new component is easily going to dwarf the minuscule amount of change required to the Factory Method shown above to support it. Unless the new component is going to be implemented in an entirely new assembly and there is a strict requirement not to touch the existing codebase, you’ve already got a code change and re-deployment on your hands anyway.

Configuration Validation

One of the things in particular I like about manual factory methods is that they provide a suitable place to put validation code for your configuration. For example, in our authentication chain the final provider is the default one. This provider is “a provider of last resort” - no matter what happens before it, it can always return an answer, even if that answer is “no-one knows who they are”. The fact that this provider should always appear at the end of the chain means that there is the possibility of mis-configuring the system by accident. Imagine what would happen if it was first in the chain - every other provider would essentially be ignored.

One approach to this problem is to punt and let the documentation (if any exists) provide a warning; if the sysadmins then mess it up that’s their problem. Personally I’d like to give them a fighting chance, especially when I know that I can add some trivial validation (and even unit test its behaviour):

case "Default":
{
  if (nextProvider != null)
  {
    throw new ConfigurationException("The 'Default' provider can only be used at the end of the chain");
  }
  return new DefaultProvider(...);
}

If you look closely you’ll see that I’ve also removed the final argument from the call to the DefaultProvider constructor. The great thing about expressing your design in code (for a statically compiled language [4]) is that once I decided to remove the final parameter from the DefaultProvider class ctor I could lean on the compiler and fix up the call sites as appropriate. If one of the call sites couldn’t have been fixed up this way I’d have spotted it immediately rather than having to debug a curious runtime issue some time later.

Test-Driven Design

I’ve heard it said that TDD done properly leads you naturally towards using an IoC container. Frankly I’m bemused by that statement because when I do TDD it leads me exactly towards what I describe above. I genuinely cannot see how the simplest step can be to go from a hardcoded implementation (i.e. the DefaultProvider in our example) to using a highly-configurable tool to define an arbitrary DAG - there is always something much simpler in between.

Even if some of the steps below are not actually coded up and just become a thought exercise, I’d still not end up reaching for the big guns by the end:

  1. Identify a need to do authentication
  2. Create the default provider with TDD
  3. Hard-code it as the only implementation
  4. Identify a need for other authentication mechanisms
  5. Create them with TDD
  6. Hard-code the initial chain
  7. Identify a need to make the chain order configurable
  8. Introduce a factory (method) to house the creation logic, using TDD

As I said right back at the beginning, with a fluid development process and one eye on good design principles, the term “hard-coded” is no longer quite the pejorative it once was.

 

[1] Not a role I had come across until recently as the build pipeline and deployment process has always been maintained by the developers.

[2] When you can make a change in isolation, build your artefacts, run an extensive test suite and deploy changes to production quickly, many needs for components to be configurable through config files just evaporate.

[3] The difference between unit-level tests and component-level tests is really just the size of your “unit”. I chose not to say “unit tests” to make it clear that I am aware there would be more than one class under test in certain scenarios.

[4] Or a non-statically compiled language with some form of type annotations.

Friday, 16 January 2015

Terse Exception Messages - Part II

Quite unintentionally it looks like my post back in October last year “Terse Exception Messages” needs a second part. The initial post was about dealing with terse error messages generated by external code, however I’d forgotten that terse exception messages are not just the domain of 3rd party libraries - they can appear in “our code” too. Take this example from some real world production code:

throw new NotFoundException("Invalid ID");

Sadly this will likely appear in any diagnostic log file with only the usual thread context information to back it up, e.g.

<date, time, thread, etc.> ERROR: Invalid ID

If you also have a stack trace to work from, that could give you a clue as to what type the ID refers to. You might also be able to hunt around and find an earlier diagnostic message that could give you the value too, but you’re probably going to have to work for it.

Clearer Exception Messages

At a bare minimum I would also include the “logical” type and, if possible (and it usually is), then the offending value too. As such what I would have thrown is something more like this:

throw new NotFoundException("Invalid customer ID '{0}'", id);

This message contains the two key pieces of information I know are available at the throw site and so would generate a more support-friendly message like this:

<date, time, thread, etc.> ERROR: Invalid customer ID ‘1234-5678’

Discerning Empty Values

You’ll notice that the value, which in this case was a string value, is surrounded by quotes. This is important in a message where it might not be obvious that the value is empty. For example if I had not added the quotes and the customer ID was blank (a good indication of what the problem might be) then it would have looked thus:

ERROR: Invalid customer ID

Unless you know intimately what all the messages look like, you could easily be forgiven for believing that the error message contains little more than our first example, when in fact the value is there too, albeit invisibly. In the improved case you would see this:

ERROR: Invalid customer ID ‘’

For strings there is another possible source of problems that can be hard to spot when there are no delimiters around the value, and that is leading and trailing whitespace. Even when using a fixed width font for viewing log files it can be tricky to spot a single erroneous extra space - adding the quotes makes that a little easier to see:

ERROR: Invalid customer ID ‘1234-5678 ’

As an aside my personal preference for using single quotes around the value probably stems from many years working with SQL. I’ve lifted values out of log files only to paste them into a variety of SQL queries countless times and so automatically including the quotes seemed a natural thing to do.

Discerning Incorrect Values

It’s probably obvious but another reason to include the value is to discern the difference between a value that is badly formed, and could therefore never work (i.e. a logic error), and one that is only temporarily unusable (i.e. a runtime error). For example, if I saw the following error I would initially assume that the client has passed the wrong value in the wrong argument or that there is an internal logic bug which has caused the value to become “corrupted”:

ERROR: Invalid customer ID ‘10p off baked beans’

If I wanted to get a little more nit-picky about this error message I would stay clear of the word “invalid” as it is a somewhat overloaded term, much like “error”. Unless you have no validation at all for a type, there are usually two kinds of errors you could generate - one during parsing and another during range validation. This is most commonly seen with datetimes where you might only support ISO-8601 formatted values which, once parsed, could be checked to ensure an end date does not precede a start date.

To me the kind of error below is still on the terse side; it’s better than the original one, but it could be better still.

ERROR: Invalid datetime ‘01/01/2001 10:12 am’

My preferred term for values that fail “structural validation” was adopted from XML parsing - malformed. A value that parses correctly is considered “well formed” whereas the converse would be “malformed”. Hence I would have thrown:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’

For types like a datetime where there are many conventions I would also try and include the common name for the format or perhaps a template if the format is dynamically configurable:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Only ISO-8601 supported)

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Must conform to ‘YYYY-MM-DD HH:MM:SS’)
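As a sketch, the helper that produces this kind of message needs little more than a TryParseExact() call and a decent format string (the method name here is made up):

public static DateTime ParseDateTime(string value)
{
  DateTime result;

  // Only a single ISO-8601 style format is supported in this sketch.
  if (!DateTime.TryParseExact(value, "yyyy-MM-dd HH:mm:ss",
                              CultureInfo.InvariantCulture,
                              DateTimeStyles.None, out result))
  {
    throw new FormatException(String.Format(
      "Malformed datetime '{0}' (Must conform to 'YYYY-MM-DD HH:MM:SS')", value));
  }

  return result;
}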

Once a value has passed structural validation if it then fails “semantic validation” I would report that using an appropriate term, such as “out of range” or explicitly describe the comparison. With semantic validation you often know more about the role the value is playing (e.g. it’s a ‘start’ or ‘end’ date) and so you can be more explicit in your message about what the problem is:

ERROR: The end date ‘2010-01-01’ precedes the start date ‘2014-02-03’
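And a sketch of the corresponding semantic check, which only kicks in once both values are known to be well formed (ValidationException being a custom exception type):

public static void ValidateDateRange(DateTime start, DateTime end)
{
  // Both values are structurally valid by now; check they make sense together.
  if (end < start)
  {
    throw new ValidationException(String.Format(
      "The end date '{0:yyyy-MM-dd}' precedes the start date '{1:yyyy-MM-dd}'",
      end, start));
  }
}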

Custom Parsing Methods

As we saw back in my original post the language framework parsing methods are often horribly terse, even when failing to parse a simple value like an integer. The reason is that, like all general purpose functions, they have to strike a balance between performance, security, etc., which means including the value in the exception message might be undesirable in some classes of application.

However in the systems on which I work this is not a problem and I’d happily trade-off better diagnostics for whatever little extra garbage this might produce, especially given that it’s only in (hopefully) exceptional code paths.

My solution to this problem is the proverbial “extra level of indirection” and so I wrap the underlying TryParse() framework methods with something that produces a more satisfying diagnostic. In C# extension methods are a wonderful mechanism for doing that. For example I’ve wrapped the standard configuration mechanism with little methods that read a string value, try to parse it, and if that fails then include the setting name and value in the message. Consequently instead of the usual terse FormatException affair you get when parsing an int with code like this:

int.Parse(settings["timeoutInMs"]);

You invoke an extension method like so:

settings.ReadInt("timeoutInMs");

If this fails you’ll get a much more pleasing error message along these lines:

ERROR: The ‘timeoutInMs’ configuration setting value ‘3.1415’ was not a well formed integer value
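The exact shape of my own helpers isn’t important; a sketch of the same idea over a plain NameValueCollection (e.g. ConfigurationManager.AppSettings) only takes a few lines:

public static class SettingsExtensions
{
  public static int ReadInt(this NameValueCollection settings, string name)
  {
    var value = settings[name];

    int result;
    if (!int.TryParse(value, out result))
    {
      // Include both the setting name and the offending value in the message.
      throw new FormatException(String.Format(
        "The '{0}' configuration setting value '{1}' was not a well formed integer value",
        name, value));
    }

    return result;
  }
}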

Formatting Exceptions in C#

Whenever I create a custom exception type in C# I generally add a variety of overloads so that you can create one directly with a message format and selection of arguments without forcing the caller to use String.Format(). This is how the earlier example worked:

throw new NotFoundException("Invalid customer ID '{0}'", id);

All the class needs for this kind of use is a signature akin to String.Format’s:

public NotFoundException(string format,
                         params object[] args)
    : base(String.Format(format, args))
{ }

However this is not enough by itself - it’s dangerous. If we pass a raw message that happens to contain formatting instructions but no arguments (e.g. “Invalid list {0, 1, 2}”) it will throw during the internal call to String.Format(), so we need to add a simple string overload as well:

public NotFoundException(string message)
    : base(message)
{ }

As an aside I never add a public default constructor because I’m not convinced you should be allowed to throw an exception without providing some further information.

Given that the framework exception classes don’t have a varargs style ctor, you might already be happy putting the formatting call at the throw site, either with String.Format() directly or via a simple extension method like FormatWith():

throw new NotFoundException(String.Format("Invalid customer ID '{0}'", id));

throw new NotFoundException("Invalid customer ID '{0}'".FormatWith(id));
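FormatWith() itself is nothing more than a thin veneer over String.Format(); a minimal sketch would be:

public static class StringExtensions
{
  // A trivial FormatWith(); culture handling is ignored for brevity.
  public static string FormatWith(this string format, params object[] args)
  {
    return String.Format(format, args);
  }
}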

One scenario where the varargs style falls down in C# is when you start adding overloads to capture any inner exception. You can’t just add it on the end of the signature due to the final params-based args parameter, so you have to add it to the beginning instead:

public NotFoundException(Exception inner,
                         string format,
                         params object[] args)
    : base(String.Format(format, args), inner)
{ }

Sadly, this is the exact opposite of what the framework does - it adds them on the end.

In my experience very little effort is put in by developers to simulate faults and validate that their exception handling is behaving correctly, e.g. verifying what is reported and how. Consequently this makes the varargs overload potentially dangerous, as adding an exception argument on the end by following the framework style would cause it to be silently swallowed because no placeholder would reference it. This has never been a problem for me in practice because I always throw custom types and always try and format a string, but that’s easy when you’re the one who discovers and instigates the pattern - others are more likely to follow the pattern they’re used to, which is probably how the framework does it.
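In other words, given only the (format, params) overload shown earlier, a call written in the framework style compiles quite happily but quietly loses the inner exception (ex here being the caught exception):

catch (Exception ex)
{
  // Compiles, but 'ex' is just an unused format argument - the inner
  // exception is silently discarded rather than being chained.
  throw new NotFoundException("Invalid customer ID '{0}'", id, ex);
}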

Testing Error Messages

At the tail end of the previous section I touched briefly on what I think is at the heart of why (server-side) error messages are often so poor - a lack of testing to make sure that they satisfy the main “requirement”, which is to be helpful in diagnosing a problem. When writing unit tests for error scenarios it’s all too common to see something like this:

Assert.That(() => o.DoIt(. . .), Throws.Exception);

This verifies and documents very little about what is expected to happen. In fact if “o” is a null reference the test will still pass and not even invoke the behaviour we want to exercise! Our expectations should be higher; but of course at the same time we don’t want to over specify and make the test brittle.

If a caller is going to attempt recovery of the failure then they need to know what the type of the exception is, lest they be forced to resort to “message scraping” to infer its type. Consequently it’s beneficial to verify that the exception is of at least a certain type (i.e. it may also be a derived type). Where an obvious value, such as an input argument, is a factor in the error we can loosely test that the value is contained within the message somewhere too:

const string id = "malformed-value";

Assert.That(() => o.DoIt(id),
            Throws.InstanceOf<ValidationException>()
                  .And.Message.Contains(id));

If there are many arguments to be validated it is important to try and match on the message too, or some other attribute, to discern which of the many arguments caused the exception to be thrown. It’s all too easy, when making a change to a method, to end up finding that one or more of your tests are passing because they are detecting a different error to the one they were intended to, as the null reference example above highlighted.

Whilst that takes care of our code-based contractual obligations we also have an obligation to those who will be consuming these messages to make sure they form a useful narrative within any diagnostic output. I tackled this side of the issue some years ago in “Diagnostic & Support User Interfaces” and so if you’ve experienced a sense of déjà vu reading this post then that may be a contributing factor.

Tuesday, 13 January 2015

Text Editor Outlining Modes Hide Complexity

I have this theory, but I don’t have a statistically large enough data set [1] with which to publish anything other than a random musing. My theory is that developers who use a text editor with its “Outlining Mode” [2] switched on have a tendency to write more “complex” code.

Of course tarring every developer with the same brush is not exactly endearing myself to my readership and so I fully expect there to be many who use this feature with great aplomb and are aware of its apparent shortcomings. However for those who are unsure as to why I’m proposing this theory let me explain what it is that I’ve seen when working with developers who have this feature switched on.

Duplicate Code

The most obvious thing I’ve noticed is that developers who don’t see all the code on screen at once don’t notice all the duplication. If I’ve spotted a problem in a method then, as a natural by-product of seeing all the code for the class, I’ve also noticed the same pattern appearing in a number of other places. In contrast, when pairing with someone who can only see the method names, except for the one being currently edited, they do not automatically notice the duplication and therefore never go looking for it.

Of course the real problem is the duplication itself. I suspect that because I always see all the code I tend to see duplication as it builds up and so will naturally refactor it as soon as the test goes green. When all you see is method names I’m not sure how you can expect to notice duplication unless you literally cut-and-pasted some other code. At least code that has literally been cut-and-pasted looks identical; when you’ve written code that is only logically similar to that in another method it can be harder to reason away the duplication and do the necessary refactoring.

Accidental Complexity

In a more general sense what appears to be happening is that, paradoxically, the tool that tries to reduce the noise and clutter whilst editing code has succeeded in creating further complexity by constantly hiding it. If you do not “feel the pain” caused by working with overly complex code you do not feel the need to refactor it, until it becomes a problem, by which point you’ve probably already lost the battle. By having all the code in a source file visible you get to see the big picture so that even if you’re making edits to one method your subconscious can still go to work on the surrounding code and nudge you when it thinks something smelly is appearing.

Debugging is another opportunity where you get to take a walk through code you’ve perhaps not worked on yourself. In effect you have the opportunity to review that code as you’re debugging it - not formally, but you can certainly make notes and raise any issues later (See “In The Toolbox - Pen & Paper”).

C# Regions

The use of regions in C# is another variation on this problem and it becomes yet another switch to turn off in the IDE to ensure all code is visible all the time. In theory being able to hide auto-generated code from developers when editing is a laudable goal, but the way I’ve seen regions used is to hide big chunks of code, which just seems to act as a catalyst for the creation of behemoth source files.

Plausible?

Does this “theory” hold water? When I tweeted about it there were a few murmurs that at least offered vague anecdotal support, and nobody came straight back and cited a paper, so I don’t think it’s already been debunked. I was pretty sure it was old news too, and amusingly I did get a reply from Len Holgate with a link to a blog post he had written almost exactly 10 years earlier that touched on part of this topic: “VS.Net #Region”.

Perhaps a passer-by will leave a comment in future that provides evidence of a more scientific study of this feature. Or alternatively someone else will point out how the tool is being used incorrectly, so that I no longer feel the need to turn it off the moment I install Visual Studio.

 

[1] You can read that as “a handful of cases”.

[2] That’s the name used in Visual Studio, but I have also heard the term “folding editor” used for the same feature.

Friday, 9 January 2015

Git-TFS Pull Failure - error running command: git rebase --preserve-merges

I’ve been using git-tfs in anger now for the past 6 months and an occasional problem has cropped up that had me scratching my head a couple of times. Somehow I managed to fix it the first time by fumbling around; the second time it happened I decided it was time to fix it “scientifically” and then write it up for when it happens again...

TFS Gated Builds

I’m working in a team that has separate TFS projects, some of which I can push commits to directly with git-tfs rcheckin, and others that go through a TFS gated build. The latter requires use of git-tfs checkintool which pops up the standard TFS check-in dialog, but only after squashing any un-pushed commits in the Git repo into a TFS shelveset. It’s this mismatch between the Git master branch, which has separate commits, and the single squashed commit that comes back down on the TFS remote master branch, that causes the error.

As an aside, if I’m working on a repo that I can’t use git-tfs rcheckin with to keep my Git history intact then I manually push after every Git commit to avoid the squashing behaviour of git-tfs checkintool. As such the only times I’ve seen this problem is when I’ve forgotten to push a commit to TFS and am now pushing two commits (squashed as one) without realising it. I normally do all my code reviewing when I commit to my Git repo, I only take a cursory glance at the changes I’m pushing to TFS because I know the changes themselves are what I want; it’s just the impedance mismatch between Git and TFS causing the extra friction.

Git-TFS Pull Failure

When a TFS gated build is in use the check-in appears to abort after git-tfs checkintool has run. However in the background the TFS shelveset is passed on to the gated build where it’s built and then merged if successful. Once the gated build completes we can re-sync the local Git repo with the remote TFS repo using git-tfs pull. But sometimes it fails like this:

> git tfs pull --rebase
Working with tfs remote: default
Fetching from TFS remote 'default'...
C12345 = c3b3f9a1339366b918b2bb5c508590bbdcce4c60
rebase in progress; onto c3b3f9a
You are currently rebasing branch 'master' on 'c3b3f9a'.
 
nothing to commit, working directory clean
The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:
 
    git commit --allow-empty
 
Otherwise, please use 'git reset'
error running command: git rebase --preserve-merges refs/remotes/tfs/default
Command exited with error code: 1

Squashing the Git Master

If you compare the Git master and TFS remote master branches you’ll probably notice that the remote branch has a squashed version of your last few commits on master. Consequently what we need to do is squash the commits on master to make the local and remote branches look the same before we do the pull. First we need to abort the current rebase so that we put master back to where it was before we attempted the pull:

> git rebase --abort

Once this is done we can squash the commits on master to match the TFS remote. I happen to use TortoiseGit for this because it’s really easy and I’ve already got it open to compare the tips of the two branches (local and remote masters).

Pull Again

With the local master and the TFS remote now back in sync it’s a simple matter of executing the git-tfs pull again, at which point the rebase should succeed and master will once again match the TFS remote.

> git tfs pull --rebase
Working with tfs remote: default
Fetching from TFS remote 'default'...
Rebasing (1/1)

Note: There shouldn’t be anything to rebase (unless you’ve added more commits since) so the only observable difference will be that the newly squashed commit will have the extra git-tfs-id property, as usual, to marry up the Git commit and the TFS changeset number.

Wednesday, 26 November 2014

My 200th Blog Post

Five and a half years ago during a 6 month sabbatical I decided to start a blog. I did a brief retrospective after a year but I’ve not done another one since. Last week I was curious enough to tot up how many posts I’d written so far, and given that it was close to a nice round number, I thought it was the perfect opportunity to have another retrospective and see how things have panned out since then.

Looking back at that first anniversary blog post I can see that not that much has changed in terms of my mission on what I wanted to write about, except perhaps that I’m continuing to live in the C# world and only dabbling in C++ in my personal codebase. I have added some T-SQL projects and PowerShell scripts in the meantime which have also given me a wider remit. In fact my PowerShell blog posts are the ones most read according to Google Analytics.

I’m still mostly using my blog as a knowledge base where I can document stuff I’ve not obviously found an answer for elsewhere, but more recently I’ve branched out and started to write posts that cover my journey as a programmer too. Many of them are my various fails, such as “Friends Don’t Let Friends Use Varargs” and “A Not-So-Clever-Now String Implementation”, which is my way of showing that we all make mistakes and they should be nothing to be ashamed of. Those particular examples covered more technical aspects of programming whereas I’ve also started to look at the development process itself, such as in “What’s the Price of Confidence” and “The Courage to Question”. Refactoring has been a heavily reoccurring topic too, as of late. On a few occasions I’ve even dared to air a few more programming-related personal issues to try and give a little more background to what goes on inside my head, see “Haunted By Past Mistakes” for example. I don’t suppose anyone is really interested, but it acts as a nice form of therapy for me.

What was amusing to read back in that first anniversary post was the section about how I was planning to write something about unit test naming conventions, but I ended up ditching it because I discovered I was way off the mark. That post eventually did get written, but only last month! As you’ll see in “Unit Testing Evolution Part II – Naming Conventions” and the follow-up “Other Test Naming Conventions” this part of my programming journey has been quite bumpy. I’m not entirely sure I’ve reached a plateau yet there either.

My main aspiration was that writing this blog would help me sharpen my writing skills and give me the confidence to go and write something more detailed that might then be formally published. Given my membership of the ACCU I naturally had my eye on both of its journals (C Vu and Overload) as an outlet. It took me a few years to get to that position (though I had at least written upwards of 20 reviews of books, the conference and branch meetings), but eventually I resurrected an article I had started just over 3 years earlier and finally finished it off.

That article was accepted for the February 2013 issue of Overload. Since then I’ve started an occasional column in C Vu called “In the Toolbox” and have written many other articles for both C Vu and Overload on both technical and non-technical matters. A few of those even started out life as a blog post which I then cleaned up and embellished further. A few months back I felt I had reached the point where I had enough content to require a couple of blog posts to summarise what I’d published so far: “In The Toolbox - Season One”, “Overload Articles - Season One” and “C Vu Articles - Season One”.

Confidence in writing has undoubtedly led to a confidence in presenting too, by which I mean that at least I feel more confident about the content I’m presenting, even if my presentation style is still rather rough around the edges.

The last thing I noted in my first anniversary post was a quip about not looking at the stats from Google Analytics. Well, if the stats can be believed, and I’m not altogether convinced how accurate they are [1], then I’m definitely well into double figures now. In fact quite recently the number of page views rocketed to the tens-of-thousands per month. This appears to coincide with a blog post I published about my dislike of the term Dependency Injection (See “Terminology Overdose”) and it got a much wider audience than normal via some re-tweets from a couple of prominent programmers. The graph on the right shows the number of page views per month over the time my blog’s been active. Obviously a page view does not constitute a “read” but if nothing else it shows that my content is not stuck at the bottom of the search engine results pages. Hopefully some of those “hits” will have resulted in someone finding something useful, which is all I really aspire to.

I decided to use WC on my collection of blog posts, which I get emailed to me whenever I publish them, to see how many words I’ve written in the last five and a half years. The original result seemed way too high as I know that even my longest posts still come in under 1,500 words. It turned out Blogger started sending me a multi-part email with both a plain text and an HTML version in it some months after I’d started. A quick application of SED with an address range to pick out just the plain text versions gave me the more believable sum of ~148,000 words. Based on that, each post is ~740 words long, which sounds about right. I had hoped to write more, smaller posts, but I seem incapable of doing that - as now demonstrated…

So, what does the future hold? Well, I still have 77 blog post titles sat in the backlog, so I reckon there is still plenty of content I probably want to explore, and I’m sure the backlog is growing quicker than I’m writing, which feels healthy for a budding author. Moving teams and projects every so often always seems to generate new thoughts and questions and therefore I expect more to come from that. The one concern I did have was that starting to write more formal articles would mean I spent less time blogging, but I don’t think that’s been the case. My blog will definitely continue to be the place for ad-hoc content and for musings that I have yet to form into a more coherent piece.

See you again in a few years.

 

[1] The Google stats indicate a steady stream of page views from May 2006, but my blog wasn’t even started until April 2009!

Tuesday, 25 November 2014

Derived Class Static Constructor Not Invoked

The other day I got a lesson in C# static constructors (aka type constructors) and the order in which they’re invoked in a class hierarchy that also has generic types thrown in for good measure. Naturally I was expecting one thing but something different happened instead. After a bit of thought and a quick read I can see why it’s so much simpler and behaves the way it does, but I still felt it was a little surprising given the use cases of static constructors. Part of this, I suspect, is down to my C++ roots where everything is passed by value by default and templates are compile-time, which is the opposite of C# where most types are passed by reference and generics are generated at run-time.

The Base Class

We have a base class that is used as the basis for any custom value types we create. It’s a generic base class that is specialised with the underlying “primitive” value type and the concrete derived “value” type:

public abstract class ValueObject<TDerived, TValue>
{
  // The underlying primitive value.
  public TValue Value { get; protected set; }

  // Create a value from a known "safe" value.
  public static TDerived FromSafeValue(TValue value)
  {
    return Creator(value);
  }

  // The factory method used to create the derived
  // value object.
  protected static Func<TValue, TDerived> Creator
      { private get; set; }
  . . .
}

As you can see there is a static method called FromSafeValue() that allows the value object to be created from a trusted source of data, e.g. from a database. The existing implementation relied on reflection as a means of getting the underlying value into the base property (Value) and I wanted to see if I could do it without using reflection by getting the derived class to register a factory method with the base class during its static constructor (i.e. during “type” construction).

The Derived Class

The derived class is incredibly simple as all the equality comparisons, XML/JSON serialization, etc. is handled by the Value Object base class. All it needs to provide is a method to validate the underlying primitive value (e.g. a string, integer, date/time, etc.) and a factory method to create an instance after validation has occurred. The derived class also has a private default constructor to ensure you go through the public static methods in the base class, i.e. Parse() and FromSafeValue().

public class PhoneNumber
               : ValueObject<PhoneNumber, string>
{
  // ...Validation elided.
 
  private PhoneNumber()
  { }

  static PhoneNumber()
  {
    Creator = v => new PhoneNumber { Value = v };
  }
}

The Unit Test

So far, so good. The first unit test I ran after my little refactoring was the one for ensuring the value could be created from a safe primitive value:

[Test]
public void value_can_be_created_from_a_safe_value()
{
  const string safeValue = "+44 1234 123456";

  Assert.That(
    PhoneNumber.FromSafeValue(safeValue).Value,
    Is.EqualTo(safeValue));
}

I ran the test... and I got a NullReferenceException [1]. So I set two breakpoints, one in the derived class static constructor and another in the FromSafeValue() method, and watched in amazement as the method was called but the static constructor wasn’t. It wasn’t even called at any point later.

I knew not to expect the static constructor to be called until the type was first used. However I assumed it would be when the unit test method was JIT compiled because the type name is referenced in the test and that was the first “visible” use. Consequently I was a little surprised when it wasn’t called at that point.

As I slowly stepped into the code I realised that the unit test doesn’t really reference the derived type - instead of invoking PhoneNumber.FromSafeValue() it’s actually invoking ValueObject.FromSafeValue(). But this method returns a TDerived (i.e. a PhoneNumber in the unit test example) so the type must be needed by then, right? And if the type’s referenced then the type’s static constructor must have been called, right?

The Static Constructor Rules

I’ve only got the 3rd edition of The C# Programming Language, but it covers generics which I thought would be important (it’s not). The rules governing the invocation of a static constructor are actually very simple - a static ctor is called once, and only once, before either of these scenarios occurs:

  • An instance of the type is created
  • A static method on the type is called

The first case definitely does not apply as it’s the factory function we are trying to register in the first place that creates the instances. The second case doesn’t apply either because the static method we are invoking is on the base class, not the derived class. Consequently there is no reason to expect the derived type’s static constructor to be called at all in my situation. It also appears that the addition of Generics hasn’t changed the rules either, except (I presume) for a tightening of the wording to use the term “closed type”.

Putting my C++ hat back on for a moment I shouldn’t really have been surprised by this. References in C# are essentially similar beasts to references and pointers in C++, and when you pass values via them in C++ you don’t need a complete class definition, you can get away with just a forward declaration [2]. This makes sense as the size of an object reference (or raw pointer) is fixed, it’s not dependent on the type of object being pointed to.

Forcing a Static Constructor to Run

This puts me in a position involving chickens and eggs. I need the derived type’s static constructor to run so that it can register a factory method with the base class, but that constructor won’t be invoked until an instance of the class is created - which is precisely what the factory method it registers is there to do. The whole point of the base class is to reduce the burden on the derived class; forcing the client to use the derived type directly, just so it can register the factory method, defeats the purpose of making the base class factory methods public.

A spot of googling revealed a way to force the static constructor for a type to run. Because our base class takes the derived type as one of its generic type parameters we know what it is [3] and can therefore access its metadata. The solution I opted for was to add a static constructor to the base class that forces the derived class static constructor to run.

static ValueObject()
{
  System.Runtime.CompilerServices.RuntimeHelpers
    .RunClassConstructor(typeof(TDerived).TypeHandle);
}

Now when the unit test runs it references the base class (via the static FromSafeValue() method) so the base class static constructor gets invoked. This therefore causes the derived class static constructor to be run manually, which then registers the derived class factory method with the base class. Now when the FromSafeValue() static method executes it accesses the Creator property and finds the callback method registered and available for it to use.

This feels like a rather dirty hack and I wonder if there is a cleaner way to ensure the derived class’s static constructor is called in a timely manner without adding a pass-through method on the derived type. The only other solutions I can think of all involve using reflection to invoke the methods directly, which is pretty much back where I started out, albeit with a different take on what would be called via reflection.
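For completeness, the kind of pass-through method referred to above would look something like the sketch below - each derived type has to re-expose the factory method just so that referencing it triggers its own static constructor:

public class PhoneNumber : ValueObject<PhoneNumber, string>
{
  . . .

  // Calling this method forces PhoneNumber's static constructor to run,
  // but every derived type would need the same boilerplate.
  public static new PhoneNumber FromSafeValue(string value)
  {
    return ValueObject<PhoneNumber, string>.FromSafeValue(value);
  }
}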

 

[1] Actually I got an assertion failure because I had asserted that the Creator property was set before using it, but the outcome was essentially the same - the test failed due to the factory method not being registered.

[2] This is just the name of the class (or struct) within its namespace, e.g. namespace X { class Z; }

[3] This technique of passing the derived type to its generic base class is known as the Curiously Recurring Template Pattern (CRTP) in C++ circles but I’ve not heard it called CRGP (Curiously Recurring Generic Pattern) in the C# world.