Monday 21 January 2013

The Perils of Interactive Service Account Sessions

It’s common within the Enterprise to run services (or daemon processes, if you prefer) under a special “service account”. This account is often a domain account that has special privileges, and as such no one is generally expected to use it for anything other than running the processes that are part of The System. Sometimes you might need to elevate to that account to diagnose a permissions problem, but those occasions should be very rare.

What you want to avoid is logging on interactively to a Windows machine using that account[1], such as remotely via MSTSC. What you should do instead is log on with your own credentials, or better yet those of a “break glass” account, and then elevate yourself using, say, the RUNAS tool. This allows you to open a separate command prompt, or run another process, under a separate set of credentials - usually the service account, e.g.

C:\> runas /user:chris@domain cmd.exe

There are various switches to control the loading of the user profile, etc. but that is the basic form. Once you have the command prompt or process open you can do your thing in a limited kind of sandbox.
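
For example, skipping the profile is usually fine for a quick diagnostic session, and /netonly can be handy when only the remote (network) credentials matter; the account names below are made up for illustration:

C:\> runas /noprofile /user:domain\svc-myapp cmd.exe
C:\> runas /netonly /user:otherdomain\svc-myapp cmd.exe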

The first reason for not logging in interactively is that by default Terminal Services will only let you have 2 connections open. Given that some developers (and admins) have a habit of leaving themselves logged in for extended periods, you invariably have to hunt down who has the connections open and ask one of them to drop out. If one of those users is logged in interactively using the service account it becomes a much harder job to find out who “owns” that session and, as we’ll see, just toasting their session can be dangerous.
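
At least finding out which accounts own the sessions is straightforward; QUERY SESSION (a.k.a. QWINSTA) will list them for you, e.g. (the machine name below is just a placeholder):

C:\> query session /server:machine-name

Of course, when the session belongs to a shared service account that still doesn’t tell you which human being is actually behind it.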

The main problem I’ve come across with logging in this way is down to the way scheduled tasks that are configured to run under separate credentials (in this case the service account) end up running in the interactive session (even without the “interactive” box checked). If you’ve ever had seemingly random console windows popping up whilst you’re logged in, this could be what they are. If you’re lucky the keyboard focus won’t be stolen, but if it is, or you’re clicking with the mouse at the wrong time, you can block the I/O of the process by accidentally enabling Quick Edit mode. Or worse yet, you hit the close box just as it pops into life.

You might notice those effects, but the more deadly one is logging off. If one of these scheduled tasks is running at the time you log off, it will be killed. You might not notice at first (especially if it gets scheduled again soon after) but the scheduled task will have a failed status and the very curious error code of 0xC000013A (application terminated by Ctrl+C).
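
You can spot the casualties afterwards by checking the task’s last result; something along these lines will show it (the task name is made up for the example):

C:\> schtasks /query /tn "NightlyReport" /v /fo list | findstr /c:"Last Result"

A healthy run reports a last result of 0, whereas one killed by the logoff shows up with a non-zero value corresponding to that status code.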

The second issue I’ve seen relates to the service account not picking up changes in Windows AD group membership. I’ve read various chapters of Keith Brown’s excellent book Programming Windows Security (which is admittedly getting a bit long in the tooth) but I can’t see why this would happen. Basically the account was removed from an AD group and then later reinstated. However, at the time the account was re-added to the group there was an interactive session open on just one machine in the farm.

The other machines in the environment picked up the change, but the one with the interactive session didn’t. I could understand an existing process caching the group membership, but even starting new command prompts didn’t help. The scheduled tasks that were running, which were also new processes each time, didn’t pick up the change either. After logging the session off and logging straight back on again everything was fine.
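
If you ever find yourself diagnosing something similar, it’s worth capturing what group memberships the token in the suspect session actually contains, e.g.

C:\> whoami /groups

Comparing that output between the machine with the stale session and one that picked up the change at least tells you whether it’s the logon session’s token that is out of date.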

Maybe it was a one-off, or perhaps it’s a known problem. Either way my Google Fu was clearly letting me down - that and the fact that the key words needed to describe the problem are about as vague as you can get in IT (group, windows, cache, etc.). Hopefully some kind soul will leave a comment one day that explains my experience and brings me closure.

 

[1] I’m sure there are some edge cases to that advice, but I can’t personally remember a time when I’ve needed to actually log on to a machine with a service account. Well, I can, but that’s because the software used to hide the passwords forced me to do it when all I needed was an elevated command prompt. That aside, I haven’t.

Saturday 19 January 2013

Layered Builds

The kind of builds we do on our desktop are optimised for speed. We want that cycle of write test, write code, build, run test to be as fast as possible. In compiled languages such as C++, C# and Java we can’t afford to do a “clean” build every time as we’d spend most of our time waiting whilst files are deleted and the same libraries are built over and over again. The downside is that we can occasionally waste time debugging what we think is a problem with our code, only to discover that it was a bad build[1].

In contrast when we’re producing a build on the build server, no doubt via a Continuous Integration (CI) server, we want reliability and repeatability above all else. At least, we do for the build that produces the final artefacts that we’re going to publish and/or deploy to our customers.

Build Sequence

As a rule of thumb these are the high-level steps that any build (of any type of target) generally goes through:-

  1. Clean the source tree
  2. Compile the binaries
  3. Run the various test suites
  4. Package the deployment artefacts
  5. Publish the artefacts
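
To make that concrete, here is a minimal sketch of a top-level driver for that sequence in PowerShell. The step script names are entirely hypothetical; the point is simply that each layer of build described below can pick and choose which of these steps (and which test suites) it actually runs:

# A hypothetical top-level build driver. The individual step scripts
# (clean.ps1, compile.ps1, etc.) are assumed to exist and to signal
# failure via a non-zero exit code.
$steps = 'clean.ps1', 'compile.ps1', 'run-tests.ps1', 'package.ps1', 'publish.ps1'

foreach ($step in $steps)
{
  & ".\$step"
  if ($lastexitcode -ne 0)
  {
    write-host "Build failed at step: $step"
    exit $lastexitcode
  }
}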

For repeatable builds you need to start with a clean slate. If you can afford it, that means starting from an empty folder and pulling the entire source tree from the version control system (VCS). However, if you’re using something like ClearCase with snapshot views[2] that could take longer than the build itself! The alternative is to write a “clean” script that runs through the source tree deleting every file not contained in the VCS, such as .obj, .pdb, .exe, etc. Naturally you have to be careful about what you delete, but at least you have the VCS as a backup whilst you develop it.
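
A minimal sketch of such a clean script in PowerShell might look like the following; the extension list is purely illustrative and needs tailoring to your own toolchain, and adding -whatif to the remove-item call lets you rehearse it before unleashing it on a real source tree:

# Delete common build outputs below the current folder; the patterns
# are an assumption about the toolchain, not a definitive list.
$patterns = '*.obj', '*.pdb', '*.ilk', '*.exe', '*.dll'

get-childitem -path . -recurse -include $patterns |
  where-object { -not $_.PSIsContainer } |
  remove-item -force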

Once the slate is clean you can invoke the build to create your binaries (if your language requires them). For a database you might build it from scratch or apply deltas to a copy of the current live schema. Once that’s done you can start running your various test suites. There is a natural progression from unit tests, which have the fewest dependencies, through component tests to integration tests. The suite run time, which may be determined by the work required to get any dependencies in play, will be a factor in how often you run them.

With the tyres well and truly kicked you can move on to packaging the artefacts up for formal testing and/or deployment to the customer. The final step then is to formally publish the packages such as copying them to a staging area for deployment. You’ll probably also keep a copy in a safe place as a record of the actual built product.

There will of course be many other little steps, like publishing symbols to your symbol server and generating documentation from the source code, and these will sit in and around the other major steps. What determines how often you run them might be how much grunt you have in your build server. If you’re lucky enough to have a dedicated box you might churn through as much of this as possible every time, but if all you’re allowed is a Virtual Machine whose hardware is shared with a dozen other VMs you’ll have to pick and choose[3]. And this is where layered builds come in.

Continuous Build

As I described earlier, your desktop build and test run will likely cut corners on the basis that you want the fastest feedback you can get whilst you’re making your changes. Once you’re ready to publish them (e.g. to an integration branch) you integrate your colleagues’ latest changes, build and test again, and then commit.

At this point the CI server takes over. It has more time than you do, so it can wipe the slate clean, pull all the latest changes and build everything from scratch. The main job of the Continuous Build is to watch your back: it makes sure that you’ve checked everything in correctly and haven’t accidentally been relying on some intermediate build state. Finally, it can run some of your test suites. How many, and of what sort, depends on how long they take and whether any other subsystem dependencies will be in a compatible state (e.g. services/database).

The balance is that the more tests you run, the longer your feedback cycle between builds (if you’ve only got a single build agent). Ideally the build server shouldn’t be a bottleneck, but sadly it’s not the kind of thing the bean counters always understand is essential. One corner you can choose to cut is, say, only doing a debug or release build and therefore only running those tests. Given that developers normally work with debug builds it makes sense to do the opposite, as the release build is what you’re going to deliver in the end.

Deployment Build

Next up is the deployment build. Whereas the continuous build puts the focus on the development team, the deployment build looks towards what the external testers and the customer ultimately need. Depending on what your deliverables are, you’ll probably be looking for that final bit of peace of mind before letting the product out of your sight. That means you’ll build whatever else you missed earlier and then run the remainder of your automated tests.

At this point the system is likely to go into formal testing (or release) and so you’ll need to make sure that your audit trail is in place. That means labelling the build with a unique stamp so that any bugs reported during testing or release can be attributed to an exact revision of the source code and packages. Although you should be able to pull down the exact sources used in the build to reproduce a logic problem, you might still have to deploy the actual package to a test machine if the problem could be with the build or packaging process.
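
One lightweight way of keeping part of that audit trail, alongside labelling the build in the VCS, is to drop a stamp file into the package itself. This is only a sketch; the file name, location and fields are assumptions, and the revision would come from whatever VCS you use:

# Record which revision went into the package (the values here are made up,
# and the package folder is assumed to exist already).
$revision = '12345'
$stamp = "Revision: $revision`r`nBuilt: $(get-date -format 'yyyy-MM-dd HH:mm:ss')"

$stamp | out-file -filepath '.\package\version.txt' -encoding ascii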

You may still choose to cut some corners at this point, or have a set of automated tests that you simply cannot run because the other necessary subsystems are not part of the same build.

Full System Build

If the entire product contains many subsystems, e.g. database, services, client, etc. you probably partition your code and build process so that you can build and deploy each subsystem independently. Once a codebase starts to settle down and the interfaces are largely fixed you can often get away with deploying just one part of the overall product to optimise your system testing.

The one thing you can’t do easily if your codebase is partitioned into large chunks is run automated tests against the other subsystems if they are not included within the same build. Each commit to an integration branch should ideally be treated as atomic, even if it crosses subsystems (e.g. database and back-end services)[4] so that both sides of the interfaces are compatible. If you’ve built each subsystem from the same revision and they all pass their own test suites then you can reliably test the connections between them. For example, the database that you’ve just built and unit tested can be reused to run the tests that check the integration between any subsystems that talk to it.

My 2012 ACCU conference presentation “Database Development Using TDD” has some slides near the end, in the Continuous Integration & Deployment section, that show what this looks like.

Further Reading

Roy Osherove is currently putting together a book called Beautiful Builds and has been posting some useful build patterns on his blog.

 

[1] Ah, “Incremental Linking” and the “Edit & Continue” feature of Visual C++ - now there are two things I turn off by default as they have caused me far too much gnashing of teeth in the past. OK, so the problems were probably fixed years ago, but just as I always turn on /W4 /WX for a new project, I make sure everything ever known to botch builds and crash VC++ is turned off too.

[2] Dynamic views aren’t suitable for repeatable builds as, by their very nature, they are dynamic: you can pick up unintentional changes or have to force a code freeze. With a snapshot view you get to control when to update the source tree and you can also be sure of what you’re labelling. The alternative would be to adopt a Branch For Release policy and then use due diligence (i.e. a code freeze again) to avoid updating the branch while a build is in progress. Personally that sounds a little too volatile and disruptive.

[3] I discussed this with Steve Freeman briefly at the ACCU conference a few years ago and he suggested that perhaps you should just keep performing a full build every time with the expectation that there will be some lag, but then you can always deploy the moment another build pops out. I’d like to think that everyone commits changes with the Always Be Ready to Ship mentality but I’d be less trusting of an intraday build on a volatile branch like the trunk.

[4] When the team is split along technology lines this becomes harder as you might be forced to use Feature/Task Branches to allow code sharing, or your check-ins become non-atomic as you “pass the baton”.

Tuesday 15 January 2013

Problem Domain Expert or Technical Expert or Even Both?

The technology side of software development has always fascinated me far more than The Business Problems that we’re actually trying to solve. This view, I suspect, is at odds with what many “experts” propose is the way we (software developers) should be. I remember a blog post by Peter Gillard-Moss a couple of years ago called “I am NOT a geek” in which he suggests that the geek mentality can be bad for business. I completely agree with the sentiment about the need for a mixed workforce because I’ve seen the opposite effect, when everyone tries to “align with the business”, and it isn’t pretty either. Yes, tactical solutions exist for a reason, but once you start trying to scale that out you need techies too if you want reliability, maintainability and all those other -ilities as well.

I made the conscious decision many years ago that I would put my efforts into learning more about the technology and pay less attention to the business, which in this case has invariably been Finance. Naturally I pick up plenty by osmosis - certainly more than enough to get by on, and more importantly enough to either ask the right sort of questions or to know who to ask. But first and foremost I put my spare time into understanding more about the technology used to solve those problems. And by extension that includes understanding more about how we should go about developing those solutions efficiently.

When I talk to an agent or have a telephone interview I make it quite apparent that I don’t know much about “Finance” per se, and that if what they want is someone who understands the subtleties of all the various instruments, then that’s not me. But if what they have is a lack of source control, a creaking build system, deadlocks, livelocks, memory leaks, Singletons, poor performance and nasty threading problems, then just maybe I am their guy. Of course what they really want is both - an expert in the problem & technical domains[1]. I’ve yet to work with (or interview) anyone who meets those criteria in full. Maybe they exist - I know of a few people who might come close - but as a rule of thumb I find developers sit somewhere on the scale between technical and business excellence.

Sometimes I do question whether I should know more about The Business, especially when there is a problem I don’t immediately see because I don’t know enough to realise something is obviously wrong. Many years ago I was asked to look into a pricing calculation difference and spent hours single-stepping through code side-by-side, yet when I asked someone to check over what I’d found he told me instantly that some market data was rubbish. I always knew I was the wrong man for the task when given it, but perhaps there was no one else left to look into it. I still feel my time would have been far better spent fixing stuff I knew how to fix, and it wasn’t like there wasn’t plenty of that to do. I also got to reciprocate with my fellow developer later and help him fix his access violations once we started pushing his code through the heavily COM/multi-threaded trade grinder.

In contrast, the other day I spotted that many trades were being excluded because we were missing some market data. Although I can’t name every currency, I do know enough of the major and minor ones to know that this didn’t feel right. A minor currency like the one missing probably shouldn’t have been having the kind of effect it was, and so I passed it off to the analytics team to see if something was up. There was indeed a bug hiding in there. Interestingly, the reason the business didn’t spot it is that the trades were “unimportant” to them.

In this kind of scenario a less business-orientated mind can actually be beneficial. For instance, where I have strived to ensure that “no difference” means exactly that (to the nth decimal place), others have taken a looser attitude because they know that, in the grand scheme of things, the odd number slightly out of place here and there will probably be “acceptable”. The same goes for extrapolating reliability problems - to me an access violation, out-of-memory condition or timeout is a problem just waiting to “jump the cracks” in 64-bit land and take out an entire blade, or even create a Denial of Service (DoS).

The argument about Generalists vs Specialists will no doubt go on forever. I like the idea of being a (generalising?) specialist and accept that my openness about not wanting to be a business expert too will reduce the pool of work available to me. But I’m happy with that; I’d rather not pretend to be something I’m not. My CV purposely lists only a handful of skills, so the interviewer can either assume that I know VBScript because it’s just “one of those things a guy like that knows” or he can assume I have a very narrow skill set. Conversely, the person I want to work for knows how to answer that question by reading about the type of work I’ve done before, perhaps along with perusing my blog and free software tools.

Given that the complexity of the software systems we’re building is only getting larger and not smaller, I think we’re going to need plenty of people who work in the problem domain of “making it possible for the other developers, who work in the business domain, to work”.

 

[1] And preferably someone with interpersonal skills too, but they’d probably take a recluse if they had two out of three.

Monday 7 January 2013

Logging & Redirection With PowerShell

My relationship with PowerShell got off to a pretty rocky start. After going through all the noddy examples and running a few one-liners that pulled data from remote machines via WMI, I felt I was ready to tackle some Real World Problems. Given its promotion as a replacement for CMD.EXE it seemed wholly appropriate to pick some of our batch files and convert them to PowerShell scripts, to see how easily the transition could be made. Sadly it seems that, much like the Intel debacle with the 80286, the designers of PowerShell only expected you to go one way…

Batch File Logging

It’s a common pattern in batch files (i.e. .bat & .cmd files) to write progress and error messages out as you’re going along. So you often see this kind of pattern:-

@echo off
. . .
echo Starting something lengthy...
myprocess.exe
if errorlevel 1 (
  echo ERROR: Bad stuff happened!
  exit /b 1
)

In PowerShell ECHO is just an alias for the cmdlet write-output. Hence in PowerShell you could write something pretty similar:-

echo Starting something lengthy...
myprocess.exe
if ($lastexitcode -ne 0)
{
  echo ERROR: Bad stuff happened!
  exit 1
}

OK, so this isn’t idiomatic PowerShell as you can simplify the error handling. But that’s not what we’re interested in - it’s the logging. If you run this, what you’ll find is that each word is written on a separate line:-

C:\> PowerShell -File test.ps1
Starting
something
lengthy...

Oops! That alias doesn’t exactly work because PowerShell treats each word as a separate argument instead of a single line of text, so we need to surround the entire message in quotes:-

echo 'Starting something lengthy...'
myprocess.exe
if ($lastexitcode -ne 0)
{
  echo 'ERROR: Bad stuff happened!'
  exit 1
}

Now let’s try moving that into a function. If you’ve never used functions in batch files before you basically just invoke CALL with a label and use GOTO :EOF to return. This earlier blog post shows you the template I use for batch files which has functions to handle the usage message. Here’s our batch file again rewritten to use a function instead:-

@echo off
. . .
call :do_stuff %*
. . .
exit /b 0

:do_stuff
echo Starting something lengthy...
myprocess.exe
if errorlevel 1 (
  echo ERROR: Bad stuff happened!
  exit /b 1
)
goto :eof

The ECHO still works exactly as before. But when we do the same in PowerShell things start to go a bit awry, because the output from write-output does not go to the console - it becomes the output of the function instead:-

function do-stuff()
{
  echo 'Starting something lengthy...'
  myprocess.exe
  if ($lastexitcode -ne 0)
  {
    echo 'ERROR: Bad stuff happened!'
    exit 1
  }
}

do-stuff

So long as you don’t want to capture the function output (e.g. using $result = do-stuff) it will go to the console. But the moment you want to start writing real functions, and you also want diagnostic logging, you have to change tack. To be fair, you can’t write “proper” functions in batch files anyway and so the conversion so far has been pretty good; although I personally think the choice of an alias for ECHO was possibly a bad one.
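
If you want to see that pollution for yourself, capture the output of a function that both “logs” and returns a value, then print it (the function name is made up purely for illustration):

C:\> PowerShell "function get-answer { echo 'working...'; 42 }; $r = get-answer; $r"
working...
42

Nothing appears on the console until $r is printed, at which point the log message comes out along with the real result.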

The designers of PowerShell probably thought that the moment you want to do non-batch-file stuff you should start using the correct PowerShell cmdlets - write-host, write-error, etc. But that doesn’t quite work as seamlessly as you’d hope either.

What Platform Is This?

Pop quiz. What is/are the line termination character(s) in Windows? Everyone knows this, it’s CR/LF, or a carriage return followed by a line-feed. But not in PowerShell it seems. The obvious replacement for write-output is to log with write-host, after all, that writes directly to the console. So, try this:-

C:\> PowerShell "write-host hello; write-host world" > stdout.log
C:\> more stdout.log
C:\> type stdout.log
C:\> notepad stdout.log

I’m sure you can’t be bothered to run this so let me show you the output from MORE and TYPE:-

C:\> more stdout.log
hello
world

Of course you get two separate lines. Why, what else did you expect? In notepad though you’ll see this instead:-

helloworld

Put your hand up if your default log file viewer is notepad. On a developer’s workstation you probably have a plethora of text editors, “tailers” and Unix-like tools; but on production Windows hardware it’s almost certainly just notepad. And you’d be surprised how many sysadmins I’ve seen who think nothing of loading a multi-megabyte log file by double-clicking it and firing it up in notepad[1].

If you look at the redirected text file it doesn’t have the expected dual terminators (0x0D, 0x0A), just the single Unix line ending (0x0A). OK, so many text processing tools are aware of the difference between line endings on Windows and Unix, but as Notepad shows, some still aren’t. This means that using write-host directly isn’t quite enough; what we need to do is manually output the correct line ending. The following function, called “write-line”, does exactly that:-

function write-line ($line)
{
  write-host -nonewline "$line`r`n"
}

This function uses write-host and so it also works perfectly well from any functions you write without interfering with any return value. The only quibble might be with the name; Write-LogMessage would perhaps be more in keeping with the PowerShell style.

Truncated Log Output

The final problem I ran into has now been documented fairly well in other blog posts and on Stack Overflow and concerns the truncation of log message output when redirecting PowerShell output with CMD.EXE. When developing batch files, such as for a build process, you generally see the output in the console window which is what you want. But the moment you automate it, such as with the Windows Task Scheduler you need to capture all that output in case something goes wrong. In essence what you run is something like this:-

C:\Dev> BuildAll.cmd > C:\Logs\BuildAll.out.log 2> C:\Logs\BuildAll.err.log

If your BuildAll.cmd script generates a long log message like this:-

@echo off
echo A very long line of text that will not wrap because CMD.EXE pays no attention to the console window width when being redirected

The captured log file will look as expected with CMD.EXE - all the message is on a single line. Now, if we convert that batch file to PowerShell and capture the output again things don’t look so good:-

echo 'A very long line of text that will likely wrap because PowerShell will use the console window width even when being redirected'

If you open the log file in notepad you’ll see the message has been wrapped at the width of the (almost invisible) console window that was used to invoke the script. Here is how the redirected text is wrapped on a small console window:-

A very long line of text that will
likely wrap because PowerShell will
use the console window width even
when being redirected

This will happen with the write-output cmdlet, but doesn’t with write-host. So, if we instead use our new “write-line” function we’re all peachy again:-

function write-line ($line)
{
  write-host -nonewline "$line`r`n"
}

write-line "A very long line of text that won't wrap because PowerShell will ignore the console window width when using write-host"

If you’ve looked into the other solutions to this particular problem you’ll see that they suggest you play with the $host.UI.RawUI.BufferSize setting. When I first tried that it failed because the console window I was using had a height of 100 lines and trying to set the buffer to only 50 failed. Personally I dislike both workarounds, but mine does feel marginally “less dirty” to me.
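
For completeness, this is roughly what that alternative workaround looks like; it’s only a sketch, and it assumes the script is running in a console host and that the new width isn’t smaller than the current window:

# Widen the console buffer so that redirected write-output lines are not
# wrapped at the window width.
$rawUi = (get-host).UI.RawUI
$size = $rawUi.BufferSize
$size.Width = 512
$rawUi.BufferSize = $size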

Redirection is the Caller’s Responsibility

One answer to the complaint about the redirection issue is to use the start-transcript cmdlet. I don’t agree with this answer because I believe it’s the caller of the script that should determine when and where to redirect any output. The power of Unix tools such as TEE comes from the very fact that they are simple and that you use composition to achieve your goal. PowerShell is a command line tool that tries, and largely succeeds, to exploit this concept within its own language, but sometimes fails to be composable itself in the same way.

 

[1] Naturally I try to educate them into using slightly more efficient tools, after all it’s probably more painful for me to watch than it is for them to do.