Thursday, 20 March 2014

Finding an Alternative to Is- and Has- in PowerShell

I’ve been playing with PowerShell script modules as a way of factoring out a load of boilerplate PowerShell code into a re-usable library. These functions generally revolve around error handling, logging and command-line parsing at the moment. One issue I quickly ran into was the Import-Module call emitting a warning about badly named functions:-

WARNING: Some imported command names include unapproved verbs which might make them less discoverable. Use the Verbose parameter for more detail or type Get-Verb to see the list of approved verbs.

Whilst I was sort of aware that there was a standard set of verbs in PowerShell, I hadn’t really paid that much attention to the practice as most of my scripts are standalone. Naturally the usage of a PowerShell module implies code-sharing and so I can see why the warning exists. It also forces me to re-evaluate my laziness.

Is- and Has- Verbs

I started by Googling what the standard verbs were and discovered that in a few cases they explicitly provide alternatives (e.g. Search instead of Locate), but there was no PowerShell Thesaurus to help me understand what I should be using in place of the Is-Xxx and Has-Xxx functions I’d written. As an example here is the original function I wrote for detecting if the user has provided a specific command line switch:-

function Is-SwitchSet($options, $switch)
{
    if ( $options | where { $_ -eq $switch }  )
    { $true } else { $false }
}

if (Is-SwitchSet $args '--help')
{
    . . .
}

I originally played around with the naming and ended up with Has-Switch as that seemed shorter and closer, but I was still aware it wasn’t right because at the time I didn’t know of any other cmdlets that started Is- or Has-.

Noun-Verb versus Verb-Noun

As an aside I think what makes this more difficult is that the naming convention in PowerShell is the reverse of what you’d do in an object-orientated language like C++ or C#. Given that I’ve been doing that for close to 20 years it’s a hard habit to break. For example, in PowerShell, I always want to write Path-Exists, which would become Exists-Path, which is wrong (on many levels) but also exactly how you’d write it in a batch file.

C#            PowerShell
Path.Combine  Join-Path
Path.Exists   Test-Path

Set Membership

I started looking for the closest verb that I felt would be appropriate by scanning the list from Get-Verb. I was expecting to see something in the Data category related to set membership - some kind of verb relating specifically to existence / containment - but I didn’t see anything. There are Find, Search and Get, but all of these return an object and I just want to return a boolean value indicating its presence or absence. In the Lifecycle category there is Assert and Confirm, but Assert-SwitchExists or Confirm-SwitchExists doesn’t sound right either.

Obviously I noticed early on that the equivalent of the Path.Exists() method was the Test-Path cmdlet. It seems like the only candidate that might be expected to provide a boolean response, but its categorisation in the Diagnostics section makes it feel more heavyweight to me than a simple existence check in a container. I know these are only guidelines, but if the point of standardising the verbs is to make them more discoverable then I don’t see that Test-Switch really works, and Test-SwitchExists just feels like a tautology.
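That said, if I did cave in and pick the closest approved verb, only the name would need to change. Purely for illustration (the switch values below are made up, and this isn’t the choice I settled on):-

function Test-Switch($options, $switch)
{
    if ( $options | where { $_ -eq $switch } )
    { $true } else { $false }
}

Test-Switch @('--verbose', '--help') '--help'   # outputs True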

Conclusion

In the end I felt dissatisfied with all the alternatives and so I just appended a “-WarningAction SilentlyContinue” to the Import-Module statement to make the warning go away, for now. I’m sure I’m missing something obvious here and I just need to play with the (English*) language a bit more, or perhaps I’m not “getting” the (PowerShell) language and should just expect the user to rely on a boolean conversion from the result of using Where-Object?
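For the record, the workaround amounts to nothing more than this (the module path below is just a made-up example):-

Import-Module .\CommonFunctions.psm1 -WarningAction SilentlyContinue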

[*] Despite it supposedly being my native language I’m under no illusion that I know how to wield it correctly.

Thursday, 9 January 2014

Cleaning the Workspace

I have a habit - I like a clean workspace. No, I don’t mean my office desk or worktop in the kitchen but the workspace where I make my code changes. In ClearCase it was called a view and in Subversion the working copy, i.e. the place where you edit your code, merge, build it, run tests, etc.

Whilst I’m making the edits I don’t mind what state it’s in. But, when it comes time to integrate my colleagues’ changes - merge, build and run the smoke tests before committing my own - I want the workspace spotless. I don’t want to see any residual files or folders lying around that might have the effect of tainting the codebase and giving me that “well, it works on my machine” excuse. When I’ve pushed my changes I fully expect the build machine to remain green and my colleagues to suffer no unnecessary interruption.

Back at the start of last year I wrote a post called “Layered Builds” that had the following outline of what I’d expect a Continuous Integration process to use:-

“For repeatable builds you need to start with a clean slate. If you can afford to that would be to start from an empty folder and pull the entire source tree from the version control system (VCS). However, if you’re using something like ClearCase with snapshot views, that could take longer than the build itself! The alternative is to write a “clean” script that runs through the source tree deleting every file not contained in the VCS repo, such as the .obj, .pdb, .exe, etc. files. Naturally you have to be careful with what you delete, but at least you have the VCS repo as a backup whilst you develop it.”

This “clean script” that I’m referring to is nothing more than a batch file that does a bunch of deletes of files and folders, e.g.

del /s *.ncb 2> nul
del /s *.obj 2> nul
del /s *.pdb 2> nul
. . .
for /d /r %%d in (obj) do rmdir %%d 2> nul
for /d /r %%d in (bin) do rmdir %%d 2> nul

For a more concrete example look at the script contained in my Build Scripts (or directly on GitHub).

Note that whilst I might do a recursive delete (del /s) of a build artefact like a .obj file, I almost never do a recursive delete of the artefact folders. This ensures I only clean up the actual build output and not any log files, batch files, test results or other stuff I might have created as part of my ongoing changes.

Build Detritus

My current project uses Jenkins and MSBuild, neither of which I’d used before, so someone else set up the skeleton of the build. Although the initial steps blew away the output folders, anyone who has ever used Visual Studio will know that its idea of “clean” falls way short of “spotless”. It caches a lot of data: metadata like the .ncb files that Visual C++ uses for IntelliSense, intermediate build data like the header files generated via the #import statement, right up to entire 3rd party packages pulled from NuGet. None of this stuff gets blown away if you do a “Clean Solution” from Visual Studio.

Of course the metadata, like where my breakpoints point to (.suo) and the IntelliSense data (.ncb), should have absolutely no effect on the compiled binary artefacts. However a product like Visual Studio is a complex beast and it’s not always apparent which detritus contains build-dependent data and which doesn’t. Sometimes “stuff” just gets corrupted and you need to blow the lot away and start again [1]. So I err on the side of caution and try to remove as much as possible as often as possible without adversely affecting my productivity. Given the modern preference for short feedback loops it turns out there is very little I ever carry over from one integration to the next; except perhaps the occasional breakpoint or debugging command line.

Old Habits Die Hard

Like all habits you need to remind yourself why you’re doing it every now and again to ensure it still has value - perhaps all those weird build problems are now a thing of the past and I don’t need to do it anymore. Depending on your point of view, the good thing about being someone who religiously cleans their workspace before every integration is that you stand a good chance of being the one who finds the build problems the others didn’t know existed. On my current project the build machine also turned out to be ignorant…

One problem that I kept hitting early on was incorrectly configured NuGet package dependencies. ReSharper has this wonderful feature where it will automatically add the assembly reference and namespace “using” declaration when you start writing some code or paste in code from another assembly. The problem is that it’s not clever enough - it doesn’t fix up the NuGet package configuration when the reference is to a NuGet-sourced assembly. Consequently other developers would let ReSharper fix up their build, but when I used my clean script and blew away the NuGet package cache the build might then fail, because the assembly could be earlier in the build order and the package might not have been cached at that point. Another problem was the C# code analysis step failing with a weird error if you cut-and-pasted the build configuration and ended up pointing to a stale analysis output file that no longer exists the moment you clean up after yourself.

Continuous Integration

The build machine should have been the one catching any problems, but it was never configured to wipe the slate clean before every build. Given that we were using Jenkins, which is pretty mature and has a mountain of plug-ins, I investigated what the options were. There is a plug-in to clean the workspace before running a job that I’ve since started using on all the other non-Git-repo related jobs. This picked up a couple of minor issues where we had been relying on one deployment overwriting another. We’d also been incorrectly analysing a stale TestResults.xml file from the unit test run, which nobody had spotted.

This plug-in works great except when the cost of recreating the workspace is high, such as in the case of the initial compilation step. The “master” Git repo we’re using is hosted remotely and has of late been suffering from a number of DDoS attacks [2]. That, coupled with its ever-growing size, means it takes some time to keep cloning the entire repo on every build. Consequently we now have our own step at the start of the build that runs our clean script instead. Yes, as I discovered later, I could have used “git clean -x -d -f” [3] to do the same job; however it has always been on the cards that we would move to another VCS where this won’t be an option, so I maintain the script instead.

Clean or Spotless?

This need to distinguish between merely “clean”, i.e. all build artefacts and non-user state removed, and “totally spotless”, where it’s the same as if you’d just cloned the repo, helps avoid some other ugly side-effects caused by auto-generated files.

Visual C# lacks the ability to mark a .cs file as generated, and generating one is exactly what we do to inject the build number via an AssemblyVersionInfo.cs file. As a consequence, when you open Visual Studio you get little warning symbols on all these files telling you the file is missing, which is irritating. More annoying though is the popup dialog we get every time we open the solution when the web.config is missing, because that’s also auto-generated.

This means that some files I’d prefer to be cleaned up by default are now only purged if the “--all” switch is specified.
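To illustrate the distinction, here is a sketch of how such a gate might look. It’s purely hypothetical - the real script is a batch file, and the switch name and file paths below are made up:-

# Illustrative sketch only - the real clean script is a batch file.
# 'clean' removes build output; the extra 'spotless' deletes are gated behind -All.
param([switch]$All)

Get-ChildItem . -Recurse -Include *.obj, *.pdb, *.ncb | Remove-Item -Force

if ($All)
{
    # Auto-generated files that Visual Studio complains about when they are missing.
    Remove-Item .\Src\AssemblyVersionInfo.cs -ErrorAction SilentlyContinue
    Remove-Item .\Src\Web\web.config -ErrorAction SilentlyContinue
}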

Habit Justified?

Like many of the habits programmers adopt, the problems this particular one unearths would likely surface later anyway. The question is how much later, and consequently how much longer would it take to diagnose and fix when it does finally appear? The longer the time between a mistake and its discovery the harder it can become to pinpoint. This is especially true when you didn’t make the change yourself and you have a mental attitude of always blaming your own recent changes first rather than assuming someone else is at fault.

I still don’t think it costs me much time, and ultimately the team wins, which I think is what matters most.

 

[1] This used to be a common problem with Visual C++. If Visual C++ started crashing randomly, then binning the .ncb and other cached data files usually did the trick. I eventually found that Visual C++ was generally pretty solid if you disabled the Navigation Bar at the top of the source files and never touched the Class Wizard. Fortunately things are much better these days (VC++ 9.0 was the last version I used on a daily basis).

[2] The beauty of Git is that, in theory, if the remote repo is down for any length of time we could just clone a repo on the build server and treat that as our new master until the remote one is back. Or we could just push changes between ourselves if the need really arose. In practice outages of any kind (local or remote) can be counted in hours, which even The Facebook Generation can cope with.

[3] This is the first time I’ve used Git and historically the version control systems I’ve used didn’t even allow you to ignore files (PVCS, SourceSafe, ClearCase) let alone provide support for resetting your workspace!

Wednesday, 4 December 2013

Missing HOMEDRIVE and HOMEPATH Variables Under Jenkins

TL;DR - If you’re looking for the reason why the variables are AWOL then I can’t help, but if you want to know what I tried and how I worked around my specific problem, then read on…

Back when we first set up Jenkins I soon noticed that all the files in the workspace had LF style line endings instead of CR/LF as you would expect on Windows; this was despite following the advice in the often-cited GitHub page to set things up correctly. I’d managed to live with it to date but now we were suddenly involved in a tactical project and were deploying to staging & production where trying to edit application .config files on servers that only had Notepad was like pulling teeth. I really needed to sort it out once and for all.

1st Attempt

The last time I tried to fix this I logged into the Jenkins build server, fired up a command prompt (as the Jenkins user account using RUNAS) and then executed “git config --global core.autocrlf true”. For good measure I also wiped the Jenkins workspace to ensure that I’d get a freshly cloned repo when the build job next ran. It didn’t work and I’d already put a few hours into the problem so had to put my spade down and go do some work instead.

2nd Attempt

Coming to the problem once again all fresh and reinvigorated some months later I surmised that the problem must be down to the Jenkins build account not picking up the --global scoped setting. After a quick recap on the differences between --system, --global and --local and where the files were stored, my next guess was that the account might be being used without a profile. So I checked again, this time using the /noprofile switch with RUNAS:-

C:\> runas /noprofile /user:DOMAIN\BuildAccount cmd.exe
<new command prompt>
E:\Jenkins> git config --global core.autocrlf
true

The setting was exactly as I had left it last time. The next stop was to inject an extra build step into the Jenkins job and see what it thought was going on. For good measure I dumped all the different scoped values to see what exactly was set and where:-

E:\Jenkins> "C:\Program Files (x86)\git\bin\git.exe" config --local core.autocrlf
(blank)
E:\Jenkins> "C:\Program Files (x86)\git\bin\git.exe" config --global core.autocrlf
(blank)
E:\Jenkins> "C:\Program Files (x86)\git\bin\git.exe" config --system core.autocrlf
false

Suspicion confirmed - the Jenkins job doesn’t see the setting. But why?

I started reading a little more closely about where the .gitconfig file would be picked up from and used “git config --global --edit” to see what path the editor [1] had loaded the file from. Sure enough, from my command prompt it loaded the correct file, although the HOMEDRIVE was set to C: and HOMEPATH was set to \Windows\system32 which seemed a little odd. The USERPROFILE variable on the other hand pointed correctly to the profile folder, not that it’s used by Git, but it was a useful check.

So I decided to just go ahead and set the autocrlf setting via the Jenkins job, hoping that at least it would stick even if I didn’t know at that moment where the setting would end up. To my surprise I got the following weird error:-

E:\Jenkins> "c:\Program Files (x86)\Git\bin\git.exe" config --global core.autocrlf true
error: could not lock config file (null)/(null)/.gitconfig: No such file or directory

I didn’t twig at first what the error in the path was telling me so naturally I Googled it. I got a hit that was related to Jenkins (JENKINS-19249) and was quite excited. When I read the issue I found that it was a similar problem, superficially, but there were no comments; not even so much as a “me too!” to at least show me I wasn’t the only one. I did a bit more Googling about how the path to the --global .gitconfig file is derived and it dawned on me what might be happening, so I dumped out all the environment variables the Jenkins job sees with a call to SET.

Guess what - the HOMEDRIVE and HOMEPATH variables are not set. The way Git forms the path is with %HOMEDRIVE%/%HOMEPATH%. In C/C++ if you printf("%s", NULL); by accident you’ll often see the value “(null)” instead of it crashing [2] - hence there is one “(null)” for the HOMEDRIVE and another “(null)” for the HOMEPATH. For what it’s worth the USERPROFILE variable was still set correctly.
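If you want to do the same check on your own build server without wading through the full SET output, a quick diagnostic build step along these lines would do the job (just a sketch - our job really did use a plain call to SET):-

# List the variables Git relies on to locate the --global config file.
foreach ($name in 'HOMEDRIVE', 'HOMEPATH', 'USERPROFILE')
{
    $value = [Environment]::GetEnvironmentVariable($name)

    if ([string]::IsNullOrEmpty($value))
    { Write-Output "$name is not set" }
    else
    { Write-Output "$name = $value" }
}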

Solving My Problem

Ultimately I just wanted the line endings to be correct and so I took the rather heavy-handed approach and used “git config --system core.autocrlf true” from an elevated command prompt, as it meant altering the gitconfig file in the C:\Program Files (x86)\git\etc folder. Actually I forgot to use an elevated command prompt first time around and the file-system redirection introduced back in Vista kicked in and it wrote to a per-user virtualised version of the gitconfig file instead, doh!

I don’t particularly like this solution but I’m modifying the build server which is a carefully controlled environment anyway and it’s highly unlikely to be used for mixing-and-matching Git repos with different line endings.

But Why?

As I mentioned right back at the very start - I don’t know why the variables aren’t set. At first I assumed it might be down to the way the user profile is loaded when Jenkins runs as a service, but I couldn’t find anything in the MSDN Knowledge Base that suggested why this might happen.

Failing that it might be a Jenkins issue. It’s a CI server and so perhaps this is done on purpose to ensure builds are user agnostic or something. If that was the case though I’d expect there to be a very popular StackOverflow post asking why their variables have been silently nobbled, but I didn’t stumble across one.

As always, once you know the answer, Googling for it is so much easier. As a Jenkins newbie too I’ll probably find out what’s really going on as I get into it more.

 

[1] Watch out for this as it’ll use VI as the editor by default. Luckily the one key sequence I can still remember after 25 years is how to exit it!

[2] I’m guessing this behaviour isn’t defined by the C/C++ standard but is a common implementation choice?

Tuesday, 26 November 2013

if errorlevel 1 vs if %errorlevel% neq 0

The problem with having used batch files since the days of DOS is that certain constructs are just way too embedded in your brain. For example the way I check whether a command in a batch file has failed is by using the special “errorlevel” version of the “if” statement:-

ExternalProgram.exe
if errorlevel 1 exit /b 1

This style of “if” says that if the exit code from the last program run was greater than or equal to 1, the script should terminate with an exit code of 1. The general convention for exit codes is that 0 means success and anything else just means “something else happened” [1] and it’s entirely dependent on the program. For example the ROBOCOPY utility has a variety of different codes that may or may not be an error depending on what you asked it to do. The MSI installer (MSIEXEC.EXE) is another program that has a few exit codes that you soon get to know if you’re trying to script a deployment process, e.g.

msiexec /x <GUID>
if %errorlevel% equ 1605 goto :not_installed

This form of “if” statement (with a variable called errorlevel) is the newer form that was introduced in Windows NT (I think) and it allows you to do an equality comparison with a single exit code, which was less than intuitive before. This form is also required when you have anti-social processes that return negative exit codes [2]. In fact the earlier form should probably be considered defunct (if only my muscle memory would let go) and the newer form used by default instead:-

ExternalProgram.exe
if %errorlevel% neq 0 exit /b %errorlevel%

If you can’t remember what the operators are use “C:\> HELP IF” to list them [3].
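As footnote [3] below laments, the operators are different again in PowerShell. For comparison, a rough PowerShell equivalent of the batch snippet above would be something like this, using the automatic $LASTEXITCODE variable (ExternalProgram.exe is the same placeholder as before):-

.\ExternalProgram.exe
if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }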

[1] C & C++ programmers will of course already be used to using the macros EXIT_SUCCESS and EXIT_FAILURE from <stdlib.h>. I don’t think .Net has an equivalent and so I often end up creating a class called ExitCode with the same two constants.

[2] Yes SCHTASKS (the command line tool for managing scheduled tasks) I’m looking at you. The .Net runtime can also chuck out negative exit codes if something really serious goes wrong, e.g. the process fails to start with an out of memory or assembly loading problem.

[3] They’re different from PowerShell too which is a shame.

Logging Stack Traces Should be Unnecessary

I like a nice clean log file. The left hand margin should be a fixed width and easy on the eye so that my built-in pattern recognition gets to work as effectively as possible. It should also be easy to parse with the age old UNIX command line tools (and LogParser) without me having to pre-process the file first to get shot of the noise.

What really messes with this is when I see stack traces in the log file. They are always huge and contain far too much duplication because modern software design principles suggest we write short, simple methods, with overloads and reuse by forwarding calls instead of cutting-and-pasting functionality:-

. . .
void DoSomething(string[], int, int)
void DoSomething(string[])
void DoSomething(IEnumerable<string>)
. . .

So, seriously, has that level of detail ever enabled you to solve a problem? Without knowing what the parameter values are, how much do stack traces even tell you? Agreed, if all you’ve got is a crash dump then a stack trace is invaluable, but I’m talking about logging stack traces, which by definition means that you’re already writing other diagnostic information too.

Design Smell?

I’ve always favoured keeping stack traces out of log files on the basis that they are of little value in comparison to other techniques, and so far I’ve found that I don’t miss them. In my experience, if the design of the code is right and the error message (e.g. exception message) is well written, it should be fairly easy to reason about where in the code the problem is, which is effectively what a stack trace tells you. In short that means a simple GREP on the source code to find where the message is generated.

You might argue that a stack trace tells you that up front so why make more effort than necessary, which is of course true, but you’re also going to need the context, which a stack trace will not tell you unless it logs all its parameter values too. And for that we need to visit the log file, and if we’re going to do that how much time and effort are we really saving at the cost of extra background noise? More importantly this is the moment when the quality of our log message entries will begin to shine or we find ourselves lost in the trees looking for the wood. Hopefully during development you’ve already been dog-fooding your own logs to get a feel for how well you can solve real support problems when using them.

Test Infrastructure

The second part of the puzzle in avoiding all this gratuitous text is the ability to reproduce the problem easily within a debugger. Hopefully from the context you should be able to explore the problem in isolation - varying different inputs to see how the code is reacting. If the design is simple you should easily be able to step through an existing test case and see where the points of trouble might be, e.g. some missing or dodgy error handling.

At this stage, while the overarching goal is to fix the problem at hand, the fact that a bug has crept in means that the development process has failed to some degree and therefore I’d be taking this as an opportunity to compensate by doing a review. It’s likely I won’t action anything there-and-then, instead favouring to make some notes so that any eventual action can be triaged and prioritised separately.

Depending on the complexity of the system, this is the point at which I might rely on any custom tooling I’ve built to help isolate certain aspects of the system so that they can be exercised in a tightly controlled and deterministic environment, e.g. console app test harness that hosts the various services in-process.

Minimal Traces

What I despise most about many stack traces I see in forum posts is the sheer volume of noise. There is clearly some value in them, more so for “generic” unhandled exceptions like a NullReferenceException that have no message, but do they have to extend to reams and reams of text? When I log an exception I write the entire exception chain with both the type and the message; all on a single line. This is done using an extension method for the Exception type in C#. The same could easily be done for the stack trace, the only reason I haven’t done it is because I haven’t needed to, yet. But if I did write one what would it do?

Firstly I’d strip away all the arguments as they are fairly meaningless without their values. I’d also collapse all overloads into a single method as forwarding calls are uninteresting too. The bottom part of any stack trace is a load of boilerplate system code, so I’d chuck that away and make the entry point to my code the first interesting point of reference, which I should be able to find because assembly names and namespaces tend to follow a corporate convention. The same is probably true for the top of the stack, but the very first frame may be meaningful so perhaps I’d keep that, although if I had to keep just one method name it would be the last method of my own code, as that is the first point that has any real meaning. Finally, I’d rotate what’s left and stitch it together with pipes, probably something like this (ignoring the unfortunate word-wrap):-

2001-01-01 ERR Unhandled exception - OrderController.GetOrder|>ProductService.FindProduct {NullReferenceException}

I definitely don’t need the full namespace names, just the class and method, although I’d argue that with decent method names even the classes might easily be inferred from just the method name and context. Doesn’t that look a whole lot less noisy?
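If I ever do get around to writing that helper, a first stab might look something like the sketch below. The namespace prefix and the pipe separator are assumptions for the sake of the example, not a convention lifted from any real project:-

# Condense a .NET stack trace string: keep only our own frames, drop the arguments,
# collapse adjacent overloads / forwarding calls and stitch the rest together with pipes.
function Format-CompactStackTrace($stackTrace, $namespacePrefix = 'MyCompany.')
{
    $methods = @()

    foreach ($line in ($stackTrace -split "`r?`n"))
    {
        # Match frames like: "   at MyCompany.Orders.OrderController.GetOrder(Int32 id)"
        if ($line -match '^\s*at\s+([^(]+)\(')
        {
            $method = $Matches[1].Trim()

            # Only keep frames from our own code.
            if (-not $method.StartsWith($namespacePrefix)) { continue }

            # Reduce "Namespace.Class.Method" down to "Class.Method".
            $parts = $method -split '\.'
            $method = $parts[-2..-1] -join '.'

            # Collapse overloads and forwarding calls to the same method.
            if (($methods.Count -eq 0) -or ($methods[-1] -ne $method)) { $methods += $method }
        }
    }

    # Rotate so the entry point comes first and stitch together with pipes.
    [array]::Reverse($methods)
    $methods -join '|>'
}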

Whilst I might not convince you to drop stack traces entirely from your logs, at least entertain the idea that you can represent them in a far more support-friendly fashion than what the runtime throws out by default.

Friday, 22 November 2013

The 3 Faces of PowerShell Collections - 0, 1 & Many

There is a classic rule of thumb in programming that says there are only three useful numbers - zero, one and many. I’ve found this concept very useful when writing tests as code that deals with collections or item counts sometimes needs to handle these 3 cases in different ways. As a simple example imagine generating a log message about how many items you’re going to process. The lazy approach would be to just print the number and append a “(s)” to the noun to make it appear as though you’ve made an effort:-

Found 2 file(s) to process…

If you wanted to spell it out properly you’d write 3 separate messages:-

  1. No files need processing
  2. Found 1 file to process…
  3. Found 2 files to process…
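In script terms that amounts to nothing more than a little helper along these lines (a trivial sketch - the function name is made up and the messages are just the ones above):-

function Format-FoundMessage($count)
{
    if ($count -eq 0)     { 'No files need processing' }
    elseif ($count -eq 1) { 'Found 1 file to process…' }
    else                  { "Found $count files to process…" }
}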

A PowerShell Gotcha

This idea of 0, 1 & many is also the way I remember how PowerShell collections work when they are returned from a cmdlet. I was reminded of this idiom once again after debugging a colleague’s script that was failing because they had written this:-

$items = Get-SomeItems . . .

if ($items.Count -gt 0) {
. . .

For those not well versed in PowerShell this kind of construct will generate an error when no item or just 1 item is returned. The error will tell you “Count” is not a property on the variable - something like this in fact:-

Property 'Count' cannot be found on this object. Make sure that it exists.
At line:1 char:55
+ . . .
    + CategoryInfo : InvalidOperation: . . . 
    + FullyQualifiedErrorId : PropertyNotFoundStrict

You won’t see this error unless you have Strict Mode turned on (hence the PropertyNotFoundStrict in the error message). For one-liners this might be acceptable, but when I’m writing a production grade PowerShell script I always start it with these two lines (plus a few others that I covered in “PowerShell, Throwing Exceptions & Exit Codes”):-

Set-StrictMode -Version Latest
$ErrorActionPreference="stop"

For those used to the family of Visual Basic languages the former is akin to the “Option Explicit” statement you probably learned to add after misspelling variable names a few times and then scratched your head as you tried to work out what on earth was going on.

PowerShell Collections

To help illustrate these three manifestations of a collection you might come across we can create 3 folders - an empty one, one with a single file and one with many files [1]:-

PS C:\temp> mkdir Empty | Out-Null
PS C:\temp> mkdir Single | Out-Null
PS C:\temp> echo single > .\Single\one-file.txt
PS C:\temp> mkdir Many | Out-Null
PS C:\temp> echo many > .\Many\1st-file.txt
PS C:\temp> echo many > .\Many\2nd-file.txt

Now, using Get-ChildItem we can explore what happens by invoking the GetType() method on the resulting value from the cmdlet to see exactly what we’re getting [2]:-

PS> $items = Get-ChildItem Empty; $items.GetType()
You cannot call a method on a null-valued expression.

PS> $items = Get-ChildItem Single; $items.GetType()
IsPublic IsSerial Name     BaseType
-------- -------- ----     --------
True     True     FileInfo System.IO.FileSystemInfo

PS> $items = Get-ChildItem Many; $items.GetType()
IsPublic IsSerial Name     BaseType
-------- -------- ----     --------
True     True     Object[] System.Array

As you can see, in the first case we get a null reference, or in PowerShell terms a $null. In the second case we get a single item of the expected type, and in the third an array of objects. Only the final type, the array, will have a property called “Count” on it. Curiously enough, as you might have deduced from earlier, you don’t get the “null-valued expression” error if you try and access the missing Count property on a $null value; you get the “invalid property” error instead:-

PS C:\temp> $null.Count
Property 'Count' cannot be found on this object. Make sure that it exists.

Forcing a ‘Many’ Result

The idiomatic way to deal with this in PowerShell is not to try and do it in the first place. It is expected that you will just create a pipeline and pass the objects along from one stage to the next letting the PowerShell machinery hide this idiosyncrasy for you:-

PS C:\temp> Get-ChildItem Empty | Measure-Object |
            Select Count
Count
-----
    0

However, if you do need to store the result in a variable and then act on it directly [3] you’ll want to ensure that the variable definitely contains a collection. And to do that you wrap the expression in “@(...)”, like so:-

PS> $items = @(Get-ChildItem Empty);
    Write-Output $items.Count
0
PS> $items = @(Get-ChildItem Single);
    Write-Output $items.Count
1
PS> $items = @(Get-ChildItem Many);
    Write-Output $items.Count
2

 

[1] Apologies for the old-skool syntax; I still work a lot with batch files and the PowerShell syntax for creating directories just hasn’t bedded in yet. The blatant use of ECHO instead of Write-Output was me just being perversely consistent.

[2] Whilst Get-Member is the usual tool for inspecting the details of objects coming through a pipeline, it will hide the difference between a single value and a collection of values.

[3] For example diagnostic logging, which I tackled in “Logging & Redirection With PowerShell”.

Wednesday, 13 November 2013

Self-Terminating From Inside a TCL Script DDE Handler

Like all good bugs, the one you discover is not always the one that is actually causing the real problem the user is reporting. In my last post “DDE XTYP_EXECUTE Command Corruption” I described a problem I ran into when sending a DDE XTYP_EXECUTE message from a Unicode server to an ANSI client. Whilst this had become a problem, it turned out it wasn’t the actual problem that the user had originally reported.

Hanging on Exit

The real problem the user was experiencing was that when they sent a DDE command to their TCL script to terminate itself, the calling script, which was written in VBScript, was timing out with a DMLERR_EXECACKTIMEOUT error. What made things more curious was that the user had managed to find a workaround using another DDE tool (a command line application) that did seem to terminate the TCL script without generating an error.

Although I knew nothing about TCL at all at that point, my spider-senses were already tingling when I saw this bit of the TCL script in their email:-

proc TclDdeServerHandler {args} {
  . . .
  switch -exact -- $cmd {
    . . .
    exit {
      exit
    }
  }
}

The code path for the “exit” command was causing the TCL script to terminate whilst still inside the DDE handler. Although it may not actually be a bad thing to do in a script, I’ve always tended to try and let any stacks unwind before terminating a process or script to make sure the “runtime” remains in “a good place”. Maybe I’ve just spent too many hours porting native libraries that use “exit()” instead of “return” as an error handling strategy [1].

I raised this as a concern, but given the other developer knew TCL and I didn’t I was happy to accept their answer that this wasn’t an issue.

My First TCL Script

After taking a crash course in TCL, which really just involved hacking around with the script I’d already been given, I managed to create a simple one that acted as a trivial DDE server to print a popular message:-

proc RunServer {} { 
  package require dde
  dde servername TestTopic
}

proc SayHello {} { 
  puts "Hello world"
}

RunServer
vwait forever

I ran this using TCLSH.EXE MyScript.tcl and poked it remotely using a similar nugget of VBScript:-

Dim client
Set client = CreateObject("DDECOMClient.DDEClient")

client.ExecuteTextCommand "TclEval", "TestTopic", "SayHello"

The hardest bit about getting this working was making the script sit in a loop processing Windows messages instead of terminating, and that’s what the “vwait forever” does. The only way to exit this though is to use Ctrl+C in the console.

To test the configurable timeout behaviour I’d added to my COM component, I added a sleep to the SayHello function like so:-

global alarm
proc sleep {time} { 
  after $time set alarm 1 
  vwait alarm
}
. . .
proc SayHello {} { 
  puts "Waiting..." 
  sleep 2000 
  puts "Hello world"
}

Reproducing the Real Issue

My “improved” DDE COM component went back to the original developer so they could then bump the timeout themselves to something sensible in their script. They came straight back to say that increasing the timeout didn’t work. They had bumped it up to 60 secs which, after further discussion, turned out to be 59.99 secs longer than the operation should really take.

With a bit more TCL knowledge under my belt I started to hack around with their script, which took a while as I couldn’t even get it to run under TCLSH.EXE. A bit of random commenting out and I was finally able to reproduce the timeout problem. At this point I assumed the real issue might lie with some interaction between myself and the VBScript engine or be a subtle bug in my COM component as it was being hosted in-process and had already made the other DDE calls.

However what I couldn’t reproduce was their ability to terminate the script using another process. At first I used my own DDECmd tool as I like to dog-food my own stuff when possible. No go. Luckily they shipped me the execdde tool they were using and lo-and-behold it failed with a timeout error too. Huh?

Trust Nothing

At this point I was totally bemused and was sure I was a victim of some kind of environmental difference, after all, how else could the bug reporter experience one thing whilst I saw another unless there was a difference in the tools themselves or the way they were being used? Time to start again and re-evaluate the problem…

Luckily I had been sent a copy of the scripts being run by the other developer and so I started to look more closely at what they were doing. Whilst reviewing this I noticed that the call to the 3rd party execdde tool was being done in such a way as to ignore the return code from it. In fact the lines above it had the error reporting in, but it was now commented out. In the heat of battle it’s all too easy to just keep trying lots of different things in the hope that something eventually works, and then we quickly lose sight of what we have and haven’t tried up to that point.

This caused me to also re-evaluate my theory about calling “exit” from inside the TCL DDE handler and so I did some more Googling and came up with this variation of the TCL DDE handler that sets an exit “flag” instead, unwinds, and then finally exits by dropping out the bottom:-

set forever 1

proc TclDdeServerHandler {args} {
  . . .
  switch -exact -- $cmd {
  . . .
    exit    {
      global forever
      set forever 0 
    } 
  }
}

package require dde 1.3
dde servername -handler ::TclDdeServerHandler TestTopic
vwait forever

This worked a treat, although there was much gnashing of teeth at first wondering why the “forever” variable wasn’t changing. Apparently I needed to tell TCL that the “forever” variable I was changing was the global one, instead of letting it create a local one.

TCLSH vs WISH

That was the end of it, or so I thought. All along there had actually been an “environmental” difference between what I was working with and what the other developer was using - the TCL interpreter. I didn’t realise that TCLSH.EXE and WISH.EXE are two slightly different variations of the TCL scripting host. I had wondered why I needed to comment out the line “console show” when trying to run their script, but just as they had done I moved on and forgot to evaluate how I had got there.

The main difference, at least as far as my proposed fix was concerned, was that when my script is run under WISH.EXE it doesn’t actually terminate, oops! Although I didn’t manage to confirm it, my suspicion was that WISH behaves more like a GUI and so it probably has an implied “vwait forever” at the end (but waiting on some internal variable instead). The solution of course is as simple as appending the manual call to “exit” that we took out of the DDE handler to the bottom of the script:-

. . .
package require dde 1.3
dde servername -handler ::TclDdeServerHandler TestTopic

vwait forever
exit

Is there a Bug?

I managed to find a workaround that allows the person who reported the problem to do what it was they needed to do. Sadly I don’t know whether their original idea was sound (being able to call exit directly) or if it’s a bug in the TCL interpreter or DDE package. Presumably, being open source, I have the opportunity to download the sources, build it and debug it myself. Maybe one day I’ll have the time.

 

[1] Back in the mid 1990s when the JPEG library was in its infancy I had the pleasure of taking this Unix-style C library, which just called exit() all over the place when anything failed, and trying to turn it into a DLL to run under 16-bit Windows. Because it was going to be part of a bigger application it had to return an error code instead of just bailing out; along with dealing with NEAR and FAR pointers, etc.