Wednesday 13 November 2013

Self-Terminating From Inside a TCL Script DDE Handler

Like all good bugs, the one you discover is not always the one that is actually causing the real problem the user is reporting. In my last post “DDE XTYP_EXECUTE Command Corruption” I described a problem I ran into when sending a DDE XTYP_EXECUTE message from a Unicode server to an ANSI client. Whilst this had became a problem, it turned out this wasn’t the actual problem that the user had originally reported.

Hanging on Exit

The real problem the user was experiencing was that when they sent a DDE command to their TCL script to terminate itself, the calling script which was written in VBScript was timing out with a DMLERR_EXECACKTIMEOUT error. What made things more curious was that the user had managed to find a workaround using another DDE tool (a command line application) that did seem to terminate the TCL script without generating an error.

Although I knew nothing about TCL at all at that point, my spider-senses were already tingling when I saw this bit of the TCL script in their email:-

proc TclDdeServerHandler {args} {
  . . .
  switch -exact -- $cmd {
    . . .
    exit {
      exit
    }
  }
}

The code path for the “exit” command was causing the TCL script to terminate whilst still instead the DDE handler. Although it may not actually be a bad thing to do in a script I’ve always tended to try and let any stacks unwind before terminating a process or script to make sure the “runtime” remains in “a good place”. Maybe I’ve just spent too many hours porting native libraries that use “exit()” instead of “return” as an error handling strategy [1].

I raised this as a concern, but given the other developer knew TCL and I didn’t I was happy to accept their answer that this wasn’t an issue.

My First TCL Script

After taking a crash course in TCL, which really just involved hacking around the script I’d already been given, I managed to create a simple one that acted as a trivial DDE server to print a popular message:-

proc RunServer {} { 
  package require dde
  dde servername TestTopic
}

proc SayHello {} { 
  puts "Hello world"
}

RunServer
vwait forever

I ran this using TCLSH.EXE MyScript.tcl and poked it remotely using a similar nugget of VBScript:-

Dim client
Set client = CreateObject("DDECOMClient.DDEClient")

client.ExecuteTextCommand "TclEval","TestTopic",
                          "SayHello"

The hardest bit about getting this working was the making the script sit in a loop processing Windows messages instead of terminating, and that’s what the “
vwait forever” does. The only way to exit this though it to use Ctrl+C in the console.

To test the configurable timeout behaviour I’d added to my COM component I added a sleep in the SayHello function like so.

global alarm
proc sleep {time} { 
  after $time set alarm 1 
  vwait alarm
}
. . .
proc SayHello {} { 
  puts "Waiting..." 
  sleep 2000 
  puts "Hello world"
}

Reproducing the Real Issue

My “improved” DDE COM Component went back to the original developer so they could then bump the timeout themselves to something sensible in their script. They came straight back to to say that increasing the timeout didn’t work. They had bumped it up to 60 secs, which, after further discussion revealed was 59.99 secs longer than the operation should really take.

With a bit more TCL knowledge under my belt I started to hack around with their script, which took a while as I couldn’t even get it to run under TCLSH.EXE. A bit of random commenting out and I was finally able to reproduce the timeout problem. At this point I assumed the real issue might lie with some interaction between myself and the VBScript engine or be a subtle bug in my COM component as it was being hosted in-process and had already made the other DDE calls.

However what I couldn’t reproduce was their ability to terminate the script using another process. At first I used my own DDECmd tool as I like to dog-food my own stuff when possible. No go. Luckily they shipped me the execdde tool they were using and lo-and-behold it also failed with a timeout exception too. Huh?

Trust Nothing

At this point I was totally bemused and was sure I was a victim of some kind of environmental difference, after all, how else could the bug reporter experience one thing whilst I saw another unless there was a difference in the tools themselves or the way they were being used? Time to start again and re-evaluate the problem…

Luckily I had been sent a copy of the scripts being run by the other developer and so I started to look more closely at what they were doing. Whilst reviewing this I noticed that the call to this 3rd party execdde tool was being done in such a way as to ignore the return code from it. In fact the lines above had the error reporting in, but it was now commented out. In the heat of battle it’s all too easy to just keep trying lots of different things in the hope that something eventually works and then we quickly lose sight of what we have and haven’t tried up to that point.

This caused me to also re-evaluate my theory about calling “exit” from inside the TCL DDE handler and so I did some more Googling and came up with this variation of the TCL DDE handler that sets an exit “flag” instead, unwinds, and then finally exits by dropping out the bottom:-

set forever 1

proc TclDdeServerHandler {args} {
  . . .
  switch -exact -- $cmd {
  . . .
    exit    {
      global forever
      set forever 0 
    } 
  }
}

package require dde 1.3
dde servername -handler ::TclDdeServerHandler TestTopic
vwait forever

This worked a treat. Although there was much gnashing of teeth wondering why the “forever” variable wasn’t changing. Apparently I need to tell TCL that the “forever” variable I was changing was the global one, instead of changing a local one.

TCLSH vs WISH

That was the end of it, or so I thought. All along there had actually been an “environmental” difference between what I was working with and what the other developer was using - the TCL interpreter. I didn’t realise that TCLSH.EXE and WISH.EXE are two slightly different variations of the TCL scripting host. I had wondered why I needed to comment out the line “console show” when trying to run their script, but just as they had done I moved on and forgotten to evaluate how I had got there.

The main difference, at least as far as what I proposed, was that when my script is run under WISH.EXE it doesn’t actually terminate, oops! Although I didn’t manage to confirm it, my suspicion was that WISH behaves more GUI like and so it probably has an impliedvwait __forever__” at the end (but waiting on some internal variable instead). The solution of course is as simple as appending the manual call to “exit” that we took out of the DDE handler to the bottom of the script:-

. . .
package require dde 1.3
dde servername -handler ::TclDdeServerHandler TestTopic

vwait forever
exit

Is there a Bug?

I managed to find a workaround that allows the person that reported the problem to do what it was they needed to do. Sadly I don’t know whether their original idea was sound (being able to call exit directly) or if it’s a bug in the TCL interpreter or DDE package. Presumably being open source I have the opportunity to download the sources, build and debug it myself. Maybe one day I’ll have the time.

 

[1] Back in the mid 1990’s when the JPEG library was in its infancy I had the pleasure of taking this Unix-style C library that just called exit() all over the place when anything failed and tried to turn it into a DLL to run under 16-bit Windows. Because it was going to be part of a bigger application it had to return an error code instead of just bailing out; along with dealing with NEAR and FAR pointers, etc.

No comments:

Post a Comment