Monday 28 November 2011

Null String Reference vs Empty String Value

One of the first things I found myself in two minds about when making the switch from C++ to C# was what to use to represent an empty string - a null reference or an empty string (i.e. String.Empty).

C++ - Value Type Heaven

In C & C++ everything is a value type by default and you tend to avoid dynamically allocated memory as a matter of course, especially for primitive types like an int. This leads to the common technique of using a special value to represent NULL instead:-

int pos = -1;

Because of the way strings in C are represented (a pointer to an array of characters) you can use a NULL pointer here instead, which chalks one up for the null reference choice:-

int pos = -1;
const char* name = NULL;

But in C++ you have a proper value type for strings - std::string - so you don’t have to delve into the murky waters of memory management[*]. Internally it still uses dynamic memory management[+] to some degree and so taking the hit twice (i.e. const std::string*) just so you can continue to use a NULL pointer seems criminal when you can just use an empty string instead:-

int pos = -1;
std::string name("");

I guess that evens things up and makes the use of a special value consistent across all value types once again; just so long as you don’t need to distinguish between an empty string value and “no” string value. But hey, that’s no different to the same problem with -1 being a valid integer value in the problem domain. The C++ standard uses this technique too (e.g. string::npos) so it can’t be all bad…
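As a minimal sketch of the special-value technique, assuming nothing beyond the standard library: the helpers find_char and before_first below are hypothetical names invented for illustration, and the only real API involved is std::string::find with its std::string::npos result.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: position of the first 'ch' in 'text', or the
// special value std::string::npos when there is no such character.
std::string::size_type find_char(const std::string& text, char ch)
{
    return text.find(ch);
}

// Hypothetical helper: everything before the first 'ch', or the whole
// string when 'ch' is absent.
std::string before_first(const std::string& text, char ch)
{
    const std::string::size_type pos = find_char(text, ch);

    if (pos == std::string::npos)   // the "no position" special value
        return text;

    assert(pos < text.size());      // a real position is always in range
    return text.substr(0, pos);
}
```

The caller has to remember to compare against npos; nothing in the type forces the check, which is the weakness the special-value approach always carries.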

If you’re less concerned about performance, or special values are a problem, you can happily adopt the use of the Swiss-army knife that is shared_ptr<T>, or the slightly more discoverable boost::optional<T> to handle the null-ness consistently as an attribute of the type:-

boost::shared_ptr<int> pos;
boost::optional<std::string> name;

if (name)
{
  std::string value = *name;
  . . .
}
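To show where an optional type earns its keep, here is a small sketch that tells "no string" apart from "empty string". It assumes a C++17 compiler and uses std::optional as a stand-in for boost::optional; the functions describe and checked_value are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: reports which of the three states a name is in.
std::string describe(const std::optional<std::string>& name)
{
    if (!name)               // no string at all
        return "<none>";

    if (name->empty())       // a string that happens to be empty
        return "<empty>";

    return *name;
}

// Hypothetical helper for callers that have already checked for a
// value; the precondition is asserted rather than silently tolerated.
std::string checked_value(const std::optional<std::string>& name)
{
    assert(name.has_value());
    return *name;
}
```

With a plain std::string the first two branches of describe collapse into one, which is exactly the distinction the special-value approach gives up.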

C# - Reference Type Heaven

At this point in my transition from C++ to C# I’m still firmly in the empty string camp. My introduction to reference-based, garbage-collected languages, where the String type is implemented as a reference type but has value-type semantics, does nothing to make the question any easier to answer. Throw the idea of “boxing values” into the equation (in C# at least) and about the only thing that you can say is that performance is probably not going to matter as much as you’re used to and so you should just let go. But you’ll want something a little less raw than this for primitive types:-

object pos = null;
string name = null;

if (pos != null)
{
  int value = (int)pos;
  . . .
}

If you’re using anything other than an archaic[$] version of C# then you have the generic type Nullable<T> at your disposal. This, with its syntactic sugar provided by the C# compiler, provides a very nice way of making all value types support null:-

int? pos = null;
string name = null;

if (pos.HasValue)
{
  int value = pos.Value;
  . . .
}

So that pretty much suggests we rejoin the “null reference” brigade. Surely that’s case closed, right?

Null References Are Evil

Passing null objects around is a nasty habit. It forces you to write null-checks everywhere and in nearly all cases the de-facto position of not passing nulls can be safely observed with the runtime picking up any unexpected violations. If you’re into ASSERTs and Code Contracts you can annotate your code to fail faster so that the source of the erroneous input is highlighted at the interface boundary rather than as some ugly NullReferenceException later. In those cases where you feel forcing someone to handle a null reference is inappropriate, or even impossible, you can always adopt the Null Object pattern to keep their code simple.

In a sense an empty string is the embodiment of the Null Object pattern. The very name of the static string method IsNullOrEmpty() suggests that no one is quite sure what they should do and so they’ll just cover all bases instead. One of the first extension methods I wrote was IsEmpty() because I wanted to show my successor that I was nailing my colours to the mast and not “just allowing nulls because I implemented using a method that happens to accept either”.
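The same stance can be sketched back in C++ terms, where a const char* may legitimately be null: convert once at the interface boundary and let everything past it deal only in values. This is an illustration under assumptions, not a C# equivalent of IsEmpty(); the names or_empty, greeting and require_text are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Boundary helper: a null pointer becomes the empty string, so code
// past this point never needs a null check.
std::string or_empty(const char* text)
{
    return (text != nullptr) ? std::string(text) : std::string();
}

// Internal code accepts a value, so "empty" is the only special case
// it has to handle.
std::string greeting(const std::string& name)
{
    if (name.empty())
        return "Hello";

    return "Hello, " + name;
}

// Alternative boundary policy: fail fast where the bad input arrives
// instead of tolerating it.
std::string require_text(const char* text)
{
    assert(text != nullptr);    // contract: callers must not pass null
    return std::string(text);
}
```

Which boundary helper to use is the whole question of the post in miniature: or_empty treats the empty string as the Null Object, while require_text refuses to let null in at all.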

Sorry, No Answers Here… Move Along

Sadly the only consistency I’ve managed to achieve so far is in and around the Data Access Layer where it seems sensible to map the notion of NULL in the database sense to a null reference. The Nullable<T> generic fits nicely in with that model too.

But in the rest of the codebase I find myself sticking to the empty string pattern. This is probably because most of the APIs I use that return string data also tend to return empty strings rather than null references. Is this just Defensive Programming in action? Do library vendors return empty strings in preference to null references because their clients have a habit of not checking properly? If so, perhaps my indecision is just making it harder for them to come to a definitive answer…


[*] Not that you would anyway when you have the scoped/shared_ptr<T> family of templates to do the heavy lifting and RAII runs through your veins.

[+] Short string optimisations notwithstanding.

[$] I know I should know better than to say that given there are people still knee deep in Visual C++ 6 (aka VC98)!

Wednesday 23 November 2011

I Know the Cost but Not the Value

[I wrote the vast majority of this post “the morning after the night before”. I hope that 9 months later I’ve managed to finish it in a more coherent manner than I started it…]

Back in February I made a rare appearance at the eXtreme Tuesday Club and got to talk to some of the movers and shakers in the Agile world. Professionally speaking I’ve mostly been on the outside of the shop looking in through the window wondering what all the commotion is inside. By attending XTC I have been hoping to try and piece together what I’ve seen and read about the whole “movement” and how it works in practice. As I documented in an earlier post “Refactoring – Do You Tell Your Boss?” I’ve long been struggling with the notion of Technical Debt and how I can help it get sold correctly to those who get to decide when we draw on it and when it’s paid back.

That night, after probably a few too many Staropramen, I finally got to enter into a discussion with Chris Matts on the subject and what I can’t decide is whether we were actually in agreement or, more likely, that I didn’t manage to explain correctly what I believe my relationship is with regards to identifying Cost and Value; in short, I can identify the former (Cost), but not the latter (Value). So maybe my understanding of what cost and value are is wrong and that’s why I didn’t quite get what he was saying (although I can’t discount the effects of the alcohol either). This post is therefore an opportunity for me to put out what I (and I’m sure some of my colleagues agree) perceive to be our place in the pecking order and the way this seems to work. I guess I’m hoping this will either provide food for thought, a safe place for other misguided individuals or a forum in which those in the know can educate the rest of us journeymen...

From the Trenches

The following example is based on a real problem, and at the time I tried to focus on what I perceived to be the value in the fixes so that the costs could be presented in a fair way and therefore an informed[*] choice would then be made.

The problem arose because the semantics of the data from an upstream system changed such that we were not processing as much data as we should. The problem was not immediately identified because the system was relatively new and so many upstream changes had been experienced that it wasn’t until the dust started to settle that the smaller issues were investigated fully.

The right thing to do would be to fix the root problem and lean on the test infrastructure to ensure no regressions occurred in the process. As always time was perceived to be a factor (along with a sprinkling of politics) and so a solution that involved virtually no existing code changes was also proposed (aka a workaround). From a functional point of view they both provide the same outcome, but the latter would clearly incur debt that would need to be repaid at some point in the future.

Cost vs Value

It’s conceivable that the latter could be pushed into production sooner because there is notionally less chance of another break occurring. In terms of development time there wasn’t much to choose between them and so the raw costs were pretty similar, but in my mind there was a significant difference in value.

Fixing the actual problem clearly has direct value to the business, but what is the difference in value between it being fixed tomorrow, whilst incurring a small amount of debt, and it being fixed in 3 days’ time with no additional debt? That is not something I can answer. I can explain that taking on the debt increases the risk of potential disruption to subsequent deliveries but only they are in a position to quantify what a slippage in schedule would cost them.

Perhaps I’m expected to be an expert in both the technology and the business? Good luck with that. It feels to me that I provide more value to my customer by trying to excel at being a developer so that I can provide more accurate estimates, which are very tangible, than trying to understand the business too in the hope of aiding them to quantify the more woolly notion of value. But that’s just the age old argument of Generalists vs Specialists isn’t it? Or maybe that’s the point I’m missing - that the act of trying to quantify the value has value in itself? If so, am I still the right person to be involved in doing that?

I’m clearly just starting out on the agile journey and have so much more to read, mull over and discuss. This week saw XP Day which would have been the perfect opportunity to further my understanding but I guess I’ll have to settle for smaller bites at the eXtreme Tuesday Club instead - if I could only just remember to go!


The workaround cited in the example above is finally going to be removed some 10 months later because it is hopelessly incompatible with another change going in. Although I can’t think of a single production issue caused by the mental disconnect this workaround created, I do know of a few important test runs the business explicitly requested that were spoilt because the workaround was never correctly invoked and so the test results were useless. Each one of these runs took a day to set up and execute and so the costs to the development team have definitely been higher. I wonder how the business costs have balanced out?


[*] Just writing the word “informed” makes me smile. In the world of security there is the Dancing Pigs problem which highlights human nature and our desire for “shiny things”. Why should I expect my customer to ever choose “have it done well” over “have it done tomorrow” when there are Dancing Pigs constantly on offer?

Tuesday 22 November 2011

Cookbook Style Programming

Changing a tyre on a car is reasonably simple. In fact if I wasn’t sure how to do it I could probably find a video on YouTube to help me work through it. Does that now make me a mechanic though? Clearly not. If I follow the right example I might even do it safely and have many happy hours of motoring whilst I get my tyre fixed. But, if I follow an overly simplistic tutorial I might not put the spare on properly and cause an imbalance, which, at best will cause unnecessary wear on the tyre and at worst an accident. Either way the time lapse between cause and effect could be considerable.

If you’re already thinking this is a rehash of my earlier post “The Dying Art of RTFM”, it’s not intended to be. In some respects it’s the middle ground between fumbling in the dark and conscious incompetence...

The Crowbar Effect

The StackOverflow web site has provided us with a means to search for answers to many reoccurring problems, and that’s A Good Thing. But is it also fostering a culture that believes the answer to every problem can be achieved just by stitching together bunches of answers? If the answer to your question always has a green tick and lots of votes then surely it’s backed by the gurus and so is AAA grade; what can possibly be wrong with advice like that?

while (!finished)
  var answer = Google_problem(); 

At what point can you not take an answer at face value and instead need to delve into it to understand what you’re really getting yourself into?

For example, say I need to do some database type stuff and although I’ve dabbled a bit and have a basic understanding of tables and queries I don’t have any experience with an ORM (Object Relational Mapper). From what I keep reading an ORM (alongside an IoC container) is a must-have these days for any “enterprise” sized project and so I’ll choose “Super Duper ORM” because it’s free and has some good press.

So I plumb it in and start developing features backed by lots of automated unit & integration tests. Life is good. Then we hit UAT and data volumes start to grow and things start creaking here and there. A quick bit of Googling suggests we’re probably pulling way too much data over the wire and should be using Lazy Loading instead. So we turn it on. Life is good again. But is it? Really?
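To make the trade-off behind that switch concrete, here is a minimal sketch of what lazy loading does under the covers. Everything in it is a hypothetical stand-in (Lazy, Order, load_orders and the round-trip counter); it does not model any real ORM’s API, only the general technique of deferring a query until first access.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lazily loaded association: the loader
// runs on first access only and the result is cached.
template<typename T>
class Lazy
{
public:
    explicit Lazy(std::function<T()> loader)
        : m_loader(std::move(loader)), m_loaded(false) {}

    const T& get()
    {
        if (!m_loaded)
        {
            assert(m_loader != nullptr);    // a loader must be supplied
            m_value = m_loader();           // the hidden trip to the database
            m_loaded = true;
        }
        return m_value;
    }

private:
    std::function<T()> m_loader;
    T m_value;
    bool m_loaded;
};

struct Order
{
    int id;
    Lazy<std::vector<std::string>> lines;
};

// Builds 'count' orders whose lines are fetched lazily; 'roundTrips'
// counts each simulated query so the cost is visible.
std::vector<Order> load_orders(int count, int& roundTrips)
{
    ++roundTrips;                   // one query for the orders themselves
    std::vector<Order> orders;
    for (int id = 1; id <= count; ++id)
    {
        orders.push_back(Order{id, Lazy<std::vector<std::string>>(
            [&roundTrips] {
                ++roundTrips;       // one more query per order touched
                return std::vector<std::string>{"line"};
            })});
    }
    return orders;
}

// Touches every order's lines, as a report over all orders would.
int count_lines(std::vector<Order>& orders)
{
    int total = 0;
    for (Order& order : orders)
        total += static_cast<int>(order.lines.get().size());
    return total;
}
```

Loading the orders costs one round trip, but a report that touches every order’s lines costs one more per order. Less data goes over the wire up front, and whether that is a win depends entirely on how much of it is eventually touched.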

Are You Qualified to Make That Decision?

Programmers are problem solvers and therefore it’s natural for them to want to see a problem through to the end, hopefully being the one to fix it on the way. But fixing the problem at hand should be done in a considered manner that weighs up any trade-offs; lest the proverbial “free-lunch” rears its ugly head. I’m not suggesting that you have to analyse every possible angle because quite often you can’t, in which case one of the trade-offs may well be the very fact that you cannot know what they are; but at least that should be a conscious decision. Where possible there should also be some record of that decision such as a comment in the code or configuration file, or maybe a task in the bug database to revisit the decision at a later date when more facts are available. If you hit a problem and spent any time investigating it you can be pretty sure that your successor will too; but at least you’ll have given them a head start[*].

One common source of irritation is time-outs. When a piece of code fails due to a timeout the natural answer seems to be to just bump the timeout up to some ridiculous value without regard to the consequences. If the code is cut-and-pasted from an example it may well have an INFINITE timeout which could make things really interesting further down the line. For some reason configuration changes do not appear to require the same degree of consideration as a code change and yet they can cause just as much head-scratching as you try to work out why one part of the system ended up dealing with an issue that should have surfaced elsewhere.
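A sketch of the alternative to an INFINITE timeout: a bounded wait whose limit carries a record of why it was chosen. The function wait_until, the constants and the 5 second figure are all hypothetical, picked purely for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Decision record (hypothetical values): the 5 s limit is a guess that
// has not been measured under load; revisit it once real figures exist.
// Deliberately not "infinite", so a stuck dependency surfaces here.
const std::chrono::milliseconds kDefaultTimeout(5000);
const std::chrono::milliseconds kPollInterval(10);

// Polls 'ready' until it returns true or 'timeout' elapses.
// Returns true on success and false on timeout.
bool wait_until(const std::function<bool()>& ready,
                std::chrono::milliseconds timeout = kDefaultTimeout)
{
    assert(timeout.count() >= 0);   // a negative limit is a caller bug

    const auto deadline = std::chrono::steady_clock::now() + timeout;

    while (!ready())
    {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;           // give up and let the caller decide

        std::this_thread::sleep_for(kPollInterval);
    }
    return true;
}
```

The point is less the loop than the comment above the constant: whoever next bumps the value can see that it was a guess and what it was protecting against.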

Lifting the Bonnet

I’ll freely admit that I’m the kind of person who likes to know what’s going on behind the scenes. If I’m going to buy into some 3rd party product I want to know exactly how much control I have over it because when, not if, a problem surfaces I’m going to need to fix it or work around it. And that is unlikely to be achievable without giving up something in return, even if it’s psychological[+] rather than financial.

However I accept that not everyone is like that and some prefer to learn only enough to get them through the current problem and onto the next one; junior programmers and part-time developers don’t know any better so they’re [partly] excused. In some environments where time-to-market is everything that attitude is probably even highly desirable, but that’s not the environment I’ve ended up working in.

It’s Just a Lack of Testing...

It’s true that the side-effects of many of the problems caused by this kind of mentality could probably be observed during system testing - but only so long as your system tests actively invoke some of the more exceptional behaviour and/or are able to generate the kinds of excessive loads you’re hoping to cope with. If you don’t have the kind of infrastructure available in DEV and/or UAT for serious performance testing, and let’s face it most bean counters would see it as the perfect excuse to eliminate “waste”, you need to put even more thought into what you’re doing - not less.


[*] To some people this is the very definition of job security. Personally I just try hard not to end up on The Daily WTF.

[+] No one likes working on a basket-case system where you don’t actually do any development because you spend your entire time doing support. I have a mental note of various systems my colleagues have assured me I should avoid working on at all costs...

Saturday 19 November 2011

PowerShell & .Net - Building Systems as Toolkits

I first came across COM back in the mid ‘90s when it was more commonly known by the moniker OLE2[*]. I was doing this in C which, along with the gazillion interfaces you had to implement for even a simple UI control, just made the whole thing hard. I had 3 attempts at reading Kraig Brockschmidt’s mighty tome Inside OLE 2 and even then I still completely missed the underlying simplicity of the Component Object Model itself! In fact it took me the better part of 10 years before I really started to appreciate what it was all about[+]. Unsurprisingly that mostly came about with a change in jobs where it wasn’t C++ and native code across the board.

COM & VBScript

Working in a bigger team with varied jobs and skillsets taught me to appreciate the different roles languages and technologies play. The particular system I was working on made heavy use of COM internally, purely to allow the VB based front-end and C++ based back-ends to communicate. Sadly the architecture seemed upside down in many parts of the native code and so instead of writing C++ that used STL algorithms against STL containers with COM providing a layer on top for interop, you ended up seeing manual ‘for’ loops that iterated over COM collections of COM objects that just wrapped the underlying C++ type. COM had somehow managed to reach into the heart of the system.

Buried inside that architecture though was a bunch of small components just waiting to be given another life - a life where the programmer interacting with it isn’t necessarily one of the main development team but perhaps a tester, administrator or support analyst trying to diagnose a problem. You would think that automated testing would be a given in such a “component rich” environment, but unfortunately no. The apparent free-lunch that is the ability to turn COM into DCOM meant the external services were woven tightly into the client code - breaking these dependencies was going to be hard[$].

One example of where componentisation via COM would have been beneficial was when I produced a simple tool to read the compressed file of trades (it was in a custom format) and filter it based on various criteria, such as trade ID, counterparty, TOP 10, etc. The tool was written in C++, just like the underlying file API, and so the only team members that could work on it were effectively C++ developers even though the changes were nearly always about changing the filtering options; due to its role as a testing/support tool any change requests would be well down the priority list.

What we needed was a component that maintains the abstraction - a container of trades - but could be exposed, via COM, so that the consumption of the container contents could just as easily be scripted; such as with VBScript which is a more familiar tool to technical non-development staff. By providing the non-developers with building blocks we could free up the developers to concentrate on the core functionality. Sadly, the additional cost of exposing that functionality via COM purely for non-production purposes is probably seen as too high, even if the indirect benefit may be an architecture that lends itself better to automated testing, which is a far more laudable cause.
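The shape of such a building block can be sketched in plain C++ before any COM layer goes on top. The Trade record, its fields and every function name below are hypothetical (the real file format is not described here); the idea being illustrated is only that callers supply criteria rather than writing loops.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical trade record, invented for illustration.
struct Trade
{
    int id;
    std::string counterparty;
    double notional;
};

typedef std::function<bool(const Trade&)> Filter;

// Building blocks: each returns a small predicate.
Filter by_counterparty(const std::string& name)
{
    return [name](const Trade& trade) { return trade.counterparty == name; };
}

Filter by_min_notional(double minimum)
{
    return [minimum](const Trade& trade) { return trade.notional >= minimum; };
}

// Combines two filters so that both must match.
Filter both(Filter first, Filter second)
{
    assert(first != nullptr && second != nullptr);
    return [first, second](const Trade& trade)
    {
        return first(trade) && second(trade);
    };
}

// The container abstraction: returns the trades matching 'filter'.
std::vector<Trade> select_trades(const std::vector<Trade>& trades,
                                 const Filter& filter)
{
    std::vector<Trade> matches;
    for (const Trade& trade : trades)
    {
        if (filter(trade))
            matches.push_back(trade);
    }
    return matches;
}
```

A new filtering option then becomes one more small predicate rather than a change to the tool itself, which is the part a scripting layer could take over.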

PowerShell & .Net

If you swap C#/.Net for C++/COM and PowerShell for VBScript in the example above you find a far more compelling prospect. The fact that .Net underpins PowerShell means that it has full access to every abstraction we write in C#, F#, etc. On my current project this has caused me to question the motives for even providing a C# based .exe stub because all it does is parse the command line and invoke a bootstrap method for the logging, configuration etc. All of this can be, and in fact has been, done from PowerShell in some of the tactical fixes that have been deployed.

The knock-on effect of automated testing right up from the unit, through integration to system level means that you naturally write small loosely-coupled components that can be invoked by a test runner. You then compose these components into ever bigger ones, but the invoking stub, whether a test harness or production process, rarely changes because all it does is invoke factories and stitch together a bunch of service objects before calling the “logical” equivalent of main.
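The stub-plus-logical-main arrangement can be sketched as follows. It is shown in C++ to stay with the earlier examples rather than C#, and the Logger interface, MemoryLogger, logical_main and run_stub are all hypothetical names, not anything from the project described.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical service interface; a test substitutes its own version.
class Logger
{
public:
    virtual ~Logger() {}
    virtual void write(const std::string& message) = 0;
};

// Keeps messages in memory; enough for a test harness.
class MemoryLogger : public Logger
{
public:
    virtual void write(const std::string& message) { messages.push_back(message); }
    std::vector<std::string> messages;
};

// The "logical" main: all the real behaviour, with its services passed
// in rather than created here. Returns a process exit code.
int logical_main(const std::vector<std::string>& args, Logger& logger)
{
    if (args.empty())
    {
        logger.write("usage: tool <input>");
        return 1;
    }

    logger.write("processing " + args[0]);
    return 0;
}

// What a stub does: turn argv into values and hand over to logical_main.
// A production 'main' would call this with a real logger.
int run_stub(int argc, const char* const argv[], Logger& logger)
{
    assert(argc >= 1 && argv != nullptr);   // argv[0] is the program name

    const std::vector<std::string> args(argv + 1, argv + argc);
    return logical_main(args, logger);
}
```

Because run_stub contains no behaviour of its own, a test harness, a production executable or a script host can each play its role without the logic underneath changing.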

Systems as Toolkits

Another way of building a system then is not so much as a bunch of discrete processes and services but more a toolkit from which you can stitch together the final behaviour in a more dynamic fashion. It is this new level of dynamism that excites me most because much of the work in the kind of batch processing systems I’ve worked on recently has focused on the ETL (Extract/Transform/Load) and reporting parts of the picture.

The systems have various data stores that are based around the database and file-system that need to be abstracted behind a facade to protect the internals. This is much harder in the file-system case because it’s too easy to go round the outside and manipulate files and folders directly. By making it easier to invoke the facade you remove the objections around not using it and so retain more control over the implementation. I see a common trend of moving from the raw file-system to NoSQL style stores that would always require any ad-hoc workarounds to be cleaned up. This approach provides you with a route to refactoring your way out of any technical debt because you can just replace blobs of script code with shiny new unit-tested components that are then slotted in place.

On the system I’m currently working with the majority of issues seem to have related to problems caused by upstream data. I plan to say more about this issue in a separate post, but it strikes me that the place where you need the utmost flexibility is in the validation and handling of external data. My current project manager likens this to “corrective optics” à la the Hubble Space Telescope. In an ideal world these issues would never arise or come out in testing, but the corporate world of software development is far from ideal.

Maybe I’m the one wearing rose tinted spectacles though. Visually I’m seeing a dirty great pipeline, much like a shell command line with lots of greps, sorts, seds & awks. The core developers are focused on delivering a robust toolkit whilst the ancillary developers & support staff plug them together to meet the daily needs of the business. There is clearly a fine line between the two worlds and refactoring is an absolute must if the system is to maintain firm foundations so that the temptation to continually build on the new layers of sand can be avoided. Sadly this is where politics steps in and I step out.


Although I’ve discussed using PowerShell on the scripting language side the same applies to any of the dynamic .Net based languages such as IronPython and IronRuby. As I wrote back in “Where’s the PowerShell/Python/IYFSLH*?” I have reservations about using the latter for production code because there doesn’t appear to be the long-term commitment and critical mass of developers to give you confidence that it’s a good bet for the next 10 years. Even PowerShell is a relative newcomer on the corporate stage and probably has more penetration into the sysadmin space than the development community which still makes it a tricky call.

The one I’m really keeping my eye on though is F#. It’s another language whose blog I’ve followed since its early days and I even went to a BCS talk by its inventor Don Syme back in 2009. It provides some clear advantages over PowerShell such as its async behaviour, and now that Microsoft has put its weight behind F# and shipped it as part of the Visual Studio suite you feel it has staying power. Sadly its functional nature may keep it out of those hands we’re most interested in freeing.

I’ve already done various bits of refactoring on my current system to make it more amenable for use within PowerShell and I intend to investigate using the language to replace some of the system-level test consoles which are nothing but glue anyway. What I suspect will be the driver for a move to a more hybrid model will be the need to automate further system-level tests, particularly of the regression variety. The experience gained here can then feed back into the main system development cycle to act as living examples.


[*] In some literature it stood for something - Object Linking & Embedding, and in others it just was the letters OLE. What was the final outcome, acronym or word?

[+] Of course by then .Net had started to take over the world and [D]COM was in decline. This always happens. Just as I really start to get a grip on a technology and finally begin to understand it the rest of the world has moved on to bigger and better things…

[$] One of the final emails I wrote (although some would call it a diatribe) was how the way COM had been used within the system was the biggest barrier to moving to a modern test driven development process. Unit testing was certainly going to be impossible until the business logic was separated from the interop code. Yes, you can automate tests involving COM, but why force your developers to be problem domain experts and COM experts? Especially when the pool of talent for this technology is shrinking fast.

Tuesday 15 November 2011

Merging Visual Studio Setup Projects - At My Wix End

When the Windows Installer first appeared around a decade ago (or even longer?) there was very little tooling around. Microsoft did the usual thing and added some simple support to Visual Studio for it (the .vdproj project type); presumably keeping it simple for fear of raising the ire of another established bunch of software vendors - the InstallShield crowd. In the intervening years it appears to have gained a few extra features, but nothing outstanding. As a server-side guy who has very modest requirements I probably wouldn’t notice where the bells and whistles have been added anyway. All I know is that none of the issues I have come across since day one have ever gone away...

Detected Dependencies Only Acknowledges .DLLs

In a modern project where you have a few .exes and a whole raft of .dlls it’s useful that you only have to add the .exes and Visual Studio will find and add all those .dll dependencies for you. But who doesn’t ship the .pdb files as well*? This means half the files are added as dependencies and the other half I have to go and add manually. In fact I find it easier to just exclude all the detected dependencies and just manually add both the .dll and .pdb; at least then I can see them listed together as a pair in the UI and know I haven’t forgotten anything.

Debug & Release Builds

In the native world there has long been a tradition of having at least two build types - one with debug code in and one optimised for performance. The first is only shipped to testers or may be used by a customer to help diagnose a problem, whereas the latter is what you normally ship to customers on release. The debug build is not just the .exe, but also the entire chain of dependencies as there may be debug/release specific link-time requirements. But the Setup Project doesn’t understand this unless you make it part of your solution and tie it right into your build. Even then any third party dependencies you manually add cause much gnashing of teeth as you try and make the “exclude” flag and any subsequent detected dependencies play nicely together with the build specific settings.

Merging .vdproj Files

But by far my biggest gripe has always been how merge-unfriendly .vdproj files are. On the face of it, being a text file, you would expect it to be easy to merge changes after branching. That would be so if Visual Studio stopped re-generating the component GUIDs for apparently unknown reasons, or kept the order of things in the file consistent. Even with “move block detection” enabled in WinMerge often all you can see is a sea of moved blocks. One common (and very understandable) mistake developers make is to open the project without the binaries built and add a new file which can cause all the detected dependencies to go AWOL. None of this seems logical and yet time and time again I find myself manually merging the changes back to trunk because the check-in history is impenetrable. Thank goodness we put effort into our check-in comments.

WiX to the Rescue

WiX itself has been around for many years now and I’ve been keen to try it out and see if it allows me to solve the common problems listed above. Once again for expediency my current project started out with a VS Setup Project, but this time with a definite eye on trying out WiX the moment we got some breathing space or the merging just got too painful. That moment finally arrived and I’m glad we switched; I’m just gutted that I didn’t do the research earlier because it’s an absolute doddle to use! For server-side use where you’re just plonking a bunch of files into a folder and adding a few registry keys it almost couldn’t be easier:-

<?xml version="1.0"?>
<Wix xmlns="">
  <Product Id="12345678-1234-. . ."
           Manufacturer="Chris Oldwood"
           UpgradeCode="87654321-4321-. . .">

    <Package Compressed="yes"/>

    <Media Id="1" Cabinet=""/>

    <Directory Name="SourceDir" Id="TARGETDIR">
      <Directory Name="ProgramFilesFolder">
        <Directory Name="Chris Oldwood" Id="_1">
          <Directory Name="vdproj2wix" Id="_2">
            <Component Id="_1" Guid="12341234-. . .">
              <File Source="vdproj2wix.ps1"/>
              <File Source="vdproj2wix.html"/>
            </Component>
          </Directory>
        </Directory>
      </Directory>
    </Directory>

    <Feature Id="_1" Level="1">
      <ComponentRef Id="_1"/>
    </Feature>
  </Product>
</Wix>


The format of the .wxs file is beautifully succinct, easy to diff and merge, and yet it supports many advanced features such as #includes and variables (for injecting build numbers and controlling build types). I’ve only been using it recently and so can’t say what versions 1 & 2 were like but I can’t believe it’s changed that radically. Either way I reckon they’ve got it right now.

Turning a .wxs file into an .msi couldn’t be simpler either which also makes it trivial to integrate into your build process:-

candle vdproj2wix.wxs
if errorlevel 1 exit /b 1

light vdproj2wix.wixobj
if errorlevel 1 exit /b 1

My only gripe (and it is very minor) is with the tool naming. Yes it’s called WiX and so calling the (c)ompiler Candle and the (l)inker Light is cute the first few times but now the add-ons feel the need to carry on the joke which just makes you go WTF? instead.

So does it fix my earlier complaints? Well, so far, yes. I’m doing no more manual shenanigans than I used to and I have something that diffs & merges very nicely. It’s also trivial to inject the build number compared with the ugly VBScript hack I’ve used in the past to modify the actual .msi because the VS way only supports a 3-part version number.

Converting Existing .vdproj Files to .wxs Files

Although as I said earlier the requirements of my current project were very modest I decided to see if I could write a simple script to transform the essence (i.e. the GUIDs & file list) of our .vdproj files into a boiler-plate .wxs file. Unsurprisingly this turned out to be pretty simple and so I thought I would put together a clean-room version on my web site for others to use in future - vdproj2wix. Naturally I learnt a lot more about the .wxs file format in the process** and it also gave me another excuse to learn more about PowerShell.
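For a feel of the output half of such a transformation, here is a small sketch that turns a list of file paths into <File> elements carrying only the Source attribute. It is not the vdproj2wix script (which is PowerShell) and it does not parse .vdproj files; the functions escape_attribute and file_elements are hypothetical and written in C++ only to match the other examples.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Escapes the XML special characters so the text is safe inside a
// double-quoted attribute value.
std::string escape_attribute(const std::string& text)
{
    std::string escaped;
    for (char ch : text)
    {
        switch (ch)
        {
            case '&':  escaped += "&amp;";  break;
            case '<':  escaped += "&lt;";   break;
            case '>':  escaped += "&gt;";   break;
            case '"':  escaped += "&quot;"; break;
            default:   escaped += ch;       break;
        }
    }
    return escaped;
}

// Emits one <File> element per path, each on its own line and prefixed
// with 'indent', using only the Source attribute.
std::string file_elements(const std::vector<std::string>& paths,
                          const std::string& indent)
{
    std::string xml;
    for (const std::string& path : paths)
    {
        assert(!path.empty());      // an empty Source would be meaningless
        xml += indent + "<File Source=\"" + escape_attribute(path) + "\"/>\n";
    }
    return xml;
}
```

The rest of the boiler-plate (Product, Directory, Component and Feature elements plus the GUIDs carried over from the .vdproj file) would wrap around this fragment.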

If you’re now thinking that this entire post was just an excuse for a shameless plug of a tiny inconsequential script you’d be right - sort of. I also got to vent some long standing anger too which is a bonus.

So long .vdproj file, I can’t say I’m going to miss you...

* OK, in a .Net project there is less need to ship the .pdbs but in a native application they (or some other debug equivalent such as .dbg files) are pretty essential if you ever need to deal with a production issue.

** Curiously the examples in the tutorial that is linked to from the WiX web site have <File> elements with many redundant attributes. You actually only need the Source attribute as a bare minimum.