Wednesday, 19 June 2013

A Tail of Two Lists

On the one hand I like to celebrate the fact that I come from an engineering background (I did a degree in Electronic System Engineering) and prefer to put an emphasis on the applied rather than the theoretical. But as I get older and more competent in the practical side I am becoming more curious about the theoretical side of programming so I can appreciate more about where so much of it comes from. That’s another post in itself though, whereas this one is specially about how that lack of Computer Science knowledge hindered me from properly understanding a fundamental construct - lists.

Linked Lists

Like many programmers I learnt C as my first high-level language (if you ignore the introductions to Pascal at A Level and Modula 2 in University) where the only built-in container type is the array. If you wanted something more dynamic, or you were “learning programming”, then the first structure you’re probably shown is the list; or as it’s often commonly known, the linked list. In fact a singly-linked list is often glossed over because it’s generally more awkward to use and so a doubly-linked list becomes the default (implementation of) choice. What could be simpler than this:-

typedef struct Item
{
    struct Item* m_next;
    struct Item* m_prev;
    const char*  m_data;
} Item;

typedef struct StringList
{
    struct Item* m_head;
    struct Item* m_tail;
} StringList;

Convert StringList to a class and throw templates into the mix and you’ve got yourself the (somewhat naive) C++ equivalent. This kind of list also serves very nicely as the implementation for a queue (FIFO) or stack (LIFO), with items being added and removed at the head and/or tail depending on what container you’re aiming for. What is important to realise for this post is that the head and tail only refer to a single item, e.g. in ASCII art something like this:-

A <-> B <-> C <-> D
^                 ^
|                 |
head           tail

If the first and last elements are the head and tail respectively, then what is the bit in the middle called - the body?

Type Lists

Fast forward a few years and I’ve been “doing” C++ for a while now. Naturally I’ve stopped using custom list classes and am instead relying on the library implementations, most notably the one that come with MFC [1]. Of course being the journeyman that I (hope I) am I discover this really amazing book called Modern C++ Design by Andrei Alexandrescu which describes these weird new concepts, like “template meta-programming” that relies on heavy template programming voodoo. At this point I’m still only just about getting my head around the “iterator model” and the standard containers. Oh yeah, and I’m on v5 or v6 of Visual C++ which barely seems to understand any more about Standard C++ than I do, so nothing in the book compiles.

In chapter 3 Andrei introduces a concept called Type Lists, which is a precursor to implementing some more “computer science-y” type stuff called Tuples. Try as I might I just can’t seem to understand what’s going on. I originally had 3 attempts at getting my head around the way he manipulates these “type lists” and each time I understood how he splits off the head and the tail but I just can’t see where the rest (the body?) of the list goes. I found myself getting really frustrated because I mentally stepped through his template code again and again and again, but I just couldn’t see why he didn’t end up with a head, body and a tail! But I also knew that wasn’t right either because the definition of a type list is trivial:-

<template typename H, typename T>
struct TypeList
{
    H head;
    T tail;
}

Of course there were a few other minor details that added slightly to the noise, like the last element in the list being the NullType that he introduced in an earlier chapter, but essentially a type list laid out long-hand would be like you might expect:-

typedef MyList TypeList< Widget, TypeList< int, TypeList< char, NullType> > >;

So, where was I going wrong?

Mental Models

I can’t remember exactly what it was that finally caused the light bulb to go on, I think it was whilst starting to play with Python and trying to understand the differences between lists and tuples. Anyway, what surprised me was that the documentation I was reading seemed to be suggesting that when you slice a list you get a head, which contains one or more items, and a tail which contains the rest of the list. Eh? So, the tail is not a single item but everything except the head! My mental model was more like that of a mammal that has a head, body and tail, when I should have been thinking of something more like a snake [2].

When you’re working on an architecture like DOS or 16-bit Windows where your data and stack share a measly 64K, you quickly learnt that recursion was something to be avoided at all costs. Consequently the only model for traversing arrays, lists, etc. is iteratively, from head to tail - one item at a time. Conversely the “functional programming” model [3] suggests a more recursive way of working where you effectively keep picking the head off the list at each step until the tail becomes empty.

As our computing resources continue to lose such performance constraints and our tools get smarter we have to remember that what we have been working with so far is sometimes just a pale imitation of a bigger concept brought about by the usual engineering trade-offs.

 

[1] When you spend all day writing MFC based applications it’s easy to forget that the default container type is std::vector<> and you end up using CArray<> instead. There was enough of an impedance mismatch between the STL and MFC containers that meant the path of least resistance was simply to just stick with the MFC implementations. Sadly that also meant the beauty of the container/iterator/algorithm model went unnoticed by me for many years.

[2] I’m sure someone well versed in reptile physiology will no doubt point out that snakes are more than a head and tail, pardon my ignorance.

[3] I’m sure I’m going to get some flack for abusing such a term and/or over simplifying it, but like many others I’m only just starting out on the road to understanding more about the functional and declarative paradigms.

1 comment:

  1. Perhaps more important is *why* the tail is "the rest of the items". As you say this promotes a recursive approach to manipulating lists, although you can flatten it (or write a tail-recursive function - although subtly different tail there).

    Assuming immutable structures (as would be typical in a functional context) how do you construct a list in the first place? When you realise that each list is composed of a head and *another list* it's easy to see how a singly-linked list does not mutate the intermediate state but always adds to it.

    However the gains come when you start to share state. If list A has ten items and list B has eleven items, but the last nine are common, then there are only twelve items in total. That can be quite useful in a 64k system ;-)

    It gets a little more complex when you want to insert items in the middle (body?) of the list but the same principles apply (the shared tail is shorter and some items are duplicated - potentially much less memory efficient).

    This is not the only way the portions of immutable state can be shared and the whole area of persistent data types is quite fascinating - and we're using them (especially persistent maps and sets) to great effect on a large C++ project at the moment.

    ReplyDelete