Sunday, December 20, 2009

Reducing a tree

One of the basic functions that pops up in functional programming is reduce (also known as fold). It takes a sequence of items and applies some binary operation to them in turn, combining them into a single final result. So if you start with a list of numbers, say:

[1, 2, 3, 4]

and fold it with an addition operator (+), you get:

((1+2)+3)+4
= 10

Addition is associative, so rearranging the parentheses to start grouping from the right doesn't change the answer.

1+(2+(3+4))
= 10

The result is the same, but the performance may not be. In C#, like most languages, the first version will run in constant stack space and probably be faster; the second will take stack space proportional to the list length, and probably be slower by some constant factor.
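The two groupings are easy to see in a quick sketch (Python here just to keep the illustration compact; the function names are made up):

```python
def foldl(f, xs):
    # Left fold: a simple loop, so constant stack space.
    it = iter(xs)
    acc = next(it)
    for x in it:
        acc = f(acc, x)
    return acc

def foldr(f, xs):
    # Right fold: recursion depth grows with the length of the list.
    if len(xs) == 1:
        return xs[0]
    return f(xs[0], foldr(f, xs[1:]))

print(foldl(lambda a, b: a + b, [1, 2, 3, 4]))  # ((1+2)+3)+4 = 10
print(foldr(lambda a, b: a + b, [1, 2, 3, 4]))  # 1+(2+(3+4)) = 10
```

With a non-associative operator like subtraction the two groupings give different answers, which makes the association order easy to observe.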

The .NET framework provides the first version in the System.Linq.Enumerable class, as a family of extension methods named "Aggregate":

var xs = new List<int> { 1, 2, 3, 4 };

int i = xs.Aggregate( (a,b) => a+b );

Console.WriteLine(i);

-> 10

Enumerable.Aggregate works with any type and binary function. You can concatenate strings this way:

string[] ss = new string[] { "foo", "bar", "baz" };

string s = ss.Aggregate( (a,b) => a+"-"+b );

Console.WriteLine(s);

-> foo-bar-baz

Or compute a factorial:

var bigs = Enumerable.Range(1,111111).Select(i => new BigInteger(i));
var fact = bigs.Aggregate( (a,b) => a*b );

Console.WriteLine(fact);

-> 1380602974...<500000 digits>...0000000000

Handy. But already we have a performance problem. If the list of strings to concatenate or numbers to multiply is very long, it takes a long time to compute the answer. The factorial example takes about one minute to run on my laptop.

Why is it slow? In both cases, it's because each time we append two strings or multiply two BigIntegers, the result is about as big as both operands together. By the time we're halfway through the list, we're multiplying a 250000-digit number at every step.

What if we reordered the operations a little bit? Instead of this:

((((((1*2)*3)*4)*5)*6)*7)*8

We could do something like this:

((1*2)*(3*4)) * ((5*6)*(7*8))

We'd still get the same answer, but we would avoid touching any 250000-digit numbers until the very last step, when we combine two of them to create the 500000-digit result. Those two each came from a pair of ~125000-digit numbers, and so on down the tree. So the total amount of work would be much less.
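Here's a back-of-envelope way to see how much the tree order can save. Suppose, as with string concatenation, that combining two pieces of sizes p and q costs about p + q units of copying. (This is a made-up cost model for illustration, not a benchmark.) A left fold re-copies the growing accumulator at every step, while the tree copies each element only once per level:

```python
def left_fold_cost(sizes, combine_cost):
    # Total cost of combining left-to-right: the accumulator grows
    # and gets re-processed at every step.
    total, acc = 0, sizes[0]
    for s in sizes[1:]:
        total += combine_cost(acc, s)
        acc += s
    return total

def tree_cost(sizes, combine_cost):
    # Total cost of pairwise tree combination: each level halves
    # the number of pieces.
    total, level = 0, list(sizes)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            total += combine_cost(level[i], level[i + 1])
            nxt.append(level[i] + level[i + 1])
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return total

copy = lambda p, q: p + q   # linear combine, like string concatenation
n = 1 << 14
print(left_fold_cost([1] * n, copy))  # ~n^2/2: 134225919
print(tree_cost([1] * n, copy))       # ~n*log2(n): 229376
```

Under this linear-cost model the left fold does quadratic work while the tree does n·log n, which is roughly the gap between the three-minute and 55-millisecond string concatenations below. (BigInteger multiplication doesn't have a linear combine cost, which may help explain why the factorial speedup is more modest.)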

It seems like a function to handle this case generically would be useful. It would have the exact same signature as Enumerable.Aggregate, and produce the same result.1 But instead of grouping the operations left-to-right or right-to-left, it would group the elements in a binary tree.

How can we implement such a function? Two approaches spring to mind. You could work from the bottom up, pairing off elements in multiple passes. Or, you could do a single pass, building a single tree that grows larger and larger as you work through the sequence. I opted for the latter:

/// <summary>
/// Reduce a sequence of T's to a single T, using the given reducer function.
/// This is the same as Enumerable.Aggregate(seq, func) except that it
/// associates the operations in tree order.
///
/// So:
/// Aggregate([1..8], (+)) -> ((((((1+2)+3)+4)+5)+6)+7)+8
///
/// TreeAggregate([1..8], (+)) -> ((1+2)+(3+4)) + ((5+6)+(7+8))
/// </summary>
public static T TreeAggregate<T>(this IEnumerable<T> seq, Func<T, T, T> func)
{
    using (IEnumerator<T> cursor = seq.GetEnumerator())
    {
        // Make sure there's at least one element
        if (!cursor.MoveNext())
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }

        T val = cursor.Current;
        int depth = 1;
        bool more = true;
        while (more)
        {
            val = TreeAggregatePartial(cursor, val, depth, func, ref more);
            depth++;
        }
        return val;
    }
}

// Pull more items off the sequence to increase the depth of the given
// initial tree by one.
private static T TreeAggregatePartial<T>(IEnumerator<T> cursor, T lhs, int depth, Func<T, T, T> reducer, ref bool more)
{
    more = cursor.MoveNext();
    if (more)
    {
        T rhs = cursor.Current;
        for (int i = 1; i < depth && more; ++i)
        {
            rhs = TreeAggregatePartial(cursor, rhs, i, reducer, ref more);
        }
        lhs = reducer(lhs, rhs);
    }
    return lhs;
}

It's a little ugly, because it has to handle empty sequences, and because we don't know in advance how big a tree to create, so we have to sort of build it up from the lower left corner. But now that it's written, it's as easy to use as the regular Aggregate function:

var bigs = Enumerable.Range(1,111111).Select(i => new BigInteger(i)).ToArray();

var fact2 = bigs.TreeAggregate( (a,b) => a*b );

Console.WriteLine(fact2);

Same answer, but now the computation takes only 20 seconds. This is better, but still doesn't seem like that much of an improvement. It should take just a few milliseconds. It turns out the culprit here is .NET's new BigInteger class, which doesn't multiply large numbers as quickly as it could. Let's try another example.

var strings = Enumerable.Range(1,100000).Select(i => i.ToString());
strings.Aggregate( (s1,s2) => s1+s2 );

That concatenates the string representation of the first 100,000 integers, producing the string "123456789101112131415...9999899999100000". It takes over three minutes on my laptop. Let's try TreeAggregate:

strings.TreeAggregate( (s1,s2) => s1+s2 );

Same answer in 55 milliseconds. Much better!

Of course, in this case I could have used good old StringBuilder, which is custom-made for such problems and takes only 14 milliseconds.
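For comparison, the other approach I mentioned, working from the bottom up and pairing off elements in multiple passes, is easy to sketch (in Python for brevity; note that unlike the single-pass version above, it materializes the whole sequence in memory):

```python
def tree_aggregate(items, func):
    # Bottom-up tree reduction: repeatedly combine adjacent pairs
    # until a single value remains.
    level = list(items)
    if not level:
        raise ValueError("sequence contains no elements")
    while len(level) > 1:
        nxt = [func(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # a leftover element carries up a level
            nxt.append(level[-1])
        level = nxt
    return level[0]

print(tree_aggregate("12345678", lambda a, b: a + "-" + b))
# -> 1-2-3-4-5-6-7-8
```

Carrying a leftover element straight up to the next level leaves the tree slightly unbalanced, but preserves the left-to-right order of the operands.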

So, in conclusion, it might have been better to come up with a compelling example before writing a blog post. Oh well.


  1. Provided that the reducer function is associative.

Wednesday, September 23, 2009

Ruminations on programming languages

For a long while I was reasonably content with my main programming languages--mostly C++ and a bit of Python. They're both great languages (although C++ is also a terrible language, of course). I've used a lot of different ones, and they all have their strengths and flaws. But none of them is perfect.

So as a side project, I attempted many times to design my idea of the ultimate programming language. It would combine the power of Lisp, the readability and whippitupitude of Python, and the performance of C with advanced static type safety and formally provable correctness. And a pony.

Somewhere along the way, though, I was temporarily struck by a bit of humility. I thought I should learn more about the state of the art in other programming languages, rather than risk reinventing the square wheel. So I learned about Oz and Erlang and Haskell and SML. And I discovered that, basically, the state of the art in programming languages is pretty damned good. So good, in fact, that my attempts to advance it started to look pretty lame.

But now I had a new problem: having spent some time using Erlang and Haskell for side projects, I couldn't stand programming in C++ any more. There were just too many things that were painful and verbose that would have been trivial in Haskell. But Haskell, beautiful though it is, still makes some things difficult that are easy in a stateful language. I wasn't totally happy there either. Python was still nice, but for larger projects I much prefer static typing. Basically no matter what language I was using, I was frustrated by things that should have been trivial, and would have been trivial in some other language.

This sorry state persisted for a few years, and it made programming at work not all that enjoyable any more. But lately I've been primarily using C#, and I've found to my surprise that it's actually making programming fun again. Yes, C# started life as a Java clone, but then it got a really good generics design and iterators in 2.0, and a little bit of type inference and nice lambda-function syntax in 3.0. Throw in the Enumerable extensions in the libraries, and you start to have pretty nice support for functional programming. And while it's not Haskell, generators and enumerators give you a little taste of that crazy control-flow power without going so far as to actually turn your brain inside out. It's quite nice.

In another post I'll explore some functional-style code in C#.

Sunday, December 7, 2008

Show me your ISO

If you've ever gone shopping for digital cameras, you'll know that the resolution of the image sensors keeps going up, and keeps being featured prominently on the packaging. Way back in the early days, this was good: you need at least one or two megapixels if you want to print photos at large sizes, and if you do any cropping before printing you'll lose a bit of resolution there. So the jump up from sub-megapixel resolution was a very good thing.

But we've long since passed the point where any more resolution is useful. Take a look at the PowerShot G10 (picked more or less at random). It takes a 14.7 megapixel picture (4416 x 3312 pixels). You could take a shot of a crowd, zoom in to one person's face, blow it up to poster size and it would still look fine. Except that it wouldn't, because by now you're probably past the resolving power of the optics anyway. Enough with the megapixels already!

I know this isn't a new complaint. Nearly every review at dpreview.com complains about the manufacturers increasing resolution with every new camera, often at the expense of other aspects of image quality. The problem is that camera makers need a big number to put on the box to make the new camera seem better than the last one, so even if the extra resolution is useless for photography, it's good for marketing.

So here's a suggestion for the camera marketers: Put the light sensitivity on the box. In big print. Bigger print than the resolution. It's easy to measure, it gives you a nice big number to stick on there, and it would actually be useful information.

See, when I go shopping for a camera, that's the one thing I want to know: how good is it for low-light shots? I'm not a serious photographer; I just want to take pictures of my kids and vacation spots and things like that. But I'm tired of having to use the flash in anything less than full daylight. I can guarantee that if I could somehow find out which cameras were better in low-light conditions, it would influence my purchasing. But I can't, so it doesn't.

Thursday, December 4, 2008

Random things that annoy me

I've been putting off this post for ages, because even though it's a straightforward problem, I find it difficult to explain clearly. But I guess I'm never going to get the hang of that unless I start posting once in a while. So here goes.

The modulus operation in C, C++, C#, F#, Java, and a host of other programming languages is broken and stupid.

Here's how modular arithmetic works: you do some integer operation mod N, and the result stays in the range [0, N). Where it would normally fall outside that range, it just "wraps around" to the other side. Simple.

0 + 1 (mod 3) = 1
1 + 1 (mod 3) = 2
2 + 1 (mod 3) = 0
and so on...

2 - 1 (mod 3) = 1
1 - 1 (mod 3) = 0
0 - 1 (mod 3) = 2

Now let's translate that into C:

(0 + 1) % 3 == 1
(1 + 1) % 3 == 2
(2 + 1) % 3 == 0
looks good so far...

(2 - 1) % 3 == 1
(1 - 1) % 3 == 0
(0 - 1) % 3 == -1
what the heck?

What's happening is that in C, (a % b) takes the sign of a, rather than the sign of b. There are historical reasons for C to act this way. For all the other languages, I think it falls more under "lack of thought." Or at least, "lack of giving a damn about your programming language."

The reason it matters is that we almost always want to constrain the result to a certain range, just like the mathematical modulus does. For example, suppose we want to add an offset to an array index:


// no good! offset might be negative
i = (i + offset) % array.size;

// here's what you have to use instead:
i = ((i + offset) % array.size + array.size) % array.size;

// or this:
i = (i + offset) % array.size;
if (i < 0) i += array.size;

The same problem shows up if you're manipulating days of the week, or angles that you want to constrain to a circle, or any other number you want to constrain to a certain range.

Now, one might argue that it's a tradeoff: sometimes you want one behavior, sometimes you want the other, and the language designer has to pick one. Except that in over twenty years of programming, and hundreds of places where I've seen or written the modulus operator, I haven't yet encountered one case where the C behavior simplified the code. Sometimes it makes the code more ugly and complex and slow, and sometimes it doesn't matter one way or the other; it's never actually better.

For C this behavior was forgivable, because it was just doing a direct mapping to the native division/modulus operation of whatever the underlying platform was, and in hardware it's easier to implement that way. But for all the later languages, boo hiss.

Just for the record, Python gets it right:


>>> (-1) % 3
2


Hooray!
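For instance, the day-of-week case mentioned above just works, with no special-casing for negative offsets (the names here are made up for the example):

```python
# 0 = Sunday ... 6 = Saturday
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def shift_day(day, offset):
    # Python's % always lands in [0, 7), even for negative offsets.
    return (day + offset) % 7

print(days[shift_day(0, -1)])   # Sat: one day before Sunday
print(days[shift_day(5, 10)])   # Mon: ten days after Friday
```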

And Haskell gives you both the useful version and the stupid one:

Prelude> (-1) `rem` 3
-1
Prelude> (-1) `mod` 3
2

Also, IEEE 754 (float) arithmetic can give you either behavior, depending in a sensible way on the rounding mode. Unfortunately most languages go to some pain to hide this, and make sure the fmod() function jumps through extra hoops to always return a stupid result.

Here's another way to look at it. Plotted, the modulus function looks like this:
[plot of (a mod 10) for a in [-30, 30]]
Nothing much to see, just little diagonals over and over to infinity. Here's the C mod function:
[plot of (a mod 10) for a in [-30, 30], with stupid mod]
Little diagonals repeated to infinity again, except... an arbitrary change at the origin. Why? Just to cause pain.

And that's all.

Friday, August 29, 2008

Pause this!

Try this: start watching video with the youtube video player (or other streaming video player) over a moderately slow internet connection—slow enough that the video playback goes faster than the download.

The video playback will of course catch up to the buffered data every few seconds, at which point there's no more data and playback has to pause. There are at least a couple of reasonable options for the player UI at this point:
  1. Pause the video, just as if I'd hit the 'pause' button myself. Leave it paused until I hit play again.
  2. If you're not going to leave it paused, then don't change the damned play button to claim that you are. Leave the button alone, and leave it active so I can press it to pause the video for real.
But the youtube player doesn't go for either of those reasonable choices. Instead, it appears to pause the movie, changing the button to the 'paused' state, but doesn't really pause it; when more data comes in, playback resumes. Of course at this point, as a user, what I inevitably want to do is pause it and go do something else until the whole movie comes in so I can just watch it without interruption. But this is now impossible, because the button claims to be paused, so I can't pause it; but it's waiting for data, so I can't unpause it; all I can do is wait until it starts on its own, and then use my ninja-like reflexes to try to pause it myself before it runs out of data and fake-pauses again. This game is not fun.

The pause issue would at least be ameliorated if I could just drag the scrubber back a few seconds, to get a bit of playback time with which to battle the UI. But that's broken too: if I try that, odds are excellent that the playback will freeze for some mysterious network hijinks and I'll have to start all over. Why? The data should already be there. Just play it. Why on earth should that have any effect on the download process?

And finally: why is it that sometimes when the scrubber shows that the movie is buffered ahead of the current play position, it won't play until it buffers even farther? The whole point of the buffer indicator is to show me how many seconds of playback have been downloaded. If they aren't available, then don't show them.

These are rhetorical questions, by the way. There is no good reason for any of these behaviors.

(I pick on youtube only because they're popular. Most, though not quite all, other video players have the same problems.)