The modulus operation in C, C++, C#, F#, Java, and a host of other programming languages is broken and stupid.
Here's how modulus mathematics works: you do some integer operation mod N, and the result stays in range [0,N). Where it would normally fall outside that range, it just "wraps around" to the other side. Simple.
0 + 1 (mod 3) = 1
1 + 1 (mod 3) = 2
2 + 1 (mod 3) = 0
and so on...
2 - 1 (mod 3) = 1
1 - 1 (mod 3) = 0
0 - 1 (mod 3) = 2
Now let's translate that into C:
(0 + 1) % 3 == 1
(1 + 1) % 3 == 2
(2 + 1) % 3 == 0
looks good so far...
(2 - 1) % 3 == 1
(1 - 1) % 3 == 0
(0 - 1) % 3 == -1
what the heck?

What's happening is that in C, (a % b) takes the sign of a, rather than the sign of b. There are historical reasons for C to act this way. For all the other languages, I think it falls more under "lack of thought." Or at least, "lack of giving a damn about your programming language."

The reason it matters is that we almost always want to constrain the result to a certain range, just like the mathematical modulus does. For example, suppose we want to add an offset to an array index:
// no good! offset might be negative (n is the array length)
i = (i + offset) % n;
// here's what you have to use instead:
i = ((i + offset) % n + n) % n;
// or this:
i = (i + offset) % n;
if (i < 0) i += n;
The same problem shows up if you're manipulating days of the week, or angles that you want to constrain to a circle, or any other number you want to constrain to a certain range.
Now, one might argue that it's a tradeoff: sometimes you want one behavior, sometimes you want the other, and the language designer has to pick one. Except that in over twenty years of programming, and hundreds of places I've seen or written the modulus operator, I haven't yet encountered one case where the C behavior simplified the code. Sometimes it makes the code more ugly and complex and slow, sometimes it doesn't matter one way or the other; it's never actually better.
For C this behavior was forgivable, because it was just doing a direct mapping to the native division/modulus operation of whatever the underlying platform was, and in hardware it's easier to implement that way. But for all the later languages, boo hiss.
Just for the record, Python gets it right:
>>> (-1) % 3
2
Hooray!
And Haskell gives you both the useful version and the stupid one:
Prelude> (-1) `rem` 3
-1
Prelude> (-1) `mod` 3
2
Also, IEEE 754 (float) arithmetic can give you either behavior, depending in a sensible way on the rounding mode. Unfortunately most languages go to some pain to hide this, and make sure the fmod() function jumps through extra hoops to always return a stupid result.
Here's another way to look at it. Plotted, the modulus function looks like this:
![plot of (a mod 10) for a in [-30,30]](http://2.bp.blogspot.com/_B1qW_llHr0E/STjRYPkXfHI/AAAAAAAAAAU/VWt1tWtjJW0/s400/mod_good.png)
Nothing much to see, just little diagonals over and over to infinity. Here's the C mod function:
![plot of (a mod 10) for a in [-30,30] with stupid mod](http://2.bp.blogspot.com/_B1qW_llHr0E/STjRI0WwpWI/AAAAAAAAAAM/2pKUDu31GDk/s400/mod_bad.png)
Little diagonals repeated to infinity again, except... an arbitrary change at the origin. Why? Just to cause pain.
And that's all.