Undefined behavior

When the C (or C++) standard says "doing X is undefined behavior", they're not telling you that your program will exit with an error, corrupt your memory, or even crash for that matter. These are only a few of ways in which undefined behavior can manifest.

There is a reason they are called "nasal demons". In principle, undefined behavior could wipe your entire hard drive.

Of course, in the real world, compilers are too lazy to actually make your program erase the disk. Instead, they tend to take simpler route and assume that undefined behavior is "impossible".

It doesn't matter if it's actually not impossible. The compilers don't care. They want the most optimal code, and they will get the most optimal code even if it means wreaking havoc on your program.

Consider the following example:

#include <stdio.h>

int *getnumptr(int condition)
{
    static int n = 42;
    if (condition) {
        return NULL;
    }
    return &n;
}

int main(int argc, char **argv)
{
    int condition = argc - 1;
    int num = *getnumptr(condition);
    printf("%i\n", num);
    return 0;
}

Suppose you don't pass any arguments to this program (i.e. argc == 1). In this case, the pointer returned by getnumptr points to n, which contains 42, so you should expect 42 to be printed.

However, if you do pass arguments to the program (i.e. argc > 1), then you're in the realm of undefined behavior. On some compilers, the program will crash with a segmentation fault, which is probably what most people expect.

On less forgiving compilers however, the program may simply print 42.

How could one get a 42 by dereferencing a NULL pointer?

Well, the compiler was free to assume that the situation of dereferencing a NULL pointer was "impossible", and hence didn't bother to emit the code for that case at all. The if block was entirely elided from the generated assembly. All that remained was:

# generated by clang 3.7 with -O2
    movl    getnumptr.n(%rip), %esi

When they say undefined behavior means anything goes, they really mean it.

This article is brought to you by undefined behavior.

Comments

Haskell library: bound

Learning how Edward Kmett's bound library worked was quite a challenge for me. Even though there is a nice tutorial, I found it rather difficult to penetrate – the puns, while humorous, didn't help either. Moreover, polymorphic recursion itself is a rather strange concept that takes some getting used to.

In the end, I had to read through code very carefully in order to understand the inner workings of the library. In the process, I made some sketches to guide myself. Not surprisingly, the pictures themselves explain the concept much more clearly than the code or the text! (At least I found that to be the case, YMMV.)

Here's a picture that shows an expression tree in untyped lambda calculus. Click to zoom in! (You can also get the SVG version.)

A random expression tree represented using the bound library.

There are a couple interesting things to note:

  • Important observation: for a given expression there are multiple distinct representations! This is how the library manages to be efficient (what Edward refers to as "you can lift whole trees"). However, this is for all intents and purposes an implementation detail: whenever you test for equality using (==), the representation is automatically canonicalized. Note that the tree show above is not canonical.

  • The F node negates the effect of the enclosing Scope on its children . Nested F nodes will negate additional nested Scopes, until there are no more Scopes to negate. This is enforced by the type system, so you can't screw this up by accident.

    F nodes can be pushed towards the leaves: in doing so, it replicates itself whenever it encounters a fork in the tree. Pushing every F node to the leaves produces the canonical representation.

  • The B leaf binds to the prevailing Scope − i.e. the innermost scope that isn't negated. If all Scopes have been negated, then B leaves cannot appear (also enforced by the type system).

    If the type variable b is not (), then one needs to also specify the which subvariable of the prevailing Scope it should bind to. This is used to bind multiple variables simultaneously.

  • Monadic bind (>>=) applies the given function to every variable and grafts the result in its place. This is used in substitution, a.k.a. instantiation.

  • Lifting creates a Scope from an existing expression. In traditional implementations based on de Bruijn indices, this would require a traversal of the entire expression, but in the bound library this is done simply by wrapping the entire tree inside F, which negates the effect of the enclosing Scope.

    Lifting occurs whenever an expression needs to be substituted into something that is deeper in scope.

Comments

Löb and möb in JavaScript

Strange loops in Haskell, but now in JavaScript! :D

The name of this function probably comes from Löb's theorem, which is the mathematical analogue of the loeb function.

You may or may not want to read up on the original Haskell version and commentary by David Luposchainsky first.

For the adventurous, here's a JSFiddle of this entire article (results will only appear in the developer console).

Remark about lazy evaluation

Since JavaScript uses strict/eager evaluation, lazy values must be indicated explicitly. We denote them here using the Lazy prefix:

-- Haskell
type Lazy a = () -> a

This makes the type signatures slightly more complicated than usual.

Löb

First, let's look at the loeb function, specialized for arrays. The original lazy Haskell definition looked like this when written out fully:

-- Haskell
loeb :: [[a] -> a] -> [a]
loeb formulas = results
  where evaluate formula = formula results
        results = map evaluate formulas

In an eager language, this needs to be modified slightly:

-- Haskell
loeb :: [[Lazy a] -> a] -> [Lazy a]
loeb formulas = results
  where evaluate formula () = formula results
        results = map evaluate formulas

This has the slight downside of having to manually force out the results afterwards.

Translating into JavaScript is straightforward, but the code is uglier:

function loeb(formulas) {
    function evaluate(formula) { return function() {
        return formula(results);
    } }
    var results = formulas.map(evaluate);
    return results;
}

The typical use for this function is as a spreadsheet evaluator. Here's an example of a simple 3-cell spreadsheet:

A1:       2
A2:       3
A3:  =A1+A2
console.log("spreadsheet result: ", loeb([
    function(cells) { return 2; },
    function(cells) { return 3; },
    function(cells) { return cells[0]() + cells[1](); }
]).map(function(x){ return x(); }))

The .map(function(x){ return x(); }) forces every cell to be evaluated. You should see [2, 3, 5] in the console.

Möb

The loeb function can be generalized by adding an explicit map parameter, allowing our choice of the mapping function.

The Haskell type signature looks quite scary, even though the implementation is virtually identical:

-- Haskell
moeb :: (((t -> a) -> Lazy a) -> s -> t) -> s -> t
moeb map formulas = results
  where evaluate formula () = formula results
        results = map evaluate formulas

But since we're in JavaScript-land we don't care about type signatures. We just copy-paste our implementation, but allow the map function to be chosen arbitrarily.

function moeb(map) { return function (formulas) {
    function evaluate(formula) { return function() {
        return formula(results);
    } }
    var results = map(evaluate)(formulas);
    return results;
} }

It's easy to recover loeb from moeb. We just need to define a helper function (literally just a curried version of .map).

function map(func) { return function(array) {
    return array.map(func);
} }

Then, we have that:

var the_same_loeb = moeb(map);

See for yourself :p

We can also define the fixpoint combinator (a.k.a. Y combinator). But first we need another helper function, the identity function.

function id(x) {
    return x;
}

Then the fixpoint combinator is just:

var fix = moeb(id);

Here's an example of using the fixpoint combinator to implement the factorial function. Not the sanest way of implementating factorials, I might add.

var factorial = fix(function(fac) { return function(x) {
    return x > 0 ? x * fac()(x - 1) : 1;
} })();
console.log("factorial of 3 is", factorial(3));

Comments