Computational interpretation of double negation elimination

Today's shower thought is: Is there a way to interpret double negation elimination as a program?

((a -> Void) -> Void) -> a

(Here, Void denotes the empty type.)

At first glance, it seems preposterous: how does one expect to produce a return value of a completely arbitrary type a merely from a function of type (a -> Void) -> Void, which doesn't even return a value of type a?

A hint comes from Peirce's law, also known as call-with-current-continuation, which has a similar form:

callCC :: ((a -> Cont r) -> Cont a) -> Cont a

(We use Cont to allow the code to be tested in a Haskell program.)

Peirce's law gives the program full control over its own state (i.e. stack frame). The application of callCC to a callee g :: (a -> Cont r) -> Cont a bestows upon it the power of a nonlocal jump:

example :: Cont String
example =
  callCC $ \ yield -> do
    yield "escaped"
    pure "stayed"

This would return "escaped" rather than "stayed". It's as if the inner function performed a goto/return/break/longjmp to the outer scope, passing the result along the way.

Turns out doubleNegationElimination can be expressed in terms of callCC:

doubleNegationElimination :: ((a -> Cont Void) -> Cont Void) -> Cont a
doubleNegationElimination doubleNegation =
  callCC $ \ yield -> do
    void <- doubleNegation $ \ x ->
      yield x -- goodbye!
    absurd void

From this, we can see the loophole: by passing in a continuation a -> Cont Void into the doubleNegation :: (a -> Cont Void) -> Cont Void, the program can capture and send the result to the caller through a non-local jump. The remaining part involving absurd is just to satisfy the type checker.

For example, if the doubleNegation was constructed like this:

doubleNegation :: (String -> Cont Void) -> Cont Void
doubleNegation negation = negation "strings do exist"

then doubleNegationElimination would be able to manifest the evidence "strings do exist" by capturing the value and then immediately jumping to the outer scope, avoiding the crash that would have arisen from evaluating a Void.

On a sidenote, callCC is crazy powerful. You can implement Python-style generators with them!

import Control.Monad.Cont (MonadCont, MonadIO, (<=<),
                           callCC, forever, liftIO, runContT)
import Data.Foldable (for_)
import Data.IORef (IORef)
import Data.Void (Void, absurd)
import qualified Data.IORef as IO

main :: IO ()
main = runContT generatorExample pure

generatorExample :: (MonadCont m, MonadIO m) => m ()
generatorExample = callCC $ \ stop -> do
  g <- newGenerator [0 .. 9 :: Integer]
  forever $ do
    result <- g
    case result of
      Nothing -> stop ()
      Just x  -> liftIO (print x)

newGenerator :: (MonadCont m, MonadIO m) => [a] -> m (m (Maybe a))
newGenerator xs = callCC' $ \ ret -> do
  -- rNext stores the function used to jump back into the generator
  rNext <- newIORef ($ Nothing)
  -- rYield stores the function used to jump out of the generator
  rYield <- newIORef <=< callCC' $ \ next -> do
    writeIORef rNext next
    ret $ do
      -- the generator itself is just a function that saves the current
      -- continuation and jumps back into the generator via rNext
      callCC' $ \ yield -> do
        readIORef rNext >>= ($ yield)
  for_ xs $ \ i -> do
    -- save the current continuation then jump out of the generator
    writeIORef rYield <=< callCC' $ \ next -> do
      writeIORef rNext next
      readIORef rYield >>= ($ Just i)
  readIORef rYield >>= ($ Nothing)

-- | This is just double negation elimination under a different name, which
-- turns out to be a little more convenient than 'callCC' here.
callCC' :: MonadCont m => ((a -> m Void) -> m Void) -> m a
callCC' action = callCC ((absurd <$>) . action)

newIORef :: MonadIO m => a -> m (IORef a)
newIORef value = liftIO (IO.newIORef value)

readIORef :: MonadIO m => IORef a -> m a
readIORef ref = liftIO (IO.readIORef ref)

writeIORef :: MonadIO m => IORef a -> a -> m ()
writeIORef ref value = liftIO (IO.writeIORef ref value)


The need to associate function pointers with environments

A friend once asked me a question like this:

void register_event_handler(void *f_ctx, void (*f)(void *ctx));

I'm a little confused about the purpose of f_ctx. Here, f is a handler function that gets called when the event is triggered, and f_ctx is − according to the documentation – some pointer argument that gets passed to f whenever it gets called. Why do we need f_ctx? Wouldn't f alone suffice?

This is a trick for low-level languages like C where functions are represented using a raw function pointer, which does not store an enclosing environment (sometimes called a context). It is not needed in higher-level languages with support for first-class functions, such as Python, as these languages allow functions to be nested inside other functions and will automatically store the enclosing environment within the function objects in a combination called a closure.

The need for an environment pointer f_ctx arises when you want to write a function that depends on external parameters not known at compile time. The f_ctx parameter allows you to smuggle these external parameters into f however you like.

It might be best to illustrate this with an example. Consider a 1-dimensional numerical integrator like this:

double integrate_1(
    double (*f)(double x), /* function to be integrated */
    double x1,
    double x2

This works fine if you know the complete form of the function f ahead of time. But what if this is not the case -- what if the function requires parameters? Say we want to calculate the gamma function using an integral:

double integrand(double x)
    double t = /* where do we get "t" from?? */;
    return pow(x, t - 1.0) * exp(-x);

double gamma_function(double t)
    /* how do we send the value of "t" into "integrand"? */
    return integrate_1(&integrand, 0.0, INFINITY) / M_PI;

Using integrand_1 there are only three ways to do this:

  1. Store t into a global variable, sacrificing thread safety. It would be bad to simultaneously call gamma_function from different threads as they will both attempt to use the same global variable.

  2. Use a thread-local variable, a feature not available until C11. At least it is thread-safe now, but it is still not reentrant.

  3. Write raw machine code to create an integrand on the fly. This can be implemented in a thread-safe and reentrant manner, but it is both inefficient, unportable, and inhibits compiler optimizations.

However, if the numerical integrator were to be re-designed like this:

double integrate_2(
    double (*f)(void *f_ctx, double x),
    void *f_ctx, /* passed into every invocation of "f" */
    double x1,
    double x2

Then there is a much simpler solution that avoids all of these problems:

double integrand(void *ctx, double x)
    double t = *(double *)ctx;
    return pow(x, t - 1.0) * exp(-x);

double gamma_function(double t)
    return integrate_2(&integrand, &t, 0.0, INFINITY) / M_PI;

This is thread-safe, reentrant, efficient, and portable.

As mentioned earlier, this problem does not exist in languages like Python where functions can be nested inside other functions (or rather, it is automatically taken care of by the language itself):

def gamma_function(t):
    def integrand(x):
        return x ** (t - 1.0) * math.exp(-x)
    return integrate(integrand, 0.0, math.inf) / math.pi


Sorting with a random comparator

Most sorting algorithms rely on the correct implementation of a comparison function that returns the ordering between two elements -- that is, a function that takes two elements and returns whether the first is less than, equal to, or greater than the second:

function compare(x, y) {
    if (x < y) {
        return LESS_THAN;
    } else if (x > y) {
        return GREATER_THAN;
    } else {
        return EQUAL;

The comparison function is required to define a valid total ordering on the elements in the sequence in order for the sorting algorithm to make any sense.

It is tempting to ask whether using a random comparison function would offer a clever way to shuffle a sequence. The short answer is no, generally speaking. It's really not a good idea to do that even if it does somehow magically work. The correct solution is to use the Fisher-Yates shuffling algorithm.

From an engineering perspective, sorting functions are designed to sort things. Abusing it to shuffle an array is kind of just asking for trouble, because you never know if a change in the sorting algorithm down the road might break the "shuffling" algorithm. Worse, the breakage could be subtle, creating biases in the output that might be hard to detect unless the tests rigorously verify the statistics. Moreover, some sorting algorithms can run unusually slow with a random comparator, or take a very long time if the randomness was not in their favor.

From a mathematical perspective, bounded-time deterministic comparison sorts just can't give you a high quality distribution of random permutations! In fact, if you use a random comparator with an equal probability of returning LESS_THAN or GREATER_THAN (a "uniformly random comparator"), it's provably impossible for a such an algorithm to shuffle correctly.

Suppose you have an array of size N. The probability of any element (say, the first element) ending up at any position (say, the last position) is always 1/N in a perfect shuffle.

At every step of a deterministic comparison sort algorithm, it must make some sort of binary decision based on the ordering between two selected elements, leading to a binary decision tree. The outcome of these decisions determine the final sorted output.

Now if you feed it a uniformly random comparator, the decisions made are always random. This means every decision has a 1/2 probability of going one way or another. This leads to a path p down the decision tree that eventually terminates after n[p] rounds, since we are assuming bounded time. Certain paths in this decision tree will lead to the scenario where the first element ends up at the last position. Thus, the overall probability of this scenario is the sum of the probability of every path p that leads to the desired scenario:

∑[p ∈ all_desired_paths] (1/2)^(n[p])

Adding up powers of 1/2 will always yield a fraction where the denominator is some power of 2. Since the correct probability is 1/N, unless N is also a power of two, there is simply no way for the result to be correct! Hence, there's no way to get the exact probability 1/N in general.

Edit: Chris P. pointed out that the algorithm must have bounded time for this proof to work. Thanks!