# Rufflewind

A notebook of random ideas.

# Tarball mix-up in the Haskell directory library

Release management can be a real pain.

I normally do all of these steps in a single pass with the aid of a script:

• Tag the commit using git-tag.
• Create a GitHub Release on this tag.
• Upload the tarball generated by cabal sdist to Hackage.

However, on this one occasion I decided to do the Hackage release later out of caution. But this "cautionary measure" ended up causing more problems than it solved, because going off-script disrupted my usual tried-and-tested workflow.

Normally a GitHub Release includes:

• A tarball automatically generated by GitHub from the Git tree
• A tarball generated by running cabal sdist, attached as a "binary"

But somehow for version 1.2.6.1 my upload script failed to attach the sdist-generated tarball.

Later, when I wanted to upload to Hackage, I thought it'd be a good idea to use the original sdist-generated tarball, which contained the original timestamps on the files. So, I habitually downloaded the first tarball link I saw on the GitHub Release page for 1.2.6.1. Obviously, that was not the sdist-generated tarball; it was the one auto-generated by GitHub.

At first, Hackage complained that the tar format was not ustar. That itself should've aroused suspicion, since cabal sdist always generates tarballs in ustar format, but I didn't pay attention.* Instead, I extracted the tarball and re-made the archive in ustar format.

The end result was that I accidentally uploaded a GitHub-generated tarball instead of an sdist-generated tarball, which didn't work because the Autoconf-generated configure script was missing.

* I thought maybe GitHub was altering ("improving") tarballs behind our backs. In retrospect that was a very silly idea.

# A basic introduction to unique_ptr

C++11 introduced the unique_ptr template in the <memory> header. It provides a convenient way of managing memory (and more generally, lifetimes of arbitrary objects). It's a bit odd to use, so this article provides a very basic introduction to its semantics.

The article will not attempt to explain its features in detail. You can find more info in the API documentation.

(The astute reader will notice that I'm using the same ownership terminology as Rust.)

## Scalar vs array

There are two kinds of unique_ptr, one for scalars (i.e. non-arrays) and one for arrays:

• unique_ptr<double> can hold a scalar of type double;
• unique_ptr<double[]> can hold an array of double values with an unknown number of elements.

Of course, one may substitute double with any arbitrary type.

## Construction

A std::unique_ptr variable can be constructed using make_unique:

auto p1 = std::make_unique<double>(3.14);
auto p2 = std::make_unique<double[]>(n);

• The first line creates a unique_ptr to a double value 3.14 and saves it in the p1 variable.
• The second line creates a unique_ptr to a double array of n elements and saves it in the p2 variable.

Note that make_unique requires C++14. If you want to stick to plain C++11, you can substitute those with:

std::unique_ptr<double>   p1(new double(3.14));
std::unique_ptr<double[]> p2(new double[n]());


## Ownership and uniqueness

A unique_ptr variable is said to be the owner of the object it points to: if the owner dies without passing its ownership to another variable, the object is deleted (and hence the memory is deallocated), thereby invalidating all pointers and references to it.

There can only be one owner at any one time: one does not simply copy a unique_ptr. For example:

// compile with: c++ -std=c++14
#include <memory>

void taker(std::unique_ptr<double[]>);

void uniqueness1()
{
    // p is created and owns a section of memory containing 42 numbers
    std::unique_ptr<double[]> p = std::make_unique<double[]>(42);

    // compile error! (attempted to copy p to q)
    std::unique_ptr<double[]> q = p;

    // compile error! (attempted to copy p to the argument of taker)
    taker(p);
}


Typical errors when this happens:

(g++)
error: use of deleted function ‘std::unique_ptr<…>::unique_ptr(const std::unique_ptr<…>&)’
error: use of deleted function ‘… std::unique_ptr<…>::operator=(const std::unique_ptr<…>&)’

(clang++)
error: call to deleted constructor of 'std::unique_ptr<…>'
error: overload resolution selected deleted operator '='


If an owner wants to lend the contents of a unique_ptr to another function without relinquishing its ownership, it can give out a pointer or reference to it:

// compile with: c++ -std=c++14
#include <memory>

void borrower1(double*);

void borrower2(std::unique_ptr<double[]>*);

void borrower3(std::unique_ptr<double[]>&);

void uniqueness2()
{
    // p is created and owns a section of memory containing 42 numbers
    std::unique_ptr<double[]> p = std::make_unique<double[]>(42);

    // these are all okay but the first method is the most general
    // as it accepts any kind of pointer, not just unique_ptr
    borrower1(p.get());
    borrower2(&p);
    borrower3(p);

    // p dies, so the array is deleted automatically
}


If an owner wants to yield its ownership to another function or variable, it can perform a move:

// compile with: c++ -std=c++14
#include <memory>
#include <utility> // required for std::move

void taker(std::unique_ptr<double[]>);

void uniqueness3()
{
    // p is created and owns a section of memory containing 42 numbers
    std::unique_ptr<double[]> p = std::make_unique<double[]>(42);

    // move p to q
    std::unique_ptr<double[]> q = std::move(p);

    // move q into the argument of taker
    taker(std::move(q));

    // p and q both die, but they no longer own anything so nothing happens here
}


## Resetting

Although normally a unique_ptr will automatically delete its object once it dies (or when it gains ownership of something else), one can hasten the process by manually calling .reset():

p.reset();


This will immediately delete the object pointed to by p and cause p to become empty.

## Emptiness

A unique_ptr can be empty, in which case it owns nothing – the analog of a nullptr. This is the default state if it is not explicitly initialized with something. It is also the state after it loses its ownership of something, or is forcibly emptied via .reset().

## Releasing

Normally, a unique_ptr automatically loses its ownership when it is moved to another unique_ptr. Alternatively, one can force it to give up its ownership in the form of a raw pointer via .release():

double* raw_ptr = p.release();


After releasing, the object will not be freed automatically by the unique_ptr, as it is now empty. The programmer is then responsible for deleting the object manually.

## Example

Here's a more complete example demonstrating the use of unique_ptr:

// compile with: c++ -std=c++14 -lopenblas
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <cstddef> // required for std::size_t
#include <memory> // required for std::unique_ptr, std::make_unique
#include <utility> // required for std::move
#include <cblas.h> // required for cblas_daxpy

// create an owned vector initialized to zero
std::unique_ptr<double[]> new_vector(std::size_t n)
{
    return std::make_unique<double[]>(n);
}

// borrow one vector (v1), seize another (v2), and return an owned vector
// equal to their sum
std::unique_ptr<double[]> destructively_add_vectors(
    std::size_t n,
    const double* v1,
    std::unique_ptr<double[]> v2)
{
    // sum <- v2
    std::unique_ptr<double[]> sum = std::move(v2);

    // sum <- v1 + sum
    cblas_daxpy(n, 1., v1, 1, sum.get(), 1);

    // for obscure reasons (C++11 §12.8/32), an explicit std::move here
    // is actually optional, but we use it anyway for consistency and clarity
    return std::move(sum);
}

// borrow two vectors and return an owned vector equal to their sum
std::unique_ptr<double[]> add_vectors(
    std::size_t n,
    const double* v1,
    const double* v2)
{
    // v2_copy <- 0
    std::unique_ptr<double[]> v2_copy = new_vector(n);

    // v2_copy <- v2
    cblas_dcopy(n, v2, 1, v2_copy.get(), 1);

    // lend v1 and relinquish v2_copy to destructively_add_vectors
    return destructively_add_vectors(n, v1, std::move(v2_copy));
}

double fibby(double i)
{
    using std::pow;
    using std::sqrt;
    const double a = (1. + sqrt(5.)) / 2.;
    const double b = (1. - sqrt(5.)) / 2.;
    return (pow(a, i) - pow(b, i)) / sqrt(5.);
}

// create an owned vector initialized to something fun
std::unique_ptr<double[]> example_vector(std::size_t n)
{
    std::unique_ptr<double[]> v = new_vector(n);
    for (std::size_t i = 0; i != n; ++i) {
        v[i] = fibby(i);
    }
    return v;
}

// borrow a vector and check that the result is correct
void check_result(std::size_t n, const double* v)
{
    for (std::size_t i = 0; i != n; ++i) {
        // always use !(difference < epsilon)
        // rather than (difference >= epsilon)
        // so NaN can't sneak past the check
        if (!(std::abs(v[i] - fibby(i) * 2.) < 1e-8)) {
            std::cerr << "what a cruel, cruel world" << std::endl;
            std::abort();
        }
    }
}

int main()
{
    const std::size_t n = 1024;

    std::unique_ptr<double[]> v1 = example_vector(n);
    std::unique_ptr<double[]> v2 = example_vector(n);

    // lend v1 and v2 to add_vectors
    std::unique_ptr<double[]> sum = add_vectors(n, v1.get(), v2.get());

    // lend sum to check_result
    check_result(n, sum.get());

    // reset sum (actually optional, since the following line will
    // automatically cause this to happen anyway)
    sum.reset();

    // lend v1 and relinquish v2 to destructively_add_vectors
    sum = destructively_add_vectors(n, v1.get(), std::move(v2));

    // lend sum to check_result
    check_result(n, sum.get());

    // v1 and sum are deleted automatically;
    // yay no memory leaks \o/
}


# When did my code break!?

That was my first reaction when I found this bug. Most of the tests failed. The program itself produced wrong results. No crashes though, and valgrind didn't detect any problems.

Why do I still not have continuous integration for this thing?!

It was strange that such an obvious bug went unnoticed for so long and broke a rather core part of the program. I don't even remember having touched that part of the code recently!

Where do I even begin to debug this problem?

I suppose there's always git-bisect. But I was in the middle of implementing something, so I didn't really want to mess up the work tree (yes, I know there's git-stash, but I'm also lazy). Hence, I ran git-bisect on the laptop instead. I picked a random point in the not-too-distant past, ran the tests … "PASSED" … and then started bisecting from there.

Alas, bisecting proved futile: all of the commits passed the tests. Strange … is it because of a faulty library? No, this part of the code doesn't even use any of the external libraries. Moreover, both my desktop and laptop run Arch Linux, so AFAIK the compilers and system libraries are the same too.

While bisecting, I made a subconscious note of the compiler used on the laptop: it was Clang, as specified in the makefile. I don't remember why I did that. Does it matter? Well, on the desktop – where I initially found this bug – I used GCC. Could that be the cause?

The answer turned out to be yes. So this is clearly a bug caused by optimization. But I have used GCC before, so it must be a recent thing.

"Recent" in a loose sense. The compiler was upgraded 2 months ago, as Pacman's logs indicate.

Finding code that worked for a few years suddenly failing because of a compiler upgrade is rather disheartening – who knows what other bugs lurk in the code? Then again, it could be a compiler bug too … but it seems unlikely that no-one else noticed this same bug within the past 2 months.

Still, I thought it might be useful to at least pin down which optimization pass caused the problem. Let's start with the big hammer: tone down the -On level. The bug disappeared at -O0, so even a mere -O1 was enough to trigger the problem.

Now to figure out what kinds of optimizations -O1 turns on. Hopefully there's some useful info in the GCC documentation. It does list several optimizations that are enabled by -O1, but there's some fine print:

Depending on the target and how GCC was configured, a slightly different set of optimizations may be enabled at each -O level than those listed here. You can invoke GCC with -Q --help=optimizers to find out the exact set of optimizations that are enabled at each level.

So the list shown in the docs is actually not exhaustive, and it also includes a few flags that don't apply to my specific platform. Using the -Q --help=optimizers flag, I managed to produce a list of optimizations – there are apparently quite a lot! With a little grep -v and some Emacs regexes, I reduced the list down to a single, very long line of flags. It looked a little bit like this:

g++ … -faggressive-loop-optimizations -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim …


And that wasn't even a quarter of the flags!

But the bug did not appear. Did I miss a flag somewhere?

Oh, wait:

Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.

Well then, how am I supposed to specify explicitly which passes to turn on? The only way I could think of was to set -O1 and then explicitly negate all the optimizations that I don't want. That seems … backwards, but there appeared to be no other way. So what I did was write a long command like this:

g++ … -O1 -fno-aggressive-loop-optimizations -fno-asynchronous-unwind-tables -fno-auto-inc-dec -fno-branch-count-reg -fno-combine-stack-adjustments -fno-compare-elim …


and then comment out the optimizations that I want turned on. Mentally, this was kind of confusing, and it led to several mistakes as I tried to manually bisect the optimization flags to see which was to blame for breaking my code. But eventually I found that -finline and -ftree-dse both have to be turned on for the bug to manifest.

Oddly enough, the bug now manifested in a different way: it caused a segmentation fault, which it hadn't done before. In retrospect, getting a segfault was much better than getting a wrong result, because the program very likely crashes at exactly the location where it went horribly wrong. Trying to figure out why I got the wrong result would've been much harder, because it's hard to tell which part of a mathematical calculation went wrong. OTOH, narrowing down the precise optimizations that led to this bug did not do much to actually guide me toward the source of the bug. It was only in retrospect that I was able to explain why those two optimizations caused the code to crash.

One idea I tried – which did not work – was to compare the assembly with and without optimization. First off, I have no real assembly experience – I can barely remember the order of arguments in Intel vs AT&T syntax! Secondly, the assembly was over 30MiB in size! Even running a diff takes a while, and kdiff3 as usual is hopeless against such a large file. (Graphical diffs are a lot more pleasant to read.)

What did work was to try and pare down the program until only the buggy part was left. My main concern was to make sure the bug remained reproducible. Optimization bugs can be finicky: they might disappear if you remove seemingly unrelated parts of the program. But I was lucky that the bug didn't go away. In fact, it seemed very "robust" (in a twisted sense of the word).

As I tore up the program, I had to make an important decision: I found two independently buggy regions in the code; one of them caused a wrong result, while the other segfaulted. Which one should I keep?

The segfault seemed like the more obvious bug, so I went with that one. It was probably a good choice: the other one would probably have left me repeating the calculations on paper.

After tearing up the program, I tried examining the assembly again, but it was still quite big (~3MB). Kdiff3 took several minutes to run. While it was loading, I ran GDB with the hopes of catching the segfault in action.

It showed me the specific line of code in the source file where it crashed – that was quite helpful. But when I tried to print out the local variables, they were all optimized out. Trying to decipher the assembly led me nowhere either. All I figured out was that it tried to read a double from memory and failed. Here's an artist's impression of the code:

auto&& x = std::get<0>(unmove(get_index_thing(p, a)));
arr1[a * m + b] = arr2[p][x * n + y]; // *CRASH*


I guess I could just do the old-fashioned approach of printfing the variables. It's worth a shot. I hoped that it would not cause the bug to disappear.

auto&& x = std::get<0>(unmove(get_index_thing(p, a)));
printf("%lu\n", x);
arr1[a * m + b] = arr2[p][x * n + y]; // *CRASH*


It did not, thankfully. The array index x was clearly garbage: a number in the millions. Considering that I was running an extremely small test case, that was far too high. Where did this garbage originate?

One thing I tried right away was to print the result (and also the arguments) from inside the function get_index_thing. But in haste, I forgot that it actually returns a std::tuple, which caused printf to print garbage as well (not the same garbage as before, but I somehow missed that initially). This mistake led me on a wild-goose chase far elsewhere – I thought one of the earlier initialization steps had been done incorrectly.

Eventually I found that (a) printf with %lu does not work on a std::tuple, even if it has only a single element, and (b) it's morning and I desperately need sleep.

But I'm so close, can't give up now! After correctly printing the value being returned from get_index_thing, I find that it is actually sane. So, somewhere between returning that result and printing it in the caller, something went horribly awry.

The unmove looked suspicious. I distinctly recall telling myself that unmove can be somewhat dangerous to use, but not having worked with this part of the code in so long, I'm a bit foggy on the intricate details of references.

Wait a second, it's being stored in an auto&&? Wouldn't that mean the value's lifetime ends immediately before the next line?

Uh-oh.

All I had to do was to change auto&& to auto and the code magically started working again. (Well, there are a few other places in the code where I committed the same offense, so I had to fix them too.)

Turns out, unmove is a bit of a red herring in this particular example, as removing it did not fix the problem. However, it definitely makes mistakes like this easier to make. Take, for example, this simple piece of code, which reproduces the bug:

template<class T>
T& unmove(T&& x)
{
    return static_cast<T&>(x);
}

int main(void)
{
    auto&& p = unmove(42);    // line #1
    return p;                 // line #2
}


To make this more obvious, we can specialize the types for int. Note that the return type is int&, which, combined with auto&&, leads to a deduced type of int&.

int& unmove(int&& x)
{
    return static_cast<int&>(x);
}

int main(void)
{
    int& p = unmove(42);      // line #1
    return p;                 // line #2
}


When unmove is called, 42 is bound as a temporary that lasts until the end of that particular line. Thus, by the time line #2 is reached, the value 42 is gone. There are some exceptions to this rule that allow the lifetime of a reference to be extended, but none of those apply here because the value is being passed through an intermediary.

On GCC 5.3.0, this causes the program to exit with 0, as opposed to 42. So it looks like GCC has tightened the rules around the lifetime of a variable stored by reference. The -ftree-dse (dead store elimination) optimization probably decided that line #1 could be deleted, since the variable p expires right away. In other words, it eliminated the buggy line of code entirely, in a quite literal sense!

As you know, undefined behavior can seriously ruin your day. In this case, it has ruined my night of sleep. It is now morning as I write this postmortem.

I still wonder, though: why did this problem go unnoticed for so long?