Introduction to the GNU Build System (Autotools)

This is intended to be a brief introduction to the GNU build system (Autotools) for beginners. The reader is assumed to have a basic understanding of the process of compilation and linking.

Outline

What is a build system?

Running the compiler and linker by hand can be a tedious chore when there are numerous files, as is typical in large projects. This is where the build system comes in – it automates the workflow to aid the developers as well as the the users in the process of building software. The process can be roughly divided into three stages:

  1. configuration: adapting the build process to the target system;
  2. compilation and linking: translating source code into machine code;
  3. installation: copying the final product to its final location.

Build systems assist in some or all of these steps.

What is Autotools?

Build systems come in all shapes and varieties. For this article we focus on the GNU build system, also known as Autotools, which is one of the oldest build systems and remains quite popular on Unix-like platforms.

An easy way to tell if the project uses Autotools is to look for the presence for its primary input files:

  • configure.ac, and
  • Makefile.am.

Note that Autotools is not a monolithic build system. Rather, it is more like a collection of different tools that work together to help build a project. Of these tools, the two most critical ones are:

Autoconf is a tool that generates a configure script for the configuration step. The script detects the available features of the target system and adjusts the build process where needed. The script is written in the shell language, but in a somewhat unusual way to maximize portability across different platforms. It is automatically generated from the input file (configure.ac) via the m4 macro language.

Automake is a tool that aids in the construction of makefiles. It consumes input files (Makefile.am) to generate makefile precursors, known as Makefile.in. These precursors are later consumed by the configure script to construct the complete makefiles, which may then be used to build and install the program.

Here is a highly simplified sketch of the typical Autotools workflow:

diagram of the Autotools workflow
diagram of the Autotools workflow

(A more detailed diagram as well as description be found on Wikipedia.)

  1. Initially, the developer runs autoreconf to generate the configure script from configure.ac and a Makefile.in from every Makefile.am (of which there may be several, hidden in the subdirectories):

    autoreconf -i

  2. Afterwards, the user runs ./configure to generate a Makefile from every Makefile.in.

    ./configure

Generally, when the user downloads a package, step 1 has already been done, hence the user only needs to run the configure script. The user does not need to invoke autoreconf unless the configure script is absent.

If you are interested in learning how to use Autotools in your own projects, consider reading the official docs or one of the many Autotools tutorials.

How do I build programs that use Autotools?

First, make sure you have changed the working directory to the project directory that contains the configure script.

Next, skim through the README and/or INSTALL documents to make sure that any required dependencies are installed. Note that if the dependencies are installed in an unusual location you may need to adjust the search paths for

This must be done before running configure (the next step), otherwise it may fail or link against the wrong library.

Finally, you can then build the program by following the traditional 3-step procedure:

./configure

make

make install

However, sometimes this fails, or perhaps, the default configuration is not what you want.

What should I do if ./configure fails?

The first thing to do is to look for the same error message in config.log. Just above the error message there will be a record of the last few actions configure tried to do but failed.

Most of the steps in a configure script are feature tests: they typically run a test program to find out if some feature is available and whether it has any specific quirks. If a test fails and there is no fallback plan, then the entire configure script is aborted. There will usually be an error message associated with this (e.g. a compiler error) but it is buried inside config.log.

Sometimes the reasons are not obvious, or the configure script may even give misleading reasons! Therefore, always dig into the log to uncover the true cause. It could be an improperly set environment variable, a missing library/tool, or a flag that needs to be overriden, etc.

In the worst case, the configure.ac may have been faulty and as result you may have to debug the configure script itself.

How do I customize configuration process?

To see what customizations are available, run

./configure --help

The options available will vary from project to project, so read it carefully. In my experience, the two most commonly used flags are:

  • --prefix=/some/path: sets the path to which the project will be installed. If not provided, the default is /usr/local, which means

    • executables are installed to /usr/local/bin,
    • libraries are installed to /usr/local/lib,
    • manuals are installed to /usr/local/man, and
    • additional files are installed to /usr/local/share.

    A commonly used alternative is the $HOME directory, useful if you don't have superuser privileges.

  • --with-somepackage=someflags: overrides the linker flag(s) used for somepackage. This is useful if you want to use a substitute library that the script is not aware of. Be sure to quote the argument if it contains special characters or consists of more than one flag.

    For example, if a project expects the BLAS library and you wish to use Intel MKL instead, you can specify --with-blas=-lmkl_rt.

Comments

Uninstalling Cabal packages

Edit (2015-08-10): the tools have been updated to fix a few bugs as well as incompatibilities with the latest GHC; they now also work on Cabal sandboxes.

Problems

A. There is no way to uninstall packages in Cabal

Currently, cabal-install doesn't know how to uninstall packages. The best you can do is to unregister the package with the GHC package-database manager, eliminating the package metadata from the database without removing the actual files.

There is a package called cabal-uninstall (and probably a few others) that remedies this to some extent, but it doesn't work if the package has already been unregistered.

Additionally, while ghc-pkg does warn you of dependency breakage, it doesn't provide an easy way to remove a package and all its dependents transitively.

B. Removing broken packages is a tedious chore

If for some reason your packages get broken, there's no easy way to clean up the mess.

On my system, this usually happens because Arch Linux upgrades some of the globally-installed packages, breaking swaths of user-installed packages in the process (or it could be due to a GHC upgrade).

To fix this you'd have to unregister all of the broken packages. And if you want to reclaim the space used by the dead packages, you'd have to dive into ~/.cabal and remove them manually.

Tools

To help fix the issues outlined above, here are some scripts to assist the process:

  • cabal-gc: a "garbage collector" that searches for any lingering packages whose files still remain but are no longer registered. (Note: it does not remove any executables in the bin directory.)

  • ghc-pkg-check: a thin wrapper around ghc-pkg check that suppresses the haddock warnings that no-one seems to care about.

  • ghc-pkg-unregister: a thin wrapper around ghc-pkg unregister that allows multiple packages to be unregistered in one call.

While it's possible to fully automate this process, it's best to do them one at a time to make sure nothing bad happens.

Tutorial

First off, if you want to manage packages in a sandbox, be sure to run this command prior to everything else below:

cabal [--sandbox-config-file=./cabal.sandbox.config] exec -- "$SHELL"

(The --sandbox-config-file=… flag is optional.)

Now,

  • if your goal is to uninstall a specific package as well as its dependents, then follow all the steps below;
  • if your goal is to only remove broken packages, then jump straight to step 2.

1. Remove the target packages

To remove packages, the first thing to do would be to run:

ghc-pkg-unregister my-package-0.5.0 some-package-1.0.0

If there are any dependent packages, you will receive an error telling you that those will be broken:

ghc-pkg: unregistering some-package-1.0.0 would break the following packages: other-package-2.0.0 another-package-3.0.0 (use --force to override)

If you're alright with this, add the --force flag to remove it for good. We'll eliminate the broken packages shortly.

ghc-pkg-unregister --force some-package-1.0.0

unregistering some-package-1.0.0 would break the following packages: other-package-2.0.0 another-package-3.0.0 (ignoring)

Despite the conditional mood in the message, the package will indeed be unregistered.

2. Remove the dependent packages

Now, we need to get rid of the broken packages. Let's see which packages are broken:

ghc-pkg-check

There are problems in package other-package-2.0.0: dependency "some-package-1.0.0-…" doesn't exist There are problems in package another-package-3.0.0: dependency "some-package-1.0.0-…" doesn't exist The following packages are broken, either because they have a problem listed above, or because they depend on a broken package. other-package-2.0.0 another-package-3.0.0

To remove all of these broken packages, run:

ghc-pkg-check --simple-output | xargs ghc-pkg-unregister --force

To make sure all of the broken packages are taken care of, run ghc-pkg-check again. If there are more broken packages, you'll need to keep repeating this step until all of them have been eliminated.

3. Delete the associated files

This step is optional, although it can be useful to reclaim some disk space.

Since we will be deleting files, it's best to perform a dry run first:

cabal-gc

can be removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/anoth_d9K7jTHCBftCdWJFHesDel can be removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/other_8j0AdSQVkKosS3ev8r3pBu can be removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/mypac_XYf894wKraySpNs9RUwwMV can be removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/somep_x3XTaLFeSmcMHNpv7Ng8xw can be removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/another-package-3.0.0 can be removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/other-package-2.0.0 can be removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/my-package-0.5.0 can be removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/some-package-1.0.0

If everything looks okay, go ahead and delete them for real:

cabal-gc -r

removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/anoth_d9K7jTHCBftCdWJFHesDel removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/other_8j0AdSQVkKosS3ev8r3pBu removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/mypac_XYf894wKraySpNs9RUwwMV removed: /home/user/.cabal/lib/x86_64-linux-ghc-7.10.1/somep_x3XTaLFeSmcMHNpv7Ng8xw removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/another-package-3.0.0 removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/other-package-2.0.0 removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/my-package-0.5.0 removed: /home/user/.cabal/share/doc/x86_64-linux-ghc-7.10.1/some-package-1.0.0

Comments

Bash on Windows is really slow

Bash is really slow on Windows. See for yourself:

# builtin command
time { :; }
# Linux:   0.000s
# Windows: 0.000s

# builtin command in a subshell
time { ( : ); }
# Linux:   0.001s
# Windows: 0.054s

# non-builtin command
time { date; }
# Linux:   0.002s
# Windows: 0.108s

Notice that running a non-builtin command can be two orders of magnitude more expensive on Windows than on Linux. Creating a subshell is also very slow. Running 10 commands could cost as much as a full second, if not more! This is why shell scripts such as configure scripts are incredibly slow on Windows.

The results here are from Cygwin, but those from Mingw are in the same ballpark. This is probably due to the fact that forking is inherently slow on Windows due to the lack of direct support from the OS, so there's little that can be done to avoid the performance penalty.

Moral of the story? If you are writing shell scripts for Windows, try to avoid making calls to non-builtin commands and avoid making subshells when possible. Unfortunately, there's not much you can do with only builtin commands. Furthermore, many operations, such as command substitution $(…) and pipelines … | … will implicitly spawn subshells. :(

On the bright side, printf is a builtin command in Bash, so at least you don't have to choose between printf and echo for performance reasons (the latter being a portability nightmare).

Comments