Using libraries

Note: This article is somewhat biased toward Linux-like environments.

Native vs non-native

  • Native libraries: these are produced by compiling source code directly to machine code. This applies to languages such as C, C++, and Fortran. These libraries are largely interoperable, since they can be linked into the same program.

    This category also applies to modern natively-compiled languages such as Haskell or Rust, but I’m not as familiar with those so I won’t discuss them here.

  • Non-native libraries: these are made of source code or bytecode that isn’t executed directly on the CPU. Examples include C#, Java, Perl, Python, Ruby, etc. Non-native libraries are usually highly language-specific and not interoperable with each other.

This article is only concerned with native libraries, specifically of the older family (C, C++, Fortran).

Parts of a library

Libraries are generally divided into two components:

  • The library file, which contains mostly machine code. This is the essential library implementation.

    The file extensions are usually:

    • .a/.so (ELF OSes like Linux),
    • .a/.dylib (OS X),
    • .lib/.dll (Windows).

    In each pair, the first extension is for static libraries, whereas the second is for shared libraries, a.k.a. dynamically-linked libraries (DLLs). The distinction is explained in the next section.

  • The library interface, which is a protocol that users of the library must abide by in order to use the library correctly. Failing to do so will often lead to crashes (segfaults) and brokenness.

    For C and C++, this is usually distributed as a header file (.h, .hpp, etc). There are ways to use a library even without a header file, but it is usually discouraged as it is error-prone. There is one exception though: non-C/C++ programs (e.g. a Rust program) that want to use a C/C++ library often don’t require header files.

    For Fortran, the library interface is usually just the source code itself, or interchangeably the .mod files.
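
To illustrate the exception mentioned above (a non-C/C++ program using a C library without any header file), here is a minimal sketch in Rust. It declares strlen from the C standard library by hand; the helper name c_string_length is made up for this example. The unsafe extern spelling assumes a reasonably recent toolchain (Rust 1.82 or later); older toolchains write plain extern "C" instead. Since there is no header to check against, getting the declaration right is entirely up to the programmer.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Hand-written declaration of the C standard library's `strlen`.
// This plays the role of the header file: if the signature written here is
// wrong, the compiler cannot notice, and the program may misbehave at run time.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical helper: measures a string using C's `strlen`.
fn c_string_length(text: &str) -> usize {
    // CString appends the NUL terminator that C expects.
    let c_text = CString::new(text).expect("text must not contain NUL bytes");
    // Calling a foreign function is unsafe: Rust cannot verify the declaration.
    unsafe { strlen(c_text.as_ptr()) }
}

fn main() {
    println!("{}", c_string_length("hello")); // prints 5
}
```

No -l flag is needed in this particular case only because the C standard library is already linked into Rust programs on common platforms; other libraries still have to be linked as described later in this article.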

Static vs shared libraries

  • In static libraries, the library’s own code is copied and merged with the user’s code, creating one combined file. This means after compilation, the static library file itself is no longer needed nor used. (Of course, you probably want to keep it in case you want to compile more programs with that same library.)

    Note that static libraries are quite dumb: you cannot link more than one copy of the same static library into the same program. You might wonder “Why would anyone do that anyway?” Well, it’s often unintentional: say you use library A and library B, both of which use library C. If this whole thing is linked statically and A and B each carry their own copy of C, the program ends up with two copies of C, which typically shows up as duplicate-symbol errors at link time!

  • In shared libraries, the library’s own code remains a separate file. This means later when the program runs, it must load the library as a separate step. If the shared library can’t be found, the program will fail to launch. If the shared library is upgraded or changed, then the behavior of any program that depends on that library may change (this can be both useful and annoying).

    Even though shared libraries are never incorporated into the final program, they are still needed during compilation because the compiler still needs to be aware of the library’s interface. On non-Windows systems, the compiler simply reads the shared library itself (.so or .dylib) during compilation. On Windows, compilation requires a so-called import library, which has the extension .lib (not to be confused with static libraries on Windows, which also have the same extension).

Installing a library

The first step to use a library is to install it, obviously. The process varies a lot. It’s best to read the documentation for that specific library. One of the most crucial things to figure out is where the library is going to be installed.

(Another crucial thing is to find out if the library has optional features, because they might be off by default, or turned off if the configuration step fails to find its dependencies!)


On Unix-like OSes, libraries are usually installed to the /usr/local prefix by default. This prefix means that all of the library’s relevant files will be installed to /usr/local/lib (library files), /usr/local/include (header files), /usr/local/bin (executable programs), etc. To change the prefix, the process varies depending on the build system used by the library:

  • For Autoconf-like build systems, this can often be changed through the ./configure script. The command would be something like

    ./configure --prefix=/my_custom_prefix

  • For the CMake build system, this can be changed using

    cmake -DCMAKE_INSTALL_PREFIX=/my_custom_prefix .

It is considered bad practice to install libraries to the /usr prefix as that lies within the territory of the system package manager.

Including a library

To use the library in your C or C++ code, you’d typically add

#include <some_library_header.h>

to various places in your source code to inform the compiler about the library interface (by forward declaring various types, functions, and variables). These header files are usually found under the include directory of wherever the library was installed.

If the library header files exist in a standardized location such as /usr/include, then the compiler will find them just fine. But if you installed them in an unusual location like /my_custom_prefix/include, then you’ll have to give a hint to your compiler so it knows where to look. This can be accomplished via the -I flag:

cc -c -I/my_custom_prefix/include my_program.c

(Replace cc with whatever compiler you use.) An alternative approach is to use the C_INCLUDE_PATH (C) and/or CPLUS_INCLUDE_PATH (C++) environment variables:

export C_INCLUDE_PATH=/my_custom_prefix/include
cc -c my_program.c

You can list multiple paths in the variable, separated by colons (:), analogous to PATH. These environment variables may not be recognized by all compilers, however. I know they work for clang and gcc at least.

Linking with a library

When all the source files have been compiled with -c, you now need to link everything together, including any libraries you use. This is done using the -l flag:

cc my_program.o my_blah.o my_foo.o -lalpha -lbeta

The word that follows -l is the name of the library. If the library file is libalpha.so (or libalpha.a), then its name is just alpha: the lib prefix and the file extension are dropped.

If alpha and/or beta are at an unconventional location /my_custom_prefix/lib, then you have to pass in a flag -L to tell the compiler where they could be found:

cc -L/my_custom_prefix/lib my_program.o my_blah.o my_foo.o -lalpha -lbeta

You can also do this using another colon-separated environment variable LIBRARY_PATH:

export LIBRARY_PATH=/my_custom_prefix/lib
cc my_program.o my_blah.o my_foo.o -lalpha -lbeta

Again, I think only some compilers recognize this variable.

Loading a shared library

The last step is to make sure the program can actually load the shared library. Usually this is automatic, but if the library is in an unconventional place like /my_custom_prefix/lib, then you gotta give it another hint using yet another colon-separated environment variable, LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH if you’re using OS X):

export LD_LIBRARY_PATH=/my_custom_prefix/lib

Alternatively, you can bake the library path directly into the program so you don’t need an environment variable to run the program. This is the -rpath flag:

cc -Wl,-rpath=/my_custom_prefix/lib my_program.o my_blah.o my_foo.o -lalpha -lbeta

You can specify a path relative to the location of the program itself using the ${ORIGIN} placeholder. Note that this is not a shell variable, so be sure to single-quote it in the shell:

cc -Wl,-rpath='${ORIGIN}/lib' my_program.o my_blah.o my_foo.o -lalpha -lbeta


Two common problems with Git submodules

Git submodules are useful, but their UX is a bit intrusive for users who aren’t even interacting with the submodules. Here are the two problems I run into most often. (Let me know if there’s anything else that folks often run into!)

I cloned a repo containing submodules, but there’s nothing in them!

This happens if you clone a repo that contains submodules without the --recursive flag. The solution is to run

git submodule update --init --recursive

which will initialize everything.

I figure the main reason this is not the default is that cloning submodules can waste a lot of time, disk space, and/or bandwidth if you don’t actually need the submodules.

Why are submodules showing up in git status even though I never touched them?

This can happen after git pull, git checkout, or git reset, which change the working tree but do not update the submodules, causing them to lag behind. In git status, the problem manifests as:

modified:   mysubmodule (new commits)

If you are sure you didn’t change any file within the submodules, you can update them using

git submodule update --recursive

I’m not sure why this isn’t the default. Perhaps it’s to avoid accidentally losing changes within the submodules?


Graphical depiction of ownership and borrowing in Rust

Below is a graphical depiction of moving, copying, and borrowing in the Rust language. Most of these concepts are fairly specific to Rust and are therefore a common stumbling block for many learners.

To avoid clutter in the graphics, I have tried to keep the text to a minimum. It isn’t meant to be a replacement for the various tutorials out there but more of a different perspective for programmers who prefer to grok concepts visually. If you are learning Rust and find these graphics helpful, I would recommend annotating your own code with such diagrams to help solidify the concepts :)

(Figure: Ownership and borrowing in Rust)

The upper two figures depict the two main kinds of semantics for data that you own: either move semantics or copy semantics.

  • The picture on move semantics (⤳) looks almost too simple. There is no deception here: move semantics are strange only because most languages allow variables to be used as many times as the programmer pleases. This stands in contrast to much of the real world: I can’t just give someone my pen and still use it for writing! In Rust, any variable whose type does not implement the Copy trait has move semantics and would behave as shown.
  • Copy semantics (⎘) are reserved for types that do implement the Copy trait. In this case, every use of the object would result in a copy, as shown by the bifurcation.
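
The two kinds of semantics can also be seen in a small piece of code. This is only a sketch: the types Pen and Point and the functions give_away and shift_right are invented for illustration. Pen does not implement Copy, so passing it to a function moves it; Point does, so passing it makes a copy and leaves the original usable.

```rust
/// A type without `Copy`: values of this type have move semantics.
struct Pen {
    ink_ml: u32,
}

/// A type with `Copy`: every use makes a copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Takes ownership of the pen; the caller can no longer use it afterwards.
fn give_away(pen: Pen) -> u32 {
    pen.ink_ml
}

/// Receives a copy of the point; the caller's original is unaffected.
fn shift_right(mut point: Point) -> Point {
    point.x += 1;
    point
}

fn main() {
    let pen = Pen { ink_ml: 3 };
    let ink = give_away(pen); // `pen` is moved here
    // give_away(pen); // would not compile: use of moved value `pen`
    println!("ink received: {}", ink);

    let origin = Point { x: 0, y: 0 };
    let shifted = shift_right(origin); // `origin` is copied here
    println!("{:?} -> {:?}", origin, shifted); // `origin` is still usable
}
```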

The central two figures depict the two ways in which you can borrow an object you own, and what each one offers.

  • For mutable borrowing, I used a lock symbol (🔒) to signify that the original object is effectively locked for the duration of the borrow, rendering it unusable.
  • In contrast, for non-mutable borrowing I used a snowflake symbol (❄) to indicate that the original object is only frozen: you can still take more non-mutable references, but you cannot move it or take mutable references to it.
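
As a rough code counterpart to the lock and the snowflake, the sketch below borrows the same vector both ways. The function names total and append_total are made up for this example, and the commented-out lines mark uses that the compiler would reject while the corresponding borrow is still live.

```rust
/// Reads through a shared reference; the original is frozen, not locked.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Writes through a mutable reference; the original is locked meanwhile.
fn append_total(values: &mut Vec<i32>) {
    let sum = total(values); // temporarily reborrowed as a shared reference
    values.push(sum);
}

fn main() {
    let mut numbers = vec![1, 2, 3];

    {
        // Frozen: any number of non-mutable references may coexist.
        let first = &numbers;
        let second = &numbers;
        println!("{} {}", total(first), total(second));
        // numbers.push(4); // would not compile: `numbers` is still borrowed
        println!("{}", first.len());
    }

    {
        // Locked: while `exclusive` is live, `numbers` itself is unusable.
        let exclusive = &mut numbers;
        // println!("{:?}", numbers); // would not compile: mutably borrowed
        append_total(exclusive);
    }

    // Both borrows have ended, so the owner is fully usable again.
    println!("{:?}", numbers); // prints [1, 2, 3, 6]
}
```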

In both figures, 'ρ is a name I have chosen for the lifetime of the references. I used a Greek letter on purpose because there is no syntax for concrete lifetimes in Rust, currently.

The last two figures summarize the key differences and similarities between the two kinds of references, both pictorially and in text form. The “exteriorly” qualifier is important, since you can still have interior mutability through Cell-like things.
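
To make the “exteriorly” qualifier a bit more concrete, here is a minimal sketch using std::cell::Cell. The Counter type and its record_hit method are hypothetical names chosen for this example. From the outside only non-mutable references are taken, yet the value inside the Cell still changes.

```rust
use std::cell::Cell;

/// Hypothetical counter whose value can change through a shared reference.
struct Counter {
    hits: Cell<u32>,
}

impl Counter {
    /// Takes `&self`, not `&mut self`: exteriorly this is a non-mutable borrow.
    fn record_hit(&self) -> u32 {
        self.hits.set(self.hits.get() + 1);
        self.hits.get()
    }
}

fn main() {
    let counter = Counter { hits: Cell::new(0) }; // not declared `mut`
    let a = &counter;
    let b = &counter; // two non-mutable references at once
    a.record_hit();
    b.record_hit();
    println!("{}", counter.hits.get()); // prints 2
}
```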