Linking Rust crates to native libraries

2017-11-19

rust

As far as I know, there are two ways to link native libraries in a Rust package:

- Attributes: #[link(name = "…")]
- Build scripts: cargo:rustc-link-lib=dylib=…

You can also pass the linker flags directly to rustc, but that’s a bit too low-level for packages.

The attribute approach is the easiest, but it’s also fairly inflexible. Once the attribute is set in the upstream FFI library, dependents downstream have no control over it. This is fine for most libraries, because their name is fixed and known ahead of time.
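For comparison, here is roughly what the attribute approach looks like in an FFI crate. This is a minimal sketch assuming a Unix-like target whose C math library is named `m` (libm); `native_cos` is a hypothetical wrapper, and the `unsafe extern` block syntax assumes Rust 1.82 or later:

```rust
// Attribute approach: the library name "m" is fixed in the source code,
// so crates that depend on this one cannot swap in a different library.
// Assumes a Unix-like target that provides libm.
#[link(name = "m")]
unsafe extern "C" {
    // Must match the C declaration `double cos(double);` from <math.h>.
    fn cos(x: f64) -> f64;
}

/// Safe wrapper (hypothetical name) around the foreign function.
fn native_cos(x: f64) -> f64 {
    // Safety: `cos` takes a plain double and has no preconditions.
    unsafe { cos(x) }
}

fn main() {
    println!("cos(0) = {}", native_cos(0.0));
}
```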

However, in cases where the name is not known ahead of time (e.g. platform-dependent libraries), or where you want to let the final application choose the library, build scripts offer far more flexibility.

One might be tempted to omit the #[link(…)] attribute in the upstream FFI library, and let the downstream application set the attribute, but this doesn’t work as one might expect. This is due to rust-lang/rust#28605, which causes the linker flags to be ordered like this:

```
cc downstreamapp -l somenativelib upstreamrustlib
```

Because linkers such as GNU ld process their inputs from left to right, and the native library appears before the upstream library that depends on it, the symbols in the native library are not available to the upstream library.

Fortunately, in build scripts the custom linking flags are always placed at the end. For convenience, one would often place this in a separate crate that contains nothing but a build script like this:

```
fn main() {
    println!("cargo:rustc-link-lib=dylib=somenativelib");
}
```
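For the platform-dependent case mentioned earlier, such a build script can compute the name at build time instead of hard-coding it. A sketch under these assumptions: the library is hypothetically called `somenativelib_win` on Windows and `somenativelib` elsewhere, and `SOMENATIVELIB_NAME` is a made-up environment variable that lets whoever builds the final application override the choice. `CARGO_CFG_TARGET_OS`, `cargo:rerun-if-env-changed` and `cargo:rustc-link-lib` are real Cargo features:

```rust
use std::env;

// Pick the native library name (hypothetical names) from the target OS,
// unless an explicit override is given.
fn link_lib_name(target_os: &str, override_name: Option<&str>) -> String {
    if let Some(name) = override_name {
        // Let the final application (or whoever builds it) decide.
        return name.to_string();
    }
    match target_os {
        "windows" => "somenativelib_win".to_string(),
        _ => "somenativelib".to_string(),
    }
}

fn main() {
    // Cargo sets CARGO_CFG_TARGET_OS for build scripts; it describes the
    // target, which differs from the host when cross-compiling.
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    // SOMENATIVELIB_NAME is a made-up override variable.
    println!("cargo:rerun-if-env-changed=SOMENATIVELIB_NAME");
    let override_name = env::var("SOMENATIVELIB_NAME").ok();
    let name = link_lib_name(&target_os, override_name.as_deref());
    println!("cargo:rustc-link-lib=dylib={}", name);
}
```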


Cheatsheet for Futures

2017-05-15

rust

A simplified listing of the various Future combinators in futures-rs (expanded version):

```
// Constructing leaf futures
fn empty ()             -> Future<T, E>
fn ok    (T)            -> Future<T, E>
fn err   (E)            -> Future<T, E>
fn result(Result<T, E>) -> Future<T, E>

// General future constructor
fn poll_fn(FnMut(thread_local!(Task)) -> Poll<T, E>) -> Future<T, E>

// Mapping futures
fn Future::map     (Future<T, E>, FnOnce(T) -> U) -> Future<U, E>
fn Future::map_err (Future<T, E>, FnOnce(E) -> F) -> Future<T, F>
fn Future::from_err(Future<T, Into<E>>)           -> Future<T, E>

// Chaining (sequencing) futures
fn Future::then    (Future<T, E>, FnOnce(Result<T, E>) -> IntoFuture<U, F>) -> Future<U, F>
fn Future::and_then(Future<T, E>, FnOnce(T)            -> IntoFuture<U, E>) -> Future<U, E>
fn Future::or_else (Future<T, E>, FnOnce(E)            -> IntoFuture<T, F>) -> Future<T, F>
fn Future::flatten (Future<Future<T, E>, Into<E>>)                          -> Future<T, E>

// Joining (waiting) futures
fn Future::join (Future<T, E>, IntoFuture<U, E>)                                                       -> Future<(T, U),          E>
fn Future::join3(Future<T, E>, IntoFuture<U, E>, IntoFuture<V, E>)                                     -> Future<(T, U, V),       E>
fn Future::join4(Future<T, E>, IntoFuture<U, E>, IntoFuture<V, E>, IntoFuture<W, E>)                   -> Future<(T, U, V, W),    E>
fn Future::join5(Future<T, E>, IntoFuture<U, E>, IntoFuture<V, E>, IntoFuture<W, E>, IntoFuture<X, E>) -> Future<(T, U, V, W, X), E>
fn join_all     (IntoIterator<IntoFuture<T, E>>)                                                       -> Future<Vec<T>,          E>

// Selecting (racing) futures
fn Future::select (Future<T, E>, IntoFuture<T, E>) -> Future<(T, Future<T, E>), (E, Future<T, E>)>
fn Future::select2(Future<T, E>, IntoFuture<U, F>) -> Future<Either<(T, Future<U, F>), (U, Future<T, E>)>, Either<(E, Future<U, F>), (F, Future<T, E>)>>
fn select_all     (IntoIterator<IntoFuture<T, E>>) -> Future<(T, usize, Vec<Future<T, E>>), (E, usize, Vec<Future<T, E>>)>
fn select_ok      (IntoIterator<IntoFuture<T, E>>) -> Future<(T, Vec<Future<T, E>>), E>

// Utility
fn lazy         (FnOnce() -> IntoFuture<T, E>)             -> Future<T, E>
fn loop_fn      (S, FnMut(S) -> IntoFuture<Loop<T, S>, E>) -> Future<T, E>
fn Future::boxed(Future<T, E>+Send+'static)                -> Future<T, E>+Send+'static

// Miscellaneous
fn Future::into_stream   (Future<T, E>)            -> Stream<T, E>
fn Future::flatten_stream(Future<Stream<T, E>, E>) -> Stream<T, E>
fn Future::fuse          (Future<T, E>)            -> Future<T, E>
fn Future::catch_unwind  (Future<T, E>+UnwindSafe) -> Future<Result<T, E>, Any+Send>
fn Future::shared        (Future<T, E>)            -> Future<SharedItem<T>, SharedError<E>>+Clone
fn Future::wait          (Future<T, E>)            -> Result<T, E>
```
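Since `Result<T, E>` itself implements `IntoFuture` in futures-rs 0.1, one handy mental model is that the mapping and chaining combinators behave like `Result`'s methods of the same names, only deferred until the future is polled. The sketch below uses std's `Result` only, not futures-rs; `parse_and_double`, `with_default`, `describe` and `all_doubled` are made-up examples, and the last one only roughly mirrors `join_all` (all values, or an error as soon as one input fails):

```rust
// Std-only analogy: these Result combinators work like the Future
// combinators with the same names, except that they run immediately.
fn parse_and_double(s: &str) -> Result<i64, String> {
    s.parse::<i64>()
        // like Future::map_err: transform only the error
        .map_err(|e| e.to_string())
        // like Future::and_then: chain another fallible step on success
        .and_then(|n| n.checked_mul(2).ok_or_else(|| "overflow".to_string()))
}

fn with_default(s: &str) -> Result<i64, String> {
    // like Future::or_else: recover from an error with a fallback value
    parse_and_double(s).or_else(|_| Ok(0))
}

fn describe(s: &str) -> Result<String, String> {
    // like Future::map: transform only the success value
    with_default(s).map(|n| format!("got {}", n))
}

// Roughly like join_all: every value on success, otherwise an error
// (here: the first error in input order).
fn all_doubled(inputs: &[&str]) -> Result<Vec<i64>, String> {
    inputs.iter().map(|s| parse_and_double(s)).collect()
}

fn main() {
    println!("{:?}", parse_and_double("21"));
    println!("{:?}", with_default("oops"));
    println!("{:?}", describe("5"));
    println!("{:?}", all_doubled(&["1", "2", "3"]));
    println!("{:?}", all_doubled(&["1", "x"]));
}
```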


Using libraries

2017-02-25

Note: This article is somewhat biased toward Linux-like environments.

Native vs non-native

Native libraries: these are the ones you get through native compilation. This applies to languages such as C, C++, and Fortran. These libraries are generally interoperable with each other.

This category also includes modern natively compiled languages such as Haskell or Rust, but I’m not as familiar with those so I won’t discuss them here.

Non-native libraries: these are made of source code or bytecode that isn’t executed directly on the CPU. Examples include C#, Java, Perl, Python, Ruby, etc. Non-native libraries are usually highly language-specific and not interoperable with each other.

This article is only concerned with native libraries, specifically of the older family (C, C++, Fortran).

Parts of a library

Libraries are generally divided into two components:

The library file, which contains mostly machine code. This is the essential library implementation.

The file extensions are usually:
- .a/.so (ELF OSes like Linux),
- .a/.dylib (OS X),
- .lib/.dll (Windows).
The first extension in each pair is for static libraries, and the second is for shared libraries, a.k.a. dynamically-linked libraries (DLLs). The distinction is explained in the next section.

The library interface, which is a protocol that users of the library must abide by in order to use the library correctly. Failing to do so will often lead to crashes (segfaults) and brokenness.

For C and C++, this is usually distributed as a header file (.h, .hpp, etc.). There are ways to use a library even without a header file, but this is usually discouraged as it is error-prone. There is one exception, though: non-C/C++ programs (e.g. a Rust program) that want to use a C/C++ library often don’t use the header files at all, because they re-declare the interface in their own language instead.

For Fortran, the library interface is usually just the source code itself, or interchangeably the .mod files.
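To illustrate the Rust case from the C/C++ paragraph above: the program below calls `strlen` from the C standard library without ever reading <string.h>, because the `extern` block restates the C declaration by hand. This is a sketch; `c_strlen` is a hypothetical wrapper name, it assumes `size_t` corresponds to `usize` (true on mainstream platforms), and `unsafe extern` assumes Rust 1.82 or later. Nothing checks the hand-written signature against the real one, which is exactly where the error-proneness comes from:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hand-written equivalent of `size_t strlen(const char *s);` from <string.h>.
// No header is consulted: if this signature is wrong, nothing will catch it.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper (hypothetical name) around the C function.
fn c_strlen(s: &CStr) -> usize {
    // Safety: `s` is a valid NUL-terminated string for the whole call.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let s = CStr::from_bytes_with_nul(b"hello\0").unwrap();
    println!("{}", c_strlen(s));
}
```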

Static vs shared libraries

In static libraries, the library’s own code is copied and merged with the user’s code, creating one combined file. This means that after compilation, the static library file itself is neither needed nor used anymore. (Of course, you probably want to keep it in case you want to compile more programs with that same library.)

Note that static libraries are quite dumb: you cannot link more than one copy of the same static library into the same program. You might wonder “Why would anyone do that anyway?” Well, it’s often unintentional: say you use library A and library B, both of which use library C. If this whole thing is linked statically, then both A and B will bring in their own copy of C!

In shared libraries, the library’s own code remains a separate file. This means that later, when the program runs, it must load the library as a separate step. If the shared library can’t be found, the program will fail to launch. If the shared library is upgraded or changed, then the behavior of any program that depends on that library may change (this can be both useful and annoying).

Even though shared libraries are never incorporated into the final program, they are still needed during compilation, because the linker needs to know which symbols the library provides. On non-Windows systems, the linker simply reads the shared library itself (.so or .dylib). On Windows, linking requires a so-called import library, which has the extension .lib (not to be confused with static libraries on Windows, which use the same extension).

Installing a library

The first step in using a library is, obviously, to install it. The process varies a lot, so it’s best to read the documentation for that specific library. One of the most crucial things to figure out is where the library is going to be installed.

(Another crucial thing is to find out if the library has optional features, because they might be off by default, or turned off if the configuration step fails to find its dependencies!)

Prefixes

On Unix-like OSes, libraries are usually installed to the /usr/local prefix by default. This prefix means that all of the library’s relevant files will be installed to /usr/local/lib (library files), /usr/local/include (header files), /usr/local/bin (executable programs), etc. How you change the prefix depends on the build system used by the library:

For Autoconf-like build systems, this can often be changed through the ./configure script. The command would be something like
```
./configure --prefix=/my_custom_prefix
```
For the CMake build system, this can be changed using
```
cmake -DCMAKE_INSTALL_PREFIX=/my_custom_prefix .
```

It is considered bad practice to install libraries to the /usr prefix as that lies within the territory of the system package manager.

Including a library

To use the library in your C or C++ code, you’d typically add

```
#include <some_library_header.h>
```

to various places in your source code to inform the compiler about the library interface (by declaring various types, functions, and variables). These header files usually live in the include directory under the prefix where the library was installed.

If the library header files exist in a standardized location such as /usr/include, then the compiler will find them just fine. But if you installed them in an unusual location like /my_custom_prefix/include, then you’ll have to give a hint to your compiler so it knows where to look. This can be accomplished via the -I flag:

```
cc -c -I/my_custom_prefix/include my_program.c
```

(Replace cc with whatever compiler you use.) An alternative approach is to use the C_INCLUDE_PATH (C) and/or CPLUS_INCLUDE_PATH (C++) environment variables:

```
export C_INCLUDE_PATH=/my_custom_prefix/include
cc -c my_program.c
```

You can list multiple directories in the variable, separated by colons (:), just like PATH. These environment variables may not be recognized by all compilers, however; I know they work with gcc and clang, at least.

Linking with a library

When all the source files have been compiled with -c, you now need to link everything together, including any libraries you use. This is done using the -l flag:

```
cc my_program.o my_blah.o my_foo.o -lalpha -lbeta
```

The word that follows -l is the name of the library. If the library file is libalpha.so, then its name is just alpha.
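The naming rule can be spelled out as a small pair of functions. This sketch covers only the two Unix extensions listed earlier (.so and .a); `link_name` and `file_names` are hypothetical helper names:

```rust
// Convert between a library file name and the name given to `-l`, using the
// Unix naming convention lib<NAME>.so (shared) / lib<NAME>.a (static).
// Versioned names such as libalpha.so.1 are deliberately not handled here.
fn link_name(file_name: &str) -> Option<&str> {
    let rest = file_name.strip_prefix("lib")?;
    rest.strip_suffix(".so").or_else(|| rest.strip_suffix(".a"))
}

// The file names that `-l<name>` can refer to on an ELF system.
fn file_names(name: &str) -> [String; 2] {
    [format!("lib{}.so", name), format!("lib{}.a", name)]
}

fn main() {
    println!("{:?}", link_name("libalpha.so")); // prints Some("alpha")
    println!("{:?}", file_names("beta"));
}
```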

If alpha and/or beta are in an unconventional location such as /my_custom_prefix/lib, then you have to pass the -L flag to tell the compiler where to find them:

```
cc -L/my_custom_prefix/lib my_program.o my_blah.o my_foo.o -lalpha -lbeta
```

You can also do this using another colon-separated environment variable LIBRARY_PATH:

```
export LIBRARY_PATH=/my_custom_prefix/lib
cc my_program.o my_blah.o my_foo.o -lalpha -lbeta
```

Again, I think only some compilers recognize this variable.
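The lookup itself can be modeled roughly as follows. According to GCC's documentation, directories given with -L are searched before those in LIBRARY_PATH, and GNU ld looks for lib<NAME>.so before lib<NAME>.a within each directory. This is a simplified sketch for Unix-like systems (it ignores the linker's built-in default directories, -Bstatic, sysroots, and so on), and `search_dirs` and `find_library` are hypothetical names, not part of any real tool:

```rust
use std::env;
use std::path::PathBuf;

// Ordered list of directories to search: -L directories first, then the
// entries of a colon-separated LIBRARY_PATH-style value.
fn search_dirs(l_flags: &[&str], library_path: Option<&str>) -> Vec<PathBuf> {
    let mut dirs: Vec<PathBuf> = l_flags.iter().map(|d| PathBuf::from(*d)).collect();
    if let Some(value) = library_path {
        // split_paths splits on ':' on Unix, just like PATH.
        dirs.extend(env::split_paths(value));
    }
    dirs
}

// Return the first lib<name>.so or lib<name>.a found, in search order.
fn find_library(name: &str, dirs: &[PathBuf]) -> Option<PathBuf> {
    for dir in dirs {
        for file in [format!("lib{}.so", name), format!("lib{}.a", name)] {
            let candidate = dir.join(file);
            if candidate.is_file() {
                return Some(candidate);
            }
        }
    }
    None
}

fn main() {
    let library_path = env::var("LIBRARY_PATH").ok();
    let dirs = search_dirs(&["/my_custom_prefix/lib"], library_path.as_deref());
    println!("{:?}", find_library("alpha", &dirs));
}
```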

Loading a shared library

The last step is to make sure the program can actually load the shared library. Usually this is automatic, but if the library is in an unconventional place like /my_custom_prefix/lib, then you gotta give it another hint using yet another colon-separated environment variable, LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH if you’re using OS X):

```
export LD_LIBRARY_PATH=/my_custom_prefix/lib
./a.out
```

Alternatively, you can bake the library path directly into the program so you don’t need an environment variable to run it. This is what the -rpath linker option does (-Wl, passes it through the compiler to the linker):

```
cc -Wl,-rpath=/my_custom_prefix/lib my_program.o my_blah.o my_foo.o -lalpha -lbeta
```
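To tie the flags together: for a library installed under a single prefix, you need -I at compile time, and -L, -rpath and -l at link time. The sketch below assembles them, assuming the prefix/include and prefix/lib layout described earlier, a GNU-style `cc`, and Unix path separators; `flags_for_prefix` and `PrefixFlags` are hypothetical names. The -l flags are kept separate so they can go after the object files:

```rust
use std::path::Path;

// Flags needed for libraries installed under one prefix (hypothetical type).
struct PrefixFlags {
    compile: Vec<String>,   // for `cc -c`
    link_dirs: Vec<String>, // -L and rpath, placed before the object files
    libs: Vec<String>,      // -l flags, placed after the object files
}

fn flags_for_prefix(prefix: &Path, libs: &[&str]) -> PrefixFlags {
    let include = prefix.join("include");
    let lib = prefix.join("lib");
    PrefixFlags {
        compile: vec![format!("-I{}", include.display())],
        link_dirs: vec![
            format!("-L{}", lib.display()),
            format!("-Wl,-rpath={}", lib.display()),
        ],
        libs: libs.iter().map(|name| format!("-l{}", name)).collect(),
    }
}

fn main() {
    let f = flags_for_prefix(Path::new("/my_custom_prefix"), &["alpha", "beta"]);
    println!("cc -c {} my_program.c", f.compile.join(" "));
    println!("cc {} my_program.o {}", f.link_dirs.join(" "), f.libs.join(" "));
}
```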

You can specify a path relative to the directory containing the program itself using the ${ORIGIN} placeholder. Note that this is not a shell variable, so be sure to single-quote it in the shell:

```
cc -Wl,-rpath='${ORIGIN}/lib' …
```
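What ${ORIGIN} expands to can be sketched as a simple string substitution: it stands for the directory containing the program (more precisely, the object that carries the rpath), not the current working directory. This is a simplified model with a hypothetical `expand_origin` helper; the real dynamic loader also understands other tokens such as $LIB and $PLATFORM:

```rust
use std::path::{Path, PathBuf};

// Simplified model of how the dynamic loader expands an rpath entry:
// ${ORIGIN} (or $ORIGIN) becomes the directory holding the executable.
fn expand_origin(rpath_entry: &str, executable: &Path) -> Option<PathBuf> {
    let origin = executable.parent()?.to_str()?;
    let expanded = rpath_entry
        .replace("${ORIGIN}", origin)
        .replace("$ORIGIN", origin);
    Some(PathBuf::from(expanded))
}

fn main() {
    let exe = Path::new("/opt/app/bin/a.out");
    println!("{:?}", expand_origin("${ORIGIN}/lib", exe));
}
```

So a program installed as /opt/app/bin/a.out and linked with '${ORIGIN}/lib' looks in /opt/app/bin/lib, wherever the whole directory tree is moved.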
