Bash on Windows is really slow

Bash is really slow on Windows, as a few quick benchmarks make clear.

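The original timing table isn't reproduced here, but you can get a feel for the gap with a loop like the following (a rough sketch; absolute numbers depend heavily on the machine — the relative cost is what matters):

```bash
#!/usr/bin/env bash
# Time 1000 iterations of a builtin no-op, an external command, and a
# subshell. On Linux all three are cheap; under Cygwin the latter two
# are dramatically slower, since each iteration forks a new process.
time for ((i = 0; i < 1000; i++)); do :; done           # builtin
time for ((i = 0; i < 1000; i++)); do /bin/true; done   # external (fork+exec)
time for ((i = 0; i < 1000; i++)); do (:); done         # subshell (fork)
```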
Notice that running a non-builtin command can be two orders of magnitude more expensive on Windows than on Linux. Creating a subshell is also very slow. Running 10 commands could cost as much as a full second, if not more! This is why shell scripts such as configure scripts are incredibly slow on Windows.

The results here are from Cygwin, but those from MinGW are in the same ballpark. The culprit is most likely fork: Windows offers no direct support for it, so emulating it is inherently slow, and there's little that can be done to avoid the performance penalty.

Moral of the story? If you are writing shell scripts for Windows, try to avoid making calls to non-builtin commands and avoid making subshells when possible. Unfortunately, there’s not much you can do with only builtin commands. Furthermore, many operations, such as command substitution $(…) and pipelines … | … will implicitly spawn subshells. :(
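Several common fork-heavy patterns do have builtin replacements, though. A sketch (the path here is made up for illustration):

```bash
#!/usr/bin/env bash
path=/tmp/some/file.txt

# Instead of base=$(basename "$path"): the command substitution forks
# a subshell, and basename is an external command.
base=${path##*/}    # parameter expansion: no fork
echo "$base"        # file.txt

# Instead of ext=$(echo "$base" | cut -d. -f2): a pipeline plus an
# external command, both avoided with another parameter expansion.
ext=${base##*.}
echo "$ext"         # txt
```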

On the bright side, printf is a builtin command in Bash, so at least you don’t have to choose between printf and echo for performance reasons (the latter being a portability nightmare).


The usual arithmetic conversions

In C and C++, the usual arithmetic conversions are a set of implicit conversions performed to match the operand types in built-in binary operations such as addition, subtraction, and less-than (note that the bit-shift operators are not included). These conversions can occasionally result in subtle bugs, such as:

A somewhat contrived example, but the point is that mixing signed and unsigned integers can often result in unexpected conversions. In the expression n - i of this example, because n is unsigned and has the same conversion rank as i, i is implicitly converted into unsigned. The result is therefore also unsigned, causing the loop condition to be trivially satisfied. Therefore, it’s always a good idea to enable compiler warnings for signed-to-unsigned conversions (-Wsign-conversion in GCC and Clang).

The exact rules are rather long and arcane so I won’t duplicate them here, but the key is to recognize that there is actually a rationale behind them, which can be summarized as the following rules:

  • Rule A. The final conversion rank is always the larger of the two, no more, no less. The rank is related to the size of integer types, so you’ll never get a type whose size is larger than expected.
  • Rule B. The conversion will never cause a signed integer overflow, so undefined behavior won’t occur due to the conversion itself (it can still occur due to the operation, however).

Note: Although Rule A holds for the usual arithmetic conversions, integer promotion does violate Rule A. In addition, the rationale presented here is a “best guess” and doesn’t necessarily correlate with the rationale of the standards committee.

The unfortunate corollary of the two rules is that frequently you will end up with signed-to-unsigned conversions because the compiler is actively avoiding a potential signed overflow while preserving Rule A. When can this happen? When both of the following hold:

  • Naturally, one operand needs to be unsigned and the other operand needs to be signed.
  • The type of the signed operand can’t represent every possible value of the unsigned operand’s type. That is, converting the unsigned operand to the signed type could overflow.

In fact, this is really the only scenario that you have to worry about as far as the usual arithmetic conversions are concerned (for integers). In all other cases, the value is preserved by this conversion and hence you won’t see any surprises.

Note: I haven’t actually said anything about floating-point numbers. They are like a trump card: any time a floating-point number is present, the conversion will favor the most precise floating-point number. This is not quite as bad as the signed-to-unsigned conversions, but it can still be lossy if you have very large integers.

Official reference: N1570 Section


An algorithm for comparing large rational numbers in scientific notation

Say you’re given two numbers in the following representation:

x / y × 10 ^ a
u / v × 10 ^ b

with all the variables being integers and, in particular, y and v being always positive. How would you compare these two numbers efficiently in software? For sufficiently small numbers, it’s not a problem since we can rely on floating-point hardware. But what about exact, arbitrarily large numbers?

You could convert them both into fractions with the same denominator and then compare the integers, but that can be rather inefficient if a and b are wildly different. (Similarly, addition and subtraction can be quite expensive if a and b differ greatly, but sadly there’s no way to get around that.)

The algorithm

We can write this in the form of an inequality:

x / y × 10 ^ a  ~  u / v × 10 ^ b

where the “unknown variable” to be solved is ~ and can be >, =, or <. We can manipulate this just like an equation, but with one caveat: we mustn’t multiply or divide by a number unless we are absolutely sure that it’s positive.

Since y and v are both positive, we can safely multiply by those to obtain:

x × v × 10 ^ a  ~  u × y × 10 ^ b

Now let’s define some variables to clean up the inequality:

m ≡ x × v
n ≡ u × y

It’s helpful to split the variables m and n into their signs and moduli:

σ ≡ sgn m
τ ≡ sgn n

M ≡ |m|
N ≡ |n|

The inequality then becomes:

σ × M × 10 ^ a  ~  τ × N × 10 ^ b

The signs σ and τ can already be used to determine the ordering in 7 out of the 9 cases:

 σ   τ  ~
−1  −1  ?
−1   0  <
−1  +1  <
 0  −1  >
 0   0  =
 0  +1  <
+1  −1  >
+1   0  >
+1  +1  ?

That leaves us with the two nontrivial cases where both signs are positive or both signs are negative. In these cases:

σ = τ = +1   ⟹   M × 10 ^ a  ~  N × 10 ^ b
σ = τ = −1   ⟹   N × 10 ^ b  ~  M × 10 ^ a

We’ll consider only the first case. The second case is identical, just reversed.

The trick to making the comparisons more efficient is to use logarithms: since the logarithm is monotonic, it preserves the ordering and can be safely applied to both sides of the inequality:

log M + a  ~  log N + b

The problem is that logarithms aren’t that easy to compute. But we can sidestep this by computing the floored logarithm (ilog) and its fractional part (rlog), which satisfy the following for any positive z:

log z ≡ ilog z + rlog z

0 ≤ rlog z < 1

ilog z ∈ ℕ

Floored logarithms are easy to calculate: simply combine trial division with exponential search, as in the implementation of integerLogBase in section 14.4 of The Haskell 98 Language Report.
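The Report’s listing isn’t reproduced here, but a sketch along the same lines (reusing this document’s ilog name; it assumes base b > 1 and argument z > 0) might look like:

```haskell
-- Floored logarithm: ilog b z is the largest n with b^n <= z,
-- for b > 1 and z > 0. Exponential search on squared bases narrows
-- the answer quickly; trial division then pins down the exact value.
ilog :: Integer -> Integer -> Integer
ilog b z
  | z < b     = 0
  | otherwise = finish (z `div` (b ^ l)) l
  where
    l = 2 * ilog (b * b) z          -- invariant: b^l <= z < b^(l+2)
    finish r n                      -- divide out the remaining factors
      | r < b     = n
      | otherwise = finish (r `div` b) (n + 1)

main :: IO ()
main = mapM_ print [ilog 10 999, ilog 10 1000, ilog 2 1023]  -- 2, 3, 9
```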

Applying ilog and rlog to the inequality, we obtain:

ilog M + rlog M + a  ~  ilog N + rlog N + b

Notice that the fractional parts in the inequality satisfy:

|rlog N − rlog M| < 1

thus, as long as the integral parts of the inequality differ, the comparison can be done without even worrying about the fractional parts:

ilog M + a  ~  ilog N + b

Lastly, what about the situation where the integral parts are equal? We could use the fractional part of the logarithm somehow, but that would introduce unnecessary complexity here. Instead, we can fall back to the earlier inequality:

M × 10 ^ a  ~  N × 10 ^ b

To perform this comparison without using rational arithmetic, we can divide both sides by 10 ^ min(a, b) to make the exponents nonnegative:

M × 10 ^ (a − min(a, b))  ~  N × 10 ^ (b − min(a, b))

Now, this comparison can be computed directly using only integer arithmetic.
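Putting the σ = τ = +1 case together in Haskell (a sketch; cmpPos and the digit-count ilog10 are our names, M and N must be positive, and ilog10 here is just a stand-in for the general floored logarithm):

```haskell
-- Floored base-10 logarithm of a POSITIVE Integer: one less than its
-- number of decimal digits. A quick stand-in for a general ilog.
ilog10 :: Integer -> Integer
ilog10 = fromIntegral . subtract 1 . length . show

-- Compare M * 10^a against N * 10^b for positive integers M, N.
cmpPos :: Integer -> Integer -> Integer -> Integer -> Ordering
cmpPos m a n b
  | lhs /= rhs = compare lhs rhs                       -- integral parts decide
  | otherwise  = compare (m * 10 ^ (a - e)) (n * 10 ^ (b - e))
  where
    lhs = ilog10 m + a
    rhs = ilog10 n + b
    e   = min a b                                      -- keep exponents nonnegative

main :: IO ()
main = mapM_ print
  [ cmpPos 999 0 1    3   -- LT: 999     < 1 * 10^3
  , cmpPos 5   2 50   1   -- EQ: 500     = 500
  , cmpPos 3   5 2999 2   -- GT: 300000  > 299900
  ]
```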

Algorithmic complexity

The algorithm involves performing some integer arithmetic and comparisons, as well as handling several subcases. Let

X = max(|x|, |u|)
Y = max(y, v)
A = max(|a|, |b|)

The complexity of this algorithm holds no surprises: in the worst case, one needs to calculate the floored logarithm, which is roughly O(max(log X, log Y)). Therefore, the overall complexity is roughly:

O(max(log X, log Y, log A))

Compare this to the naive technique that involves converting both numbers into fractions, which in the worst case could be as bad as O(A).