
Instantaneous Codes (including the "Fibonacci Code")

Swammerdami

Squadron Leader · Joined Dec 15, 2017
(A recent thread on Fibonacci numbers reminded me of an interesting tidbit about those numbers: I thought of the "Fibonacci code." But I didn't want to hijack that thread, and, anyway, talk of the "Fibonacci Code" segues into a more general discussion of Instantaneous Codes.)

And I am reminded of how fast my aging brain is dwindling. :( I once wrote a paper about these objects and derived certain algebraic relationships. But I don't have a copy of that paper and am now at a loss to recover the ideas.

Consider the following identity regarding the Fibonacci numbers (F_1 = 0; F_2 = 1; F_k = F_{k-1} + F_{k-2}):

1 = F_1·2^-1 + F_2·2^-2 + F_3·2^-3 + ... + F_k·2^-k + ...

Below I explain why I know this identity to be true. ... But I don't see how to prove it directly!
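As a quick numeric sanity check (a Python sketch; the function name is mine), the partial sums of the series do converge to 1 under the indexing F_1 = 0, F_2 = 1. (One direct route to a proof: the generating function identity Σ F_k·x^k = x²/(1−x−x²) for this indexing evaluates to exactly 1 at x = 1/2.)

```python
def fib_partial_sum(terms):
    """Partial sum of F_k * 2^-k for k = 1..terms,
    with F_1 = 0, F_2 = 1, F_k = F_{k-1} + F_{k-2}."""
    f_prev, f_curr = 0, 1            # F_1, F_2
    total = f_prev * 2.0**-1 + f_curr * 2.0**-2
    for k in range(3, terms + 1):
        f_prev, f_curr = f_curr, f_prev + f_curr
        total += f_curr * 2.0**-k
    return total

print(fib_partial_sum(100))  # converges to 1.0
```

The series converges because F_k grows like φ^k and φ/2 < 1, so the terms shrink geometrically.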


This post offers a simple practical look at encoding discrete tokens into bit strings for data compression. I consider only "Instantaneous" codes; this means arithmetic and quasi-arithmetic codes are off limits. (To keep discussion brief and focused, I omit a rigorous definition of "instantaneous" and other such matters.)

Huffman coding is the usual method for approaching this coding problem, but I avoid that here for three reasons: (1) We may want to support arbitrarily large sets of tokens; (2) We may want a "universal" mathematically-based code rather than the arbitrary histogram-driven Huffman method; (3) The universal codes we thus consider may be interesting to look at!

We shall assume that the tokens to be encoded are either positive integers (drawn from {1,2,3,4,5,...}) or non-negative integers (drawn from {0,1,2,3,4,5,...}). Sometimes one of these is more convenient than the other, and for a simple practical reason I will usually not bother with the distinction: deduce from context whether or not zero is included in the domain! (The reason this is unimportant is that we can easily convert one code type to the other. If C(x) is an instantaneous code which encodes positive integers x, then we can simply write C'(y) = C(y+1) to define a code which encodes non-negative integers y.)

So, our goal is to replace a sequence of whole numbers, for example ( 1, 8, 6, 2 ), with a string of bits, say the 16 bits ( 1100001110011011 ). (That string is NOT arbitrary; it is the actual encoding using a method of instantaneous coding defined below and called Fibonacci coding.)

One way to understand our task is that we need to get rid of the commas!

Note that ( 1, 8, 6, 2 ) has a very simple representation in bits using ordinary binary: ( 1, 1000, 110, 10 ). This is only ten bits compared with 16 bits for the Fibonacci code; but what are we going to do with those commas? If we simply discard them we get ( 1100011010 ) and won't know how to parse the output into 4 tokens. We're only allowed two symbols (0 and 1); a comma is thus an illegal third symbol! (A coding alphabet larger than 2 symbols is possible, though the cost per code letter increases, but we don't consider such alphabets in this note.)

A simple solution for this case would seem to be to render all numbers with 4 bits, i.e. ( 0001, 1000, 0110, 0010 ). Now we can erase all the commas -- the boundaries are defined by simply separating the output bit string into groups of 4 bits. We achieve the same 16-bit cost as the Fibonacci code. But this will fail if one of the numbers we want to encode is larger than 15.

We want a "universal" method that allows arbitrarily large integers to be encoded. The fixed 4-bit integer encoding ain't it.

There is a simple way to adapt this method. Use however many 4-bit blocks are needed to represent the integer: for a 40-bit integer, use 10 blocks; for a 41-bit integer, use 11 blocks. Then, to allow the "commas" to be located implicitly, add a prefix bit (or suffix bit) to each block (prefix a 1 if it's the final block of the integer, a 0 otherwise). Our example ( 0001 1000 0110 0010 ) now becomes ( 10001 11000 10110 10010 ). Twenty bits. Replace the 6 with 22 and we'd need twenty-five bits: ( 10001 11000 00001 10110 10010 )
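The block scheme just described can be sketched in a few lines of Python (function name mine); it reproduces the 20-bit and 25-bit examples above:

```python
def block_encode(n, width=4):
    """Encode a positive integer as width-bit blocks, each prefixed
    with 1 for the final block and 0 for a continuation block."""
    bits = format(n, 'b')
    # Left-pad so the length is a multiple of `width`.
    bits = bits.zfill(-(-len(bits) // width) * width)
    blocks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return ''.join(('1' if i == len(blocks) - 1 else '0') + b
                   for i, b in enumerate(blocks))

print(' '.join(block_encode(n) for n in (1, 8, 6, 2)))   # 20 bits
print(' '.join(block_encode(n) for n in (1, 8, 22, 2)))  # 25 bits
```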

I've shown this simple work-around to clarify the construction task. We will look at methods more interesting than that one.

The simplest coding is sometimes called the Stone Age code or unary code. ( 1, 8, 6, 2 ) is encoded as ( 1, 11111111, 111111, 11 ). N is encoded with N 1-bits. Change the commas to zeros and we're done. We then use 21 bits altogether: ( 10 111111110 1111110 110 )

Any instantaneous code will have a condition for defining the end of the output bit-string for a token. The Stone Age code has one of the simplest possible such conditions: Any 0-bit terminates an output token.
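That termination condition makes both directions trivial to code up (a Python sketch; names are mine). Splitting the bit stream on 0-bits recovers the tokens:

```python
def unary_encode(n):
    """Stone Age (unary) code: n one-bits, then the terminating zero."""
    return '1' * n + '0'

def unary_decode(bits):
    """Every 0-bit ends a token, so splitting on '0' recovers the run lengths."""
    return [len(run) for run in bits.split('0')[:-1]]

encoded = ''.join(unary_encode(n) for n in (1, 8, 6, 2))
print(encoded, len(encoded))  # 21 bits, as in the example above
```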

Does the Stone Age code seem too primitive to be useful? It is the optimal code for some probability distributions! And that is our goal, the problem being addressed: knowing the occurrence probabilities of the various tokens, what encoding minimizes the average total number of bits in the compressed representation?

Two distributions we consider are the 'Power Law' of parameter q and the 'Geometric Law' of parameter p. In the former, the probability of token N is N^-q / ζ(q). In the latter, the probability of token N is (1-p)^N·p. (Never mind ζ(q); it is just present for normalization.)

And it turns out that the Stone Age code is optimal for Geometric distributions with p = 0.5. The same code is nearly optimal for Power distributions with large parameter.
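This optimality claim is easy to check numerically (a Python sketch, not from the original post): for the geometric distribution with p = 0.5, the entropy and the expected unary code length both come out to exactly 2 bits.

```python
# Compare the entropy of a geometric distribution (p = 0.5, tokens
# N = 0, 1, 2, ...) against the expected Stone Age code length.
import math

p = 0.5
terms = 200  # truncation; the tail beyond this is negligible
probs = [(1 - p)**n * p for n in range(terms)]
entropy = -sum(q * math.log2(q) for q in probs)
expected_len = sum(q * (n + 1) for n, q in enumerate(probs))  # unary length of n is n+1
print(entropy, expected_len)  # both approach 2.0 bits
```

Since the code length n+1 equals -log2 of the probability 2^-(n+1), the code is exactly optimal here.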

A useful way to describe an instantaneous code is with the lengths of the output bit strings. For the Stone Age code, these lengths (L) are
(L) = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...)
We will expect and desire the sum of 2^-L over all tokens to equal 1 (call this condition the Kraft Equality), and indeed 2^-1 + 2^-2 + 2^-3 + ... = 1

Instead of listing the lengths, we can list the populations (B) of length bins: In this example (B) = (1, 1, 1, 1, 1, 1, 1, ... )

Do you see how to modify the Stone Age code slightly to get lengths of (1, 2, 3, 4, 5, 7, 7, 8, 8, 9, 9, 10, 10, ...) ? (Or equivalently bin populations of (1, 1, 1, 1, 1, 0, 2, 2, 2, 2, 2, 2, ...)?) Rather than describing it in detail, I just wave my hands and say "It must be doable, because the Kraft Equality is still satisfied!" This revision will give slightly better compression than the Stone Age Code on Power Law data with parameter q ≈ 3.6.
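That Kraft claim can be verified exactly with rational arithmetic (a Python sketch; names mine). Dropping the length-6 codeword and doubling up every length from 7 on leaves the sum at exactly 1:

```python
# Exact Kraft-Equality check for the modified length list
# (1, 2, 3, 4, 5, 7, 7, 8, 8, 9, 9, ...).
from fractions import Fraction

def kraft_sum(lengths):
    return sum(Fraction(1, 2**L) for L in lengths)

# Two codewords of each length from 7 up to (a truncation at) 59.
lengths = [1, 2, 3, 4, 5] + [L for L in range(7, 60) for _ in range(2)]
# The truncated tail is 2 * sum_{L>=60} 2^-L = 2^-58; add it back symbolically.
print(kraft_sum(lengths) + Fraction(1, 2**58))  # exactly 1
```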

(If you Google, look for "Kraft INequality". I prefer to speak of strict equality, since I don't like deliberately inefficient codes!)

So the unary code is optimal or near-optimal for very skewed distributions. Let's look at more interesting cases.

Something called the Golomb Code of parameter G is optimal for ALL geometric distributions. (The unary code is the Golomb Code with G=1.) Golomb coding begins by converting the input token N into quotient and remainder via N = a·G + b, with 0 ≤ b < G.
Then encode a using the Stone Age code and follow that with b encoded in simple binary. For example, the Golomb Code with G=4 (or G=8) will be optimal when p ≈ .17 (or p ≈ .08), and b will be coded with 2 (or 3) bits.

For G=4, (B) = (0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...)
For G=8, (B) = (0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, ...)
When G is not a power of two, the Golomb Code is slightly less straightforward:
For G=5, (B) = (0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, ...)
For G=6, (B) = (0, 0, 2, 6, 6, 6, 6, 6, 6, 6, 6, ...)
For G=7, (B) = (0, 0, 1, 7, 7, 7, 7, 7, 7, 7, 7, ...)
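A Golomb encoder covering both cases can be sketched in Python (function name mine); the non-power-of-two remainders use the standard "truncated binary" trick, which reproduces the bin tables above:

```python
def golomb_encode(n, g):
    """Golomb code of parameter g: quotient a in unary, remainder b in
    truncated binary (plain fixed-width binary when g is a power of two)."""
    a, b = divmod(n, g)
    if g == 1:
        return '1' * a + '0'          # pure Stone Age / unary code
    k = (g - 1).bit_length()          # ceil(log2 g)
    u = (1 << k) - g                  # how many remainders get the short form
    if b < u:
        tail = format(b, 'b').zfill(k - 1)
    else:
        tail = format(b + u, 'b').zfill(k)
    return '1' * a + '0' + tail

# For g = 5, the first ten tokens should fill the bins (0, 0, 3, 5, 2, ...).
print([golomb_encode(n, 5) for n in range(10)])
```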

Sometimes N=0 will be much more common than implied by a geometric distribution; do you see the easy way to modify the Golomb Code (G=4) by using '0' as the encoding of N=0 to get (B) = (1, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, ...) ?

Whew! It's taken me this long to tersely set up the background, and I still haven't come to interesting codes like the Fibonacci Code.

Without further ado, let's mention the so-called Base-Phi Number system. In decimal, we assign mass to bins in a greedy fashion. For example, to represent N=139:

Code:
Bins    100    10    1
139  =  100 +  30 +  9

Base-Phi uses Fibonacci Numbers instead of 1, 10, 100.

Code:
Bins    89    55    34    21    13    8    5    3    2    1
139  =  89 +   0 +  34 +   0 +  13 +  0 +  0 +  3 +  0 +  0


We only need the digits 0 and 1, so 139 is rendered as 1010100100 in Base-Phi. Note that a "greedy" decomposition ensures that there are never two consecutive 1-bits. Now reverse the bits and post-fix a 1 to get 00100101011. Voila! The Fibonacci Code. It is the optimal code for a Power Law with q ≈ 1.44.
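The greedy ("Zeckendorf") construction just described can be sketched in Python (function name mine); it reproduces both the 139 example and the 16-bit encoding of ( 1, 8, 6, 2 ) from earlier:

```python
def fibonacci_encode(n):
    """Greedy Base-Phi digits of n, reversed, with a '1' appended, so
    every codeword ends in '11' and '11' never appears earlier."""
    fibs = [1, 2]                    # Fibonacci bins 1, 2, 3, 5, 8, ...
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs):         # greedy decomposition, MSB first
        if f <= n:
            bits.append('1')
            n -= f
        else:
            bits.append('0')
    # Drop leading zeros, reverse, append the terminating 1-bit.
    return ''.join(bits).lstrip('0')[::-1] + '1'

print(fibonacci_encode(139))                              # 00100101011
print(''.join(fibonacci_encode(n) for n in (1, 8, 6, 2))) # 1100001110011011
```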

The termination condition for this instantaneous (Fibonacci) code is elegant: a token ends as soon as two consecutive 1-bits are seen.

The length-bin populations are (0, 1, 1, 2, 3, 5, 8, 13, 21, ...). If all possible such output bit-strings occur in the code -- and they do -- then we know from the Kraft Equality that 2^-2 + 2^-3 + 2·2^-4 + 3·2^-5 + 5·2^-6 + 8·2^-7 + 13·2^-8 + 21·2^-9 + 34·2^-10 + ... = 1. We know this whether we can prove it another way or not!

Similarly, an instantaneous code can be defined by a token-ending condition of three consecutive 1-bits. The length-bin populations are now (0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, ...). These are Tribonacci-like numbers, where the three most recent numbers (instead of two) are summed. This is near-optimal for a Power Law with q ≈ 1.23.
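Those bin populations can be confirmed by brute force (a Python sketch; name mine): count the bit strings of each length that end in '111' and contain no earlier '111'.

```python
# Count, for each length L, the bit strings that end in '111' and
# contain no earlier occurrence of '111' -- the length-bin populations
# of the three-consecutive-ones termination code.
from itertools import product

def tribonacci_bins(max_len):
    bins = []
    for length in range(1, max_len + 1):
        count = 0
        for tup in product('01', repeat=length):
            s = ''.join(tup)
            if s.endswith('111') and '111' not in s[:-1]:
                count += 1
        bins.append(count)
    return bins

print(tribonacci_bins(8))  # [0, 0, 1, 1, 2, 4, 7, 13]
```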

We NEED not stop there! There can be compound termination conditions, etc. But I WILL stop.
 
It seems almost magical the way that the Base-Phi number system, built by using the Fibonacci numbers (1, 2, 3, 5, 8, 13, 21, ...) in place of the powers of ten (1, 10, 100, 1000, 10000, ...), immediately transforms into a perfect instantaneous code! And not just any instantaneous code, but the optimal code for a power-law distribution with parameter q ≈ 1.44. I once noticed this by myself, but my invention wasn't novel: I'm sure it was discovered several decades ago.

To get the instantaneous code, simply reverse the bits and postpend a '1'. Voilà! Presto!!

The elegance is breath-taking! Of course I was slightly disappointed that nobody has commented on it, but I can understand. The sheer beauty of the Fibonacci Code stunned you all into silence! :)

A few hours ago I posed myself a question that I didn't consider when last I pondered these things back in the 20th century. What about the Tribonacci numbers: 1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, 927, 1705, 3136, 5768, 10609,...? As I mentioned in OP, these seem to lead to the instantaneous code ideal for a q≈1.23 power law. Is the construction as simple as for the Fibonacci Code?

By analogy with "Base-Phi" (the representation of any integer as a "greedy" sum of Fibonacci numbers), there is a "Base-1.8393" representation based on Tribonacci numbers which renders an integer with 0s and 1s, and never three 1s in a row.

But does this immediately lead to a perfect instantaneous code? Can we reverse the bits and postpend '11' to get such a code?

No; not quite. It breaks down because some representations will have '11' as the two MSBs and some will have '10'. In the former case, after reversal we can only postpend a single '1' bit; in the latter case we need '11'.

I mulled this a bit. Here's the best I came up with. I'll use the same N=139 example as in OP:
Code:
MostSig Bit:     97     53     29     16      9      5      3      2      1
Main Bins       149     81     44     24     13      7      4      2      1

139 =            97 +    0 +    0 +   24 +   13 +    0 +    4 +    0 +    1
The procedure is the same as we saw for the Fibonacci code, except that the MSB has a special weight, shown as the first line in the above table. These weights are just what they need to be to ensure that the two MSBs form '10', not '11'. As seen, 139 = 100110101 in this hybrid representation. Now we simply reverse the bits and postpend '11' to get the Tribonacci Code optimal for encoding power-law integers with q≈1.23. (10101100111 in the example.) The end of a code string is defined by '111.' Bravo.

Those weights are generated just like the Tribonacci numbers (sum of the three previous terms) except that 1 is subtracted. Here they are: 1, 2, 3, 5, 9, 16, 29, 53, 97, 178, 327, 601, 1105, 2032, 3737, 6873, 12641,...
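Here is a sketch of that hybrid scheme in Python (function name mine; it assumes the greedy decomposition always terminates at zero, which the examples below bear out). The MSB uses the "minus one" weights; lower positions use the ordinary Tribonacci bins:

```python
def tribonacci_encode(n):
    """Hybrid Tribonacci code: the highest position uses the special
    weights 1, 2, 3, 5, 9, 16, 29, 53, 97, ... (sum of previous three,
    minus 1); lower positions use the Tribonacci bins 1, 2, 4, 7, 13, ..."""
    main = [1, 2, 4]                 # Tribonacci weights
    msb = [1, 2, 3]                  # special MSB weights
    while msb[-1] <= n:
        main.append(sum(main[-3:]))
        msb.append(sum(msb[-3:]) - 1)
    # Highest position whose special MSB weight fits into n.
    i = max(j for j in range(len(msb)) if msb[j] <= n)
    bits = ['1']
    n -= msb[i]
    for j in range(i - 1, -1, -1):   # greedy over the main Tribonacci bins
        if main[j] <= n:
            bits.append('1')
            n -= main[j]
        else:
            bits.append('0')
    assert n == 0                    # the greedy decomposition must land exactly
    return ''.join(bits)[::-1] + '11'

print(tribonacci_encode(139))  # 10101100111, as in the worked example
```

The codeword lengths for n = 1, 2, 3, ... come out as 3, 4, 5, 5, 6, 6, 6, 6, ..., matching the length-bin populations (0, 0, 1, 1, 2, 4, ...) given earlier.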
 