Zeckendorf's theorem and property-based testing

Beware of bugs in the above code; I have only proved it correct, not tried it.

- Donald Knuth

Reading Donald Knuth's Concrete Mathematics, I was delighted to learn about the wonderfully named Zeckendorf's Theorem. Zeckendorf's Theorem states (from Wikipedia)

Every positive integer can be represented uniquely as the sum of one or more Fibonacci numbers in such a way that the sum does not include any two consecutive Fibonacci numbers.

Finding a number's Zeckendorf representation

A number $N$'s Zeckendorf representation is the list of unique Fibonacci numbers that sum to $N$. The Zeckendorf representation of $N$ can be obtained by always selecting the largest Fibonacci number, $f$, less than or equal to $N$, decrementing $N$ by $f$, and continuing until $N = 0$.

The approach can be broken down into three pieces

  1. Generating a list of Fibonacci numbers up to $N$
  2. Getting the largest Fibonacci number less than or equal to $N$
  3. Decrementing $N$ by the Fibonacci number from (2) and repeating until $N = 0$

Generating Fibonacci numbers

The easy part, start with $f_{0} = 0$ and $f_{1} = 1$ and continue to generate $f_{n} = f_{n-1} + f_{n-2}$ while $f_{n} < N$.

In [1]:
from typing import List

def fibonacci_numbers_up_to(num: int) -> List[int]:
    """Returns a sorted list of fibonacci numbers up to a given number"""
    fibonacci_nums = [0, 1]
    next_fibonacci = fibonacci_nums[-2] + fibonacci_nums[-1]
    while next_fibonacci < num:
        fibonacci_nums.append(next_fibonacci)
        next_fibonacci = fibonacci_nums[-2] + fibonacci_nums[-1]

    return fibonacci_nums

assert fibonacci_numbers_up_to(6) == [0, 1, 1, 2, 3, 5]
assert fibonacci_numbers_up_to(13) == [0, 1, 1, 2, 3, 5, 8]

Getting the largest element less than or equal to $N$

Step (2) from above. While not the most efficient, a simple way to find the largest element in a list less than or equal to a given number $N$ is to sort the list from largest to smallest and iterate over it until the first number less than or equal to $N$ is found.

In [2]:
def largest_element_lte(list_: List[float], num: int) -> int:
    """Returns the largest element in a list less than or equal to given number

    Raises ValueError if no elements in the list are less than or equal
    """
    for element in sorted(list_, reverse=True):
        if element <= num:
            return element
    else:
        raise ValueError(f"No element found less than or equal to {num}")

fibonaccis_up_to_twenty = fibonacci_numbers_up_to(20)
assert largest_element_lte(fibonaccis_up_to_twenty, 6) == 5
assert largest_element_lte(fibonaccis_up_to_twenty, 13) == 13

Finding the Zeckendorf representation

Putting together the three steps from above, build up a list of the next Fibonacci numbers that sum to $N$.

In [3]:
def zeckendorf_representation(num: int) -> List[int]:
    """Returns a sorted list of fibonacci numbers that sum to a given number"""
    if num < 1:
        raise ValueError("Only positive integers have a Zeckendorf representation")

    fibonaccis = fibonacci_numbers_up_to(num)

    zeckendorf_numbers = []
    while num > 0:
        next_zeckendorf = largest_element_lte(fibonaccis, num)
        zeckendorf_numbers.append(next_zeckendorf)
        num -= next_zeckendorf

    return list(reversed(zeckendorf_numbers))

Testing the implementation

There's a subtle bug in the above implementation that would pass by all but the most astute reviewers. In code bases without good unit tests, it would be very easy for this pass peer-review with the bug unnoticed.

It would also be easy to write some unit tests (albeit not great ones) that would fail to catch the bug. Indeed, having some unit tests could make a peer-reviewer even less likely to notice the issue as the tests might give a false sense of correctness.

Here are three simple unit tests that make the implementation appear correct

In [4]:
assert sum(zeckendorf_representation(100)) == 100
assert zeckendorf_representation(64) == [1, 8, 55]
assert zeckendorf_representation(20) == [2, 5, 13]

A careful reviewer should be able to see an important case that's missing from the testing above, but it is also easy to see how the cases make the implementation look correct at first glance.

The case that is not checked is finding the Zeckendorf represenation of a Fibonacci number. Its Zeckendorf representation should simply be itself. The cases below may not look obviously incorrect, but the second case is wrong as the theorem states that the sum should not include any two consecutive Fibonacci numbers, and 5 and 8 are consecutive Fibonaccis.

In [5]:
assert sum(zeckendorf_representation(13)) == 13
assert zeckendorf_representation(13) == [5, 8]

The bug exists in generating the list of Fibonacci numbers up to $N$ when in fact we need the list of Fibonacci numbers up to and including $N$ (or up to $N+1$).

I've built up to this to bring up a much better way of testing this function. Rather than manually coming up with a few cases to check, we can use Hypothesis to generate test cases for us.

Hypothesis is a Python implementation of property-based testing which allows us to describe the inputs to our function and the properties that our function must always satisfy. Hypothesis will then generate a huge number of test cases of various inputs and check that our properties always hold.

The input to our zeckendorf_representation is just a single postive integer. We'll describe this for Hypothesis and let it come up with a bunch of positive integers to run through our implementation.

Our function should satisfy two properties which come directly from Zeckendorf's Theorem

  1. The Zeckendorf representation of $N$ must sum to $N$
  2. The Zeckendorf representation should not contain consecutive Fibonacci numbers

Property (1) is easy to test, simply check that the sum of the return value is equal to the number that we put in.

Testing property (2) is slightly more involved, but if we have a list of Fibonacci numbers then we can check that all numbers in the returned Zeckendorf representation are at least two indices apart in the Fibonacci list.

In [6]:
from hypothesis import given, strategies

@given(strategies.integers(min_value=1))
def test_zeckendorf_representation(num: int):
    """Tests two properties of Zeckendorf Representation

    They sum up to the given number
    The sum does not include any consecutive Fibonacci numbers
    """
    zeckendorf_nums = zeckendorf_representation(num)
    assert sum(zeckendorf_nums) == num

    fibonacci_nums = fibonacci_numbers_up_to(num+1)
    # get the indices 
    indices = [
        fibonacci_nums.index(zeckendorf_num)
        for zeckendorf_num in zeckendorf_nums
    ]
    # check that no two indices are adjacent
    for index, next_index in zip(indices, indices[1:]):
        assert index + 1 < next_index


test_zeckendorf_representation()
Falsifying example: test_zeckendorf_representation(
    num=5,
)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-287cfc9a2439> in <module>
     22 
     23 
---> 24 test_zeckendorf_representation()

<ipython-input-6-287cfc9a2439> in test_zeckendorf_representation()
      2 
      3 @given(strategies.integers(min_value=1))
----> 4 def test_zeckendorf_representation(num: int):
      5     """Tests two properties of Zeckendorf Representation
      6 

/usr/local/anaconda3/envs/web/lib/python3.7/site-packages/hypothesis/core.py in wrapped_test(*arguments, **kwargs)
   1078                         get_trimmed_traceback()
   1079                     )
-> 1080                     raise the_error_hypothesis_found
   1081 
   1082         # After having created the decorated test function, we need to copy

<ipython-input-6-287cfc9a2439> in test_zeckendorf_representation(num)
     19     # check that no two indices are adjacent
     20     for index, next_index in zip(indices, indices[1:]):
---> 21         assert index + 1 < next_index
     22 
     23 

AssertionError: 

Debugging

Indeed, Hypothesis quickly found an example at $N=5$ that did not satisfy property (2). Printing out the falsifying example, the issue becomes obvious

In [7]:
zeckendorf_representation(5)
Out[7]:
[2, 3]

It ought to be that zeckendorf_representation(5) == [5], below is the corrected implementation that uses a list of Fibonacci numbers up to $N+1$

In [8]:
def zeckendorf_representation(num: int) -> List[int]:
    """Returns a sorted list of fibonacci numbers that sum to a given number"""
    if num < 1:
        raise ValueError("Only positive integers have a Zeckendorf representation")

    fibonaccis = fibonacci_numbers_up_to(num+1)

    zeckendorf_numbers = []
    while num > 0:
        next_zeckendorf = largest_element_lte(fibonaccis, num)
        zeckendorf_numbers.append(next_zeckendorf)
        num -= next_zeckendorf

    return list(reversed(zeckendorf_numbers))

Re-running the Hypothesis test shows the issue has been fixed

In [9]:
test_zeckendorf_representation()

The role of property-based tests

Property-based tests and libraries like Hypothesis and QuickCheck are not replacements for traditional unit tests, but they do deserve a first class role alongside these tests.

I've spent several years as a test engineer and as a developer writing automated test software and I've become increasingly convinced of the difficulty of writing good, thorough tests. I work with brilliant people and recently a trivial bug involving an acceleration value went unnoticed for some time because nobody thought to test a negative acceleration (an important case to test if you live somewhere with gravity).

Property-based tests force you to think critically about the space of inputs that your software accepts and libraries like Hypothesis are able to generate tests that cover a much larger range of the input space than is feasible with traditional unit tests.

Other resources

I was struck by a comment that I came across recently in Dan Luu's How I learned to program

The obvious skill I learned was how to write tests using a fancy testing framework, but the meta-thing I learned which has been even more useful is the fact that writing a test-case generator and a checker is often much more productive than the manual test-case writing that passes for automated testing in most places.

In which Dan is talking about IBM's SixthSense, a formal verification tool that generates test cases. This is a strong statement but I think that Dan's generally right about the extraordinary effectiveness of test-case generators.

Dan talks about this in even greater depth in his post on AFL + QuickCheck = ? and links to a number of good resources.

The author of Hypothesis, David MacIver, also writes an excellent blog and has one of my favorite writings on some things that might help you write better software.