For example, you can open
python3 and run the following.
>>> import math >>> math.factorial(2020) [number omitted] >>> math.log2(math.factorial(2020)) 19272.453841606068 >>> type(math.factorial(2020)) <class 'int'>
So a normal, every-day
python can easily store a value that would take up
19273 bits in a C-style fixed-size unsigned int type. In a language like
python where convenience is valued over speed and memory efficiency, that’s really useful.
This unlimited precision also means we can store an arbitrary amount of information in a single
int. With the right coding, an entire book, an entire database, or anything else fits in a single
We can therefore imagine a dialect of
python where we only have ints, and we need to represent everything else (
lists, etc.) just by using
ints. We’d also have special functions and methods that would treat
ints as if they were
dicts, etc. This can be a fun exercise, and that’s what we’ll be doing in this post.
There’s an obvious way to do this: all data structures are just bit-arrays in memory. Worst-case scenario, it’s a set of related bit-arrays (like each node in a linked list or tree, for example), and sets are just bit arrays as well. Bit arrays can be interpreted as binary numbers. So we always have this option. But that’s a bit boring.
For this post, and for the following posts in this series I’ll present some ways of representing complex data structures as
ints that I think are interesting. They are not necessarily the most compact, the most reasonable, or the most efficient. The common objective is to find interesting or fun representations of these data structures.3
Introducing Gödel numbering
The first data structure we’ll work on representing is
list. What we’ll use is called Gödel numbering, named after the logician Kurt Gödel. We’ll be dealing with lists of unsigned integers (i.e. naturals) only, for convenience.
Gödel numbering works by abusing the fact that each natural number greater than 1 is uniquely represented by its prime factorization. This is given by the fundamental theorem of arithmetic.
Looking at some examples.
A number is uniquely identifiable by the list of the exponents of its prime factors (up to its highest non-zero exponent). So we can say represents the list
[1, 2, 0, 1]. The first place in the list is the exponent of in the prime decomposition of , the second value is the exponent of , and so on.
A couple of examples:
What happens if you have a
0 at the end of the list? Well, with this encoding you can’t. There’s an infinite stream of primes with exponent
0 in our factorization, so we need to stop somewhere.4 We chose to stop at the last non-zero entry.
This representation uses very large numbers when the list contains somewhat large numbers. That’s because the numbers in our list become exponents, so the size of the int grows exponentially with them. For example,
[50, 1000, 250] uses a number with size
2266 bits. On the other hand, relatively long lists of very small
ints, and especially large sparse lists (i.e. where most of the values are 0) have very compact representations compared to other
Keep in mind this is not a practical encoding of
ints for programming. It’s meant to be a fun experiment only.
Let’s look at an implementation in
python. A couple of notes here:
- We’ll allow ourselves to use functions with
yieldbecause it simplifies things a great deal.5
- You’ll notice an excessive use of
whileloops. That’s because list comprehensions,
range, and most things you’d consider using in a
forloop are banned from our int-only dialect. All of those end up being replaced with
The first function we’ll write is an iterator that yields primes in order. This will be useful throughout. This implementation is the simplest implementation that works. I might write an entire post in the near future about algorithms for generating primes, because it’s such a cool topic, and a venerable field of research in itself. The algorithm most people know is the Sieve of Erathosthenes, but that’s just the tip of the iceberg.6
For today, a very naïve implementation will do.
def primes(starting: int = 2): """Yield the primes in order. Args: starting: sets the minimum number to consider. Note: `starting` can be used to get all prime numbers _larger_ than some number. By default it doesn't skip any candidate primes. """ candidate_prime = starting while True: candidate_factor = 2 is_prime = True # We'll try all the numbers between 2 and # candidate_prime / 2. If any of them divide # our candidate_prime, then it's not a prime! while candidate_factor <= candidate_prime // 2: if candidate_prime % candidate_factor == 0: is_prime = False break candidate_factor += 1 if is_prime: yield candidate_prime candidate_prime += 1
Create empty list
def empty_list() -> int: """Create a new empty list.""" # 1 is the empty list. It isn't divisible by any prime. return 1
def iter_list(l: int): """Yields elements in the list, from first to last.""" # We go through each prime in order. The next value of # the list is equal to the number of times the list is # divisible by the prime. for p in primes(): # We decided we will have no trailing 0s, so when # the list is 1, it's over. if l <= 1: break # Count the number of divisions until the list is # not divisible by the prime number. num_divisions = 0 while l % p == 0: num_divisions += 1 l = l // p # could be / as well yield num_divisions
def access(l: int, i: int) -> int: """Return i-th element of l.""" # First we iterate over all primes until we get to the # ith prime. j = 0 for p in primes(): if j == i: ith_prime = p break j += 1 # Now we divide the list by the ith-prime until we # cant divide it no more. num_divisions = 0 while l % ith_prime == 0: num_divisions += 1 l = l // ith_prime return num_divisions
def append(l: int, elem: int) -> int: # The first step is finding the largest prime factor. # We look at all primes until l. # The next prime after the last prime factor is going # to be the base we need to use to append. # E.g. if the list if 18 -> 2**1 * 3**2 -> [1, 2] # then the largest prime factor is 3, and we will # multiply by the _next_ prime factor to some power to # append to the list. last_prime_factor = 1 # Just a placeholder for p in primes(): if p > l: break if l % p == 0: last_prime_factor = p # Now get the _next_ prime after the last in the list. for p in primes(starting=last_prime_factor + 1): next_prime = p break # Now finally we append an item by multiplying the list # by the next prime to the `elem` power. return l * next_prime ** elem
Trying the functions out
You can open a
bpython session and try these functions out!
You should use numbers between
10 for the list elements. If you use larger numbers
access can take a very long time. In part, this is an impractical fact about using Gödel numbering for lists, although the range of viable element values can be improved a great deal by optimizing the prime generation and factorization algorithms.
In : l = empty_list() In : l = append(l, 2) In : l = append(l, 5) In : list(iter_list(l)) Out: [2, 5] In : access(l, 0) Out: 2 In : access(l, 1) Out: 5 In : l Out: 972
Other int encodings
We saw one way in which we can represent lists of natural numbers as
ints. There’s other, more practical ways that rely on subdividing the binary representation of the number into variably-sized chunks. I’m sure you can come up with how that would look.
In the future I might write other posts about better algorithms for generating primes and factorizing numbers, and also about int-representations of other complex data structures.
- I would assume the implementation breaks up at some point even before you run at out memory, but the documentation does explicitly mention they have unlimited precision.
- Note this is true of
python3but not of
python2. In the case of
ints are fixed-size. I think it’s safe to just say
python3in 2020, but I also think this detail is worth a footnote.
- For the case of Gödel numbering for lists, it’s actually easy to argue that it’s a particularly bad representation. We’ll talk a bit about the trade-offs in the representation later in the blogpost.
- We could store the length of the list in a separate
int, so that we know how many
0s at the end of the list to take into consideration. If we don’t want to have a whole separate
int, we can always write the length of the list as the exponent of and start the actual list with the exponent of . This has some redundant information, though. The way to avoid redundant information is to store the number of final
0s in the list, instead of the entire length. We won’t be worrying about any of this, though.
- Note that using
yieldis no different from using
returnand taking as an argument a
statevariable (it’s often enough to have the last returned element). This is a bit like Continuation Passing Style. Also similar to the usual accumulator hack for making non-tail recursive functions tail-recursive. If you’ve never heard of the accumulator trick, here are some links ,  I might one day write about imitating iterators in languages that don’t have them.
- See also the paper The Genuine Sieve of Erathosthenes, which clears up a common confusion about how the algorithm is defined.