Python’s default `int`

s, unlike in `C`

, `Rust`

or `Go`

, are of arbitrary size.^{1,2} What that means is there’s no absolute maximum value your ints can store. They’ll grow as long as they fit in memory.

For example, you can open `python3`

and run the following.

>>> import math >>> math.factorial(2020) [number omitted] >>> math.log2(math.factorial(2020)) 19272.453841606068 >>> type(math.factorial(2020)) <class 'int'>

So a normal, every-day `int`

in `python`

can easily store a value that would take up `19273`

bits in a C-style fixed-size unsigned int type. In a language like `python`

where convenience is valued over speed and memory efficiency, that’s really useful.

This *unlimited precision* also means we can store an *arbitrary amount of information* in a single `int`

. With the right coding, an entire book, an entire database, or anything else fits in a *single * `python`

`int`

.

We can therefore imagine a dialect of `python`

where we *only* have ints, and we need to represent everything else (`dict`

s, `list`

s, etc.) just by using `int`

s. We’d also have special functions and methods that would *treat* `int`

s as if they were `list`

s, `dict`

s, etc. This can be a fun exercise, and that’s what we’ll be doing in this post.

There’s an obvious way to do this: all data structures are just bit-arrays in memory. Worst-case scenario, it’s a set of related bit-arrays (like each node in a linked list or tree, for example), and sets are just bit arrays as well. Bit arrays can be interpreted as binary numbers. So we always have this option. But that’s a bit boring.

For this post, and for the following posts in this series I’ll present some ways of representing complex data structures as `int`

s that I think are *interesting.* They are not necessarily the most compact, the most reasonable, or the most efficient. The common objective is to find *interesting* or *fun* representations of these data structures.^{3}

# Introducing Gödel numbering

The first data structure we’ll work on representing is `list`

. What we’ll use is called Gödel numbering, named after the logician Kurt Gödel. We’ll be dealing with lists of *unsigned integers* (i.e. naturals) only, for convenience.

Gödel numbering works by abusing the fact that each natural number greater than 1 is *uniquely represented* by its prime factorization. This is given by the *fundamental theorem of arithmetic*.

Looking at some examples.

- .
- .
- .

A number is uniquely identifiable by the *list of the exponents of its prime factors* (up to its highest non-zero exponent). So we can say represents the list `[1, 2, 0, 1]`

. The first place in the list is the exponent of in the prime decomposition of , the second value is the exponent of , and so on.

A couple of examples:

What happens if you have a `0`

at the end of the list? Well, with this encoding you can’t. There’s an infinite stream of primes with exponent `0`

in our factorization, so we need to stop somewhere.^{4} We chose to stop at the last non-zero entry.

This representation uses *very* large numbers when the list contains somewhat large numbers. That’s because the numbers in our list become exponents, so the size of the int grows exponentially with them. For example, `[50, 1000, 250]`

uses a number with size `2266`

bits. On the other hand, relatively long lists of *very* small `int`

s, and especially large sparse lists (i.e. where most of the values are 0) have very compact representations compared to other `int`

-encodings of `list`

s.

Keep in mind this is not a practical encoding of `list`

s as `int`

s for programming. It’s meant to be a fun experiment only.

# Python implementation

Let’s look at an implementation in `python`

. A couple of notes here:

- We’ll allow ourselves to use functions with
`yield`

because it simplifies things a great deal.^{5} - You’ll notice an excessive use of
`while`

loops. That’s because list comprehensions,`range`

, and most things you’d consider using in a`for`

loop are banned from our int-only dialect. All of those end up being replaced with`while`

loops here.

## Prime generation

The first function we’ll write is an iterator that yields primes in order. This will be useful throughout. This implementation is the *simplest *implementation that works. I might write an entire post in the near future about algorithms for generating primes, because it’s such a cool topic, and a venerable field of research in itself. The algorithm most people know is the Sieve of Erathosthenes, but that’s just the tip of the iceberg.^{6}

For today, a very naïve implementation will do.

def primes(starting: int = 2): """Yield the primes in order. Args: starting: sets the minimum number to consider. Note: `starting` can be used to get all prime numbers _larger_ than some number. By default it doesn't skip any candidate primes. """ candidate_prime = starting while True: candidate_factor = 2 is_prime = True # We'll try all the numbers between 2 and # candidate_prime / 2. If any of them divide # our candidate_prime, then it's not a prime! while candidate_factor <= candidate_prime // 2: if candidate_prime % candidate_factor == 0: is_prime = False break candidate_factor += 1 if is_prime: yield candidate_prime candidate_prime += 1

## Create empty list

def empty_list() -> int: """Create a new empty list.""" # 1 is the empty list. It isn't divisible by any prime. return 1

## Yield elements

def iter_list(l: int): """Yields elements in the list, from first to last.""" # We go through each prime in order. The next value of # the list is equal to the number of times the list is # divisible by the prime. for p in primes(): # We decided we will have no trailing 0s, so when # the list is 1, it's over. if l <= 1: break # Count the number of divisions until the list is # not divisible by the prime number. num_divisions = 0 while l % p == 0: num_divisions += 1 l = l // p # could be / as well yield num_divisions

## Access element

def access(l: int, i: int) -> int: """Return i-th element of l.""" # First we iterate over all primes until we get to the # ith prime. j = 0 for p in primes(): if j == i: ith_prime = p break j += 1 # Now we divide the list by the ith-prime until we # cant divide it no more. num_divisions = 0 while l % ith_prime == 0: num_divisions += 1 l = l // ith_prime return num_divisions

## Append

def append(l: int, elem: int) -> int: # The first step is finding the largest prime factor. # We look at all primes until l. # The next prime after the last prime factor is going # to be the base we need to use to append. # E.g. if the list if 18 -> 2**1 * 3**2 -> [1, 2] # then the largest prime factor is 3, and we will # multiply by the _next_ prime factor to some power to # append to the list. last_prime_factor = 1 # Just a placeholder for p in primes(): if p > l: break if l % p == 0: last_prime_factor = p # Now get the _next_ prime after the last in the list. for p in primes(starting=last_prime_factor + 1): next_prime = p break # Now finally we append an item by multiplying the list # by the next prime to the `elem` power. return l * next_prime ** elem

## Trying the functions out

You can open a `python`

, `ipython`

or `bpython`

session and try these functions out!

You should use numbers between `1`

and `10`

for the list elements. If you use larger numbers `append`

and `access`

can take a *very* long time. In part, this is an impractical fact about using Gödel numbering for lists, although the range of viable element values can be improved a great deal by optimizing the prime generation and factorization algorithms.

In [16]: l = empty_list() In [17]: l = append(l, 2) In [18]: l = append(l, 5) In [19]: list(iter_list(l)) Out[19]: [2, 5] In [20]: access(l, 0) Out[20]: 2 In [21]: access(l, 1) Out[21]: 5 In [22]: l Out[22]: 972

# Other int encodings

We saw one way in which we can represent lists of natural numbers as `int`

s. There’s other, more practical ways that rely on subdividing the binary representation of the number into variably-sized chunks. I’m sure you can come up with how that would look.

In the future I might write other posts about better algorithms for generating primes and factorizing numbers, and also about int-representations of other complex data structures.

# Footnotes

- I would assume the implementation breaks up at some point even before you run at out memory, but the documentation does explicitly mention they have
*unlimited precision*. - Note this is true of
`python3`

but not of`python2`

. In the case of`python2`

,`int`

s are fixed-size. I think it’s safe to just say`python`

and mean`python3`

in 2020, but I also think this detail is worth a footnote. - For the case of Gödel numbering for lists, it’s actually easy to argue that it’s a particularly
*bad*representation. We’ll talk a bit about the trade-offs in the representation later in the blogpost. - We could store the length of the list in a separate
`int`

, so that we know how many`0`

s at the end of the list to take into consideration. If we don’t want to have a*whole separate*`int`

, we can always write the length of the list as the exponent of and start the*actual*list with the exponent of . This has some redundant information, though. The way to avoid redundant information is to store*the number of final*in the list, instead of the entire length. We won’t be worrying about`0`

s*any*of this, though. - Note that using
`yield`

is no different from using`return`

and taking as an argument a`state`

variable (it’s often enough to have the last returned element). This is a bit like Continuation Passing Style. Also similar to the usual accumulator hack for making non-tail recursive functions tail-recursive. If you’ve never heard of the accumulator trick, here are some links [1], [2] I might one day write about imitating iterators in languages that don’t have them. - See also the paper The Genuine Sieve of Erathosthenes, which clears up a common confusion about how the algorithm is defined.

Hi there, thanks for your excellent article.

I want to translate it into Chinese and publish it in the Chinese community for non-commercial use, so I hope to get your authorization if it’s OK for you? I will indicate the author information, the source and other information you wish to add.

Hope for your reply. Thank you!

Sure! Feel free to go ahead with that!