Python’s default `int`

s, unlike in `C`

, `Rust`

or `Go`

, are of arbitrary size.^{1,2} What that means is there’s no absolute maximum value your ints can store. They’ll grow as long as they fit in memory.

For example, you can open `python3`

and run the following.

```
>>> import math
>>> math.factorial(2020)
[number omitted]
>>> math.log2(math.factorial(2020))
19272.453841606068
>>> type(math.factorial(2020))
<class 'int'>
```

So a normal, every-day `int`

in `python`

can easily store a value that would take up `19273`

bits in a C-style fixed-size unsigned int type. In a language like `python`

where convenience is valued over speed and memory efficiency, that’s really useful.

This *unlimited precision* also means we can store an *arbitrary amount of information* in a single `int`

. With the right coding, an entire book, an entire database, or anything else fits in a *single * `python`

`int`

.

We can therefore imagine a dialect of `python`

where we *only* have ints, and we need to represent everything else (`dict`

s, `list`

s, etc.) just by using `int`

s. We’d also have special functions and methods that would *treat* `int`

s as if they were `list`

s, `dict`

s, etc. This can be a fun exercise, and that’s what we’ll be doing in this post.

There’s an obvious way to do this: all data structures are just bit-arrays in memory. Worst-case scenario, it’s a set of related bit-arrays (like each node in a linked list or tree, for example), and sets are just bit arrays as well. Bit arrays can be interpreted as binary numbers. So we always have this option. But that’s a bit boring.

For this post, and for the following posts in this series I’ll present some ways of representing complex data structures as `int`

s that I think are *interesting.* They are not necessarily the most compact, the most reasonable, or the most efficient. The common objective is to find *interesting* or *fun* representations of these data structures.^{3}

# Introducing Gödel numbering

The first data structure we’ll work on representing is `list`

. What we’ll use is called Gödel numbering, named after the logician Kurt Gödel. We’ll be dealing with lists of *unsigned integers* (i.e. naturals) only, for convenience.

Gödel numbering works by abusing the fact that each natural number greater than 1 is *uniquely represented* by its prime factorization. This is given by the *fundamental theorem of arithmetic*.

Looking at some examples.

- .
- .
- .

A number is uniquely identifiable by the *list of the exponents of its prime factors* (up to its highest non-zero exponent). So we can say represents the list `[1, 2, 0, 1]`

. The first place in the list is the exponent of in the prime decomposition of , the second value is the exponent of , and so on.

A couple of examples:

What happens if you have a `0`

at the end of the list? Well, with this encoding you can’t. There’s an infinite stream of primes with exponent `0`

in our factorization, so we need to stop somewhere.^{4} We chose to stop at the last non-zero entry.

This representation uses *very* large numbers when the list contains somewhat large numbers. That’s because the numbers in our list become exponents, so the size of the int grows exponentially with them. For example, `[50, 1000, 250]`

uses a number with size `2266`

bits. On the other hand, relatively long lists of *very* small `int`

s, and especially large sparse lists (i.e. where most of the values are 0) have very compact representations compared to other `int`

-encodings of `list`

s.

Keep in mind this is not a practical encoding of `list`

s as `int`

s for programming. It’s meant to be a fun experiment only.

# Python implementation

Let’s look at an implementation in `python`

. A couple of notes here:

- We’ll allow ourselves to use functions with
`yield`

because it simplifies things a great deal.^{5} - You’ll notice an excessive use of
`while`

loops. That’s because list comprehensions,`range`

, and most things you’d consider using in a`for`

loop are banned from our int-only dialect. All of those end up being replaced with`while`

loops here.

## Prime generation

The first function we’ll write is an iterator that yields primes in order. This will be useful throughout. This implementation is the *simplest *implementation that works. I might write an entire post in the near future about algorithms for generating primes, because it’s such a cool topic, and a venerable field of research in itself. The algorithm most people know is the Sieve of Erathosthenes, but that’s just the tip of the iceberg.^{6}

For today, a very naïve implementation will do.

```
def primes(starting: int = 2):
"""Yield the primes in order.
Args:
starting: sets the minimum number to consider.
Note: `starting` can be used to get all prime numbers
_larger_ than some number. By default it doesn't skip
any candidate primes.
"""
candidate_prime = starting
while True:
candidate_factor = 2
is_prime = True
# We'll try all the numbers between 2 and
# candidate_prime / 2. If any of them divide
# our candidate_prime, then it's not a prime!
while candidate_factor <= candidate_prime // 2:
if candidate_prime % candidate_factor == 0:
is_prime = False
break
candidate_factor += 1
if is_prime:
yield candidate_prime
candidate_prime += 1
```

## Create empty list

```
def empty_list() -> int:
"""Create a new empty list."""
# 1 is the empty list. It isn't divisible by any prime.
return 1
```

## Yield elements

```
def iter_list(l: int):
"""Yields elements in the list, from first to last."""
# We go through each prime in order. The next value of
# the list is equal to the number of times the list is
# divisible by the prime.
for p in primes():
# We decided we will have no trailing 0s, so when
# the list is 1, it's over.
if l <= 1:
break
# Count the number of divisions until the list is
# not divisible by the prime number.
num_divisions = 0
while l % p == 0:
num_divisions += 1
l = l // p # could be / as well
yield num_divisions
```

## Access element

```
def access(l: int, i: int) -> int:
"""Return i-th element of l."""
# First we iterate over all primes until we get to the
# ith prime.
j = 0
for p in primes():
if j == i:
ith_prime = p
break
j += 1
# Now we divide the list by the ith-prime until we
# cant divide it no more.
num_divisions = 0
while l % ith_prime == 0:
num_divisions += 1
l = l // ith_prime
return num_divisions
```

## Append

```
def append(l: int, elem: int) -> int:
# The first step is finding the largest prime factor.
# We look at all primes until l.
# The next prime after the last prime factor is going
# to be the base we need to use to append.
# E.g. if the list if 18 -> 2**1 * 3**2 -> [1, 2]
# then the largest prime factor is 3, and we will
# multiply by the _next_ prime factor to some power to
# append to the list.
last_prime_factor = 1 # Just a placeholder
for p in primes():
if p > l:
break
if l % p == 0:
last_prime_factor = p
# Now get the _next_ prime after the last in the list.
for p in primes(starting=last_prime_factor + 1):
next_prime = p
break
# Now finally we append an item by multiplying the list
# by the next prime to the `elem` power.
return l * next_prime ** elem
```

## Trying the functions out

You can open a `python`

, `ipython`

or `bpython`

session and try these functions out!

You should use numbers between `1`

and `10`

for the list elements. If you use larger numbers `append`

and `access`

can take a *very* long time. In part, this is an impractical fact about using Gödel numbering for lists, although the range of viable element values can be improved a great deal by optimizing the prime generation and factorization algorithms.

```
In [16]: l = empty_list()
In [17]: l = append(l, 2)
In [18]: l = append(l, 5)
In [19]: list(iter_list(l))
Out[19]: [2, 5]
In [20]: access(l, 0)
Out[20]: 2
In [21]: access(l, 1)
Out[21]: 5
In [22]: l
Out[22]: 972
```

# Other int encodings

We saw one way in which we can represent lists of natural numbers as `int`

s. There’s other, more practical ways that rely on subdividing the binary representation of the number into variably-sized chunks. I’m sure you can come up with how that would look.

In the future I might write other posts about better algorithms for generating primes and factorizing numbers, and also about int-representations of other complex data structures.

# Footnotes

- I would assume the implementation breaks up at some point even before you run at out memory, but the documentation does explicitly mention they have
*unlimited precision*. - Note this is true of
`python3`

but not of`python2`

. In the case of`python2`

,`int`

s are fixed-size. I think it’s safe to just say`python`

and mean`python3`

in 2020, but I also think this detail is worth a footnote. - For the case of Gödel numbering for lists, it’s actually easy to argue that it’s a particularly
*bad*representation. We’ll talk a bit about the trade-offs in the representation later in the blogpost. - We could store the length of the list in a separate
`int`

, so that we know how many`0`

s at the end of the list to take into consideration. If we don’t want to have a*whole separate*`int`

, we can always write the length of the list as the exponent of and start the*actual*list with the exponent of . This has some redundant information, though. The way to avoid redundant information is to store*the number of final*in the list, instead of the entire length. We won’t be worrying about`0`

s*any*of this, though. - Note that using
`yield`

is no different from using`return`

and taking as an argument a`state`

variable (it’s often enough to have the last returned element). This is a bit like Continuation Passing Style. Also similar to the usual accumulator hack for making non-tail recursive functions tail-recursive. If you’ve never heard of the accumulator trick, here are some links [1], [2] I might one day write about imitating iterators in languages that don’t have them. - See also the paper The Genuine Sieve of Erathosthenes, which clears up a common confusion about how the algorithm is defined.

Hi there, thanks for your excellent article.

I want to translate it into Chinese and publish it in the Chinese community for non-commercial use, so I hope to get your authorization if it’s OK for you? I will indicate the author information, the source and other information you wish to add.

Hope for your reply. Thank you!

Sure! Feel free to go ahead with that!