Python’s default `int`

s, unlike in `C`

, `Rust`

or `Go`

, are of arbitrary size.^{1,2} What that means is there’s no absolute maximum value your ints can store. They’ll grow as long as they fit in memory.

For example, you can open `python3`

and run the following.

```
>>> import math
>>> math.factorial(2020)
[number omitted]
>>> math.log2(math.factorial(2020))
19272.453841606068
>>> type(math.factorial(2020))
<class 'int'>
```

So a normal, every-day `int`

in `python`

can easily store a value that would take up `19273`

bits in a C-style fixed-size unsigned int type. In a language like `python`

where convenience is valued over speed and memory efficiency, that’s really useful.

This *unlimited precision* also means we can store an *arbitrary amount of information* in a single `int`

. With the right coding, an entire book, an entire database, or anything else fits in a *single * `python`

`int`

.

We can therefore imagine a dialect of `python`

where we *only* have ints, and we need to represent everything else (`dict`

s, `list`

s, etc.) just by using `int`

s. We’d also have special functions and methods that would *treat* `int`

s as if they were `list`

s, `dict`

s, etc. This can be a fun exercise, and that’s what we’ll be doing in this post.

There’s an obvious way to do this: all data structures are just bit-arrays in memory. Worst-case scenario, it’s a set of related bit-arrays (like each node in a linked list or tree, for example), and sets are just bit arrays as well. Bit arrays can be interpreted as binary numbers. So we always have this option. But that’s a bit boring.

For this post, and for the following posts in this series I’ll present some ways of representing complex data structures as `int`

s that I think are *interesting.* They are not necessarily the most compact, the most reasonable, or the most efficient. The common objective is to find *interesting* or *fun* representations of these data structures.^{3}

# Introducing Gödel numbering

The first data structure we’ll work on representing is `list`

. What we’ll use is called Gödel numbering, named after the logician Kurt Gödel. We’ll be dealing with lists of *unsigned integers* (i.e. naturals) only, for convenience.

Gödel numbering works by abusing the fact that each natural number greater than 1 is *uniquely represented* by its prime factorization. This is given by the *fundamental theorem of arithmetic*.

Looking at some examples.

- .
- .
- .

A number is uniquely identifiable by the *list of the exponents of its prime factors* (up to its highest non-zero exponent). So we can say represents the list `[1, 2, 0, 1]`

. The first place in the list is the exponent of in the prime decomposition of , the second value is the exponent of , and so on.

A couple of examples:

What happens if you have a `0`

at the end of the list? Well, with this encoding you can’t. There’s an infinite stream of primes with exponent `0`

in our factorization, so we need to stop somewhere.^{4} We chose to stop at the last non-zero entry.

This representation uses *very* large numbers when the list contains somewhat large numbers. That’s because the numbers in our list become exponents, so the size of the int grows exponentially with them. For example, `[50, 1000, 250]`

uses a number with size `2266`

bits. On the other hand, relatively long lists of *very* small `int`

s, and especially large sparse lists (i.e. where most of the values are 0) have very compact representations compared to other `int`

-encodings of `list`

s.

Keep in mind this is not a practical encoding of `list`

s as `int`

s for programming. It’s meant to be a fun experiment only.

# Python implementation

Let’s look at an implementation in `python`

. A couple of notes here:

- We’ll allow ourselves to use functions with
`yield`

because it simplifies things a great deal.^{5} - You’ll notice an excessive use of
`while`

loops. That’s because list comprehensions,`range`

, and most things you’d consider using in a`for`

loop are banned from our int-only dialect. All of those end up being replaced with`while`

loops here.

## Prime generation

The first function we’ll write is an iterator that yields primes in order. This will be useful throughout. This implementation is the *simplest *implementation that works. I might write an entire post in the near future about algorithms for generating primes, because it’s such a cool topic, and a venerable field of research in itself. The algorithm most people know is the Sieve of Erathosthenes, but that’s just the tip of the iceberg.^{6}

For today, a very naïve implementation will do.

```
def primes(starting: int = 2):
"""Yield the primes in order.
Args:
starting: sets the minimum number to consider.
Note: `starting` can be used to get all prime numbers
_larger_ than some number. By default it doesn't skip
any candidate primes.
"""
candidate_prime = starting
while True:
candidate_factor = 2
is_prime = True
# We'll try all the numbers between 2 and
# candidate_prime / 2. If any of them divide
# our candidate_prime, then it's not a prime!
while candidate_factor <= candidate_prime // 2:
if candidate_prime % candidate_factor == 0:
is_prime = False
break
candidate_factor += 1
if is_prime:
yield candidate_prime
candidate_prime += 1
```

## Create empty list

```
def empty_list() -> int:
"""Create a new empty list."""
# 1 is the empty list. It isn't divisible by any prime.
return 1
```

## Yield elements

```
def iter_list(l: int):
"""Yields elements in the list, from first to last."""
# We go through each prime in order. The next value of
# the list is equal to the number of times the list is
# divisible by the prime.
for p in primes():
# We decided we will have no trailing 0s, so when
# the list is 1, it's over.
if l <= 1:
break
# Count the number of divisions until the list is
# not divisible by the prime number.
num_divisions = 0
while l % p == 0:
num_divisions += 1
l = l // p # could be / as well
yield num_divisions
```

## Access element

```
def access(l: int, i: int) -> int:
"""Return i-th element of l."""
# First we iterate over all primes until we get to the
# ith prime.
j = 0
for p in primes():
if j == i:
ith_prime = p
break
j += 1
# Now we divide the list by the ith-prime until we
# cant divide it no more.
num_divisions = 0
while l % ith_prime == 0:
num_divisions += 1
l = l // ith_prime
return num_divisions
```

## Append

```
def append(l: int, elem: int) -> int:
# The first step is finding the largest prime factor.
# We look at all primes until l.
# The next prime after the last prime factor is going
# to be the base we need to use to append.
# E.g. if the list if 18 -> 2**1 * 3**2 -> [1, 2]
# then the largest prime factor is 3, and we will
# multiply by the _next_ prime factor to some power to
# append to the list.
last_prime_factor = 1 # Just a placeholder
for p in primes():
if p > l:
break
if l % p == 0:
last_prime_factor = p
# Now get the _next_ prime after the last in the list.
for p in primes(starting=last_prime_factor + 1):
next_prime = p
break
# Now finally we append an item by multiplying the list
# by the next prime to the `elem` power.
return l * next_prime ** elem
```

## Trying the functions out

You can open a `python`

, `ipython`

or `bpython`

session and try these functions out!

You should use numbers between `1`

and `10`

for the list elements. If you use larger numbers `append`

and `access`

can take a *very* long time. In part, this is an impractical fact about using Gödel numbering for lists, although the range of viable element values can be improved a great deal by optimizing the prime generation and factorization algorithms.

```
In [16]: l = empty_list()
In [17]: l = append(l, 2)
In [18]: l = append(l, 5)
In [19]: list(iter_list(l))
Out[19]: [2, 5]
In [20]: access(l, 0)
Out[20]: 2
In [21]: access(l, 1)
Out[21]: 5
In [22]: l
Out[22]: 972
```

# Other int encodings

We saw one way in which we can represent lists of natural numbers as `int`

s. There’s other, more practical ways that rely on subdividing the binary representation of the number into variably-sized chunks. I’m sure you can come up with how that would look.

In the future I might write other posts about better algorithms for generating primes and factorizing numbers, and also about int-representations of other complex data structures.

# Footnotes

- I would assume the implementation breaks up at some point even before you run at out memory, but the documentation does explicitly mention they have
*unlimited precision*. - Note this is true of
`python3`

but not of`python2`

. In the case of`python2`

,`int`

s are fixed-size. I think it’s safe to just say`python`

and mean`python3`

in 2020, but I also think this detail is worth a footnote. - For the case of Gödel numbering for lists, it’s actually easy to argue that it’s a particularly
*bad*representation. We’ll talk a bit about the trade-offs in the representation later in the blogpost. - We could store the length of the list in a separate
`int`

, so that we know how many`0`

s at the end of the list to take into consideration. If we don’t want to have a*whole separate*`int`

, we can always write the length of the list as the exponent of and start the*actual*list with the exponent of . This has some redundant information, though. The way to avoid redundant information is to store*the number of final*in the list, instead of the entire length. We won’t be worrying about`0`

s*any*of this, though. - Note that using
`yield`

is no different from using`return`

and taking as an argument a`state`

variable (it’s often enough to have the last returned element). This is a bit like Continuation Passing Style. Also similar to the usual accumulator hack for making non-tail recursive functions tail-recursive. If you’ve never heard of the accumulator trick, here are some links [1], [2] I might one day write about imitating iterators in languages that don’t have them. - See also the paper The Genuine Sieve of Erathosthenes, which clears up a common confusion about how the algorithm is defined.

## 8 responses to “Storing a list in an int”

[…] Overview Python’s default ints, unlike in C, Rust or Go, are of arbitrary size.1,2 What that means is there’s no absolute maximum value your ints can store. They’ll grow as long a… Read more […]

Hi there, thanks for your excellent article.

I want to translate it into Chinese and publish it in the Chinese community for non-commercial use, so I hope to get your authorization if it’s OK for you? I will indicate the author information, the source and other information you wish to add.

Hope for your reply. Thank you!

Sure! Feel free to go ahead with that!

[…] 原题 | Storing a list in an int (https://iantayler.com/2020/12/07/storing-a-list-in-an-int) […]

[…] 原题 | Storing a list in an int (https://iantayler.com/2020/12/07/storing-a-list-in-an-int) […]

[…] 原题 | Storing a list in an int (https://iantayler.com/2020/12/07/storing-a-list-in-an-int) […]

[…] Storing a list in an int – For a fun exercise, learn how you can leverage Python’s unlimited integer precision to encode and store lists of any size as a single integer. Because, why not? […]

[…] 原题 | Storing a list in an int (https://iantayler.com/2020/12/07/storing-a-list-in-an-int) […]