Storing a list in an int


Python’s default ints, unlike in C, Rust or Go, are of arbitrary size.1,2 What that means is there’s no absolute maximum value your ints can store. They’ll grow as long as they fit in memory.

For example, you can open python3 and run the following.

>>> import math
>>> math.factorial(2020)
[number omitted]
>>> math.log2(math.factorial(2020))
19272.453841606068
>>> type(math.factorial(2020))
<class 'int'>

So a normal, every-day int in python can easily store a value that would take up 19273 bits in a C-style fixed-size unsigned int type. In a language like python where convenience is valued over speed and memory efficiency, that’s really useful.

This unlimited precision also means we can store an arbitrary amount of information in a single int. With the right coding, an entire book, an entire database, or anything else fits in a single python int.

We can therefore imagine a dialect of python where we only have ints, and we need to represent everything else (dicts, lists, etc.) just by using ints. We’d also have special functions and methods that would treat ints as if they were lists, dicts, etc. This can be a fun exercise, and that’s what we’ll be doing in this post.

There’s an obvious way to do this: all data structures are just bit-arrays in memory. Worst-case scenario, it’s a set of related bit-arrays (like each node in a linked list or tree, for example), and sets are just bit arrays as well. Bit arrays can be interpreted as binary numbers. So we always have this option. But that’s a bit boring.

For this post, and for the following posts in this series I’ll present some ways of representing complex data structures as ints that I think are interesting. They are not necessarily the most compact, the most reasonable, or the most efficient. The common objective is to find interesting or fun representations of these data structures.3

Introducing Gödel numbering

The first data structure we’ll work on representing is list. What we’ll use is called Gödel numbering, named after the logician Kurt Gödel. We’ll be dealing with lists of unsigned integers (i.e. naturals) only, for convenience.

Gödel numbering works by abusing the fact that each natural number greater than 1 is uniquely represented by its prime factorization. This is given by the fundamental theorem of arithmetic.

Looking at some examples.

  • 126 = 2^1 \cdot 3^2 \cdot 5^0 \cdot 7^1.
  • 256 = 2^8.
  • 17891 = 2^0 \cdot 3^0\cdot 5^0 \cdot [\ldots] \cdot 17891^1.

A number is uniquely identifiable by the list of the exponents of its prime factors (up to its highest non-zero exponent). So we can say 126 represents the list [1, 2, 0, 1]. The first place in the list is the exponent of 2 in the prime decomposition of 126, the second value is the exponent of 3, and so on.

A couple of examples:

  • 2 = 2^1 \to [1]
  • 143 = 2^0 \cdot 3^0 \cdot 5^0 \cdot 7^0 \cdot 11^1 \cdot 13^1 \to [0, 0, 0, 0, 1, 1]
  • 9 = 2^0 \cdot 3^2 \to [0, 2]

What happens if you have a 0 at the end of the list? Well, with this encoding you can’t. There’s an infinite stream of primes with exponent 0 in our factorization, so we need to stop somewhere.4 We chose to stop at the last non-zero entry.

This representation uses very large numbers when the list contains somewhat large numbers. That’s because the numbers in our list become exponents, so the size of the int grows exponentially with them. For example, [50, 1000, 250] uses a number with size 2266 bits. On the other hand, relatively long lists of very small ints, and especially large sparse lists (i.e. where most of the values are 0) have very compact representations compared to other int-encodings of lists.

Keep in mind this is not a practical encoding of lists as ints for programming. It’s meant to be a fun experiment only.

Python implementation

Let’s look at an implementation in python. A couple of notes here:

  1. We’ll allow ourselves to use functions with yield because it simplifies things a great deal.5
  2. You’ll notice an excessive use of while loops. That’s because list comprehensions, range, and most things you’d consider using in a for loop are banned from our int-only dialect. All of those end up being replaced with while loops here.

Prime generation

The first function we’ll write is an iterator that yields primes in order. This will be useful throughout. This implementation is the simplest implementation that works. I might write an entire post in the near future about algorithms for generating primes, because it’s such a cool topic, and a venerable field of research in itself. The algorithm most people know is the Sieve of Erathosthenes, but that’s just the tip of the iceberg.6

For today, a very naïve implementation will do.

def primes(starting: int = 2):
    """Yield the primes in order.
    
    Args:
        starting: sets the minimum number to consider.
    
    Note: `starting` can be used to get all prime numbers
    _larger_ than some number. By default it doesn't skip
    any candidate primes.
    """
    candidate_prime = starting
    while True:
        candidate_factor = 2
        is_prime = True
        # We'll try all the numbers between 2 and
        # candidate_prime / 2. If any of them divide
        # our candidate_prime, then it's not a prime!
        while candidate_factor <= candidate_prime // 2:
            if candidate_prime % candidate_factor == 0:
                is_prime = False
                break
            candidate_factor += 1
        if is_prime:
            yield candidate_prime
        candidate_prime += 1

Create empty list

def empty_list() -> int:
    """Create a new empty list."""
    # 1 is the empty list. It isn't divisible by any prime.
    return 1

Yield elements

def iter_list(l: int):
    """Yields elements in the list, from first to last."""
    # We go through each prime in order. The next value of
    # the list is equal to the number of times the list is
    # divisible by the prime.
    for p in primes():
        # We decided we will have no trailing 0s, so when
        # the list is 1, it's over.
        if l <= 1:
            break
        # Count the number of divisions until the list is
        # not divisible by the prime number.
        num_divisions = 0
        while l % p == 0:
            num_divisions += 1
            l = l // p  # could be / as well
        yield num_divisions

Access element

def access(l: int, i: int) -> int:
    """Return i-th element of l."""
    # First we iterate over all primes until we get to the
    # ith prime.
    j = 0
    for p in primes():
        if j == i:
            ith_prime = p
            break
        j += 1
    # Now we divide the list by the ith-prime until we
    # cant divide it no more.
    num_divisions = 0
    while l % ith_prime == 0:
        num_divisions += 1
        l = l // ith_prime
    return num_divisions

Append

def append(l: int, elem: int) -> int:
    # The first step is finding the largest prime factor.
    # We look at all primes until l.
    # The next prime after the last prime factor is going
    # to be the base we need to use to append.
    # E.g. if the list if 18 -> 2**1 * 3**2 -> [1, 2]
    # then the largest prime factor is 3, and we will
    # multiply by the _next_ prime factor to some power to
    # append to the list.
    last_prime_factor = 1  # Just a placeholder
    for p in primes():
        if p > l:
            break
        if l % p == 0:
            last_prime_factor = p
    # Now get the _next_ prime after the last in the list.
    for p in primes(starting=last_prime_factor + 1):
        next_prime = p
        break
    # Now finally we append an item by multiplying the list
    # by the next prime to the `elem` power.
    return l * next_prime ** elem

Trying the functions out

You can open a python, ipython or bpython session and try these functions out!

You should use numbers between 1 and 10 for the list elements. If you use larger numbers append and access can take a very long time. In part, this is an impractical fact about using Gödel numbering for lists, although the range of viable element values can be improved a great deal by optimizing the prime generation and factorization algorithms.

In [16]: l = empty_list()

In [17]: l = append(l, 2)

In [18]: l = append(l, 5)

In [19]: list(iter_list(l))
Out[19]: [2, 5]

In [20]: access(l, 0)
Out[20]: 2

In [21]: access(l, 1)
Out[21]: 5

In [22]: l
Out[22]: 972

Other int encodings

We saw one way in which we can represent lists of natural numbers as ints. There’s other, more practical ways that rely on subdividing the binary representation of the number into variably-sized chunks. I’m sure you can come up with how that would look.

In the future I might write other posts about better algorithms for generating primes and factorizing numbers, and also about int-representations of other complex data structures.

Footnotes

  1. I would assume the implementation breaks up at some point even before you run at out memory, but the documentation does explicitly mention they have unlimited precision.
  2. Note this is true of python3 but not of python2. In the case of python2, ints are fixed-size. I think it’s safe to just say python and mean python3 in 2020, but I also think this detail is worth a footnote.
  3. For the case of Gödel numbering for lists, it’s actually easy to argue that it’s a particularly bad representation. We’ll talk a bit about the trade-offs in the representation later in the blogpost.
  4. We could store the length of the list in a separate int, so that we know how many 0s at the end of the list to take into consideration. If we don’t want to have a whole separate int, we can always write the length of the list as the exponent of 2 and start the actual list with the exponent of 3. This has some redundant information, though. The way to avoid redundant information is to store the number of final 0s in the list, instead of the entire length. We won’t be worrying about any of this, though.
  5. Note that using yield is no different from using return and taking as an argument a state variable (it’s often enough to have the last returned element). This is a bit like Continuation Passing Style. Also similar to the usual accumulator hack for making non-tail recursive functions tail-recursive. If you’ve never heard of the accumulator trick, here are some links [1], [2] I might one day write about imitating iterators in languages that don’t have them.
  6. See also the paper The Genuine Sieve of Erathosthenes, which clears up a common confusion about how the algorithm is defined.

8 responses to “Storing a list in an int”

  1. Hi there, thanks for your excellent article.

    I want to translate it into Chinese and publish it in the Chinese community for non-commercial use, so I hope to get your authorization if it’s OK for you? I will indicate the author information, the source and other information you wish to add.

    Hope for your reply. Thank you!

Leave a reply to 脑洞:如何用一个整数来示意一个列表?_fm分享平台 – fm分享网 Cancel reply

Create a website or blog at WordPress.com