Data Structures in Python - Trie

Published

A major problem in Computer Science is Data Storage: how can we store and retrieve data in the most efficient way for a particular application.

When thinking about it, there are 2 main different formats to store our data:

  • Tables
  • Trees

Tables are the simplest form of database to understand (think Excel). You can chain queries to get relevant data in different tables so you can implement complex lookups relatively easily.

Trees on the other hand are much more flexible, and are mostly used for fast string lookups (auto-complete, text search…), but also for more complex applications which we will explain in subsequent articles (e.g. Merkle Patricia Trie in the Ethereum and XRP blockchains).

So what’s a trie? A trie is a “tree” that splits in branches, according to keys. You can basically think of it as dictionary containing other dictionary. See the illustration below:

trie example
A Simple Trie (from Wikipedia)

In this case, the tree gets split for every letter in a word, but every split also holds a value, which enables us to deal with non-fixed length data.

So how do we implement this? A trie is a special case of a Directed Acyclic Graph (DAG for short), and these kind of structures are usually prime candidates for recursion.

Every node in the tree will need the following:

  • A value field
  • A way to get children elements depending on a key

We want to create an implementation that is easily extensible so we won’t put any restrictions on keys, we’ll allow numbers and letters, and it won’t be fully initialized. Instead, data is added as we go.

In this case, the best solution is to use a simple Dictionary-based class:

  • When adding an element (key, value), if the key is of length 1, set the value in the value field of the current node.
  • If the key has length > 1 then use the key’s first letter as a key for the map field, and use key[1:] as the input for the next node.

We also need to add some None checks: if we try to set/get an element in a Node that does not exist yet, we first create the new Node and then continue the recursive descent.

My implementation looks as follows:

from typing import List, Tuple, Union, Dict

class Trie(object):
    def __init__(self):
        self.Map: Dict[Trie] = {}
        self.Value: Union[None, int] = None
    def __setitem__(self, item: str, value: int) -> None:
        if len(item) == 0:
            raise KeyError("Trie __setitem__ - Invalid item of len 0")
        else:
            self.__ensure_not_none(item[0])
            if len(item) == 1:
                self.Map[item].Value = value
            else:
                self.Map[item[0]][item[1:]] = value
    def __getitem__(self, item: str) -> Union[None, int]:
        if len(item) == 0:
            raise KeyError("Trie __getitem__ - Invalid item of len 0")
        else:
            self.__ensure_not_none(item[0])
            if len(item) == 1:
                return self.Map[item].Value
            else:
                return self.Map[item[0]][item[1:]]
    def delete(self, item: str) -> bool:
        if len(item) == 0:
            raise KeyError("Trie delete - Invalid item of len 0")
        else:
            if self.__getitem__(item) is None:
                return False
            else:
                if len(item) == 1:
                    self.Map[item] = None
                else:
                    self.Map[item[0]].delete(item[1:])
                return True
    def __ensure_not_none(self, key: str):
        if key not in self.Map.keys():
            self.Map[key] = Trie()

We are making use of the getitem and setitem properties of python classes to use our Trie as we would a dictionary, with the following results:

>> trie = Trie()
>> trie["test"]  # Nothing set yet so this returns None
None
>> trie["test"] = 4  # We set the value for test as 4
>> trie["test"]  # and then get it   
4

With this, we get a simple recursive implementation of a Trie in pure Python.
I thought it would be better to explicitly write down the recursive nature of the tree and implement the null checks myself, but it would obviously be possible to modify the Map field to instead use a defaultdict, with a Trie as constructor parameter. Feel free to try it out!

Code is available on Github.