Python has built-in data structures for lists, arrays and dictionaries, but not for tree-like data structures. In LeetCode, questions for “Trees” are limited to binary search trees, and its implementation doesn’t have many functionalities.
The bigtree
Python package can construct and export trees to and from Python lists, dictionaries and Pandas DataFrames, integrating seamlessly with existing Python workflows.
Tree-like data structures can be used to show hierarchical relationships, such as family trees and organizational charts.
How to Implement Trees in Python
A tree is a data structure used to show a hierarchical relationship, such as an organizational chart. It can be created in Python using the bigtree package,
This article will introduce basic tree concepts, how to construct trees with the bigtree
Python package, tree traversal, search, modification and export methods. We’ll finish with ways to use trees for to-do list implementation, and extend tree implementation to trie and directed acyclic graph data structures.
Table of contents:
- Tree basics and terminologies
- Bigtree setup
- Constructing trees
- Tree traversal algorithms
- Tree search methods
- Tree modification methods
- Exporting trees
- Using to-do list with
bigtree
- Extending to Trie
- Directed acyclic graph (DAG)
What Are Trees in Python?
Trees are non-linear data structures that store data hierarchically and are made up of nodes connected by edges. For example, in a family tree, a node would represent a person, and an edge would represent the relationship between two nodes.
There are a few terminologies that extend to these components:
- Root: A node that doesn’t have any parent, and the entire tree originates from it. In Fig. 1, the root is node
a
. - Leaf: Nodes that don’t have any child. In Fig. 1, leaf nodes are nodes
c
,d
, ande
. - Parent node: Immediate predecessor of a node. In Fig. 1, node a is the parent node of nodes
b
andc
. - Child node: Immediate successor of a node. In Fig. 1, Node
b
is the child node of Nodea
. - Ancestors: All predecessors of a node. In Fig. 1, nodes
a
andb
are ancestors of noded
. - Descendants: All successors of a node. In Fig. 1, nodes
d
ande
are descendants of nodeb
. - Siblings: Nodes that have the same parent. In Fig. 1, node
d
ande
are siblings. - Left sibling: Sibling to the left of the node. In Fig. 1, node
d
is the left sibling of nodee
. - Right sibling: Sibling to the right of the node. In Fig. 1, node
e
is the right sibling of noded
. - Depth: Length of the path from node to root. In Fig. 1, the depth of node
b
is 2, and Noded
is 3. - Height/Max depth: Maximum depth of root to a leaf node. In Fig. 1, the height of the tree is 3.
How to Setup Bigtree in Python
Bigtree is easy to set up: simply run the following command on Terminal.
$ pip install bigtree
If you want to export trees to image, run the following command on Terminal instead.
$ pip install 'bigtree[image]'
Without further ado, let’s dive into implementing trees!
How to Create Trees in Python
To construct trees, we must first define the nodes and link the nodes by specifying the parent and children of the node.
For example, to construct a family tree:
from bigtree import Node
root = Node("a", age=90)
b = Node("b", age=65, parent=root)
c = Node("c", age=60, parent=root)
root.children
# (Node(/a/b, age=65), Node(/a/c, age=60))
root.depth, b.depth
# (1, 2)
root.max_depth
# 2
root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
└── c [age=60]
"""
root.hshow()
"""
┌─ b
─ a ─┤
└─ c
"""
In the example above, we define Node b
and c
to be children of Node a
with three lines of codes. We can also add attributes, such as the age attribute to nodes. To view the tree structure, we can use the show or hshow
method.
We can also query the root, leaves, parent, children, ancestors, descendants, siblings, left_sibling
, right_sibling
, depth and max_depth
of the nodes, as covered in the previous section.
The above method to define every node and edge can be manual and tedious. There are alternative ways to construct trees with lists, dictionaries and Pandas DataFrame.
If there are no node attributes, the simplest way to construct a tree would be using Python lists and the list_to_tree
method.
from bigtree import list_to_tree
path_list = ["a/b/d", "a/b/e", "a/c"]
root = list_to_tree(path_list)
root.show()
"""
a
├── b
│ ├── d
│ └── e
└── c
"""
root.hshow()
"""
┌─ d
┌─ b ─┤
─ a ─┤ └─ e
└─ c
"""
If there are node attributes, it’s best to construct a tree with a dictionary or Pandas DataFrame, using dict_to_tree
and dataframe_to_tree
methods respectively.
from bigtree import dict_to_tree
path_dict = {
"a": {"age": 90},
"a/b": {"age": 65},
"a/c": {"age": 60},
"a/b/d": {"age": 40},
"a/b/e": {"age": 35},
}
root = dict_to_tree(path_dict)
root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
│ ├── d [age=40]
│ └── e [age=35]
└── c [age=60]
"""
import pandas as pd
from bigtree import dataframe_to_tree
path_data = pd.DataFrame(
[
["a", 90],
["a/b", 65],
["a/c", 60],
["a/b/d", 40],
["a/b/e", 35],
],
columns=["PATH", "age"],
)
root = dataframe_to_tree(path_data)
root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
│ ├── d [age=40]
│ └── e [age=35]
└── c [age=60]
"""
For more node attributes and methods and other data structures to tree methods, do refer to the bigtree Documentation.
Tree Traversal Algorithms in Python
There are two types of tree traversal, depth-first search (DFS) and breadth-first search (BFS),
- Depth-first search starts at the root and explores each branch to its leaf node before moving to the next branch.
- Breadth-first search starts at the root and explores every child node, and recursively does so for every node
Pre-Order Traversal (DFS, NLR)
Pre-order traversal is a DFS method that performs three steps recursively:
- Visit the current node (N).
- Recursively traversal the current node’s left subtree (L).
- Recursively traverse the current node’s right subtree (R).
For pre-order traversal, it will traverse the tree in Fig. 2 in the order:
['a', 'b', 'd', 'e', 'g', 'h', 'c', 'f']
Post-Order Traversal (DFS, LRN)
Post-order traversal is a depth-first search (DFS) method that performs three steps recursively:
- Recursively traversal the current node’s left subtree (L).
- Recursively traverse the current node’s right subtree (R)
- Visit the current node (N).
For post-order traversal, it will traverse the tree in Fig. 3 in the order:
['d', 'g', 'h', 'e', 'b', 'f', 'c', 'a']
Level-Order Traversal (BFS)
Level-order traversal is a breadth-first search method.
For level-order traversal, it will traverse the tree in Fig. 3 in the order:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Level-Order Group Traversal (BFS)
Level-order group traversal is similar to level-order traversal, with the difference being that each level will be returned as a nested list; list[idx]
denotes the items in depth idx + 1
.
For level-order group traversal, it will traverse the tree in Fig. 3 in the order:
[['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h']]
There are other traversal algorithms available on bigtree
, namely:
- In-order traversal (DFS, LNR): Applicable to binary trees and not generic trees.
- Zig-zag traversal (BFS): Similar to level-order traversal, but in a zig-zag manner across different levels.
- Zig-zag group traversal (BFS): Similar to level-order group traversal, but in a zig-zag manner across different levels.
Python Tree Search Methods
We can use tree search methods to get one or multiple nodes that fulfill certain criteria, with methods find for one node and findall
for multiple nodes.
from bigtree import Node, find, findall
root = Node("a", age=90)
b = Node("b", age=65, parent=root)
c = Node("c", age=60, parent=root)
d = Node("d", age=40, parent=b)
root.show(attr_list=["age"])
"""
a [age=90]
├── b [age=65]
│ └── d [age=40]
└── c [age=60]
"""
find(root, lambda node: node.age == 65)
# Node(/a/b, age=65)
findall(root, lambda node: node.age >= 60)
# (Node(/a, age=90), Node(/a/b, age=65), Node(/a/c, age=60))
For generic search methods without defining a lambda function, there are built-in methods:
find_attr
andfind_attrs
: Find one/multiple nodes by attribute.find_name
andfind_names
: Find one/multiple nodes by name.find_path
andfind_paths
: Find one/multiple nodes by full or partial paths.find_full_path
: Find one node by their full path.find_relative_path
: Find nodes matching relative path, supports unix folder expression and wildcards.find_child
andfind_children
: Find one/multiple children nodes by custom condition, removes the need for searching the whole tree.find_child_by_name
: Find one child node by their name.
Tree Modification Methods in Python
bigtree
supports cases where nodes must be shifted or copied from a location to a destination. For instance, we can shift and reorder nodes in a to-do list implementation.
from bigtree import Node, shift_nodes
root = Node("List")
groceries = Node("Groceries", parent=root)
urgent = Node("Urgent", parent=root)
groceries_milk = Node("Milk", parent=groceries)
shift_nodes(
root,
from_paths=["List/Groceries/Milk"],
to_paths=["List/Urgent/Milk"],
)
root.show()
# List
# ├── Groceries
# └── Urgent
# └── Milk
There are also other tree modifications methods, such as:
copy_nodes
: Make a copy of the node from location to destination, node will exist in two locations.shift_and_replace_nodes
: Shift node from location to replace node in destination.copy_nodes_from_tree_to_tree
: Make a copy of the node from one tree to another tree. The node will exist in two trees.copy_and_replace_nodes_from_tree_to_tree
: Make a copy of the node from one tree and replace node in another tree.
Exporting Trees in Python
As mentioned at the start of the article, bigtree
integrates seamlessly with Python dictionaries and Pandas DataFrame. Trees can be exported to a dictionary, nested dictionary, Pandas DataFrame and more formats.
Printing Tree to Console
Given a tree, we can print the tree to the console using print_tree with the ability to specify the attributes to print and the style of the tree. More customizations are also available in the bigtree Documentation.
from bigtree import print_tree
print_tree(root, attr_list=["age"], style="ascii")
"""
a [age=90]
|-- b [age=65]
| |-- d [age=40]
| +-- e [age=35]
| |-- g [age=10]
| +-- h [age=6]
+-- c [age=60]
+-- f [age=38]
"""
For a generator method, you can use the yield_tree
method instead.
Export Tree to Dictionary
Given a tree, we can export the tree to a dictionary using tree_to_dict
with the ability to store all attributes with names as-is or map tree attributes to custom attribute names using a dictionary.
from bigtree import tree_to_dict
tree_to_dict(root, all_attrs=True)
# {
# '/a': {'name': 'a', 'age': 90},
# '/a/b': {'name': 'b', 'age': 65},
# '/a/b/d': {'name': 'd', 'age': 40},
# '/a/b/e': {'name': 'e', 'age': 35},
# '/a/b/e/g': {'name': 'g', 'age': 10},
# '/a/b/e/h': {'name': 'h', 'age': 6},
# '/a/c': {'name': 'c', 'age': 60},
# '/a/c/f': {'name': 'f', 'age': 38}
# }
tree_to_dict(root, attr_dict={"age": "AGE"})
# {
# '/a': {'name': 'a', 'AGE': 90},
# '/a/b': {'name': 'b', 'AGE': 65},
# '/a/b/d': {'name': 'd', 'AGE': 40},
# '/a/b/e': {'name': 'e', 'AGE': 35},
# '/a/b/e/g': {'name': 'g', 'AGE': 10},
# '/a/b/e/h': {'name': 'h', 'AGE': 6},
# '/a/c': {'name': 'c', 'AGE': 60},
# '/a/c/f': {'name': 'f', 'AGE': 38}
# }
The original tree can be reconstructed back using the dict_to_tree
method.
Export Tree to DataFrame
Given a tree, we can export the tree to a DataFrame using tree_to_dataframe with the ability to store all attributes as columns with names as-is, or map tree attributes to custom column names using a dictionary.
from bigtree import tree_to_dataframe
tree_to_dataframe(root, all_attrs=True)
# path name age
# 0 /a a 90
# 1 /a/b b 65
# 2 /a/b/d d 40
# 3 /a/b/e e 35
# 4 /a/b/e/g g 10
# 5 /a/b/e/h h 6
# 6 /a/c c 60
# 7 /a/c/f f 38
The original tree can be reconstructed back using the dataframe_to_tree
method.
Export Tree to Image
Given a tree, we can export the tree to an image or other graphics or files using tree_to_dot
. This uses pydot
under the hood, which uses Dot language and can be interfaced with Graphviz.
from bigtree import tree_to_dot
graph = tree_to_dot(root)
graph.write_png("tree.png")
graph.write_dot("tree.dot")
graph.to_string()
In the code snippet above, graph is of pydot.Dot
data type, which has built-in implementation to write to dot, PNG, SVG file formats, and more. The output is similar to Fig. 3.
There are other export methods available, including:
tree_to_nested_dict
: Export to nested dictionary, original tree can be reconstructed back usingnested_dict_to_tree
.tree_to_newick
: Export to Newick notation, original tree can be reconstructed back usingnewick_to_tree
.tree_to_mermaid
: Export to mermaid Markdown texttree_to_pillow
: Export to Pillow image.
Using To-Do List With bigtree
If at this point you’re still wondering what you can do with bigtree
, bigtree
comes with a built-in to-do list workflow with the ability to import and export from a JSON file.
This to-do list implementation has three levels — app
name, list name and item name. You can add lists to app
or add items to list. For example:
from bigtree import AppToDo
app = AppToDo("To-Do App")
app.add_item(item_name="Homework 1", list_name="School")
app.add_item(item_name=["Milk", "Bread"], list_name="Groceries", description="Urgent")
app.add_item(item_name="Cook")
app.show()
# To Do App
# ├── School
# │ └── Homework 1
# ├── Groceries
# │ ├── Milk [description=Urgent]
# │ └── Bread [description=Urgent]
# └── General
# └── Cook
app.save("list.json")
app2 = AppToDo.load("list.json")
In the example above:
- App name refers to
To-Do App
. - List name refers to
School
,Groceries
, andGeneral
. - Item name refers to
Homework 1
,Milk
,Bread
andCook
.
Extending to Trie in Python
Trie is a type of k-ary search tree used for storing and searching a specific key from a set, derived from the word reTRIEval. Trie can be used to sort a collection of strings alphabetically or search if a prefix is present for a string.
To extend bigtree
with Trie, we can add the leading symbol ^
and trailing symbol $
to each word, and use tree search methods to find a specific word or subword with find_path
. A Trie can be constructed as such:
from bigtree import list_to_tree, find_path
bag_of_words = ["ant", "and", "angry", "buoy", "buoyant"]
list_of_paths = ["/".join(["^"] + list(x) + ["$"]) for x in bag_of_words]
list_of_paths
# [
# "^/a/n/t/$",
# "^/a/n/d/$",
# "^/a/n/g/r/y/$",
# "^/b/o/y/$",
# "^/b/o/y/a/n/t/$"
# ]
trie = list_to_tree(list_of_paths)
find_path(trie, "/".join(list("^boy$")))
The code snippet above results in the image Fig. 4 if exported using the tree_to_dot
method.
Directed Acyclic Graph (DAG)
Directed acyclic graph (DAG) is a graph data structure where each node can have more than one parent. A tree is considered a restricted form of a graph. This results in the following differences:s
- Root: There is no concept of root in DAG since a node can have multiple parents.
- Depth: There is no concept of depth in DAG since there is no root node.
- Height/Max depth: There is no concept of height in DAG since there is no root node.
DAGs are best used to represent workflows as workflows have a certain order (directed), and do not repeat indefinitely, i.e. it doesn’t not have loops (acyclic).
Similar to trees, DAGs in bigtree
can be constructed manually, from Python lists, dictionaries or Pandas DataFrames with methods list_to_dag
, dict_to_dag
, and dataframe_to_dag
respectively.
from bigtree import DAGNode, dag_to_dot
a = DAGNode("Ingest Data from Source A")
b = DAGNode("Ingest Data from Source B")
c = DAGNode("Clean data", parents=[a, b])
d = DAGNode("Feature Engineering", parents=[c])
graph = dag_to_dot(a, node_colour="gold")
graph.write_png("dag.png")
The above code snippet results in the following image:
With that, I hope you have gained a better understanding of tree structures and how to implement them using the bigtree
Python package.