This card compares patterns to store trees in a relation database like MySQL or PostgreSQL. Implementation examples are for the ActiveRecord Show archive.org snapshot ORM used with Ruby on Rails, but the techniques can be implemented in any language or framework.
We will be using this example tree (from the acts_as_nested_set Show archive.org snapshot docs):
root
|
+-- Child 1
| |
| +-- Child 1.1
| |
| +-- Child 1.2
|
+-- Child 2
|
+-- Child 2.1
|
+-- Child 2.2
Option 1: Parent Association
The simplest way to serialize a tree is to give each node a parent_id
column that contains the ID of the parent node.
This would serialize the example above to this table:
id | parent_id | data
---+-----------+----------
1 | NULL | root
2 | 1 | Child 1
3 | 2 | Child 1.1
4 | 2 | Child 1.2
5 | 1 | Child 2
6 | 5 | Child 2.1
7 | 5 | Child 2.2
This pattern is so simple, you don't even need a library for it. With ActiveRecord you could implement it with a single belongs_to
association:
class Node < ActiveRecord::Base
belongs_to :parent, class: 'Node'
end
Any modification of the tree (like adding a node or changing a node's parent) only affect a single row in the table, so changes are fast.
The big drawback of this pattern is that you need an unbounded number of queries to read from the tree. E.g. to select all descendants of Child 1 you would need to recursively query for children until you have reached all leaf nodes:
SELECT * FROM nodes WHERE parent_id=2
SELECT * FROM nodes WHERE parent_id=3
SELECT * FROM nodes WHERE parent_id=4
The number of queries grows with the depth of your tree.
Because of this limitation this pattern should generally be avoided.
Option 2: Materialized Path
With the Materialized Path pattern you would cache all ancestor IDs of each node in a string column.
This would serialize the example above to this table:
id | ancestry | data
---+----------+----------
1 | NULL | root
2 | 1/ | Child 1
3 | 1/2/ | Child 1.1
4 | 1/2/ | Child 1.2
5 | 1/ | Child 2
6 | 1/5/ | Child 2.1
7 | 1/5/ | Child 2.2
Implementations like
ancestry
Show archive.org snapshot
will maintain this ancestry
column for you. A library will also give you convenient methods to cache this new column to effectively query your tree. E.g. we can select all descendants of Child 1 with a single query:
SELECT * FROM nodes WHERE ancestry = '1/2' OR ancestry LIKE '1/2/%'
Since LIKE
queries can be indexed, this query can be fast.
Adding a node to the tree only affects a single row in the table, so changes are usually fast as well. However, changing a node's parent means updating the node and all of its descendants.
Overall Materialized Path is a well-rounded pattern that can be recommended for most use cases.
Option 3: Nested Set
The Nested Set pattern looks at the entire tree structure and assigns left and right boundary values to each node. The boundary values must contain the boundary values for all descendants.
The acts_as_nested_set Show archive.org snapshot docs provide us with this pieace of ASCII art that shows boundary values for the table above:
___________________________________________________________________
| Root |
| ____________________________ ____________________________ |
| | Child 1 | | Child 2 | |
| | __________ _________ | | __________ _________ | |
| | | C 1.1 | | C 1.2 | | | | C 2.1 | | C 2.2 | | |
1 2 3_________4 5________6 7 8 9_________10 11_______12 13 14
| |___________________________| |___________________________| |
|___________________________________________________________________|
Note that the boundary values will not necessarily correspond to row IDs.
This would serialize the example above to this table:
id | parent_id | lft | rgt | data
---+-----------+-----+-----+----------
1 | 0 | 1 | 14 | root
2 | 1 | 2 | 7 | Child 1
3 | 2 | 3 | 4 | Child 1.1
4 | 2 | 5 | 6 | Child 1.2
5 | 1 | 8 | 13 | Child 2
6 | 5 | 9 | 10 | Child 2.1
7 | 5 | 11 | 12 | Child 2.2
Implementations like awesome_nested_set Show archive.org snapshot will maintain this data structure for you. A library will also give you convenient methods to cache this new column to effectively query your tree. E.g. we can select all descendants of Child 1 with a single query:
SELECT * FROM nodes WHERE lft >= 2 AND rgt < 7 AND id != 2
By indexing the lft
and rgt
columns we can make this query very fast, since the database only needs to compare indexed numbers.
The big drawback of this pattern is that this structure is extremely expensive to maintain during writes. Adding a node or changing a node's parent might require reading and updating the entire tree.
Because of the complexity of this pattern, Nested Set is only recommended when you expect few writes and require maximum read performance. For all other cases prefer Materialized Path.
Option 4: Closure Table
The Closure Table pattern Show archive.org snapshot extends the Parent Association pattern by maintaining a separate table with hierarchy information. This second table can be used for efficient tree queries.
An ActiveRecord implementation of this pattern is closure_tree. It promises superior performance to Materialized Path with ancestry Show archive.org snapshot , but we haven't worked with it yet.
In any case this pattern requires you to maintain a second table. Reads will need to span two tables using JOIN
s or subselects, which costs some overhead.