Posted over 2 years ago. Visible to the public. Repeats.

Storing trees in databases

This card compares patterns to store trees in a relation database like MySQL or PostgreSQL. Implementation examples are for the ActiveRecord ORM used with Ruby on Rails, but the techniques can be implemented in any language or framework.

We will be using this example tree (from the acts_as_nested_set docs):

Copy
root | +-- Child 1 | | | +-- Child 1.1 | | | +-- Child 1.2 | +-- Child 2 | +-- Child 2.1 | +-- Child 2.2

Option 1: Parent Association

The simplest way to serialize a tree is to give each node a parent_id column that contains the ID of the parent node.

This would serialize the example above to this table:

Copy
id | parent_id | data ---+-----------+---------- 1 | NULL | root 2 | 1 | Child 1 3 | 2 | Child 1.1 4 | 2 | Child 1.2 5 | 1 | Child 2 6 | 5 | Child 2.1 7 | 5 | Child 2.2

This pattern is so simple, you don't even need a library for it. With ActiveRecord you could implement it with a single belongs_to association:

Copy
class Node < ActiveRecord::Base belongs_to :parent, class: 'Node' end

Any modification of the tree (like adding a node or changing a node's parent) only affect a single row in the table, so changes are fast.

The big drawback of this pattern is that you need an unbounded number of queries to read from the tree. E.g. to select all descendants of Child 1 you would need to recursively query for children until you have reached all leaf nodes:

Copy
SELECT * FROM nodes WHERE parent_id=2 SELECT * FROM nodes WHERE parent_id=3 SELECT * FROM nodes WHERE parent_id=4

The number of queries grows with the depth of your tree.

Because of this limitation this pattern should generally be avoided.

Option 2: Materialized Path

With the Materialized Path pattern you would cache all ancestor IDs of each node in a string column.

This would serialize the example above to this table:

Copy
id | ancestry | data ---+----------+---------- 1 | NULL | root 2 | 1 | Child 1 3 | 1/2 | Child 1.1 4 | 1/2 | Child 1.2 5 | 1 | Child 2 6 | 1/5 | Child 2.1 7 | 1/5 | Child 2.2

Implementations like ancestry will maintain this ancestry column for you. A library will also give you convenient methods to cache this new column to effectively query your tree. E.g. we can select all descendants of Child 1 with a single query:

Copy
SELECT * FROM nodes WHERE ancestry = '1/2' OR ancestry LIKE '1/2/%'

Since LIKE queries can be indexed, this query can be fast.

Adding a node to the tree only affects a single row in the table, so changes are usually fast as well. However, changing a node's parent means updating the node and all of its descendants.

Overall Materialized Path is a well-rounded pattern that can be recommended for most use cases.

Option 3: Nested Set

The Nested Set pattern looks at the entire tree structure and assigns left and right boundary values to each node. The boundary values must contain the boundary values for all descendants.

The acts_as_nested_set docs provide us with this pieace of ASCII art that shows boundary values for the table above:

Copy
___________________________________________________________________ | Root | | ____________________________ ____________________________ | | | Child 1 | | Child 2 | | | | __________ _________ | | __________ _________ | | | | | C 1.1 | | C 1.2 | | | | C 2.1 | | C 2.2 | | | 1 2 3_________4 5________6 7 8 9_________10 11_______12 13 14 | |___________________________| |___________________________| | |___________________________________________________________________|

Note that the boundary values will not necessarily correspond to row IDs.

This would serialize the example above to this table:

Copy
id | parent_id | lft | rgt | data ---+-----------+-----+-----+---------- 1 | 0 | 1 | 14 | root 2 | 1 | 2 | 7 | Child 1 3 | 2 | 3 | 4 | Child 1.1 4 | 2 | 5 | 6 | Child 1.2 5 | 1 | 8 | 13 | Child 2 6 | 5 | 9 | 10 | Child 2.1 7 | 5 | 11 | 12 | Child 2.2

Implementations like awesome_nested_set will maintain this data structure for you. A library will also give you convenient methods to cache this new column to effectively query your tree. E.g. we can select all descendants of Child 1 with a single query:

Copy
SELECT * FROM nodes WHERE lft >= 2 AND rgt < 7 AND id != 2

By indexing the lft and rgt columns we can make this query very fast, since the database only needs to compare indexed numbers.

The big drawback of this pattern is that this structure is extremely expensive to maintain during writes. Adding a node or changing a node's parent might require reading and updating the entire tree.

Because of the complexity of this pattern, Nested Set is only recommended when you expect few writes and require maximum read performance. For all other cases prefer Materialized Path.

Option 4: Closure Table

The Closure Table pattern extends the Parent Association pattern by maintaining a separate table with hierarchy information. This second table can be used for efficient tree queries.

An ActiveRecord implementation of this pattern is closure_tree. It promises superior performance to Materialized Path with ancestry, but we haven't worked with it yet.

In any case this pattern requires you to maintain a second table. Reads will need to span two tables using JOINs or subselects, which costs some overhead.

makandra has been working exclusively with Ruby on Rails since 2007. Our laser focus on a single technology has made us a leader in this space.

Owner of this card:

Avatar
Henning Koch
Last edit:
9 months ago
by Jakob Scholz
About this deck:
We are makandra and do test-driven, agile Ruby on Rails software development.
License for source code
Posted by Henning Koch to makandra dev
This website uses cookies to improve usability and analyze traffic.
Accept or learn more