Rails is generally used for building web applications that sit on top of relational databases. Relational databases work for many kinds of data, but not all. What if we need to represent a hierarchical data structure such as a tree?

We often encounter hierarchical data in our domains: organization charts, file systems, messages and replies, tasks and sub-tasks (and sub-sub-tasks and …), etc. How do we represent these structures in our relational databases?

It turns out that there are a number of different approaches, and there are implementations of all of them in Rails-compatible Ruby gems. The approach we choose really depends on the needs of our application. Do we need efficient insertion and deletion? Do we need efficient access to the parent? The children? All ancestors? All descendants?

Earlier this year, I worked on a project that needed to represent two different kinds of hierarchical data, so I spent some time researching the various options.

The best resource I found while researching was this excellent presentation by Bill Karwin. Karwin outlines the various options for representing hierarchies in relational databases and discusses the tradeoffs.

Below, I’ll use the generic term “node” to refer to an object in the tree. This might be a task, a department, a file or folder, a message, or whatever you’re representing.

Karwin identifies four different approaches for representing trees.

Adjacency List

The Adjacency List approach is the simplest and most obvious. Each node knows its own parent (so has a parent_id column).

With this approach:

  • Insertions and deletions are simple and efficient.
  • Moving nodes is also simple and efficient.
  • Finding the parent or children of a node is relatively efficient.
  • Inefficient or impossible to query all ancestors or descendants.

The best-known Ruby gem for this approach is acts_as_tree.

Path Enumeration (or Materialized Path)

The Path Enumeration approach (also known as Materialized Path) is similar to the Adjacency List approach. However, instead of each node knowing its parent, each node knows its full path from the root node. For example, a node that is three levels deep in the tree might have a path column with a value like 1/42/ or -1-42- or similar.

With this approach:

  • Insertions and moves are slightly harder than with Adjacency List, but still reasonable.
  • Finding ancestors and descendants is very efficient.
  • Finding direct children is a bit harder.

The best known Ruby gem for this approach is ancestry, but for the project I worked on, we found arboreal to be a better fit for our needs.

Nested Sets

The Nested Set approach encodes a node’s descendants using two numbers. The left number of a node is smaller than all of the numbers used by its descendants, and the right number is greater than all of the numbers used by its descendants.

This approach can be combined with the Adjacency List approach so that each node also knows its parent.

With this approach:

  • Insertions, moves, and deletions are complicated and inefficient because entire sub-trees need to be renumbered.
  • Finding ancestors and descendants is very efficient.
  • Finding direct children is harder.

The best known Ruby gem for this approach is awsome_nested_set.

I haven’t worked on an application where Nested Sets made sense. To me, they seem to have more downsides than upsides, and yet they seem to be very popular. Awesome nested set is the most popular gem in the Active Record Nesting category on Ruby Toolbox.

Closure Table

The Closure Table approach uses a second table to track every single relationship in the tree, including that of a node to itself. This table has ancestor_id and descendant_id columns. You can optionally add a depth column to indicate how many generations apart the nodes are. A depth of 0 would be the node’s relationship to itself; a depth of 2 would be a grandparent/grandchild relationship.

With this approach:

  • Insertions, moves, and deletions are relatively simple and efficient.
  • Finding ancestors and descendants is very efficient.
  • Finding direct parent or child is a bit harder, but can be simplified by using the depth column.

The biggest downside to this approach is the size of the extra table. This table requires O(n2) rows for a fully-connected tree. In practice, trees aren’t fully connected so you need fewer rows. There are also potential locking issues with that table in a multi-threaded environment.

In the project I was on, we seriously considered this approach. In the end, we were concerned about contention on the extra table and decided to go with the Path Enumeration approach instead.

The best known Ruby gem for this approach is closure_tree.

Conclusion

This should give you a sense of the landscape when it comes to representing tree structures in your Rails app. Again, I highly recommend looking at Bill Karwin’s presentation, reading the READMEs of the various gems, and figuring out which approach is best for your needs.