I was recently working on a Rails project where we needed to represent a hierarchical (tree-like) data structure in the database. We had chosen the Ancestry Ruby gem to help us.
Ancestry uses the materialized path pattern, storing the IDs of each entity’s ancestors in a single column, separated by `/` characters. So, a child three levels deep in the tree might have an `ancestry` column with a value of
We needed to add another feature to the application that also required a tree structure. Unfortunately, the Ancestry gem was not a good fit for this new feature, so we looked around for alternatives. After several experiments, we settled on Arboreal. We didn’t want to maintain two separate gems doing essentially the same job, so we decided to convert our existing use of Ancestry to Arboreal and to use Arboreal for the new feature as well.
Like Ancestry, Arboreal uses the materialized path pattern, but it separates the ancestor IDs with `-` characters and includes a leading `-`. So the same child as above would have an
column with a value of
Both gems do a good job of hiding this internal representation, and both provide a nice API for accessing the elements of the tree. However, the internal representation is stored in a column in the database, and that makes it difficult to truly hide it from other code.
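To make the difference concrete, here is a minimal sketch of how the two formats, as described above, would encode the same list of ancestor IDs. The helper names and the example IDs are hypothetical, and this is plain Ruby following the description in this post rather than either gem’s actual code; the real gems may differ in details such as trailing separators or how root nodes are stored.

```ruby
# Hypothetical helpers illustrating the two materialized-path formats
# described above. Neither gem exposes methods with these names.

# Ancestry style: ancestor IDs (root first) joined with "/".
def ancestry_style_path(ancestor_ids)
  ancestor_ids.join("/")
end

# Arboreal style, per the description above: IDs joined with "-",
# plus a leading "-".
def arboreal_style_path(ancestor_ids)
  "-" + ancestor_ids.join("-")
end

ids = [1, 2, 3] # hypothetical IDs of a node's ancestors, root first
puts ancestry_style_path(ids) # prints 1/2/3
puts arboreal_style_path(ids) # prints -1-2-3
```

The information is identical; only the encoding differs, which is exactly why code that depends on the encoding is fragile.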
In our case, we had written code that exposed the raw `ancestry` column through our JSON API. There were several places in both the front-end and back-end code that split these strings to extract the ancestor IDs.
This leakage of the internal representation made the job of converting to Arboreal more difficult than it needed to be.
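Here is a sketch of the kind of string-splitting code that leaked the representation. The method name is hypothetical; the point is that it bakes the Ancestry format into every caller, and when the same code is handed an Arboreal-style string it doesn’t fail loudly, it just returns the wrong IDs.

```ruby
# Hypothetical caller code that depends on the Ancestry format:
# it assumes IDs are separated by "/" with no leading separator.
def ancestor_ids_from(path)
  path.split("/").map(&:to_i)
end

p ancestor_ids_from("1/2/3")  # prints [1, 2, 3]

# With an Arboreal-style string there is no "/", so split returns
# ["-1-2-3"]; String#to_i parses the leading "-1" and stops at the
# next "-", so the result is silently wrong rather than an error.
p ancestor_ids_from("-1-2-3") # prints [-1]
```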
We were able to refactor the back-end code to use the Ancestry API and then convert it to the Arboreal API. Fortunately, the two APIs are very similar, so this wasn’t too difficult.
We also changed the JSON API to return an array of ancestor IDs
rather than a string. In fact, both gems have an
method that does exactly that. Fortunately, this was an internal API
only, so we didn’t have to worry about versioning issues or other
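A minimal sketch of what the changed JSON API might look like, using plain Ruby and the standard `json` library. The `Node` struct and `node_json` method are hypothetical stand-ins for the real model and serializer; in the application, the array would come from the tree gem rather than being stored on the model.

```ruby
require "json"

# Hypothetical stand-in for a tree model; in the real app the ancestor
# IDs would be computed by the tree gem, not stored on the struct.
Node = Struct.new(:id, :name, :ancestor_ids, keyword_init: true)

# Serialize a node for the JSON API. Clients receive a plain array of
# ancestor IDs and never see how the gem encodes the path.
def node_json(node)
  JSON.generate({
    id: node.id,
    name: node.name,
    ancestor_ids: node.ancestor_ids
  })
end

node = Node.new(id: 4, name: "leaf", ancestor_ids: [1, 2, 3])
puts node_json(node)
# prints {"id":4,"name":"leaf","ancestor_ids":[1,2,3]}
```

Because the payload is a JSON array, front-end code can use it directly, and it stays the same no matter which gem produced it.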
Once we finished this refactoring, we were able to convert to Arboreal and move forward with our new feature.
We really shouldn’t have had to do this refactoring. The code we encountered should not have exposed the internal representation of the `ancestry` column. Instead, it should have decoupled itself from that representation as quickly as possible, keeping all of the other code independent of it and thus protected from changes like the one we needed to make.
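One way to decouple from the representation is to parse the raw column once, at the boundary, into a small value object that the rest of the code uses. The sketch below is hypothetical: `AncestorPath` and its `from_*` constructors are not part of either gem. It follows the separator rules described earlier and drops empty segments, so a leading (or trailing) separator doesn’t matter.

```ruby
# Hypothetical value object that owns the knowledge of how a path string
# is encoded. Everything else in the app works with #ids.
class AncestorPath
  attr_reader :ids

  def initialize(ids)
    @ids = ids.dup.freeze
  end

  # Parse an Ancestry-style string such as "1/2/3" (nil or "" for a root).
  def self.from_ancestry(string)
    new(parse(string, "/"))
  end

  # Parse an Arboreal-style string such as "-1-2-3".
  def self.from_arboreal(string)
    new(parse(string, "-"))
  end

  # Dropping empty segments tolerates leading/trailing separators;
  # Integer() raises on malformed segments instead of returning 0.
  def self.parse(string, separator)
    string.to_s.split(separator).reject(&:empty?).map { |s| Integer(s, 10) }
  end
  private_class_method :parse

  def parent_id
    ids.last
  end

  def root_id
    ids.first
  end
end

a = AncestorPath.from_ancestry("1/2/3")
b = AncestorPath.from_arboreal("-1-2-3")
p a.ids == b.ids # prints true
p b.parent_id    # prints 3
```

With this in place, switching gems means changing which constructor the boundary code calls; callers of `ids`, `parent_id`, and `root_id` are unaffected.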
The next time you encounter a situation where you’re tempted to take the easy road and expose an internal data representation, stop and think. Is there a way you can hide that representation behind an interface or convert it to a more generic representation that makes more sense to consume?