I was recently working on a Rails project where we needed to represent a hierarchical (tree-like) data structure in the database. We had chosen the Ancestry Ruby gem to help us.

Ancestry uses the materialized path pattern: it stores the IDs of each entity’s ancestors in a single column, separated by / characters. So, a child three levels deep in the tree, whose ancestors have the IDs 42 and 58, might have an ancestry column with a value of 42/58.

We needed to add another feature to the application that also required a tree structure. Unfortunately, the Ancestry gem was not a good fit for this new feature, so we looked around for alternatives. After several experiments, we settled on Arboreal. We didn’t want to maintain two separate gems doing essentially the same job, so we decided to convert our existing Ancestry usage to Arboreal and use Arboreal for the new feature as well.

Like Ancestry, Arboreal uses the materialized path pattern, but separates the ancestor IDs with - characters and includes a leading and trailing -. So the same child as above would have an ancestry column with a value of -42-58-.
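
To make the difference between the two formats concrete, here is a minimal plain-Ruby sketch that parses each one into the same array of integer IDs. The helper names parse_slash_path and parse_dash_path are hypothetical and exist only for illustration; neither gem exposes them, and real code should go through the gems’ own APIs instead.

```ruby
# Hypothetical helpers for illustration only; not part of Ancestry or Arboreal.

# Parses a "/"-separated path such as "42/58" into [42, 58].
def parse_slash_path(value)
  value.to_s.split("/").map { |id| Integer(id) }
end

# Parses a "-"-delimited path such as "-42-58-" into [42, 58].
# The leading "-" produces an empty first element, which is dropped.
def parse_dash_path(value)
  value.to_s.split("-").reject(&:empty?).map { |id| Integer(id) }
end

parse_slash_path("42/58")   # => [42, 58]
parse_dash_path("-42-58-")  # => [42, 58]
```

Both strings carry the same information; only the encoding differs, which is exactly why the encoding should not matter to the rest of the code.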

Both gems do a good job of hiding this internal representation, and both provide a nice API for accessing the elements of the tree. However, the internal representation is stored in a column in the database, and that makes it difficult to truly hide it from other code.

In our case, we had written code that accessed the ancestry column directly and even passed it through to the JavaScript front-end via a JSON API. There were several places in both the front- and back-end code that were splitting the strings to extract the ancestor IDs.
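
The kind of code involved looked roughly like the sketch below. This is a reconstruction for illustration, not our actual code, and the method name ancestor_ids_from_column is made up. It shows how a consumer that assumes one separator fails quietly, rather than loudly, when handed the other format.

```ruby
# Hypothetical consumer code that assumes the "/"-separated format.
def ancestor_ids_from_column(ancestry)
  ancestry.to_s.split("/").map(&:to_i)
end

ancestor_ids_from_column("42/58")    # => [42, 58]

# With the "-"-delimited format there is no "/" to split on, so the whole
# string is passed to to_i, which reads the leading "-42" and stops.
ancestor_ids_from_column("-42-58-")  # => [-42]
```

No exception is raised in the second case, so a mistake like this can slip past anything short of a test that checks the actual IDs.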

This leakage of the internal representation made the job of converting to Arboreal more difficult than it needed to be.

We first refactored the back-end code to go through the Ancestry API instead of reading the column directly, and then converted it to the Arboreal API. Fortunately, the two APIs are very similar, so this wasn’t too difficult.

The front-end code couldn’t use either API since it was in JavaScript, so we changed the JSON API to return an array of ancestor IDs rather than a string. Both gems have an ancestor_ids method that returns exactly that. Since this was an internal API only, we didn’t have to worry about versioning issues or other clients.
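
As a rough sketch of the resulting JSON shape, assuming a simplified stand-in for the model: the Node struct and the node_as_json method below are hypothetical, and in the real application the array would come from the gem’s ancestor_ids method on an ActiveRecord model.

```ruby
require "json"

# Stand-in for a model record; a real record would get ancestor_ids from the gem.
Node = Struct.new(:id, :name, :ancestor_ids)

# Hypothetical serializer: exposes the ancestors as an array of IDs,
# never as the raw column string.
def node_as_json(node)
  { id: node.id, name: node.name, ancestor_ids: node.ancestor_ids }
end

payload = JSON.generate(node_as_json(Node.new(73, "leaf", [42, 58])))
# => {"id":73,"name":"leaf","ancestor_ids":[42,58]}
```

The front-end now receives a plain array and never needs to know which separator, or which gem, sits behind it.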

Once we finished this refactoring, we were able to convert to Arboreal and move forward with our new feature.

We really shouldn’t have had to do this refactoring. The code we encountered should not have been exposing the internal representation of the ancestry column. Instead, it should have decoupled itself from that representation as close to the source as possible, leaving all of the other code independent of it and thus protected from changes like the one we needed to make.

The next time you encounter a situation where you’re tempted to take the easy road and expose an internal data representation, stop and think. Is there a way you can hide that representation behind an interface or convert it to a more generic representation that makes more sense to consume?