When I was at Rogue Rails last month, our team was working on a story-tracking application. At one point, we had a pair of Cucumber specs that looked like this:

Story Specs

```gherkin
Scenario: Create a story
  Given I'm at the New Story page
  When I fill in "In order to" with "Goal"
  And I fill in "As a" with "Stakeholder"
  And I fill in "I want to" with "Behavior"
  And I click "Save Story"
  Then I will see "Stories"
  And I will see "Save successful"
  And I will see "Behavior"

Scenario: Edit a story
  Given a story with behavior "Some behavior"
  And I visit the story list page
  When I click "Some behavior"
  And I fill in "I want to" with "Other behavior"
  And I click "Save Story"
  Then I will see "Stories"
  And I will see "Save successful"
  And I will see "Other behavior"
```

They’re not the greatest specs in the world, but they’re handy for illustrating the point I want to make.

As you can see, the last five lines of the two scenarios are nearly identical. This is a good place to remove some duplication.

There are at least two ways to approach such a refactoring.

The first is a structural approach, where we just look at the structure of the code and mechanically remove all of the duplication. Using this approach, we’d extract all five lines to a higher-level step definition and completely eliminate the duplication.

I’m not sure that’s even possible in Cucumber, because the duplicated lines span part of the When and the Then sections of the spec. Even if it is possible, how would we word it? What kind of name could we give those five lines that would communicate what they do?
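To make the naming problem concrete, here is a minimal plain-Ruby sketch of a purely structural extraction. It sidesteps the Cucumber question by using ordinary methods instead of step definitions. `FakeStoryPage` and the helper name are hypothetical stand-ins for the app and its steps, not code from our project.

```ruby
# Hypothetical stand-in for the app under test: a "page" is just the text it shows.
class FakeStoryPage
  def initialize
    @fields = {}
    @text = ""
  end

  def fill_in(label, value)
    @fields[label] = value
  end

  # Pretend a successful save redirects to the story list with a flash message.
  def click(button)
    return unless button == "Save Story"

    @text = "Stories\nSave successful\n#{@fields['I want to']}"
  end

  def see?(content)
    @text.include?(content)
  end
end

# Structural extraction: all five duplicated lines in one helper.
# It bundles an action (fill in, then save) with an outcome (three checks),
# so the only honest name is a list of the steps it performs.
def fill_in_behavior_click_save_and_see_it(page, behavior)
  page.fill_in("I want to", behavior)
  page.click("Save Story")
  page.see?("Stories") && page.see?("Save successful") && page.see?(behavior)
end

page = FakeStoryPage.new
page.fill_in("In order to", "Goal")
page.fill_in("As a", "Stakeholder")
p fill_in_behavior_click_save_and_see_it(page, "Behavior") # prints true
```

The helper does remove every duplicated line, but its name can only describe what it does, not what it means.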

The second is a conceptual approach. Here, we notice that the duplicated step that fills in "I want to" is largely incidental. In the Create a story scenario, that line really belongs with the lines before it, where we fill in the rest of the form.

The duplicated click on "Save Story" is less incidental, but it is also the core action of each scenario, so it shouldn’t necessarily be extracted.

However, the three lines of the Then section are closely related to each other and should be extracted together. In our case, we refactored to this:

Refactored Specs

```gherkin
Scenario: Create a story
  Given I'm at the New Story page
  When I fill in "In order to" with "Goal"
  And I fill in "As a" with "Stakeholder"
  And I fill in "I want to" with "Behavior"
  And I click "Save Story"
  Then the story "Behavior" is saved successfully

Scenario: Edit a story
  Given a story with behavior "Some behavior"
  And I visit the story list page
  When I click "Some behavior"
  And I fill in "I want to" with "Other behavior"
  And I click "Save Story"
  Then the story "Other behavior" is saved successfully
```
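One way to back the new step is to have it delegate to the three checks it replaced. The sketch below assumes a hypothetical `FakePage` whose visible text stands in for the browser, and writes the step as a plain Ruby method rather than a Cucumber step definition so it runs without Cucumber or Capybara. The method names are illustrative only.

```ruby
# Hypothetical stand-in for the browser: the page is just its visible text.
class FakePage
  def initialize(text)
    @text = text
  end

  # Plays the role of the existing: Then I will see "..."
  def will_see?(content)
    @text.include?(content)
  end
end

# Conceptual extraction: the three Then checks become one named outcome,
# matching: Then the story "<behavior>" is saved successfully
def story_saved_successfully?(page, behavior)
  page.will_see?("Stories") &&         # back on the story list
    page.will_see?("Save successful") && # flash message shown
    page.will_see?(behavior)             # the new or edited story is listed
end

page = FakePage.new("Stories\nSave successful\nOther behavior")
p story_saved_successfully?(page, "Other behavior") # prints true
p story_saved_successfully?(page, "Some behavior")  # prints false
```

Because the method is named after the outcome rather than the individual checks, both scenarios read the same way, and if "saved successfully" ever comes to mean something different, only one place has to change.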

In my experience, conceptual refactorings tend to turn out better in the long run. The concepts that bind parts of the code to each other tend to last longer than incidental structural similarity at this fine grain size.

Finding the right concepts to apply takes a deeper understanding of the code. When working on an unfamiliar legacy codebase, some simple structural refactorings are often needed just to begin to get a handle on things. In that case, I recommend starting with the obvious structural refactorings while paying attention and learning as much as you can, so that you can move toward more conceptual refactorings over time.

The structural-versus-conceptual divide also appears at coarser grain sizes, all the way up to overall system design and architecture.

Often, we decide to architect our systems around structural considerations: “these parts of the system do the same kinds of things, so they should go together.” This gives us client-server and N-tier architectures, for example.

The alternative is to architect our systems around conceptual considerations: “these parts of the system are about accomplishing goal X, so they should go together.” I’m sure there’s a name for such architectures, but I don’t know of one. I see this style of system less often in the wild.
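As a rough illustration of the two groupings, the sketch below models a codebase as a hash from groups (directories or packages) to the pieces they hold, then counts how many groups a single feature touches under each organization. The Rails-style group names and the Comment feature are hypothetical, chosen only to make the two layouts comparable.

```ruby
# Structural grouping: each group holds "the same kind of thing".
LAYERED = {
  "models"      => ["Story", "Comment"],
  "controllers" => ["StoriesController", "CommentsController"],
  "views"       => ["stories/edit", "comments/new"]
}

# Conceptual grouping: each group holds everything that serves one goal.
BY_GOAL = {
  "stories"  => ["Story", "StoriesController", "stories/edit"],
  "comments" => ["Comment", "CommentsController", "comments/new"]
}

# Names of the groups that contain at least one piece of the feature.
def groups_touched(layout, pieces)
  layout.select { |_group, members| (members & pieces).any? }.keys
end

story_feature = ["Story", "StoriesController", "stories/edit"]
p groups_touched(LAYERED, story_feature) # prints ["models", "controllers", "views"]
p groups_touched(BY_GOAL, story_feature) # prints ["stories"]
```

Under the structural layout the story feature lands in every group; under the goal-based layout it stays in one.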

At the architecture level, I think structural divisions in the code make more sense, largely for reasons of infrastructure and deployment. In web applications, for example, it makes a lot of sense to separate what happens on the server from what happens on the client. However, when working on a system like this, I notice that every new feature I build has to touch many or all of the architectural layers, and I wonder if there is a better way to do things.

I don’t have any good answers for that yet. Do you?