This post is part of an ongoing series about Unit Testing.

Introduction

In my previous post in this series, I talked about the different layers of tests that developers write. In this post, I’d like to focus on the outermost layer, commonly called “Acceptance Tests”, “Customer Tests”, or “Story Tests”. These are the tests that help us show our stakeholders that the system is doing what they’d like it to do.

In agile software development, the ideal has always been that our stakeholders write (or at least specify) these tests for us, often with the assistance of other people such as testing professionals. This happens much less frequently than we’d like. Robert Martin (a.k.a. Uncle Bob) has written a lovely rant on the subject.

Whether we can convince our stakeholders to write the tests or not, it is still important to get some kind of definition of what they expect to see out of each feature we implement. This definition is often called “acceptance criteria” or “the definition of done”.

The Sample Application

For the next few posts, I’m going to use a simple application to illustrate the ideas. The application should do two things:

  1. Convert an amount of money from one currency to another.

  2. Show a list of all currencies supported by the application.

For this post, I’m going to use Cucumber, but the same ideas apply whether you’re using FitNesse or even just your unit-testing framework to write these tests.

Note that I haven’t said anything yet about the form this application should take; I’ve only talked about the problem to be solved.

Specs for a Command-Line Application

If our stakeholder wants a command-line application, a typical set of Cucumber specs might look like this:

Specs for Command-Line Version

Feature: Currency Exchange

  Scenario: simple currency exchange
    Given the exchange rate for 1 USD is 0.91 EUR
    When I run "currencyfx 100 --from USD --to EUR"
    Then I should see
      """
      100 USD => 91.00 EUR
      """

  Scenario: reverse currency exchange
    Given the exchange rate for 1 USD is 0.91 EUR
    When I run "currencyfx 100 --from EUR --to USD"
    Then I should see
      """
      100 EUR => 109.89 USD
      """

  Scenario: compound currency exchange
    Given the exchange rate for 1 USD is 0.91 EUR
    And the exchange rate for 1 USD is 1.23 CAD
    When I run "currencyfx 100 --from CAD --to EUR"
    Then I should see
      """
      100 CAD => 73.98 EUR
      """

Feature: Currency List

  Scenario: currency list
    Given the following currencies exist:
      | symbol | description           |
      | USD    | United States Dollars |
      | CAD    | Canadian Dollars      |
      | EUR    | European Union Euros  |
    When I run "currencyfx --list"
    Then I should see
      """
      | Symbol | Description           |
      ==================================
      | CAD    | Canadian Dollars      |
      | EUR    | European Union Euros  |
      | USD    | United States Dollars |
      """

There are a number of issues with these specs, but before we discuss those, let’s look at some typical specs for a web-based version of this application.

Specs for a Web-Based Application

Specs for Web-Based Version

Feature: Currency Exchange

  Scenario: simple currency exchange
    Given the exchange rate for 1 USD is 0.91 EUR
    And I visit the main page
    And I fill in "amount" with 100
    And I select "USD" from "source_currency"
    And I select "EUR" from "target_currency"
    When I press "Convert"
    Then the page should show "91.00 EUR"

  Scenario: reverse currency exchange
    Given the exchange rate for 1 USD is 0.91 EUR
    And I visit the main page
    And I fill in "amount" with 100
    And I select "EUR" from "source_currency"
    And I select "USD" from "target_currency"
    When I press "Convert"
    Then the page should show "109.89 USD"

  Scenario: compound currency exchange
    Given the exchange rate for 1 USD is 0.91 EUR
    And the exchange rate for 1 USD is 1.23 CAD
    And I visit the main page
    And I fill in "amount" with 100
    And I select "CAD" from "source_currency"
    And I select "EUR" from "target_currency"
    When I press "Convert"
    Then the page should show "73.98 EUR"

Feature: Currency List

  Scenario: currency list
    Given the following currencies exist:
      | symbol | description           |
      | USD    | United States Dollars |
      | CAD    | Canadian Dollars      |
      | EUR    | European Union Euros  |
    And I visit the main page
    And I click the "supported_currencies" link
    Then the page should show "CAD"
    And the page should show "Canadian Dollars"
    And the page should show "EUR"
    And the page should show "European Union Euros"
    And the page should show "USD"
    And the page should show "United States Dollars"

Evaluation

Now that we’ve looked at both versions of the specs, let’s evaluate them.

The Good

  1. We are testing both of the features of the application. We have a way of demonstrating to our stakeholders that the functionality they’re looking for is present and working.

  2. All of the necessary information to understand the specs is present. We’re supplying the exchange rates and currency lists right in the specs and not relying on magic numbers or on some external system that returns variable information. Depending on how we implement this under the hood, it might mean that these specs are a little less end-to-end than we’d like, but it does make the specs very clear about what’s going on.

  3. The specs provide documentation about how the system works. For the command-line version, we see the exact command-line arguments to use, and the exact output format. For the web-based version, we can see that we’re interacting with a form of some kind, filling in fields and clicking buttons and/or links.
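As an illustration of point 2, the step definition behind “Given the exchange rate for 1 USD is 0.91 EUR” might parse the rate out of the step text and record it in an in-memory table that the system under test is configured to read instead of a live rate service. A rough sketch (the `record_rate` helper and regex are assumptions for illustration, not real Cucumber glue code):

```ruby
# Hypothetical sketch: parse a "Given the exchange rate..." step and store
# the rate in an in-memory table instead of calling an external rate service.
RATE_STEP = /^the exchange rate for 1 (\w+) is ([\d.]+) (\w+)$/

def record_rate(step_text, rates)
  from, rate, to = RATE_STEP.match(step_text).captures
  rates[[from, to]] = rate.to_f
  rates
end

rates = record_rate("the exchange rate for 1 USD is 0.91 EUR", {})
# rates now maps ["USD", "EUR"] to 0.91
```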

The Bad

  1. There are no supporting descriptions or text to tell us about the features we’re testing. At the very least, there should be a description of the feature. We don’t want to write a novel, but if we’re going to take the time to use a tool like Cucumber to write an “executable specification” for our application, we should at least include a bit of descriptive text.

  2. There are too many tests. The Currency List spec is OK, but the Currency Exchange spec is testing three different special cases. In most systems, the tests at the outer layer are the slowest tests in the system, so we want relatively few of them. Recall the Test Pyramid I discussed in my previous post. We don’t need to test all of the special cases at this layer; we want just enough to show that the system as a whole is essentially working. If there are complex business rules that need to be tested exhaustively at this level, consider using table-based tests to get rid of the noise. Cucumber has scenario outlines to support this, and FitNesse is all about table-based tests.

  3. The specs are too coupled to implementation details. While they do provide clear documentation about how to use the system, that documentation may not be needed at this level. If it is, then we should find a maintainable way to include it. But it is more likely that our business stakeholders don’t care about this level of detail; they just want to know that the features they’ve asked for are present. By coupling so heavily to the implementation (command-line format, output format, web form layout), we’ve made our specs much more brittle and hard to maintain.

  4. There is a lot of extra noise, especially in the web-based version. The specs for the web-based version are written like scripts that a manual tester would follow, and that’s almost never a good idea for automated tests. Script-y tests are extremely verbose and brittle, and they hide the most valuable parts of the test.
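If the exchange calculations did need exhaustive coverage at this level, a Cucumber scenario outline would keep the cases in one table. A sketch (not part of the actual suite, and using generic wording rather than CLI or web steps):

```gherkin
Scenario Outline: currency exchange
  Given the exchange rate for 1 USD is 0.91 EUR
  And the exchange rate for 1 USD is 1.23 CAD
  When I convert <amount> from <from> to <to>
  Then I should get <result>

  Examples:
    | amount | from | to  | result     |
    | 100    | USD  | EUR | 91.00 EUR  |
    | 100    | EUR  | USD | 109.89 USD |
    | 100    | CAD  | EUR | 73.98 EUR  |
```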

These are all problems we can fix, but there is a much larger, overarching issue with these specs.

The Ugly

Why should the outermost layer of specs for our application have to change when we change the underlying structure? Why should we write two different kinds of specs for the exact same system just because we changed from command-line to web-based? The fundamental features of the application haven’t changed, so why should the specs?

Even more, when we use a tool like Cucumber or FitNesse, we should be able to port our application to a completely different programming language and keep the same specs.

Certainly the step definitions (for Cucumber) or fixture code (for FitNesse) will have to adapt when we change languages or implementation platform. But the specs themselves should be able to survive all of those changes.

I think of the step definitions and fixtures as a shearing layer. They allow the actual specs to stay fixed even while the underlying system changes.
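One way to picture the shearing layer: the step definitions delegate to a small “driver” object, and changing the system’s form means writing a new driver, not new specs. A rough sketch with hypothetical names (real drivers would shell out to the CLI or drive a browser; both are stubbed here so the sketch stays self-contained):

```ruby
# Hypothetical shearing-layer sketch: one interface, swappable drivers.
class CliDriver
  def convert(amount, from, to)
    # A real version would run `currencyfx #{amount} --from #{from} --to #{to}`
    # and parse stdout; stubbed here for illustration.
    "91.00 EUR"
  end
end

class WebUiDriver
  def convert(amount, from, to)
    # A real version would fill in the web form and press "Convert" via a
    # browser driver; stubbed here for illustration.
    "91.00 EUR"
  end
end

# A step definition for "When I convert 100 from USD to EUR" calls only the
# driver interface, so swapping CLI for web never touches the spec text:
driver = CliDriver.new
puts driver.convert(100, "USD", "EUR")
```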

Given this goal and the other issues mentioned above, let’s write different specs for our application.

Better Specs

Better Specs

Feature: Currency Exchange

  Allow the exchange of amounts of money from one
  currency to another.

  Scenario: basic currency exchange
    Given the exchange rate for 1 USD is 0.91 EUR
    When I convert 100 from USD to EUR
    Then I should get 91.00 EUR

Feature: Currency List

  Show the list of supported currencies in alphabetical order
  by currency symbol.

  Scenario: currency list
    Given the following currencies exist:
      | symbol | description           |
      | USD    | United States Dollars |
      | CAD    | Canadian Dollars      |
      | EUR    | European Union Euros  |
    When I ask for a currency list
    Then I should see currencies and descriptions in this order:
      | symbol | description           |
      | CAD    | Canadian Dollars      |
      | EUR    | European Union Euros  |
      | USD    | United States Dollars |

Note that:

  • We’re still testing both features of the application

  • The necessary information is still available; we’re still supplying the exchange rates and currency list to avoid depending on an external service.

  • We’ve lost the usage documentation aspect of the original tests; we may need to add that back in another form if our stakeholders need it (possibly at a lower layer of tests).

  • We’ve added brief descriptions to both features.

  • We’ve eliminated some of the currency exchange specs; we really just need one case to make sure everything is basically working; we’ll test special cases at lower layers as necessary.

  • We’ve completely decoupled the specs from implementation details. We’re using generic language like “When I convert” and “When I ask for a currency list”.

  • We’ve eliminated the noise from the specs. They are now a fairly concise description of the features.

  • Most importantly, these specs will work for any kind of application that implements these features. Again, the step definitions will change, but the specs themselves will not.

Conclusion

When you’re writing tests for the outermost layer of your system, continually ask yourself, “Would these specs survive if I significantly changed the underlying implementation of the system?” If the answer is “No”, then try to write more generic specs. Think about “what” the system is going to do, rather than “how” it’s going to do it or how the end user will interact with it.