This post is part of an ongoing series about Unit Testing.
In my previous post in this series, I talked about the different layers of tests that developers write. In this post, I’d like to focus on the outermost layer, commonly called “Acceptance Tests”, “Customer Tests”, or “Story Tests”. These are the tests that help us show our stakeholders that the system is doing what they’d like it to do.
In agile software development, the ideal has always been that our stakeholders write (or at least specify) these tests for us, often with the assistance of other people such as testing professionals. This happens much less frequently than we’d like. Robert Martin (a.k.a. Uncle Bob) has written a lovely rant on the subject.
Whether we can convince our stakeholders to write the tests or not, it is still important to get some kind of definition of what they expect to see out of each feature we implement. This definition is often called “acceptance criteria” or “the definition of done”.
The Sample Application
For the next few posts, I’m going to use a simple application to illustrate these ideas. The application should do two things:
Convert an amount of money from one currency to another.
Show a list of all currencies supported by the application.
Note that I haven’t said anything yet about the form this application should take; I’ve only talked about the problem to be solved.
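Because both features are defined without reference to any user interface, they can be sketched as a plain core that a command-line or web front end would call. The Python below is a minimal, hypothetical sketch: the class and method names are not from the original application, and it assumes the exchange rates are handed in by the caller rather than fetched from an external service, which keeps results deterministic for testing.

```python
from decimal import Decimal, ROUND_HALF_UP


class CurrencyConverter:
    """Hypothetical UI-agnostic core for the sample application.

    Rates are supplied by the caller (an assumption for this sketch),
    so the same inputs always produce the same outputs.
    """

    def __init__(self, rates: dict[tuple[str, str], Decimal]) -> None:
        # rates[(source, target)] = units of target bought by one unit of source
        self._rates = dict(rates)

    def convert(self, amount: Decimal, source: str, target: str) -> Decimal:
        """Feature 1: convert an amount from one currency to another."""
        # Same-currency conversion uses a rate of 1; unknown pairs raise KeyError.
        rate = Decimal(1) if source == target else self._rates[(source, target)]
        return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

    def currencies(self) -> list[str]:
        """Feature 2: list every currency that appears in the rate table."""
        return sorted({code for pair in self._rates for code in pair})


if __name__ == "__main__":
    converter = CurrencyConverter({("USD", "EUR"): Decimal("0.90")})
    print(converter.convert(Decimal("100"), "USD", "EUR"))  # 90.00
    print(converter.currencies())  # ['EUR', 'USD']
```

Nothing here knows about argument parsing, output formats, or HTML forms; those belong to whichever front end wraps the core.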
Specs for a Command-Line Application
If our stakeholder wants a command-line application, a typical set of Cucumber specs might look like this:
There are a number of issues with these specs, but before we discuss those, let’s look at some typical specs for a web-based version of this application.
Specs for a Web-Based Application
Now that we’ve looked at both versions of the specs, let’s evaluate them.
We are testing both of the features of the application. We have a way of demonstrating to our stakeholders that the functionality they’re looking for is present and working.
All of the necessary information to understand the specs is present. We’re supplying the exchange rates and currency lists right in the specs and not relying on magic numbers or on some external system that returns variable information. Depending on how we implement this under the hood, it might mean that these specs are a little less end-to-end than we’d like, but it does make the specs very clear about what’s going on.
The specs provide documentation about how the system works. For the command-line version, we see the exact command-line arguments to use, and the exact output format. For the web-based version, we can see that we’re interacting with a form of some kind, filling in fields and clicking buttons and/or links.
There are no supporting descriptions or text to tell us about the features we’re testing. At the very least, there should be a description of the feature. We don’t want to write a novel, but if we’re going to take the time to use a tool like Cucumber to write an “executable specification” for our application, we should at least include a bit of descriptive text.
There are too many tests. The Currency List spec is OK, but the Currency Exchange spec is testing three different special cases. In most systems, the tests at the outer layer are the slowest tests we have, so we want relatively few of them. Recall the Test Pyramid I discussed in my previous post. We don’t need to test all of the special cases at this layer; we want just enough to show that the system as a whole is essentially working. If there are complex business rules that need to be tested exhaustively at this level, consider using table-based tests to get rid of the noise. Cucumber has scenario outlines to support this, and FitNesse is all about table-based tests.
The specs are too coupled to implementation details. While they do provide clear documentation about how to use the system, that documentation may not be needed at this level. If it is, then we should find a maintainable way to include it. But it is more likely that our business stakeholders don’t care about this level of detail; they just want to know that the features they’ve asked for are present. By coupling so heavily to the implementation (command-line format, output format, web form layout), we’ve made our specs much more brittle and hard to maintain.
There is a lot of extra noise, especially in the web-based version. The web-based specs are written like scripts that a manual tester would follow, and that’s almost never a good idea for automated tests. Script-y tests are extremely verbose and brittle, and they hide the most valuable parts of the test.
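To make the table-based idea from the “too many tests” point concrete: a Cucumber scenario outline runs one scenario template once per row of an Examples table. The Python sketch below imitates that shape without Cucumber. The `convert` function, the rate table, and the example rows are all hypothetical values chosen for illustration, not the original specs.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rate table supplied by the test itself (no external service).
RATES = {("USD", "EUR"): Decimal("0.90"), ("EUR", "USD"): Decimal("1.11")}


def convert(amount: Decimal, source: str, target: str) -> Decimal:
    """Hypothetical system under test: convert, then round to cents."""
    rate = Decimal(1) if source == target else RATES[(source, target)]
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# One row per case, like the Examples table under a scenario outline:
# (amount, from, to, expected)
EXAMPLES = [
    ("100", "USD", "EUR", "90.00"),
    ("0", "USD", "EUR", "0.00"),
    ("12.34", "EUR", "USD", "13.70"),
    ("50", "USD", "USD", "50.00"),
]


def run_outline() -> list[str]:
    """Run every row through the same template and collect failure messages."""
    failures = []
    for amount, source, target, expected in EXAMPLES:
        actual = convert(Decimal(amount), source, target)
        if actual != Decimal(expected):
            failures.append(
                f"{amount} {source}->{target}: expected {expected}, got {actual}"
            )
    return failures


if __name__ == "__main__":
    problems = run_outline()
    print("all examples passed" if not problems else "\n".join(problems))
```

Adding a special case means adding one more row rather than one more scenario, which keeps the noise down. In a pytest suite, `@pytest.mark.parametrize` plays the same role.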
These are all problems we can fix, but there is a much larger, overarching issue with these specs.
Why should the outermost layer of specs for our application have to change when we change the underlying structure? Why should we write two different kinds of specs for the exact same system just because we changed from command-line to web-based? The fundamental features of the application haven’t changed, so why should the specs?
Going further, when we use a tool like Cucumber or FitNesse, we should be able to port our application to a completely different programming language and keep the same specs.
Certainly the step definitions (for Cucumber) or fixture code (for FitNesse) will have to adapt when we change languages or implementation platform. But the specs themselves should be able to survive all of those changes.
I think of the step definitions and fixtures as a shearing layer. They allow the actual specs to stay fixed even while the underlying system changes.
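A minimal sketch of that shearing layer in Python, under stated assumptions: `CurrencyAppDriver`, `InProcessDriver`, `CommandLineDriver`, and `cli_main` are hypothetical names, and the “command line” is simulated in-process by a function that returns the text it would print, instead of spawning a real process. In a Cucumber project the step definitions would call a driver like these; here two plain functions stand in for the specs.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Protocol

# Hypothetical rate table supplied by the specs, not by an external service.
RATES = {("USD", "EUR"): Decimal("0.90")}


# --- The system under test: a core plus a simulated command-line front end ---
def core_convert(amount: Decimal, source: str, target: str) -> Decimal:
    return (amount * RATES[(source, target)]).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


def core_currencies() -> list[str]:
    return sorted({code for pair in RATES for code in pair})


def cli_main(argv: list[str]) -> str:
    """Stand-in for a CLI entry point: returns the text it would print."""
    if argv[0] == "convert":
        amount, source, target = argv[1:4]
        return f"{core_convert(Decimal(amount), source, target)} {target}\n"
    if argv[0] == "list":
        return "\n".join(core_currencies()) + "\n"
    raise SystemExit(f"unknown command: {argv[0]}")


# --- The shearing layer: one interface, one driver per delivery mechanism ---
class CurrencyAppDriver(Protocol):
    def convert(self, amount: str, source: str, target: str) -> Decimal: ...

    def currency_list(self) -> list[str]: ...


class InProcessDriver:
    """Calls the core directly."""

    def convert(self, amount: str, source: str, target: str) -> Decimal:
        return core_convert(Decimal(amount), source, target)

    def currency_list(self) -> list[str]:
        return core_currencies()


class CommandLineDriver:
    """Owns the CLI argument and output formats so the specs don't have to."""

    def convert(self, amount: str, source: str, target: str) -> Decimal:
        output = cli_main(["convert", amount, source, target])
        return Decimal(output.split()[0])  # output looks like "90.00 EUR"

    def currency_list(self) -> list[str]:
        return cli_main(["list"]).split()


# --- The specs: written once, in terms of "what", not "how" ---
def spec_convert(app: CurrencyAppDriver) -> None:
    assert app.convert("100", "USD", "EUR") == Decimal("90.00")


def spec_currency_list(app: CurrencyAppDriver) -> None:
    assert app.currency_list() == ["EUR", "USD"]


if __name__ == "__main__":
    for driver in (InProcessDriver(), CommandLineDriver()):
        spec_convert(driver)
        spec_currency_list(driver)
        print(f"{type(driver).__name__}: specs passed")
```

Supporting a web front end would mean writing a third driver class; `spec_convert` and `spec_currency_list` would not change.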
Given this goal and the other issues mentioned above, let’s write different specs for our application.
We’re still testing both features of the application.
The necessary information is still available; we’re still supplying the exchange rates and currency list to avoid depending on an external service.
We’ve lost the usage documentation aspect of the original tests; we may need to add that back in another form if our stakeholders need it (possibly at a lower layer of tests).
We’ve added brief descriptions to both features.
We’ve eliminated some of the currency exchange specs. We really need just one case to make sure everything is basically working; we’ll test special cases at lower layers as necessary.
We’ve completely decoupled the specs from implementation details. We’re using generic language like “When I convert” and “When I ask for a currency list”.
We’ve eliminated the noise from the specs. They are now a fairly concise description of the features.
Most importantly, these specs will work for any kind of application that implements these features. Again, the step definitions will change, but the specs themselves will not.
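Cucumber connects generic step text such as “When I convert” to code through step definitions, which it matches against each step’s text using regular expressions or Cucumber Expressions. The toy Python runner below only imitates that matching, to show where the “how” lives. `step`, `run_scenario`, and `FakeApp` are hypothetical, as is any step wording beyond “I convert” and “I ask for a currency list”; real Cucumber does considerably more.

```python
import re
from decimal import Decimal


class FakeApp:
    """Hypothetical in-memory app; any driver with these methods would do."""

    rates = {("USD", "EUR"): Decimal("0.90")}

    def convert(self, amount: Decimal, source: str, target: str) -> Decimal:
        return amount * self.rates[(source, target)]

    def currency_list(self) -> list[str]:
        return sorted({code for pair in self.rates for code in pair})


STEPS = []  # (compiled pattern, step function) pairs, in registration order


def step(pattern: str):
    """Decorator: register a step definition for step text matching pattern."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register


@step(r"^I convert (\d+(?:\.\d+)?) ([A-Z]{3}) to ([A-Z]{3})$")
def when_convert(world, amount, source, target):
    world["result"] = world["app"].convert(Decimal(amount), source, target)


@step(r"^I ask for a currency list$")
def when_list(world):
    world["result"] = world["app"].currency_list()


@step(r"^the result should be (\S+)$")
def then_result(world, expected):
    assert world["result"] == Decimal(expected), world["result"]


@step(r"^the list should include ([A-Z]{3})$")
def then_includes(world, code):
    assert code in world["result"], world["result"]


def run_scenario(lines: list[str], app) -> dict:
    """Drop the Gherkin keyword, then dispatch each step to its definition."""
    world = {"app": app}
    for line in lines:
        text = re.sub(r"^(Given|When|Then|And|But)\s+", "", line.strip())
        for pattern, fn in STEPS:
            match = pattern.match(text)
            if match:
                fn(world, *match.groups())
                break
        else:  # no step definition matched this text
            raise LookupError(f"undefined step: {text!r}")
    return world


if __name__ == "__main__":
    run_scenario(["When I convert 100 USD to EUR",
                  "Then the result should be 90.00"], FakeApp())
    run_scenario(["When I ask for a currency list",
                  "Then the list should include EUR"], FakeApp())
    print("both scenarios passed")
```

Replacing `FakeApp` with a driver that runs a command-line program or drives a browser changes only the step definitions; the scenario lines stay exactly the same.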
When you’re writing tests for the outermost layer of your system, continually ask yourself, “Would these specs survive if I significantly changed the underlying implementation of the system?” If the answer is “No”, then try to write more generic specs. Think about “what” the system is going to do, rather than “how” it’s going to do it or how the end user will interact with it.