Getting Testy: History and Mechanics

This post is part of an ongoing series about Unit Testing.

Before I start giving my advice about writing good tests (or specs), I want to talk a little bit about the history of unit testing and some of the terminology you may encounter as you read about testing.

In this series, my focus is on the testing that developers do as part of their day-to-day development cycle. Any application of significant size also needs many other kinds of testing (performance testing, security testing, exploratory testing, and more), but I won't cover those here.

Developer testing is not a new concept, but it became much more popular over the last 15 years or so with the rise of agile software development, notably through the test-driven development (TDD) practice of Extreme Programming (XP).

TDD

TDD in its current form started with Kent Beck’s SUnit testing tool in Smalltalk. He later reimplemented the SUnit ideas in Java as JUnit with Erich Gamma. Implementations in many other languages followed and still exist today. In Ruby, we have Test::Unit, which has pretty much been replaced by Minitest. Collectively, these frameworks are known as the xUnit family.

In almost all of the xUnit frameworks, an individual test is a specially named method implemented in a class that inherits from a TestCase class. These classes are collected into test suites that are then run by a test runner. Test case classes can also have methods that perform common setup and cleanup (commonly called tear-down) for all of the tests in the class.

Each test method generally follows the Arrange, Act, Assert pattern:

Arrange: Set up test-specific state; create necessary objects, etc.
Act: Perform the action to be tested.
Assert: Determine whether the action had the desired effect.
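
To make that structure concrete, here is a minimal sketch using Minitest. The ShoppingCart class is hypothetical and defined inline only so the example runs on its own; the test class shows the inherited base class, the setup and tear-down hooks, and the three phases marked with comments.

```ruby
require "minitest/autorun"

# Hypothetical class under test, defined here only to keep the example self-contained.
class ShoppingCart
  attr_reader :items

  def initialize
    @items = []
  end

  def add(name, price)
    @items << { name: name, price: price }
  end

  def total
    @items.sum { |item| item[:price] }
  end
end

class ShoppingCartTest < Minitest::Test
  # Common setup: runs before every test method.
  def setup
    @cart = ShoppingCart.new
  end

  # Common tear-down: runs after every test method.
  def teardown
    @cart = nil
  end

  def test_total_sums_item_prices
    # Arrange
    @cart.add("pen", 2)
    @cart.add("notebook", 5)

    # Act
    total = @cart.total

    # Assert
    assert_equal 7, total
  end

  def test_new_cart_is_empty
    assert_empty @cart.items
  end
end
```

In a test this small the phases can blur together (the second test has nothing to arrange beyond the shared setup), which is fine as long as the intent stays readable.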

xUnit frameworks typically have a number of assertion methods. The basic assert method just checks a boolean condition. Other popular assert methods include assert_equal, assert_includes, and a negative assertion such as deny or refute. It is also possible to write your own assertions that fit better with your application domain. I’ll talk more about this in later posts in the series.
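
As a rough illustration, the sketch below exercises a few of Minitest's built-in assertions and adds one custom assertion. The MoneyAssertions module and assert_valid_amount are hypothetical names invented for this example; the point is only that a custom assertion can be an ordinary method built on top of assert.

```ruby
require "minitest/autorun"

# Hypothetical domain-specific assertion, mixed into a test class.
module MoneyAssertions
  # Passes when the amount (in cents) is a non-negative integer.
  def assert_valid_amount(cents, msg = nil)
    assert cents.is_a?(Integer) && cents >= 0,
           msg || "Expected #{cents.inspect} to be a non-negative integer amount"
  end
end

class AssertionExamplesTest < Minitest::Test
  include MoneyAssertions

  def test_basic_assertions
    assert 2 > 1                    # plain boolean check
    assert_equal 4, 2 + 2           # expected value first, then actual
    assert_includes [1, 2, 3], 2    # collection membership
    refute [1, 2, 3].empty?         # negative assertion
  end

  def test_custom_assertion
    assert_valid_amount 1_250
  end
end
```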

The test runner creates a separate instance of the test case class for each individual test method. On that instance, it calls the setup method, the test method, and then the tear-down method. In this way, each test runs in an isolated environment so that the tests are independent of each other. Many versions of xUnit randomize the order of the tests from run to run in order to keep the tests free from subtle ordering dependencies.
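
The runner's behavior can be sketched in a few lines of plain Ruby. This is a toy, not how any real framework is implemented: TinyTestCase and TinyRunner are hypothetical names, and real runners also handle reporting, skips, and much more.

```ruby
# A toy xUnit-style runner, for illustration only. TinyTestCase and
# TinyRunner are hypothetical names, not part of any real framework.
class TinyTestCase
  def setup; end
  def teardown; end

  # By convention, every public method whose name starts with "test_" is a test.
  def self.test_methods
    public_instance_methods.grep(/\Atest_/)
  end
end

class TinyRunner
  # Returns a hash of test name => :pass or :fail.
  def self.run(test_case_class)
    results = {}
    # Shuffle so hidden ordering dependencies surface sooner.
    test_case_class.test_methods.shuffle.each do |name|
      instance = test_case_class.new # fresh instance: no state leaks between tests
      begin
        instance.setup
        instance.public_send(name)
        results[name] = :pass
      rescue StandardError
        results[name] = :fail
      ensure
        instance.teardown # tear-down runs even when the test fails
      end
    end
    results
  end
end

class CounterTest < TinyTestCase
  def setup
    @count = 0
  end

  def test_increment
    @count += 1
    raise "expected 1, got #{@count}" unless @count == 1
  end

  def test_starts_at_zero
    raise "expected 0, got #{@count}" unless @count.zero?
  end
end

p TinyRunner.run(CounterTest)
```

Both tests pass in either order because each one gets its own instance, so the increment in one test never leaks into the other.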

Kent Beck’s Test-Driven Development: By Example is the seminal work on the subject and well worth reading.

BDD

Behavior-Driven Development (BDD) is newer than TDD and was introduced by Dan North after he encountered difficulty when trying to teach TDD to new developers.

As the name suggests, in BDD the focus is more on the desired behaviors of the system. Instead of “tests”, we write “specs” of that behavior. BDD was originally intended to facilitate communication between business/product people and the development team. It later evolved to encompass more class or object level testing. In Ruby, Cucumber follows more of the original intent of BDD while RSpec is used for the class/object level of testing. In JavaScript, Jasmine is the tool of choice for this style of testing.

BDD as originally formulated typically has the specification broken up into “given”, “when”, and “then” sections that correspond to the “arrange”, “act”, and “assert” sections described above.

RSpec and Jasmine use a domain-specific language (DSL) to write specifications. Specifications for a class or behavior are grouped inside of a describe block. Specifications that are related to each other are grouped into context blocks. Each individual specification is placed in an it or specify block. All of these methods take strings that can be used for textual descriptions of the behavior. Common setup can be moved into before (beforeEach in Jasmine) blocks, with corresponding after (afterEach) blocks for tear-down.
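
Here is a small sketch of that block structure. RSpec ships as a separate gem, so to keep the example runnable with a stock Ruby it uses the spec DSL bundled with Minitest as a stand-in. That DSL shares describe, it, before, and after with RSpec, but it has no context method, so a nested describe plays that role here, and its expectation syntax differs from RSpec's. The Stack class is hypothetical.

```ruby
require "minitest/autorun" # also loads Minitest's spec DSL

# Hypothetical class under test, defined here to keep the example self-contained.
class Stack
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
    self
  end

  def pop
    @items.pop
  end

  def empty?
    @items.empty?
  end
end

describe Stack do
  before do
    @stack = Stack.new # given: a fresh stack for every spec
  end

  describe "when empty" do
    it "reports that it is empty" do
      _(@stack.empty?).must_equal true
    end

    it "returns nil from pop" do
      _(@stack.pop).must_be_nil
    end
  end

  describe "with one item" do
    before do
      @stack.push(:a) # runs after the outer before block
    end

    it "pops the item that was pushed" do
      popped = @stack.pop      # when
      _(popped).must_equal :a  # then
    end
  end
end
```

The strings passed to describe and it are combined when a spec is reported, so a failure reads like a sentence about the behavior that broke.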

The RSpec and Jasmine runners work very much like the xUnit test runners. Each specification is run in an isolated environment to minimize interdependencies. The before and after blocks are run before and after each individual spec.

In RSpec and other BDD frameworks, output and object state are verified using “matchers” – special-purpose objects that know how to compare other objects according to some strategy. As with xUnit assertions, it is possible to write your own special-purpose matchers.
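
The matcher idea can be sketched without any framework. Everything below (BeWithin, Expectation, ExpectationFailed, and the expect and be_within helpers) is a hypothetical stand-in written for this post rather than RSpec's real classes, although the matches? and failure_message method names are borrowed from the protocol RSpec matchers follow.

```ruby
# A hand-rolled sketch of the matcher idea; none of these are RSpec's real classes.
class BeWithin
  def initialize(delta, expected)
    @delta = delta
    @expected = expected
  end

  # The comparison strategy lives in the matcher, not in the spec.
  def matches?(actual)
    @actual = actual
    (actual - @expected).abs <= @delta
  end

  def failure_message
    "expected #{@actual} to be within #{@delta} of #{@expected}"
  end
end

class ExpectationFailed < StandardError; end

class Expectation
  def initialize(actual)
    @actual = actual
  end

  # Hands the actual value to the matcher and raises if it does not match.
  def to(matcher)
    raise ExpectationFailed, matcher.failure_message unless matcher.matches?(@actual)
    true
  end
end

def expect(actual)
  Expectation.new(actual)
end

def be_within(delta, of:)
  BeWithin.new(delta, of)
end

expect(3.14).to be_within(0.01, of: 3.1415)
```

Because the matcher is an ordinary object, it can carry its own failure message, which is what makes custom matchers read well in spec output.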

For Ruby, a good book to read is The RSpec Book.

TDD vs BDD

I personally don’t see a huge difference between TDD and BDD. Yes, the terminology is different, and that can have an impact on how you think about testing. The use of nested contexts in the BDD tools also makes a difference. Certainly, the word “test” carries some baggage in some environments.

It’s possible to write behavior-driven specs in an xUnit framework, and it’s possible to write TDD-style tests in a BDD framework.

It’s possible to write good tests/specs using either approach, and it’s also possible to write bad tests/specs using either approach.

At the end of the day, the important thing is that we have a cost-effective way of telling whether our software is reliably doing what it needs to do.

The advice I give in this series is largely applicable to both styles of testing.

Test Doubles

When using TDD or BDD, it is sometimes desirable to create simple objects that stand in for real objects from your application. These stand-in objects are collectively known as “test doubles”. The term is an analogy to a stunt double, who stands in for an actor in a movie.

Different testing approaches rely on test doubles to different degrees. Even if you don’t use them much, they’re still worth having in your bag of tricks.

There are a number of different kinds of test doubles. Each serves a different purpose. For an exhaustive list, see Martin Fowler’s Mocks Aren’t Stubs article or Gerard Meszaros’ book, xUnit Test Patterns.

The main kinds of test doubles I’ll talk about in this series are dummies, fakes, stubs, mocks, and spies.

A Dummy is an object with no behavior. It is typically used as a placeholder in parameter lists.
A Fake is an object that has real behavior, but takes some shortcuts that make it unsuitable for use in the production code. An in-memory database is a good example.
A Stub is an object that has no real behavior, but just returns hard-coded values from one or more method calls.
A Mock Object (or Mock, for short) is pre-programmed with one or more “expectations”. Each expectation specifies a method to be called along with the properties of its arguments. Once the test is complete, all of the expectations are “verified” to ensure that each one has been met.
A Spy records all of the method calls that are made on it, including the arguments. The test can later make assertions about these method calls.
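
To make the distinctions concrete, here is one hand-rolled example of each kind in plain Ruby. Every class name is hypothetical and exists only for this illustration; in practice, a mocking library would usually generate stubs, mocks, and spies for you.

```ruby
# Code under test: charges a payment gateway and logs the outcome.
class Checkout
  def initialize(gateway, logger)
    @gateway = gateway
    @logger = logger
  end

  def pay(amount)
    ok = @gateway.charge(amount)
    @logger.info("charged #{amount}") if ok
    ok
  end
end

# Dummy: fills a parameter slot and has no behavior.
class DummyLogger
  def info(_message); end
end

# Stub: returns a hard-coded answer.
class StubGateway
  def charge(_amount)
    true
  end
end

# Fake: real behavior with a shortcut (the balance lives in memory).
class FakeGateway
  def initialize(balance)
    @balance = balance
  end

  def charge(amount)
    return false if amount > @balance
    @balance -= amount
    true
  end
end

# Spy: records calls so the test can inspect them afterwards.
class SpyLogger
  attr_reader :messages

  def initialize
    @messages = []
  end

  def info(message)
    @messages << message
  end
end

# Mock: pre-programmed with an expectation that is verified at the end.
class MockGateway
  def initialize(expected_amount)
    @expected_amount = expected_amount
    @calls = []
  end

  def charge(amount)
    @calls << amount
    true
  end

  def verify!
    return true if @calls == [@expected_amount]
    raise "expected charge(#{@expected_amount}) once, got #{@calls.inspect}"
  end
end

spy = SpyLogger.new
Checkout.new(StubGateway.new, spy).pay(20)
p spy.messages

mock = MockGateway.new(20)
Checkout.new(mock, DummyLogger.new).pay(20)
mock.verify!
```

Note the difference in where the checking happens: with the spy, the test asserts on the recorded messages after the fact, while the mock carries its expectation with it and checks it during verification.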

Conclusion

I’ve attempted to outline a bit of the history, terminology, and structure of TDD and BDD in order to set the stage for what follows.

In the next post, I’ll talk about my testing philosophy and how it’s evolved over time.
