Getting Testy: Anti-Patterns

This post is part of an ongoing series about Unit Testing.

Introduction

As we near the end of this series on testing, I want to talk about some general testing anti-patterns. An anti-pattern describes a recurring problem together with a common but ineffective solution to that problem. A good anti-pattern description also presents alternative patterns that avoid the problems the bad solution introduces.

FIRST

I was recently reminded of the FIRST properties of good unit tests, originally described by Tim Ottinger and Brett Schuchert. A good unit test should be:

Fast
Isolated
Repeatable
Self-verifying
Timely

The acronym itself also reminds us when to write the tests: first.

These properties of tests are the antidotes to the anti-patterns that follow.

Slow Tests

Slow tests cause all kinds of problems with the TDD workflow. The slower our tests, the less often we’ll run them. We’ll make more pervasive changes between test runs, making it harder to find problems when the tests fail and harder to get the immediate feedback we need to continuously improve our design.

Our slow tests will become an obstacle to refactoring, because we won’t want to wait for the tests to run yet again after making a small but beneficial change.

If we’re using a hosted Continuous Integration (CI) service and our tests are slow, we’ll share our changes less frequently, because we don’t want to clog up the CI server. It’ll take longer to get changes into production while they pile up waiting for the CI server to run.

I’ve joined several projects recently where it wasn’t even possible to run all of the unit tests on my development workstation in a reasonable amount of time. This is a bad situation. I couldn’t build those projects without the help of a heavily parallelized CI server, and I couldn’t tell whether I’d broken something locally before pushing to CI.

The problem with slow tests is that they don’t start out that way. The test time slowly creeps up when we’re not paying attention. Unfortunately, by the time the tests get slow enough to notice, it takes some effort to get them running fast again.

The F in FIRST is for Fast tests.

Do whatever you can to keep your test suite fast. Decouple code from the database, the filesystem, the network, and other external dependencies. Adopt design and architecture approaches that allow your tests to stay fast. Set a hard limit on test-run time. Every time you approach the limit, put in a concerted effort to get the run time down again. Keep track of your top-ten slowest tests and see what you can do to speed those up.
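As a rough illustration of decoupling from the database, the Ruby sketch below passes a repository into the object under test, so a test can hand it an in-memory fake instead of a real database connection. All of the names here (SignupService, InMemoryUserRepository, save, find_by_email) are hypothetical, made up for this example.

```ruby
# Hypothetical domain object. In production it would receive a
# database-backed repository that offers the same two methods.
class SignupService
  def initialize(repository)
    @repository = repository # injected, so tests can swap in a fast fake
  end

  # Registers a new email address; returns false if it is already taken.
  def sign_up(email)
    return false if @repository.find_by_email(email)

    @repository.save({ email: email })
    true
  end
end

# In-memory stand-in for the database: no I/O, so tests using it stay fast.
class InMemoryUserRepository
  def initialize
    @users = {}
  end

  def save(user)
    @users[user[:email]] = user
  end

  def find_by_email(email)
    @users[email]
  end
end

service = SignupService.new(InMemoryUserRepository.new)
puts service.sign_up("ada@example.com") # => true
puts service.sign_up("ada@example.com") # => false (duplicate)
```

The real, database-backed repository would need to provide the same two methods. Keeping that adapter thin means only a handful of tests ever have to touch the database.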

If you’ve never worked with a fast test suite before, it’s hard to imagine how many micro-decisions you make that are negatively affected by the time it takes to run your test suite.

Ordering Dependencies

Often, we’ll write tests that depend on previous tests having been run. We might run a test that creates some objects and another that works with those objects.

Dependent tests are hard to understand at a glance, and are also brittle. If I change one test, it might break some setup that another test needs. These failures are hard to track down.

The I in FIRST is for Isolated (or Independent) tests.

Each test should be independent of all other tests. It should set up what it needs, exercise the code under test, and verify the results.
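In Minitest, for instance, a setup method gives each test its own fixture. The sketch below uses a hypothetical ShoppingCart class; neither test relies on anything the other one did, so they pass in any order.

```ruby
require "minitest/autorun"

# Hypothetical class under test.
class ShoppingCart
  def initialize
    @items = []
  end

  def add(item)
    @items << item
  end

  def count
    @items.size
  end
end

class ShoppingCartTest < Minitest::Test
  # Minitest calls setup before every test method, so each test gets
  # its own fresh cart instead of reusing one built by another test.
  def setup
    @cart = ShoppingCart.new
  end

  def test_new_cart_is_empty
    assert_equal 0, @cart.count
  end

  def test_adding_an_item_increments_count
    @cart.add("book")
    assert_equal 1, @cart.count
  end
end
```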

Many test runners have started running test cases in random order to help expose this problem. For Minitest, the minitest-bisect tool helps you track down ordering dependencies by finding a minimal set of tests that reproduces the failure. RSpec recently added the same capability.
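Random ordering is only useful if a failing order can be replayed, which is why runners pair it with a seed (Minitest, for example, prints the seed it used and accepts a --seed option to rerun in the same order). The toy runner below is a sketch of that idea, not Minitest’s or RSpec’s actual implementation; run_order and the test names are hypothetical.

```ruby
# Shuffle test names deterministically from a seed: the same seed always
# produces the same order, so an order-dependent failure can be replayed.
def run_order(test_names, seed)
  test_names.shuffle(random: Random.new(seed))
end

tests = %w[test_create test_update test_delete test_list]

seed = Random.new_seed
first_run = run_order(tests, seed)
replay = run_order(tests, seed)

puts "seed: #{seed}"
puts "order: #{first_run.join(', ')}"
puts "replay matches: #{first_run == replay}" # => replay matches: true
```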

Flickering Failures

Sometimes we write tests that occasionally fail even when nothing else has changed. This is pretty easy to do, especially when the tests are run in multiple environments.

I’ve introduced a number of these flickering failures in the past. Here are some examples:

Date/time dependencies
Timezone dependencies
Tests that depend on the timing of operations. These are especially easy to write in a garbage-collected language, where a GC cycle can occur at just the wrong time.
Tests that depend on the order in which the filesystem returns files
Tests that rely on some external service that may be down

The R in FIRST stands for Repeatable tests.

Every test should either pass all the time or fail all the time. If you find a flickering failure in your suite, dig into it and figure out what’s going on. Then try to come up with a better way to write the test to avoid the flicker.
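For the first two causes in the list above, one common fix is to stop reading the system clock inside the code under test and inject it instead. The Ruby sketch below assumes a hypothetical Greeting class; the test pins both the time and the UTC offset, so the result no longer depends on when, or in which timezone, the suite runs.

```ruby
# Hypothetical class whose behavior depends on the current time.
class Greeting
  # The clock is injected; by default it reads the real system time.
  def initialize(clock: -> { Time.now })
    @clock = clock
  end

  def message
    @clock.call.hour < 12 ? "Good morning" : "Good afternoon"
  end
end

# In a test, pin the time *and* the UTC offset explicitly.
morning = Time.new(2024, 1, 15, 9, 0, 0, "+00:00")
afternoon = Time.new(2024, 1, 15, 15, 0, 0, "+00:00")

puts Greeting.new(clock: -> { morning }).message   # => Good morning
puts Greeting.new(clock: -> { afternoon }).message # => Good afternoon
```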

Manual Verification

Sometimes, we’ll write tests that require a human to verify the results. This is common with graphical or other visual output, which can be difficult to verify programmatically.

The worst instance of this technique is when the manual verification has to happen on each test run.

An improvement to the technique is known as Golden Master Testing or Guru Checks Output. This involves capturing the known-good test result that was manually verified. New results are automatically compared to the known-good result. As long as nothing has changed, the test is fine.

Every time the expected output needs to change, the manual verification must be re-performed.
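A golden-master check can be quite small. The Ruby sketch below is one possible shape, not any particular library’s API: verify_golden_master and the .approved/.received file naming are assumptions for illustration. With no approved file yet, it saves the actual output for a human to review; after that, comparison is automatic.

```ruby
require "tmpdir"

# No approved file yet: save the actual output for a human to review and
# report that review is needed. Otherwise compare against the approved copy.
def verify_golden_master(actual, approved_path)
  unless File.exist?(approved_path)
    File.write("#{approved_path}.received", actual)
    return :needs_review
  end

  File.read(approved_path) == actual ? :pass : :fail
end

Dir.mktmpdir do |dir|
  approved = File.join(dir, "report.approved.txt")
  report = "Total: 3 items\n"

  p verify_golden_master(report, approved) # => :needs_review

  # A human inspects the .received file and, if it looks right,
  # promotes it to the approved file.
  File.rename("#{approved}.received", approved)

  p verify_golden_master(report, approved)             # => :pass
  p verify_golden_master("Total: 4 items\n", approved) # => :fail
end
```

Note that the promotion step is exactly where the human stays in the loop whenever the expected output changes.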

The biggest problem with manual verification is that it requires a human in the loop. This slows down the process, and is also potentially error-prone. What if the only human available is not as experienced at looking at the output and misses something critical?

What if we need to rush a hot-fix into production? That’s the time when any human involved is likely to be the most stressed and rushed. That’s not when we’re likely to do our best at looking carefully at changed output.

The S in FIRST stands for Self-Verifying.

We want our tests to verify their own results automatically. This allows the tests to run faster and unsupervised, and it keeps the busy, stressed humans out of the loop.

For output that is difficult to verify, try to identify some properties of the good output that can be checked automatically. Isolate the tests from parts of the output that change frequently (timestamps are a notorious problem here).
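Both ideas can be sketched in a few lines of Ruby. The build_report generator and the timestamp pattern below are hypothetical; the pattern assumes ISO 8601 timestamps and would need adjusting for other formats. The test first checks properties that must hold however the report is laid out, then masks the timestamp before comparing the full text.

```ruby
require "time"

# Matches ISO-8601-style timestamps such as 2024-01-15T09:00:00Z.
TIMESTAMP = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/

# Replace timestamps with a fixed placeholder before comparing output.
def scrub_timestamps(text)
  text.gsub(TIMESTAMP, "<TIMESTAMP>")
end

# Hypothetical generator of output that changes on every run.
def build_report(items, now: Time.now)
  lines = ["Generated at #{now.getutc.iso8601}"]
  items.each { |name, qty| lines << "#{name}: #{qty}" }
  lines.join("\n")
end

report = build_report({ "apples" => 3, "pears" => 0 })

# Property checks: things that must hold however the report is formatted.
puts report.lines.first.start_with?("Generated at") # => true
puts report.include?("apples: 3")                   # => true

# Stable comparison against expected text, with timestamps masked out.
expected = "Generated at <TIMESTAMP>\napples: 3\npears: 0"
puts scrub_timestamps(report) == expected           # => true
```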

Writing Tests After the Code

Sometimes we’re sold on the idea of writing tests, but we can’t see how to write them first. So instead, we write the code and then we write tests for this code.

This can be effective, but it takes away the “driving” part of TDD. The tests we write are the first client of the code we’re testing. By writing the test first, we are designing the API of our object. When we write the test after, we don’t get that design benefit.

The T in FIRST stands for Timely.

Strive to write tests first, ideally one test at a time. Write the test in terms of the external API and externally-visible behavior of the code you’re testing. Then write the code, just enough to make the test pass. Repeat the process, refactoring as you go.
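To make the cycle concrete, here is a minimal Minitest sketch for a hypothetical Stack class. In practice you would write one test, watch it fail, make it pass, and refactor before writing the next; the file below only shows the end state, with the tests placed first because they were written first.

```ruby
require "minitest/autorun"

# Step 1: the tests come first, written against the API we wish we had.
# They are the first client of Stack, so they shape its method names.
class StackTest < Minitest::Test
  def test_new_stack_is_empty
    assert_predicate Stack.new, :empty?
  end

  def test_pop_returns_last_pushed_item
    stack = Stack.new
    stack.push(1)
    stack.push(2)
    assert_equal 2, stack.pop
  end
end

# Step 2: just enough code to make those tests pass.
class Stack
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
  end

  def pop
    @items.pop
  end

  def empty?
    @items.empty?
  end
end
```

Because Ruby resolves the Stack constant only when a test method runs, the tests can sit above the class they exercise in the same file.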

TDD is as much (or more) a design technique as it is a testing technique.

Conclusion

These are just a few of the anti-patterns I’ve seen in test suites and some ideas for better solutions.

In my next post in the series, I’ll talk about testing legacy code.