Getting Testy: Why?
This post is part of an ongoing series about Unit Testing.
Introduction
So far in this series, I’ve been talking about the kinds of tests I write and how I write them. Implicit in that is the kinds of tests I don’t write. This post talks about some of these tests, and why I don’t write them.
If I could boil this post down to one piece of advice, it would be to keep asking, “Why?” For every assertion you want to write, ask why you care about that assertion. Does that reason have to do with implementation details, or with the essential responsibilities of the code you’re testing? If the former, take a step back and re-evaluate. If the latter, then you’re probably on the right track.
I’ll preface everything in this post by saying that sometimes I do write these kinds of tests. If I’m working on something I’m really not sure about or just trying to get some momentum going, I’ll write tests like these. However, once I get over the hurdle I’ll come back and delete them because they add more long-term cost than value.
Test-Driving Implementation Details
There are a number of TDD approaches I’ve seen that focus on what I’d call “test-driving implementation details.”
Let’s illustrate using the classic example of a Stack class.
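For reference throughout this post, here is a minimal array-backed Stack. This particular implementation is just an illustrative sketch, not code from any real project:

```ruby
# A minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
    self
  end

  def pop
    @items.pop
  end

  def top
    @items.last
  end

  def empty?
    @items.empty?
  end
end
```

The interesting question is not how this class works internally, but which tests are worth writing against it.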
Testing the Class of a Newly-Created Object
The first test many people write is to ensure that they can create a new Stack, and they do it by checking the class of the created object.
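In Minitest, such a test looks something like this (the empty class definition is just enough scaffolding to make it pass):

```ruby
require "minitest/autorun"

class Stack; end # just enough code to make the test pass

class StackTest < Minitest::Test
  # Asserts only on the class of the new object -- an implementation detail.
  def test_new_returns_a_stack
    assert_instance_of Stack, Stack.new
  end
end
```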
Why do we care about this? Do we not trust Ruby to properly initialize an object?
This test does force us to create the Stack class, and to be sure that we can get access to the class we’re testing from the test file. So there is some temporary value in it to get the basic hookup working. But it is definitely not worth keeping this test for long.
Exception: If the object creation I’m doing involves some kind of factory method where the factory can create one of several possible classes, then I might write tests like this to ensure that the factory is working correctly.
Testing That an Object Responds to a Method
The next test many people write is one that forces a desired method to exist.
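A sketch of that kind of test, again in Minitest (the empty method body is a stub whose only purpose is to satisfy the assertion):

```ruby
require "minitest/autorun"

class Stack
  def empty?; end # stub; existence is all this test checks for
end

class StackTest < Minitest::Test
  # Asserts only that the method exists, not that it does anything useful.
  def test_responds_to_empty
    assert_respond_to Stack.new, :empty?
  end
end
```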
Again, why do we care about this? What benefit does the existence of an empty? method have to our application? What we really care about is that our Stack correctly handles the responsibility of detecting whether or not it is empty. We should test that instead. The existence of the empty? method is just a means to that end.
Again, there is some temporary value in being forced to define the empty? method. But this test should also be deleted shortly after it is written.
Exception: If I’m doing some tricky metaprogramming involving method_missing/respond_to_missing? or dynamically defined methods, then I might write and keep this test.
Testing Methods, Not Responsibilities
Many people like to test the individual methods of an object in isolation. This is a really good pattern when defining an API because the tests serve as excellent documentation for the API.
But inside a system, what’s more important are the responsibilities of the objects.
For the Stack, the only way to test a method like empty? or push in isolation is to expose internal implementation details to the tests.
Why do we care about this? We do care that we can push items onto our Stack, so that’s good. But we don’t care about how the Stack does its job internally. We only care about external responsibilities. What is different about the behavior of the stack after pushing an item onto it? Make an assertion about that instead.
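A responsibility-focused version might look like this (the Stack shown is a minimal array-backed sketch, included so the example is self-contained):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def top; @items.last; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  # Asserts on observable behavior after pushing, not on internal state.
  def test_pushing_an_item_makes_it_the_top
    stack = Stack.new
    stack.push(:item)
    refute stack.empty?
    assert_equal :item, stack.top
  end
end
```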
Asserting Setup
I’ve sometimes seen tests where there are assertions in setup methods (or before blocks in RSpec). Another variant of this is when a test starts with assertions before performing any actions.
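The pattern looks something like this (a Minitest sketch; the Stack is a minimal array-backed stand-in):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  def setup
    @stack = Stack.new
    assert @stack.empty? # redundant assertion; belongs in its own test
  end

  def test_pushing_makes_the_stack_not_empty
    @stack.push(:item)
    refute @stack.empty?
  end
end
```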
Why does this matter? If it is important that a new stack is empty, then there should be a test that says that; there is no need to keep asserting it over and over for every other test that uses that setup.
Including a lot of assertions about test setup makes the individual tests much more verbose. It’s harder to see the core important idea in each test. If we later change the behavior of the code, we have to go update all of these extra assertions as well, making the change that much more costly.
There is some communication value, however. By making assertions about the state of the world before performing the action being tested, we are communicating something to the reader. I’d much rather communicate this fact in a separate test and not make the redundant assertions over and over again.
I’ve heard the argument that sometimes the separate test is too far away from the current context. I find that this is usually a symptom of a bigger problem. If I keep my objects small and focused, they don’t need so many tests that I can’t find the one I’m looking for. If you’ve got a 7000-line test file, then yes, it is hard to find the other test. But an object that needs a 7000-line test file has other issues.
The other case where I’ve seen some value in asserting setup is in legacy code. If the existing tests aren’t written in a clean style, or if the tests make use of complicated fixtures (say in a Rails app), or you’re still not very familiar with what the code does, then it might be worth adding some of these assertions.
Redundant Assertions
I’ve also seen cases where the same assertions are repeated over and over again in each test.
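For example (a Minitest sketch, with a minimal array-backed Stack so it runs on its own):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def pop; @items.pop; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  def test_pushing_makes_the_stack_not_empty
    stack = Stack.new
    assert stack.empty?
    stack.push(:item)
    refute stack.empty?
  end

  def test_popping_the_only_item_empties_the_stack
    stack = Stack.new
    assert stack.empty?  # repeats the first test
    stack.push(:item)
    refute stack.empty?  # repeats the first test
    stack.pop
    assert stack.empty?  # the only assertion this test actually needs
  end
end
```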
The assertions in the first part of the second test are identical to those in the first test, and so are completely redundant. They just clutter up the second test and make it unclear what we’re really trying to test. Test things once and only once, as much as possible.
Scripts
Another style of testing I’ve seen is to write tests that read more like scripts.
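A script-style Stack test might look something like this (a Minitest sketch, with a minimal array-backed Stack included so it runs):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def pop; @items.pop; end
  def top; @items.last; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  # One long test that walks through all of the stack's behavior in sequence.
  def test_stack
    stack = Stack.new
    assert stack.empty?
    stack.push(:a)
    refute stack.empty?
    assert_equal :a, stack.top
    stack.push(:b)
    refute stack.empty?
    assert_equal :b, stack.top
    stack.pop
    assert_equal :a, stack.top
    refute stack.empty?
    stack.pop
    assert_nil stack.top
    assert stack.empty?
  end
end
```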
This is a nice example of exactly how the stack will work, but it tests all of the responsibilities in one test, and there is a lot of interaction between the parts of the test.
There are also a lot of extra assertions in here. There are four assertions about top and five about empty?. How many of those are redundant? If we later change the behavior of one of those methods, how many assertions do we have to fix?
I used to write tests like this. I found that I had trouble understanding all of the subtleties of the tests when I came back to them later.
What’s really missing here is the additional information that would be provided by additional test names. Each section of the test serves a different purpose, and giving each of those purposes a name really helps the reader.
A Better Way
I think these kinds of tests arise from good intentions. One of the core tenets of TDD is, “Never write a line of code without a failing test.” That’s good advice and worth following. However, it is sometimes taken to mean something like, “Write a new test for every line of code you add.” Those are two very different things.
“Never write a line of code without a failing test” != “Write a new test for every line of code you add”
— @randycoulman
For example, I’d start the Stack tests with something like this:
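A Minitest version of that first test (a sketch; the test name is mine):

```ruby
require "minitest/test"

class StackTest < Minitest::Test
  # A responsibility: a brand-new stack reports that it is empty.
  def test_a_newly_created_stack_is_empty
    assert Stack.new.empty?
  end
end
```

Note that at this point no Stack class exists yet, so this test fails. That failure is the point.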
I’ve written exactly one test at this point, and it’s a good test of the responsibility of the stack. In order to pass the test, I have to write a little bit of code.
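Something like this minimal implementation (the hard-coded return value is deliberate at this stage):

```ruby
class Stack
  def empty?
    true # hard-coded until a later test forces a real implementation
  end
end
```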
Every line of that code was written in response to a failing test.
The first failure is an error that says the class Stack doesn’t exist. So I write the class Stack ... end part. The next failure is an error that says that Stack doesn’t implement empty?. That forces me to write the def empty? ... end part. The final failure says that empty? didn’t return the expected value. That forces me to hard-code the true return value until I can write another test to make me implement it properly.
So, I didn’t have to write a new test for each line of code I added; I wrote one good test that caused me to write a few lines of code. I don’t need to explicitly test that I get a Stack back and that it responds to empty?. If either of those tests would have failed, then so would this one, and the reason for the failure would be clear.
I’m not advocating that you write one test for hundreds or even tens of lines of code. But it is OK to write a few lines of code in response to a single, focused test that is concerned about the responsibilities of the object under test.
As I said above, if I’m in an uncertain situation, I might write individual tests for each line of code, but I won’t keep those tests around for long. I treat them as scaffolding to help me get where I need to go, and then tear down the scaffolding when I’m done.
The tests I described in the first part of this post drive us to write the code we think we need to write. It is better to write tests that force our objects to have the responsibilities we want them to have.
There’s a subtle, but important difference there. By focusing on the responsibilities, the code is given room to flex and evolve over time. As long as it still fulfills its responsibilities, everything is fine. If I focus on the code I want to write, my tests end up strongly coupled to the implementation, making them brittle and hard to change later.
Don't write a test that forces you to write the code you want. Write a test that forces your object to do what you want it to do.
— @randycoulman
Here are the rest of the tests I’d write for a Stack:
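Something along these lines, one responsibility per test (a Minitest sketch; a minimal array-backed Stack is included here so the suite runs, and the test names are mine):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch) so the tests below run.
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def pop; @items.pop; end
  def top; @items.last; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  def test_pushing_an_item_makes_the_stack_not_empty
    stack = Stack.new
    stack.push(:item)
    refute stack.empty?
  end

  def test_pushing_an_item_makes_it_the_top
    stack = Stack.new
    stack.push(:item)
    assert_equal :item, stack.top
  end

  def test_popping_returns_the_most_recently_pushed_item
    stack = Stack.new
    stack.push(:first).push(:second)
    assert_equal :second, stack.pop
  end

  def test_popping_the_only_item_empties_the_stack
    stack = Stack.new
    stack.push(:item)
    stack.pop
    assert stack.empty?
  end
end
```

Each test name states a responsibility, so a failure tells the reader exactly which behavior broke.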
Other Tests I Don’t Write
There are a few other kinds of tests I don’t write.
Accessors
I almost never write tests for simple getters and setters. First of all, I try really hard to not need (public) getters and setters at all. But if I do need them, I don’t test them directly.
Again, it is important to ask why. Why is it important to know that I can access some instance variable of the object under test? Whatever that reason is, that’s what I should be testing. I can use the getter to help me test that responsibility, but I don’t need to test the getter directly.
Counts
I’ve seen tests that check the number of objects in a collection. For example in Rails, a test might create an ActiveRecord object and then assert that there is one record in the table of interest.
This is a fragile test, because there might be other records in the table due to fixtures or other test setup. A slightly improved version of the test would check that there is one more record in the table than there was before. This is slightly better, but not much.
A better option is to actually test what you mean. If I create an ActiveRecord object, then I should check that the table includes the record I want, rather than just checking the count. That makes for a much clearer test.
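To illustrate the difference without pulling in Rails, here is a hypothetical in-memory repository standing in for an ActiveRecord table (the WidgetRepository class and its names are invented for this sketch):

```ruby
require "minitest/autorun"

# Hypothetical in-memory repository standing in for a database table.
class WidgetRepository
  def initialize(existing = [])
    @records = existing.dup # simulates fixtures / prior test setup
  end

  def create(name)
    record = { name: name }
    @records << record
    record
  end

  def all
    @records
  end
end

class WidgetTest < Minitest::Test
  def test_creating_a_widget_stores_it
    repo = WidgetRepository.new([{ name: "fixture" }]) # pre-existing data
    widget = repo.create("mine")
    # Fragile: assert_equal 1, repo.all.count  -- breaks because of the fixture.
    # Better: assert on the record we actually care about.
    assert_includes repo.all, widget
  end
end
```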
I once worked on a test suite that performed an action and then verified that there were a certain number of delayed jobs scheduled as a result. A change to other code in the object resulted in there being more jobs scheduled, and all of the tests that asserted on the count broke. We refactored the tests to check that the jobs we were interested in got scheduled.
Private Methods
I almost never test private methods. There are times, especially in legacy code, when there isn’t a good way around it.
I find that a desire to test a private method is almost certainly an indication that my object is trying to do too much. There is some other object that is trying to get out. That other object can then be tested directly.
Conclusion
I hope this post has given a little more background about how I do and don’t write tests, and my reasons for working this way.
The main thing to remember when you’re about to write a test or assertion is to ask “Why?”
If the answer to that question has to do with a core responsibility of the object and the assertion is testing externally visible behavior, then you’re probably on the right track.
If the test or assertion is more about internal implementation details, then it’s time to take a step back and think of a different test to write.