Getting Testy: Why?
This post is part of an ongoing series about Unit Testing.
Introduction
So far in this series, I’ve been talking about the kinds of tests I write and how I write them. Implicit in that is the kinds of tests I don’t write. This post talks about some of these tests, and why I don’t write them.
If I could boil this post down to one piece of advice, it would be to keep asking, “Why?” For every assertion you want to write, ask why you care about that assertion. Does that reason have to do with implementation details, or with the essential responsibilities of the code you’re testing? If the former, take a step back and re-evaluate. If the latter, then you’re probably on the right track.
I’ll preface everything in this post by saying that sometimes I do write these kinds of tests. If I’m working on something I’m really not sure about or just trying to get some momentum going, I’ll write tests like these. However, once I get over the hurdle I’ll come back and delete them because they add more long-term cost than value.
Test-Driving Implementation Details
There are a number of TDD approaches I’ve seen that focus on what I’d call “test-driving implementation details.”
Let’s illustrate using the classic example of a Stack class.
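For reference throughout this post, here is a minimal array-backed Stack. This particular implementation is just an illustrative sketch, not code from any real project:

```ruby
# A minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
    self
  end

  def pop
    @items.pop
  end

  def top
    @items.last
  end

  def empty?
    @items.empty?
  end
end
```

The interesting question is not how this class works internally, but which tests are worth writing against it.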
Testing the Class of a Newly-Created Object
The first test many people write is to ensure that they can create a new Stack, and they do it by checking the class of the created object.
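In Minitest, such a test looks something like this (the empty class definition is just enough scaffolding to make it pass):

```ruby
require "minitest/autorun"

class Stack; end # just enough code to make the test pass

class StackTest < Minitest::Test
  # Asserts only on the class of the new object -- an implementation detail.
  def test_new_returns_a_stack
    assert_instance_of Stack, Stack.new
  end
end
```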
Why do we care about this? Do we not trust Ruby to properly initialize an object?
This test does force us to create the Stack class, and to be sure that we can get access to the class we’re testing from the test file. So there is some temporary value in it to get the basic hookup working. But it is definitely not worth keeping this test for long.
Exception: If the object creation I’m doing involves some kind of factory method where the factory can create one of several possible classes, then I might write tests like this to ensure that the factory is working correctly.
Testing That an Object Responds to a Method
The next test many people write is one that forces a desired method to exist.
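A sketch of that kind of test, again in Minitest (the empty method body is a stub whose only purpose is to satisfy the assertion):

```ruby
require "minitest/autorun"

class Stack
  def empty?; end # stub; existence is all this test checks for
end

class StackTest < Minitest::Test
  # Asserts only that the method exists, not that it does anything useful.
  def test_responds_to_empty
    assert_respond_to Stack.new, :empty?
  end
end
```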
Again, why do we care about this? What benefit does the existence of an empty? method have to our application? What we really care about is that our Stack correctly handles the responsibility of detecting whether or not it is empty. We should test that instead. The existence of the empty? method is just a means to that end.
Again, there is some temporary value in being forced to define the empty? method. But this test should also be deleted shortly after it is written.
Exception: If I’m doing some tricky metaprogramming involving method_missing/respond_to_missing? or dynamically defined methods, then I might write and keep this test.
Testing Methods, Not Responsibilities
Many people like to test the individual methods of an object in isolation. This is a really good pattern when defining an API because the tests serve as excellent documentation for the API.
But inside a system, what’s more important are the responsibilities of the objects.
For the Stack, the only way to test a method like empty? or push in isolation is to expose internal implementation details to the tests.
Why do we care about this? We do care that we can push items onto our Stack, so that’s good. But we don’t care about how the Stack does its job internally. We only care about external responsibilities. What is different about the behavior of the stack after pushing an item onto it? Make an assertion about that instead.
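A responsibility-focused version might look like this (the Stack shown is a minimal array-backed sketch, included so the example is self-contained):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def top; @items.last; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  # Asserts on observable behavior after pushing, not on internal state.
  def test_pushing_an_item_makes_it_the_top
    stack = Stack.new
    stack.push(:item)
    refute stack.empty?
    assert_equal :item, stack.top
  end
end
```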
Asserting Setup
I’ve sometimes seen tests where there are assertions in setup methods (or before blocks in RSpec). Another variant of this is when a test starts with assertions before performing any actions.
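The pattern looks something like this (a Minitest sketch; the Stack is a minimal array-backed stand-in):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  def setup
    @stack = Stack.new
    assert @stack.empty? # redundant assertion; belongs in its own test
  end

  def test_pushing_makes_the_stack_not_empty
    @stack.push(:item)
    refute @stack.empty?
  end
end
```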
Why does this matter? If it is important that a new stack is empty, then there should be a test that says that; there is no need to keep asserting it over and over for every other test that uses that setup.
Including a lot of assertions about test setup makes the individual tests much more verbose. It’s harder to see the core important idea in each test. If we later change the behavior of the code, we have to go update all of these extra assertions as well, making the change that much more costly.
There is some communication value, however. By making assertions about the state of the world before performing the action being tested, we are communicating something to the reader. I’d much rather communicate this fact in a separate test and not make the redundant assertions over and over again.
I’ve heard the argument that sometimes the separate test is too far away from the current context. I find that this is usually a symptom of a bigger problem. If I keep my objects small and focused, they don’t need so many tests that I can’t find the one I’m looking for. If you’ve got a 7000-line test file, then yes, it is hard to find the other test. But an object that needs a 7000-line test file has other issues.
The other case where I’ve seen some value in asserting setup is in legacy code. If the existing tests aren’t written in a clean style, or if the tests make use of complicated fixtures (say in a Rails app), or you’re still not very familiar with what the code does, then it might be worth adding some of these assertions.
Redundant Assertions
I’ve also seen cases where the same assertions are repeated over and over again in each test.
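For example (a Minitest sketch, with a minimal array-backed Stack so it runs on its own):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def pop; @items.pop; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  def test_pushing_makes_the_stack_not_empty
    stack = Stack.new
    assert stack.empty?
    stack.push(:item)
    refute stack.empty?
  end

  def test_popping_the_only_item_empties_the_stack
    stack = Stack.new
    assert stack.empty?  # repeats the first test
    stack.push(:item)
    refute stack.empty?  # repeats the first test
    stack.pop
    assert stack.empty?  # the only assertion this test actually needs
  end
end
```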
The assertions in the first part of the second test are identical to those in the first test, and so are completely redundant. They just clutter up the second test and make it unclear what we’re really trying to test. Test things once and only once, as much as possible.
Scripts
Another style of testing I’ve seen is to write tests that read more like scripts.
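A script-style Stack test might look something like this (a Minitest sketch, with a minimal array-backed Stack included so it runs):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch).
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def pop; @items.pop; end
  def top; @items.last; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  # One long test that walks through all of the stack's behavior in sequence.
  def test_stack
    stack = Stack.new
    assert stack.empty?
    stack.push(:a)
    refute stack.empty?
    assert_equal :a, stack.top
    stack.push(:b)
    refute stack.empty?
    assert_equal :b, stack.top
    stack.pop
    assert_equal :a, stack.top
    refute stack.empty?
    stack.pop
    assert_nil stack.top
    assert stack.empty?
  end
end
```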
This is a nice example of exactly how the stack will work, but it tests all of the responsibilities in one test, and there is a lot of interaction between the parts of the test.
There are also a lot of extra assertions in here. There are four assertions about top and five about empty?. How many of those are redundant? If we later change the behavior of one of those methods, how many assertions do we have to fix?
I used to write tests like this. I found that I had trouble understanding all of the subtleties of the tests when I came back to them later.
What’s really missing here is the additional information that would be provided by additional test names. Each section of the test serves a different purpose, and giving each of those purposes a name really helps the reader.
A Better Way
I think these kinds of tests arise from good intentions. One of the core tenets of TDD is, “Never write a line of code without a failing test.” That’s good advice and worth following. However, it is sometimes taken to mean something like, “Write a new test for every line of code you add.” Those are two very different things.
“Never write a line of code without a failing test” != “Write a new test for every line of code you add”
— @randycoulman
For example, I’d start the Stack tests with something like this:
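A Minitest version of that first test (a sketch; the test name is mine):

```ruby
require "minitest/test"

class StackTest < Minitest::Test
  # A responsibility: a brand-new stack reports that it is empty.
  def test_a_newly_created_stack_is_empty
    assert Stack.new.empty?
  end
end
```

Note that at this point no Stack class exists yet, so this test fails. That failure is the point.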
I’ve written exactly one test at this point, and it’s a good test of the responsibility of the stack. In order to pass the test, I have to write a little bit of code.
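Something like this minimal implementation (the hard-coded return value is deliberate at this stage):

```ruby
class Stack
  def empty?
    true # hard-coded until a later test forces a real implementation
  end
end
```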
Every line of that code was written in response to a failing test.
The first failure is an error that says the class Stack doesn’t exist. So I write the class Stack ... end part. The next failure is an error that says that Stack doesn’t implement empty?. That forces me to write the def empty? ... end part. The final failure says that empty? didn’t return the expected value. That forces me to hard-code the true return value until I can write another test to make me implement it properly.
So, I didn’t have to write a new test for each line of code I added; I wrote one good test that caused me to write a few lines of code. I don’t need to explicitly test that I get a Stack back and that it responds to empty?. If either of those tests would have failed, then so would this one, and the reason for the failure would be clear.
I’m not advocating that you write one test for hundreds or even tens of lines of code. But it is OK to write a few lines of code in response to a single, focused test that is concerned about the responsibilities of the object under test.
As I said above, if I’m in an uncertain situation, I might write individual tests for each line of code, but I won’t keep those tests around for long. I treat them as scaffolding to help me get where I need to go, and then tear down the scaffolding when I’m done.
The tests I described in the first part of this post drive us to write the code we think we need to write. It is better to write tests that force our objects to have the responsibilities we want them to have.
There’s a subtle, but important difference there. By focusing on the responsibilities, the code is given room to flex and evolve over time. As long as it still fulfills its responsibilities, everything is fine. If I focus on the code I want to write, my tests end up strongly coupled to the implementation, making them brittle and hard to change later.
Don't write a test that forces you to write the code you want. Write a test that forces your object to do what you want it to do.
— @randycoulman
Here are the rest of the tests I’d write for a Stack:
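Something along these lines, one responsibility per test (a Minitest sketch; a minimal array-backed Stack is included here so the suite runs, and the test names are mine):

```ruby
require "minitest/autorun"

# Minimal array-backed Stack (illustrative sketch) so the tests below run.
class Stack
  def initialize; @items = []; end
  def push(item); @items.push(item); self; end
  def pop; @items.pop; end
  def top; @items.last; end
  def empty?; @items.empty?; end
end

class StackTest < Minitest::Test
  def test_pushing_an_item_makes_the_stack_not_empty
    stack = Stack.new
    stack.push(:item)
    refute stack.empty?
  end

  def test_pushing_an_item_makes_it_the_top
    stack = Stack.new
    stack.push(:item)
    assert_equal :item, stack.top
  end

  def test_popping_returns_the_most_recently_pushed_item
    stack = Stack.new
    stack.push(:first).push(:second)
    assert_equal :second, stack.pop
  end

  def test_popping_the_only_item_empties_the_stack
    stack = Stack.new
    stack.push(:item)
    stack.pop
    assert stack.empty?
  end
end
```

Each test name states a responsibility, so a failure tells the reader exactly which behavior broke.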
Other Tests I Don’t Write
There are a few other kinds of tests I don’t write.
Accessors
I almost never write tests for simple getters and setters. First of all, I try really hard to not need (public) getters and setters at all. But if I do need them, I don’t test them directly.
Again, it is important to ask why. Why is it important to know that I can access some instance variable of the object under test? Whatever that reason is, that’s what I should be testing. I can use the getter to help me test that responsibility, but I don’t need to test the getter directly.
Counts
I’ve seen tests that check the number of objects in a collection. For example in Rails, a test might create an ActiveRecord object and then assert that there is one record in the table of interest.
This is a fragile test, because there might be other records in the table due to fixtures or other test setup. A slightly improved version of the test would check that there is one more record in the table than there was before. This is slightly better, but not much.
A better option is to actually test what you mean. If I create an ActiveRecord object, then I should check that the table includes the record I want, rather than just checking the count. That makes for a much clearer test.
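To illustrate the difference without pulling in Rails, here is a hypothetical in-memory repository standing in for an ActiveRecord table (the WidgetRepository class and its names are invented for this sketch):

```ruby
require "minitest/autorun"

# Hypothetical in-memory repository standing in for a database table.
class WidgetRepository
  def initialize(existing = [])
    @records = existing.dup # simulates fixtures / prior test setup
  end

  def create(name)
    record = { name: name }
    @records << record
    record
  end

  def all
    @records
  end
end

class WidgetTest < Minitest::Test
  def test_creating_a_widget_stores_it
    repo = WidgetRepository.new([{ name: "fixture" }]) # pre-existing data
    widget = repo.create("mine")
    # Fragile: assert_equal 1, repo.all.count  -- breaks because of the fixture.
    # Better: assert on the record we actually care about.
    assert_includes repo.all, widget
  end
end
```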
I once worked on a test suite that performed an action and then verified that there were a certain number of delayed jobs scheduled as a result. A change to other code in the object resulted in there being more jobs scheduled, and all of the tests that asserted on the count broke. We refactored the tests to check that the jobs we were interested in got scheduled.
Private Methods
I almost never test private methods. There are times, especially in legacy code, when there isn’t a good way around it.
I find that a desire to test a private method is almost certainly an indication that my object is trying to do too much. There is some other object that is trying to get out. That other object can then be tested directly.
Conclusion
I hope this post has given a little more background about how I do and don’t write tests, and my reasons for working this way.
The main thing to remember when you’re about to write a test or assertion is to ask “Why?”
If the answer to that question has to do with a core responsibility of the object and the assertion is testing externally visible behavior, then you’re probably on the right track.
If the test or assertion is more about internal implementation details, then it’s time to take a step back and think of a different test to write.