If you’ve taken a look at JUnit 5, you might have seen that it introduced a different way of writing tests: “nested tests”.

According to the official docs:

@Nested tests give the test writer more capabilities to express the relationship among several groups of tests.

I’ve read mostly favourable impressions of them (see here and here), but I wanted to share my experiences and highlight some pain points (and some positives!).

So let’s take a look at their example, which contains tests for java.util.Stack.
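For reference, here is a condensed sketch modelled on that user guide example. It is trimmed to the classes and methods discussed in this post, so treat it as an approximation and consult the official docs for the full version. It assumes JUnit Jupiter is on the classpath.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.EmptyStackException;
import java.util.Stack;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Nested;
import org.junit.jupiter.api.Test;

@DisplayName("A stack")
class TestingAStackDemo {

    // Shared by every nesting level; assigned by the @BeforeEach methods below.
    Stack<Object> stack;

    @Nested
    @DisplayName("when new")
    class WhenNew {

        @BeforeEach
        void createNewStack() {
            stack = new Stack<>();
        }

        @Test
        @DisplayName("is empty")
        void isEmpty() {
            assertTrue(stack.isEmpty());
        }

        @Test
        @DisplayName("throws EmptyStackException when popped")
        void throwsExceptionWhenPopped() {
            assertThrows(EmptyStackException.class, stack::pop);
        }

        @Nested
        @DisplayName("after pushing an element")
        class AfterPushing {

            String anElement = "an element";

            // Runs after createNewStack(): outer setup executes before inner setup.
            @BeforeEach
            void pushAnElement() {
                stack.push(anElement);
            }

            @Test
            @DisplayName("returns the element when popped and is empty")
            void returnElementWhenPopped() {
                assertEquals(anElement, stack.pop());
                assertTrue(stack.isEmpty());
            }
        }
    }
}
```

Each @Nested class is a non-static inner class, which is what lets the inner levels read and modify the stack field declared on the outermost class.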

New structure, new thinking

Immediately we can see that the structure of the overall class is quite different from the traditional “flat tests”. The sequence of arrangement steps (defining a test’s initial state), actions and assertions is more deeply woven into the class. Writing tests becomes an exercise in expressing that sequence. That’s not a critique, but an indication that you need to change the way you do things.

Readability

Understanding the intention and definition of tests is important for maintaining your code. If the tests are said to be the documentation of the class, then that documentation should be easy to read and update.

One of the most challenging aspects of reading an individual nested test is that it doesn’t exist as a single, cohesive entity, but rather as a series of steps spread throughout the nested class structure.

In their example, to understand the test asserted in returnElementWhenPopped, we need to:

read returnElementWhenPopped to understand the assertions,
search for and read pushAnElement to understand its initialisation and precondition,
search for and read createNewStack to understand its initialisation,
and finally go to TestingAStackDemo to check what initialisation it does.
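The lookups above amount to reconstructing, in your head, the sequence that JUnit runs for that one test: setup from the outer classes first, then the inner ones, then the test body. A rough plain-Java trace of that sequence (no JUnit involved) might look like the sketch below. The class name FlattenedStackTrace is hypothetical, and the method bodies are assumptions based on what the method names in the example suggest.

```java
import java.util.Stack;

// Hypothetical plain-Java trace of what effectively runs for one nested test.
class FlattenedStackTrace {

    // State declared on the outermost class (TestingAStackDemo in the example).
    static Stack<Object> stack;

    // Setup from the outer nested class (assumed: creates an empty stack).
    static void createNewStack() {
        stack = new Stack<>();
    }

    // Setup from the inner nested class (assumed: pushes a single element).
    static void pushAnElement(Object element) {
        stack.push(element);
    }

    // The test itself, with the setup steps inlined in the order JUnit applies them.
    static Object returnElementWhenPopped() {
        createNewStack();             // outer setup runs first
        pushAnElement("an element");  // inner setup runs second
        Object popped = stack.pop();  // the action under test
        if (!"an element".equals(popped)) {
            throw new AssertionError("expected the pushed element");
        }
        if (!stack.isEmpty()) {
            throw new AssertionError("expected an empty stack after popping");
        }
        return popped;
    }

    public static void main(String[] args) {
        System.out.println(returnElementWhenPopped());
    }
}
```

Running main prints "an element". Seen flattened like this, the whole test fits in a handful of lines, which is roughly the mental model you have to rebuild each time you read the nested version.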

I’ve found composing a mental model of an individual nested test to be more difficult than with the denser, self-contained structure associated with flat tests.

As code editors don’t understand the nested test structure, there’s no easy way to navigate around the steps. You have to do the navigation legwork yourself and unravel this structure each time. That is slightly painful in this example, and even more painful with a large, complex suite. Imagine the mess if your nested tests line wrap because of the amount of nesting. Perhaps by that point, your class under test is the bigger problem!

Changeability

Beyond readability, the nesting creates stumbling blocks for refactoring: as the tests’ code is more intertwined, making modifications becomes more involved. It can feel a bit like pivoting around an unwieldy object hierarchy at times. The process is usually more manageable with flat tests, whose code structure is more isolated. Flat tests will have more repetition, but I found that more predictable to deal with than the nested structure.

Replacing one boilerplate with another

Perhaps my biggest lament is that the intent of the tests becomes obscured by the necessary Java language boilerplate. By using raw classes and methods for structure and with little abstraction, it can feel like a convoluted DSL dictated by the Java language.

Perhaps a clever abstraction over this could help? There are other solutions, such as Spock (which I haven’t used, admittedly), that seem to have done a clearer job of providing a more declarative test structure.

Another observation is that the more you use the key feature, nesting, the harder your tests can become to work with. It’s notable that their example only has two levels of nesting - perhaps an indication that it should be used sparingly.

Worse or just different?

“Am I so out of touch? No. It’s the children who are wrong.”

I have to ask myself whether I found nested tests difficult to work with just because they are different. Testing is full of conventions, best practice advice and pedantry, so if something new comes along that challenges our long-held beliefs, then at times it can be hard to adjust.

Nested tests open up a new range of questions about the best practices for structuring and naming them. Resolving those questions requires time, and because nested tests aren’t as widely used, there’s also less consensus available. At the time of writing, there were also no samples in the JUnit 5 repo. Of course, that argument could be used to shoot down any new library, framework or language. What matters more is whether your pioneering leads to a net gain in the quality of your tests, and I’m not convinced that it will.

Some positives

You might have got the impression that I don’t think that there are any upsides to using nested tests. Not true!

Here are some positives that I encountered:

Reduces naming repetition/tedium and improves consistency and cohesion between the arrangement and the test names.
Reduces arrangement and assertion repetition.
Enforces grouping of related tests: organising your test’s functional divisions.
A structured test report.
Combining flat and nested tests is supported.

Conclusions and caveats

I wrote this post over a series of months, during which time my views mellowed somewhat as I got over my initial outrage and came to see the shades of grey. I’ve recognised that at least some of my frustrations were the result of struggling to adjust from the flat test model. Liken it to changing between procedural and functional programming models.

Yet I think it’s fair to say that nested tests shouldn’t be viewed as the “latest and greatest”. They won’t improve all your test cases; don’t blindly change to start using them without considering the cost. They are more an alternative than an upgrade and you might find that nested tests add a cognitive overhead disproportionate to their benefit, especially early on.

Some of the pain could be eased by better code editor support. If editors recognised this odd collection of classes as nested tests, they could offer features to improve readability and navigation. Add to that wishlist the ability to improve visual isolation (“flattening”?) of individual test cases, or even alternative visualisations of the test model.

Ultimately, the real strengths and difficulties of working with nested tests lie in defining the test model. This takes effort and can be cumbersome; however, it can prove helpful in reducing useless or duplicated tests, as you’re more inclined to consider how to test the relationship between states, actions and their outcomes, rather than quickly churning out another test.

It’s been a few weeks since I last used nested tests and now I find myself fence-sitting. Theoretically, they offer useful features, but often the practicalities detract from that. It might be worth me trying to push through the pain barrier again and see where it takes me. Would I definitively recommend using them though? Probably not.

Thanks for reading!

This post is Creative Commons Attribution 4.0 International (CC BY 4.0) licensed.