Michael Feathers, in Working Effectively with Legacy Code, writes on pages 13-14:
A unit test that takes 1/10th of a second to run is a slow unit test... If [unit tests] don't run fast, they aren't unit tests.
I can understand why 1/10th of a second is too slow if one has 30,000 tests, as the suite would take close to an hour to run. However, does this mean 1/11th of a second is any better? No, not really (it's only about 5 minutes faster). So a hard-and-fast rule probably isn't perfect.
Thus, when considering how slow is too slow for a unit test, perhaps I should rephrase the question: how long is too long for a developer to wait for the unit test suite to complete?
To give an example of test speeds, take a look at the durations of several MSTest unit tests, all in seconds:
0.2637638
0.0589954
0.0272193
0.0209824
0.0199389
0.0088322
0.0033815
0.0028137
0.0027601
0.0008775
0.0008171
0.0007351
0.0007147
0.0005898
0.0004937
0.0004624
0.00045
0.0004397
0.0004385
0.0004376
0.0003329
The average of these 21 unit tests comes to 0.019785 seconds. Note that the slowest test is slow because it uses Microsoft Moles to mock/isolate the file system.
So with this example, if my unit test suite grows to 10,000 tests, it could take over 3 minutes to run.
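Spelling out the arithmetic behind those estimates (just a back-of-the-envelope check):

```latex
30{,}000 \times 0.1\,\mathrm{s} = 3{,}000\,\mathrm{s} \approx 50\,\mathrm{min},\qquad
30{,}000 \times (1/11)\,\mathrm{s} \approx 2{,}727\,\mathrm{s} \approx 45\,\mathrm{min},\qquad
10{,}000 \times 0.0198\,\mathrm{s} \approx 198\,\mathrm{s} \approx 3.3\,\mathrm{min}
```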
I've looked at one such project, where the sheer number of unit tests made it take too long to test everything. "Too long" meaning that you basically didn't do it as part of your normal development routine.
However, what they had done was to categorize the unit tests into two groups: critical tests and "everything else".
Critical tests took just a few seconds to run, and tested only the most critical parts of the system, where "critical" here meant "if something is wrong here, everything is going to be wrong".
Tests that made the entire run take too long were relegated to the "everything else" group and were only run on the build server.
Whenever someone committed code to the source control repository, the critical tests would run first, and then a "full run" was scheduled a few minutes into the future. If nobody checked in code during that interval, the full suite was run. Granted, it didn't take 30 minutes, more like 8-10.
This was done using TeamCity, so even if one build agent was busy with the full unit test suite, the other build agents could still pick up normal commits and run the critical unit tests as often as needed.
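That project was on the Microsoft stack, but the same split is easy to sketch in any framework. Here is a minimal, hypothetical JUnit 5 version (all names invented), where a "critical" tag lets the CI server run the fast subset on every commit and leave everything else to the scheduled full run:

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class PricingTests {

    // Tiny made-up production method, inlined here so the sketch compiles.
    static double withVat(double net, double rate) {
        return net * (1 + rate);
    }

    // Tagged "critical": run on every commit, e.g. "mvn test -Dgroups=critical"
    // or an equivalent tag filter in the CI server's build step.
    @Tag("critical")
    @Test
    void totalIncludesVat() {
        assertEquals(120.0, withVat(100.0, 0.20), 1e-9);
    }

    // Untagged: picked up only by the scheduled "everything else" run.
    @Test
    void bulkDiscountMatchesRecordedOutput() {
        assertEquals(90.0, withVat(75.0, 0.20), 1e-9);
    }
}
```

One nice side effect of tagging over separate test projects is that a test can be promoted to, or demoted from, the critical set by changing a single annotation.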
I've only ever worked on projects where the test suite took at least ten minutes to run. The bigger ones, it was more like hours. And we sucked it up and waited, because they were pretty much guaranteed to find at least one problem in anything you threw at them. The projects were that big and hairy.
I wanna know what these projects are that can be tested comprehensively in seconds.
(The secret to getting things done when your project's unit tests take hours is to have four or five things you're working on at the same time. You throw one set of patches at the test suite and you task-switch, and by the time you're done with the thing you switched to, maybe your results have come back.)
I've got unit tests that take a few seconds to execute. I've got a method which does very complicated computing, billions and billions of operations. There are a few known good values that we use as the basis for unit testing when we refactor this tricky and uber-fast method (which we must optimize the crap out of because, as I said, it performs billions and billions of computations).
Rules don't adapt to every domain / problem space.
We can't "divide" this method into smaller methods that we could unit test: it is a tiny but very complicated method (making use of insanely huge precomputed tables that can't be re-created fast enough on the fly etc.).
We have unit tests for that method. They are unit tests. They take seconds to execute. It is a Good Thing [TM].
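For what it's worth, the shape of such a known-good-value test is very simple. A hedged sketch follows; the method and the values here are invented stand-ins, not the actual code being described:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class HeavyComputationTest {

    // Stand-in for the real, heavily optimised method; the real one is far
    // more expensive, but the testing pattern is the same.
    static long crunch(long n) {
        long acc = 0;
        for (long i = 1; i <= n; i++) {
            acc += i * i;
        }
        return acc;
    }

    // Known-good values recorded once from a trusted run. Any refactoring or
    // optimisation that changes a result breaks the build immediately.
    @Test
    void matchesRecordedReferenceValues() {
        assertEquals(385L, crunch(10));
        assertEquals(338_350L, crunch(100));
    }
}
```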
Now of course I don't dispute that you can use unit testing libraries like JUnit for things that aren't unit testing: for example, we also use JUnit to test complex multi-threaded scenarios. Those aren't "unit tests", but you bet that JUnit still rules the day :)
EDIT See my comment to another answer (Link). Please note that there was a lot of back and forth about Unit Testing so before you decide to upvote or downvote this answer, please read all the comments on that answer.
Next, use a tool like Mighty-Moose (Mighty-Moose was abandoned, but there are other tools) that runs only the tests affected by your code change (instead of your entire test library) every time you check in a file.
So what's your question? :-) I agree, the true metric here is how long developers have to wait for a complete run of the unit tests. Too long and they'll start cutting corners before committing code. I'd like to see a complete commit build take less than a minute or two, but that's not always possible. At my work, a commit build used to take 8 minutes and people just started only running small parts of it before committing - so we bought more powerful machines :-)
How long is too long for a developer to wait for the unit test suite to complete?
It really depends on how long the devs are happy to wait for feedback on their change. I'd say if you start talking minutes then it's too slow, and you should probably break up the test suite into individual test projects and run them separately.
Related
I'm looking for a tool that can solve the following problem:
Our complete unit test suite takes hours to complete, so when a programmer commits code, they get the test results a few hours later. What we would like to achieve is to shorten the time it takes to find simple bugs. This could be done by a smart selection of a few unit tests to run just before/right after the commit. Of course we don't want to pick these tests at random - we want the unit tests that are most likely to find a bug.
Idea to solve this problem without additional software:
Measure code coverage for each single unit test. Knowing which files are "touched" by which unit test, we can select a test whenever the user changes any of those files.
This solution has an obvious disadvantage - we have to manually store and update the list of covered files for each unit test.
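To make the idea concrete, here is a hypothetical sketch of that mapping (written in Java purely for illustration; the real project is C++, and the file and test names are made up):

```java
import java.util.*;

// Hypothetical illustration: a manually maintained map from source files
// to the tests whose coverage touches them.
class TestSelector {

    private final Map<String, Set<String>> coverage = new HashMap<>();

    void record(String sourceFile, String testName) {
        coverage.computeIfAbsent(sourceFile, f -> new HashSet<>()).add(testName);
    }

    // Given the files changed in a commit, return the tests worth running first.
    Set<String> testsFor(Collection<String> changedFiles) {
        Set<String> selected = new TreeSet<>();
        for (String file : changedFiles) {
            selected.addAll(coverage.getOrDefault(file, Set.of()));
        }
        return selected;
    }

    public static void main(String[] args) {
        TestSelector s = new TestSelector();
        s.record("src/parser.cpp", "ParserTest.handlesEmptyInput");
        s.record("src/parser.cpp", "ParserTest.rejectsMalformedHeader");
        s.record("src/cache.cpp",  "CacheTest.evictsLeastRecentlyUsed");
        System.out.println(s.testsFor(List.of("src/parser.cpp")));
        // -> [ParserTest.handlesEmptyInput, ParserTest.rejectsMalformedHeader]
    }
}
```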
I wonder if there is any tool that helps with selecting which tests to run?
The project uses C++ and runs under Linux.
In Working Effectively with Legacy Code, Michael Feathers writes that a unit test that takes a tenth of a second to run is a slow unit test. You must root out the slow tests. Do not hack up subset runners based on coverage guesses that will eventually be wrong and bite you.
Keep in mind the distinction between unit tests and integration tests. Unit tests do not touch the filesystem, talk over the network, or communicate with a database: those are integration tests. Yes, integration tests are often easier to write, but that is a strong indication that your software could be factored better—and as a happy coincidence, easier to test.
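As a rough illustration of the kind of factoring meant here (all names invented): a class that reads a file directly can only be integration-tested, but put the I/O behind a small interface and the logic becomes unit-testable with an in-memory fake:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// The seam: production code gets a real file-backed implementation,
// the unit test gets a fake that never touches the disk.
interface ConfigSource {
    String read(String key);
}

class GreetingService {
    private final ConfigSource config;
    GreetingService(ConfigSource config) { this.config = config; }

    String greet(String user) {
        return config.read("greeting") + ", " + user + "!";
    }
}

class GreetingServiceTest {
    @Test
    void formatsGreetingFromConfiguredTemplate() {
        ConfigSource fake = key -> "Hello";   // in-memory fake, no filesystem
        assertEquals("Hello, Ada!", new GreetingService(fake).greet("Ada"));
    }
}
```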
My suspicion is your integration tests are the ones taking so long. Move those to a separate suite that runs less frequently than on every checkin, say nightly.
We have the "problem" of a large automated suite of integration tests. While our build times are reasonable (< 1 hour), the tests typically take > 6 hours to complete.
While it's great to have this large chunk of functionality tested in our build runs, it obviously is a barrier to implementing CI, which I've found very helpful for keeping source trees in an "always buildable" state.
I've reviewed threads of discussion like this one, which elaborate on the distinctions.
This leads me to a few questions:
Does CI dictate or recommend Unit vs. Integration testing automation? I've heard Unit-only in the past, but am not finding any such statements (or rationale) for this in a quick search.
What is a good "best practice" for combined build + automated test times/ratios to have effective CI for a team? My gut tells me that this should be < 2 hours as a worst case, and probably < 1 hour to be really effective. In theory, we could break up the tests to run in parallel and probably get them down to under 2 hours, but together with the build that would still be roughly a 3-hour run.
What's the best way forward from long-running Nightly Builds + Integration Tests to CI? I'm thinking of a CI build with a few skeletal Unit Tests only, in combination with nightly builds that continue with the integration tests.
Any tooling recommendations are also welcome (Windows-only C#/C++ codebase)
For most projects, however, the XP guideline of a ten minute build is perfectly within reason. Most of our modern projects achieve this. It's worth putting in concentrated effort to make it happen, because every minute you reduce off the build time is a minute saved for each developer every time they commit. Since CI demands frequent commits, this adds up to a lot of time.
Source: http://martinfowler.com/articles/continuousIntegration.html#KeepTheBuildFast
Why does it take 6 hours? How many tests do you have? What is the ratio of unit tests to integration tests? You probably have many more integration tests, or your unit tests are not really unit tests. Are your unit tests touching the DB? That may be the problem.
6 hours is a long, long time. The article above has some tips.
There are a few things here.
In general you will have a number of builds, one that compiles & runs unit tests, one that does that and runs local acceptance tests, and one that runs integration tests.
You definitely don't need a single build that does everything.
Your build times sound pretty long to me - remember that the point here is to give quick feedback that something has gone awry. I don't know much about your project, but I would think you should look to get your compile-and-unit-test build down to under two to three minutes. This is perfectly achievable in all but very large projects, so if your unit tests take a long time, then it's time to start asking why.
6 hours is also a very long time. Are you sure that your tests are testing the right stuff? Do you have too many wide-scope tests? Are you using "sleep()" everywhere to make up for the fact that you haven't modeled asynchrony well in your test code?
You should probably get hold of Jez Humble's book Continuous Delivery, and take a look at Growing Object-Oriented Software, Guided by Tests for how to write unit/integration tests. GOOS uses Java as its implementation language, but all the concepts are the same.
In designing unit tests, from what I've read you should try to stick to these principles:
Isolate your tests from each other
Test only a single behavior at a time
Make sure the test is repeatable
On the other hand, these features do not always seem to correlate with a good test:
High code coverage
High performance
Is this a fair approach?
Performance is generally not a concern with unit tests. High code coverage is. You write individual tests to narrowly test a single function. You write enough (narrow) unit tests to cover most of your functions.
I would say that individual unit tests more than likely are not going to cover a big piece of code, as that defeats the purpose, so yes, I would agree with your first point.
Now, as for high performance: it isn't a necessity, I guess; however, when developing a system with hundreds of tests, you want them to be as efficient as possible so that you can execute your tests quickly.
High code coverage is more an indication of how widely you've tested your code base, i.e. whether you've tested all the code paths possible in your system. It gives no indication of the quality of individual tests, but coverage is one metric for measuring the quality of your unit tests as a whole.
As for high performance, you need to categorize your tests, separating out the tests that touch your database or other high-latency services. Keep your performance tests in a separate category as well. Also make sure you keep an eye out for integration (or end-to-end) tests, for instance a test that opens up a web browser, posts, and then verifies the response. If you do this, you don't really need to worry about the performance of your tests.
Coverage is not a (meaningful) property of individual tests. As you say, one test should cover only a single behavior. Performance, although it is (generally) only significant in the aggregate, is very important, and here's why.
We run tests for lots of reasons. The fewer barriers there are to writing and running them, the more tests we'll have - and the better our code coverage will be. If we don't run tests because it's too much trouble - we don't run tests; we don't discover our bugs; our productivity suffers. So they should be automated (zero trouble). If we don't run tests because it takes time away from coding - or because the time they take distracts us from the problem, jars our concentration: we don't run tests; we don't discover our bugs; our productivity suffers. So they should be fast enough that running them is not a distraction. If our tests hit the file system or the database much, that will slow them down and we'll run them less. So avoid that; abstract away the slow bits. Test early and often. Test lots. If they're slow, you won't. So keep them fast.
Each unit test should concentrate on testing a single behaviour, so the code coverage of one unit test should ideally be very small. Then when you have hundreds and thousands of such very focused tests, their total coverage should be high.
Performance is both unimportant and important.
Performance is not important in the sense that micro-optimizations should not be done. You should first of all concentrate on the readability of the tests. For example, the book Clean Code has an example of tests that verified the state of a temperature alarm. Originally each test had some five asserts checking booleans, such as assertTrue(hw.heaterState()), but the asserts were then refactored into one string comparison, assertEquals("HBchL", hw.getState()), where uppercase means enabled and lowercase means disabled. The latter code has lower performance because it creates and compares some additional strings, but its readability is much better, so it is better test code.
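Roughly, the refactoring being described looks like the sketch below. This is a paraphrase, not the book's actual code; the fake hardware class and test name are invented:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class ThermostatTest {

    // Minimal stand-in for the book's hardware fake.
    static class FakeHardware {
        boolean heater = true, blower = true, cooler = false,
                hiTempAlarm = false, loTempAlarm = true;

        // Uppercase = on, lowercase = off: Heater, Blower, Cooler, hi-alarm, lo-alarm.
        String getState() {
            return (heater ? "H" : "h") + (blower ? "B" : "b") + (cooler ? "C" : "c")
                 + (hiTempAlarm ? "H" : "h") + (loTempAlarm ? "L" : "l");
        }
    }

    @Test
    void wayTooCold_turnsOnHeaterBlowerAndLowTempAlarm() {
        FakeHardware hw = new FakeHardware();

        // Before the refactoring: five separate boolean asserts.
        assertTrue(hw.heater);
        assertTrue(hw.blower);
        assertFalse(hw.cooler);
        assertFalse(hw.hiTempAlarm);
        assertTrue(hw.loTempAlarm);

        // After: the same information in one readable comparison.
        assertEquals("HBchL", hw.getState());
    }
}
```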
Performance is important in the sense that running all unit tests should be fast: hundreds or thousands of tests per second (I prefer less than 1 ms per test on average). You should be able to run all your unit tests in a matter of seconds. If the tests take so long to run that you hesitate to run them after making a small change, and instead run them only when going to get more coffee, then they take too long. If the test suite is slow, you should break dependencies on other components and mock them, so that the system under test is as small as possible. In particular, the unit tests should not use a database, because that will make them hopelessly slow.
In addition to unit tests, there is also a need for integration/acceptance tests that test the system as a whole. They have a different role in development than unit tests, so it is acceptable for the acceptance test suite to be slow (no pun intended). They should be run by the continuous integration server at least once a day.
Yes I think it's fair to say that. You can always get high code coverage by creating many different tests.
Upon rereading the question:
On the first point, a suite of unit tests should have high code coverage but not necessarily focus on performance. A single unit test should be rather small in terms of how much code it covers. If you want a biological metaphor: a single cell doesn't cover your body, but a group of them forms skin that does cover most of it.
Both performance and coverage are goals of the entire test suite. An individual test should not try to "grab" as much coverage as possible, nor should it be concerned with performance.
We do want the whole test suite to run in a reasonable time and to cover most of the code's functionality, but not at the price of writing bad unit tests.
High code coverage:
An individual unit test does not have to cover 100% of the method or methods under test. Eventually, the suite of unit tests needs to cover as much of the methods under test as you feel is important. That could be 1%, it could be 100%, and it is probably different for different sections of code.
I generally try for 70%-80% coverage, and then pay more attention to the trend of coverage than to the actual value.
High performance:
More important than the speed of an individual unit test is the time it takes to run all the tests in the suite. If it takes too long, developers may decide not to run them as often. But you often can't afford to spend a lot of time optimizing, especially when it comes to unit tests. You need the entire suite to run fast enough.
It isn't uncommon for a system I'm on to have hundreds of unit tests, and still run in 2 minutes. One really slow test, or a bunch of slightly slow tests, can really slow down the build-test-refactor cycle.
I'm currently working on a critical monthly accounts payable report. This report I just wrote will be the basis for how much my client is going to pay his vendors. So far I have spent about 5 hours building automated tests and have found zero errors. Do you think I am already spending too much time testing? What would be the ideal time frame for testing?
There is no exact amount of time that needs to be spent writing test code after writing the functional specification. Since you have created several test cases and have encountered minimal/zero errors so far, I would say you're on the right track.
Those test cases will act as a "safety net" when you add more code or refactor existing code.
I do remember seeing some statistics mapping percentage of time spent testing for different projects. It broadly varied from about 30% to 50% of the total development time with smaller projects taking a smaller percentage. This is consistent with my experience as well.
Ideally you should spend as little time as possible writing tests, which means your code should be testable using simple and straightforward unit tests; you achieve that by reducing the cyclomatic complexity of your code.
So try to concentrate on writing code that requires as few unit tests as possible: simple, clean code with low cyclomatic complexity.
There isn't a standard ratio of time spent writing tests versus writing code; the only real measure is test coverage. If your code is simple, your unit tests will also be simple, and you will need fewer of them, so less time will be spent writing unit tests.
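As a toy illustration of "fewer branches means fewer tests" (an invented example, not from the original post):

```java
import java.util.Map;

class Shipping {

    // Higher cyclomatic complexity: a chain of branches, each needing its own test.
    static double costBranchy(String tier) {
        if (tier.equals("standard"))  return 4.99;
        if (tier.equals("express"))   return 9.99;
        if (tier.equals("overnight")) return 19.99;
        throw new IllegalArgumentException("unknown tier: " + tier);
    }

    // Lower complexity: one lookup plus one guard. Two focused tests cover the
    // logic, and the rate table itself is plain data.
    private static final Map<String, Double> RATES =
            Map.of("standard", 4.99, "express", 9.99, "overnight", 19.99);

    static double costTableDriven(String tier) {
        Double rate = RATES.get(tier);
        if (rate == null) throw new IllegalArgumentException("unknown tier: " + tier);
        return rate;
    }
}
```

The branchy version needs a test per branch just to exercise every path; the table-driven version needs one test for the lookup and one for the unknown-tier guard, and new tiers become data changes rather than new code paths.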
Before it becomes non-trivial to start testing your code.
I usually start testing a feature as soon as I add it.
I try not to test my code with anything other than automated tests. When I'm building my feature, instead of trying it out by hand as I build it, I try it out with tests. That way it's no more work than you'd have done anyway, and you have the tests afterward.
After that I add tests as I discover bugs, or occasionally to cover bugs that I think could be added by a careless maintainer. The idea is for testing to help you, not get in your way!
There is no fixed amount of time to spend on testing. It's a little like asking "How long is appropriate to write any feature?" It really depends on how complex the item being written is and on how wide the surface area is. The more the user can do with your tool, the more you should test it.
Testing of anything end-user facing should be of two sorts:
1) Automated testing. Regression tests, unit tests, etc.
2) Manual tests. Wait until you are mostly done, then try to hit all the corners as the user would. You won't cover everything in your automated tests and might not notice side effects there so you need a human eye on it before shipping.
Rather than deciding how much time to spend testing, decide what you think needs to be tested and spend whatever time that takes.
I don't understand how a unit test could possibly be beneficial.
Isn't it sufficient for a tester to test the entire output as a whole rather than doing unit tests?
What you are describing is integration testing. What integration testing will not tell you is which piece of your massive application is not working correctly when your output is no longer correct.
The advantage of unit testing is that you can write a test for each business assumption or algorithm step that you need your program to perform. When someone adds or changes code in your application, you immediately know exactly which step, which piece, and maybe even which line of code is broken when a bug is introduced. The time savings on maintenance for that reason alone make it worthwhile, but there is an even bigger advantage: regression bugs cannot be introduced (assuming your tests run automatically when you build your software). If you fix a bug, and then write a test specifically to catch that bug in the future, there is no way someone could accidentally introduce it again.
The combination of integration testing and unit testing can let you sleep much easier at night, especially when you've checked in a big piece of code that day.
The earlier you catch bugs, the cheaper they are to fix. A bug found during unit testing by the coder is pretty cheap (just fix the darn thing).
A bug found during system or integration testing costs more, since you have to fix it and restart the test cycle.
A bug found by your customer will cost a lot: recoding, retesting, repackaging and so forth. It may also leave a painful boot print on your derriere when you inform management that you didn't catch it during unit testing because you didn't do any, thinking that the system testers would find all the problems :-)
How much money would it cost GM to recall 10,000 cars because the catalytic converter didn't work properly?
Now think of how much it would cost them if they discovered that immediately after those converters were delivered to them, but before they were put into those 10,000 cars.
I think you'll find the latter option to be quite a bit cheaper.
That's one reason why test driven development and continuous integration are (sometimes) a good thing - testing is done all the time.
In addition, unit tests don't check that the program works as a whole, just that each little bit performs as expected. That's often quite a lot more than higher level tests would check.
From my experience:
Integration and functional testing tend to be more indicative of the overall quality of the system than the unit test suite is.
High level testing (functional, acceptance) is a QA tool.
Unit testing is a development tool, especially in a TDD context, where the unit test becomes more of a design instrument than a quality assurance one.
As a result of better design, quality of the entire system improves (indirectly).
A passing unit test suite is meant to ensure that a single component conforms to the developer's intentions (correctness). An acceptance test is the level that covers the validity of the system (i.e. the system does what the user wants it to do).
Summary:
Unit testing is meant as a development tool first, a QA tool second.
Acceptance testing is meant as a QA tool.
There is still a need for a certain level of manual testing to be performed but unit testing is used to decrease the number of defects that make it to that stage. Unit testing tests the smallest parts of the system and if they all work the chances of the application as a whole working correctly are increased significantly.
It also assists when adding new features since regression testing can be performed quickly and automatically.
For a complex enough application, testing the entire output as a whole may not cover enough different possibilities. For example, any given application has a huge number of different code paths that can be followed depending on input. In typical testing, there may be many parts of your code that are simply never encountered, because they are only used in certain circumstances, so you can't be sure that any code that isn't run in your test situation actually works. Also, errors in one section of code may be masked a majority of the time by something else in another section of code, so you may never discover some errors.
It is better to test each function or class separately. That way, the test is easier to write, because you are only testing a certain small section of the code. It's also easier to cover every possible code path when testing, and if you test each small part separately then you can detect errors even when those errors would often be masked by other parts of your code when run in your application.
Do yourself a favor and try out unit testing first. I was quite the skeptic myself until I realized just how darned helpful/powerful unit-tests can be. If you think about it, they aren't really there to add to your workload. They are there to provide you with peace of mind and allow you to continue extending your application while ensuring that your code is solid. You get immediate feedback as to when you may have broke something and this is something of extraordinary value.
To your question regarding why to test small sections of code, consider this: suppose your giant app uses a cool XOR encryption scheme that you wrote, and eventually product management changes the requirements for how you generate these encrypted strings. So you say: "Heck, I wrote the encryption routine, so I'll go ahead and make the change. It'll take me 15 minutes and we'll all go home and have a party." Well, perhaps you introduced a bug during this process. But wait!!! Your handy dandy TestXOREncryption() test method immediately tells you that the expected output did not match the actual output. Bingo, this is why you broke your unit tests down ahead of time into small "units" to test: in your big giant application you would not have figured this out nearly as fast.
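To make that concrete, a hypothetical version of such a routine and its test might look like the sketch below (the routine, names, and key are invented to match the story, not taken from any real codebase):

```java
import java.nio.charset.StandardCharsets;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertArrayEquals;

class XorEncryptionTest {

    // A simple XOR "encryption" scheme: XOR each byte with a repeating key.
    // Applying it twice with the same key restores the input.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    @Test
    void encryptThenDecryptRoundTripsToOriginal() {
        byte[] plain = "top secret".getBytes(StandardCharsets.UTF_8);
        byte[] key   = {0x5A, 0x21, 0x7F};

        byte[] roundTripped = xor(xor(plain, key), key);

        // If a "15 minute" change to the routine breaks this property,
        // the test fails immediately instead of surfacing in production.
        assertArrayEquals(plain, roundTripped);
    }
}
```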
Also, once you get into the frame of mind of regularly writing unit tests you'll realize that although you pay an upfront cost in the beginning in terms of time, you'll get that back 10 fold later in the development cycle when you can quickly identify areas in your code that have introduced problems.
There is no magic bullet with unit tests because your ability to identify problems is only as good as the tests you write. It boils down to delivering a better product and relieving yourself of stress and headaches. =)
Agree with most of the answers. Let's drill down on the topic of speed. Here are some real numbers:
Unit test results in 1 or 2 minutes from a fresh compile. As true unit tests (no interaction with external systems like DBs) they can cover a lot of logic really fast.
Automated functional test results in 1 or 2 hours. These run on a simplified platform, but sometimes cover multiple systems and the database - which really kills the speed.
Automated integration test results once a day. These exercise the full meal deal, but are so heavy and slow that we can only execute them once a day, and it takes a few hours.
Manual regression results come in after a few weeks. We get stuff over to the testers a few times a day, but your change isn't realistically regressed for a week or two at best.
I want to find out what I broke in 1 or 2 minutes, not a few weeks, not even a few hours. That's where the 10-fold ROI on unit tests that people talk about comes from.
This is a tough question to approach because it questions something of such enormous breadth. Here's my short answer, however:
Test Driven Development (or TDD) seeks to prove that every logical unit of an application (or block of code) functions exactly as it should. By making tests as automated as possible for productivity's sake, how could this really be harmful?
By testing every logical piece of code, you can trust the usage of the code further up the hierarchy. Say I build an application that relies on a thread-safe stack implementation. Shouldn't the stack be guaranteed to work before I build on top of it?
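For instance, a hedged sketch of the kind of unit test that gives you that guarantee (here a standard java.util.concurrent deque stands in for the hypothetical thread-safe stack):

```java
import java.util.concurrent.*;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class ThreadSafeStackTest {

    @Test
    void concurrentPushesAreAllObserved() throws Exception {
        ConcurrentLinkedDeque<Integer> stack = new ConcurrentLinkedDeque<>();
        int threads = 8, pushesPerThread = 1_000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);

        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                start.await();                       // all threads start together
                for (int i = 0; i < pushesPerThread; i++) {
                    stack.push(i);
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // Every push must be visible; a non-thread-safe stack would lose items.
        assertEquals(threads * pushesPerThread, stack.size());
    }
}
```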
The key is that if something in the whole application breaks, meaning just looking at the total output/outcome, how do you know where it came from? Well, debugging, of course! Which puts you back where you started. TDD allows you to -hopefully- bypass this most painful stage in development.
Testers generally test end-to-end functionality. Obviously this is geared toward user scenarios and has incredible value.
Unit tests serve a different function. They are the developer's way of verifying that the components they write work correctly, in isolation from other features or in combination with them. This offers a range of value, including:
Provides un-ignorable documentation
Ability to isolate bugs to specific components
Verify invariants in the code
Provide quick, immediate feedback to changes in the code base.
One place to start is regression testing. Once you find a bug, write a small test that demonstrates the bug, fix it, then make sure the test now passes. In future you can run that test before each release to ensure that the bug has not been reintroduced.
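A minimal example of that workflow (the bug, its number, and all names are invented): the test below reproduces the reported bug, fails against the old code, and then pins the fix in place from then on:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class InvoiceRegressionTest {

    // Fixed version of a hypothetical fee calculation. The imagined bug #1234:
    // the old code did the division in integer arithmetic, so small fees
    // silently rounded down to zero.
    static double fee(long cents, double percent) {
        return cents * percent / 100.0;
    }

    @Test
    void bug1234_smallAmountsStillAccrueAFee() {
        // With the old integer-division code this returned 0.0 and the test failed.
        assertEquals(7.5, fee(100, 7.5), 1e-9);
    }
}
```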
Why do that at a unit level instead of a whole-program level? Speed. In good code it's much faster to isolate a small unit and write a tiny test than to drive a complex program through to the point of the bug. And a unit test will generally run significantly faster than an integration test.
Very simply: Unit tests are easier to write, since you're only testing a single method's functionality. And bugs are easier to fix, since you know exactly what method is broken.
But like the other answerers have pointed out, unit tests aren't the end-all-be-all of testing. They're just the smallest piece of the equation.
Probably the single biggest difficulty with software is the sheer number of interacting things, and the most useful technique is to reduce the number of things that have to be considered.
For example, using higher-level languages rather than lower-level improves productivity, because one line is a separate thing, and being able to write a program in fewer lines reduces the number of things.
Procedural programming came about as an attempt to reduce complexity by making it possible to treat a function as a thing. In order to do that, though, we have to be able to think about what the function does in a coherent manner, and with confidence that we're right. (Object-oriented programming does a similar thing, on a larger scale.)
There are several ways to do this. Design-by-contract is a way of exactly specifying what the function does. Using function parameters rather than global variables to call the function and get results reduces the complexity of the function.
Unit testing is one way to verify that the function does what it is supposed to. It's usually possible to test all the code in a function, and sometimes all the execution paths. It is a way to tell if the function works as it should or not. If the function works, we can think about it as a single thing, rather than as multiple things we have to keep track of.
It serves other purposes. Unit tests are usually quick to run, and so can catch bugs quickly, when they're easy to fix. If developers make sure a function passes the tests before being checked in, then the tests are a form of documenting what the function does that is guaranteed correct. The act of creating the tests forces the test writer to think about what the function should be doing. After that, whoever wanted the change can look at the tests to see if he or she was properly understood.
By way of contrast, larger tests are not exhaustive, and so can easily miss lots of bugs. They're bad at localizing bugs. They are usually performed at fairly long intervals, so they may detect a bug some time after it's made. They define parts of the total user experience, but provide no basis to reason about any part of the system. They should not be neglected, but they are not a substitute for unit tests.
As others have stated, the length of the feedback loop and isolation of the problem to a specific component are key benefits of Unit Tests.
Another way that they are complementary to functional tests is how coverage is tracked in some organizations:
Unit tests on code coverage
Functional tests on requirements coverage
Functional tests might miss features that were implemented but are not in the spec.
Being based on the code, Unit tests might miss that a certain feature wasn't implemented, which is where requirements based coverage analysis of Functional testing comes in.
A final point: there are some things that are easier/faster to test at the unit level, especially around error scenarios.
Unit testing will help you identify the source of your bug more clearly and let you know that you have a problem earlier. Both are good to have, but they are different, and unit testing does have benefits.
The software you test is a system. When you test it as a whole, you are black-box testing, since you primarily deal with inputs and outputs. Black-box testing is great when you have no means of getting inside the system.
But since you usually do, you create a lot of unit tests that actually test your system as a white box. You can slice the system open in many ways and organize your tests around its internal structure. White-box testing gives you many more ways of testing and analyzing the system. It's clearly complementary to black-box testing and should not be considered an alternative or competing methodology.