Unit test execution speed (how many tests per second?)

What kind of execution rate do you aim for with your unit tests (tests per second)? How long is too long for an individual unit test?
I'd be interested in knowing whether people have specific thresholds for deciding that their tests are too slow, or whether it's just when the friction of a long-running test suite gets the better of you.
Finally, when you do decide the tests need to run faster, what techniques do you use to speed up your tests?
Note: integration tests are obviously a different matter again. We are strictly talking unit tests that need to be run as frequently as possible.
Response roundup: Thanks for the great responses so far. Most of the advice seems to be: don't worry about the speed, concentrate on quality, and just run tests selectively if they are too slow. Answers with specific numbers ranged from aiming for under 10 ms per test up to 0.5 or 1 second per test, or simply keeping the entire suite of commonly run tests under 10 seconds.
Not sure whether it's right to mark one as an "accepted answer" when they're all helpful :)

All unit tests should run in under a second (that is, all unit tests combined should run in 1 second). I'm sure this has practical limits, but I've had a project with 1,000 tests that ran this fast on a laptop. You really want this speed so your developers don't dread refactoring some core part of the model ("Lemme go get some coffee while I run these tests..." and he comes back 10 minutes later).
This requirement also forces you to design your application correctly. It means that your domain model is pure and contains zero references to any type of persistence (file I/O, database, etc.). Unit tests are all about testing those business relationships.
That doesn't mean you ignore testing your database or persistence. But these concerns are now isolated behind repositories that can be tested separately with integration tests located in a separate project. You run your unit tests constantly when writing domain code and then run your integration tests once on check-in.
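A minimal Python sketch of that separation, assuming a hypothetical `Account` domain class and `AccountRepository` boundary (none of these names come from the answer): the business rule lives in a pure object, and unit tests use an in-memory fake repository, so no database is touched. A database-backed repository would be covered by the separate integration-test project.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Account:
    """Pure domain object (hypothetical): no file, database, or network references."""
    account_id: str
    balance: int  # in cents

    def withdraw(self, amount: int) -> None:
        # A business rule that a unit test can check without any I/O.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountRepository(Protocol):
    """Persistence boundary; a real implementation would talk to a database."""
    def get(self, account_id: str) -> Optional[Account]: ...
    def save(self, account: Account) -> None: ...


class InMemoryAccountRepository:
    """Fast fake for unit tests; the database-backed one is integration-tested separately."""
    def __init__(self) -> None:
        self._accounts: Dict[str, Account] = {}

    def get(self, account_id: str) -> Optional[Account]:
        return self._accounts.get(account_id)

    def save(self, account: Account) -> None:
        self._accounts[account.account_id] = account


def withdraw(repo: AccountRepository, account_id: str, amount: int) -> int:
    """Application service: load, apply the domain rule, save. Returns the new balance."""
    account = repo.get(account_id)
    if account is None:
        raise KeyError(account_id)
    account.withdraw(amount)
    repo.save(account)
    return account.balance
```

Because the fake lives entirely in memory, thousands of tests like this can run per second; only the real repository needs the slower integration suite.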

The goal is hundreds of tests per second. The way you get there is by following Michael Feathers' rules of unit tests.
An important point that came up in a past CITCON discussion is that if your tests aren't this fast it is quite likely that you aren't getting the design benefits of unit testing.

If we're talking strictly unit tests, I'd aim more for completeness than speed. If the run time starts to cause friction, separate the tests into different projects/classes etc., and only run the tests related to what you're working on. Let the integration server run all the tests on check-in.

I tend to focus more on the readability of my tests than on speed. However, I still try to make them reasonably fast. I think if they run on the order of milliseconds, you are fine. If they take a second or more per test, then you might be doing something that should be optimized.
Slow tests only become a problem as the system matures and the build starts to take hours. At that point you are more likely dealing with a lot of moderately slow tests than with one or two tests that you can optimize easily. So you should pay attention RIGHT AWAY if you see lots of tests running hundreds of milliseconds each (or worse, seconds each), rather than waiting until hundreds of tests take that long, at which point the problem is going to be really hard to solve.
Even so, slow tests only delay the point at which your automated build reports errors, which I think is OK if it's an hour later (or even a few hours later). The problem is running them before you check in, but this can be avoided by selecting a small subset of tests related to what you are working on. Just make sure to fix the build if you check in code that breaks tests you didn't run!
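One way to follow the "pay attention right away" advice is to record per-test durations and list the ones over a threshold. A minimal Python sketch, assuming tests are plain zero-argument callables and an arbitrary 100 ms threshold (both are illustrative choices, not from the answer). Many real runners can report this themselves; pytest, for example, has a `--durations` option.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_tests(tests: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each test callable once and record its wall-clock duration in seconds."""
    durations = {}
    for name, test in tests.items():
        start = time.perf_counter()
        test()
        durations[name] = time.perf_counter() - start
    return durations


def slow_tests(durations: Dict[str, float], threshold: float = 0.1) -> List[Tuple[str, float]]:
    """Return (name, seconds) for tests slower than `threshold`, slowest first."""
    slow = [(name, secs) for name, secs in durations.items() if secs > threshold]
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    tests = {"fast": lambda: None, "sleepy": lambda: time.sleep(0.15)}
    for name, secs in slow_tests(time_tests(tests)):
        print(f"{name}: {secs * 1000:.0f} ms")
```

Reviewing this list regularly catches the "lots of kind of slow tests" pattern while it is still a handful of tests rather than hundreds.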

We're currently at 270 tests in a little over 3 seconds. Probably around 8 of those tests perform file I/O.
These run automatically on every engineer's machine after a successful build of our libraries. We have more extensive (and time-consuming) smoke tests that the build machine runs every night, or that can be started manually on an engineer's machine.
As you can see, we haven't yet reached the point where the tests are too time-consuming. For me, 10 seconds is where it starts to become intrusive; when we start to approach that, it'll be something we look at. We'll likely move the tests for the lower-level libraries, which are more robust since they change infrequently and have few dependencies, into the nightly builds, or into a configuration where only the build machine executes them.
If you find it's taking more than a few seconds to run a hundred or so tests you may need to examine what you are classifying as a unit test and whether it would be better treated as a smoke test.
Your mileage will obviously vary greatly depending on your area of development.

Data Point -- Python Regression Tests
Here are the numbers on my laptop for running "make test" for Python 2.5.2:
number of tests: 3851 (approx)
execution time: 9 min, 6 sec
execution rate: 7 tests / sec
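The rate is just the test count divided by total wall-clock time: 9 min 6 s is 546 s, and 3851 / 546 ≈ 7.05 tests per second. A tiny helper for comparing such figures (the function name is ours, purely illustrative):

```python
def execution_rate(test_count: int, minutes: int = 0, seconds: float = 0.0) -> float:
    """Average tests per second: number of tests divided by total wall-clock seconds."""
    total_seconds = minutes * 60 + seconds
    if total_seconds <= 0:
        raise ValueError("total time must be positive")
    return test_count / total_seconds
```

By the same arithmetic, 800 tests in 30 s is about 26.7 tests/sec, and 270 tests in roughly 3 s is around 90 tests/sec. An average like this can still hide a few individual tests that are much slower than the rest.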

One of the most important rules about unit tests is they should run fast.
How long is too long for an individual unit test?
Developers should be able to run the whole suite of unit tests in seconds, definitely not in minutes and minutes. Developers should be able to run them quickly after changing the code in any way. If it takes too long, they won't bother running them, and you lose one of the main benefits of the tests.
What kind of execution rate do you aim for with your unit tests (# test per second)?
You should aim for each test to run on the order of milliseconds; anything over 1 second is probably testing too much.
We currently have about 800 tests that run in under 30 seconds, about 27 tests per second. This includes the time to launch the mobile emulator needed to run them. Most of them each take 0-5ms (if I remember correctly).
We have one or two that take about 3 seconds, which are probably candidates for checking, but the important thing is the whole test suite doesn't take so long that it puts off developers running it, and doesn't significantly slow down our continuous integration build.
We also have a configurable timeout limit set to 5 seconds -- anything taking longer will fail.
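A rough Python equivalent of such a per-test limit, written as a hypothetical `time_budget` decorator (not a feature of any specific framework). It measures after the test returns and fails it if the budget was exceeded; unlike a real timeout, it does not interrupt a test that hangs.

```python
import functools
import time
import unittest


def time_budget(max_seconds: float):
    """Decorator: fail the wrapped test if it ran longer than max_seconds.

    The check happens after the test returns, so a hung test is not interrupted;
    a hard timeout would need runner support or a separate process.
    """
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = test_func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if elapsed > max_seconds:
                raise AssertionError(
                    f"{test_func.__name__} took {elapsed:.3f}s, budget is {max_seconds:.3f}s"
                )
            return result
        return wrapper
    return decorator


class ExampleTests(unittest.TestCase):
    @time_budget(5.0)  # same 5-second limit as in the answer above
    def test_parse(self):
        self.assertEqual(int("42"), 42)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    unittest.TextTestRunner().run(suite)
```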

I judge my unit tests on a per-test basis, not by the number of tests per second. My target is 500 ms or less per test. If a test takes longer than that, I look into it to find out why it is taking so long.
When I think a test is too slow, it usually means that it is doing too much, so refactoring the test by splitting it into several tests usually does the trick. The other times I have noticed my tests running slowly, the test was exposing a bottleneck in my code, and then a refactoring of the code is in order.

How long is too long for an individual unit test?
I'd say it depends on the compile speed. One usually executes the tests at every compile. The objective of unit testing is not to slow down, but to bring a message "nothing broken, go on" (or "something broke, STOP").
I do not bother about test execution speed until this is something that starts to get annoying.
The danger is to stop running the tests because they're too slow.
Finally, when you do decide the tests need to run faster, what techniques do you use to speed up your tests?
The first thing to do is find out why they are too slow, and whether the issue is in the unit tests or in the code under test.
I'd try to break the test suite into several logical parts, running only the part that is likely affected by the code I changed at every compile. I'd run the other suites less often, perhaps once a day, or whenever I suspect I could have broken something, and at least before integrating.
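A sketch of that split using Python's standard `unittest` module, assuming a hypothetical `slow` marker of our own: fast tests run on every compile, and slow-tagged ones are included only when explicitly requested (for example nightly or before integrating). Frameworks such as pytest offer markers for the same purpose.

```python
import unittest


def slow(test_method):
    """Hypothetical marker: tag a test method as part of the less-frequent suite."""
    test_method.is_slow = True
    return test_method


def iter_tests(suite):
    """Flatten nested TestSuites into individual TestCase instances."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def select_tests(suite, include_slow: bool) -> unittest.TestSuite:
    """Keep every fast test; keep slow-tagged tests only when include_slow is True."""
    selected = unittest.TestSuite()
    for test in iter_tests(suite):
        method = getattr(test, test.id().rsplit(".", 1)[-1])
        if include_slow or not getattr(method, "is_slow", False):
            selected.addTest(test)
    return selected


class ParserTests(unittest.TestCase):
    def test_small_input(self):
        self.assertEqual(len("abc"), 3)

    @slow
    def test_large_input(self):
        self.assertEqual(len("x" * 1_000_000), 1_000_000)


if __name__ == "__main__":
    all_tests = unittest.defaultTestLoader.loadTestsFromTestCase(ParserTests)
    # Every compile: fast tests only. Nightly or before integrating: include_slow=True.
    unittest.TextTestRunner().run(select_tests(all_tests, include_slow=False))
```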

Some frameworks provide automatic execution of specific unit tests based on heuristics such as last-modified time. For Ruby and Rails, AutoTest provides much faster and more responsive test execution: when I save a Rails model app/models/foo.rb, the corresponding unit tests in test/unit/foo_test.rb get run.
I don't know if anything similar exists for other platforms, but it would make sense.
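The core of such a tool is a mapping from a changed source file to its tests, plus a last-modified check. A minimal Python sketch, assuming the Rails naming convention from the example above (app/models/foo.rb maps to test/unit/foo_test.rb); it illustrates the heuristic and is not how AutoTest itself is implemented.

```python
import os
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def unit_test_for(source_path: str) -> Optional[str]:
    """Map a Rails model file to its unit test file by naming convention.

    app/models/foo.rb -> test/unit/foo_test.rb; other files map to None.
    """
    path = PurePosixPath(source_path)
    if path.parent == PurePosixPath("app/models") and path.suffix == ".rb":
        return str(PurePosixPath("test/unit") / f"{path.stem}_test.rb")
    return None


def modified_since(paths: Iterable[str], last_run: float) -> List[str]:
    """Files whose last-modified time (seconds since the epoch) is newer than last_run."""
    return [p for p in paths if os.path.getmtime(p) > last_run]


def affected_test_files(changed_files: Iterable[str]) -> List[str]:
    """Unit test files to run for the saved files, deduplicated, in sorted order."""
    found = (unit_test_for(f) for f in changed_files)
    return sorted({t for t in found if t is not None})
```

A watcher would periodically call `modified_since` on the source tree with the time of the previous run, then run only `affected_test_files` of the result.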

Related

Strange Unit Test Run Time Differences

I have several unit test methods that do exactly the same thing:
[TestMethod]
public void Test()
{
    // Trivial busy loop; every test method has exactly this body.
    for (int i = 0; i < 100000; i++) { }
}
One of them always reports a different duration from the others.
If I remove the first one, TestMethod3 is always the different one.
If I add more test methods, TestMethod6 is always the different one.
There is always one method that differs from the others.
What is the reason behind this strange difference?
I am currently studying algorithms and trying to measure run times with test methods. This difference made me wonder whether test method run times are reliable.
That has something to do with the test runner in Visual Studio. The tests are usually run simultaneously, but the one you see with the greater time is usually the one that was started first. I've noticed this in Visual Studio for years now. If you run one of them on its own, you will notice that its run time is longer than when it runs as part of a Run All.
I've always assumed that it had to do with the timer being started early while the tests were still loading.
You can't test performance in a simple unit test. Part of the reason is that there are many different implementations and configurations of testing frameworks, with different impacts on the performance of a test.
The most notable is whether tests run in parallel (multi-threaded) or consecutively. Obviously the first option completely invalidates any benchmarking. The second option, though, still doesn't guarantee valid benchmarking.
This is because of other factors that are independent of the actual unit testing framework. These include:
Initial delays due to class loading and memory allocation
Just-in-time compilation of your byte code into machine code. This is difficult to control and can happen seemingly unpredictably.
Branch prediction, which may greatly influence your runtime behaviour, depending on the nature of the processed data and control flow
Garbage collection
Doing even remotely valid benchmarks in Java is an art form in itself. To get close, you should at least ensure that you:
run your code of interest in a single thread, with no other active threads
avoid garbage collection (i.e. make sure there is enough memory for the test to run without GC, and set your JVM's GC options appropriately)
have a warm-up phase where you run your code for a sufficient number of iterations before starting to benchmark it.
This IBM article on 'Robust Java Benchmarking' is helpful as an introduction of the pitfalls of Java benchmarking.
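The same structure (warm-up, repeated batches, GC kept out of the timed region) can be sketched in Python. This is an illustration under stated assumptions rather than a Java benchmark: CPython has no JIT, so warm-up here mainly settles caches, and for Java a dedicated harness such as JMH is the usual tool. Python's own `timeit` module also disables garbage collection during timing by default. The function name and default counts below are arbitrary.

```python
import gc
import statistics
import time
from typing import Callable, List


def benchmark(func: Callable[[], object], warmup: int = 1_000, repeats: int = 5,
              number: int = 10_000) -> List[float]:
    """Return per-call timings (seconds) from `repeats` batches of `number` calls.

    Warm-up calls run first and are discarded; the garbage collector is disabled
    while timing so that a collection does not land inside one batch.
    """
    for _ in range(warmup):
        func()
    results = []
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        for _ in range(repeats):
            start = time.perf_counter()
            for _ in range(number):
                func()
            results.append((time.perf_counter() - start) / number)
    finally:
        if gc_was_enabled:
            gc.enable()
    return results


if __name__ == "__main__":
    timings = benchmark(lambda: sorted(range(100), reverse=True))
    print(f"min {min(timings) * 1e6:.2f} us, median {statistics.median(timings) * 1e6:.2f} us")
```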

Are there tools/ methods to objectively measure performance?

I'm writing a high performance application (a raytracer) in C++ using Visual Studio, and I just spent two days trying to root out a performance drop I witnessed after refactoring the code. The reason it took so long was because the performance drop was smaller than the normal variation in execution time I witnessed from run to run.
I'm not sure if this is normal, but sometimes the program runs at around 33 fps pretty consistently, and then if you close and rerun it, it may run at 37 fps. This means that to test any new change, I had to run and rerun manually until I saw peak performance (and this could take up to 10 runs). Simply running it for a large number of frames and measuring the time doesn't fix this variability. For example, if the program runs for 40 seconds on average, it will nevertheless vary by 1-2 seconds or more, which makes this test nearly useless for detecting the 1 millisecond per frame performance loss I was dealing with.
Visual Studio's profiling tools also didn't help find such a small issue, because they were also subject to variation. In any case, a profiler won't necessarily point to the exact offending line, so I have to test candidate fixes, and the profiler is not very effective at confirming whether a proposed fix works.
I realize this all may sound like premature optimization, but I don't think it is because I'm optimizing only after finishing complete features; I'm just trying to monitor changes in performance regularly so that issues like the above don't slip in and just get added to the apparent cost of the new feature.
Anyways, my question is simply whether there's a way to objectively determine the "real" speed of an application, discounting the effect of variation. Or, failing that, how do developers deal with such issues? I doubt that my current process is the ideal one.
There are lots of profilers for both C++ and OpenGL. For those who just need the links, here they are:
OpenGL debugger-profiler
C++ profilers (I recommend Google's Orbit, because it has a dark theme)
My eyes stopped at "objectively measure performance".
As you mentioned, the speed varies from run to run because it's a complex system. It helps if the scope is small and the test only covers some key algorithms. It's worth automating this and collecting some reference data. As every scientist will tell you, one test is not a test: you should rely on regular tests in controlled environments.
Here are some tricks that can be used to measure performance:
As others said in the comments, an average over several runs may help. It smooths out noise from outside.
Process priority or processor affinity could help you control the environment. By giving other processes low priority, your program gets more resources.
Measure the whole execution of a test and compare it against the processor time. Since several processes run at the same time, processor time may differ from wall-clock execution time.
Update your reference values when you apply a software update. One update may come with a performance boost, while another comes with a security patch that may cost some.
Give a performance range for your program instead of one specific number. For example, the temperature may have messed up your measurement by lowering the clock speed.
If a test runs too fast to measure, execute the most critical part several times in one test case. "Too fast" depends on how accurately you can measure. At millisecond resolution it's really hard to decide whether a test that ran in 2 ms instead of 1 ms is a failure. However, if it's executed 1000 times, 1033 ms compared to 1000 ms gives you better insight.
Only test the critical section. Set up the environment and start the stopwatch when everything is ready. System startup could be a separate test.
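Several of these tricks (time only the critical section, repeat it many times, compare wall-clock time with processor time, report a range) combine naturally. A minimal Python sketch; the function name, defaults, and returned dictionary keys are illustrative assumptions, and any setup is expected to happen before the call so it stays outside the stopwatch.

```python
import statistics
import time
from typing import Callable, Dict


def measure(critical_section: Callable[[], object], runs: int = 10,
            iterations: int = 1000) -> Dict[str, float]:
    """Time only the critical section, repeated, and report a range rather than one number.

    Each run executes the section `iterations` times so millisecond-level noise averages
    out; wall-clock and process CPU time are recorded side by side.
    """
    wall, cpu = [], []
    for _ in range(runs):
        wall_start, cpu_start = time.perf_counter(), time.process_time()
        for _ in range(iterations):
            critical_section()
        wall.append(time.perf_counter() - wall_start)
        cpu.append(time.process_time() - cpu_start)
    return {
        "wall_min": min(wall),
        "wall_median": statistics.median(wall),
        "wall_max": max(wall),
        "cpu_median": statistics.median(cpu),
    }
```

If `cpu_median` is much smaller than the wall-clock figures, the process was probably waiting on I/O or competing with other processes, which is exactly the noise the answer warns about.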

How long may Unit tests take in TDD?

I have a general question about TDD.
As we all know, test-driven development requires a lot of testing. As a best practice, you should run your tests every time you've written something new that can be tested.
Therefore it is very reasonable to keep your tests as fast as possible.
The question is now:
How slow is too slow? Are we talking minutes? Seconds? What is best?
For example, I have a test with a 3x3 test matrix.
Executing this test takes a few seconds.
Assuming this adds up, it could one day take a few minutes to test a package.
This would mean that a programmer would waste up to an hour each day waiting.
So the Question is:
What is the maximum time a test may take?
There is no minimum or maximum time. Tests should be subjectively fast (the fast will vary from team to team and project to project).
Assuming this adds up, it could one day take a few minutes to test a package. This would mean that a programmer would waste up to an hour each day waiting.
Your entire test suite will eventually grow to a few minutes. It is inevitable.
But you mistakenly assume that you run the entire suite with every save. You don't. You only run the tests related to the feature you are developing, which in practice usually means the tests for the class/method you are currently writing.
You still run the entire suite, of course, but only a few times a day at most, usually before merging changes or pushing to the repository.

Automated performance tests

At our company we have unit tests.
We are thinking of writing some automated performance tests that will also be part of the test suite, so that both developers and the automated build will run them. The tests will do something and then fail if it took more than some pre-estimated time.
The problem is, different computers have different CPU speeds, and also processes running in the background can slow down execution. So how should we go about these tests?
One strategy is to design your performance metrics for the best machine that code will run on; as long as it runs fast enough on worse machines, you're guaranteed to have better performance in production. Basically, include a fudge factor knowing that it will have to run on slower machines, presumably during testing/development.
Another strategy is to do some benchmarking during your test setup, and use that time as your "unit time" instead of seconds. For example, calculate the 20th Fibonacci number using the dog-slow recursive algorithm, and then say that all the tests have to run within 10 "20-fibs". While the wall-clock time is going to be longer on slow machines, you have a machine-independent metric for how well the code is running.
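The "20-fib" idea, sketched in Python: calibrate once per run by timing a deliberately slow recursive Fibonacci, then express each test's duration as a multiple of that unit. The function names, the best-of-5 calibration, and the limit of 10 units are assumptions for illustration.

```python
import time
from typing import Callable


def slow_fib(n: int) -> int:
    """Deliberately slow recursive Fibonacci, used only as a calibration workload."""
    return n if n < 2 else slow_fib(n - 1) + slow_fib(n - 2)


def fib_unit_seconds(n: int = 20, samples: int = 5) -> float:
    """Wall-clock time of one slow_fib(n) call on this machine (best of `samples`)."""
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        slow_fib(n)
        best = min(best, time.perf_counter() - start)
    return best


def elapsed_in_fib_units(func: Callable[[], object], unit: float) -> float:
    """Run func once and express its duration as a multiple of the calibration unit."""
    start = time.perf_counter()
    func()
    return (time.perf_counter() - start) / unit


if __name__ == "__main__":
    unit = fib_unit_seconds()
    cost = elapsed_in_fib_units(lambda: sorted(range(200_000), reverse=True), unit)
    print(f"took {cost:.1f} fib-units")
    # A performance test would then assert cost < 10 instead of comparing raw seconds.
```

The unit only tracks CPU-bound work similar to the calibration loop; it will not normalize differences in disk or network speed between machines.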
Processes running in the background is harder. Obviously you usually don't want other things interfering with your test, so one strategy is to try and eliminate that as much as possible - regular developers can probably kill some processes and run again if there's a failure, and your continuous integration box should be kept relatively clear.
If that doesn't work, or isn't good enough, you could try the opposite approach: run a bunch of CPU/IO-intensive processes at the same time as your tests to mimic an overloaded system. If the tests pass in that environment, the performance should be fine on a normal system.
Depending on the limiting resource of your program (I/O, CPU, memory), you can get good results with measuring the used CPU time and comparing it to the system speed. For example, the performance tests for my current program obtain the spent CPU time with time and get the CPU speed from /proc/cpuinfo to measure the number of cycles spent for a computation.
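A sketch of the cycles estimate in Python, swapping the `time` command for `time.process_time()` and reading the `cpu MHz` field that Linux exposes in /proc/cpuinfo on x86. Assumptions: a single-threaded computation, and a reported MHz value that is representative; with frequency scaling that field shows the current clock, which can itself vary. The function names are ours.

```python
import re
import time
from typing import Callable, Optional


def cpu_mhz_from_cpuinfo(text: str) -> Optional[float]:
    """Return the first 'cpu MHz' value in /proc/cpuinfo-formatted text, or None."""
    match = re.search(r"^cpu MHz\s*:\s*([0-9.]+)", text, re.MULTILINE)
    return float(match.group(1)) if match else None


def estimated_cycles(func: Callable[[], object], mhz: float) -> float:
    """Process CPU time spent in func, converted to cycles at the given clock rate."""
    start = time.process_time()
    func()
    return (time.process_time() - start) * mhz * 1_000_000


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:  # Linux only
        mhz = cpu_mhz_from_cpuinfo(f.read())
    if mhz is not None:
        print(f"{estimated_cycles(lambda: sum(range(1_000_000)), mhz):.3g} cycles")
```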
This approach has two caveats: firstly, it does not measure the achieved parallelism, and secondly, it does not measure external performance factors like I/O usage.
If the idea is to understand how code changes affect performance and ensure that the performance is greater than or equal to previous builds, then you need to run the tests on a known hardware profile every time. The most accurate way to do this is to set up one or more dedicated machines that you use for testing every single time the tests are executed. If many developers need to do this, sometimes simultaneously, it may be worthwhile to create a VM image that they can spin up and point the tests at.
You should not run these on the developers' own boxes because, as you mentioned, all kinds of factors could affect the outcome of the tests there.
You should avoid measuring performance while the system is under load or strain from outside the system being tested (low disk space, network bandwidth, memory, CPU, etc.), unless those conditions are specifically set up as part of the test case. For instance, you can have three different test runs: one while the machine is under no load, another under medium load (simulating other programs running in the background), and another under high load.
You can also run tests on various hardware profiles as part of your other stress/performance tests, but you probably won't get much value out of running them against every build. If you do want to, you could do a few test runs against different hardware profiles. This requires more setup, though, since you would need additional machines and/or VM images, plus the infrastructure to kick off the tests on these machines, gather the results, and report on them.
+1 for Sam's response. I've done this a number of times in the past and it's critical to lock down your performance test environment and ensure you're minimizing any potential flux.
Running the tests on devs' systems may be a useful flag for individual devs, but having a central system to run the tests on is critical. One caveat about doing this in VMs: ensure you understand the load on the VM host system because load there can impact performance in the hosted VMs.
I've had the best, most consistent and useful results when I ran these sorts of suites during a nightly smoke check build.
It is also a question of the tolerances (or acceptable capacity ranges) that make your tests valid. Ideally, as has been stated, you need a predictable, stable and consistent setup for any useful comparison. That said, if you understand the basic operational ranges of the SUT (system under test: available CPU, available memory, etc.), then early developer testing can be done on a mix of systems and conditions that fall within the known resource tolerances.
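Tolerances can be made explicit by comparing each measurement with a stored baseline from an earlier build and only flagging results outside a band. A minimal Python sketch; the ±5% default and the verdict names are assumptions, and the right band depends on how noisy your environment is.

```python
from enum import Enum


class Verdict(Enum):
    IMPROVED = "improved"
    WITHIN_TOLERANCE = "within tolerance"
    REGRESSED = "regressed"


def compare_to_baseline(measured: float, baseline: float, tolerance: float = 0.05) -> Verdict:
    """Classify a measured duration against a baseline from a previous build.

    `tolerance` is a fraction of the baseline (0.05 means +/-5%); only results
    outside that band count as a real change.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = measured / baseline
    if ratio > 1 + tolerance:
        return Verdict.REGRESSED
    if ratio < 1 - tolerance:
        return Verdict.IMPROVED
    return Verdict.WITHIN_TOLERANCE
```

On a locked-down performance machine the band can be tight; on mixed developer systems it has to be wider, which matches the point above about known resource tolerances.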

Best practices for the best max length of time for running unit tests in CI

We are doing continuous integration at our company with TeamCity and we have unit tests running at every commit (1 min window).
Lately we have been debating how long a batch of unit tests should take, though obviously the shorter the better.
However, I would like to know what is the best practice for the length of a batch of unit tests?
You could build priorities into your unit tests, and only use a subset as a check-in gate (Build Verification Test, or BVT). Run lower priority tests less often (e.g. per daily build, per test pass, or per product release). Then place separate execution time limits on each (or each suite) that satisfies your dev team.
I base priorities on how fast we'd jump on fixing the bug signaled by a test failure. P0 means "must-fix, even if we have to slip the schedule", P3 means "may never fix".
One of the teams I worked on said no more than 2 minutes per feature for BVTs, and placed no time restriction on lower priority tests. The devs had to run about 5 test suites, and it was reasonable with our volume of check-ins to queue up 10 minute buddy builds. But our "unit tests" were big-huge, special-environment-required integration tests, so YMMV.
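A minimal Python sketch of a priority-based check-in gate: select tests at or above a priority cutoff and check their expected total time against the team's limit. The `PrioritizedCheck` type and the idea of using recent average durations as the estimate are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrioritizedCheck:
    name: str
    priority: int            # 0 = must-fix (P0) ... 3 = may never fix (P3)
    expected_seconds: float  # e.g. recent average run time from CI history


def gate_suite(checks: List[PrioritizedCheck], max_priority: int) -> List[PrioritizedCheck]:
    """Tests that belong in the check-in gate (BVT): priority number at or below the cutoff."""
    return [c for c in checks if c.priority <= max_priority]


def fits_budget(checks: List[PrioritizedCheck], budget_seconds: float) -> bool:
    """True if the selected tests are expected to finish within the team's time limit."""
    return sum(c.expected_seconds for c in checks) <= budget_seconds
```

Lower-priority tests that fall outside the gate would then run on the daily build or per release, with their own (or no) time limit.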
Unit tests should be run until they are all complete; don't restrict the set of unit tests based upon the runtime. If you have lots of unit tests, and they're taking a long time to run, investigate getting faster hardware to run the CI system on; buying more expensive hardware is far cheaper than not putting in place the unit tests that would detect a problem before it becomes a major bug.
"As short as possible."
But really, it depends on exactly what you're asking. Should you remove tests to shorten the build? Probably not. Might you limit the scope of the tests run on a per-commit build? Maybe so. Should you limit the tests run on a nightly build? Probably not. Exactly how long it's okay for a build to take really depends on your team, your process, and how you integrate CI into them.
Very simple: make the (unit) tests faster or parallelize them. Unit tests should work; you shouldn't limit them by runtime.
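For the parallelization route, a minimal Python sketch using `concurrent.futures` that runs independent test callables concurrently and collects failures. It assumes the tests share no state, files, or ports (the same isolation that makes them unit tests). Threads mainly help I/O-bound tests in CPython; CPU-bound tests would need `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


def run_parallel(tests: Dict[str, Callable[[], None]], workers: int = 4) -> Dict[str, Optional[str]]:
    """Run independent tests concurrently; map each name to None (pass) or an error message.

    Only safe for tests that share no mutable state, files, or ports.
    """
    def run_one(test: Callable[[], None]) -> Optional[str]:
        try:
            test()
            return None
        except Exception as exc:  # report the failure instead of aborting the whole run
            return f"{type(exc).__name__}: {exc}"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_one, test) for name, test in tests.items()}
        return {name: future.result() for name, future in futures.items()}
```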