We have the "problem" of a large automated suite of integration tests. While our build times are reasonable (< 1 hour), the tests typically take > 6 hours to complete.
While it's great to have this large chunk of functionality tested in our build runs, it is obviously a barrier to implementing CI, which I've found to be very helpful for keeping source trees in an "always buildable" state.
I've reviewed threads of discussion like this one, which elaborate on the distinctions.
This leads me to a few questions:
Does CI dictate or recommend Unit vs. Integration testing automation? I've heard Unit-only in the past, but a quick search isn't turning up any such statements (or their rationale).
What is a good "best practice" for combined build + automated test times/ratios to have effective CI for a team? My gut tells me that this should be < 2 hours as a worst case, and probably < 1 hour to be really effective. In theory, we could break up the tests to run in parallel and probably get them under 2 hours, but combined with the build that would still be roughly a 3 hour run.
What's the best way forward from long-running Nightly Builds + Integration Tests to CI? I'm thinking of a CI build with a few skeletal Unit Tests only, in combination with nightly builds that continue with the integration tests.
Any tooling recommendations are also welcome (Windows-only C#/C++ codebase).
For most projects, however, the XP guideline of a ten minute build is perfectly within reason. Most of our modern projects achieve this. It's worth putting in concentrated effort to make it happen, because every minute you reduce off the build time is a minute saved for each developer every time they commit. Since CI demands frequent commits, this adds up to a lot of time.
Source: http://martinfowler.com/articles/continuousIntegration.html#KeepTheBuildFast
Why does it take 6 hours? How many tests do you have? What is the ratio of unit tests to integration tests? You probably have many more integration tests, or your unit tests are not really unit tests. Are your unit tests touching the DB? That may be the problem.
6 hours is a long long time. The article above has some tips.
There are a few things here.
In general you will have a number of builds, one that compiles & runs unit tests, one that does that and runs local acceptance tests, and one that runs integration tests.
You definitely don't need a single build that does everything.
Your build times sound pretty long to me - remember that the point here is to give quick feedback that something has gone awry. I don't know much about your project - but I would think that you should look to get your compile and unit test build down to under two to three minutes. This is perfectly achievable in all but very large projects, so if your unit tests take a long time, then it's time to start asking why.
6 hours is also a very long time. Are you sure that your tests are testing the right stuff? Do you have too many wide-scope tests? Are you using "sleep()" everywhere to make up for the fact that you haven't modeled asynchrony well in your test code?
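On the sleep() point, one common fix is to poll for the expected condition with a timeout instead of padding every asynchronous test with a fixed delay. A minimal C# sketch; the OrderProcessor mentioned in the usage comment is a hypothetical system under test:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class AsyncTestHelper
{
    // Poll until the condition holds or the timeout expires, instead of
    // sleeping for a fixed "worst case" interval on every single run.
    public static void WaitUntil(Func<bool> condition, TimeSpan timeout)
    {
        var watch = Stopwatch.StartNew();
        while (!condition())
        {
            if (watch.Elapsed > timeout)
                throw new TimeoutException("Condition not met within " + timeout);
            Thread.Sleep(50); // short poll interval, not a multi-second guess
        }
    }
}

// Usage in a test body (OrderProcessor is hypothetical):
//   processor.Submit(order);
//   AsyncTestHelper.WaitUntil(() => processor.IsCompleted(order.Id),
//                             TimeSpan.FromSeconds(5));
```

Tests that pass quickly still finish quickly; only genuinely slow or broken runs pay the full timeout.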
You should probably get hold of Jez Humble's book "Continuous Delivery", and take a look at Growing Object-Oriented Software for how to write unit / integration tests. GOOS uses Java as its implementation language, but all the concepts are the same.
Related
I'm looking for a tool that can solve the following problem:
Our complete unit test suite takes hours to complete. So when a programmer commits code, they get the test results a few hours later. What we would like to achieve is to shorten the time it takes to find simple bugs. This could be done by a smart selection of a few unit tests which would be run just before/right after the commit. Of course we don't want to pick these unit tests at random - we want the unit tests that are more likely to find a bug.
An idea to solve this problem without additional software:
Measure code coverage for each individual unit test. Knowing which files are "touched" by which unit test, we can select a test whenever the user has changed any of those files.
This solution has an obvious disadvantage: we have to manually store and update the list of covered files for each unit test.
I wonder if there is any tool that helps with selecting which tests to run?
The project uses C++ and runs under Linux.
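For what it's worth, here is a minimal sketch of the selection idea described above. The question's project is C++ on Linux, but the selection logic is language-agnostic, so it is shown in C# for consistency with the rest of this thread; all names are illustrative, and the per-test coverage map would have to come from a coverage tool run once per test.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hedged sketch: given a map of test name -> source files it covers,
// pick the tests whose coverage intersects the set of changed files.
class AffectedTestSelector
{
    private readonly Dictionary<string, HashSet<string>> _coveredFilesByTest;

    public AffectedTestSelector(Dictionary<string, HashSet<string>> coveredFilesByTest)
    {
        _coveredFilesByTest = coveredFilesByTest;
    }

    public IEnumerable<string> SelectTests(IEnumerable<string> changedFiles)
    {
        var changed = new HashSet<string>(changedFiles);
        return _coveredFilesByTest
            .Where(entry => entry.Value.Overlaps(changed))
            .Select(entry => entry.Key);
    }
}
```

The maintenance cost of keeping that map current is exactly the disadvantage noted above, which is why a tool that regenerates it automatically from coverage runs would be preferable.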
In Working Effectively with Legacy Code, Michael Feathers writes that a unit test that takes a tenth of a second is a slow unit test. You must root out the slow tests. Do not hack up subset runners based on coverage guesses that will eventually be wrong and bite you.
Keep in mind the distinction between unit tests and integration tests. Unit tests do not touch the filesystem, talk over the network, or communicate with a database: those are integration tests. Yes, integration tests are often easier to write, but that is a strong indication that your software could be factored better—and as a happy coincidence, easier to test.
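As a hedged illustration of that boundary (every type here is invented for the example): push the filesystem access behind a small interface so the parsing logic can be unit tested with an in-memory fake, leaving only a thin integration test to hit the real disk.

```csharp
using System.IO;

// The only piece that actually touches the disk.
public interface IFileReader
{
    string ReadAllText(string path);
}

public class DiskFileReader : IFileReader
{
    public string ReadAllText(string path) => File.ReadAllText(path);
}

// The logic under test depends on the abstraction, not on the disk,
// so a unit test can pass an in-memory fake and stay fast.
public class ConfigParser
{
    private readonly IFileReader _reader;

    public ConfigParser(IFileReader reader) { _reader = reader; }

    public string GetValue(string path, string key)
    {
        foreach (var line in _reader.ReadAllText(path).Split('\n'))
        {
            var parts = line.Split('=');
            if (parts.Length == 2 && parts[0].Trim() == key)
                return parts[1].Trim();
        }
        return null;
    }
}
```

A unit test constructs ConfigParser with a fake IFileReader returning a fixed string; only DiskFileReader needs an integration test against a real file.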
My suspicion is your integration tests are the ones taking so long. Move those to a separate suite that runs less frequently than on every checkin, say nightly.
I was looking for some kind of solution for software development teams which spend too much time handling unit test regression problems (about 30% of the time in my case!!!), i.e., dealing with unit tests which fail on a day-to-day basis.
The following is one solution I'm familiar with, which analyzes which of the latest code changes caused a certain unit test to fail:
Unit Test Regression Analysis Tool
I wanted to know if anyone knows of similar tools so I can benchmark them.
I'd also appreciate it if anyone can recommend another approach to handling this annoying problem.
Thanks in advance.
You have our sympathy. It sounds like you have brittle test syndrome. Ideally, a single change should only break a single test, and that failure should point at a real problem. Like I said, "ideally". But this type of behavior is common and treatable.
I would recommend spending some time with the team doing some root-cause analysis of why all these tests are breaking. Yep, there are some fancy tools that keep track of which tests fail most often, and which ones fail together. Some continuous integration servers have this built in. That's great. But I suspect if you just ask each other, you'll know. I've been through this, and the team always just knows from experience.
Anywho, a few other things I've seen that cause this:
Unit tests generally shouldn't depend on more than the class and method they are testing. Look for dependencies that have crept in. Make sure you're using dependency injection to make testing easier.
Are these truly unique tests? Or are they testing the same thing over and over? If they are always going to fail together, why not just remove all but one?
Many people favor integration over unit tests, since they get more coverage for their buck. But with these, a single change can break lots of tests. Maybe you're writing integration tests?
Perhaps they are all running through some common set-up code for lots of tests, causing them to break in unison. Maybe this can be mocked out to isolate behaviors (a brief sketch follows this list).
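Here is the sketch referred to above, a hedged illustration of the dependency-injection and shared set-up points (all names are hypothetical): each test supplies its own hand-rolled fake instead of routing everything through one shared, live fixture, so a change in one place only breaks the tests that actually care about it.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical dependency that a shared fixture might otherwise create "for real".
public interface IDiscountRepository
{
    decimal GetDiscount(string customerId);
}

// Hand-rolled fake: each test controls exactly the data it needs.
public class FakeDiscountRepository : IDiscountRepository
{
    public decimal Discount { get; set; }
    public decimal GetDiscount(string customerId) => Discount;
}

public class PriceCalculator
{
    private readonly IDiscountRepository _discounts;

    public PriceCalculator(IDiscountRepository discounts) { _discounts = discounts; }

    public decimal FinalPrice(string customerId, decimal listPrice)
        => listPrice * (1m - _discounts.GetDiscount(customerId));
}

[TestClass]
public class PriceCalculatorTests
{
    [TestMethod]
    public void FinalPrice_AppliesCustomerDiscount()
    {
        var calculator = new PriceCalculator(
            new FakeDiscountRepository { Discount = 0.10m });

        Assert.AreEqual(90m, calculator.FinalPrice("any-customer", 100m));
    }
}
```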
Test often, commit often.
If you don't do that already, I suggest using a Continuous Integration tool, and asking/requiring the developers to run the automated tests before committing. At least a subset of the tests. If running all the tests takes too long, then use a CI tool that spawns a build (which includes running all automated tests) for each commit, so you can easily see which commit broke the build.
If the automated tests are too fragile, maybe they don't test the functionality, but the implementation details? Sometimes testing the implementation details is a good idea, but it can be problematic.
Regarding running a subset of the tests most likely to fail - since they usually fail due to other team members' changes (at least in my case), I'd need to ask others to run my tests - which might be 'politically problematic' in some development environments ;). Any other suggestions will be appreciated. Thanks a lot – SpeeDev Sep 30 '10 at 23:18
If you have to "ask others" to run your test then that suggests a serious problem with your test infrastructure. All tests (regardless of who wrote them) should be run automatically. The responsibility for fixing a failing test should lie with the person who committed the change, not with the test author.
Michael Feathers, in Working Effectively With Legacy Code, on pages 13-14 mentions:
A unit test that takes 1/10th of a second to run is a slow unit test... If [unit tests] don't run fast, they aren't unit tests.
I can understand why 1/10th of a second is too slow if one has 30,000 tests, as they would take close to an hour to run. However, does this mean 1/11th of a second is any better? No, not really (it only shaves off about five minutes). So a hard-and-fast rule probably isn't perfect.
Thus, when considering how slow is too slow for a unit test, perhaps I should rephrase the question: how long is too long for a developer to wait for the unit test suite to complete?
To give an example of test speeds, take a look at the durations of several MSTest unit tests (all in seconds):
0.2637638
0.0589954
0.0272193
0.0209824
0.0199389
0.0088322
0.0033815
0.0028137
0.0027601
0.0008775
0.0008171
0.0007351
0.0007147
0.0005898
0.0004937
0.0004624
0.00045
0.0004397
0.0004385
0.0004376
0.0003329
The average across all 21 of these unit tests comes to 0.019785 seconds. Note that the slowest test is slow because it uses Microsoft Moles to mock/isolate the file system.
So with this example, if my unit test suite grows to 10,000 tests, it could take over 3 minutes to run.
I've looked at one such project where the number of unit tests made the system take too long to test everything. "Too long" meaning that you basically didn't do it as part of your normal development routine.
However, what they had done was to categorize the unit tests into two groups: critical tests, and "everything else".
Critical tests took just a few seconds to run, and tested only the most critical parts of the system, where "critical" here meant "if something is wrong here, everything is going to be wrong".
Tests that made the entire run take too long were relegated to the "everything else" section, and were only run on the build server.
Whenever someone committed code to the source control repository, the critical tests would again run first, and then a "full run" was scheduled a few minutes into the future. If nobody checked in code during that interval, the full suite was run. Granted, they didn't take 30 minutes, more like 8-10.
This was done using TeamCity, so even if one build agent was busy with the full unit test suite, the other build agents could still pick up normal commits and run the critical unit tests as often as needed.
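A hedged MSTest sketch of that kind of critical/everything-else split; the category names and test bodies are invented, and the filter syntax in the trailing comment is the usual vstest.console.exe form, so check it against your own build server:

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class CheckoutTests
{
    // Runs on every commit: small, fast, in-memory.
    [TestMethod, TestCategory("Critical")]
    public void Total_IsSumOfLineItems()
    {
        // ... fast, in-memory assertions only ...
    }

    // Runs only in the scheduled "everything else" build.
    [TestMethod, TestCategory("Slow")]
    public void Checkout_EndToEnd_AgainstTestDatabase()
    {
        // ... slower, wider-scope assertions ...
    }
}

// The commit build can then filter on the category, e.g.:
//   vstest.console.exe Tests.dll /TestCaseFilter:"TestCategory=Critical"
```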
I've only ever worked on projects where the test suite took at least ten minutes to run. The bigger ones, it was more like hours. And we sucked it up and waited, because they were pretty much guaranteed to find at least one problem in anything you threw at them. The projects were that big and hairy.
I wanna know what these projects are that can be tested comprehensively in seconds.
(The secret to getting things done when your project's unit tests take hours is to have four or five things you're working on at the same time. You throw one set of patches at the test suite and you task-switch, and by the time you're done with the thing you switched to, maybe your results have come back.)
I've got unit tests that take a few seconds to execute. I've got a method which does very complicated computing with billions and billions of operations. There are a few known good values that we use as the basis for unit testing when we refactor this tricky and uber-fast method (which we must optimize the crap out of because, as I said, it performs billions and billions of computations).
Rules don't adapt to every domain / problem space.
We can't "divide" this method into smaller methods that we could unit test: it is a tiny but very complicated method (making use of insanely huge precomputed tables that can't be re-created fast enough on the fly etc.).
We have unit tests for that method. They are unit tests. They take seconds to execute. It is a Good Thing [TM].
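A hedged sketch of that "known good values" style; the function and the expected values below are purely illustrative stand-ins, not the actual method being described. The test pins the optimized routine to outputs captured from a previously trusted run, so aggressive refactoring stays safe.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Stand-in for the real, heavily optimized routine.
public static class HeavyMath
{
    public static double Compute(double x) => System.Math.Exp(x);
}

[TestClass]
public class HeavyComputationTests
{
    [TestMethod]
    public void Compute_MatchesKnownGoodValues()
    {
        // Illustrative input/output pairs; the real ones would be captured
        // once from a validated run of the production algorithm.
        Assert.AreEqual(1.0, HeavyMath.Compute(0.0), 1e-12);
        Assert.AreEqual(2.718281828459045, HeavyMath.Compute(1.0), 1e-12);
    }
}
```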
Now of course I don't dispute that you can use unit testing libraries like JUnit for things that aren't unit testing: for example, we also use JUnit to test complex multi-threaded scenarios. These aren't "unit tests", but you bet that JUnit still rules the day :)
EDIT: See my comment on another answer (Link). Please note that there was a lot of back and forth about unit testing, so before you decide to upvote or downvote this answer, please read all the comments on that answer.
Next, use a tool like Mighty-Moose (Mighty-Moose was abandoned, but there are other tools) that only runs the tests affected by your code change (instead of your entire test suite) every time you check in a file.
So what's your question? :-) I agree, the true metric here is how long developers have to wait for a complete run of the unit tests. Too long and they'll start cutting corners before committing code. I'd like to see a complete commit build take less than a minute or two, but that's not always possible. At my work, a commit build used to take 8 minutes and people just started only running small parts of it before committing - so we bought more powerful machines :-)
How long is too long for a developer to wait for the unit test suite to complete?
It really depends on how long the devs are happy to wait for feedback on their change. I'd say if you start talking minutes then it's too slow, and you should probably break up the test suite into individual test projects and run them separately.
In our project, test procedures and expected test results (test specifications) are created in a document.
Then, we perform testing on a built product / release.
Here, no test code or test tools are involved.
Is this acceptable for unit / integration testing?
What you are doing is "manual testing".
Manual testing, by definition, is not and can never be unit testing.
Manual testing can be used for integration testing, and in fact should be used to some degree, because automated tests cannot spot all forms of unexpected error conditions. Especially bugs having to do with layout and things not "looking right" (which happen to be rather common in web apps).
However, if you have no automated tests at all, it means that your application is insufficiently tested, period. Because it's completely infeasible to manually test every detailed aspect of an application for every release - no organization would be willing or able to pay for the effort that would require.
Is this acceptable for unit / integration testing?
No. What you describe is neither unit nor integration testing; it is taking the build for a walk around the block to get a cup of coffee.
Unit testing is - as I understand it - the testing of individual units of code. Relatively low level and usually developed at the same time as the code itself.
To do that you need to be working in code as well, and ultimately the code that performs these tests is a testing tool, even if for some reason you aren't using a framework.
So no, if you aren't using testing tools or testing code, you aren't doing unit testing.
Theoretically you could be doing integration testing manually, but it's still unreliable, because people tend to be inconsistent, and expensive, because people are slower than machines.
Ultimately, the more testing you can automate, the faster and more accurate your tests will be, and the more you will free your QA personnel to test things that can only be tested manually.
Unit and Integration testing are two very different things, and what constitutes "acceptable" depends entirely on your organisation. It may very well be acceptable to test the system, rather than each unit separately.
Personally, I'm not a fan of automated unit testing, since the vast majority of issues I encounter are the sort of things that only ever come to light in the context of a system test.
I tend to develop incrementally, so that as what I'm working on grows, it becomes its own test harness, and foundations are proved to be solid before building anything on them.
I'd love to be able to automate system testing. It reveals all the things I would never have thought of in a million years of writing unit tests.
If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
For instance if I have unit tests and acceptance tests for a feature do I still need integration tests or should the unit and acceptance tests cover the same ground? Is there overlap between test types?
I'm talking about automated tests here. I know manual testing is still needed for things like ease of use, etc.
If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
No. Tests can only verify what you have thought of. Not what you haven't thought of.
I'd recommend reading chapters 20 - 22 in the 2nd edition of Code Complete. It covers software quality very well.
Here's a quick breakdown of some of the key points (all credit goes to McConnell, 2004)
Chapter 20 - The Software-Quality Landscape:
No single defect-detection technique is completely effective by itself
The earlier you find a defect, the less intertwined it will become with the rest of your code and the less damage it will cause
Chapter 21 - Collaborative Construction:
Collaborative development practices tend to find a higher percentage of defects than testing and to find them more efficiently
Collaborative development practices tend to find different kinds of errors than testing does, implying that you need to use both reviews and testing to ensure the quality of your software
Pair programming typically costs about the same as inspections and produces similar-quality code
Chapter 22 - Developer Testing:
Automated testing is useful in general and is essential for regression testing
The best way to improve your testing process is to make it regular, measure it, and use what you learn to improve it
Writing test cases before the code takes the same amount of time and effort as writing the test cases after the code, but it shortens defect-detection-debug-correction-cycles (Test Driven Development)
As far as how you are formulating your unit tests, you should consider basis testing, data-flow analysis, boundary analysis etc. All of these are explained in great detail in the book (which also includes many other references for further reading).
Maybe this isn't exactly what you were asking, but I would say automated testing is definitely not enough of a strategy. You should also consider such things as pair programming, formal reviews (or informal reviews, depending on the size of the project) and test scaffolding along with your automated testing (unit tests, regression testing etc.).
The idea of multiple testing cycles is to catch problems as early as possible when things change.
Unit tests should be done by the developers to ensure the units work in isolation.
Acceptance tests should be done by the client to ensure the system meets the requirements.
However, something has changed between those two points that should also be tested. That's the integration of units into a product before being given to the client.
That's something that should first be tested by the product creator, not the client. The minute you involve the client, things slow down, so the more fixes you can do before they get their grubby little hands on it, the better.
In a big shop (like ours), there are unit tests, integration tests, globalization tests, master-build tests and so on at each point where the deliverable product changes. Only once all high severity bugs are fixed (and a plan for fixing low priority bugs is in place) do we unleash the product to our beta clients.
We do not want to give them a dodgy product simply because fixing a bug at that stage is a lot more expensive (especially in terms of administrivia) than anything we do in-house.
It's really impossible to know whether or not you have enough tests based simply on whether you have a test for every method and feature. Typically I will combine testing with coverage analysis to ensure that all of my code paths are exercised in my unit tests. Even this is not really enough, but it can be a guide to where you may have introduced code that isn't exercised by your tests. This should be an indication that more tests need to be written or, if you're doing TDD, you need to slow down and be more disciplined. :-)
Tests should cover both good and bad paths, especially in unit tests. Your acceptance tests may be more or less concerned with the bad path behavior but should at least address common errors that may be made. Depending on how complete your stories are, the acceptance tests may or may not be adequate. Often there is a many-to-one relationship between acceptance tests and stories. If you only have one automated acceptance test for every story, you probably don't have enough unless you have different stories for alternate paths.
Multiple layers of testing can be very useful. Unit tests to make sure the pieces behave; integration to show that clusters of cooperating units cooperate as expected, and "acceptance" tests to show that the program functions as expected. Each can catch problems during development. Overlap per se isn't a bad thing, though too much of it becomes waste.
That said, the sad truth is that you can never ensure that the product behaves "as expected", because expectation is a fickle, human thing that gets translated very poorly onto paper. Good test coverage won't prevent a customer from saying "that's not quite what I had in mind...". Frequent feedback loops help there. Consider frequent demos as a "sanity test" to add to your manual mix.
Probably not, unless your software is really, really simple and has only one component.
Unit tests are very specific, and you should cover everything thoroughly with them. Go for high code-coverage here. However, they only cover one piece of functionality at a time and not how things work together. Acceptance tests should cover only what the customer really cares about at a high level, and while it will catch some bugs in how things work together, it won't catch everything as the person writing such tests will not know about the system in depth.
Most importantly, these tests may not be written by a tester. Unit tests should be written by developers and run frequently (up to every couple minutes, depending on coding style) by the devs (and by the build system too, ideally). Acceptance tests are often written by the customer or someone on behalf of the customer, thinking about what matters to the customer. However, you also need tests written by a tester, thinking like a tester (and not like a dev or customer).
You should also consider the following sorts of tests, which are generally written by testers:
Functional tests, which will cover pieces of functionality. This may include API testing and component-level testing. You will generally want good code-coverage here as well.
Integration tests, which put two or more components together to make sure that they work together. You don't want one component to report an object's position as a 0-based array index when the other component expects a 1-based count ("the nth object"), for example (a brief sketch follows this list). Here, the focus is not on code coverage but on coverage of the interfaces (general interfaces, not code interfaces) between components.
System-level testing, where you put everything together and make sure it works end-to-end.
Testing for non-functional features, like performance, reliability, scalability, security, and user-friendliness (there are others; not all will relate to every project).
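Here is the sketch referred to in the integration-test item above, a hedged illustration with both components invented for the example: the integration test makes the 0-based versus 1-based convention explicit at the seam between the two components.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical producer: reports positions as 0-based array indices.
public static class Locator
{
    public static int IndexOf(string[] items, string target)
        => System.Array.IndexOf(items, target);
}

// Hypothetical consumer: expects a 1-based "nth object" ordinal.
public static class ReportFormatter
{
    public static string Describe(int ordinal) => $"object #{ordinal}";
}

[TestClass]
public class LocatorReportIntegrationTests
{
    [TestMethod]
    public void Report_UsesOneBasedOrdinalForLocatedItem()
    {
        var items = new[] { "a", "b", "c" };

        int index = Locator.IndexOf(items, "b");              // 1 (0-based)
        string report = ReportFormatter.Describe(index + 1);  // convert at the seam

        Assert.AreEqual("object #2", report);
    }
}
```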
Integration tests are for when your code integrates with other systems, such as 3rd-party applications, or other in-house systems such as the environment, database, etc. Use integration tests to ensure that the behavior of the code is still as expected.
In short, no.
To begin with, your story cards should have acceptance criteria. That is, acceptance criteria specified by the product owner in conjunction with the analyst, describing the required behavior; if the criteria are met, the story card will be accepted.
The acceptance criteria should drive the automated unit tests (done via TDD) and the automated regression/functional tests, which should be run daily. Remember, we want to move defects to the left: the sooner we find them, the cheaper and faster they are to fix. Furthermore, continuous testing enables us to refactor with confidence. This is required to maintain a sustainable pace of development.
In addition, you need automated performance tests. Running a profiler daily or overnight will provide insight into CPU and memory consumption and whether any memory leaks exist. Furthermore, a tool like LoadRunner will enable you to place a load on the system that reflects actual usage. You will be able to measure response times and CPU and memory consumption on a production-like machine while running LoadRunner.
The automated performance test should reflect actual usage of the app. You measure the business transactions (e.g., for a web application, a click on a page and the response to the user, or round trips to the server) and determine the mix of such transactions along with the rate at which they arrive per second. That information will enable you to properly design the automated LoadRunner test required to performance test the application. As is often the case, some performance issues will trace back to the implementation of the application, while others will be determined by the configuration of the server environment.
Remember, your application will be performance tested. The question is whether the first performance test happens before or after you release the software. Believe me, the worst place to have a performance problem is in production. Performance issues can be the hardest to fix and can cause a deployment to all users to fail, potentially cancelling the project.
Finally, there is User Acceptance Testing (UAT). These are tests designed by the product owner/business partner to test the overall system prior to release. Generally, because of all the other testing, it is not uncommon for the application to come through UAT with zero defects.
It depends on how complex your system is. If your acceptance tests (which satisfy the customer's requirements) exercise your system from front to back, then no, you don't.
However, if your product relies on other tiers (like backend middleware/database) then you do need a test that proves that your product can happily link up end-to-end.
As other people have commented, tests don't necessarily prove the project functions as expected, just how you expect it to work.
Frequent feedback loops to the customer and/or tests that are written/parsable in a way the customer understands (say, for example, in a BDD style) can really help.
If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
This is enough to show your software is functionally correct, at least to the extent that your test coverage is sufficient. Now, depending on what you're developing, there are certainly non-functional requirements that matter: think about reliability, performance and scalability.
Technically, a full suite of acceptance tests should cover everything. That being said, they're not "enough" for most definitions of enough. By having unit tests and integration tests, you can catch bugs/issues earlier and in a more localized manner, making them much easier to analyze and fix.
Consider that a full suite of manually executed tests, with the directions written on paper, would be enough to validate that everything works as expected. However, if you can automate the tests, you'd be much better off because it makes doing the testing that much easier. The paper version is "complete", but not "enough". In the same way, each layer of tests adds more to the value of "enough".
It's also worth noting that the different sets of tests tend to test the product/code from a different "viewpoint". Much the same way QA may pick up bugs that dev never thought to test for, one set of tests may find things the other set wouldn't.
Acceptance testing can even be done manually by the client if the system at hand is small.
Unit and small integration tests (consisting of unit-like tests) are there for you to build a sustainable system.
Don't try to write tests for every part of the system. That is brittle (easy to break) and overwhelming.
Decide on the critical parts of the system that take too much time to test manually, and write acceptance tests only for those parts to make things easy for everyone.