Is there any way to do automated profiling of unit tests when we run them via TeamCity?
The reason I'm asking is that while we should, and most of the time do, focus on not writing code with poor performance, sometimes code slips through that seems OK and indeed works correctly, but the routine is used in multiple places, and in some cases the elapsed run time of a method now takes 10x as long as it did before.
This is not necessarily a bug, but it would be nice to be told "Hey, did you know? One of your unit-tests now takes 10x the time it did before you checked in this code.".
So I'm wondering, is there any way to do this?
Note that I say TeamCity because that's what will ultimately run the code, tools, whatever (if something is found), but of course it could be a wholly standalone tool that we could integrate ourselves.
I also see that TeamCity is gathering elapsed time statistics for our unit tests, so my thought was that perhaps there was a tool that could analyze that set of data, to compare latest elapsed time against statistical trends, etc.
Perhaps it's as "easy" as making our own test-runner program?
Has anyone done this, or seen/know of a potential solution for this?
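One way to approximate this without a dedicated tool, assuming you can export per-test durations from TeamCity's statistics (the data shapes and threshold values below are my own assumptions, not anything TeamCity provides), is a small script that flags tests whose latest run is far above their historical median:

```python
from statistics import median

def find_slowdowns(history, latest, factor=2.0, floor_ms=50):
    """history: dict test_name -> list of past durations (ms);
    latest: dict test_name -> most recent duration (ms).
    Flags tests whose latest run exceeds `factor` times the
    historical median (ignoring very fast tests, which are noisy)."""
    flagged = {}
    for name, duration in latest.items():
        past = history.get(name)
        if not past:
            continue  # new test, no baseline yet
        baseline = median(past)
        if duration >= floor_ms and duration > factor * baseline:
            flagged[name] = (baseline, duration)
    return flagged
```

Run as an extra build step, it could fail the build (or just post a warning) whenever `find_slowdowns` returns a non-empty dict -- exactly the "Hey, did you know?" message described above.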
I'm running TeamCity Professional Version 4.5.5 (build 9103).

Does the "test" tab under each individual build do what you need? I'm seeing statistical trends for each test as a function of either each build or averaged over time.
I would like to use the Visual Studio vsinstr.exe tool to instrument an unmanaged C++ executable (a legacy app). It is a very large project, and this would be a way to map our huge test automation content to the actual code, to identify which test cases are affected when a change is made to the code base.
I'm concerned, however, about the performance of such an instrumented executable, because we basically need to run the whole test automation content to get coverage data (or update it when code is changed), and this would be done each night. To give you the picture, the test automation run takes maybe 10 hours (GUI tests; no unit tests because of the legacy architecture).
Does anybody have real experience regarding performance of instrumented executables?
I realize this question is getting long in the tooth, so my answer is intended for other users who stumble across it.
From my real world experience, instrumented binaries do run significantly slower, often by orders of magnitude. However, I have only instrumented MANAGED binaries and the OP specifically stated unmanaged C++ so "your mileage may vary."
My suggestion would be to run a subset of the tests that takes 2-3 minutes. Run that subset 3 times and average the actual run times. Then instrument the binaries, run the same tests 3 times, and compute the average. With fewer tests, the data might be skewed by application initialization. With more tests, you MAY end up waiting an hour for each instrumented run.
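That measurement protocol is simple enough to script. A sketch, with hypothetical test-runner commands standing in for your own:

```python
import subprocess
import time

def average_runtime(command, runs=3):
    """Run a test command several times and average the wall-clock time."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the tests fail
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Hypothetical commands -- substitute your own test runners:
# normal = average_runtime(["run_tests.exe", "--subset", "smoke"])
# instrumented = average_runtime(["run_tests_instr.exe", "--subset", "smoke"])
# print(f"instrumentation overhead: {instrumented / normal:.1f}x")
```

The ratio of the two averages gives the instrumentation overhead factor, which you can then multiply against the 10-hour full run to decide whether nightly instrumented runs are feasible.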
I used to code in C# in a TDD style - write or change a small chunk of code, re-compile the whole solution in 10 seconds, re-run the tests, and repeat. Easy...
That development methodology worked very well for me for a few years, until last year, when I had to go back to C++ coding, and it really feels like my productivity has dramatically decreased since. C++ as a language is not the problem - I have quite a lot of C++ dev experience... but it is in the past.
My productivity is still OK for small projects, but it gets worse as the project size increases, and once compilation time hits 10+ minutes it gets really bad. And if I find an error I have to start the compilation again, etc. That is just purely frustrating.
Thus I've concluded that working in small chunks (as before) is no longer acceptable. Any recommendations on how I can get myself back into the long-gone habit of coding for an hour or so, reviewing the code manually (without relying on a fast C# compiler), and only recompiling/re-running unit tests once every couple of hours?
With C# and TDD it was very easy to write code in an evolutionary way - after a dozen iterations, whatever crap I started with ended up as good code. But that just doesn't work for me anymore (in a slow compilation environment).
Would really appreciate your input and recommendations.
p.s. not sure how to tag the question - anyone is welcome to re-tag the question appropriately.
Cheers.
I've found that recompiling and testing sort of pulls me out of the "zone", so in order to have the benefits of TDD, I commit fairly often into a git repository, and run a background process that checks out any new commit, runs the full test suite and annotates the commit object in git with the result. When I get around to it (usually in the evening), I then go back to the test results, fix any issues and "rewrite history", then re-run the tests on the new history. This way I don't have to interrupt my work even for the short times it takes to recompile (most of) my projects.
Sometimes you can avoid the long compile. Aside from improving the quality of your build files/process, you may be able to pick just a small thing to build. If the file you're working on is a .cpp file, just compile that one TU and unit-test it in isolation from the rest of the project. If it's a header (perhaps containing inline functions and templates), do the same with a small number of TUs that between them reference most of the functionality (if no such set of TUs exists, write unit tests for the header file and use those). This lets you quickly detect obvious stupid errors (like typos) that don't compile, and runs the subset of tests you believe to be relevant to the changes you're making. Once you have something that might vaguely work, do a proper build/test of the project to ensure you haven't broken anything you didn't realise was relevant.
Where a long compile/test cycle is unavoidable, I work on two things at once. For this to be efficient, one of them needs to be simple enough that it can just be dropped when the main task is ready to be resumed, and picked up again immediately when the main task's compile/test cycle is finished. This takes a bit of planning. And of course the secondary task has its own build/test cycle, so sometimes you want to work in separate checked-out copies of the source so that errors in one don't block the other.
The secondary task could for example be, "speed up the partial compilation time of the main task by reducing inter-component dependencies". Even so you may have hit a hard limit once it's taking 10 minutes just to link your program's executable, since splitting the thing into multiple dlls just as a development hack probably isn't a good idea. The key thing to avoid is for the secondary task to be, "hit SO", or this.
Since a simple change triggers a 10-minute recompilation, you have a bad build system. Your build should recompile only changed files and the files that depend on them.
Other than that, there are further techniques to speed up the build (for example, remove unneeded includes; instead of including a header, use a forward declaration; etc.), but these speed-ups matter less than what gets recompiled on a change.
I don't see why you can't use TDD with C++. I used CppUnit back in 2001, so I assume it's still in place.
You don't say what IDE or build tool you're using, so I can't comment on how those affect your pace. But small, incremental compiles and running unit tests are both still possible.
Perhaps looking into Cruise Control, Team City, or another hands-off build and test process would be your cup of tea. You can just check in as fast as you can and let the automated build happen on another server.
I was looking for some kind of solution for software development teams which spend too much time handling unit test regression problems (about 30% of the time in my case!!!), i.e., dealing with unit tests that fail on a day-to-day basis.
Following is one solution I'm familiar with, which analyzes which of the latest code changes caused a certain unit test to fail:
Unit Test Regression Analysis Tool
I wanted to know if anyone knows similar tools so I can benchmark them.
Also, can anyone recommend another approach to handling this annoying problem?
Thanks in advance.
You have our sympathy. It sounds like you have brittle-test syndrome. Ideally, a single change to the code should break at most a single test, and that failure should point to a real problem. Like I said, "ideally". But this type of behavior is common and treatable.
I would recommend spending some time with the team doing root cause analysis of why all these tests are breaking. Yep, there are some fancy tools that keep track of which tests fail most often, and which ones fail together. Some continuous integration servers have this built in. That's great. But I suspect if you just ask each other, you'll know. I've been through this, and the team always just knows from experience.
Anywho, a few other things I've seen that cause this:
Unit tests generally shouldn't depend on more than the class and method they are testing. Look for dependencies that have crept in. Make sure you're using dependency injection to make testing easier.
Are these truly unique tests? Or are they testing the same thing over and over? If they are always going to fail together, why not just remove all but one?
Many people favor integration over unit tests, since they get more coverage for their buck. But with these, a single change can break lots of tests. Maybe you're writing integration tests?
Perhaps they are all running through some common set-up code for lots of tests, causing them to break in unison. Maybe this can be mocked out to isolate behaviors.
Test often, commit often.
If you don't do so already, I suggest using a Continuous Integration tool, and asking/requiring the developers to run the automated tests before committing -- at least a subset of the tests. If running all tests takes too long, then use a CI tool that spawns a build (which includes running all automated tests) for each commit, so you can easily see which commit broke the build.
If the automated tests are too fragile, maybe they don't test the functionality, but the implementation details? Sometimes testing the implementation details is a good idea, but it can be problematic.
Regarding running a subset of the tests most likely to fail - since a test usually fails due to other team members' changes (at least in my case), I need to ask others to run my tests - which might be 'politically problematic' in some development environments ;). Any other suggestions will be appreciated. Thanks a lot – SpeeDev Sep 30 '10 at 23:18
If you have to "ask others" to run your test then that suggests a serious problem with your test infrastructure. All tests (regardless of who wrote them) should be run automatically. The responsibility for fixing a failing test should lie with the person who committed the change not the test author.
I just looked back through a project that was recently almost finished and found a very serious problem: I spent most of the bank's time on testing the code, reproducing the different situations that "may" cause errors.
Do you have any ideas or experience to share on how to reduce the time spent on testing, so that development goes much more smoothly?
I've tried to follow the test-driven concept for all my code, but I found it really hard to achieve; I really need some help from the senior guys here.
Thanks
Re: all
Thanks for the answers above; initially my question was how to reduce the time spent on general testing, but now the problem comes down to how to write efficient automated test code.
I will try to improve my skills at writing test suites to cut down this part of the time.
However, I still really struggle with how to reduce the time I spend reproducing errors. For instance, for a standard blog project it is easy to reproduce the situations that may cause errors, but a complicated bespoke internal system may "never" be testable thoroughly. Is it worth it? Do you have any ideas on how to build a test plan for this kind of project?
Thanks for the further answers still.
Test driven design is not about testing (quality assurance). It has been poorly named from the outset.
It's about having machine runnable assumptions and specifications of program behavior and is done by programmers during programming to ensure that assumptions are explicit.
Since those tasks have to be done at some point in the product lifecycle, it's simply a shift of the work. Whether it's more or less efficient is a debate for another time.
What you refer to I would not call testing. Having strong TDD means that the testing phase does not have to be relied upon as heavily for errors, which get caught long before they reach a test build (as they are by experienced programmers with a good spec and responsive stakeholders in a non-TDD environment).
If you think the upfront tests (a runnable spec) are a serious problem, I guess it comes down to how much the relative stages of development are expected to cost in time and money.
I think I understand. Above the developer-test level, you have the customer test level, and it sounds like, at that level, you are finding a lot of bugs.
For every bug you find, you have to stop, take your testing hat off, put your reproduction hat on, and figure out a precise reproduction strategy. Then you have to document the bug, perhaps put it in a bug-tracking system. Then you have to put the testing hat back on. In the meantime, you've lost whatever setup you were working on and lost track of where you were in whatever test plan you were following.
Now - if that didn't have to happen - if you had far fewer bugs - you could zip right along through testing, right?
It's doubtful that GUI-driving test automation will help with this problem. You'll spend a great amount of time recording and maintaining the tests, and those regression tests will take a fair amount of time to return the investment. Initially, you'll go much slower with GUI-Driving customer-facing tests.
So, I submit, what might really help is higher /initial/ code quality coming out of development activities. Micro-tests -- also called developer tests, or test-driven development in the original sense -- might really help with that. Another thing that can help is pair programming.
Assuming you can't grab someone else to pair, I'd spend an hour looking at your bug tracking system. I would look at the past 100 defects and try to categorize them into root causes. "Training issue" is not a cause, but "off by one error" might be.
Once you have them categorized and counted, put them in a spreadsheet and sort. Whatever root cause occurs most often is the one you prevent first. If you really want to get fancy, multiply each root cause's count by a number representing the pain it causes. (Example: if in those 100 bugs you have 30 typos on menus, which are easy to fix, and 10 hard-to-reproduce JavaScript errors, you may want to fix the JavaScript issue first.)
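The counting-and-weighting step is trivial to script rather than doing by hand in a spreadsheet; a sketch, using the numbers from the example above:

```python
from collections import Counter

def prioritize(root_causes, pain=None):
    """root_causes: list of category labels, one per bug.
    pain: optional dict category -> pain weight (default 1).
    Returns categories sorted by count * pain, biggest first."""
    pain = pain or {}
    counts = Counter(root_causes)
    scored = {cat: n * pain.get(cat, 1) for cat, n in counts.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# 30 easy menu typos vs. 10 painful, hard-to-reproduce JS bugs
# (the pain weights are made up -- pick numbers that fit your team):
bugs = ["menu typo"] * 30 + ["js error"] * 10
print(prioritize(bugs, pain={"menu typo": 1, "js error": 5}))
# "js error" scores 50 and outranks "menu typo" at 30.
```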
This assumes you can apply some magical 'fix' to each of those root causes, but it's worth a shot. For example: transparent icons break in IE6 because IE6 cannot easily process .png files. So add a version-control trigger that rejects .pngs on check-in, and the issue is fixed.
I hope that helps.
The Subversion team has developed some pretty good test routines, by automating the whole process.
I've begun using this process myself, for example by writing tests before implementing the new features. It works very well, and generates consistent testing through the whole programming process.
SQLite also has a decent test system, with some very good documentation about how it's done.
In my experience with test-driven development, the time saving comes well after you have written out the tests, or at least after you have written the base test cases. The key thing here is that you actually have to write out your automated tests. The way you phrased your question leads me to believe you weren't actually writing automated tests. Once you have your tests written, you can easily go back later and update them to cover bugs they didn't previously find (for better regression testing), and you can relatively quickly refactor your code with the peace of mind that it will still work as expected after you have substantially changed it.
You wrote:
"Thanks for the answers above here, initially my question was how to reduce the time on general testing, but now, the problem is down to how to write the efficient automate test code."
One method that has been shown in multiple empirical studies to work extremely well for maximizing testing efficiency is combinatorial testing. In this approach, a tester identifies WHAT KINDS of things should be tested (and inputs them into a simple tool), and the tool identifies HOW to test the application. Specifically, the tool generates test cases that specify which combinations of test conditions should be executed in which test script, and the order in which each test script should be executed.
In the August 2009 IEEE Computer article I co-wrote with Dr. Rick Kuhn, Dr. Raghu Kacker, and Dr. Jeff Lei, for example, we highlight a 10-project study I led in which one group of testers used their standard test design methods, and a second group of testers, testing the same application, used a combinatorial test case generator to identify test cases for them. The teams using the combinatorial test case generator found, on average, more than twice as many defects per tester hour. That is strong evidence of efficiency. In addition, the combinatorial testers found 13% more defects overall. That is strong evidence of quality/thoroughness.
Those results are not unusual. Additional information about this approach can be found at http://www.combinatorialtesting.com/clear-introductions-1, along with our tool overview, which contains screenshots and an explanation of how the tool makes testing more efficient by identifying a subset of tests that maximizes coverage.
Also, a free version of our Hexawise test case generator can be found at www.hexawise.com/users/new
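For readers who want to see the idea behind such tools, here is a minimal greedy sketch of pairwise (2-way) combinatorial test generation. It is an illustration of the concept only, not how Hexawise or any other tool works internally:

```python
from itertools import combinations, product

def pairwise(params):
    """params: dict mapping parameter name -> list of values.
    Greedily builds a small test set covering every pair of values
    across every pair of parameters (2-way coverage)."""
    names = list(params)
    # every (param index, value, param index, value) combination to cover
    uncovered = set()
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        for va in params[a]:
            for vb in params[b]:
                uncovered.add((i, va, j, vb))
    tests = []
    while uncovered:
        best, best_cov = None, -1
        # scan the full cartesian product -- fine for small models
        for combo in product(*(params[n] for n in names)):
            cov = sum(1 for (i, va, j, vb) in uncovered
                      if combo[i] == va and combo[j] == vb)
            if cov > best_cov:
                best, best_cov = combo, cov
        tests.append(dict(zip(names, best)))
        uncovered = {(i, va, j, vb) for (i, va, j, vb) in uncovered
                     if not (best[i] == va and best[j] == vb)}
    return tests
```

For three two-valued parameters (say OS, browser, database), this covers all value pairs in roughly half the tests of the full 8-row cross product, and the savings grow rapidly as parameters are added.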
There is nothing inherently wrong with spending a lot of time testing if you are testing productively. Keep in mind, test-driven development means writing the (mostly automated) tests first (this can legitimately take a long time if you write a thorough test suite). Running the tests shouldn't take much time.
It sounds like your problem is you are not doing automatic testing. Using automated unit and integration tests can greatly reduce the amount of time you spend testing.
First, it's good that you recognise that you need help -- now go and find some :)
The idea is to use the tests to help you think about what the code should do, they're part of your design time.
You should also think about the total cost of ownership of the code. What is the cost of a bug making it through to production rather than being fixed first? If you're in a bank, are there serious implications about getting the numbers wrong? Sometimes, the right stuff just takes time.
One of the hardest things about any project of significant size is designing the underlying architecture and the API. All of this is exposed at the level of unit tests. If you're writing your tests first, then that aspect of design happens when you're coding your tests, rather than the program logic. This is compounded by the added effort of making code testable. Once you've got your tests, the program logic is usually quite obvious.
That being said, there seem to be some interesting automatic test builders on the horizon.
I'm looking to unit testing as a means of regression testing on a project.
However, my issue is that the project is basically a glorified DIR command -- it performs regular expression tests and MD5 filters on the results, and allows many criteria to be specified, but the entire thing is designed to process input from the system on which it runs.
I'm also a one-man development team, and I question the value of a test for code written by me which is written by me.
Is unit testing worthwhile in this situation? If so, how might such tests be accomplished?
EDIT: The MD5 and regex functions aren't provided by me -- they are provided by the Crypto++ library and Boost, respectively, so I don't gain much by testing them. Most of my code simply feeds data into those libraries and then prints out the results.
The value of test-after, the way you are asking, can indeed be limited in certain circumstances. But the way to unit test this, judging from the description, would be to isolate the regular expression tests and MD5 filters into one section of code, and abstract the feeding of the input so that in production the input comes from the system, while during a unit test your test class passes it in.
You then collect a sampling of the different scenarios you intend to support, and feed those in via different unit tests that exercise each scenario.
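A sketch of that isolation, with hypothetical names throughout: the directory listing is passed in as plain data, so a unit test can substitute canned entries for the real file system:

```python
import hashlib
import re

def filter_entries(entries, pattern, md5_blacklist):
    """entries: iterable of (name, content_bytes) pairs -- in production
    these would come from the file system; in tests they are canned data.
    Keeps names matching `pattern` whose content MD5 is not blacklisted."""
    regex = re.compile(pattern)
    kept = []
    for name, content in entries:
        if not regex.search(name):
            continue  # failed the regular-expression test
        digest = hashlib.md5(content).hexdigest()
        if digest in md5_blacklist:
            continue  # filtered out by MD5
        kept.append(name)
    return kept

# A unit test passes in fake entries instead of touching the disk:
fake = [("report.txt", b"hello"), ("image.png", b"x"), ("notes.txt", b"x")]
banned = {hashlib.md5(b"x").hexdigest()}
print(filter_entries(fake, r"\.txt$", banned))  # ['report.txt']
```

The same filtering logic then runs unchanged in production, fed by a real directory walk.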
I think the value of the unit test will come through if you have to change the code to handle new scenarios. You will be confident that the old scenarios don't break as you make changes.
Is unit testing worthwhile in this situation?
Not necessarily: especially for a one-man team I think it may be sufficient to have automated testing of something larger than a "unit" ... further details at "Should one test internal implementation, or only test public behaviour?"
Unit testing can still provide value in a one-man show. It gives you confidence in the functionality and correctness (at some level) of the module. But some design considerations may be needed to help make testing more applicable to your code. Modularization makes a big difference, especially if combined with some kind of dependency injection, instead of tight coupling. This allows test versions of collaborators to be used for testing a module in isolation. In your case, a mock file system object could return a predictable set of data, so your filtering and criteria code can be evaluated.
The value of regression testing is often not realized until it's automated. Once that's done, things become a lot easier.
That means you have to be able to start from a known position (if you're generating MD5s on files, you have to start with the same files each time). Then get one successful run where you can save the output - that's the baseline.
From that point on, regression testing is simply a push-button job. Start your test, collect the output and compare it to your known baseline (of course, if the output ever changes, you'll need to check it manually, or with another independent script before saving it as the new baseline).
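That push-button comparison can be a few lines of script. A sketch, where the tool's command line and the baseline location are placeholders for your own:

```python
import subprocess
from pathlib import Path

def regression_check(command, baseline_path):
    """Run the tool and compare its output to a saved baseline.
    First run: save the output as the baseline.
    Later runs: diff the output against that baseline."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    baseline = Path(baseline_path)
    if not baseline.exists():
        baseline.write_text(result.stdout)
        return "baseline saved"
    if result.stdout == baseline.read_text():
        return "pass"
    return "REGRESSION: output differs from baseline"
```

As noted above, whenever the output legitimately changes, a human has to review the diff and overwrite the baseline file deliberately.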
Keep in mind the idea of regression testing is to catch any bugs introduced by new code (i.e., regressing the software). It's not to test the functionality of that new code.
The more you can automate this, the better, even as a one-man development team.
When you were writing the code, did you test it as you went? Those tests could be written into an automated script, so that when you have to make a change in 2 months time, you can re-run all those tests to ensure you haven't regressed something.
In my experience the chance of regression increases sharply depending on how much time goes by after the time you finish version 1 and start coding version 2, because you'll typically forget the subtle nuances of how it works under different conditions - your unit tests are a way of encoding those nuances.
An integration test against the filesystem would be worthwhile. Just make sure it does what it needs to do.
Is unit testing valuable in a one-man shop scenario? Absolutely! There's nothing better than being able to refactor your code with absolute confidence you haven't broken anything.
How do I unit test this? Mock the system calls, and then test the rest of your logic.
I question the value of a test for code written by me which is written by me
Well, that's true now, but in a year it will be you, the one-year-more-experienced developer, developing against software written by you-now, the less experienced and knowledgeable developer (by comparison). Won't you want the code written by that less experienced guy (you a year ago) to be properly tested, so you can make changes with confidence that nothing has broken?