Unit testing a very specific function - unit-testing

To make sure this question is concrete enough for the standards in the FAQ, I am just asking the following: What are some sources that discuss the most common ways to apply unit testing to a very specific function, generally a function that relies on vendor data or other very specific data such that synthetic data is unhelpful in the test? If you're interested in more background, read below.
Background:
I write unit tests often in my daily code development, but I also try to make my code as abstract and reusable as possible. In a new project that I've joined, there are many cases where the code consists of very specific functions that are meant to accept very specifically formatted input data and store output data to database tables. Much of the input data consists of vendor data or other in-house data, and is accessed through calls to both vendor and in-house APIs.
The only idea I have so far is to test the kinds of failures hit upon when input data is poorly formatted. I will definitely write this test, but it's pretty useless for our team as far as tests go. Much more useful tests ought to check that the logic of these data manipulations is correct, which involves checking the accuracy of the output data based on the input data.
Unfortunately, I don't have any benchmark data sets where I definitively know what the output should be. Others have suggested creating my own synthetic input data (like a matrix of all 1's, or something contrived where I can predict what the output should be). Unfortunately, the operations performed by the function are very non-linear (binning things by weighted percentiles and getting aggregate statistics over each percentile grouping). Any test based on totally contrived synthetic input data won't be very useful for us either, and the time cost of formatting it, writing it to some synthetic output database table, and reading it back to check in the unit test pretty much makes such a test worthless.
I know that unit tests should test for just one behavior. I'm just not sure how to break apart a function that does something like aggregating complicated statistics across weighted percentile groupings and boil that down to "just one thing" to test.
What are some standards used in this setting?

I've run into similar issues with very large methods. My advice would be to refactor the code utilizing Dependency Injection and adhering to the Single Responsibility Principle. Then test each class according to its responsibility.
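To make that concrete, here is a minimal sketch (in Python, purely for illustration; the function names, the percentile convention, and the FakeWriter are all invented) of how the weighted-percentile binning from the question might be separated from its I/O so that the logic alone can be unit tested with a tiny, hand-checkable input:

```python
import unittest

def bin_by_weighted_percentile(values, weights, n_bins):
    """Assign each value to a percentile bin based on cumulative weight."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = float(sum(weights))
    bins = [0] * len(values)
    cum = 0.0
    for i in order:
        below = cum                      # weight strictly below this value
        cum += weights[i]
        bins[i] = min(int(below / total * n_bins), n_bins - 1)
    return bins

def summarize_bins(values, weights, n_bins, writer):
    """Pure logic decides the bins; the injected writer handles persistence."""
    bins = bin_by_weighted_percentile(values, weights, n_bins)
    for b in range(n_bins):
        members = [v for v, g in zip(values, bins) if g == b]
        writer.save(bin_id=b, count=len(members), mean=sum(members) / len(members))

class FakeWriter:
    """Stands in for the real database writer in tests."""
    def __init__(self):
        self.rows = []
    def save(self, **row):
        self.rows.append(row)

class BinningTest(unittest.TestCase):
    def test_equal_weights_split_values_in_half(self):
        bins = bin_by_weighted_percentile([10, 20, 30, 40], [1, 1, 1, 1], 2)
        self.assertEqual(bins, [0, 0, 1, 1])

    def test_one_summary_row_written_per_bin(self):
        writer = FakeWriter()
        summarize_bins([10, 20, 30, 40], [1, 1, 1, 1], 2, writer)
        self.assertEqual([r["mean"] for r in writer.rows], [15.0, 35.0])

if __name__ == "__main__":
    unittest.main()
```

The point is not the particular binning rule, but that once the vendor-API reads and database writes are injected, the non-linear aggregation logic can be pinned down with inputs small enough to verify by hand.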

Related

Test driven development for signal processing libraries

I work with audio manipulation, generally using Matlab for prototyping, and C++ for implementation. Recently, I have been reading up on TDD. I have looked over a few basic examples and am quite enthusiastic about the paradigm.
At the moment, I use what I would consider a global 'test-assisted' approach. For this, I write signal processing blocks in C++, and then I make a simple Matlab mex file that can interface with my classes. I subsequently add functionality, checking that the results match up with an equivalent Matlab script as I go. This works OK, but the tests become obsolete quickly as the system evolves. Furthermore, I am testing the whole system, not just units.
It would be nice to use an established TDD framework where I can have a test suite, but I don't see how I can validate the functionality of the processing blocks without tests that are just as complex as the code under test. How would I generate the reference signals in a C++ test to validate a processing block without the test being a form of self-fulfilling prophecy?
If anyone has experience in this area, or can suggest some methodologies that I could read into, then that would be great.
I think it's great to apply the TDD approach to signal processing (it would have saved me months of time if I had known about it years ago when I was doing signal processing myself). I think the key is to break down your system into the lowest-level components that can be independently tested, e.g.:
FFTs: test signals at known frequencies: DC, Fs/Nfft, Fs/2 and different phases etc. Check the peaks and phase are as you expect, check the normalisation constant is as you expect
peak picking: test that you correctly find maxima/minima
Filters: generate input at known frequencies and check the output amplitude and phase is as expected.
You are unlikely to get exactly the same results out of C++ and Matlab, so you'll have to supply error bounds on some of the tests. TDD is not only a great way of verifying the correctness of the code you have, but is also really useful when trying out different implementations. For example, if you want to replace one FFT implementation with another, there are often slight differences in the way the data is packed, or in the normalisation constant that is used. TDD will give you a high degree of confidence that the new library is correctly integrated.
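As a rough illustration of those FFT checks, here is what they might look like in Python (using numpy's FFT as a stand-in for whatever implementation is under test; the bin indices and tolerances are arbitrary):

```python
import numpy as np

def test_dc_signal_has_all_energy_in_bin_zero():
    n = 1024
    spectrum = np.fft.fft(np.ones(n))
    assert abs(spectrum[0] - n) < 1e-6          # DC bin holds the full sum
    assert np.all(np.abs(spectrum[1:]) < 1e-6)  # every other bin is ~zero

def test_single_tone_peaks_in_expected_bin_with_expected_phase():
    n, k = 1024, 5                              # tone exactly on bin k
    t = np.arange(n)
    signal = np.cos(2 * np.pi * k * t / n + 0.3)
    spectrum = np.fft.fft(signal)
    peak_bin = int(np.argmax(np.abs(spectrum[: n // 2])))
    assert peak_bin == k
    assert abs(np.abs(spectrum[k]) - n / 2) < 1e-6   # check the normalisation
    assert abs(np.angle(spectrum[k]) - 0.3) < 1e-6   # check the phase
```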
I do something similar for heuristics detection, and we have loads and loads of capture files and a framework to be able to load and inject them for testing. Could you capture the reference signals in a file and do the same?
As for my 2 cents regarding TDD, it's a great way to develop, but as with most paradigms, you don't always have to follow it to the letter; there are times when you should know how to bend the rules a bit, so as not to write too much throw-away code/tests. I read about one approach that said absolutely no code should be written until a test is developed, which at times can be way too strict.
On the other hand, I always like to say: "If it's not tested, it's broken" :)
It's OK for the test to be as complex as, or more complex than, the code under development. If you change (update, refactor, bug-fix) the code and not the test, the unit test will warn you that something changed and needs to be reviewed (was a bug fix for mode A supposed to change mode B? etc.).
Furthermore, you can maintain the APIs for the individual compute components, and not just for the entire end-to-end system.
I've only just started thinking about TDD in the context of signal processing, so I can only add a bit to the previous answers. What I've done is exploit a bit of superposition to test primitives. For example, testing an IIR filter, I independently verified the b0, b1, and b2 elements with unit and scaled gains, and then verified the a1 and a2 elements that followed easily modeled decays. My test signal was a combination of ramp functions for the numerator and impulse functions for the denominator. I know it's a trivial example, but the process should work for plenty of linear operations. Tests should also exercise unstable regions and show that outputs explode appropriately.
In general, I expect that impulse responses are going to do a lot of the work for me, since many situations will see them reduce to trigonometric functions, which can be independently calculated. Similarly, if your operation has a series expansion, your test function could perform the expansion to a relevant order and compare against your processing block. It'll be slow, but it should work.
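A small sketch of the impulse-response idea, using scipy's lfilter as a stand-in for the block under test and a first-order section whose decay p**n is known in closed form (the coefficients and bounds here are illustrative):

```python
import numpy as np
from scipy.signal import lfilter

def test_first_order_iir_impulse_response_matches_analytic_decay():
    p = 0.9                                   # pole inside the unit circle
    b, a = [1.0], [1.0, -p]                   # y[n] = x[n] + p * y[n-1]
    impulse = np.zeros(64)
    impulse[0] = 1.0
    measured = lfilter(b, a, impulse)
    expected = p ** np.arange(64)             # known closed-form response
    assert np.allclose(measured, expected, atol=1e-12)

def test_unstable_pole_makes_output_grow():
    b, a = [1.0], [1.0, -1.1]                 # pole outside the unit circle
    impulse = np.zeros(64)
    impulse[0] = 1.0
    measured = lfilter(b, a, impulse)
    assert abs(measured[-1]) > abs(measured[0])   # output should explode
```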

Best practice approach for automated testing

This is a very strange request for advice for which I truly feel there is no real answer. In my project I have archiving routines on various objects that have been consumed for logical calculations; I archive these items for the sake of an audit trail and to check up on calculation errors or prove correctness at a later stage. I am working with Entity Framework, and things are perhaps slightly different from your own project.
I consume the original object, modify it directly, create a clone of the modified item, revert the original item from store and save changes accordingly. An object is not reverted to original if never consumed by a calculation, in these instances, I save directly over that object along with the various relationships that exist with further objects.
This may sound long winded, but I assure you - it seems the easiest so far in terms of my workings with EF in my situation.
My trouble with these archiving routines is that, over time, as I introduce further functionality, I sometimes unknowingly break critical code to the point where I have to regression-test the entire solution, from beginning to end, to ensure that the archiving requirements remain intact.
Is there any unit test approach or automated methodology for testing these sorts of requirements? It would speed up deployment of packages, cutting down on my own manual testing.
Any advice or links to similar situations appreciated.
I think there are two pieces to this problem you are describing:
First you need some unit tests that you can build which will represent technical requirements of the system. Think of the unit tests as the rules which you have set up to technically accomplish the goal that the end user desires. In this way, I would craft unit tests that you can feel confident will break if a technical assumption you had made about the system fails because of a code change. Remember to keep the unit tests at the unit level so that you don't have a large amount of dependencies interacting to fail a test. A unit test should test exactly one thing. If you do this, when you make code changes you can run all your unit tests and immediately know what assumptions you had made about the system which are now not being met.
I would also set up some sort of automated functional integration tests. I think in your problem domain it would make sense to set up integration tests which are similar to unit tests (you can use the same tool). Here you will want to take bigger pieces of functionality, perhaps the pipelines that data flows through the system, and test that the correct series of transformations occurs on the data.
One best practice is to make sure the tests can be run in any order. You could separate the produce routines from the archive routines, perhaps by using "gold" data on the archive routine.
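As a loose sketch of the "gold" data idea (shown in Python for brevity, although the question uses Entity Framework; the archive routine and the recorded values are invented), the archive output for a fixed, known input is compared against a result that was approved once by hand:

```python
def archive(record):
    """Hypothetical stand-in for the real archiving routine."""
    return {"id": record["id"], "snapshot": dict(record), "version": 1}

def test_archive_output_matches_gold_data():
    record = {"id": 42, "amount": 100.0}          # fixed, known input
    gold = {                                      # "gold" output, approved once by hand
        "id": 42,
        "snapshot": {"id": 42, "amount": 100.0},
        "version": 1,
    }
    assert archive(record) == gold                # any change to archiving trips this
```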
The number one best practice for unit tests is just do it! Beyond that, I'd like to recommend xUnit Test Patterns: Refactoring Test Code by Gerard Meszaros.

Is it possible to use TDD with image processing algorithms?

Recently, I have worked on a project where TDD (Test Driven Development) was used. The project was a web application developed in Java and, although unit-testing web applications may not be trivial, it was possible using mocking (we used the Mockito framework).
Now I will start a project where I will use C++ to work with image processing (mostly image segmentation) and I'm not sure whether using TDD is a good idea. The problem is that it is very hard to tell whether the result of a segmentation is right or not, and the same problem applies to many other image processing algorithms.
So, what I would like to know is if someone here has successfully used TDD with image processing algorithms (not necessarily segmentation algorithms).
At a minimum you can use the tests for regression testing. For example, suppose you have 5 test images for a particular segmentation algorithm. You run the 5 images through the code and manually verify the results. The results, when correct, are stored on disk somewhere, and future executions of these tests compare the generated results to the stored results.
That way, if you ever make a breaking change, you'll catch it, but more importantly you only have to go through a (correct) manual test cycle once.
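A minimal sketch of that regression loop in Python, assuming the approved results can be stored as arrays on disk; the segment function is a placeholder for the real algorithm, and the random images stand in for the real test images:

```python
import numpy as np
from pathlib import Path

def segment(image):
    """Placeholder for the segmentation algorithm under test."""
    return (image > image.mean()).astype(np.uint8)

def test_segmentation_still_matches_the_approved_results():
    rng = np.random.default_rng(seed=1)          # stand-in for the real test images
    for name in ["img_a", "img_b"]:
        image = rng.random((32, 32))
        result = segment(image)
        approved = Path("approved") / f"{name}.npy"
        if not approved.exists():                # stored once, after manual inspection
            approved.parent.mkdir(exist_ok=True)
            np.save(approved, result)
        assert np.array_equal(result, np.load(approved)), f"{name} output changed"
```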
Whenever I do any computer-vision related development, TDD is almost standard practice. You have images and something you want to measure. Step one is to hand-label a (large) subset of the images. This gives you test data. The process (for full correctness) is then to divide your test set in two, a "development set" and a "verification set". You do repeated development cycles until your algorithm is accurate enough when applied to the development set. Then you verify the result on the verification set (so that you're not overtraining on some weird aspect of your development set).
This is test driven development at its purest.
Note that you're testing two different things when developing heavily algorithm dependent software like this.
1. The regular bugs you'll get in your software. These can be tested using "normal" TDD techniques.
2. The performance of your algorithm, for which you need a system like the one outlined above.
A program can be bug-free according to (1) but not quite according to (2). For example, a very simple image segmentation algorithm says: "the left half of the image is one segment, the right half is another segment." This program can be made bug-free according to (1) quite easily. It is another matter entirely whether it satisfies your performance needs. Don't confuse the two aspects, and don't let one interfere with the other.
More specifically, I'd advise you to develop the algorithm first, warts and all, and then use TDD with the algorithm (not the code!) and perhaps other requirements of the software as the specification for a separate TDD process. Doing unit tests for small temporary helper functions deep within some reasonably complex algorithm under heavy development is a waste of time and effort.
TDD in image processing only makes sense for deterministic problems like:
image arithmetic
histogram generation
and so on..
However TDD is not suitable for feature extraction algorithms like:
edge detection
segmentation
corner detection
... since no algorithm can solve these kinds of problems for all images perfectly.
I think the best you can do is test the simple, mathematically well-defined building blocks your algorithm consists of, like linear filters, morphological operations, FFT, wavelet transforms etc. These are often tricky enough to implement efficiently and correctly for all border cases so verifying them does make sense.
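For example, a tiny building block such as a 3x3 binary erosion can be pinned down with hand-computed expectations, including the border behaviour. This is a sketch with a naive reference implementation written for the test, not anyone's production code:

```python
import numpy as np

def erode3x3(mask):
    """Naive binary erosion with a 3x3 structuring element, zero-padded borders."""
    padded = np.pad(mask.astype(bool), 1, constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= padded[1 + dy : 1 + dy + mask.shape[0],
                          1 + dx : 1 + dx + mask.shape[1]]
    return out

def test_single_pixel_is_removed():
    mask = np.zeros((5, 5), dtype=bool)
    mask[2, 2] = True
    assert not erode3x3(mask).any()

def test_solid_block_shrinks_by_one_pixel_on_each_side():
    mask = np.zeros((7, 7), dtype=bool)
    mask[1:6, 1:6] = True                      # 5x5 block of foreground
    expected = np.zeros((7, 7), dtype=bool)
    expected[2:5, 2:5] = True                  # eroded to a 3x3 block
    assert np.array_equal(erode3x3(mask), expected)

def test_foreground_touching_the_border_is_eroded_at_the_border():
    mask = np.ones((4, 4), dtype=bool)         # everything is foreground
    result = erode3x3(mask)
    assert not result[0].any() and not result[-1].any()   # zero padding eats the edges
    assert result[1:3, 1:3].all()
```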
For an actual algorithm like image segmentation, TDD doesn't make much sense IMHO. I don't even think unit-tests make sense here. Sure, you can write tests, but those will always be extremely fragile. A typical image processing algorithm needs a few parameters that have to be adjusted for the desired results (a process that can't be automated, and can't be done before the algorithm is working). The results of a segmentation algorithm aren't well defined either, but your unit test can only test for some well-defined property. An algorithm can have that property without doing what you want, or the other way round, so your test result isn't very informative. Also, to test the results of a segmentation algorithm you need to write a lot of pretty hard code, while verifying the results visually is pretty easy and you have to do it anyway.
I think in a way it's similar to unit-testing user interfaces: Testing the actual well-defined functionality (e.g. when the user clicks this button, some item is added to this list and this label shows that text...) is relatively easy and can save a lot of work and debugging. But no test in the world will tell you if your UI is usable, understandable or pretty, because these things just aren't well defined.
We had some discussion on this very same "problem", with many remarks similar to the ones in the comments below the other answers here.
We came to the conclusion that TDD in computer vision / image processing (concerning the global goal of segmentation, detection or something like that) could be:
1. Get an image/sequence that should be processed and create a test for that image: desired output and a metric to tell how far your result may differ from that "ground truth".
2. Get another image/sequence for a different setting (different lighting, different objects or something like that), where your algorithm fails, and write a test for that.
3. Improve your algorithm in a way that it solves all previous tests.
4. Go back to 2.
No idea whether this is applicable; creating the tests will be much more complex than in traditional TDD, since it might be hard to define the allowed differences between your ground truth and your algorithm's output.
Probably it's better to just use some QualityDrivenDevelopment where your changes just shouldn't make things "worse" than before (you again have to find a metric for that).
Obviously you can still use traditional unit testing for the deterministic parts of those algorithms, but that's not the real problem of "TDD-in-signal-processing".
The image processing tests that you describe in your question take place at a much higher level than most of the tests that you will write using TDD.
In a true Test Driven Development process you will first write a failing test before adding any new functionality to your software, then write the code that causes the test to pass, rinse and repeat.
This process yields a large library of Unit Tests, sometimes with more LOC of tests than functional code!
Because your analytic algorithms have structured behavior, they would be an excellent match for a TDD approach.
But I think the question you are really asking is "how do I go about executing a suite of Integration Tests against fuzzy image processing software?" You might think I am splitting hairs, but this distinction between Unit Tests and Integration Tests really gets to the heart of what Test Driven Development means. The benefits of the TDD process come from the rich supporting fabric of Unit Tests more than anything else.
In your case I would compare the Integration Test suite to automated performance metrics against a web application. We want to accumulate a historical record of execution times, but we probably don't want to explicitly fail the build for a single poorly performing execution (which might have been affected by network congestion, disk I/O, whatever). You might set some loose tolerances around performance of your test suite and have the Continuous Integration server kick out daily reports that give you a high level overview of the performance of your algorithm.
I'd say TDD is much easier in such an application than in a web one. You have a completely deterministic algorithm you have to test. You don't have to worry about fuzzy stuff like user input and HTML rendering.
Your algorithm consists of a number of steps. Each of these steps can be tested. If you give them fixed, known input, they should yield fixed, known output. So write a test for that. You can't test that the algorithm "is correct" in general, but you can give it data for which you've already precomputed the correct result, so you can verify that it yields the correct output in that case.
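For instance, a single deterministic step such as a grayscale conversion can be tested against hand-computed pixel values (the weights and function name here are just an illustration):

```python
import numpy as np

def to_grayscale(rgb):
    """Luma-style weighted average over the last (channel) axis."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

def test_known_pixels_convert_to_known_gray_values():
    rgb = np.array([[[255.0, 0.0, 0.0],       # pure red
                     [0.0, 0.0, 255.0]]])     # pure blue
    gray = to_grayscale(rgb)
    assert np.allclose(gray, [[255 * 0.299, 255 * 0.114]])
```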
I am not really into your problem, so I don't know its hot spots. However, the final result of your algorithm is hopefully deterministic, so you can perform functional testing on it. Of course, you will have to determine a "known good" result. I know of TDD performed on graphics libraries (VTK, to be precise). The comparison is done on the final result image, pixel by pixel. Without going into too much detail, if you have a known-good result, you can compute an md5 of the test result and compare it against the md5 of the known-good result.
For unit testing, I am pretty sure you can test individual routines. This will force you to have a very fine-grained development style.
Might want to take a look at this paper
If your goal is to optimize an algorithm rather than verify correctness, you need a metric. A good metric would measure the performance criteria underlying your algorithm. For a segmentation algorithm this could be the sum of standard deviations of pixel data within each segment. Using the metric you can apply threshold levels of acceptance or rank versions of the algorithm.
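A rough sketch of such a metric-based test, with a placeholder segmentation, synthetic data, and a purely illustrative acceptance threshold:

```python
import numpy as np

def segment(image):
    """Placeholder for the real segmentation; labels pixels above/below the mean."""
    return (image > image.mean()).astype(int)

def segmentation_score(image, labels):
    """Lower is better: total within-segment standard deviation."""
    return sum(image[labels == lab].std() for lab in np.unique(labels))

def test_score_stays_within_the_accepted_threshold():
    rng = np.random.default_rng(0)
    image = np.concatenate([rng.normal(10, 1, 500), rng.normal(50, 1, 500)])
    labels = segment(image)
    assert segmentation_score(image, labels) <= 5.0   # illustrative acceptance level
```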
You can use a statistical approach where you have many examples and correct outcomes, and the test runs all of them and evaluates the algorithm on them. It then produces a single number that is the combined success rate of all of them.
This way you are less sensitive to specific failures and your test is more robust.
You can then use a threshold on the success rate to see if the test failed or not.
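A minimal sketch of that idea: score every labelled example, aggregate, and assert only on the overall success rate (the example data, tolerance, and placeholder algorithm are invented):

```python
def close_enough(predicted, expected, tolerance=0.05):
    return abs(predicted - expected) <= tolerance * max(abs(expected), 1.0)

def test_overall_success_rate_meets_threshold():
    # (input, expected output) pairs; in practice these come from labelled data
    examples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.5)]

    def algorithm(x):                             # placeholder for the real code
        return 2.0 * x

    hits = sum(close_enough(algorithm(x), y) for x, y in examples)
    success_rate = hits / len(examples)
    assert success_rate >= 0.8                    # tolerate some individual misses
```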

How does unit testing work when the program doesn't lend itself to a functional style?

I'm thinking of the case where the program doesn't really compute anything, it just DOES a lot. Unit testing makes sense to me when you're writing functions which calculate something and you need to check the result, but what if you aren't calculating anything? For example, a program I maintain at work relies on having the user fill out a form, then opening an external program, and automating the external program to do something based on the user input. The process is fairly involved. There's like 3000 lines of code (spread out across multiple functions), but I can't think of a single thing which it makes sense to unit test.
That's just an example though. Should you even try to unit test "procedural" programs?
Based on your description these are the places I would look to unit test:
Does the form validation of user input work correctly?
Given valid input from the form, is the external program called correctly?
Feed in user input to the external program and see if you get the right output.
From the sounds of your description, the real problem is that the code you're working with is not modular. One of the benefits I find with unit testing is that code that is difficult to test is either not modular enough or has an awkward interface. Try to break the code down into smaller pieces and you'll find places where it makes sense to write unit tests.
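A hedged sketch of the first two points, assuming the launch of the external program is wrapped in a call that a test can substitute; all of the names here are made up:

```python
import subprocess
from unittest import mock

def validate_form(form):
    return bool(form.get("name")) and form.get("age", -1) >= 0

def process_form(form, run=subprocess.run):
    if not validate_form(form):
        raise ValueError("invalid form input")
    # hand the user input to the external program via its command line
    return run(["external_tool", "--name", form["name"], "--age", str(form["age"])])

def test_invalid_input_is_rejected_before_anything_is_launched():
    fake_run = mock.Mock()
    try:
        process_form({"name": ""}, run=fake_run)
        assert False, "expected ValueError"
    except ValueError:
        pass
    fake_run.assert_not_called()

def test_valid_input_launches_the_external_program_with_expected_arguments():
    fake_run = mock.Mock()
    process_form({"name": "ada", "age": 36}, run=fake_run)
    fake_run.assert_called_once_with(["external_tool", "--name", "ada", "--age", "36"])
```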
I'm not an expert on this but have been confused for a while for the same reason. Somehow the applications I'm writing just don't fit the examples given for unit testing (very asynchronous and random, depending on heavy user interaction).
I realized recently (and please let me know if I'm wrong) that it doesn't make sense to write a sort of global test but rather a myriad of small tests for each component. The easiest approach is to build the tests at the same time as, or even before, creating the actual procedures.
Do you have 3000 lines of code in a single procedure/method? If so, then you probably need to refactor your code into smaller, more understandable pieces to make it maintainable. When you do this, you'll have those parts that you can and should unit test. If not, then you already have those pieces -- the individual procedures/methods that are called by your main program.
Even without unit tests, though, you should still write tests for the code to make sure that you are providing the correct inputs to the external program and testing that you handle the outputs from the program correctly under both normal and exceptional conditions. Techniques used in unit testing -- like mocking -- can be used in these integration tests to ensure that your program is operating correctly without involving the external resource.
An interesting "cut point" for your application is you say "the user fills out a form." If you want to test, you should refactor your code to construct an explicit representation of that form as a data structure. Then you can start collecting forms and testing that the system responds appropriately to each form.
It may be that the actions taken by your system are not observable until something hits the file system. Here are a couple of ideas:
Set up something like a git repository for the initial state of the file system, run a form, and look at the output of git diff. It's likely this is going to feel more like regression testing than unit testing.
Create a new module whose only purpose is to make your program's actions observable. This can be as simple as writing relevant text to a log file or as complex as you like. If necessary, you can use conditional compilation or linking to ensure this module does something only when the system is under test. This is closer to traditional unit testing as you can now write tests that say upon receiving form A, the system should take sequence of actions B. Obviously you have to decide what actions should be observed to form a reasonable test.
I suspect you'll find yourself migrating toward something that looks more like regression testing than unit testing per se. That's not necessarily bad. Don't overlook code coverage!
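A loose sketch of the second idea above, an action log that the program writes to and tests can inspect (the module layout and action names are invented):

```python
actions = []                                    # the "observability module"

def record(action, **details):
    actions.append((action, details))

def handle_form(form):
    record("validated", ok=bool(form.get("name")))
    if form.get("name"):
        record("launched_external_program", name=form["name"])

def test_form_a_produces_expected_action_sequence():
    actions.clear()
    handle_form({"name": "ada"})
    assert [a for a, _ in actions] == ["validated", "launched_external_program"]
```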
(A final parenthetical remark: in the bad old days of interactive console applications, Don Libes created a tool called Expect, which was enormously helpful in allowing you to script a program that interacted like a user. In my opinion we desperately need something similar for interacting with web pages. I think I'll post a question about this :-)
You don't necessarily have to implement automated tests that test individual methods or components. You could implement an automated unit test that simulates a user interacting with your application, and test that your application responds in the correct way.
I assume you are manually testing your application currently; if so, think about how you could automate that and work from there. Over time you should be able to break your tests into progressively smaller chunks that test smaller sections of code. Any sort of automated testing is usually a lot better than nothing.
Most programs (regardless of the language paradigm) can be broken into atomic units which take input and provide output. As the other responders have mentioned, look into refactoring the program and breaking it down into smaller pieces. When testing, focus less on the end-to-end functionality and more on the individual steps in which data is processed.
Also, a unit doesn't necessarily need to be an individual function (though this is often the case). A unit is a segment of functionality which can be tested using inputs and measuring outputs. I've seen this when using JUnit to test Java APIs. Individual methods might not necessarily provide the granularity I need for testing, though a series of method calls will. Therefore, the functionality I regard as a "unit" is a little greater than a single method.
You should at least refactor out the stuff that looks like it might be a problem and unit test that. But as a rule, a function shouldn't be that long. You might find something that is worth unit testing once you start refactoring.
Good object mentor article on TDD
As a few have answered before, there are a few ways you can test what you have outlined.
First, the form input can be tested in a few ways.
What happens if invalid data is entered? Valid data? And so on.
Then each of the functions can be tested to see whether, when supplied with various forms of correct and incorrect data, they react in the proper manner.
Next, you can mock the applications that are being called so that you can make sure your application sends data to, and processes data from, the external programs correctly. Don't forget to make sure your program deals with unexpected data from the external program as well.
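For that last point, a small sketch of checking that garbage from the external program is handled gracefully (the parsing function is an invented example):

```python
def parse_tool_output(raw):
    """Return the numeric result, or None when the tool's output is unusable."""
    try:
        return float(raw.strip().split()[-1])
    except (ValueError, IndexError):
        return None

def test_garbage_from_the_external_program_does_not_crash_the_caller():
    assert parse_tool_output("RESULT 12.5") == 12.5
    assert parse_tool_output("") is None
    assert parse_tool_output("ERROR: no licence available") is None
```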
Usually, the way I figure out how I want to write tests for a program I have been assigned to maintain is to see what I do manually to test the program, and then try to figure out how to automate as much of it as possible. Also, don't restrict your testing tools to just the programming language you are writing the code in.
I think a wave of testing paranoia is spreading :) It's good to examine things to see if tests would make sense; sometimes the answer is going to be no.
The only thing that I would test is making sure that bogus form input is handled correctly. I really don't see where else an automated test would help. I think you'd want the test to be non-invasive (i.e. no record is actually saved during testing), so that might rule out the other few possibilities.
If you can't test something how do you know that it works? A key to software design is that the code should be testable. That may make the actual writing of the software more difficult, but it pays off in easier maintenance later.

How do I break this down into Unit Tests?

I have a method that is called on an object to perform some business logic and add it to the database.
The object is a Transaction, and part of the business logic requires searching the databases for related accounts and history items on the account.
There are then a series of comparisons and operations that need to bring back information from the account and apply it to the transaction before the transaction is then passed on to other people and written to the database.
The only way I can think of for testing this currently is, within the test, to create an account and the relevant history information, then to construct a transaction for each different scenario and capture the information written to the DB for the transaction and the information being passed on; however, this feels like it's testing way too much in one test. Each scenario would be performed in a separate unit test, with the test construction refactored out into separate methods, but the actual piece of code targeted by the test is over 500 lines long.
I guess this question is more about refactoring than unit testing, but in this case they go hand in hand.
If anyone has any advice (good or bad) then I'd be glad to hear it.
EDIT:
Pseudo code:
Find account for transaction
Do validation on transaction codes and values
Update transaction with info from account
Get related history from account
Handle different transaction codes and values (6 different combinations, each with different logic)
Update the transaction again with new account info (resulting from business logic)
Send transaction to clients
I would appreciate some pseudocode on this question, but just following it over, I would:
Create interfaces for the data access objects that directly access the database - this way you can pass in an object that only pretends (e.g., mocks) that it accesses the database. This object would then return results consistent with what your database would return, without actually making any DB call. Your object could also simulate scenarios such as rolling back data to its original state. (See the sketch after this list.)
Extract each "scenario" into its own method - that is the essence of a unit. If your method is 500 lines long then there must be contiguous blocks in there that can be extracted. Write a unit test for each, if appropriate.
If your unit test is testing too much, that probably means your method is doing too much - you can extract methods by identifying the different things you are testing and then putting them in their own methods. Rinse and repeat until you only need one test for each method.
Transactions "passed on to other people" sound like a code smell - a transaction in and of itself should only be one contiguous unit. If you need different users to finish your transaction, you're doing too much; keep track of your data's state in the DB instead, in terms of flags or such, not in terms of a DB transaction.
Separating out units from existing legacy code can be extremely tricky and time consuming. Check out Working Effectively With Legacy Code for a variety of tried and tested techniques to make things more manageable.
This depends on what you would like to test. Would you like to test the database transaction? Would you like to test the business transaction, or something else? Try to use mocks for the things you do not want to test. With mocks you can concentrate on certain test objectives.
Yes, you -can- rewrite your current code so that it can be unit tested according to all guidelines and best practices.
However, that can be expensive, and you should estimate the cost and compare it against the earnings...
The earnings are that you might discover a problem with the code and also, if done right, a reduction of complexity as a result of the refactoring.
Both factors might save some time - in the future.
The cost is the time and effort you have to spend refactoring your code and writing the test cases, plus the extra time you might have to spend in the future maintaining the test cases and the mocking code - and that can be a significant cost.
You are comparing a known cost against a future risk, and I am sure a lot of smart guys know how to do that, but it's obvious that you can spend an infinite amount of time refactoring and mocking without ever reducing the risk of failure to zero (or even at all, if the code and problem are complex and you are messing things up when refactoring), so you need to find a balance here.
In this case, as the code is old, it might be OK to be "sloppy" or "pragmatic" and do black-box testing - or top-down testing - and just test the interface (or abstraction) without bothering to mock the database. And yes, you can argue that this is not a unit test but rather a system test or a functional test practice.
...But it might give the best value for your money - or your employer's/customer's money - or more time with your significant other (or at least more time to watch the Discovery Channel).
If you have old code, allow black-box testing, allow dependencies between the tests, and compile a sequence of tests that sets up the test data and manipulates it; that way it's at least tested automatically, even if not tested 100%.