How to unit test complex methods - unit-testing

I have a method that, given an angle for North and an angle for a bearing, returns a compass point value from 8 possible values (North, NorthEast, East, etc.). I want to create a unit test that gives decent coverage of this method, providing different values for North and Bearing to ensure I have adequate coverage to give me confidence that my method is working.
My original attempt generated all possible whole number values for North from -360 to 360 and tested each Bearing value from -360 to 360. However, my test code ended up being another implementation of the code I was testing. This left me wondering what the best test would be for this such that my test code isn't just going to contain the same errors that my production code might.
My current solution is to spend time writing an XML file with data points and expected results, which I can read in during the test and use to validate the method but this seems exceedingly time consuming. I don't want to write a file that contains the same range of values that my original test contained (that would be a lot of XML) but I do want to include enough to adequately test the method.
How do I test a method without just reimplementing the method?
How do I achieve adequate coverage to have confidence in the method I am testing without having to have test points for all possible inputs and results?
Obviously, don't dwell too much on my specific example as this applies to many situations where there are complex calculations and ranges of data to be tested.
NOTE: I am using Visual Studio and C#, but I believe this question is language-agnostic.

First off, you're right, you do not want your test code to reproduce the same calculation as the code under test. Secondly, your second approach is a step in the right direction. Your tests should contain a specific set of inputs with the pre-computed expected output values for those inputs.
Your XML file should contain just a subset of the input data that you've described. Your tests should ensure that you can handle the extreme ranges of your input domain (-360, 360), a few data points just inside the ends of the range, and a few data points in the middle. Your tests should also check that your code fails gracefully when given values outside the input range (e.g. -361 and +361).
Finally, in your specific case, you may want to have a few more edge cases to make sure that your function correctly handles "switchover" points within your valid input range. These would be the points in your input data where the output is expected to switch from "North" to "Northwest" and from "Northwest" to "West", etc. (don't run your code to find these points, compute them by hand).
Just concentrating on these edge cases and a few cases in between the edges should greatly reduce the amount of points you have to test.
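A minimal sketch of such a table-driven test, in Python for brevity (the same shape carries over to C#). Here `compass_point` is only a hypothetical stand-in for the method under test. Its convention (45-degree sectors centred on each compass point, with a ValueError for inputs outside -360..360) is an assumption, not the asker's actual requirement. The expected values were worked out by hand from that convention, not by running the code.

```python
import unittest

POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def compass_point(north, bearing):
    """Hypothetical method under test; angles are in degrees."""
    for angle in (north, bearing):
        if not -360 <= angle <= 360:
            raise ValueError("angle out of range: %r" % (angle,))
    relative = (bearing - north) % 360
    return POINTS[int(((relative + 22.5) % 360) // 45)]


# (north, bearing, expected), computed by hand.
CASES = [
    (0, 0, "N"),
    (0, 22, "N"), (0, 23, "NE"),      # switchover N -> NE at 22.5
    (0, 67, "NE"), (0, 68, "E"),      # switchover NE -> E at 67.5
    (0, 337, "NW"), (0, 338, "N"),    # switchover NW -> N at 337.5
    (0, 180, "S"),
    (90, 0, "W"),                     # a rotated north
    (-360, 360, "N"), (360, -360, "N"),  # extremes of the input range
]


class CompassPointTest(unittest.TestCase):
    def test_known_points(self):
        for north, bearing, expected in CASES:
            with self.subTest(north=north, bearing=bearing):
                self.assertEqual(expected, compass_point(north, bearing))

    def test_out_of_range_inputs_are_rejected(self):
        for north, bearing in [(-361, 0), (361, 0), (0, -361), (0, 361)]:
            with self.subTest(north=north, bearing=bearing):
                with self.assertRaises(ValueError):
                    compass_point(north, bearing)
```

Run it with `python -m unittest`. The table is short enough to review by eye, which is the point: the test holds answers, not a second copy of the calculation.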

You could possibly re-factor the method into parts that are easier to unit test and write the unit tests for the parts. Then the unit tests for the whole method only need to concentrate on integration issues.

I prefer to do the following.
Create a spreadsheet with right answers. However complex it needs to be is irrelevant. You just need some columns with the case and some columns with the expected results.
For your example, this can be big. But big is okay. You'll have an angle, a bearing and the resulting compass point value. You may have a bunch of intermediate results.
Create a small program that reads the spreadsheet and writes the simplified, bottom-line unittest cases. You want your cases stripped down to
def testCase215n( self ):
    self.fixture.setCourse( 215 )
    self.fixture.setBearing( 45 )
    self.fixture.calculate()
    self.assertEqual( "N", self.fixture.compass() )
[That's Python, the same idea would hold for C#.]
The spreadsheet contains the one-and-only authoritative list of right answers. You generate code from this once or twice. Unless, of course, you find an error in your spreadsheet version and have to fix that.
I use a small Python program with xlrd and the Mako template generator to do this. You could do something similar with C# products.

If you can think of a completely different implementation of your method, with completely different places for bugs to hide, you could test against that. I often do things like this when I've got an efficient, but complex implementation of something that could be implemented much more simply but inefficiently. For example, if writing a hash table implementation, I might implement a linear search-based associative array to test it against, and then test using lots of randomly generated input. The linear search AA is very hard to screw up and even harder to screw up such that it's wrong in the same way as the hash table. Therefore, if the hash table has the same observable behavior as the linear search AA, I'd be pretty confident it's correct.
Other examples would include writing a bubble sort to test a heap sort against, or using a known working sort function to find medians and comparing that to the results of an O(N) median finding algorithm implementation.
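As a sketch of the median example: the sort-based function is the simple oracle, and the quickselect is the "clever" code under test. Both function names are made up for illustration, and for even-length input both return the upper of the two middle elements (an arbitrary choice for this sketch).

```python
import random


def median_reference(values):
    """Simple oracle: sort, then take the middle element."""
    return sorted(values)[len(values) // 2]


def median_quickselect(values):
    """Implementation under test: expected O(N) selection of the same element."""
    items = list(values)
    k = len(items) // 2
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k < len(lows):
            items = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            items = [x for x in items if x > pivot]


def check_against_reference(trials=500, seed=0):
    """Compare both implementations on lots of randomly generated input."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(1, 30))]
        assert median_quickselect(data) == median_reference(data), data
```

A fixed seed keeps failures reproducible; printing the failing input (the second argument to `assert`) gives you a ready-made regression case.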

I believe that your solution is fine, despite using an XML file (I would have used a plain text file). But a more common tactic is to test just the limit situations, which in your case means input values of -360, 360, -361, 361 and 0.

You could try orthogonal array testing to achieve all-pairs coverage instead of all possible combinations. This is a statistical technique based on the theory that most bugs occur due to interactions between pairs of parameters. It can drastically reduce the number of test cases you write.

I'm not sure how complicated your code is. If it takes an integer in and divides it up into 8 or 16 directions on the compass, it is probably a few lines of code, yes?
You are going to have a hard time not re-writing your code to test it, depending how you test it. Ideally you want an independent party to write the test code based on the same requirements but without looking at or borrowing your code. This is unlikely to happen in most situations. In this case that may be overkill.
In this specific case I would feed it each number in order from -360 to +360, and print the number and the result (to a text file in a format that can be compiled into another program as a header file). Visually inspect that the direction changes at the desired input. This should be easy to visually inspect and validate. Now you have a table of inputs and valid outputs. Next have a program randomly select from the valid inputs feed it into your code under test and see that the right answer comes out. Do a few hundred of these random tests. At some point you need to validate that numbers less than -360 or greater than +360 are handled per your requirements, either clipping or modulating I assume.

So I took a software testing class, and basically what you want is to identify the classes of inputs: all real numbers? All integers? Only positive, only negative, etc.? Then group the output actions: is 360 uniquely different from 359, or do they pretty much end up doing the same thing to the app? Once you have those groups, test combinations of input classes against output classes.
This all seems abstract and vague but until you provide the method code it's difficult to come up with a perfect strategy.
Another way is to do branch-level testing or predicate coverage testing. Code coverage isn't foolproof, but not covering all your code seems irresponsible.

One approach, probably one to apply in combination with other methods of testing, is to see if you can write a function that reverses the method you are testing. In this case, it would take a compass direction (northeast, say) and output a bearing (given the bearing for north). Then you could test the method by applying it to a series of inputs, applying the reversing function to the results, and seeing if you get back the original input.
There are complications, particularly if one output corresponds to multiple inputs, but in those cases it may be possible to generate the set of inputs corresponding to a given output, and test that each member of the set (or a sample of its elements) maps to that output.
The advantage of this approach is that it doesn't rely on you being able to simulate the method manually, or create an alternative implementation of the method. If the reversal involves a different approach to the problem to that used in the original method, it should reduce the risk of making equivalent mistakes in both.
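A small sketch of that round trip. Both functions are hypothetical: `compass_point` stands in for the method under test (assumed to use 45-degree sectors centred on each point), and `bearings_for` is the reversal, written from the sector centres outwards instead of by dividing up the circle. Because one output corresponds to many inputs, the reversal returns the whole set of whole-degree bearings for a direction.

```python
POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def compass_point(north, bearing):
    """Hypothetical method under test; angles are in degrees."""
    relative = (bearing - north) % 360
    return POINTS[int(((relative + 22.5) % 360) // 45)]


def bearings_for(point, north):
    """Reversal: whole-degree bearings within 22.5 degrees of the point's centre."""
    centre = POINTS.index(point) * 45
    return [(north + centre + offset) % 360 for offset in range(-22, 23)]


def check_round_trip(north):
    seen = []
    for point in POINTS:
        for bearing in bearings_for(point, north):
            # Forward method applied to the reversed output must give the point back.
            assert compass_point(north, bearing) == point, (north, bearing)
            seen.append(bearing)
    # Every whole-degree bearing should belong to exactly one compass point.
    assert sorted(seen) == list(range(360))


if __name__ == "__main__":
    for north in range(-360, 361, 15):
        check_round_trip(north)
```

The second assertion is a cheap extra: it catches gaps or overlaps between sectors that the per-point check alone would miss.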

Pseudocode:
array colors = { red, orange, yellow, green, blue, brown, black, white }
for north = -360 to 361
    for bearing = -361 to 361
        theColor = colors[dirFunction(north, bearing)] // dirFunction is the one being tested
        setColor (theColor)
        drawLine (centerX, centerY,
                  centerX + (cos(north + bearing) * radius),
                  centerY + (sin(north + bearing) * radius))
    Verify Resulting Circle against rotated reference diagram.
When North = 0, you'll get an 8-colored pie chart. As north varies + or -, the pie chart will look the same, but rotated around that many degrees. Test verification is a simple matter of making sure that (a) the image is a properly rotated pie chart and (b) there aren't any green spots in the orange area, etc.
This technique, by the way, is a variation on the world's greatest debugging tool: Have the computer draw you a picture of what =IT= thinks it's doing. (Too often, developers waste hours chasing what they think the computer is doing, only to find that it's doing something completely different.)

Related

Distinguishing between terms of different domains

What I am trying to do:
I am trying to take a list of terms and distinguish which domain they are coming from. For example "intestine" would be from the anatomical domain while the term "cancer" would be from the disease domain. I am getting these terms from different ontologies such as DOID and FMA (they can be found at bioportal.bioontology.org)
The problem:
I am having a hard time figuring out the best way to implement this. Currently I am naively taking the terms from the DOID and FMA ontologies and removing from the DOID list (which contains terms that may be anatomical, such as colon carcinoma, colon being anatomical and carcinoma being disease) any term that is in the FMA list, which we know is anatomical.
Thoughts:
I was thinking that I could get root words, prefixes, and suffixes for the different term domains and try to match them to the terms in the list. Another idea is to take more information from the ontologies, such as metadata, and use it to distinguish between the terms.
Any ideas are welcome.
As a first run, you'll probably have the best luck with bigrams. As an initial hypothesis, diseases are usually noun phrases, and usually have a very English-specific structure where NP -> N N, like "liver cancer", which means roughly the same thing as "cancer of the liver." Doctors tend not to use the latter, while the former should be caught with bigrams quite well.
Use the two ontologies you have there as starting points to train some kind of bigram model. Like Rcynic suggested, you can count them up and derive probabilities. A Naive Bayes classifier would work nicely here. The features are the bigrams; classes are anatomy or disease. sklearn has Naive Bayes built in. The "naive" part means, in this case, that all your bigrams are independent of each other. This assumption is fundamentally false, but it works well in a lot of circumstances, so we pretend it's true.
This won't work perfectly. As it's your first pass, you should be prepared to probe the output to understand how it derived its answers and to find the cases it failed on. When you find trends in the errors, tweak your model and try again.
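To make the idea concrete, here is a tiny from-scratch Naive Bayes in plain Python. It is a sketch only: the training terms are a made-up toy list, not real DOID/FMA data, and I have added single words to the bigram features as an assumption, since many ontology terms are one word long. In practice you would more likely reach for sklearn's CountVectorizer (with its ngram_range parameter) and MultinomialNB.

```python
import math
from collections import Counter, defaultdict


def features(term):
    """Single words plus adjacent-word bigrams of a term."""
    words = term.lower().split()
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


class NaiveBayes:
    def fit(self, terms, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for term, label in zip(terms, labels):
            self.feature_counts[label].update(features(term))
        self.vocabulary = {f for c in self.feature_counts.values() for f in c}
        return self

    def log_score(self, term, label):
        counts = self.feature_counts[label]
        total = sum(counts.values()) + len(self.vocabulary)  # add-one smoothing
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for f in features(term):
            # "Naive": each feature contributes independently of the others.
            score += math.log((counts[f] + 1) / total)
        return score

    def predict(self, term):
        return max(self.class_counts, key=lambda label: self.log_score(term, label))


# Toy training data, invented for illustration only.
ANATOMY = ["liver", "colon", "small intestine", "left lung", "heart valve"]
DISEASE = ["liver cancer", "colon carcinoma", "lung cancer",
           "heart disease", "breast carcinoma"]

model = NaiveBayes().fit(ANATOMY + DISEASE,
                         ["anatomy"] * len(ANATOMY) + ["disease"] * len(DISEASE))

if __name__ == "__main__":
    for term in ["kidney carcinoma", "left kidney"]:
        print(term, "->", model.predict(term))
```

Calling `log_score` for each class on a misclassified term is a quick way to probe why the model picked the answer it did.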
I wouldn't recommend WordNet here. It wasn't written by doctors, and since what you're doing relies on precise medical terminology, it's probably going to add bizarre meanings. Consider, from nltk.corpus.wordnet:
>>> livers = reader.synsets("liver")
>>> pprint([l.definition() for l in livers])
[u'large and complicated reddish-brown glandular organ located in the upper right portion of the abdominal cavity; secretes bile and functions in metabolism of protein and carbohydrate and fat; synthesizes substances involved in the clotting of the blood; synthesizes vitamin A; detoxifies poisonous substances and breaks down worn-out erythrocytes',
u'liver of an animal used as meat',
u'a person who has a special life style',
u'someone who lives in a place',
u'having a reddish-brown color']
Only one of these is really of interest to you. As a null hypothesis, there's an 80% chance WordNet will add noise, not knowledge.
The naive approach: what precision and recall is it getting you? If you set up a test case now, you can track your progress as you apply more sophisticated methods.
I don't know what initial set you are dealing with, but one thing to try is to get your hands on annotated documents (maybe use Mechanical Turk). The documents need to be tagged with the domains you're looking for: anatomical or disease.
Then counting and dividing will tell you how likely a word you encounter is to belong to a domain. From there, the next step can be to tweak some weights.
Another approach (going in a whole other direction) is using WordNet. I don't know if it will be useful for exactly your purposes, but it's a massive ontology, so it might help.
Python has bindings to use Wordnet via nltk.
from nltk.corpus import wordnet as wn
wn.synsets('cancer')
gives output = [Synset('cancer.n.01'), Synset('cancer.n.02'), Synset('cancer.n.03'), Synset('cancer.n.04'), Synset('cancer.n.05')]
http://wordnetweb.princeton.edu/perl/webwn
Let us know how it works out.

How to normalize sequence of numbers?

I am working on a user behavior project. Based on user interaction I have collected some data. There is a nice sequence which smoothly increases and decreases over time, but there are small discrepancies, which are very bad. Please refer to the graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I want to smooth out this sequence. The technique should be able to eliminate points like X and Y, i.e. errors in an otherwise monotonically increasing or decreasing run.
If it cannot eliminate them, the technique should be able to shift them so that the series is not affected by the errors.
What I have tried and failed:
I tried testing the difference between consecutive values. In some special cases it works, but for a sequence like the one presented here, the distance between numbers is not large enough for me to cut out the errors.
I tried applying a counter threshold, some X: only once it is reached is a change accepted; otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X: because this is based on user interaction, I do not really control it. If the user interaction is such that its plot would be a zigzag pattern, I end up with a 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: The data in this example is a particular case. There is no typical pattern in which the numbers are going to occur, but we expect some range to be continuous across all the examples. The solution I am seeking is generic.
I do not know how much effort you want to put into this problem, but if you want theoretical guarantees, topological persistence seems well suited to it, imho.
With that method, you filter local maxima/minima by fixing a scale, and there are theoretical proofs that if your sampling is close to the underlying function, persistence extracts the correct number of maxima.
See these slides (mainly pages 7-9) to get an idea of the method.
Basically, if you treat your points as a landscape and imagine a water level that starts at the maximum height and drops, peaks emerge one by one.
Every peak has a birth time, when it emerges, and a death time, when it merges with a higher peak. A persistence diagram plots one point per peak, with its birth and death times as x/y coordinates (by convention the first peak does not die and is not shown).
If a peak is a global maximum, it lies further from the diagonal in the persistence diagram than a peak that is only a local maximum. To remove local maxima, remove the peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data, btw), and two global ones (the first peak is not pictured in a persistence diagram):
If you add noise to your data like this:
You will still get a very decent persistence diagram that lets you filter local maxima as you want:
Please ask if you want more details or references.
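For a 1-D sequence the persistence of each peak can be computed with a short union-find sweep. The following is only a sketch of the idea described above, with hypothetical function names: samples are visited from highest to lowest (the dropping water level), a sample with no emerged neighbour gives birth to a peak, and when two emerged regions meet, the lower peak dies at that level.

```python
def peak_persistence(values):
    """Return (peak_index, persistence) for every local maximum of a sequence."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    parent = {}  # union-find over samples that have already emerged
    peak = {}    # component root -> index of that component's highest sample
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            a, b = find(i), find(j)
            if a == b:
                continue
            if values[peak[a]] <= values[peak[b]]:
                a, b = b, a  # a keeps the higher peak, b is absorbed
            if peak[b] != i:  # the sample just added is not a peak of its own
                pairs.append((peak[b], values[peak[b]] - values[i]))
            parent[b] = a
    if order:
        # The highest peak never dies.
        pairs.append((peak[find(order[0])], float("inf")))
    return pairs


def significant_peaks(values, min_persistence):
    """Indices of peaks that stay far enough from the diagonal."""
    return sorted(i for i, p in peak_persistence(values) if p >= min_persistence)
```

The `min_persistence` threshold plays the role of the scale you fix: small bumps caused by noise have low persistence and are dropped, however narrow or wide they are.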
Since you cannot decide on a cutoff frequency, or even on the filter you want to use, I would implement several and let the user set the parameters.
The first thing that I thought of is a running average, and you can see that there are many things to set to get different outputs.
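A minimal running average with its main parameter exposed might look like this. The handling of the ends (shrinking the window) is just one possible choice, and the sample values are a short stretch copied from the question's data.

```python
def running_average(values, window=5):
    """Centered moving average; the window shrinks near the ends of the sequence."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    # The dip at 2.49031 breaks the otherwise increasing run.
    sample = [2.46564, 2.48385, 2.49747, 2.49031, 2.51458, 2.5149, 2.52632]
    for window in (3, 5):
        print(window, [round(v, 5) for v in running_average(sample, window)])
```

A wider window flattens more of the small discrepancies but also rounds off genuine turning points, which is why the window is worth leaving adjustable.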

How to detect and delete noise in rapidminer?

I am new to RapidMiner 5. I just want to know how to find noise in my data, show it in a chart, and delete it.
A complex problem because it depends what you mean by noise.
If you mean finding individual attributes whose values are plain wrong then you could plot a histogram view and work out some sort of limits on what constitutes a valid value. You could then impose that rule by using Filter Examples to remove them.
If you mean finding attributes that have some sort of random jitter applied to them it would be difficult to detect these. Only by knowing beforehand what the expected shape of the distribution is could you compare with observation and do something about it. However, the action to take is by no means obvious.
If you mean finding examples within an example set that are obviously different from other examples then you could consider using the various outlier functions. The simplest one to get started is Detect Outlier (Distances). This finds a set number of outliers (default 10) based on a distance calculation that uses all the attributes for examples. It creates a new attribute called outlier that is set to true or false. You could then use the Filter Examples operator to remove those that are set to true.
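To show the idea behind distance-based outlier flagging outside the tool, here is a rough Python analogue. It is not RapidMiner's implementation: the function name, the use of Euclidean distance and the k-th-nearest-neighbour score are assumptions made for this sketch. It flags a fixed number of rows and returns a true/false value per row, like the outlier attribute described above. It needs at least two rows.

```python
import math


def flag_outliers(rows, n_outliers=10, k=3):
    """Flag the n rows whose distance to their k-th nearest neighbour is largest."""
    scores = []
    for i, row in enumerate(rows):
        dists = sorted(math.dist(row, other)
                       for j, other in enumerate(rows) if j != i)
        scores.append((dists[min(k, len(dists)) - 1], i))
    worst = {i for _, i in sorted(scores, reverse=True)[:n_outliers]}
    return [i in worst for i in range(len(rows))]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    flags = flag_outliers(data, n_outliers=1, k=2)
    kept = [row for row, is_outlier in zip(data, flags) if not is_outlier]
    print(kept)
```

Filtering on the flag afterwards corresponds to the Filter Examples step.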
Hope that helps at least as a start.

Neural Network gives same output for different inputs, doesn't learn

I have a neural network written in standard C++11 which I believe follows the back-propagation algorithm correctly (based on this). If I output the error in each step of the algorithm, however, it seems to oscillate without dampening over time. I've tried removing momentum entirely and choosing a very small learning rate (0.02), but it still oscillates at roughly the same amplitude per network (with each network having a different amplitude within a certain range).
Further, all inputs result in the same output (a problem I found posted here before, although for a different language. The author also mentions that he never got it working.)
The code can be found here.
To summarize how I have implemented the network:
Neurons hold the current weights to the neurons ahead of them, previous changes to those weights, and the sum of all inputs.
Neurons can have their value (sum of all inputs) accessed, or can output the result of passing said value through a given activation function.
NeuronLayers act as Neuron containers and set up the actual connections to the next layer.
NeuronLayers can send the actual outputs to the next layer (instead of pulling from the previous).
FFNeuralNetworks act as containers for NeuronLayers and manage forward-propagation, error calculation, and back-propagation. They can also simply process inputs.
The input layer of an FFNeuralNetwork sends its weighted values (value * weight) to the next layer. Each neuron in each layer afterwards outputs the weighted result of the activation function unless it is a bias, or the layer is the output layer (biases output the weighted value, the output layer simply passes the sum through the activation function).
Have I made a fundamental mistake in the implementation (a misunderstanding of the theory), or is there some simple bug I haven't found yet? If it would be a bug, where might it be?
Why might the error oscillate by the amount it does (around +-(0.2 +- learning rate)) even with a very low learning rate? Why might all the outputs be the same, no matter the input?
I've gone over most of it so much that I might be skipping over something, but I think I may have a plain misunderstanding of the theory.
It turns out I was just staring at the FFNeuralNetwork parts too much and accidentally used the wrong input set to confirm the correctness of the network. It actually does work correctly with the right learning rate, momentum, and number of iterations.
Specifically, in main, I was using inputs instead of a smaller array in to test the outputs of the network.

Is it possible to see if two MP3 files are the same song by analyzing the files' bytes?

This is to be done in C++ or C....
I know we can read the MP3s' meta data, but that information can be changed by anyone, can't it?
So is there a way to analyze a file's contents and compare it against another file and determine if it is in fact the same song?
edit
Lots of interesting things coming out that I hadn't thought of. Not at all a good idea to attempt this.
It's possible, but very hard.
Even the same original recording may well be encoded differently by different MP3 encoders or the same encoder with different settings... leading to different results when the MP3 is then decoded. You'd need to work out an aural model to "understand" how big the differences are, and make a judgement.
Then there's the matter of different recordings. If I sing "Once in Royal David's City" and Aled Jones sings it, are those the same song? What if there are two different versions of a song where one has slightly modified lyrics? The key could be different, it could be in a different vocal range - all kinds of things.
How different can two songs be but still count as "the same song"? Once you've decided that, then there's the small matter of implementing it ;)
If I really had to do this, my first attempt would be to take a Fourier transform of both songs and compare the histograms. You can use FFTW (http://www.fftw.org/) to take the Fourier transform, and then compare the histograms by summing the squares of the differences at each frequency. If the resultant sum is greater than some threshold (which you must determine by experimentation) then the songs are deemed to be different, otherwise they are the same.
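A toy sketch of that comparison in Python. To stay dependency-free it uses a naive O(N^2) DFT in place of FFTW, so it only suits short sample lists. It assumes both inputs are already decoded to equal-length lists of samples. The function names and the threshold are placeholders you would have to tune by experiment.

```python
import cmath


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for the first N/2 frequency bins."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        total = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total) / n)
    return spectrum


def spectral_distance(a, b):
    """Sum of squared differences between the two magnitude spectra."""
    return sum((x - y) ** 2
               for x, y in zip(magnitude_spectrum(a), magnitude_spectrum(b)))


def same_song(a, b, threshold):
    """Deemed the same when the summed squared difference stays under the threshold."""
    return spectral_distance(a, b) <= threshold
```

Note that a whole-file spectrum throws away timing, so two different pieces with similar instrumentation can look alike; comparing spectra of shorter consecutive chunks is a natural refinement.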
No. Not SO simple.
You can check they contain the same encoded data, BUT:
Could be a different bitrate
Could be the same song, just 1/100th of a second off
In both cases the bytes would not match.
Basically, if a solution looks too simple to be true, it often is.
If you mean "same song" in the iTunes sense of "same recording", it would be possible to compare two audio files, but not by byte-by-byte comparison of the encoded files, since even for the same format there are variables such as data rate and compression that are selected at the time of encoding.
Also each encoding of the same recording may include different lead-in/lead-out timings, different amplitude and equalisation, and may have come from differing original sources (vinyl, CD, original master etc.). So you need a comparison method that takes all these variables into account, and even then you will end up with a 'likelihood' of a match rather than a definitive match.
If you genuinely mean "same song", i.e. any recording by any artist of the same composition and lyrics, then you are unlikely to get a high statistical correlation in most cases since pitch, tempo, range, instrumental arrangement will be very different.
In the "same recording" scenario, relatively simple signal processing and statistical techniques could be applied, in the "same song" scenario, AI techniques would need to be deployed, and even then the results I suspect would be poor.
If you want to compare MP3 files that originated from the same MP3 but have been tagged with different metadata, it would be straightforward to just compare the actual audio data. Since they originated from the same MP3 encoding, you should be able to do a byte-by-byte comparison. You would not have to compare every byte; it should be sufficient to sample just a few to get a unique key that would be statistically almost impossible to find in another song.
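A sketch of that "same encoding, different tags" comparison in Python. It skips an ID3v2 tag at the start and an ID3v1 tag at the end, then compares a hash of what is left. Hashing everything is my substitution for sampling a few bytes. The function names are made up, and other tag formats (APEv2, Lyrics3) are not handled here.

```python
import hashlib


def audio_payload(data):
    """Return the bytes of an MP3 file without its ID3v2 header and ID3v1 trailer."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        # ID3v2 size: four "syncsafe" bytes with 7 significant bits each,
        # not counting the 10-byte tag header itself.
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # flag bit: a 10-byte footer follows the tag
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128  # ID3v1 is a fixed 128-byte block at the end of the file
    return data[start:end]


def same_audio(path_a, path_b):
    """True when both files carry byte-identical audio data."""
    digests = []
    for path in (path_a, path_b):
        with open(path, "rb") as handle:
            digests.append(hashlib.sha256(audio_payload(handle.read())).digest())
    return digests[0] == digests[1]
```

The digest also works as the "unique key": store it per file and duplicates become a dictionary lookup.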
If the files have been produced by different encoders, you would have to extract some "fuzzy" feature keys from the data and compare those keys. In a hurry I would probably construct an algorithm like this:
Decode audio to pulse-code modulation (wave) in a standard bit rate.
Find a fixed number of feature starting points using some dynamic location algorithm. For example find top 10 highest wave peaks ordered from beginning of wave or simply spread evenly across the wave (it would be a good idea to fix the first and last position dynamically though, since different encodings might not start and end at exactly the same point). An improvement would be to select feature points at positions in the wave that are not likely to be too repetitive.
Extract a set of one-dimensional feature key scalars from the feature points. For example, for each feature normalize the following n-sample values and count the number of zero-crossings, peak to average ratio, mean zero-crossing distance, signal-energy. The goal is to extract robust features that are relatively unique, while still characteristic even if some noise and distortion is added to the signal. This can obviously be improved almost infinitely.
Compare the extracted feature keys of the two files using some accuracy measurement (e.g. 9 out of 10 feature extractions must match at least 99% on 4 out of 5 of their extracted feature keys).
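The extraction and comparison steps above could be sketched like this in Python, assuming the audio is already decoded to PCM and cut into windows at the chosen feature points (both of which are left out here). The three features, the 1% tolerance and the 90% rule are simplified placeholders, not tuned values, and the function names are made up.

```python
def window_features(samples):
    """Zero-crossing count, peak-to-average ratio and mean energy of one window."""
    peak = max(abs(s) for s in samples) or 1.0
    norm = [s / peak for s in samples]  # normalize amplitude first
    crossings = sum(1 for a, b in zip(norm, norm[1:]) if (a < 0) != (b < 0))
    mean_abs = sum(abs(s) for s in norm) / len(norm)
    energy = sum(s * s for s in norm) / len(norm)
    return (crossings, 1.0 / mean_abs if mean_abs else 0.0, energy)


def features_match(fa, fb, tolerance=0.01):
    """True when every feature agrees to within the relative tolerance."""
    return all(abs(x - y) <= tolerance * max(abs(x), abs(y), 1e-12)
               for x, y in zip(fa, fb))


def same_track(windows_a, windows_b, min_matching=0.9):
    """Compare windows pairwise; enough of them must match."""
    matches = sum(features_match(window_features(a), window_features(b))
                  for a, b in zip(windows_a, windows_b))
    return matches >= min_matching * min(len(windows_a), len(windows_b))
```

Because each window is amplitude-normalized, a copy encoded at a different volume still produces the same keys, which is the kind of robustness the feature approach is after.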
The benefit of a feature extraction approach is that you can build a database of features for all your mp3-files and for a single file ask the question: What other media files have exactly or almost exactly the same feature as this one. The feature lookup could be implemented very efficiently with R*-trees or similar, which could be used to give you a fast distance measurement between the n-dimensional feature sets.
The above technique is essentially a variant of what is used in image search algorithms such as SIFT, which is probably the base of such application as Photosynth and Google Goggles. In image searching you filter the image for good candidate points for relatively unique features (such as corners of shapes), then you normalize the area around that feature to get normalized color, intensity, scale and direction of features. Finally you extract the features and search an n-dimensional database of features of other images and verify that found features in other images are geometrically positioned in the same pattern as in your search image. The technique for searching audio would be the same, only simpler, since audio is one dimensional.
Use the open source EchoPrint library to create a signature of the two audio files, and compare them with each other.
The library is very easy to use, and has clear examples on how to create the signatures.
http://echoprint.me/
You can even query their database with the signature and find matching song metadata (such as title, artist, etc).
I think the Fast Fourier-Transform (FFT) approach hinted by jstanley is pretty good for most use cases; in particular, it works for verifying that the two are the same release/ same recording by the same artist/ same bitrate / audio quality.
To be more explicit, sox and spek (via command line and GUI, respectively) can do this pretty painlessly.
Spek is pretty foolproof -- just open the software and point it to the two audio files in question.
sox can generate spectrograms (FFTs) from the command line like so:
sox "$file" -n spectrogram -o "$outfile"
The result from either is two images; if they look basically identical, then for almost all intents and purposes the two songs are equivalent.
For example, I wanted to test if these two files:
Soundtrack to an imaginary film mixtape 2011.mp3
DJRUM - Sountrack to an imaginary film mixtape 2011 (for mary-anne hobbs).mp3
were the same. diff reported a difference in the binary files (perhaps due to metadata differences or minor encoding differences), but a quick glance at their spectrograms resolved it: