Algorithm or a tool to compare two spectrogram outputs for unit testing purposes

I am looking for a good algorithm or a tool to compare two spectrogram outputs for unit testing. I can visually confirm the outputs are similar, but I would like to automate this process. Basic statistics of the outputs, such as min/max/avg/median, are roughly the same, but comparing those alone would not catch real differences.
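One way to automate the comparison (a sketch, not a drop-in answer): treat the two spectrograms as 2-D arrays and compare them cell by cell against a combined absolute/relative tolerance, reporting the worst offender. The Spectrogram alias and the tolerance values below are assumptions; adapt them to however your pipeline stores its output:

// Sketch: element-wise comparison of two spectrograms with a combined
// absolute/relative tolerance.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

using Spectrogram = std::vector<std::vector<double>>;  // row-major magnitudes

// Returns true if every cell of 'actual' is within tolerance of 'expected'.
// Prints the worst-offending cell so a failing test is easy to diagnose.
bool spectrogramsMatch(const Spectrogram& expected, const Spectrogram& actual,
                       double absTol = 1e-6, double relTol = 1e-3)
{
    if (expected.size() != actual.size()) return false;

    bool ok = true;
    double worstError = 0.0;
    std::size_t worstRow = 0, worstCol = 0;

    for (std::size_t r = 0; r < expected.size(); ++r) {
        if (expected[r].size() != actual[r].size()) return false;
        for (std::size_t c = 0; c < expected[r].size(); ++c) {
            const double diff  = std::fabs(expected[r][c] - actual[r][c]);
            const double limit = absTol + relTol * std::fabs(expected[r][c]);
            if (diff > limit) {
                ok = false;
                if (diff > worstError) { worstError = diff; worstRow = r; worstCol = c; }
            }
        }
    }

    if (!ok) {
        std::printf("worst mismatch at (%zu, %zu): |diff| = %g\n",
                    worstRow, worstCol, worstError);
    }
    return ok;
}

If the two outputs may legitimately differ by a constant gain or a small time shift, compare after normalising, or compare a coarser derived feature (per-band energy, spectral centroid) instead of raw cells.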

How to determine the contribution rate of a specific unit test case?

Background
I am aware of the principles of TDD (Test-Driven Development) and unit testing, as well as of different coverage metrics. Currently, I am working on a Linux C/C++ project where 100% branch coverage must be reached.
Question
Does anybody know a technique/method to automatically identify the unit test cases that contribute most to reaching a specific coverage goal? Each unit test could then be associated with a contribution rate (in percent). Having these numbers, the unit test cases could be ordered by their contribution rate.
The greedy algorithm can help here. In simple words (a sketch follows below):
1. From all tests, select the one with the highest coverage.
2. Calculate the coverage delta between each remaining candidate and the tests already selected.
3. Pick the candidate that gives the biggest delta.
4. Repeat from step 2 until all tests are in the ranking.
As a result you'll get a ranking like the one Squish Coco generates for GNU coreutils.
Typically the benefit of each extra test goes down the more tests you add. Some of them may even have zero contribution to the total coverage.
A good use case for this ordering is finding an optimal execution order for smoke tests that only have limited time to run. For complete testing you should still run the whole suite, of course.
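A minimal sketch of the greedy ranking, assuming each test's coverage has been reduced to a set of covered branch IDs (how those sets are obtained, e.g. from your coverage tool's reports, is up to your toolchain):

// Sketch: greedy ordering of tests by marginal coverage contribution.
#include <cstddef>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

struct TestCase {
    std::string name;
    std::set<int> branches;  // branch IDs covered by this test
};

int main()
{
    // Toy data; in practice these sets come from your coverage tool.
    std::vector<TestCase> tests = {
        {"test_parser", {1, 2, 3, 4, 5}},
        {"test_io",     {4, 5, 6}},
        {"test_errors", {2, 7}},
        {"test_smoke",  {1, 2}},
    };

    std::set<int> coveredSoFar;
    while (!tests.empty()) {
        // Find the remaining test with the largest coverage delta.
        std::size_t best = 0;
        std::size_t bestDelta = 0;
        for (std::size_t i = 0; i < tests.size(); ++i) {
            std::size_t delta = 0;
            for (int b : tests[i].branches)
                if (!coveredSoFar.count(b)) ++delta;
            if (i == 0 || delta > bestDelta) { best = i; bestDelta = delta; }
        }

        std::printf("%s adds %zu new branches\n",
                    tests[best].name.c_str(), bestDelta);
        coveredSoFar.insert(tests[best].branches.begin(),
                            tests[best].branches.end());
        tests.erase(tests.begin() + static_cast<std::ptrdiff_t>(best));
    }
}

The contribution rate from the question is then each test's delta divided by the total number of branches in the program.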

Use Google Test to test file output?

I'm working on a project to create a simulator (for modeling biological systems) in C++. The simulator takes an input file of parameters and then generates an output file with hundreds of molecule counts at different time points from the simulation. I'm using Google Test for all of my unit testing. I also want to include some higher level tests where I supply an input file with various model parameters and then check that the output file matches some reference file. Someone recommended using bash-tap for these higher level tests, but I'd prefer to stick to Google Test if possible. Is it possible to use Google Test for the higher level tests that I've described here?
We write CAE software (simulators) and use Google Test. We face similar issues, so hopefully you'll find the answers practical.
You can write the higher-level tests, but you will often have to do more than just "EXPECT_EQ()" for checking pass/fail. For example, if you had to test the connectivity of two arbitrary graphs, it can be difficult if the algorithms are allowed to vary the order of nodes. Or, if you are comparing matrices, sometimes the rows and columns can be switched with no problem. Perhaps round-off error is OK. Be prepared to deal with these kinds of problems; they will be much more of an issue with a full simulator than with a unit test.
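To make "more than EXPECT_EQ()" concrete, here is a sketch of a higher-level Google Test case that compares a generated output file against a reference file, value by value, with a tolerance for round-off. The whitespace-separated file format and the file paths are assumptions; how the output file gets produced is up to your harness:

// Sketch: compare simulator output against a reference file with a
// numeric tolerance instead of an exact byte-for-byte match.
#include <fstream>
#include <string>
#include <vector>
#include "gtest/gtest.h"

namespace {

std::vector<double> ReadNumbers(const std::string& path)
{
    std::ifstream in(path);
    std::vector<double> values;
    double v;
    while (in >> v) values.push_back(v);
    return values;
}

TEST(SimulatorOutputTest, MatchesReferenceFile)
{
    const std::vector<double> expected = ReadNumbers("testdata/reference_output.txt");
    const std::vector<double> actual   = ReadNumbers("testdata/actual_output.txt");

    ASSERT_FALSE(expected.empty()) << "reference file missing or empty";
    ASSERT_EQ(expected.size(), actual.size());
    for (std::size_t i = 0; i < expected.size(); ++i) {
        // Tolerate platform/compiler round-off differences.
        EXPECT_NEAR(expected[i], actual[i], 1e-9) << "at value index " << i;
    }
}

}  // namespace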
A more practical issue is when your organization says "run all tests before you check in." Or, maybe they run every time you hit the build button. If that's the case, you need to differentiate these unit tests from the higher level tests. We use Google Test Runner in Visual Studio, and it expects to run everything where the filename is "*Test*". It is best to name the higher level tests something else to be clear.
We also had to turn our entire executable into a DLL so that it could have tests run on top of it. There are other approaches (like scripting) which could be used with Google Test, but we've found the executable-as-a-dll approach to work. Our "real" product executable is simply a main() function that calls app_main() in the dll.
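The thin executable from that approach can be as small as this (the app_main name follows the convention described above; nothing about it is mandated by Google Test):

// Sketch: the "real" product executable is only a forwarding main();
// all application logic lives in the DLL so tests can link against it.
int app_main(int argc, char** argv);  // exported by the product DLL

int main(int argc, char** argv)
{
    return app_main(argc, argv);
}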
And, one final tip when using the Runner: If your app gets the --gtest_list_tests argument, don't do a bunch of expensive setup:
// Don't run if we are just listing tests.
if (!::testing::GTEST_FLAG(list_tests))
{
    // Do expensive setup stuff here.
}

int result = RUN_ALL_TESTS();

if (!::testing::GTEST_FLAG(list_tests))
{
    // Do expensive shutdown stuff here.
}

How should I unit-test digital filters?

First the question(s):
How should I write unit tests for a digital filter (band-pass/band-stop) in software? What should I be testing? Is there any sort of canonical test suite for filtering?
How to select test inputs, generate expected outputs, and define "conformance" in a way that I can say the actual output conforms to expected output?
Now the context:
The application I am developing (electromyographic signal acquisition and analysis) needs to use digital filtering, mostly band-pass and band-stop filtering (C#/.Net in Visual Studio).
The previous version of our application has these filters implemented in some legacy code we could reuse, but we are not sure how mathematically correct it is, since we have no unit tests for it.
Besides that we are also evaluating Mathnet.Filtering, but their unit test suite doesn't include subclasses of OnlineFilter yet.
We are not sure how to evaluate one filtering library over the other, and the closest we got is to filter some sine waves and eyeball the differences between the outputs. That is not a good approach for unit testing either; we would like to automate it (instead of running scripts and evaluating the results elsewhere, even visually).
I imagine a good test suite should test something like the following:
Linearity and Time-Invariance: how should I write an automated test (with a boolean, "pass or fail" assertion) for that?
Impulse response: feeding an impulse to the filter, taking its output (the impulse response), and checking whether it "conforms to expected", and in that case:
How would I define expected response?
How would I define conformance? (see the sketch after this list)
Amplitude response of sinusoidal input;
Amplitude response of step / constant-offset input;
Frequency Response (including Half-Power, Cut-off, Slope, etc.)
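For checks like these, one way to turn "conformance" into a boolean assertion is to measure the filter's gain at known frequencies and assert it, in decibels, against the design specification within a tolerance. The sketch below is in C++ for illustration; IOnlineFilter, the 1000 Hz sampling rate and the dB limits are stand-ins for your real filter class and spec, and the same structure carries over to C#:

// Sketch: pass/fail assertions for a band-stop (notch) filter.
#include <algorithm>
#include <cassert>
#include <cmath>

struct IOnlineFilter {
    virtual ~IOnlineFilter() = default;
    virtual double ProcessSample(double x) = 0;  // one sample in, one out
    virtual void Reset() = 0;
};

// Feed a unit-amplitude sine through the filter, discard the transient,
// and return the steady-state gain in dB.
double MeasureGainDb(IOnlineFilter& filter, double freqHz, double sampleRateHz)
{
    const double kPi = 3.14159265358979323846;
    const int settle = 2000;   // samples discarded while the filter settles
    const int measure = 4000;  // samples used for the measurement
    filter.Reset();

    double peak = 0.0;
    for (int n = 0; n < settle + measure; ++n) {
        const double x = std::sin(2.0 * kPi * freqHz * n / sampleRateHz);
        const double y = filter.ProcessSample(x);
        if (n >= settle) peak = std::max(peak, std::fabs(y));
    }
    return 20.0 * std::log10(peak);  // input amplitude is 1.0, i.e. 0 dB
}

void TestNotchFilterConformance(IOnlineFilter& filter)
{
    const double fs = 1000.0;  // sampling rate in Hz (assumed)

    // Passband: gain should stay within 1 dB of unity.
    assert(std::fabs(MeasureGainDb(filter, 20.0, fs)) < 1.0);

    // Stopband centre (e.g. a 50 Hz mains notch): at least 40 dB attenuation.
    assert(MeasureGainDb(filter, 50.0, fs) < -40.0);
}

A linearity check can be written the same way: filter a*x + b*y, filter x and y separately, and assert that the combined outputs match sample by sample within a small tolerance; time-invariance is a delayed-input variant of the same check.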
I am far from an expert in programming or DSP, and that's exactly why I am cautious about filters that "seem" to work well. It has been common for us to have clients question our filtering algorithms (because they need to publish research where data was captured with our systems), and I would like to have formal proof that the filters are working as expected.
DISCLAIMER: this question was also posted on DSP.StackExchange.com

What form of testing should I perform?

I want to write an algorithm (a bunch of machine learning algorithms) in C/C++ or maybe in Java, possibly in Python. The language doesn't really matter to me - I'm familiar with all of the above.
What matters to me is the testing. I want to train my models using training data, so I have test inputs, I know what the outputs should be, and I can compare them to the model's outputs. What kind of test is this? Is it a unit test? How do I approach the problem? I can see that I could write some code to check what I need checking, but I want to separate the testing from the main code. Testing is a well-developed field and I've seen this done before, but I don't know the name and type of this particular kind of testing, which I'd need in order to read up on it and not create a mess. I'd be grateful if you could let me know what this testing method is called.
Your best bet is to watch the psychology-of-testing videos from the testing god Misko Hevery: http://misko.hevery.com/
Link of Misko videos:
http://misko.hevery.com/presentations/
And read this Google testing guide http://misko.hevery.com/code-reviewers-guide/
Edited:
Anyone can write tests; they are really simple and there is no magic to writing one. You can simply do something like:
var sut = new MyObject();
var res = sut.IsValid();

if (res != true)
{
    throw new ApplicationException("message");
}
That is the theory, of course; these days we have tools that simplify the tests, and we can write something like this:
new MyObject().IsValid().Should().BeTrue();
But what you should really focus on is writing testable code; that's the magic key.
Just follow the psychology-of-testing videos from Misko to get you started.
This sounds a lot like Test-Driven Development (TDD), where you create unit-tests ahead of the production code. There are many detailed answers around this site on both topics. I've linked to a couple of pertinent questions to get you started.
If your inputs/outputs are at the external interfaces of your full program, that's black box system testing. If you are going inside your program to zoom in on a particular function, e.g., a search function, providing inputs directly into the function and observing the behavior, that's unit testing. This could be done at function level and/or module level.
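As a tiny illustration of that distinction, a unit test drives one function directly with known inputs and asserts on its return value; the search function here is hypothetical:

// Sketch: a unit-level test that exercises a single function directly,
// rather than feeding inputs through the whole program.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function under test: returns the index of 'key', or -1.
int search(const std::vector<int>& values, int key)
{
    for (std::size_t i = 0; i < values.size(); ++i)
        if (values[i] == key) return static_cast<int>(i);
    return -1;
}

int main()
{
    assert(search({3, 1, 4, 1, 5}, 4) == 2);   // found
    assert(search({3, 1, 4, 1, 5}, 9) == -1);  // not found
    assert(search({}, 9) == -1);               // empty input edge case
}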
If you're writing a machine learning project, the testing and training process isn't really Test-Driven Development. Have you ever heard of co-evolution? You have a set of puzzles for your learning system that are, themselves, evolving. Their fitness is determined by how much they confound your learning system.
For example, I want to evolve a sorting network. My learning system is the programs that produce networks. My co-evolution system generates inputs that are difficult to sort. The sorting networks are rewarded for producing correct sorts and the co-evolutionary systems are rewarded for how many failures they trigger in the sorting networks.
I've done this with genetic programming projects and it worked quite well.
Probably back testing, which means you run your algorithm over historical inputs to evaluate its performance. The term you used yourself, training data, is more general, and you could search for that to find some useful links.
It's unit testing. The controllers are tested and the code is checked in and out without really messing up your development code. This process is also called Test-Driven Development (TDD), where every development cycle is tested before moving on to the next software iteration or phase.
Although this is a very old post, my 2 cents :)
Once you've decided which algorithmic method to use (your "evaluation protocol", so to say) and tested your algorithm on individual edge cases, you might be interested in ways to run your algorithm on several datasets and assert that the results are above a certain threshold (individually, on average, etc.).
This tutorial explains how to do that within the pytest framework, which is the most popular testing framework in Python. It is based on an example (comparing polynomial fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)

TDD with diagrams

I have an app which draws a diagram. The diagram follows a certain schema,
e.g. shape X goes within shape Y, shapes {X, Y} belong to a group P ...
The diagram can get large and complicated (think of a circuit diagram).
What is a good approach for writing unit tests for this app?
Find out where the complexity in your code is.
Separate it out from the untestable visual presentation.
Test it.
If you don't have any non-visual complexity, you are not writing a program, you are producing a work of art.
Unless you are using a horribly buggy compiler or something, I'd avoid any tests that boil down to 'test source code does what it says it does'. Any test that's functionally equivalent to:
assertEquals(hash(stripComments(loadSourceCode())), 0x87364fd3234);
can be deleted without loss.
It's hard to write well-defined unit tests for something visual like this unless you really understand the exact sequence of API calls that are going to be made.
To test something "visual" like this, you have three parts.
A "spike" to get the proper look, scaling, colors and all that. In some cases, this is almost the entire application.
A "manual" test of that creates some final images to be sure they look correct to someone's eye. There's no easy way to test this except by actually looking at the actual output. This is hard to automate.
Mock the graphics components to be sure your application calls the graphics components properly (a sketch follows below).
When you make changes, you have to run both tests: Are the API calls all correct? and Does that sequence of API calls produce the image that looks right?
You can -- if you want to really burst a brain cell -- try to create a PNG file from your graphics and test to see if the PNG file "looks" right. It's hardly worth the effort.
As you go forward, your requirements may change. In this case, you may have to rewrite the spike first and get things to look right. Then, you can pull out the sequence of API calls to create automated unit tests from the spike.
One can argue that creating the spike violates TDD. However, the spike is designed to create a testable graphics module. You can't easily write the test cases first because the test procedure is "show it to a person". It can't be automated.
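For the third part, a sketch using GoogleMock, assuming a hypothetical Canvas interface that the diagram renderer draws through (Canvas and DiagramRenderer stand in for your own abstractions):

// Sketch: verify that the renderer issues the expected drawing calls,
// without producing any real graphics.
#include "gmock/gmock.h"
#include "gtest/gtest.h"

class Canvas {
public:
    virtual ~Canvas() = default;
    virtual void DrawRect(int x, int y, int w, int h) = 0;
    virtual void DrawLine(int x1, int y1, int x2, int y2) = 0;
};

class MockCanvas : public Canvas {
public:
    MOCK_METHOD(void, DrawRect, (int x, int y, int w, int h), (override));
    MOCK_METHOD(void, DrawLine, (int x1, int y1, int x2, int y2), (override));
};

// Hypothetical renderer: draws shape X nested inside shape Y.
class DiagramRenderer {
public:
    explicit DiagramRenderer(Canvas& canvas) : canvas_(canvas) {}
    void RenderNestedShapes()
    {
        canvas_.DrawRect(0, 0, 100, 100);  // outer shape Y
        canvas_.DrawRect(10, 10, 80, 80);  // inner shape X
    }
private:
    Canvas& canvas_;
};

TEST(DiagramRendererTest, DrawsInnerShapeAfterOuterShape)
{
    MockCanvas canvas;
    ::testing::InSequence seq;  // the outer shape must be drawn first
    EXPECT_CALL(canvas, DrawRect(0, 0, 100, 100));
    EXPECT_CALL(canvas, DrawRect(10, 10, 80, 80));

    DiagramRenderer renderer(canvas);
    renderer.RenderNestedShapes();
}

This only answers "are the API calls correct?"; the "does it look right?" question still needs the manual check described above.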
You might consider first converting the initial input data into some intermediate format, that you can test. Then you forward that intermediate format to the actual drawing function, which you have to test manually.
For example when you have a program that inputs percentages and outputs a pie chart, then you might have an intermediate format that exactly describes the dimensions and position of each sector.
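A sketch of what that intermediate format could look like for the pie-chart example; the test exercises only the conversion into sectors and never touches the drawing code (Sector and ToSectors are illustrative names):

// Sketch: testable intermediate format for a pie chart. Converting
// percentages to sectors is pure data-in/data-out and easy to assert on;
// only the final sector-to-pixels step needs visual checking.
#include <cassert>
#include <cmath>
#include <vector>

struct Sector {
    double startAngleDeg;
    double sweepAngleDeg;
};

std::vector<Sector> ToSectors(const std::vector<double>& percentages)
{
    std::vector<Sector> sectors;
    double angle = 0.0;
    for (double p : percentages) {
        const double sweep = 360.0 * p / 100.0;
        sectors.push_back({angle, sweep});
        angle += sweep;
    }
    return sectors;
}

int main()
{
    const auto sectors = ToSectors({50.0, 25.0, 25.0});
    assert(sectors.size() == 3);
    assert(std::fabs(sectors[0].sweepAngleDeg - 180.0) < 1e-9);
    assert(std::fabs(sectors[1].startAngleDeg - 180.0) < 1e-9);
    assert(std::fabs(sectors[2].startAngleDeg - 270.0) < 1e-9);
}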
You've described a data model. The application presumably does something, rather than just sitting there with some data in memory. Write tests which exercise the behaviour of the application and verify the outcome is what is expected.