What's the best way to unit test code that generates images? - unit-testing

What is the best way to unit test code that generates images? Say, for example, a class that generates a plot or a chart?

If this class uses a third-party library to generate the plots/charts (say, matplotlib), then you can write tests for the methods that generate the input for that library. This will be fairly easy.
If the output is an image and you are interested in verifying its properties, then you will have to dig deeper. External image attributes (size, format, etc.) can be verified easily, but others, such as the actual contents of the image, are quite hard to check. IMHO that wouldn't be worth the trouble.
If the output is non-binary (say, SVG), then you can easily write tests to ensure that the output XML contains what you are looking for.
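For example, here is a minimal sketch of the SVG approach (assuming matplotlib and a pytest-style test; the plot and the label being checked are made up for illustration):

import io
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def test_plot_svg_contains_axis_label():
    # hypothetical plotting code under test
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [4, 5, 6])
    ax.set_xlabel("time (s)")

    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    svg = buf.getvalue().decode("utf-8")

    # assert on the textual SVG output instead of pixel data
    assert "time (s)" in svg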

A method that I've used is to generate a "known-good" file, store it in the source tree, and then do a binary compare against it as part of the test. If the file contents match, the output hasn't changed.
This doesn't let you test every combination of input that could produce the image, but it is useful for basic regression tests.
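A minimal sketch of that approach (the generate_chart function and file paths are hypothetical, and a pytest tmp_path fixture is assumed):

from pathlib import Path

def test_chart_matches_known_good(tmp_path):
    out_file = tmp_path / "chart.png"
    generate_chart(out_file)  # hypothetical code under test

    known_good = Path("tests/data/chart_known_good.png").read_bytes()
    # byte-for-byte comparison against the stored known-good image
    assert out_file.read_bytes() == known_good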

Related

How to document test cases of a program with complex input

I need to document tests that execute a program which takes an XML file as input and then generates multiple .c and .xml files as output.
The existing 300 tests are implemented as JUnit test cases, and I started documenting them with Doxygen.
The documentation should help to save time when it comes to the questions:
Which test cases have to be modified when a specific feature is modified?
Are all features of the program tested in at least one test case?
My first idea was to use the classification tree method. The result would look like this example picture:
The program under test takes only a few boolean parameters, but also an XML file, which is the main input. This XML file contains different lists of nodes, and a lot of invariants have to be checked by the program when generating the output .c and .xml files. As part of the tests, the input XML file and the generated files are also parsed and compared.
To apply the classification tree method, equivalence classes have to be found. In my case, all possible contents of the input XML file have to be classified.
What could a structured way of working through all of these possible XML nodes look like? This seems like a complicated task, and I want to proceed efficiently.
Maybe using the classification tree method is also not the best choice for this task. Are there other/better options?

How to use .rec format for training in MXNet C++ implementation?

The C++ examples of MXNet contain model training examples for MNISTIter and the MNIST data set (.idx3-ubyte or .idx1-ubyte). However, the same code actually recommends using the im2rec tool to produce the data, and that produces a different format, .rec. It looks like the .rec format contains the images and labels in the same file, because im2rec takes a prepared .lst file with both (an index, a label, and an image file name per line).
I wrote code like this:
auto val_iter = MXDataIter("ImageRecordIter");
setDataIter(&val_iter, "Train", vector<string>
    {"output_train.rec", "output_validate.rec"}, batch_size);
with all the files present, but it fails with a segmentation fault because four files are still required in the vector. But why? Shouldn't the labels be inside the .rec file now?
Digging more into the code, I found that setDataIter actually just sets parameters. The parameters for ImageRecordIter can be found here. I tried to set parameters like path_imgrec and path.imgrec and then call .CreateDataIter(), but none of this helped: still a segmentation fault on the first attempt to use the iterator.
I was not able to find a single example anywhere on the Internet of how to train an MXNet neural network in C++ using the .rec file format for the training and validation sets. Is it possible? The only workaround I found is to try the original MNIST tools, which produce the files covered by the MNIST examples.
Eventually I used Mnisten to produce a matching data set, so my input format is now the same as the one the MXNet examples use. Mnisten is a good tool to work with; just don't forget that it normalizes grayscale pixels into the 0..1 range (no longer 0..255).
It is a command-line tool, but with all the C++ code available (and there is not really a lot of it), the converter can also be integrated with the existing code of a project to handle various specifics. I have never been affiliated with this project.

Is it considered good practice to store unit test inputs/expected outputs in a flat file?

I'm finding myself writing a lot of boilerplate for my unit tests. I could cut down on that boilerplate significantly if I stored my unit test inputs along with the expected outputs in a CSV file and directed my test suite to read the inputs from that file, pass them to the function being tested, and then compare its output with the values in the file's expected-output column.
Is this considered good practice?
Instead of storing this in a separate file, I would recommend storing it in some kind of table (probably an array) inside your test code and iterating over that table. Most testing frameworks have specific support for this: in JUnit the feature is called parameterized tests. Then you don't even have to implement the iteration over that set of inputs and expected outputs yourself.
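If you are in Python rather than JUnit, pytest's parametrize decorator expresses the same idea (a sketch; square() stands in for whatever function is under test):

import pytest

def square(x):
    return x * x  # stand-in for the real function under test

@pytest.mark.parametrize("value, expected", [
    (0, 0),
    (2, 4),
    (-3, 9),
])
def test_square(value, expected):
    assert square(value) == expected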

Reduce a Caffe network model

I'd like to use Caffe to extract image features. However, it takes too long to process an image, so I'm looking for ways to optimize for speed.
One thing I noticed is that the network definition I'm using has four extra layers on top of the one from which I'm reading the result (and there are no feedback signals, so they should be safe to delete).
I tried to delete them from the definition file, but it had no effect at all. I guess I might need to remove the corresponding part of the file that contains the pre-trained weights, too. That is, however, a binary file (a protobuf), so editing it is not that easy.
Do you think that removing the four layers might have a profound effect on the network's performance?
If so, how do I get familiar with the file contents so that I can edit it, and how do I know which parts to remove?
First, I don't think removing the binary weights will have any effect.
Second, you can do it easily using the Python interface: see this tutorial.
Last but not least, have you tried running caffe time to measure the performance of your net? This may help you identify the bottlenecks in your computations.
PS,
You might find this thread relevant as well.
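A rough sketch of what the Python-interface route looks like (the prototxt/caffemodel file names are placeholders): since Caffe matches weights to layers by name, loading the original weights with a prototxt from which the four layers have been removed, and then saving, yields a reduced model.

import caffe

# deploy_reduced.prototxt is the original definition with the four
# unneeded layers deleted; weights are copied by layer name, so the
# removed layers are simply skipped when the model is loaded.
net = caffe.Net("deploy_reduced.prototxt", "original.caffemodel", caffe.TEST)
net.save("reduced.caffemodel")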
A caffemodel stores data as key-value pairs. Caffe only copies weights for those layers (in train.prototxt) that have exactly the same name as in the caffemodel. Hence I don't think removing the binary weights will work. If you want to change the network structure, just modify train.prototxt and deploy.txt.
If you insist on removing weights from the binary file, follow this caffe example.
And to make sure you delete the right parts, this visualizing tool should help.
I would retrain on a smaller input size, change strides, etc. However, if you want to reduce the file size, I'd suggest quantizing the weights (https://github.com/yuanyuanli85/CaffeModelCompression) and then using something like LZMA compression (xz on Unix). We do this so we can deploy to mobile devices. 8-bit weights compress nicely.

xsd-based code generator to build xml?

I have a schema (xsd), and I want to create xml files that conform to it.
I've found code generators that produce classes which can be loaded from an XML file (CodeSynthesis), but I'm looking to go in the other direction.
I want to generate code that will let me build an object which can easily be written out as an XML file, in C++. I might be able to use Java for this, but C++ would be preferable. I'm on Solaris, so a Visual Studio plugin (such as xsd2code) won't help me.
Is there a code generator that lets me do this?
To close this out: I did wind up using CodeSynthesis. It worked very well, as long as I used a single XSD as its source. Since I actually had two XSDs (one imported the other), I had to merge them manually (they did some weird inheritance that needed manual massaging).
But yes, Code Synthesis was the way to go.