How to extract the active code path from a complex algorithm - C++

I have been puzzled lately by an intriguing idea.
I wonder if there is a (known) method to extract the executed source code from a large, complex algorithm. I will try to elaborate on this question:
Scenario:
There is a complex algorithm that a large number of people have worked on for many years. The algorithm creates measurement descriptions for a complex measurement device.
The input for the algorithm is a large set of input parameters; let's call this the recipe.
Based on this recipe, the algorithm is executed, and the recipe determines which functions, loops and if-then-else constructions are followed within the algorithm. When the algorithm is finished, a set of calculated measurement parameters forms the output. With these output measurement parameters the device can perform its measurement.
Now, there is a problem. Since the algorithm has become so complex and large over time, it is very, very difficult to find your way around it when you want to add new functionality for the recipes. Basically a person wants to modify only the functions and code blocks that are affected by their recipe, but they have to dig through the whole algorithm and analyze the code to see which parts are relevant for their recipe; only after that process can new functionality be added in the right place. Even for simple additions, people tend to get lost in the huge amount of complex code.
Solution: Extract the active code path?
I have been brainstorming on this problem, and I think it would be great if there were a way to process the algorithm with the input parameters (the recipe), and to extract only the active functions and code blocks into a new set of source files or code structure. I'm actually talking about extracting real source code here.
When the active code is extracted and isolated, the result is a subset of source code that is only a fraction of the original source code structure, and it will be much easier for a person to analyze the code, understand it, and make modifications. Eventually the changes could be merged back into the original source code of the algorithm, or maybe the modified extracted source code could even be executed on its own, as a 'lite' version of the original algorithm.
Extra information:
We are talking about an algorithm with C and C++ code, about 200 files, and maybe 100K lines of code. The code is compiled and built with a custom Visual Studio based build environment.
So...:
I really don't know if this idea is just naive and stupid, or if it is feasible with the right amount of software engineering. I can imagine that there have been similar situations in the world of software engineering, but I just don't know.
I have quite some experience with software engineering, but definitely not at the level of designing large and complex systems.
I would appreciate any kind of answer, suggestion or comment.
Thanks in advance!

Other naysayers say you can't do this. I disagree.
A standard static analysis is to determine control and data flow paths through code. Sometimes such a tool must make assumptions about what might happen, so such analyses tend to be "conservative" and can include more code than the true minimum. But any elimination of irrelevant code sounds like it will help you.
Furthermore, you could extract the control and data flow paths for a particular program input. Then where the extraction algorithm is unsure about what might happen, it can check what the particular input would have caused to happen. This gives a more precise result at the price of having to provide valid inputs to the tool.
Finally, using a test coverage tool, you can fairly easily determine the code exercised by a particular input of interest, and the code exercised by another input for a case that is not so interesting, and compute the set difference. This gives you the code exercised by the interesting case that is not in common with the uninteresting case.
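As a rough illustration of that set-difference step, here is a minimal sketch (the one-location-per-line file format, e.g. "file.cpp:123", is an assumption; real coverage tools have their own report formats):

#include <fstream>
#include <iostream>
#include <set>
#include <string>

// Load a coverage report: one covered code location per line.
std::set<std::string> load(const char* path) {
    std::set<std::string> locations;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line); )
        locations.insert(line);
    return locations;
}

int main(int argc, char** argv) {
    if (argc < 3) return 1;
    std::set<std::string> interesting = load(argv[1]);  // the interesting run
    std::set<std::string> boring = load(argv[2]);       // the uninteresting run
    for (const std::string& loc : interesting)
        if (boring.count(loc) == 0)
            std::cout << loc << '\n';  // exercised only by the interesting input
    return 0;
}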
My company builds program analysis tools (see my bio). We do static analysis to extract control and data flow paths in C++ source code, and could fairly easily light up the code involved. We also make C++ test coverage tools that can collect the interesting and uninteresting sets, and show you the difference superimposed over the source code.

I'm afraid what you are trying is mathematically impossible. The problem is that this
When the algorithm is finished, a set of calculated measurement parameters will form the output.
is impossible to determine by static code analysis.
What you're running into is essentially a variant of the Halting Problem, for which it has been proven that there cannot be an algorithm/program that can determine whether an algorithm passed into it will yield a result in finite time.

Related

I need help developing a polymorphism engine - instruction dependency trees

I am currently trying to write a polymorphism engine in C++ to toy around with a neat anti-hacking stay-alive checking idea I have. However, writing the polymorphism engine is proving rather difficult - I haven't even established how I should go about doing it. The idea is to stream executable code to the user (that is, the application I am protecting) and occasionally send them some code that runs some checksums on the memory image and returns the result to the server. The problem is that I don't want someone to simply hijack or programmatically crack the stay-alive check; instead each one would be generated on the server, working off of a simple stub of code and running it through a polymorphism engine. Each stay-alive check would return a value dependent on the checksum of the data and a random algorithm sneaked inside of the stay-alive check. If the stub returns incorrectly, I know that the stay-alive check has been tampered with.
What I have to work with:
* The executable image's PDB file.
* An assembler & disassembler engine, between which I have implemented an interface that allows me to relocate code, etc.
Here are the steps I was thinking of taking and how I might do them. I am using the x86 instruction set on a Windows PE executable.
Steps I plan on taking (my problem is with step 2):
Expand instructions
Find simple instructions like mov or push and replace them with a couple of instructions that achieve the same end, though with more instructions. In this step I'll also add loads of junk code.
I plan on doing this just by using a series of translation tables in a database, as sketched below. This shouldn't be very difficult to do.
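A minimal sketch of what such a translation table might look like (the patterns and equivalents here are illustrative only; note that several of these equivalents clobber EFLAGS where the original did not, which a real engine would have to track and preserve):

#include <cstdlib>
#include <map>
#include <string>
#include <vector>

using Sequence = std::vector<std::string>;

// Each simple instruction maps to one or more longer equivalents.
// Caution: some equivalents alter CPU flags; this sketch ignores that.
std::map<std::string, std::vector<Sequence>> expansions = {
    {"mov eax, 0", {{"xor eax, eax"},
                    {"push 0", "pop eax"}}},
    {"push eax",   {{"sub esp, 4", "mov [esp], eax"}}},
};

Sequence expand(const std::string& instr) {
    auto it = expansions.find(instr);
    if (it == expansions.end())
        return {instr};                          // no rule: keep instruction as-is
    const std::vector<Sequence>& alts = it->second;
    return alts[std::rand() % alts.size()];      // pick a random equivalent form
}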
Shuffling
This is the part I have the most trouble with. I need to isolate the code into functions. Then I need to establish a series of instruction dependency trees, and then I need to relocate them based upon which ones depend on the others. I can find functions by parsing the PDB files, but creating instruction dependency trees is the tricky part I am totally lost on.
Compress instructions
Compress instructions and implement a series of uncommon & obscure instructions in the process. And, like the first step, do this by using a database of code signatures.
To clarify again, I need help performing step number 2 and am unsure how I should even begin. I have tried making some diagrams, but they become very confusing to follow.
OH: And obviously the protected code is not going to be very optimal - but this is just a security project I wanted to play with for school.
I think what you are after for "instruction dependency trees" is data flow analysis. This is classic compiler technology that determines for each code element (primitive operations in a programming language), what information is delivered to it from other code elements. When you are done, you end up with what amounts to a graph with nodes being code elements (in your case, individual instructions) and directed arcs between them showing what information has to flow so that the later elements can execute on results produced by "earlier" elements in the graph.
You can see some examples of such flow analysis at my website (focused on a tool that does program analysis; this tool is probably not appropriate for binary analysis, but the examples should be helpful).
There's a ton of literature in the compiler books on doing data flow analysis. See any compiler text book.
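To make the idea concrete, here is a toy sketch for straight-line code: remember, per register, which instruction last wrote it, and draw an arc from that writer to every later reader. The Instr encoding is invented for this sketch; a real x86 model needs full operand and side-effect information (see the issues below).

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Instr {
    std::string text;                 // e.g. "add eax, ebx"
    std::vector<std::string> reads;   // registers this instruction reads
    std::vector<std::string> writes;  // registers this instruction writes
};

int main() {
    std::vector<Instr> code = {
        {"mov eax, 1",   {},             {"eax"}},
        {"mov ebx, 2",   {},             {"ebx"}},
        {"add eax, ebx", {"eax", "ebx"}, {"eax"}},
        {"push eax",     {"eax"},        {}},
    };

    std::map<std::string, int> last_writer;  // register -> instruction index
    for (int i = 0; i < (int)code.size(); ++i) {
        for (const std::string& r : code[i].reads) {
            auto it = last_writer.find(r);
            if (it != last_writer.end())      // dependency arc: writer -> reader
                std::cout << it->second << " -> " << i << "  (via " << r << ")\n";
        }
        for (const std::string& w : code[i].writes)
            last_writer[w] = i;
    }
}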
There are a number of issues you need to handle:
Parsing the code to extract the code elements. You sound like you have access to all the instructions already.
Determining the operands required by, and the values produced by, each code element. This is pretty simple for "ADD register,register", but you may find this daunting for a production x86 CPU, which has an astonishingly big and crazy instruction set. You have to collect this for every instruction the CPU might execute, and that means pretty much all of them. Nonetheless, this is just sweat and a lot of time spent looking at the instruction reference manuals.
Loops. Values can flow from an instruction, through other instructions, back to that same instruction, so the dataflows can form cycles (lots of cycles for complex loops). The dataflow literature will tell you how to handle this in terms of computing the dataflow arcs in the graph. What these mean for your protection scheme I don't know.
Conservative Analysis: You can't get the ideal data flow, because in fact you are analyzing arbitrary algorithms (e.g., a Turing machine); pointers aggravate this problem pretty severely, and machine code is full of pointers. So what data flow analysis engines often do, when unable to decide if "x feeds y", is simply assume "x (may) feed y". The dataflow graph turns conceptually from "x (must) feed y" arcs into pragmatic "x (may) feed y" arcs; the literature is in fact full of "must" and "may" algorithms because of this.
Again, the literature tells you many ways to do [conservative] flow analysis (mostly having different degrees of conservatism; in fact the most conservative data flow analysis simply says "every x feeds every y"!). What this means in practice for your scheme, I don't know.
There are a lot of people interested in binary code analysis (e.g., the NSA), and they do data flow analysis on machines instructions complete with pointer analysis. You might find this presentation interesting: http://research.cs.wisc.edu/wisa/presentations/2002/0114/gogul/gogul.1.14.02.pdf
I'm not sure that what you are trying helps to prevent tampering with a process. If someone attaches a debugger (process) and breaks on the send/receive functions, the checksum of the memory image stays intact, all the shuffling stays as it is, and the client will be seen as valid even if it isn't. This debugger or injected code is able to manipulate what you see when you ask which pages are used by your process (so you won't see injected code, since it wouldn't tell you the pages in which it resides).
To your actual question:
Couldn't the shuffling be implemented by relinking the executable? The linker keeps track of all the symbols that a .o file exports and imports. When all the .o files are read, the real addresses of the functions are put into the imported placeholders. If you put every function in a separate .cpp file and compile each one to its own .o file, then by reordering the .o files in the linker call, all the functions will end up at different addresses and the executable will still run fine.
I tested this with gcc on Windows - and it works. By reordering the .o files when linking, all functions are put at different addresses.
I can find functions by parsing the PDB files, but creating instruction dependency trees is the tricky part I am totally lost on.
Impossible. Welcome to the Halting Problem.

difference between code coverage and profiling

What is the difference between code coverage and profiling?
Which is the best open source tool for code coverage?
Code coverage is an assessment of how much of your code has been run. This is used to see how well your tests have exercised your code.
Profiling is used to see how various parts of your code perform.
The tools depend on the language and platform you are using. I'm guessing that you are using Java, so I recommend CodeCover. Though you may find NoUnit easier to use.
Coverage is important to see which parts of the code have not been run.
In my experience it has to be accumulated over multiple use cases, because any single run of the software will only use some of the code.
Profiling means different things at different times. Sometimes it means measuring performance. Sometimes it means diagnosing memory leaks. Sometimes it means getting visibility into multi-threading or other low-level activities.
When the goal is to improve software performance, by finding so-called "bottlenecks" and fixing them, don't just settle for any profiler, not even necessarily a highly-recommended or venerable one.
It is essential to use the kind that gets the right kind of information and presents it to you the right way, because there is a lot of confusion about this.
More on that subject.
Added:
For a coverage tool, I've always done it myself. In nearly every routine and basic block, I insert a call like this: Utils.CovTest("file name, routine name, comment that tells what's being done here").
The routine records the fact that it was called, and when the program finishes, all those comments are appended to a text file.
Then there's a post-processing step where that file is "subtracted" from a complete list (gotten by a grep-like program).
The result is a list of what hasn't been tested, requiring additional test cases.
When not doing coverage testing, Utils.CovTest does nothing. I leave it out of the innermost loops anyway, so it doesn't affect performance much.
In C and C++, I do it with a macro that, during normal use, expands to nothing.
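A minimal sketch of that macro scheme (the names COVERAGE_TEST, COV_TEST and cov_dump are hypothetical; define COVERAGE_TEST only in coverage builds):

#ifdef COVERAGE_TEST
  #include <fstream>
  #include <set>
  #include <string>

  // One entry per probe that fired during the run.
  inline std::set<std::string>& cov_tags() {
      static std::set<std::string> tags;
      return tags;
  }

  // Call once at program exit to write out everything that was hit.
  inline void cov_dump(const char* path) {
      std::ofstream out(path);
      for (const std::string& t : cov_tags())
          out << t << '\n';
  }

  #define COV_TEST(tag) cov_tags().insert(tag)
#else
  #define COV_TEST(tag) ((void)0)   // normal builds: expands to nothing
#endif

// Usage inside a routine:
//   COV_TEST("file name, routine name, comment that tells what's being done here");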

Unit Testing Machine Learning Code

I am writing a fairly complicated machine learning program for my thesis in computer vision. It's working fairly well, but I need to keep trying out new things and adding new functionality. This is problematic because I sometimes introduce bugs when I am extending the code or trying to simplify an algorithm.
Clearly the correct thing to do is to add unit tests, but it is not clear how to do this. Many components of my program produce a somewhat subjective answer, and I cannot automate sanity checks.
For example, I had some code that approximated a curve with a lower-resolution curve, so that I could do computationally intensive work on the lower-resolution curve. I accidentally introduced a bug into this code, and only found it through a painstaking search when the results of my entire program got slightly worse.
But, when I tried to write a unit-test for it, it was unclear what I should do. If I make a simple curve that has a clearly correct lower-resolution version, then I'm not really testing out everything that could go wrong. If I make a simple curve and then perturb the points slightly, my code starts producing different answers, even though this particular piece of code really seems to work fine now.
You may not appreciate the irony, but basically what you have there is legacy code: a chunk of software without any unit tests. Naturally you don't know where to begin. So you may find it helpful to read up on handling legacy code.
The definitive thought on this is Michael Feathers' book, Working Effectively with Legacy Code. There used to be a helpful summary of that on the ObjectMentor site, but alas the website has gone the way of the company. However, WELC has left a legacy in reviews and other articles. Check them out (or just buy the book), although the key lessons are the ones which S.Lott and tvanfosson cover in their replies.
2019 update: I have fixed the link to the WELC summary with a version from the Wayback Machine web archive (thanks @milia).
Also - and despite knowing that answers which comprise mainly links to other sites are low quality answers :) - here is a link to a new (2019 new) Google tutorial on Testing and Debugging ML code. I hope this will be of illumination to future Seekers who stumble across this answer.
"then I'm not really testing out everything that could go wrong."
Correct.
The job of unit tests is not to test everything that could go wrong.
The job of unit tests is to test that what you have does the right thing, given specific inputs and specific expected results. The important part here is that specific visible, external requirements are satisfied by specific test cases. Not that every possible thing that could go wrong is somehow prevented.
Nothing can test everything that could go wrong. You can write a proof, but you'll be hard-pressed to write tests for everything.
Choose your test cases wisely.
Further, the job of unit tests is to test that each small part of the overall application does the right thing -- in isolation.
Your "code that approximated a curve with a lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation. The integrated whole could also be tested to be sure that it works.
Your "computationally intensive work on the lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation.
The point of unit testing is to create small, correct units that are later assembled.
Without seeing your code, it's hard to tell, but I suspect that you are attempting to write tests at too high a level. You might want to think about breaking your methods down into smaller components that are deterministic and testing these. Then test the methods that use these methods by providing mock implementations that return predictable values from the underlying methods (which are probably located on a different object). Then you can write tests that cover the domain of the various methods, ensuring that you have coverage of the full range of possible outcomes. For the small methods you do so by providing values that represent the domain of inputs. For the methods that depend on these, by providing mock implementations that return the range of outcomes from the dependencies.
Your unit tests need to employ some kind of fuzz factor, either by accepting approximations, or using some kind of probabilistic checks.
For example, if you have some function that returns a floating point result, it is almost impossible to write a test that works correctly across all platforms. Your checks would need to perform the approximation.
TEST_ALMOST_EQ(result, 4.0);
Above, TEST_ALMOST_EQ might verify that result is between 3.9 and 4.1 (for example).
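One possible definition of such a macro, assuming a fixed absolute tolerance of 0.1:

#include <cassert>
#include <cmath>

// Hypothetical helper: passes when result is within +/-0.1 of expected.
#define TEST_ALMOST_EQ(result, expected) \
    assert(std::fabs((result) - (expected)) <= 0.1)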
Alternatively, if your machine learning algorithms are probabilistic, your tests will need to accommodate for it by taking the average of multiple runs and expecting it to be within some range.
double x = 0.0;
for (int i = 0; i < 100; ++i) {
    x += result_probabilistic_test();  // each individual run is non-deterministic
}
double avg = x / 100.0;
TEST_RANGE(avg, 10.0, 15.0);  // pass if the average lands within [10.0, 15.0]
Of course, the tests are non-deterministic, so you will need to tune them such that you get non-flaky tests with high probability (e.g., increase the number of trials, or widen the allowed range of error).
You can also use mocks for this (e.g., a mock random number generator for your probabilistic algorithms), and they usually help for deterministically testing specific code paths, but they are a lot of effort to maintain. Ideally, you would use a combination of fuzzy testing and mocks.
HTH.
Generally, for statistical measures you would build an epsilon into your answer, i.e. the mean square difference of your points would be < 0.01 or some such. Another option is to run several times and if it fails "too often" then you have an issue.
Get an appropriate test dataset (maybe a subset of what you're usually using)
Calculate some metric on this dataset (e.g. the accuracy)
Note down the value obtained (cross-validated)
This should give an indication of what to set the threshold to
Of course it can be that when making changes to your code the performance on the dataset increases a little, but if it ever decreases by a large amount, that is an indication something is going wrong.
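A small sketch of such a regression check (the baseline and tolerance values are placeholders you would record from a known-good, cross-validated run):

#include <cassert>

// Fail only when the metric drops clearly below the recorded baseline.
void check_no_regression(double accuracy) {
    const double baseline  = 0.87;  // noted down from a known-good run (placeholder)
    const double tolerance = 0.02;  // small fluctuations are acceptable
    assert(accuracy >= baseline - tolerance && "metric regressed significantly");
}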

Is it possible to use TDD with image processing algorithms?

Recently, I worked on a project where TDD (Test Driven Development) was used. The project was a web application developed in Java and, although unit-testing web applications may not be trivial, it was possible using mocking (we used the Mockito framework).
Now I will start a project where I will use C++ to work with image processing (mostly image segmentation), and I'm not sure whether using TDD is a good idea. The problem is that it is very hard to tell whether the result of a segmentation is right or not, and the same problem applies to many other image processing algorithms.
So, what I would like to know is if someone here has successfully used TDD with image processing algorithms (not necessarily segmentation algorithms).
At a minimum you can use the tests for regression testing. For example, suppose you have 5 test images for a particular segmentation algorithm. You run the 5 images through the code and manually verify the results. The results, when correct, are stored on disk somewhere, and future executions of these tests compare the generated results to the stored results.
That way, if you ever make a breaking change, you'll catch it, but more importantly you only have to go through a (correct) manual test cycle once.
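A minimal sketch of that golden-file comparison (the file paths and the byte-for-byte check are assumptions; this only works if the algorithm is deterministic and the output format is stable):

#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

std::vector<char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return {std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>()};
}

// Compare the freshly generated result to the manually verified one on disk.
void regression_test(const std::string& generated, const std::string& golden) {
    assert(read_file(generated) == read_file(golden) && "segmentation output changed");
}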
Whenever I do any computer-vision related development, TDD is almost standard practice. You have images and something you want to measure. Step one is to hand-label a (large) subset of the images. This gives you test data. The process (for full correctness) is then to divide your test set in two: a "development set" and a "verification set". You do repeated development cycles until your algorithm is accurate enough when applied to the development set. Then you verify the result on the verification set (so that you're not overtraining on some weird aspect of your development set).
This is test driven development at its purest.
Note that you're testing two different things when developing heavily algorithm-dependent software like this.
The regular bugs you'll get in your software. These can be tested using "normal" TDD techniques.
The performance of your algorithm, for which you need a system like the one outlined above.
A program can be bug free according to (1) but not quite according to (2). For example, a very simple image segmentation algorithm says: "the left half of the image is one segment, the right half is another segment." This program can be made bug free according to (1) quite easily. It is another matter entirely whether it satisfies your performance needs. Don't confuse the two aspects, and don't let one interfere with the other.
More specifically, I'd advise you to develop the algorithm first, buggy warts and all, and then use TDD with the algorithm (not the code!) and perhaps other requirements of the software as the specification for a separate TDD process. Doing unit tests for small temporary helper functions deep within some reasonably complex algorithm under heavy development is a waste of time and effort.
TDD in image processing only makes sense for deterministic problems like:
image arithmetic
histogram generation
and so on..
However TDD is not suitable for feature extraction algorithms like:
edge detection
segmentation
corner detection
... since no algorithm can solve these kinds of problems for all images perfectly.
I think the best you can do is test the simple, mathematically well-defined building blocks your algorithm consists of, like linear filters, morphological operations, FFT, wavelet transforms etc. These are often tricky enough to implement efficiently and correctly for all border cases so verifying them does make sense.
For an actual algorithm like image segmentation, TDD doesn't make much sense IMHO. I don't even think unit-tests make sense here. Sure, you can write tests, but those will always be extremely fragile. A typical image processing algorithm needs a few parameters that have to be adjusted for the desired results (a process that can't be automated, and can't be done before the algorithm is working). The results of a segmentation algorithm aren't well defined either, but your unit test can only test for some well-defined property. An algorithm can have that property without doing what you want, or the other way round, so your test result isn't very informative. Also, to test the results of a segmentation algorithm you need to write a lot of pretty hard code, while verifying the results visually is pretty easy and you have to do it anyway.
I think in a way it's similar to unit-testing user interfaces: Testing the actual well-defined functionality (e.g. when the user clicks this button, some item is added to this list and this label shows that text...) is relatively easy and can save a lot of work and debugging. But no test in the world will tell you if your UI is usable, understandable or pretty, because these things just aren't well defined.
We had some discussion on this very same "problem", with many remarks along the lines of those in the answers here.
We came to the conclusion that TDD in computer vision / image processing (concerning the global goal of segmentation, detection or something like that) could be:
get an image/sequence that should be processed and create a test for that image: desired output and a metric to tell how far your result may differ from that "ground truth".
get another image/sequence for a different setting (different lighting, different objects or something like that), where your algorithm fails and write a test for that.
improve your algorithm in a way that it solves all previous tests.
go back to 2.
No idea whether this is applicable in general; creating the tests will be much more complex than in traditional TDD, since it might be hard to define the allowed differences between your ground truth and your algorithm's output.
Probably it's better to just use some QualityDrivenDevelopment where your changes just shouldn't make things "worse" (again, you have to find a metric for that) than before.
Obviously you can still use traditional unit testing for the deterministic parts of those algorithms, but that's not the real problem of "TDD-in-signal-processing".
The image processing tests that you describe in your question take place at a much higher level than most of the tests that you will write using TDD.
In a true Test Driven Development process you will first write a failing test before adding any new functionality to your software, then write the code that causes the test to pass, rinse and repeat.
This process yields a large library of Unit Tests, sometimes with more LOC of tests than functional code!
Because your analytic algorithms have structured behavior, they would be an excellent match for a TDD approach.
But I think the question you are really asking is "how do I go about executing a suite of Integration Tests against fuzzy image processing software?" You might think I am splitting hairs, but this distinction between Unit Tests and Integration Tests really gets to the heart of what Test Driven Development means. The benefits of the TDD process come from the rich supporting fabric of Unit Tests more than anything else.
In your case I would compare the Integration Test suite to automated performance metrics against a web application. We want to accumulate a historical record of execution times, but we probably don't want to explicitly fail the build for a single poorly performing execution (which might have been affected by network congestion, disk I/O, whatever). You might set some loose tolerances around performance of your test suite and have the Continuous Integration server kick out daily reports that give you a high level overview of the performance of your algorithm.
I'd say TDD is much easier in such an application than in a web one. You have a completely deterministic algorithm you have to test. You don't have to worry about fuzzy stuff like user input and HTML rendering.
Your algorithm consists of a number of steps. Each of these steps can be tested. If you give them fixed, known input, they should yield fixed, known output. So write a test for that. You can't test that the algorithm "is correct" in general, but you can give it data for which you've already precomputed the correct result, so you can verify that it yields the correct output in that case.
I am not really into your problem, so I don't know its hot spots. However, the final result of your algorithm is hopefully deterministic, so you can perform functional testing on it. Of course, you will have to determine a "known good" result. I know of TDD performed on graphics libraries (VTK, to be precise). The comparison is done on the final result image, pixel by pixel. Without going into so much detail, if you have a known good result, you can compute an md5 of the test result and compare it against the md5 of the known-good.
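As a sketch of that fingerprint idea, using a simple FNV-1a byte hash as a stand-in for md5 (the known-good value below is a placeholder you would record from a verified run):

#include <cassert>
#include <cstdint>
#include <vector>

// Tiny 64-bit FNV-1a hash over the raw pixel bytes.
uint64_t fnv1a(const std::vector<uint8_t>& bytes) {
    uint64_t h = 0xcbf29ce484222325ull;
    for (uint8_t b : bytes) { h ^= b; h *= 0x100000001b3ull; }
    return h;
}

void check_result_image(const std::vector<uint8_t>& pixels) {
    const uint64_t known_good = 0x0123456789abcdefull;  // placeholder value
    assert(fnv1a(pixels) == known_good && "result image differs from known-good");
}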
For unit testing, I am pretty sure you can test individual routines. This will force you to have a very fine-grained development style.
Might want to take a look at this paper
If your goal is to optimize an algorithm rather than verify correctness, you need a metric. A good metric would measure the performance criteria underlying your algorithm. For a segmentation algorithm this could be the sum of standard deviations of pixel data within each segment. Using the metric, you can use threshold levels of acceptance or rank versions of the algorithm.
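A sketch of that metric (the types are assumptions: one label and one intensity per pixel; lower scores mean more homogeneous segments):

#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Sum over segments of the standard deviation of pixel values in the segment.
double segmentation_metric(const std::vector<int>& labels,
                           const std::vector<double>& pixels) {
    std::map<int, std::vector<double>> segments;
    for (std::size_t i = 0; i < labels.size(); ++i)
        segments[labels[i]].push_back(pixels[i]);

    double total = 0.0;
    for (const auto& entry : segments) {
        const std::vector<double>& vals = entry.second;
        double mean = 0.0;
        for (double v : vals) mean += v;
        mean /= vals.size();
        double var = 0.0;
        for (double v : vals) var += (v - mean) * (v - mean);
        total += std::sqrt(var / vals.size());
    }
    return total;  // rank versions of the algorithm by this score
}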
You can use a statistical approach where you have many examples and correct outcomes, and the test runs all of them and evaluates the algorithm on them. It then produces a single number that is the combined success rate of all of them.
This way you are less sensitive to specific failures and your test is more robust.
You can then use a threshold on the success rate to see if the test failed or not.
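A sketch of that combined-success-rate test (the Example type, the run_case callback and the 95% threshold are all assumptions):

#include <cassert>
#include <functional>
#include <vector>

template <typename Example>
void test_success_rate(const std::vector<Example>& examples,
                       const std::function<bool(const Example&)>& run_case) {
    int passed = 0;
    for (const Example& e : examples)
        if (run_case(e)) ++passed;               // true when the output is acceptable
    double rate = static_cast<double>(passed) / examples.size();
    assert(rate >= 0.95 && "combined success rate fell below the threshold");
}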

Testing approach for algorithms with complex outputs

How do you test the result of a program that is basically a black box? For example, one year ago I had to write a B-tree as homework and I really struggled with testing its correctness. What strategies do you use in such scenarios? Visualization? Robust input-->result sets of testing data? What do you do when it is hard to get such data because the only way to get it is your properly working program?
EDIT: I think that my question was misunderstood. There was no problem with understanding how a B-tree works. That is trivial. But writing robust tests for validating its proper functionality is not so trivial. I think that this school problem is similar to many practical, real-world scenarios and test cases. And sometimes understanding the domain is quite different from delivering a working and correct program...
EDIT2: And yes, with a B-tree it is possible to validate proper behavior with pen and paper. But this is really dirty and not fun :) This does not work well with problems that require huge amounts of data for their validation...
I'm not sure these answers really capture the problem at hand. A B-tree's input and output aren't any different from those of any other dictionary, but the algorithm performs better if it's implemented correctly. It's only really got two functions to test (add and find), so theoretically "black-box" testing of this single component should be fine. Designing for testability isn't the issue, since no matter how you do it the whole algorithm will be one component.
So the question is: when you have to implement subtle algorithms, the kinds with complicated output that you can't always understand in your head so well, how do you test them? I think there are three different strategies you can use:
Black-box test basic functionality. For the B-tree case, this is things like cwash suggested, and also, things like making sure that when you add an item, you can then find it, etc.
Test certain invariants that your algorithm should maintain (the B-tree should be balanced, values within nodes should be sorted, etc.); see the sketch after this list.
A few, small "pencil-and-paper" tests may be necessary -- work the algorithm out by hand and check that it matches what your code does. But the big-data tests can all be of type 2. These can also be brittle, so unless you need to be really sure about your algorithm, you may want to avoid them.
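For strategy 2, a sketch of such an invariant check (the node layout is an assumption; it verifies that keys within each node are sorted and that every leaf sits at the same depth):

#include <cstddef>
#include <vector>

struct Node {
    std::vector<int> keys;
    std::vector<Node*> children;  // empty for leaf nodes
};

// Call as: int leaf_depth = -1; check_invariants(root, 0, leaf_depth);
bool check_invariants(const Node* n, int depth, int& leaf_depth) {
    for (std::size_t i = 1; i < n->keys.size(); ++i)
        if (n->keys[i - 1] >= n->keys[i]) return false;  // keys must be sorted
    if (n->children.empty()) {                           // leaf: depths must agree
        if (leaf_depth < 0) leaf_depth = depth;
        return depth == leaf_depth;
    }
    for (const Node* c : n->children)
        if (!check_invariants(c, depth + 1, leaf_depth)) return false;
    return true;
}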
If you do not grasp the problem at hand, how can you develop a solution to it? My suggestion would be to understand the domain enough to be able to work out the problem on paper and ensure that your program matches.
Consult with an expert on the subject.
I know if I have a convoluted procedure I'm trying to fix, I have no idea what the output should be after my changes, so I need to consult a fellow developer with more knowledge of the business need, and they are able to verify what I've done is correct.
I would focus on constructing test cases that exercise the functionality of your B-tree algorithm. I haven't looked at it for years, but I'm fairly sure you'll be able to find a documented sequence of steps to insert a set of values in a specific order, then validate that the leaf nodes are as they should be. If you construct your testing along these lines, you should be able to prove your implementation is correct.
The key is to know there is a balance between testing something to death and doing tests that adequately cover what should be covered. Edge cases, e.g. null inputs or checking that inputs are numeric by testing an alphabet character or a punctuation character, are likely most of the tests you'd need. To complement this, there may be one or two common cases to handle to show the program can handle a non-edge case as well. Covering all valid input in most programs is overkill and would result in an overwhelmingly large number of tests.
I think the answer to the question you're asking boils down to designing for testability. Often you get a testable design for free when you test-drive the development of the solution. But let's face it, when you're implementing a highly mathematical algorithm, this just doesn't fall out.
To make sure you have a testable design, you need to understand what a seam is. Then you need to know a few rules of thumb, such as avoiding statics, using polymorphism, and properly decomposing problems and separating concerns.
Watch "The Clean Code Talks -- Unit Testing" by Misko Hevery, I think it will help you wrap your head around it.
Try looking at it from a requirements point of view, rather than an implementation point of view. Before you write code, you must understand exactly what you want it to do.
Testing and requirements should be a matching pair. If you're having trouble defining tests, maybe it's because the requirements are not well-defined. That in turn implies that you may have bugs that aren't so much implementation bugs, but "lack of clear requirements" bugs. The code writer in that case would be working to a mental list of requirements that he/she thinks is requirements, but can't be sure, and they're not written down for independent understanding and verification.
I've struggled with software where the requirements weren't clear, because the customer couldn't even tell us what they wanted. But when we delivered to them, they sure could tell us then what they didn't like about it! A big part of software engineering is getting the requirements right before the coding begins. This is true on the high-level (overall product, with requirement input from customer) and also the smaller level (modules, individual functions, where requirements are internally defined by software team or individuals). It is still true to some degree I think for iterative development, although the high-level requirements are more fluid.
@Bystrik Jurina,
I often get involved in projects which involve conversions between disparate data formats. Most answers have focused on testing a B-tree or similar algorithm, but it seems that you're looking for a more general answer.
Most of my work is based on the command line. It may sound like a contradiction, but one of the first tools I use is visualization. I'll write some methods to write out my data structures in a format that's easy to consume. This can (and usually does) include something that's visually clear. But often it also means something that I could easily parse with a smaller test program, or even import into Excel.
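For example, a tiny dump-for-inspection helper along those lines (the types are assumptions; CSV so the output can be eyeballed, diffed, or pulled into Excel):

#include <fstream>
#include <map>
#include <string>

// Write key/value pairs as CSV for easy inspection or import into Excel.
void dump_csv(const std::map<std::string, double>& data, const char* path) {
    std::ofstream out(path);
    out << "key,value\n";
    for (const auto& entry : data)
        out << entry.first << ',' << entry.second << '\n';
}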
I'll start by focusing on the basic outline, and write a program that does the bare minimum of what I need to accomplish. If it's a multi-step process, this might mean implementing one step at a time and validating the results of each step before moving on. Or writing something that works only in specific cases, and then expanding the set of cases where it's expected to work. At first you can validate that the code works in a limited set of cases, such as for known input data. As the project moves forward, you can start logging warnings for cases you might not have tested, or for unexpected types of input data. This has drawbacks, but is a nice approach when you're dealing with a known set of input data.
Validation techniques can include formal test cases, or informal programs that work to challenge your assumptions. It could mean writing a basic driver program to exercise the "core" routines. A good example would be to add a record to a database, then read it back and compare the original object against the one loaded from the database.
If you have trouble wrapping your head around the way a program functions, think about what it needs to accomplish. It might be easier to write code that tests the way different inputs produce different outputs. Producing visualizations is a good help, because the act of deciding how to display the data can make you think about different conditions and focus in on the most critical parts of your data structures.
Often I've found that building a visualization brings me to admit that the way the data is being stored just isn't very clear. For a B-tree, the representation isn't very flexible. But for other cases, you may be using parallel arrays when a nested tree of objects would be more natural.