A lot of code in a current project is directly related to displaying things using a 3rd-party 3D rendering engine. As such, it's easy to say "this is a special case, you can't unit test it". But I wonder if this is a valid excuse... it's easy to think "I am special", but that's rarely actually the case.
Are there types of code which are genuinely not suited for unit testing? By suited, I mean "without it taking longer to figure out how to write the test than the test is worth"... when dealing with a ton of 3D math/rendering, it could take a lot of work to prove that the output of a function is correct, compared with just looking at the rendered graphics.
Code that directly relates to displaying information, generating images and even general UI stuff, is sometimes hard to unit-test.
However that mostly applies only to the very top level of that code. Usually 1-2 method calls below the "surface" is code that's easily unit tested.
For example, it may be nontrivial to test that some information is correctly animated into the dialog box when a validation fails. However, it's very easy to check if the validation would fail for any given input.
Make sure to structure your code in a way that the "non-testable" surface area is well separated from the rest, and write extensive tests for the non-surface code.
The point of unit-testing your rendering code is not to demonstrate that the third-party code does the right thing (that is for integration and regression testing). The point is to demonstrate that your code gives the right instructions to the third-party code. In other words, you only have to control the input of your code layer and verify the output (which would become the input of the renderer).
Of course, you can create a mock version of the renderer that does cheap ASCII graphics or something and verify the pseudo-graphics if you want; that can make the test clearer, but it is not strictly necessary for a unit test of your code.
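For instance, here is a minimal, hand-rolled sketch of that kind of test. The IRenderer interface, RecordingRenderer mock and drawPlayer() function are hypothetical stand-ins for your own wrapper layer and scene code, not part of any real engine:
#include <cassert>
#include <string>
#include <vector>

struct IRenderer {
    virtual void drawMesh(const std::string& mesh, float x, float y, float z) = 0;
    virtual ~IRenderer() = default;
};

// Hand-rolled mock that just records the instructions it receives.
struct RecordingRenderer : IRenderer {
    struct Call { std::string mesh; float x, y, z; };
    std::vector<Call> calls;
    void drawMesh(const std::string& mesh, float x, float y, float z) override {
        calls.push_back({mesh, x, y, z});
    }
};

// Code under test: decides *what* to ask the renderer to do.
void drawPlayer(IRenderer& r, float posX, float posY) {
    r.drawMesh("player", posX, posY, 0.0f);   // the player always sits on the z=0 plane
}

int main() {
    RecordingRenderer mock;
    drawPlayer(mock, 3.0f, 4.0f);
    assert(mock.calls.size() == 1);
    assert(mock.calls[0].mesh == "player");
    assert(mock.calls[0].z == 0.0f);          // verify the instruction, not the pixels
    return 0;
}
A mocking framework would do the same job; the point is only that the assertions check the instructions sent to the renderer, not the pixels it produces.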
If you cannot break your code into units, it is very hard to unit test.
My guess would be that if you have 3D atomic functions (say translate, rotate, and project a point) they should be easily testable - create a set of test points and test whether the transformation takes a point to where it should.
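For example, a tiny sketch of such a test; the Vec3 type and rotateZ() function are hypothetical placeholders for your own math layer:
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Rotate a point about the Z axis by angle radians (code under test).
Vec3 rotateZ(const Vec3& p, double angle) {
    return { p.x * std::cos(angle) - p.y * std::sin(angle),
             p.x * std::sin(angle) + p.y * std::cos(angle),
             p.z };
}

int main() {
    const double kPi = 3.14159265358979323846;
    // Rotating (1,0,0) by 90 degrees should land on (0,1,0).
    Vec3 r = rotateZ({1.0, 0.0, 0.0}, kPi / 2.0);
    assert(std::fabs(r.x - 0.0) < 1e-9);
    assert(std::fabs(r.y - 1.0) < 1e-9);
    assert(std::fabs(r.z - 0.0) < 1e-9);
    return 0;
}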
If you can only reach the 3D code through a limited API, then it would be hard to test.
Please see Misko Hevery's Testability posts and his testability guide.
I think this is a good question. I wrestle with this all the time, and it seems like there are certain types of code that fit into the unit testing paradigm and other types that do not.
What I consider clearly unit-testable is code that obviously has room for being wrong. Examples:
Code to compute hairy math or linear algebra functions. I always write an auxiliary function to check the answers, and run it once in a while.
Hairy data structure code, with cross-references, guids, back-pointers, and methods for incrementally keeping it consistent. These are really easy to break, so unit tests are good for seeing if they are broken.
On the other hand, in code with low redundancy, if the code compiles it may not be clear what being wrong even means. For example, I do pretty complicated UIs using dynamic dialogs, and it's not clear what to test. All the kinds of things like event handling, layout, and showing / hiding / updating of controls that might make this code error-prone are simply dealt with in a well-verified layer underneath.
The kind of testing I find myself needing more than unit testing is coverage testing. Have I tried all the possible features and combinations of features? Since this is a very large space and it is prohibitive to write automated tests to cover it, I often find myself doing Monte Carlo testing instead, where feature selections are chosen at random and submitted to the system. Then the result is examined in an automatic and/or manual way.
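A rough sketch of that Monte Carlo idea, assuming a placeholder System class with a cheap isConsistent() invariant check (the names are made up for illustration):
#include <cassert>
#include <cstdlib>

struct System {
    void apply(int feature) { (void)feature; /* exercise one feature of the real system */ }
    bool isConsistent() const { return true;  /* cheap automated sanity check */ }
};

int main() {
    std::srand(12345);                 // fixed seed so failures are reproducible
    const int kNumFeatures = 20;
    for (int run = 0; run < 1000; ++run) {
        System s;
        int picks = 1 + std::rand() % 5;
        for (int i = 0; i < picks; ++i)
            s.apply(std::rand() % kNumFeatures);
        assert(s.isConsistent());      // the automated part of the check
    }
    return 0;
}
Fixing the seed keeps failures reproducible even though the feature selections are random.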
If you can grab the rendered image, you can unit test it.
Simply render some images with the current codebase, see if they "look right" (examining them down to the pixel if you have to), and store them for comparison. Your unit tests could then compare to those stored images and see if the result is the same.
Whether or not this is worth the trouble, that's for you to decide.
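A minimal sketch of such a golden-image check, assuming the code under test has just written its output to a file (the file names are placeholders):
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

std::vector<unsigned char> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return { std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>() };
}

int main() {
    // Assume the renderer under test has just written "current_render.png".
    std::vector<unsigned char> expected = readFile("golden/test_scene.png");
    std::vector<unsigned char> actual   = readFile("current_render.png");
    assert(!expected.empty());
    assert(actual == expected);   // any pixel change breaks the test
    return 0;
}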
Break down the rendering into steps and test by comparing the frame buffer for each step to known-good images.
No matter what you have, it can be broken down into numbers which can be compared. The real trick is when you have some random number generator in the algorithm, or some other nondeterministic part.
With things like floating point, you might need to subtract the generated data from the expected data and check that the difference is less than some error threshold.
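For example, a small helper along those lines (a sketch, not tied to any particular test framework):
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

bool almostEqual(const std::vector<float>& a,
                 const std::vector<float>& b,
                 float eps) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > eps) return false;   // element-wise tolerance check
    return true;
}

int main() {
    std::vector<float> expected  = {0.0f, 0.5f, 1.0f};
    std::vector<float> generated = {0.0f, 0.5000001f, 0.9999999f};
    assert(almostEqual(expected, generated, 1e-5f));
    return 0;
}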
Well you can't unit test certain kinds of exception code but other than that ...
I've got true unit tests for some code that looks impossible to even attach a test harness to and code that looks like it should be unit testable but isn't.
One of the ways you know your code is not unit testable is when it depends on the physical characteristics of the device it runs on. Another kind of not unit-testable code is direct UI code (and I find a lot of breaks in direct UI code).
I've also got a huge chunk of non unit-testable code that has appropriate integration tests.
I work with audio manipulation, generally using Matlab for prototyping, and C++ for implementation. Recently, I have been reading up on TDD. I have looked over a few basic examples and am quite enthusiastic about the paradigm.
At the moment, I use what I would consider a global 'test-assisted' approach. For this, I write signal processing blocks in C++, and then I make a simple Matlab mex file that can interface with my classes. I subsequently add functionality, checking that the results match up with an equivalent Matlab script as I go. This works OK, but the tests become obsolete quickly as the system evolves. Furthermore, I am testing the whole system, not just units.
It would be nice to use an established TDD framework where I can have a test suite, but I don't see how I can validate the functionality of the processing blocks without tests that are as complex as the code under test. How would I generate the reference signals in a C++ test to validate a processing block without the test being a form of self-fulfilling prophecy?
If anyone has experience in this area, or can suggest some methodologies that I could read into, then that would be great.
I think it's great to apply the TDD approach to signal processing (it would have saved me months of time if I had known about it years ago when I was doing signal processing myself). I think the key is to break down your system into the lowest-level components that can be independently tested, e.g.:
FFTs: test signals at known frequencies: DC, Fs/Nfft, Fs/2 and different phases etc. Check the peaks and phase are as you expect, check the normalisation constant is as you expect
Peak picking: test that you correctly find maxima/minima
Filters: generate input at known frequencies and check the output amplitude and phase is as expected.
You are unlikely to get exactly the same results out between C++ and Matlab, so you'll have to supply error bounds on some of the tests. TDD is a great way of not only verifying the correctness of the code you have but is really useful when trying out different implementations. For example if you want to replace one FFT implementation with another, there are often slight differences with the way the data is packed, or the normalisation constant that is used. TDD will give you a high degree of confidence the new library is correctly integrated.
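As an illustration of the FFT-style test, here is a sketch where a naive DFT stands in for whatever FFT routine you are actually testing; the bin and transform-length choices are arbitrary:
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    const double kPi = 3.14159265358979323846;
    const std::size_t n = x.size();
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            out[k] += x[t] * std::exp(std::complex<double>(0.0, -2.0 * kPi * k * t / n));
    return out;
}

int main() {
    const std::size_t nfft = 64;
    const std::size_t bin = 5;                 // put the tone exactly on bin 5
    const double kPi = 3.14159265358979323846;
    std::vector<double> x(nfft);
    for (std::size_t t = 0; t < nfft; ++t)
        x[t] = std::cos(2.0 * kPi * bin * t / nfft);

    std::vector<std::complex<double>> X = dft(x);
    // A unit-amplitude cosine exactly on bin 5 should have magnitude nfft/2 there...
    assert(std::fabs(std::abs(X[bin]) - nfft / 2.0) < 1e-6);
    // ...and (almost) nothing elsewhere.
    assert(std::abs(X[1]) < 1e-6);
    return 0;
}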
I do something similar for heuristics detection, and we have loads and loads of capture files and a framework to be able to load and inject them for testing. Do you have the possibility to capture the reference signals in a file and do the same?
As for my 2 cents regarding TDD, it's a great way to develop, but as with most paradigms, you don't always have to follow it to the letter; there are times when you should know how to bend the rules a bit, so as not to write too much throw-away code/tests. I read about one approach that said absolutely no code should be written until a test is developed, which at times can be way too strict.
On the other hand, I always like to say: "If it's not tested, it's broken" :)
It's OK for the test to be as complex or more complex than the code under development. If you change (update, refactor, bug fix) the code and not the test, the unit test will warn you that something changed and needs to be reviewed (was a bug fix for mode A supposed to change mode B?, etc.)
Furthermore, you can maintain the APIs for the individual compute components, and not just for the entire end-to-end system.
I've only just started thinking about TDD in the context of signal processing, so I can only add a bit to the previous answers. What I've done is exploit a bit of superposition to test primitives. For example, testing an IIR filter, I independently verified the b0, b1, and b2 elements with unit and scaled gains, and then verified the a1 and a2 elements, which followed easily modeled decays. My test signal was a combination of ramp functions for the numerator and impulse functions for the denominator. I know it's a trivial example, but the process should work for plenty of linear operations. Tests should also exercise unstable regions and show that outputs explode appropriately.
In general, I expect that impulse responses are going to do a lot of the work for me, since many situations will see them reduce to trigonometric functions, which can be independently calculated. Similarly, if your operation has a series expansion, your test function could perform the expansion to a relevant order and compare against your processing block. It'll be slow, but it should work.
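A small sketch in that spirit, with a stand-in Biquad class in place of your real filter implementation: with the feedback path disabled, the impulse response should reproduce the numerator coefficients exactly.
#include <cassert>
#include <cmath>

class Biquad {
public:
    Biquad(double b0, double b1, double b2, double a1, double a2)
        : b0_(b0), b1_(b1), b2_(b2), a1_(a1), a2_(a2) {}
    double process(double x) {
        // Direct form I: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
        double y = b0_ * x + b1_ * x1_ + b2_ * x2_ - a1_ * y1_ - a2_ * y2_;
        x2_ = x1_; x1_ = x;
        y2_ = y1_; y1_ = y;
        return y;
    }
private:
    double b0_, b1_, b2_, a1_, a2_;
    double x1_ = 0, x2_ = 0, y1_ = 0, y2_ = 0;
};

int main() {
    Biquad f(0.5, 0.25, 0.125, 0.0, 0.0);   // numerator only, a1 = a2 = 0
    // The impulse response of a pure FIR section is just its coefficients.
    assert(std::fabs(f.process(1.0) - 0.5)   < 1e-12);
    assert(std::fabs(f.process(0.0) - 0.25)  < 1e-12);
    assert(std::fabs(f.process(0.0) - 0.125) < 1e-12);
    assert(std::fabs(f.process(0.0) - 0.0)   < 1e-12);
    return 0;
}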
I am writing a fairly complicated machine learning program for my thesis in computer vision. It's working fairly well, but I need to keep trying out new things and adding new functionality. This is problematic because I sometimes introduce bugs when I am extending the code or trying to simplify an algorithm.
Clearly the correct thing to do is to add unit tests, but it is not clear how to do this. Many components of my program produce a somewhat subjective answer, and I cannot automate sanity checks.
For example, I had some code that approximated a curve with a lower-resolution curve, so that I could do computationally intensive work on the lower-resolution curve. I accidentally introduced a bug into this code, and only found it through a painstaking search when the results of my entire program got slightly worse.
But, when I tried to write a unit-test for it, it was unclear what I should do. If I make a simple curve that has a clearly correct lower-resolution version, then I'm not really testing out everything that could go wrong. If I make a simple curve and then perturb the points slightly, my code starts producing different answers, even though this particular piece of code really seems to work fine now.
You may not appreciate the irony, but basically what you have there is legacy code: a chunk of software without any unit tests. Naturally you don't know where to begin. So you may find it helpful to read up on handling legacy code.
The definitive thought on this is Michael Feathers' book, Working Effectively with Legacy Code. There used to be a helpful summary of it on the ObjectMentor site, but alas the website has gone the way of the company. However, WELC has left a legacy in reviews and other articles. Check them out (or just buy the book), although the key lessons are the ones which S.Lott and tvanfosson cover in their replies.
2019 update: I have fixed the link to the WELC summary with a version from the Wayback Machine web archive (thanks #milia).
Also - and despite knowing that answers which are mainly links to other sites are low-quality answers :) - here is a link to a (new in 2019) Google tutorial on Testing and Debugging ML code. I hope this will be of illumination to future Seekers who stumble across this answer.
"then I'm not really testing out everything that could go wrong."
Correct.
The job of unit tests is not to test everything that could go wrong.
The job of unit tests is to test that what you have does the right thing, given specific inputs and specific expected results. The important part here is the specific visible, external requirements are satisfied by specific test cases. Not that every possible thing that could go wrong is somehow prevented.
Nothing can test everything that could go wrong. You can write a proof, but you'll be hard-pressed to write tests for everything.
Choose your test cases wisely.
Further, the job of unit tests is to test that each small part of the overall application does the right thing -- in isolation.
Your "code that approximated a curve with a lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation. The integrated whole could also be tested to be sure that it works.
Your "computationally intensive work on the lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation.
The point of unit testing is to create small, correct units that are later assembled.
Without seeing your code, it's hard to tell, but I suspect that you are attempting to write tests at too high a level. You might want to think about breaking your methods down into smaller components that are deterministic and testing these. Then test the methods that use these methods by providing mock implementations that return predictable values from the underlying methods (which are probably located on a different object). Then you can write tests that cover the domain of the various methods, ensuring that you have coverage of the full range of possible outcomes. For the small methods you do so by providing values that represent the domain of inputs. For the methods that depend on these, by providing mock implementations that return the range of outcomes from the dependencies.
Your unit tests need to employ some kind of fuzz factor, either by accepting approximations, or using some kind of probabilistic checks.
For example, if you have some function that returns a floating point result, it is almost impossible to write a test that works correctly across all platforms. Your checks would need to perform the approximation.
TEST_ALMOST_EQ(result, 4.0);
Above TEST_ALMOST_EQ might verify that result is between 3.9 and 4.1 (for example).
Alternatively, if your machine learning algorithms are probabilistic, your tests will need to accommodate that by taking the average of multiple runs and expecting it to be within some range.
double x = 0.0;
for (int i = 0; i < 100; ++i) {
    x += result_probabilistic_test();  // one run of the non-deterministic check
}
double avg = x / 100.0;
TEST_RANGE(avg, 10.0, 15.0);  // expect the average to land in a plausible band
Of course, the tests are non-deterministic, so you will need to tune them such that you get non-flaky tests with high probability (e.g., increase the number of trials, or increase the range of error).
You can also use mocks for this (e.g., a mock random number generator for your probabilistic algorithms), and they usually help for deterministically testing specific code paths, but they are a lot of effort to maintain. Ideally, you would use a combination of fuzzy testing and mocks.
HTH.
Generally, for statistical measures you would build in an epsilon for your answer, i.e., the mean square difference of your points would be < 0.01 or some such. Another option is to run several times and, if it fails "too often", then you have an issue.
Get an appropriate test dataset (maybe a subset of what you're usually using)
Calculate some metric on this dataset (e.g. the accuracy)
Note down the value obtained (cross-validated)
This should give an indication of where to set the threshold
Of course, it can be that when making changes to your code the performance on the dataset will increase a little, but if it ever decreases by a large amount, that would be an indication something is going wrong.
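A rough sketch of that "note the baseline, guard it with a threshold" idea; evaluateAccuracy() is a placeholder for running your model on the fixed, cross-validated dataset, and the numbers are invented:
#include <cassert>

double evaluateAccuracy() {
    // Placeholder: load the fixed dataset, run the algorithm, return accuracy in [0, 1].
    return 0.87;
}

int main() {
    // Baseline measured once on the known-good version; a small wobble is allowed,
    // but a large drop should fail the test.
    const double kBaseline  = 0.86;
    const double kTolerance = 0.02;
    double accuracy = evaluateAccuracy();
    assert(accuracy >= kBaseline - kTolerance);
    return 0;
}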
I'm trying to learn TDD. I've seen examples and discussions about how it's easy to TDD a coffee vending machine firmware from smallest possible functionality up. These examples are either primitive or very well thought-out, it's hard to tell right away. But here's a real world problem.
Linker.
A linker, at its simplest, reads one object file, does magic, and writes one executable file. I don't think I can simplify it further. I do believe the linker design may be evolved, but I have absolutely no idea where to start. Any ideas on how to approach this?
Well, probably the whole linker is too big a problem for the first unit test. I can envision some rough structure beforehand. What a linker does is:
1. Represents an object file as a collection of segments. Segments contain code, data, symbol definitions and references, debug information, etc.
2. Builds a reference graph and decides which segments to keep.
3. Packs the remaining segments into a contiguous address space according to some rules.
4. Relocates references.
My main problem is with bullet 1. Bullets 2, 3, and 4 basically take a regular data structure and convert it into a platform-dependent mess based on some configuration. I can design that, and the design looks feasible. But bullet 1 has to take a platform-dependent mess, in one of several supported formats, and convert it into a regular structure.
The task looks generic enough. It happens everywhere you need to support multiple input formats, be it image processing, document processing, you name it. Is it possible to TDD? It seems like either the test is too simple and I can easily hack it to green, or it's a bit more complex and I need to implement the whole object/image/document format reader, which is a lot of code. And there is no middle ground.
First, have a look at "Growing Object Oriented Software Guided By Tests" by Freeman & Pryce.
Now, my attempt to answer a difficult question in a few lines.
TDD does require you to think (i.e. design) what you're going to do. You have to:
1. Think in small steps. Very small steps.
2. Write a short test to prove that the next small piece of behaviour works.
3. Run the test to show that it fails.
4. Do the simplest thing possible to get the test to pass.
5. Refactor ruthlessly to remove duplication and improve the structure of the code.
6. Run the test(s) again to make sure it all still works.
7. Go back to 1.
An initial idea (design) of how your linker might be structured will guide your initial tests. The tests will enforce a modular design (because each test is only testing a single behaviour, and there should be minimal dependencies on other code you've written).
As you proceed you may find your ideas change. The tests you've already written will allow you to refactor with confidence.
The tests should be simple. It is easy to 'hack' a single test to green. But after each 'hack' you refactor. If you see the need for a new class or algorithm during the refactoring, then write tests to drive out its interface. Make sure that the tests only ever test a single behaviour by keeping your modules loosely coupled (dependency injection, abstract base classes, interfaces, function pointers etc.) and use fakes, stubs and mocks to isolate the code under test from the rest of your system.
Finally use 'customer' tests to ensure that you have delivered functional features.
It's a difficult change in mind-set, but a lot of fun and very rewarding. Honest.
You're right, a linker seems a bit bigger than a 'unit' to me, and TDD does not excuse you from sitting down and thinking about how you're going to break down your problem into units. The Sudoku saga is a good illustration of what goes wrong if you don't think first!
Concentrating on your point 1, you have already described a good collection of units (of functionality) by listing the kinds of things that can appear in segments, and hinting that you need to support multiple formats. Why not start by dealing with a simple case like, say, a file containing just a data segment in the binary format of your development platform? You could simply hard-code the file as a binary array in your test, and then check that it interprets just that correctly. Then pick another simple case, and test for that. Keep going.
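To make that concrete, here is a sketch with an invented toy format (a 4-byte little-endian length followed by raw data bytes); your real test would hard-code the simplest valid file of a format you actually support:
#include <cassert>
#include <cstdint>
#include <vector>

struct Segment { std::vector<uint8_t> data; };

// Code under test: parse a single data segment out of a raw byte image.
Segment parseDataSegment(const std::vector<uint8_t>& file) {
    assert(file.size() >= 4);
    uint32_t len = file[0] | (file[1] << 8) | (file[2] << 16) | (file[3] << 24);
    Segment s;
    s.data.assign(file.begin() + 4, file.begin() + 4 + len);
    return s;
}

int main() {
    // A "file" containing one 3-byte data segment: AA BB CC.
    std::vector<uint8_t> file = {0x03, 0x00, 0x00, 0x00, 0xAA, 0xBB, 0xCC};
    Segment s = parseDataSegment(file);
    assert(s.data.size() == 3);
    assert(s.data[0] == 0xAA && s.data[2] == 0xCC);
    return 0;
}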
Now the magic bit is that pretty soon you'll see repeated structures in your code and in your tests, and because you've got tests you can be quite aggressive about refactoring it away. I suspect this is the bit that you haven't experienced yet, because you say "It seems like either test is too simple and I easily hack it to green, or it's a bit more complex and I need to implement the whole object/image/document format reader which is a lot of code. And there is no middle ground." The point is that you should hack them all to green, but as you're doing that you are also searching out the patterns in your hacks.
I wrote a (very simple) compiler in this fashion, and it mostly worked quite well. For each syntactic construction, I wrote the smallest program that I could think of which used it in some observable way, and had the test compile the program and check that it worked as expected. I used a proper parser generator as you can't plausibly TDD your way into one of them (you need to use a little forethought!) After about three cycles, it became obvious that I was repeating the code to walk the syntax tree, so that was refactored into something like a Visitor.
I also had larger-scale acceptance tests, but in the end I don't think these caught much that the unit tests didn't.
This is all very possible.
An example off the top of my head is NHAML.
It is an ASP.NET view engine that converts plain text to .NET native code.
You can have a look at source code and see how it is tested.
I guess what I do is come up with layers and blocks and sub-divide to the point where I might be thinking about code and then start writing tests.
I think your tests should be quite simple: it's not the individual tests that are the power of TDD but the sum of the tests.
One of the principles I follow is that a method should fit on a screen - when that's the case, the tests are usually simple enough.
Your design should allow you to mock out lower layers so that you're only testing one layer.
TDD is about specification, not testing.
From your simplest spec of a linker, your TDD test just has to check whether an executable file has been created during the linker magic when you feed it an object file.
Then you write a linker that makes your test succeed, e.g.:
check whether input file is an object file
if so, generate a "Hello World!" executable (note that your spec didn't specify that different object files would produce different executables)
Then you refine your spec and your TDD (these are your four bullets).
As long as you can write a specification you can write TDD test cases.
Recently, I have worked in a project were TDD (Test Driven Development) was used. The project was a web application developed in Java and, although unit-testing web applications may not be trivial, it was possible using mocking (we have used the Mockito framework).
Now I will start a project where I will use C++ to work with image processing (mostly image segmentation), and I'm not sure whether using TDD is a good idea. The problem is that it is very hard to tell whether the result of a segmentation is right or not, and the same problem applies to many other image processing algorithms.
So, what I would like to know is if someone here has successfully used TDD with image processing algorithms (not necessarily segmentation algorithms).
At a minimum you can use the tests for regression testing. For example, suppose you have 5 test images for a particular segmentation algorithm. You run the 5 images through the code and manually verify the results. The results, when correct, are stored on disk somewhere, and future executions of these tests compare the generated results to the stored results.
That way, if you ever make a breaking change, you'll catch it, but more importantly you only have to go through a (correct) manual test cycle once.
Whenever I do any computer-vision related development, TDD is almost standard practice. You have images and something you want to measure. Step one is to hand-label a (large) subset of the images. This gives you test data. The process (for full correctness) is then to divide your test set in two, a "development set" and a "verification set". You do repeated development cycles until your algorithm is accurate enough when applied to the development set. Then you verify the result on the verification set (so that you're not overtraining on some weird aspect of your development set).
This is test driven development at its purest.
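A rough sketch of that split; labelledAccuracy() is a placeholder for "run the algorithm on these hand-labelled images and score it against the labels", and the thresholds are invented:
#include <cassert>
#include <vector>

struct LabelledImage { /* image data plus its hand-made ground truth */ };

double labelledAccuracy(const std::vector<LabelledImage>& set) {
    (void)set;   // placeholder: apply the algorithm to each image and compare to its label
    return 0.9;
}

int main() {
    std::vector<LabelledImage> all(200);
    // Fixed split: the first half drives development, the second half is only for verification.
    std::vector<LabelledImage> dev(all.begin(), all.begin() + 100);
    std::vector<LabelledImage> verification(all.begin() + 100, all.end());

    double devScore = labelledAccuracy(dev);
    double verScore = labelledAccuracy(verification);
    assert(devScore >= 0.85);              // good enough on the development set
    assert(verScore >= devScore - 0.05);   // and not wildly worse on unseen data
    return 0;
}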
Note that you're testing two different things when developing heavily algorithm dependent software like this.
The regular bugs you'll get in your software. These can be tested using "normal" TDD techniques
The performance of your algorithm, for which you need a system outlined above.
A program can be bug free according to (1) but not quite according to (2). For example, a very simple image segmentation algorithm says: "the left half of the image is one segment, the right half is another segment." This program can be made bug free according to (1) quite easily. It is another matter entirely whether it satisfies your performance needs. Don't confuse the two aspects, and don't let one interfere with the other.
More specifically, I'd advise you to develop the algorithm first, buggy warts and all, and then use TDD with the algorithm (not the code!) and perhaps other requirements of the software as the specification for a separate TDD process. Doing unit tests for small temporary helper functions deep within some reasonably complex algorithm under heavy development is a waste of time and effort.
TDD in image processing only makes sense for deterministic problems like:
image arithmetic
histogram generation
and so on..
However TDD is not suitable for feature extraction algorithms like:
edge detection
segmentation
corner detection
... since no algorithm can solve this kind of problem perfectly for all images.
I think the best you can do is test the simple, mathematically well-defined building blocks your algorithm consists of, like linear filters, morphological operations, FFT, wavelet transforms etc. These are often tricky enough to implement efficiently and correctly for all border cases so verifying them does make sense.
For an actual algorithm like image segmentation, TDD doesn't make much sense IMHO. I don't even think unit-tests make sense here. Sure, you can write tests, but those will always be extremely fragile. A typical image processing algorithm needs a few parameters that have to be adjusted for the desired results (a process that can't be automated, and can't be done before the algorithm is working). The results of a segmentation algorithm aren't well defined either, but your unit test can only test for some well-defined property. An algorithm can have that property without doing what you want, or the other way round, so your test result isn't very informative. Also, to test the results of a segmentation algorithm you need to write a lot of pretty hard code, while verifying the results visually is pretty easy and you have to do it anyway.
I think in a way it's similar to unit-testing user interfaces: Testing the actual well-defined functionality (e.g. when the user clicks this button, some item is added to this list and this label shows that text...) is relatively easy and can save a lot of work and debugging. But no test in the world will tell you if your UI is usable, understandable or pretty, because these things just aren't well defined.
We had some discussion on this very same "problem", with many of the remarks mentioned in the comments below the answers here.
We came to the conclusion that TDD in computer vision / image processing (concerning the global goal of segmentation, detection or something like that) could be:
1. Get an image/sequence that should be processed and create a test for that image: the desired output and a metric to tell how far your result may differ from that "ground truth".
2. Get another image/sequence for a different setting (different lighting, different objects or something like that), where your algorithm fails, and write a test for that.
3. Improve your algorithm in a way that it solves all previous tests.
4. Go back to 2.
No idea whether this is applicable; creating the tests will be much more complex than in traditional TDD, since it might be hard to define the allowed differences between your ground truth and your algorithm's output.
Probably it's better to just use some kind of "quality-driven development", where your changes simply shouldn't make things "worse" than before (you again have to find a metric for that).
Obviously you can still use traditional unit testing for the deterministic parts of those algorithms, but that's not the real problem of "TDD in signal processing".
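As a sketch of the kind of test described in step 1, here is a comparison of a binary segmentation mask against a hand-made reference mask, with a threshold on the fraction of disagreeing pixels (the masks and the threshold are placeholders):
#include <cassert>
#include <cstddef>
#include <vector>

double disagreement(const std::vector<int>& result, const std::vector<int>& truth) {
    assert(result.size() == truth.size());
    std::size_t diff = 0;
    for (std::size_t i = 0; i < result.size(); ++i)
        if (result[i] != truth[i]) ++diff;
    return static_cast<double>(diff) / result.size();
}

int main() {
    std::vector<int> groundTruth = {0, 0, 1, 1, 1, 0, 0, 1};  // hand-labelled mask
    std::vector<int> algorithm   = {0, 0, 1, 1, 0, 0, 0, 1};  // what the code produced
    // Allow up to 15% of the pixels to differ from the ground truth.
    assert(disagreement(algorithm, groundTruth) <= 0.15);
    return 0;
}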
The image processing tests that you describe in your question take place at a much higher level than most of the tests that you will write using TDD.
In a true Test Driven Development process you will first write a failing test before adding any new functionality to your software, then write the code that causes the test to pass, rinse and repeat.
This process yields a large library of Unit Tests, sometimes with more LOC of tests than functional code!
Because your analytic algorithms have structured behavior, they would be an excellent match for a TDD approach.
But I think the question you are really asking is "how do I go about executing a suite of Integration Tests against fuzzy image processing software?" You might think I am splitting hairs, but this distinction between Unit Tests and Integration Tests really gets to the heart of what Test Driven Development means. The benefits of the TDD process come from the rich supporting fabric of Unit Tests more than anything else.
In your case I would compare the Integration Test suite to automated performance metrics against a web application. We want to accumulate a historical record of execution times, but we probably don't want to explicitly fail the build for a single poorly performing execution (which might have been affected by network congestion, disk I/O, whatever). You might set some loose tolerances around performance of your test suite and have the Continuous Integration server kick out daily reports that give you a high level overview of the performance of your algorithm.
I'd say TDD is much easier in such an application than in a web one. You have a completely deterministic algorithm you have to test. You don't have to worry about fuzzy stuff like user input and HTML rendering.
Your algorithm consists of a number of steps. Each of these steps can be tested. If you give them fixed, known input, they should yield fixed, known output. So write a test for that. You can't test that the algorithm "is correct" in general, but you can give it data for which you've already precomputed the correct result, so you can verify that it yields the correct output in that case.
I am not really into your problem, so I don't know its hot spots. However, the final result of your algorithm is hopefully deterministic, so you can perform functional testing on it. Of course, you will have to determine a "known good" result. I know of TDD performed on graphic libraries (VTK, to be precise). The comparison is done on the final result image, pixel by pixel. Without going in so much detail, if you have a known good result, you can perform an md5 of the test result and compare it against the md5 of the known-good.
For unit testing, I am pretty sure you can test individual routines. This will force you to have a very fine-grained development style.
Might want to take a look at this paper
If your goal is to optimize an algorithm rather than verify correctness, you need a metric. A good metric would measure the performance criteria underlying your algorithm. For a segmentation algorithm this could be the sum of the standard deviations of the pixel data within each segment. Using the metric you can use threshold levels of acceptance or rank versions of the algorithm.
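A small sketch of that metric, computed over a toy image (the data layout is a placeholder; lower values mean more homogeneous segments):
#include <cmath>
#include <cstddef>
#include <iostream>
#include <map>
#include <vector>

double segmentStdDevSum(const std::vector<double>& pixels,
                        const std::vector<int>& labels) {
    // Group pixel values by the segment label they were assigned.
    std::map<int, std::vector<double>> bySegment;
    for (std::size_t i = 0; i < pixels.size(); ++i)
        bySegment[labels[i]].push_back(pixels[i]);

    double total = 0.0;
    for (const auto& kv : bySegment) {
        const std::vector<double>& v = kv.second;
        double mean = 0.0;
        for (double p : v) mean += p;
        mean /= v.size();
        double var = 0.0;
        for (double p : v) var += (p - mean) * (p - mean);
        total += std::sqrt(var / v.size());   // standard deviation of this segment
    }
    return total;
}

int main() {
    std::vector<double> pixels = {10, 11, 12, 200, 201, 199};
    std::vector<int> labels    = {0, 0, 0, 1, 1, 1};
    // A sensible segmentation of this data keeps the metric small.
    std::cout << "metric = " << segmentStdDevSum(pixels, labels) << "\n";
    return 0;
}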
You can use a statistical approach where you have many examples and correct outcomes, and the test runs all of them and evaluates the algorithm on them. It then produces a single number that is the combined success rate of all of them.
This way you are less sensitive to specific failures and your test is more robust.
You can then use a threshold on the success rate to see if the test failed or not.
I'm thinking of the case where the program doesn't really compute anything, it just DOES a lot. Unit testing makes sense to me when you're writing functions which calculate something and you need to check the result, but what if you aren't calculating anything? For example, a program I maintain at work relies on having the user fill out a form, then opening an external program, and automating the external program to do something based on the user input. The process is fairly involved. There's like 3000 lines of code (spread out across multiple functions*), but I can't think of a single thing which it makes sense to unit test.
That's just an example though. Should you even try to unit test "procedural" programs?
Based on your description these are the places I would look to unit test:
Does the form validation of user input work correctly?
Given valid input from the form, is the external program called correctly?
Feed in user input to the external program and see if you get the right output
From the sounds of your description, the real problem is that the code you're working with is not modular. One of the benefits I find with unit testing is that code which is difficult to test is either not modular enough or has an awkward interface. Try to break the code down into smaller pieces and you'll find places where it makes sense to write unit tests.
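For example, a minimal sketch of testing the "is the external program called correctly" piece: hide the external program behind an interface so a test can substitute a fake and inspect what your code would have sent to it (all the names here are hypothetical):
#include <cassert>
#include <string>
#include <vector>

struct IExternalProgram {
    virtual void run(const std::string& command) = 0;
    virtual ~IExternalProgram() = default;
};

// Test double that records the commands instead of launching anything.
struct FakeExternalProgram : IExternalProgram {
    std::vector<std::string> commands;
    void run(const std::string& command) override { commands.push_back(command); }
};

// Code under test: turn validated form input into instructions for the program.
void processForm(const std::string& userName, IExternalProgram& program) {
    program.run("open-project " + userName);
    program.run("export-report");
}

int main() {
    FakeExternalProgram fake;
    processForm("alice", fake);
    assert(fake.commands.size() == 2);
    assert(fake.commands[0] == "open-project alice");
    return 0;
}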
I'm not an expert on this but have been confused for a while for the same reason. Somehow the applications I'm doing just don't fit to the examples given for UNIT testing (very asynchronous and random depending on heavy user interaction)
I realized recently (and please let me know if I'm wrong) that it doesn't make sense to make a sort of global test, but rather a myriad of small tests for each component. The easiest approach is to build the tests at the same time as, or even before, creating the actual procedures.
Do you have 3000 lines of code in a single procedure/method? If so, then you probably need to refactor your code into smaller, more understandable pieces to make it maintainable. When you do this, you'll have those parts that you can and should unit test. If not, then you already have those pieces -- the individual procedures/methods that are called by your main program.
Even without unit tests, though, you should still write tests for the code to make sure that you are providing the correct inputs to the external program and testing that you handle the outputs from the program correctly under both normal and exceptional conditions. Techniques used in unit testing -- like mocking -- can be used in these integration tests to ensure that your program is operating correctly without involving the external resource.
An interesting "cut point" for your application is that you say "the user fills out a form." If you want to test, you should refactor your code to construct an explicit representation of that form as a data structure. Then you can start collecting forms and testing that the system responds appropriately to each form.
It may be that the actions taken by your system are not observable until something hits the file system. Here are a couple of ideas:
Set up something like a git repository for the initial state of the file system, run a form, and look at the output of git diff. It's likely this is going to feel more like regression testing than unit testing.
Create a new module whose only purpose is to make your program's actions observable. This can be as simple as writing relevant text to a log file or as complex as you like. If necessary, you can use conditional compilation or linking to ensure this module does something only when the system is under test. This is closer to traditional unit testing as you can now write tests that say upon receiving form A, the system should take sequence of actions B. Obviously you have to decide what actions should be observed to form a reasonable test.
I suspect you'll find yourself migrating toward something that looks more like regression testing than unit testing per se. That's not necessarily bad. Don't overlook code coverage!
(A final parenthetical remark: in the bad old days of interactive console applications, Don Libes created a tool called Expect, which was enormously helpful in allowing you to script a program that interacted like a user. In my opinion we desperately need something similar for interacting with web pages. I think I'll post a question about this :-)
You don't necessarily have to implement automated tests that test individual methods or components. You could implement an automated unit test that simulates a user interacting with your application, and test that your application responds in the correct way.
I assume you are manually testing your application currently, if so then think about how you could automate that and work from there. Over time you should be able to break your tests into progressively smaller chunks that test smaller sections of code. Any sort of automated testing is usually a lot better than nothing.
Most programs (regardless of the language paradigm) can be broken into atomic units which take input and provide output. As the other responders have mentioned, look into refactoring the program and breaking it down into smaller pieces. When testing, focus less on the end-to-end functionality and more on the individual steps in which data is processed.
Also, a unit doesn't necessarily need to be an individual function (though this is often the case). A unit is a segment of functionality which can be tested using inputs and measuring outputs. I've seen this when using JUnit to test Java APIs. Individual methods might not necessarily provide the granularity I need for testing, though a series of method calls will. Therefore, the functionality I regard as a "unit" is a little greater than a single method.
You should at least refactor out the stuff that looks like it might be a problem and unit test that. But as a rule, a function shouldn't be that long. You might find something that is unit test worthy once you start refactoring
Good object mentor article on TDD
As a few have answered before, there are a few ways you can test what you have outlined.
First, the form input can be tested in a few ways.
What happens if invalid data is entered? Valid data? Etc.
Then each of the functions can be tested to see whether, when supplied with various forms of correct and incorrect data, they react in the proper manner.
Next you can mock the applications that are being called, so that you can make sure your application sends and processes data to and from the external programs correctly. Don't forget to make sure your program deals with unexpected data from the external program as well.
Usually, the way I figure out how I want to write tests for a program I have been assigned to maintain is to see what I do manually to test the program, and then try to figure out how to automate as much of it as possible. Also, don't restrict your testing tools just to the programming language you are writing the code in.
I think a wave of testing paranoia is spreading :) It's good to examine things to see if tests would make sense; sometimes the answer is going to be no.
The only thing that I would test is making sure that bogus form input is handled correctly. I really don't see where else an automated test would help. I think you'd want the test to be non-invasive (i.e. no record is actually saved during testing), so that might rule out the other few possibilities.
If you can't test something how do you know that it works? A key to software design is that the code should be testable. That may make the actual writing of the software more difficult, but it pays off in easier maintenance later.