This is a difficult and open-ended question I know, but I thought I'd throw it to the floor and see if anyone had any interesting suggestions.
I have developed a code generator that takes our Python interface to our C++ code (generated via SWIG) and generates the code needed to expose this as web services. When I developed this code I did it using TDD, but I've found my tests to be brittle as hell. Because each test essentially wanted to verify that for a given bit of input code (which happens to be a C++ header) I'd get a given bit of output code, I wrote a small engine that reads test definitions from XML input files and generates test cases from these expectations.
The problem is that I dread going in to modify the code at all. That, and the fact that the unit tests themselves are (a) complex and (b) brittle.
So I'm trying to think of alternative approaches to this problem, and it strikes me I'm perhaps tackling it the wrong way. Maybe I need to focus more on the outcome, i.e. does the code I generate actually run and do what I want it to, rather than does the code look the way I want it to.
Has anyone got any experiences of something similar to this they would care to share?
I started writing up a summary of my experience with my own code generator, then went back and re-read your question and found you had already touched upon the same issue yourself: focus on the execution results instead of the code layout/look.
The problem is that this is hard to test: the generated code might not be suited to actually run in the environment of the unit test system, and how do you encode the expected results?
I've found that you need to break down the code generator into smaller pieces and unit test those. Unit testing a full code generator is more like integration testing than unit testing if you ask me.
Recall that "unit testing" is only one kind of testing. You should be able to unit test the internal pieces of your code generator. What you're really looking at here is system level testing (a.k.a. regression testing). It's not just semantics... there are different mindsets, approaches, expectations, etc. It's certainly more work, but you probably need to bite the bullet and set up an end-to-end regression test suite: fixed C++ files -> SWIG interfaces -> python modules -> known output. You really want to check the known input (fixed C++ code) against expected output (what comes out of the final Python program). Checking the code generator results directly would be like diffing object files...
Yes, results are the ONLY thing that matters. The real chore is writing a framework that allows your generated code to run independently... spend your time there.
If you are running on *nix you might consider dumping the unittest framework in favor of a bash script or makefile. On Windows you might consider building a shell app/function that runs the generator and then uses the code (as another process), and unit test that.
A third option would be to generate the code and then build an app from it that includes nothing but a unittest. Again you would need a shell script or whatnot to run this for each input. As to how to encode the expected behavior, it occurs to me that it could be done in much the same way as you would for the C++ code just using the generated interface rather than the C++ one.
Just wanted to point out that you can still achieve fine-grained testing while verifying the results: you can test individual chunks of code by nesting them inside some setup and verification code:
int x = 0;          /* setup: known starting state */
GENERATED_CODE      /* the generated chunk is spliced in here */
assert(x == 100);   /* verification: check its observable effect */
Provided you have your generated code assembled from smaller chunks, and the chunks do not change frequently, you can exercise more conditions and test a little better, and hopefully avoid having all your tests break when you change specifics of one chunk.
Unit testing is just that: testing a specific unit. So if you are writing a specification for class A, it is ideal if class A does not have the real concrete versions of classes B and C.
OK, I noticed afterwards that the tags for this question include C++/Python, but the principles are the same:
public class A : InterfaceA
{
    private InterfaceB _b;
    private InterfaceC _c;

    public A(InterfaceB b, InterfaceC c)
    {
        this._b = b;
        this._c = c;
    }

    public string SomeOperation(string input)
    {
        return this._b.SomeOtherOperation(input)
             + this._c.EvenAnotherOperation(input);
    }
}
Because the above System A injects interfaces to systems B and C, you can unit test just system A, without having real functionality being executed by any other system. This is unit testing.
Here is a clean way of approaching a system from creation to completion, with a different When specification for each piece of behaviour:
public class When_system_A_has_some_operation_called_with_valid_input : SystemASpecification
{
    private string _actualString;
    private string _expectedString;
    private string _input;
    private string _returnB;
    private string _returnC;

    [It]
    public void Should_return_the_expected_string()
    {
        _actualString.Should().Be.EqualTo(this._expectedString);
    }

    public override void GivenThat()
    {
        var randomGenerator = new RandomGenerator();
        this._input = randomGenerator.Generate<string>();
        this._returnB = randomGenerator.Generate<string>();
        this._returnC = randomGenerator.Generate<string>();
        Dep<InterfaceB>().Stub(b => b.SomeOtherOperation(_input))
                         .Return(this._returnB);
        Dep<InterfaceC>().Stub(c => c.EvenAnotherOperation(_input))
                         .Return(this._returnC);
        this._expectedString = this._returnB + this._returnC;
    }

    public override void WhenIRun()
    {
        this._actualString = Sut.SomeOperation(this._input);
    }
}
So in conclusion, a single unit / specification can have multiple behaviours, and the specification grows as you develop the unit / system; and if your system under test depends on other concrete systems within it, watch out.
My recommendation would be to figure out a set of known input-output results, such as some simpler cases that you already have in place, and unit test the code that is produced. It's entirely possible that as you change the generator, the exact string that is produced may be slightly different... but what you really care about is whether it is interpreted in the same way. Thus, if you test the results as you would test that code if it were your feature, you will find out whether it succeeds in the ways you want.
Basically, what you really want to know is whether your generator will produce what you expect without physically testing every possible combination (which is impossible anyway). By ensuring that your generator is consistent in the ways you expect, you can feel better that the generator will succeed in ever-more-complex situations.
In this way, you can also build up a suite of regression tests (unit tests that need to keep working correctly). This will help you make sure that changes to your generator aren't breaking other forms of code. When you encounter a bug that your unit tests didn't catch, you may want to add a test for it to prevent similar breakage.
I find that you need to test what you're generating more than how you generate it.
In my case, the program generates many types of code (C#, HTML, SCSS, JS, etc.) that compile into a web application. The best way I've found to reduce regression bugs overall is to test the web application itself, not by testing the generator.
Don't get me wrong, there are still unit tests checking out some of the generator code, but our biggest bang for our buck has been UI tests on the generated app itself.
Since we're generating it, we also generate a nice abstraction in JS we can use to programmatically test the app. We followed some ideas outlined here: http://code.tutsplus.com/articles/maintainable-automated-ui-tests--net-35089
The great part is that it really tests your system end-to-end, from code generation out to what you're actually generating. Once a test fails, it's easy to track it back to where the generator broke.
It's pretty sweet.
Good luck!
As I was doing test-driven development I pondered whether a hypothetical program could be completely developed from generated code based on tests; i.e. could there be a generator that creates the code specifically to pass the tests? Would the future of programming languages just be to write tests?
I think this would be a tough one as, at least for the initial generations of such technology, developers would be very skeptical of generated code's correctness. So human review would have to be involved as well.
As a simple illustration of what I mean, suppose you write 10 tests for a function, with sample inputs and expected outputs covering every scenario you can think of. A program could trivially generate code which passed all of these tests with nothing more than a rudimentary switch statement (your ten inputs matched to their expected outputs). This code would obviously not be correct, but it would take a human to see that.
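To make that concrete, here is a hypothetical sketch in Python (the values are invented) of what such trivially generated, test-passing code could look like:

# A lookup table of the sample inputs mapped to their expected outputs,
# standing in for a rudimentary switch statement.
KNOWN_ANSWERS = {1: 2, 2: 4, 3: 6}  # one entry per test case

def generated_double(x):
    # Green on every test, yet it implements no real logic at all.
    return KNOWN_ANSWERS[x]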
That's just a simple example. It isn't hard to imagine more sophisticated programs which might not generate a switch statement but still produce solutions that aren't actually correct, and which could be wrong in much more subtle ways. Hence my suggestion that any technology along these lines would be met with a deep level of skepticism, at least at first.
If code can be generated completely, then the basis of the generator would have to be a specification that exactly describes the code. This generator would then be something like a compiler that cross-compiles one language into another.
Tests are not such a language. They only assert that a specific aspect of the code functionality is valid and unchanged. By doing so they scaffold the code so that it does not break, even when it is refactored.
But how would I compare these two ways of development?
1) If the generator works correctly, then the specification is always transferred into correct code. I postulate that this code is tested by design and needs no additional tests. Better to TDD the generator than the generated code.
2) Whether you have a specification that leads to generated code or specifications expressed as tests that ensure that code works is quite equivalent in my eyes.
3) You can combine both ways of development. Generate a program framework with a tested generator from a specification and then enrich the generated code by using TDD. Attention: you then have two different development cycles running in one project. That means you have to ensure that you can always regenerate the generated code when specifications change, and that your additional code still fits correctly into the generated code.
Just one small example: imagine a tool that can generate code from a UML class diagram. This could be done in a way that lets you develop the methods with TDD, but the structure of the classes is defined in UML and you would not need to test that again.
While it may be possible sometime in the future, simple tests can be used to generate code:
assertEquals(someclass.get_value(), true)
but getting the correct output from a black-box integration test is what I would guess is an NP-complete problem:
assertEquals(someclass.do_something(1), file_content(/some/file))
assertEquals(someclass.do_something(2), file_content(/some/file))
assertEquals(someclass.do_something(2), file_content(/some/file2))
assertEquals(someclass.do_something(3), file_content(/some/file2))
Does this mean that the resulting code will always write to /some/file? Does it mean that the resulting code should always write to /some/file2? Either could be true. What if it needs to only do the minimal set to get the tests to pass? Without knowing the context and writing very exact and bounding tests, no code could figure out (at this point in time) what the test author intended.
A few days ago I started getting interested in unit testing and TDD in C# and VS2010. I've read blog posts, watched YouTube tutorials, and plenty more stuff that explains why TDD and unit testing are so good for your code, and how to do it.
But the biggest problem I find is, that I don't know what to check in my tests and what not to check.
I understand that I should check all the logical operations, and problems with references and dependencies, but, for example, should I create a unit test for string formatting that's supposed to be user input? Or is it just wasting my time when I can just check it in the actual code?
Is there any guide to clarify this problem?
In TDD every line of code must be justified by a failing test-case written before the code.
This means that you cannot develop any code without a test-case. If you have a line of code (condition, branch, assignment, expression, constant, etc.) that can be modified or deleted without causing any test to fail, it means this line of code is useless and should be deleted (or you have a missing test to support its existence).
That is a bit extreme, but this is how TDD works. That being said if you have a piece of code and you are wondering whether it should be tested or not, you are not doing TDD correctly. But if you have a string formatting routine or variable incrementation or whatever small piece of code out there, there must be a test case supporting it.
UPDATE (use-case suggested by Ed.):
Like for example, adding an object to a list and creating a test to see if it is really inside or there is a duplicate when the list shouldn't allow them.
Here is a counterexample, you would be surprised how hard it is to spot copy-paste errors and how common they are:
private Set<String> inclusions = new HashSet<String>();
private Set<String> exclusions = new HashSet<String>();

public void include(String item) {
    inclusions.add(item);
}

public void exclude(String item) {
    inclusions.add(item);  // copy-paste bug: should be exclusions.add(item)
}
On the other hand, testing the include() and exclude() methods alone is overkill, because they do not represent any use-cases by themselves. However, they are probably part of some business use-case, which you should test instead.
Obviously you shouldn't test whether x in x = 7 is really 7 after assignment. Also, testing generated getters/setters is overkill. But it is the easiest code that most often breaks, all too often due to copy-and-paste errors or typos (especially in dynamic languages).
See also:
Mutation testing
Your first few TDD projects are going to probably result in worse design/redesign and take longer to complete as you are learning (at least in my experience). This is why you shouldn't jump into using TDD on a large critical project.
My advice is to use "pure" TDD (acceptance/unit test everything test-first) on a few small projects (100-10,000 LOC). Either do the side projects on your own or if you don't code in your free time, use TDD on small internal utility programs for your job.
After you do "pure" TDD on about 6-12 projects, you will start to understand how TDD affects design and learn how to design for testability. Once you know how to design for testability, you will need to TDD less and maximize the ROI of unit, regression, acceptance, etc. tests rather than test everything up front.
For me, TDD is more of teaching method for good code design than a practical methodology. However, I still TDD logic code and unit test instead of debug.
There is no simple answer to this question. There is the law of diminishing returns in action, so achieving perfect coverage is seldom worth it. Knowing what to test is a thing of experience, not rules. It’s best to consciously evaluate the process as you go. Did something break? Was it feasible to test? If not, is it possible to rewrite the code to make it more testable? Is it worth it to always test for such cases in the future?
If you split your code into models, views and controllers, you’ll find that most of the critical code is in the models, and those should be fairly testable. (That’s one of the main points of MVC.) If a piece of code is critical, I test it, even if it means that I would have to rewrite it to make it more testable. If a piece of code is easy to get wrong or get broken by future updates, it gets a test. I seldom test controllers and views, as it’s not proving worth the trouble for me.
The way I see it all of your code falls into one of three buckets:
Code that is easy to test: This includes your own deterministic public methods.
Code that is difficult to test: This includes GUI, non-deterministic methods, private methods, and methods with complex setup.
Code that you don't want to test: This includes 3rd party code, and code that is difficult to test and not worth the effort.
Of the three, you should focus on testing the easy code. The difficult-to-test code should be refactored into two parts: code that you don't want to test and easy code. And of course, you should test the refactored easy code.
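As a small illustration, here is a sketch in Python (the names are hypothetical) of refactoring a hard-to-test method into a pure, easy-to-test function plus a thin wrapper you choose not to test:

def format_greeting(name):
    # Easy code: pure logic, trivially unit tested.
    return "Hello, %s!" % name.strip().title()

def show_greeting_dialog(gui, name):
    # Difficult code: a thin GUI wrapper, deliberately left untested.
    gui.popup(format_greeting(name))

def test_format_greeting():
    assert format_greeting("  ada lovelace ") == "Hello, Ada Lovelace!"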
I think you should only unit test entry points to behavior of the system. This includes public methods, public accessors, and public fields, but not constants (constant fields, enums, methods, etc.). It also includes any code which directly deals with IO; I explain why further below.
My reasoning is as follows:
Everything that's public is basically an entry point to a behavior of the system. A unit test should therefore be written that guarantees that the expected behavior of that entry point works as required. You shouldn't test all possible ways of calling the entry point, only the ones that you explicitly require. Your unit tests are therefore also the specs of what behavior your system supports and your documentation of how to use it.
Things that are not public can basically be deleted/re-factored at will with no impact to the behavior of the system. If you were to test those, you'd create a hard dependency from your unit test to that code, which would prevent you from doing refactoring on it. That's why you should not test anything else but public methods, fields and accessors.
Constants by design are not behavior, but axioms. A unit test that verifies a constant is itself a constant, so it would only be duplicated code and useless effort to write a test for constants.
So to answer your specific example:
should I create a unit test for string formatting that's supposed to be user input?
Yes, absolutely. All methods which receive or send external input/output (which can be summed up as receiving IO), should be unit tested. This is probably the only case where I'd say non-public things that receive IO should also be unit tested. That's because I consider IO to be a public entry. Anything that's an entry point to an external actor I consider public.
So unit test public methods, public fields, public accessors, even when those are static constructs and also unit test anything which receives or sends data from an external actor, be it a user, a database, a protocol, etc.
NOTE: You can write temporary unit tests on non public things as a way for you to help make sure your implementation works. This is more of a way to help you figure out how to implement it properly, and to make sure your implementation works as you intend. After you've tested that it works though, you should delete the unit test or disable it from your test suite.
Kent Beck, in Extreme Programming Explained, said you only need to test the things that need to work in production.
That's a brusque way of encapsulating both test-driven development, where every change in production code is supported by a test that fails when the change is not present; and You Ain't Gonna Need It, which says there's no value in creating general-purpose classes for applications that only deal with a couple of specific cases.
I think you have to change your point of view.
In its pure form, TDD requires the red-green-refactor workflow:
write a test (it must fail): RED
write code to satisfy the test: GREEN
refactor your code
So the question "What do I have to test?" has a response like: "You have to write a test that corresponds to a feature or a particular requirement."
In this way you get maximum code coverage and also a better code design (remember that TDD also stands for Test Driven Design).
Generally speaking, you have to test ALL public methods/interfaces.
should I create a unit test for string formatting that's supposed to be user input? Or is it just wasting my time when I can just check it in the actual code?
Not sure I understand what you mean, but the tests you write in TDD are supposed to test your production code. They aren't tests that check user input.
To put it another way, there can be TDD unit tests that test the user input validation code, but there can't be TDD unit tests that validate the user input itself.
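A minimal sketch of that distinction in Python (the validation rules are invented): the tests drive the validator, not the input itself:

def is_valid_username(s):
    # The validation code under test, not the user's actual input.
    return s.isalnum() and 3 <= len(s) <= 16

def test_rejects_too_short_username():
    assert not is_valid_username("ab")

def test_accepts_simple_username():
    assert is_valid_username("alice42")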
As part of a university project, we have to write a compiler for a toy language. In order to do some testing for this, I was considering how best to go about writing something like unit tests. As the compiler is being written in Haskell, HUnit and QuickCheck are both available, but perhaps not quite appropriate.
How can we do any kind of non-manual testing?
The only idea I've had is effectively compiling to Haskell too, seeing what the output is, and using some shell script to compare this to the output of the compiled program; this is quite a bit of work, and isn't too elegant either.
The unit testing is to help us, and isn't part of assessed work itself.
This really depends on what parts of the compiler you are writing. It is nice if you can keep phases distinct to help isolate problems, but, in any phase, and even at the integration level, it is perfectly reasonable to have unit tests that consist of pairs of source code and hand-compiled code. You can start with the simplest legal programs possible, and ensure that your compiler outputs the same thing that you would if compiling by hand.
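A sketch of such pair-based tests in Python (compile_source and the target instructions are invented placeholders for your compiler):

GOLDEN_PAIRS = [
    # (toy-language source, hand-compiled expected output)
    ("return 0;",     "PUSH 0\nRET"),
    ("return 1 + 2;", "PUSH 1\nPUSH 2\nADD\nRET"),
]

def test_golden_pairs():
    for source, expected in GOLDEN_PAIRS:
        # compile_source() is the hypothetical compiler entry point.
        assert compile_source(source) == expected, source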
As complexity increases, and hand-compiling becomes unwieldy, it is helpful for the compiler to keep some kind of log of what it has done. Then you can consult this log to determine whether or not specific transformations or optimizations fired for a given source program.
Depending on your language, you might consider a generator of random programs from a collection of program fragments (in the QuickCheck vein). This generator can test your compiler's stability, and ability to deal with potentially unforeseen inputs.
Unit tests should test a small piece of code, typically one class or one function. The lexical analysis and the semantic analysis will each have their own unit tests. The intermediate representation generator will also have its own tests.
A unit test covers a simple test case: it invokes the function to be unit tested in a controlled environment and verifies (asserts) the result of the function execution. A unit test usually tests one behavior only and has the following structure, called AAA (see the sketch after this list):
Arrange: create the environment the function will be called in
Act: invoke the function
Assert: verify the result
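A minimal AAA example using Python's unittest (tokenize_source is a hypothetical lexer entry point):

import unittest

class TokenizerTest(unittest.TestCase):
    def test_integer_literal(self):
        # Arrange: create the environment the function will be called in.
        source = "42"
        # Act: invoke the function.
        tokens = tokenize_source(source)  # hypothetical lexer
        # Assert: verify the result.
        self.assertEqual(tokens, [("INT", 42)])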
Have a look at shelltestrunner. Here are some example tests. It is also being used in this compiler project.
One option is the approach this guy is using to test real compilers: get together with as many people as you can talk into it, have each of you compile and run the same set of programs, and then compare the outputs. Be sure to add every test case you use, as more inputs make it more effective. A little fun with automation and source control and you can make it fairly easy to maintain.
Be sure to get it OKed by the prof first, but as you will only be sharing test cases and outputs, I don't see where he would have much room to object.
Testing becomes more difficult once the output of your program goes to the console (such as standard output). Then you have to resort to some external tool, like grep or expect to check the output.
Keep the return values from your functions in data structures for as long as possible. If the output of your compiler is, say, assembly code, build a string in memory (or a list of strings) and output it at the last possible moment. That way you can test the contents of the strings more directly and quickly.
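A sketch of that idea in Python (the instruction set and AST are invented): the generator returns a data structure, and printing happens only at the very end:

def generate_assembly(ast):
    # Return a list of strings; never print inside the compiler itself.
    return ["PUSH 1", "PUSH 2", "ADD"]  # placeholder output for the sketch

def test_generated_assembly_ends_with_add():
    assert generate_assembly(ast=None)[-1] == "ADD"

if __name__ == "__main__":
    # Output at the last possible moment.
    print("\n".join(generate_assembly(ast=None)))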
I was reading the Joel Test 2010 and it reminded me of an issue I had with unit testing.
How do I really unit test something? Do I unit test functions, or only full classes? What if I have 15 classes that are each under 20 lines: should I write a 35-line unit test for each class, bringing 15×20 lines to 15×(20+35) lines (that's from 300 to 825, nearly 3x more code)?
If a class is used by only two other classes in the module, should I unit test it, or would the tests against the other two classes suffice? What if they are all under 30 lines of code, should I bother?
If I write code to dump data and I never need to read it back because another app consumes it, and that other app isn't command line (or it is, but there's no way to verify whether the data is good), do I still need to unit test it?
What if the app is a utility and the total is under 500 lines of code? Or it is used this week and will be used in the future, but always needs reconfiguring because it is meant for a quick batch process and each project requires tweaks (I'm trying to say there's no way around it; for valid reasons it will always be tweaked)? Do I unit test it, and if so, how? (Maybe we don't care if we break a feature used in the past but not in the present or future.)
etc.
I think this should be a wiki. Maybe people would like to say exactly what they should unit test (or should not)? Maybe links to books would be good. I tried one, but it never clarified what should be unit tested, just the problems of writing unit tests and their solutions.
Also, if classes are meant to be used only in that project (by design, spec, or whatever other reason), and a class isn't useful on its own (let's say it generates HTML using data that returns HTML-ready comments), do I really need to test it? Say, by checking that all public functions allow null comment objects when my project never uses a null comment? It's those kinds of things that make me wonder if I am unit testing the wrong code. Also, tons of classes get thrown away when the project ends. It's the borderline-throwaway, or not-very-useful-alone, code that bothers me.
Here's what I'm hearing, whether you meant it this way or not: a whole litany of issues and excuses why unit testing might not be applicable to your code. In other words: "I don't see what I'll be getting out of unit tests, and they're a lot of bother to write; maybe they're not for me?"
You know what? You may be right. Unit tests are not a panacea. There are huge, wide swaths of testing that unit testing can't cover.
I think, though, that you're misestimating the cost of maintenance, and what things can break in your code. So here are my thoughts:
Should I test small classes? Yes, if there are things in that class that can possibly break.
Should I test functions? Yes, if there are things in this function that can possibly break. Why wouldn't you? Or is your concern over whether it's considered a unit or not? That's just quibbling over names, and shouldn't have any bearing on whether you should write unit tests for it! But it's common in my experience to see a method or function described as a unit under test.
Should I unit test a class if it's used by two other classes? Yes, if there's anything that can possibly break in that class. Should I test it separately? The advantage of doing so is to be able to isolate breakages straight down to the shared class, instead of hunting through the using classes to see if it was they that broke or one of their dependencies.
Should I test data output from my class if another program will read it? Hell yes, especially if that other program is a 3rd-party one! This is a great application of unit tests (or perhaps system tests, depending on the isolation involved in the test): to prove to yourself that the data you output is precisely what you think you should have output. I think you'll find that has the power to simplify support calls immeasurably. (Though please note it's not a substitute for good acceptance testing on that customer's end.)
Should I test throwaway code? Possibly. Will pursuing a TDD strategy get your throwaway code out the door faster? It might. Will having solid unit-tested chunks that you can adapt to new constraints reduce the need to throw code away? Perhaps.
Should I test code that's constantly changing? Yes. Just make sure all applicable tests are brought up to date and pass! Constantly changing code can be particularly susceptible to errors, after all, and enabling safe change is another of unit testing's great benefits. Plus, it probably puts a burden on your invariant code to be as robust as possible, to enable this velocity of change. And you know how you can convince yourself whether a piece of code is robust...
Should I test features that are no longer needed? No, you can remove the test, and probably the code as well (testing to ensure you didn't break anything in the process, of course!). Don't leave unit test rot around, especially if the test no longer works or runs, or people in your org will move away from unit tests and you'll lose the benefit. I've seen this happen. It's not pretty.
Should I test code that doesn't get used by my project, even if it was written in the context of my project? Depends on what the deliverable of your project is, and what the priorities of your project are. But are you sure nobody outside of your project will use it? If they won't, and you aren't, perhaps it's just dead code, in which case see above. From my point of view, I wouldn't feel I'd done a complete job with a class if my testing didn't cover all its important functionality, whether the project used all that functionality or not. I like classes that feel complete, but I keep an eye towards not overengineering a bunch of stuff I don't need. If I put something in a class, then, I intend for it to be used, and will therefore want to make sure it works. It's an issue of personal quality and satisfaction to me.
Don't get fixated on counting lines of code. Write as much test code as you need to convince yourself that every key piece of functionality is being thoroughly tested. As an extreme example, the SQLite project has a tests:source-code ratio of more than 600:1. I use the term "extreme" in a good sense here; the ludicrous amount of testing that goes on is possibly the predominant reason that SQLite has taken over the world.
How can you do all those calculations? Ideally you should never be in a situation where you could count the lines of your completed class and then start writing the unit test from scratch. Those two kinds of code (real code and test code) should be developed and evolve together, and the only LOC metric that should really worry you in the end is 0 LOC of test code.
Relative LOC counts for code and tests are pointless. What matters more is test coverage. What matters most is finding the bugs.
When I'm writing unit tests, I tend to focus my efforts on testing complicated code that is more likely to contain bugs. Simple stuff (e.g. simple getter and setter methods) is unlikely to contain bugs, and can be tested indirectly by higher-level unit tests.
Some time ago I had the same question you posted in mind. I studied a lot of articles, tutorials, books and so on. Although these resources gave me a good starting point, I was still insecure about how to apply unit testing efficiently. After coming across xUnit Test Patterns: Refactoring Test Code (and putting it on my shelf for about one year; you know, we have a lot of stuff to study), it gave me what I needed to apply unit testing efficiently. With a lot of useful patterns (and advice), you will see how you can become a unit testing coder. Topics such as:
Test strategy patterns
Basic patterns
Fixture setup patterns
Result verification patterns
Test double patterns
Test organization patterns
Database patterns
Value patterns
And so on...
I will show you, for instance, the derived value pattern:
A derived input is often employed when we need to test a method that takes a complex object as an argument. For example, thorough input validation testing requires that we exercise the method with each of the attributes of the object set to one or more possible invalid values. Because the first rejected value could cause termination of the method, we must verify each bad attribute in a separate call. We can instantiate the invalid object easily by first creating a valid object and then replacing one of its attributes with an invalid value.
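A minimal sketch of the pattern in Python (the attributes and the validate() function are invented):

import copy

def make_valid_customer():
    # One known-good object, reused as the starting point of every test.
    return {"name": "Alice", "age": 30, "email": "alice@example.com"}

def test_rejects_negative_age():
    customer = copy.deepcopy(make_valid_customer())
    customer["age"] = -1           # derive the invalid input from the valid one
    assert not validate(customer)  # validate() is a hypothetical validator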
And here is a test organization pattern related to your question (testcase class per feature):
As the number of test methods grows, we need to decide which testcase class to put each test method in... Using a testcase class per feature gives us a systematic way to break up a large testcase class into several smaller ones without having to change our test methods.
My advice: read it carefully. (The book's pattern overview image is at xunitpatterns.com.)
You seem to be concerned that there could be more test-code than the code-under-test.
I think the ratios could well be higher than you say. I would expect any serious test to exercise a wide range of inputs, so your 20-line class might well have 200 lines of test code.
I do not see that as a problem. The interesting thing for me is that writing tests doesn't seem to slow me down. Rather it makes me focus on the code as I write it.
So, yes test everything. Try not to think of testing as a chore.
I am part of a team that have just started adding test code to our existing, and rather old, code base.
I use 'test' here because I feel it can be very vague as to whether it is a unit test, a system test, an integration test, or whatever. The differences between the terms have large grey areas and don't add a lot of value.
Because we live in the real world, we don't have time to add test code for all of the existing functionality. We still have Dave the test guy, who finds most bugs. Instead, as we develop, we write tests. You know how you run your code before you tell your boss that it works? Well, use a unit framework (we use JUnit) to do those runs, and keep them all rather than deleting them. Whatever you normally do to convince yourself that it works, do that.
If it is easy to write the test, do it. If not, leave it to Dave until you think of a good way to automate it, or until you get that spare time between projects when 'they' are trying to decide what to put into the next release.
For Java you can use JUnit.
JUnit
JUnit is a simple framework to write repeatable tests. It is an instance of the xUnit architecture for unit testing frameworks.
Getting Started
To get started with unit testing and JUnit read the article: JUnit Cookbook.
This article describes basic test writing using JUnit 4.
You will find additional samples in the org.junit.samples package:
* SimpleTest.java - some simple test cases
* VectorTest.java - test cases for java.util.Vector
JUnit 4.x only comes with a textual TestRunner. For graphical feedback, most major IDEs support JUnit 4. If necessary, you can run JUnit 4 tests in a JUnit 3 environment by adding the following method to each test class:
public static Test suite() {
    return new JUnit4TestAdapter(ThisClass.class);
}
Documentation
* JUnit Cookbook - a cookbook for implementing tests with JUnit.
* Javadoc - API documentation generated with javadoc.
* Frequently Asked Questions - some frequently asked questions about using JUnit.
* Release Notes - the latest JUnit release notes.
* License - the terms of the Common Public License used for JUnit.
The following documents still describe JUnit 3.8:
* The JUnit 3.8 version of this homepage
* Test Infected - Programmers Love Writing Tests, an article demonstrating the development process with JUnit
* JUnit - A Cook's Tour
JUnit Related Projects/Sites
* junit.org - a site for software developers using JUnit. It provides instructions for how to integrate JUnit with development tools like JBuilder and VisualAge/Java, as well as articles about and extensions to JUnit.
* XProgramming.com - various implementations of the xUnit testing framework architecture.
One possibility is to reduce the 'test code' to a language that describes your tests, and an interpreter to run the tests. Teams I have been a part of have used this to wonderful ends, allowing us to write significantly more tests than the "lines of code" would have indicated.
This allowed our tests to be written much more quickly and greatly increased the test legibility.
I am going to answer what I believe are the main points of your question. First, how much test-code should you write? Well, Test-Driven Development can be of some help here. I do not use it as strictly as it is proposed in theory, but I find that writing a test first often helps me to understand the problem I want to solve much better. Also, it will usually lead to good test-coverage.
Secondly, which classes should you test? Again, TDD (or more precisely some of the principles behind it) can be of help. If you develop your system top down and write your tests first, you will have tests for the outer class when writing the inner class. These tests should fail if the inner class has bugs.
TDD is also tightly coupled with the idea of Design for Testability.
My answer is not intended to solve all your problems, but to give you some ideas.
I think it's impossible to write a comprehensive guide of exactly what you should and shouldn't unit test. There are simply too many permutations and types of objects, classes, and functions, to be able to cover them all.
I suggest applying personal responsibility to the testing, and determining the answer yourself. It's your code, and you're responsible for it working. If it breaks, you have to pay the consequences of fixing the code, repairing the data, taking responsibility for the lost revenue, and apologizing to the people whose application broke while they were trying to use it. Bottom line - your code should never break. So what do you have to do to ensure this?
Sometimes unit testing can work well to help you test out all of the specific methods in a library. Sometimes unit testing is just busy-work, because you can tell the code is working based on your use of the code during higher-level testing. You're the developer, you're responsible for making sure the code never breaks - what do you think is the best way to achieve that?
If you think unit testing is a waste of time in a specific circumstance - it probably is. If you've tested the code in all of the application use-case scenarios and they all work, the code is probably good.
If anything is happening in the code that you don't understand - even if the end result is acceptable - then you need to do some more testing to make sure there's nothing you don't understand.
To me, this seems like common sense.
Unit testing is mostly about testing your units from the aspect of functionality: when a specific input comes in, do we receive the expected value, or do we throw the right exception?
Unit tests are very useful, and I recommend you write them. However, not everything needs to be tested. For example, you don't need to test simple getters and setters.
If you want to write your unit tests in Java via Eclipse, please look at "How To Write Java Unit Tests". I hope it helps.
I want to begin unit testing our application, because I believe that this is the first step to developing a good relationship with testing, and it will allow me to branch into other forms of testing, most interestingly BDD with Cucumber.
We currently generate all of our base classes using CodeSmith, based entirely on the tables in a database. I am curious as to the benefits of generating test cases for these base classes. Is this poor testing practice?
This leads me to the ultimate question of my post. What do we test when using Unit Tests?
Do we test the examples we know we want out, or do we test the examples we do not want?
There can be methods that have multiple ways of failing and multiple ways of succeeding; how do we know when to stop?
Take a summing function, for example. Give it 1 and 2 and expect 3 in the only unit test... how do we know that 5 and 6 won't come back as 35?
Question Recap
Generating unit tests (Good/Bad)
What/How much do we test?
Start with your requirements and write tests that test the expected behavior. From that point on, how many other scenarios you test can be driven by your schedule, or maybe by your recognizing non-success scenarios that are particularly high-risk.
You might consider writing non-success tests only in response to defects you (or your users) discover (the idea being that you write a test that tests the defect fix before you actually fix the defect, so that your test will fail if that defect is re-introduced into your code in future development).
The point of unit tests is to give you confidence (but only in special cases does it give you certainty) that the actual behavior of your public methods matches the expected behavior. Thus, if you have a class Adder
class Adder { public int Add(int x, int y) { return x + y; } }
and a corresponding unit test
[Test]
public void Add_returns_that_one_plus_two_is_three() {
    Adder a = new Adder();
    int result = a.Add(1, 2);
    Assert.AreEqual(3, result);
}
then this gives you some (but not 100%) confidence that the method under test is behaving appropriately. It also gives you some defense against breaking the code upon refactoring.
What do we test when using Unit Tests?
The actual behavior of your public methods against the expected (or specified) behavior.
Do we test the examples we know we want out?
Yes, one way to gain confidence in the correctness of your method is to take some input with known expected output, execute the public method on the input, and compare the actual output to the expected output.
What to test: Everything that has ever gone wrong.
When you find a bug, write a test for the buggy behavior before you fix the code. Then, when the code is working correctly, the test will pass, and you'll have another test in your arsenal.
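A sketch of that workflow in Python (the bug and the total() function are invented): the test is written first, fails against the buggy code, and stays in your arsenal once the fix makes it pass:

def test_total_of_empty_list_is_zero():
    # Regression test for a hypothetical reported bug:
    # total([]) used to raise IndexError instead of returning 0.
    assert total([]) == 0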
1) To start, I'd recommend testing your app's core logic.
2) Then, use the code coverage tool in VS to see whether all of your code is exercised by the tests (all branches of if-else and case conditions are invoked).
This is some sort of answer to your question about testing 1+2 = 3 and 5+6 = 35: once the code is covered, you can feel safe with further experiments.
3) It's good practice to cover 80-90% of the code: testing the rest is usually inefficient (getters and setters, one-line exception handling, etc.).
4) Learn about separation of concerns.
5) Generating unit tests: try it, and you'll see that you can save a good many lines of code compared with writing them all manually. I prefer generating the file with VS, then writing the rest of the test methods myself.
You unit test things where you
want to make sure your algorithm works
want to safeguard against accidental changes in the future
So in your example it would not make much sense to test the generated classes. Test the generator instead.
It's good practice to test the main use cases (what the tested function was designed for) first. Then you test the main error cases. Then you write tests for corner cases (i.e. lower and upper bounds). The unusual error cases are normally so hard to produce that it doesn't make sense to unit-test them.
If you need to verify a large range of parameter sets, use data-driven testing.
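For instance, a data-driven sketch using pytest's parametrize decorator (assuming pytest is available; add() is a hypothetical function under test):

import pytest

@pytest.mark.parametrize("x, y, expected", [
    (1, 2, 3),
    (5, 6, 11),  # guards against the "5,6 coming back as 35" worry above
    (-1, 1, 0),  # crossing zero
])
def test_add(x, y, expected):
    assert add(x, y) == expected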
How many things you test is a matter of effort vs. return, so it really depends on the individual project. Normally you try to follow the 80/20 rule, but there may be applications where you need more test coverage because a failure would have very serious consequences.
You can dramatically reduce the time you need to write tests if you use a test-driven approach (TDD). That's because code that isn't written with testability in mind is much harder, sometimes near to impossible to test. But since nothing in life is free, the code developed with TDD tends to be more complex itself.
I'm also beginning the process of more consistently using unit tests and what I've found is that the biggest task in unit testing is structuring my code to support testing. As I start to think about how to write tests, it becomes clear where classes have become overly coupled, to the point that the complexity of the 'unit' makes defining tests difficult. I spend as much or more time refactoring my code as I do writing tests. Once the boundaries between testable units become clearer, the question of where to start testing resolves itself; start with your smallest isolated dependencies (or at least the ones you're worried about) and work your way up.
There are three basic events I test for:
min, max, and somewhere between min and max.
And, where appropriate, the two extremes: below min and above max.
There are obvious exceptions (some code may not have a min or max for example) but I've found that unit testing for these events is a good start and captures a majority of "common" issues with the code.
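As a sketch in Python (the bounds and the validator are invented stand-ins for the code under test):

MIN_AGE, MAX_AGE = 0, 130

def accepts(age):
    # Hypothetical validator standing in for the code under test.
    return MIN_AGE <= age <= MAX_AGE

def test_boundaries():
    assert accepts(MIN_AGE)                   # at min
    assert accepts(MAX_AGE)                   # at max
    assert accepts((MIN_AGE + MAX_AGE) // 2)  # somewhere in between
    assert not accepts(MIN_AGE - 1)           # below min
    assert not accepts(MAX_AGE + 1)           # above max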