Is it a code smell to use a generated file, like a spreadsheet or XML file, as a source for comparison in Unit tests?
Say that we had to write a lot of classes that produce various XML files for later processing. The unit tests would be a large number of repetitive assertions of the form foo.getExpectedValue() == expectedValue. Instead of doing this, the developer chooses to use the code they are supposed to be testing to generate the XML, then copies that into test/resources, where all future tests load it into memory as an object and run their assertions against it. Is this a code smell?
There are two practices in what you describe that qualify as test smells.
First, you write that the classes to be tested are used to create the XML files that later serve as the reference for correctness. This way you can find out whether the classes have changed, but you cannot tell whether the results were correct in the first place.
To avoid any misunderstanding: the smell is not that generated files are used, but that the files are generated with the code under test. The only way such an approach might make sense would be if the results of the initial run were subject to thorough review. But these reviews would have to be repeated whenever the files are regenerated later.
Secondly, using complete XML files for comparison (generated or not) is another test smell. The reason is that these tests are not very focused. Any change to the XML generation will lead to a failing test. That may seem like a good thing, but it applies even to all kinds of intended changes, for example changes in the indentation. Thus, you only have tests that tell you "something changed", not tests that tell you "something failed".
To have tests that tell you "something failed" you need more specific tests. For example, tests that only look at a certain portion of the generated XML. Some tests would look at the XML structure, others at the data content. You can even have tests to check the indentation. But how? You could, for example, use regular expressions to see whether some interesting portion of a generated XML string looks the way you expect.
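As a minimal illustration of such a focused test (instead of a regular expression, this sketch uses the JDK's built-in XPath support; OrderXmlWriter and Order are made-up names standing in for your code under test), it asserts on one specific value rather than the whole document:

import static org.junit.Assert.assertEquals;

import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.junit.Test;
import org.xml.sax.InputSource;

public class OrderXmlTest {

    // Hypothetical system under test; assumed to return the generated XML as a String.
    private final OrderXmlWriter writer = new OrderXmlWriter();

    @Test
    public void customerIdIsWrittenIntoTheHeader() throws Exception {
        String xml = writer.write(new Order("customer-42"));

        // Focused assertion: only the customer id in the header matters here.
        // Indentation and unrelated elements do not affect this test.
        XPath xpath = XPathFactory.newInstance().newXPath();
        String customerId = xpath.evaluate("/order/header/customerId",
                new InputSource(new StringReader(xml)));
        assertEquals("customer-42", customerId);
    }
}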
Once you have more focused tests, the test results in case of intended modifications to your code will look different: when your changes are successful, only a few of the many test cases will fail, and these will be the ones that test the part of the behaviour you intentionally changed. All other tests will still pass and show you that your change did not break something unexpectedly. If, in contrast, your change was incorrect, then some tests other than (or in addition to) the expected ones will show you that the change had unexpected effects.
Yes, and I wouldn't do that. It breaks the principles of a good test. Mainly, a good test should be:
Independent - it should not rely on other tests. If subsequent tests have to wait for a file generated by the first test, the tests are not independent.
Repeatable - these tests introduce flakiness due to the file/in-memory dependency, so they may not be consistently repeatable.
Perhaps you could take a step back and see whether you need to unit test every generated XML file. If the file generation follows the same code execution path (with no logical difference), I wouldn't write a unit test for each case. If the XML generation is tied to a business operation, I would consider having acceptance tests.
If the files are small, you can use a string reader and writer. In general, one is supposed to avoid doing I/O in unit tests. However, I see nothing wrong with using files in some non-unit tests.
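For example, a small in-memory test might look like this rough sketch (ReportTransformer is a hypothetical class that accepts a Reader and a Writer instead of opening files itself):

import static org.junit.Assert.assertTrue;

import java.io.StringReader;
import java.io.StringWriter;
import org.junit.Test;

public class ReportTransformerTest {

    @Test
    public void transformsInMemoryWithoutTouchingTheFileSystem() throws Exception {
        // Hypothetical component that works against Reader/Writer
        // instead of opening files itself.
        ReportTransformer transformer = new ReportTransformer();

        StringReader input = new StringReader("<report><total>42</total></report>");
        StringWriter output = new StringWriter();

        transformer.transform(input, output);

        assertTrue(output.toString().contains("<total>42</total>"));
    }
}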
Related
We use several AST transforms in our Groovy code, such as @ToString and @EqualsAndHashCode. We use these so we don't have to maintain and test them. The problem is that code coverage metrics (using JaCoCo right now, but open to change if it will help) don't know these are autogenerated methods, so they cause a lot of code to appear uncovered even though it's not actually code we're writing.
Is there a way to exclude these from coverage metrics in any tools?
I guess you could argue that since we're adding the annotations, we should still be testing the generated code, since a unit test shouldn't care how these methods are created, only that they work.
I had a similar issue with @Log and the conditionals that it inserts into the code. That gets reported (by Cobertura) as a lack of branch coverage.
But as you said: it just reports it correctly. The code is not covered.
If you don't need the code, you should not have generated it. If you need it and aim for full test coverage, you must test it or at least "exercise" it, i.e. somehow use it from your test cases even without asserts.
From a test methodology standpoint, not covering generated code is equally questionable as using exclusion patterns. From a pragmatic standpoint, you may just want to live with it.
When writing unit tests that deal with XML (e.g. testing a class that reads or generates XML), I used to put my expected outcome XML string and my input XML string in separate files right next to my unit test. Let's say I have a class "MyTransformer" that transforms one XML format into another. Then I would create three files, all in the same package:
MyTransformerTest.java
MyTransformerTestSampleInput.xml
MyTransformerTestExpectedOutput.xml
Then my assertion might look like this (simplified pseudocode):
Reader transformed = MyTransformer.transform(getResourceAsStream("MyTransformerTestSampleInput.xml"));
Reader expected = getResourceAsStream("MyTransformerTestExpectedOutput.xml");
assertXMLEqual(expected, transformed);
However, a colleague told me that the file access I have in this unit test is unacceptable. He proposed creating a literal string constant (private static final String) containing my XML file contents, possibly in a separate Groovy class because of the benefit of multi-line strings, rather than writing the XML to files.
I dislike the idea of the literal string constants, because even if I have multi-line strings in Groovy, I still lose syntax highlighting and all the other helpful features of my XML editor that tell me right away whether my XML has syntax errors, etc.
What do you think? Is the file access really bad? If so: Why? If not why is it ok?
Two problems with files in unit tests:
they slow down the testing cycle. You may have thousands of unit tests which, preferably, get run on every build - so they should be as fast as possible. If you can speed them up (e.g. by getting rid of I/O operations) you'd want to do that. Of course it's not always feasible, so you normally separate out the "slow" tests via NUnit [Category] or something similar - and then run those special tests less frequently, say, only on nightly builds.
they introduce additional dependencies. If a test requires a file, it will fail not only when the logic behind the test is wrong, but also when the file is missing, or the test runner doesn't have read permissions, etc. That makes debugging and fixing less pleasant!
That said, I wouldn't be too strict about not using files in tests. If possible, try to avoid them, but don't go overboard. Make sure you weigh maintainability against speed - the cleaner the test, the easier it will be to understand and fix later.
If your unit tests access files to feed fake test data into the system under test, so you can run your tests, that's not a problem. It actually helps you exercise the system under test with a wider variety of test data.
However, if your system under test accesses the file system when executed from a test, that's not a unit test; it's an integration test. This is because you are touching a cross-cutting concern such as the file system, and such tests cannot be categorised as unit tests.
You would ideally isolate/fake out the file access and test the behaviour of your code (if any) using unit tests. They are faster and easier to run, and they give you pinpoint feedback when written correctly.
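A minimal sketch of that isolation, with all names invented for the example: the file access sits behind a tiny interface, and the unit test substitutes an in-memory fake.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class GreetingServiceTest {

    // Hypothetical seam: the production implementation would read from disk,
    // while the test supplies an in-memory fake instead.
    interface TemplateSource {
        String load(String name);
    }

    static class GreetingService {
        private final TemplateSource templates;

        GreetingService(TemplateSource templates) {
            this.templates = templates;
        }

        String greet(String user) {
            return templates.load("greeting").replace("{name}", user);
        }
    }

    @Test
    public void buildsGreetingFromTemplateWithoutFileAccess() {
        GreetingService service = new GreetingService(name -> "Hello, {name}!");
        assertEquals("Hello, Bob!", service.greet("Bob"));
    }
}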
In these cases, I have a unit test that uses an internal representation of the file - a string literal.
I will also have an integration test to verify that the code works correctly when writing to the file.
So it all comes down to the unit/integration test definitions. Both are valid tests; it just depends on which one you are writing at the time.
If the XML is more readable or easier to work with in a file, and you have a lot of these tests, I would leave them as files.
Strictly speaking, unit tests should not use the file system because it is slow. However, readability is more important. XML in a file is easier to read and can be loaded into an XML-friendly editor.
If the tests take too long to run (because you have a lot of them), or your colleagues complain, move them to integration tests.
If you work on both Windows and Linux, you have to be careful that the files are picked up by your build server.
There are no perfect answers.
As I was doing test-driven development, I pondered whether a hypothetical program could be developed entirely from code generated to pass its tests. That is, could there be a generator that creates code specifically to pass the tests? Would the future of programming languages just be writing tests?
I think this would be a tough one as, at least for the initial generations of such technology, developers would be very skeptical of generated code's correctness. So human review would have to be involved as well.
As a simple illustration of what I mean, suppose you write 10 tests for a function, with sample inputs and expected outputs covering every scenario you can think of. A program could trivially generate code which passed all of these tests with nothing more than a rudimentary switch statement (your ten inputs matched to their expected outputs). This code would obviously not be correct, but it would take a human to see that.
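Purely for illustration (with a made-up square() function standing in for "a function with ten example-based tests"), such a generated "solution" might be nothing more than:

public class GeneratedSquare {
    // Passes every example-based test for square(1..3) without
    // implementing the real behaviour.
    public static int square(int x) {
        switch (x) {
            case 1:  return 1;
            case 2:  return 4;
            case 3:  return 9;
            // ... one case per sample input in the test suite ...
            default: throw new IllegalArgumentException("untested input: " + x);
        }
    }
}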
That's just a simple example. It isn't hard to imagine more sophisticated programs which might not generate a switch statement but still produce solutions that aren't actually correct, and which could be wrong in much more subtle ways. Hence my suggestion that any technology along these lines would be met with a deep level of skepticism, at least at first.
If code can be generated completely, then the basis of the generator would have to be a specification that exactly describes the code. This generator would then be something like a compiler that cross-compiles one language into another.
Tests are not such a language. They only assert that a specific aspect of the code functionality is valid and unchanged. By doing so they scaffold the code so that it does not break, even when it is refactored.
But how would I compare these two ways of development?
1) If the generator works correctly, then the specification is always translated into correct code. I postulate that this code is tested by design and needs no additional tests. Better to TDD the generator than the generated code.
2) Whether you have a specification that leads to generated code, or specifications expressed as tests that ensure the code works, is quite equivalent in my eyes.
3) You can combine both ways of development. Generate a program framework with a tested generator from a specification and then enrich the generated code using TDD. Attention: you then have two different development cycles running in one project. That means you have to ensure that you can always regenerate the generated code when specifications change, and that your additional code still fits correctly into the generated code.
Just one small example: imagine a tool that can generate code from a UML class diagram. This could be done in a way that lets you develop the methods with TDD, while the structure of the classes is defined in UML and does not need to be tested again.
While it's possible that, sometime in the future, simple tests could be used to generate code:
assertEquals(someclass.get_value(), true)
but getting the correct output from a black-box integration test is what I would guess is an NP-complete problem:
assertEquals(someclass.do_something(1), file_content("/some/file"))
assertEquals(someclass.do_something(2), file_content("/some/file"))
assertEquals(someclass.do_something(2), file_content("/some/file2"))
assertEquals(someclass.do_something(3), file_content("/some/file2"))
Does this mean that the resulting code will always write to /some/file? Does it mean that the resulting code should always write to /some/file2? Either could be true. What if it only needs to do the minimal amount of work to get the tests to pass? Without knowing the context and writing very exact and bounding tests, no generator could figure out (at this point in time) what the test author intended.
As part of a university project, we have to write a compiler for a toy language. In order to do some testing for this, I was considering how best to go about writing something like unit tests. As the compiler is being written in Haskell, HUnit and QuickCheck are both available, but perhaps not quite appropriate.
How can we do any kind of non-manual testing?
The only idea I've had is effectively compiling to Haskell too, seeing what the output is, and using some shell script to compare this to the output of the compiled program - this is quite a bit of work, and isn't too elegant either.
The unit testing is to help us, and isn't part of assessed work itself.
This really depends on what parts of the compiler you are writing. It is nice if you can keep phases distinct to help isolate problems, but, in any phase, and even at the integration level, it is perfectly reasonable to have unit tests that consist of pairs of source code and hand-compiled code. You can start with the simplest legal programs possible, and ensure that your compiler outputs the same thing that you would if compiling by hand.
As complexity increases, and hand-compiling becomes unwieldy, it is helpful for the compiler to keep some kind of log of what it has done. Then you can consult this log to determine whether or not specific transformations or optimizations fired for a given source program.
Depending on your language, you might consider a generator of random programs from a collection of program fragments (in the QuickCheck vein). This generator can test your compiler's stability, and ability to deal with potentially unforeseen inputs.
Unit tests should test a small piece of code, typically one class or one function. The lexical and semantic analysis will each have their own unit tests. The intermediate representation generator will also have its own tests.
A unit test covers a simple test case: it invokes the function to be unit tested in a controlled environment and verifies (asserts) the result of the function execution. A unit test usually tests one behavior only and has the following structure, called AAA (a sketch follows the list):
Arrange: create the environment the function will be called in
Act: invoke the function
Assert: verify the result
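A minimal sketch of that structure in JUnit-style Java, purely to illustrate the AAA layout (Lexer, Token and TokenType are invented names; the actual compiler here is written in Haskell):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LexerTest {

    @Test
    public void tokenizesASingleIntegerLiteral() {
        // Arrange: set up the unit under test and its input.
        Lexer lexer = new Lexer("42");        // hypothetical lexer class

        // Act: invoke the behaviour being tested.
        Token token = lexer.nextToken();

        // Assert: verify the result.
        assertEquals(TokenType.INT_LITERAL, token.type());
        assertEquals("42", token.text());
    }
}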
Have a look at shelltestrunner. Here are some example tests. It is also being used in this compiler project.
One option is the approach this guy is using to test real compilers: get together with as many people as you can talk into it, have each of you compile and run the same set of programs, and then compare the outputs. Be sure to add every test case you use, as more inputs make the comparison more effective. A little fun with automation and source control and you can make it fairly easy to maintain.
Be sure to get it OKed by the prof first but as you will only be sharing test cases and outputs I don't see where he will have much room to object.
Testing becomes more difficult once the output of your program goes to the console (such as standard output). Then you have to resort to some external tool, like grep or expect, to check the output.
Keep the return values from your functions in data structures for as long as possible. If the output of your compiler is, say, assembly code, build a string in memory (or a list of strings) and output it at the last possible moment. That way you can test the contents of the strings more directly and quickly.
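The question is about a Haskell compiler, but the idea is language-independent; here is a rough Java sketch of an emitter (a made-up class) that keeps its output in memory so tests can check it without any I/O:

import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.List;
import org.junit.Test;

public class AsmEmitterTest {

    // Hypothetical emitter: collects instructions in memory instead of
    // printing them, so tests can inspect them directly.
    static class AsmEmitter {
        private final List<String> lines = new ArrayList<>();

        void emit(String instruction) {
            lines.add(instruction);
        }

        List<String> lines() {
            return lines;
        }

        // Only at the very end is the in-memory form turned into output text.
        String render() {
            return String.join("\n", lines);
        }
    }

    @Test
    public void emittedInstructionsCanBeCheckedWithoutAnyIO() {
        AsmEmitter emitter = new AsmEmitter();
        emitter.emit("push 1");
        emitter.emit("push 2");
        emitter.emit("add");

        assertEquals(List.of("push 1", "push 2", "add"), emitter.lines());
    }
}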
I'm thinking of the case where the program doesn't really compute anything; it just DOES a lot. Unit testing makes sense to me when you're writing functions which calculate something and you need to check the result, but what if you aren't calculating anything? For example, a program I maintain at work relies on having the user fill out a form, then opening an external program, and automating the external program to do something based on the user input. The process is fairly involved. There's something like 3000 lines of code (spread out across multiple functions), but I can't think of a single thing that it makes sense to unit test.
That's just an example though. Should you even try to unit test "procedural" programs?
Based on your description, these are the places I would look to unit test (a sketch follows the list):
Does the form validation of user input work correctly?
Given valid input from the form, is the external program called correctly?
Feed user input into the external program and see whether you get the right output.
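As a rough sketch of the first two points (all class names - FormValidator, FormData, FormProcessor, ExternalProgram - are invented, and the second test uses Mockito to stand in for the external program):

import static org.junit.Assert.assertFalse;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.junit.Test;

public class FormProcessingTest {

    @Test
    public void rejectsEmptyUserName() {
        // Hypothetical validator extracted from the form-handling code.
        FormValidator validator = new FormValidator();
        assertFalse(validator.isValid(new FormData("", "some@mail.example")));
    }

    @Test
    public void launchesExternalProgramWithValidatedInput() {
        // Hypothetical seam around the external program.
        ExternalProgram program = mock(ExternalProgram.class);
        FormProcessor processor = new FormProcessor(program);

        processor.process(new FormData("Alice", "alice@mail.example"));

        verify(program).run("Alice", "alice@mail.example");
    }
}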
From the sound of your description, the real problem is that the code you're working with is not modular. One of the benefits I find with unit testing is that code that is difficult to test is either not modular enough or has an awkward interface. Try to break the code down into smaller pieces and you'll find places where it makes sense to write unit tests.
I'm not an expert on this, but I have been confused for a while for the same reason. Somehow the applications I work on just don't fit the examples given for unit testing (they are very asynchronous and random, depending on heavy user interaction).
I realized recently (and please let me know if I'm wrong) that it doesn't make sense to write a sort of global test, but rather a myriad of small tests for each component. The easiest approach is to build the tests at the same time as, or even before, creating the actual procedures.
Do you have 3000 lines of code in a single procedure/method? If so, then you probably need to refactor your code into smaller, more understandable pieces to make it maintainable. When you do this, you'll have those parts that you can and should unit test. If not, then you already have those pieces -- the individual procedures/methods that are called by your main program.
Even without unit tests, though, you should still write tests for the code to make sure that you are providing the correct inputs to the external program and testing that you handle the outputs from the program correctly under both normal and exceptional conditions. Techniques used in unit testing -- like mocking -- can be used in these integration tests to ensure that your program is operating correctly without involving the external resource.
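A sketch of that idea (all names invented; Mockito stubs a hypothetical ExternalProgram gateway, assumed to return the program's output, so the exceptional path can be exercised without the real resource):

import static org.junit.Assert.assertFalse;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class ExternalProgramFailureTest {

    @Test
    public void reportsFailureWhenTheExternalProgramCrashes() {
        // Hypothetical gateway to the external program, stubbed to fail.
        ExternalProgram program = mock(ExternalProgram.class);
        when(program.run("Alice", "alice@mail.example"))
                .thenThrow(new IllegalStateException("external program crashed"));

        // Hypothetical code under test; assumed to report failure as a boolean.
        FormProcessor processor = new FormProcessor(program);

        assertFalse(processor.process(new FormData("Alice", "alice@mail.example")));
    }
}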
An interesting "cut point" for your application is you say "the user fills out a form." If you want to test, you should refactor your code to construct an explicit representation of that form as a data structure. Then you can start collecting forms and testing that the system responds appropriately to each form.
It may be that the actions taken by your system are not observable until something hits the file system. Here are a couple of ideas:
Set up something like a git repository for the initial state of the file system, run a form, and look at the output of git diff. It's likely this is going to feel more like regression testing than unit testing.
Create a new module whose only purpose is to make your program's actions observable. This can be as simple as writing relevant text to a log file or as complex as you like. If necessary, you can use conditional compilation or linking to ensure this module does something only when the system is under test. This is closer to traditional unit testing as you can now write tests that say upon receiving form A, the system should take sequence of actions B. Obviously you have to decide what actions should be observed to form a reasonable test.
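A rough sketch of such an observation module (names invented): production wiring could log to a file, while the test implementation records actions in memory so a test can compare the recorded sequence with the expected one.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical observation seam for the program's actions.
public interface ActionObserver {

    void record(String action);

    // Test implementation: keeps the actions in memory for assertions.
    class InMemory implements ActionObserver {
        private final List<String> actions = new ArrayList<>();

        @Override
        public void record(String action) {
            actions.add(action);
        }

        public List<String> actions() {
            return Collections.unmodifiableList(actions);
        }
    }
}

A test for "upon receiving form A, the system should take sequence of actions B" then reduces to injecting an InMemory observer and comparing actions() with the expected list.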
I suspect you'll find yourself migrating toward something that looks more like regression testing than unit testing per se. That's not necessarily bad. Don't overlook code coverage!
(A final parenthetical remark: in the bad old days of interactive console applications, Don Libes created a tool called Expect, which was enormously helpful in allowing you to script a program that interacted like a user. In my opinion we desperately need something similar for interacting with web pages. I think I'll post a question about this :-)
You don't necessarily have to implement automated tests that test individual methods or components. You could implement an automated unit test that simulates a user interacting with your application, and test that your application responds in the correct way.
I assume you are manually testing your application currently, if so then think about how you could automate that and work from there. Over time you should be able to break your tests into progressively smaller chunks that test smaller sections of code. Any sort of automated testing is usually a lot better than nothing.
Most programs (regardless of the language paradigm) can be broken into atomic units which take input and provide output. As the other responders have mentioned, look into refactoring the program and breaking it down into smaller pieces. When testing, focus less on the end-to-end functionality and more on the individual steps in which data is processed.
Also, a unit doesn't necessarily need to be an individual function (though this is often the case). A unit is a segment of functionality which can be tested using inputs and measuring outputs. I've seen this when using JUnit to test Java APIs. Individual methods might not necessarily provide the granularity I need for testing, though a series of method calls will. Therefore, the functionality I regard as a "unit" is a little greater than a single method.
You should at least refactor out the stuff that looks like it might be a problem and unit test that. But as a rule, a function shouldn't be that long. You might find something that is worth unit testing once you start refactoring.
Good object mentor article on TDD
As a few have answered before, there are a few ways you can test what you have outlined.
First, the form input can be tested in a few ways:
What happens if invalid data is entered? Valid data? And so on.
Then each of the functions can be tested to see whether it reacts in the proper manner when supplied with various forms of correct and incorrect data.
Next, you can mock the applications that are being called so that you can make sure your application sends data to, and processes data from, the external programs correctly. Don't forget to make sure your program deals with unexpected data from the external program as well.
Usually, the way I figure out how to write tests for a program I have been assigned to maintain is to see what I do manually to test the program, then try to figure out how to automate as much of it as possible. Also, don't restrict your testing tools to just the programming language you are writing the code in.
I think a wave of testing paranoia is spreading :) It's good to examine things to see whether tests would make sense; sometimes the answer is going to be no.
The only thing I would test is making sure that bogus form input is handled correctly. I really don't see where else an automated test would help. I think you'd want the test to be non-invasive (i.e. no record is actually saved during testing), so that might rule out the other few possibilities.
If you can't test something how do you know that it works? A key to software design is that the code should be testable. That may make the actual writing of the software more difficult, but it pays off in easier maintenance later.