How to write good Unit Tests in Functional Programming - unit-testing

I'm using functions instead of classes, and I find that I can't tell whether a function that another function relies on is a dependency that should be unit-tested individually or an internal implementation detail that should not be. How can you tell which one it is?
A little context: I'm writing a very simple Lisp interpreter which has an eval() function. It's going to have a lot of responsibilities (too many, actually), such as evaluating symbols differently from lists (everything else evaluates to itself). When evaluating symbols, it has its own complex workflow (environment lookup), and when evaluating lists, it's even more complicated, since the list can be a macro, function, or special form, each of which has its own complex workflow and set of responsibilities.
I can't tell if my eval_symbol() and eval_list() functions should be considered internal implementation details of eval() which should be tested through eval()'s own unit tests, or genuine dependencies in their own right which should be unit-tested independently of eval()'s unit tests.

A significant motivation for the "unit test" concept is to control the combinatorial explosion of required test cases. Let's look at the examples of eval, eval_symbol and eval_list.
In the case of eval_symbol, we will want to test contingencies where the symbol's binding is:
missing (i.e. the symbol is unbound)
in the global environment
directly within the current environment
inherited from a containing environment
shadowing another binding
... and so on
In the case of eval_list, we will want to test (among other things) what happens when the list's function position contains a symbol with:
no function or macro binding
a function binding
a macro binding
eval_list will invoke eval_symbol whenever it needs a symbol's binding (assuming a LISP-1, that is). Let's say that there are S test cases for eval_symbol and L symbol-related test cases for eval_list. If we test each of these functions separately, we could get away with roughly S + L symbol-related test cases. However, if we wish to treat eval_list as a black box and to test it exhaustively without any knowledge that it uses eval_symbol internally, then we are faced with S x L symbol-related test cases (e.g. global function binding, global macro binding, local function binding, local macro binding, inherited function binding, inherited macro binding, and so on). That's a lot more cases. eval is even worse: as a black box the number of combinations can become incredibly large -- hence the term combinatorial explosion.
So, we are faced with a choice of theoretical purity versus actual practicality. There is no doubt that a comprehensive set of test cases that exercises only the "public API" (in this case, eval) gives the greatest confidence that there are no bugs. After all, by exercising every possible combination we may turn up subtle integration bugs. However, the number of such combinations may be so prohibitively large as to preclude such testing. Not to mention that the programmer will probably make mistakes (or go insane) reviewing vast numbers of test cases that only differ in subtle ways. By unit-testing the smaller internal components, one can vastly reduce the number of required test cases while still retaining a high level of confidence in the results -- a practical solution.
So, I think the guideline for identifying the granularity of unit testing is this: if the number of test cases is uncomfortably large, start looking for smaller units to test.
In the case at hand, I would absolutely advocate testing eval, eval_list and eval_symbol as separate units precisely because of the combinatorial explosion. When writing the tests for eval_list, you can rely upon eval_symbol being rock solid and confine your attention to the functionality that eval_list adds in its own right. There are likely other testable units within eval_list as well, such as eval_function, eval_macro, eval_lambda, eval_arglist and so on.
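To make the S + L arithmetic concrete, here is a minimal, self-contained sketch in Python (the host language of the asker's interpreter isn't stated, and the Env, eval_symbol and eval_list definitions below are toy stand-ins invented purely for illustration). The point is the shape of the test suite: the symbol-lookup contingencies are covered once in their own test class, and the list tests simply trust that eval_symbol works:

    import unittest

    # Toy stand-ins for the question's functions, just enough to make the example run;
    # a real interpreter would be far richer.
    class UnboundSymbolError(Exception):
        pass

    class Env:
        def __init__(self, bindings=None, parent=None):
            self.bindings = bindings or {}
            self.parent = parent

        def extend(self, name, value):
            return Env({name: value}, parent=self)

        def lookup(self, name):
            if name in self.bindings:
                return self.bindings[name]
            if self.parent is not None:
                return self.parent.lookup(name)
            raise UnboundSymbolError(name)

    def eval_symbol(symbol, env):
        return env.lookup(symbol)

    def eval_list(lst, env):
        fn = eval_symbol(lst[0], env)   # relies on eval_symbol for the lookup
        return fn(lst[1:])

    class TestEvalSymbol(unittest.TestCase):
        # The S symbol-related cases live here, once.
        def test_unbound_symbol_raises(self):
            with self.assertRaises(UnboundSymbolError):
                eval_symbol("x", Env())

        def test_inner_binding_shadows_outer(self):
            env = Env().extend("x", 1).extend("x", 2)
            self.assertEqual(eval_symbol("x", env), 2)

    class TestEvalList(unittest.TestCase):
        # The L list-related cases live here; eval_symbol is trusted to work.
        def test_function_call(self):
            env = Env().extend("inc", lambda args: args[0] + 1)
            self.assertEqual(eval_list(["inc", 41], env), 42)

    if __name__ == "__main__":
        unittest.main()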

My advice is quite simple: "Start somewhere!"
If you see the name of some def (or defun) that looks like it might be fragile, well, you probably want to test it, don't you?
If you're having some trouble trying to figure out how your client code can interface with some other code unit, well, you probably want to write some tests somewhere that let you create examples of how to properly use that function.
If some function seems sensitive to data values, well, you might want to write some tests that not only verify it can handle any reasonable inputs properly, but also specifically exercise boundary conditions and odd or unusual data inputs.
Whatever seems bug-prone should have tests.
Whatever seems unclear should have tests.
Whatever seems complicated should have tests.
Whatever seems important should have tests.
Later, you can go about increasing your coverage to 100%. But you'll find that you will probably get 80% of your real results from the first 20% of your unit test coding (Inverted "Law of the Critical Few").
So, to review the main point of my humble approach, "Start somewhere!"
Regarding the last part of your question, I would recommend you think about any possible recursion, as well as any additional reuse by "client" functions that you or subsequent developers might create in the future that would also call eval_symbol() or eval_list().
Regarding recursion, the functional programming style uses it a lot, and it can be difficult to get right, especially for those of us who come from procedural or object-oriented programming, where recursion is rarely encountered. The best way to get recursion right is to precisely target any recursive features with unit tests, to make certain all possible recursive use cases are validated.
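As a concrete illustration of targeting recursion (a hedged sketch in Python; eval_expr below is a throwaway evaluator standing in for the asker's eval()), a test can deliberately build a deeply nested expression so the recursive path is exercised far beyond the trivial one- or two-level cases:

    # Minimal recursive evaluator used only to illustrate the test shape.
    def eval_expr(expr):
        if isinstance(expr, int):
            return expr
        op, left, right = expr                 # e.g. ["+", 1, ["+", 2, 3]]
        assert op == "+"
        return eval_expr(left) + eval_expr(right)

    def test_deeply_nested_addition():
        # Build ["+", 1, ["+", 1, ... 0]] nested 500 levels deep to exercise recursion.
        expr = 0
        for _ in range(500):
            expr = ["+", 1, expr]
        assert eval_expr(expr) == 500

    test_deeply_nested_addition()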
Regarding reuse, if your functions are likely to be invoked by anything other than a single use by your eval() function, they should probably be treated as genuine dependencies that deserve independent unit tests.
As a final hint, the term "unit" has a technical definition in the domain of unit testing: "the smallest piece of software that can be tested in isolation". That is a very old, fundamental definition that may quickly clarify your situation for you.

This is somewhat orthogonal to the content of your question, but directly addresses the question posed in the title.
Idiomatic functional programming involves mostly side effect-free pieces of code, which makes unit testing easier in general. Defining a unit test typically involves asserting a logical property about the function under test, rather than building large amounts of fragile scaffolding just to establish a suitable test environment.
As an example, let's say we're testing extendEnv and lookupEnv functions as part of an interpreter. A good unit test for these functions would check that if we extend an environment twice with the same variable bound to different values, only the most recent value is returned by lookupEnv.
In Haskell, a test for this property might look like:
    test =
        let env = extendEnv "x" 5 (extendEnv "x" 6 emptyEnv)
        in lookupEnv env "x" == Just 5
This test gives us some assurance, and doesn't require any setup or teardown other than creating the env value that we're interested in testing. However, the values under test are very specific. This only tests one particular environment, so a subtle bug could easily slip by. We'd rather make a more general statement: for all variables x and all values v and w, an environment env extended with x bound to v, after having been extended with x bound to w, satisfies lookupEnv env x == Just v.
In general, we need a formal proof (perhaps mechanized with a proof assistant like Coq, Agda, or Isabelle) in order to show that a property like this holds. However, we can get much closer than specifying test values by using QuickCheck, a library available for most functional languages that generates large amounts of arbitrary test input for properties we define as boolean functions:
    prop_test x v w env' =
        let env = extendEnv x v (extendEnv x w env')
        in lookupEnv env x == Just v
At the prompt, we can have QuickCheck generate arbitrary inputs to this function, and see whether it remains true for all of them:
    *Main> quickCheck prop_test
    +++ OK, passed 100 tests.
    *Main> quickCheckWith (stdArgs { maxSuccess = 1000 }) prop_test
    +++ OK, passed 1000 tests.
QuickCheck uses some very nice (and extensible) magic to produce these arbitrary values, but it's functional programming that makes having those values useful. By making side effects the exception (sorry) rather than the rule, unit testing becomes less of a task of manually specifying test cases, and more a matter of asserting generalized properties about the behavior of your functions.
This process will surprise you frequently. Reasoning at this level gives your mind extra chances to notice flaws in your design, making it more likely that you'll catch errors before you even run your code.
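For readers working outside Haskell, most languages now have a QuickCheck-style library. Below is a rough sketch of the same property using Python's hypothesis package; the dictionary-based extend_env and lookup_env are toy stand-ins written for this example, not part of the answer above:

    # Rough Python analogue of the QuickCheck property (pip install hypothesis).
    from hypothesis import given, strategies as st

    def extend_env(name, value, env):
        return {**env, name: value}           # later bindings win

    def lookup_env(env, name):
        return env.get(name)                  # None plays the role of Nothing

    @given(st.text(min_size=1), st.integers(), st.integers(),
           st.dictionaries(st.text(min_size=1), st.integers()))
    def test_latest_binding_wins(x, v, w, env):
        extended = extend_env(x, v, extend_env(x, w, env))
        assert lookup_env(extended, x) == v

    test_latest_binding_wins()                # hypothesis generates ~100 random cases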

I'm not really aware of any particular rule of thumb for this. But it seems like you should be asking yourself two questions:
Can you define the purpose of eval_symbol and eval_list without needing to say "part of the implementation of eval"?
If you see a test fail for eval, would it be useful to see whether any tests for eval_symbol and eval_list also fail?
If the answer to either of those is yes, I would test them separately.

A few months ago I wrote a simple "almost Lisp" interpreter in Python for an assignment. I designed it using the Interpreter design pattern and unit tested the evaluation code. Then I added the printing and parsing code and transformed the test fixtures from the abstract syntax representation (objects) to concrete syntax strings. Part of the assignment was to program simple recursive list-processing functions, so I added them as functional tests.
To answer your question in general, the rules are pretty much the same as for OO. You should have all your public functions covered. In OO, public methods are part of a class or an interface; in functional programming you most often have visibility control based around modules (similar to interfaces). Ideally, you would have full coverage for all functions, but if this isn't possible, consider a TDD approach: start by writing tests for what you know you need and implement it. Auxiliary functions will emerge as a result of refactoring, and since you wrote tests for everything important beforehand, if the tests still pass after refactoring you are done and can write the next test (iterate).
Good luck!

Related

What's a good way to unit test the main function that calls all the simple functions?

I have this code with lots of small functions f_small_1(), f_small_2(), ... f_small_10().
These are easy to unit test individually.
But in the real world, you often have a complex function f_put_things_together() that needs to call the smaller ones.
What is a good way to unit test f_put_things_together ?
    func f_put_things_together() {
        a = f_small_1()
        if a {
            f_small_2()
        } else {
            f_small_3()
        }
        f_small_4()
        ...
        f_small_10()
    }
I started to write tests, but I have the impression that I'm doing this twice, as I have already tested the smaller functions.
I could have f_put_things_together take objects a1, a2, ..., a10 as arguments and call a1.f_small_1(), a2.f_small_2(), ... so that I can mock these objects individually, but this doesn't feel right to me: if I didn't have to write unit tests, all these functions would logically belong to the same class, and I don't want to have unclear code for the sake of testing.
This is somewhat language-agnostic and somewhat not, since languages like Python enable you to replace methods of an object. So if you have an answer that is language-agnostic, that's best. Otherwise, I'm currently using Go.
The general case that you've shown in your example demonstrates the need to test both the simple functions and the aggregation of the results of those functions. When testing the aggregating function, you really want to fake the results of the smaller functions the aggregating function depends on. So, you're on the right track.
However, if you're having trouble writing unit tests for your code, then you're probably having one of these classes of problems:
You've somehow violated the SOLID principles. In other words, something is deficient in the micro-architecture of your code.
You're trying to fake someone else's interface and you're having trouble matching the actual behavior of their implementation with your fake implementation. (This doesn't seem to be the case here).
The objects that you're testing with require a bunch of data setup that should be simplified, at least within the context of testing (also, doesn't appear to be the case).
If your tests are painful to write, they're telling you something! With experience, you'll be able to quickly pick up on the pain point in your implementation that the tests are indicating.
Unfortunately, your example is a bit small and abstract. To be more precise, I don't know what f_small_1 ... f_small_10 do. So, with more details, I might make more precise recommendations for doing some small refactoring that could have a big payoff for your testing.
I can say, however, that it appears that f_put_things_together looks a bit big to me. This could be a violation of the Single Responsibility Principle (the 'S' in SOLID). I see 10 function calls at a minimum along with some branching logic.
You'll need to write a separate test for each branch path through your function. The less branching you have in a particular function, the fewer tests you'll need to write. For more information, take a look at Cyclomatic Complexity. In this case, the method seems to have a low CC, so this likely isn't the problem.
The ten calls to smaller functions do make me wonder a bit. It looks like, for simplicity, you've left out capturing the return values of these function calls and the logic for aggregating the results. In that case, yes, you really do want to fake the results of the smaller functions and then write a few tests to check the algorithm you're using to aggregate everything.
Or, perhaps the functions are all void and you need to verify that everything happened, and maybe that it happened in the right order. In that case, you're looking at writing more of an interaction-based test. You'll still want to put those smaller function calls behind an interface / class / object that you fake. In this case, the fake should capture the calls and the call order so that your test can make the assertions that are really important.
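A hedged sketch of that interaction-based style, in Python rather than Go because the idea is language-agnostic (in Go the same shape falls out of passing an interface value); the names mirror the question's pseudocode and are hypothetical:

    # The small steps live behind a collaborator object that the test replaces with a Mock.
    from unittest import mock

    def put_things_together(steps):
        if steps.small_1():
            steps.small_2()
        else:
            steps.small_3()
        steps.small_4()

    def test_happy_path_calls_steps_in_order():
        steps = mock.Mock()
        steps.small_1.return_value = True     # force the "if" branch

        put_things_together(steps)

        # Interaction-based assertions: the right steps ran, in the right order.
        assert steps.mock_calls == [
            mock.call.small_1(),
            mock.call.small_2(),
            mock.call.small_4(),
        ]

    test_happy_path_calls_steps_in_order()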
If some of the smaller functions are related to each other, it might make sense to group them together in a single method on a separate class. Then, your test for f_put_things_together will have fewer dependencies that need to be faked. You will have a new class that also needs to be tested, but it's much easier to test two smaller methods than to test one large one that has too much responsibility.
Summary
This is actually a very good question with the exception of it being a bit vague. If you can provide a more detailed example, perhaps I could make more detailed recommendations. The bottom line is this: If your tests are difficult to write then either you need some help / coaching on writing tests or something about the design of your implementation is off and your tests are trying to tell you what it is.

Effective testing of multiple students' programming solutions

As part of my lecture in C++, the students will have to solve assignments. Each solution shall implement the same functions with the same functionality and the same signature (function name, return value, parameters passed). Only the code inside is different.
So I'm thinking about a way to test all the solutions (around 30) effectively. Maybe the best way is to write a unit test plus a shell script (or something similar) that compiles each solution once with the unit test and runs it.
But maybe there is a different and much better solution to this problem.
The reason why unit tests are one of the most efficient types of automated testing is that the required investment is relatively small (compared to other types of testing), so it makes perfect sense to me to write a verification suite of tests.
You might even go so far as to give the students the test suite instead of a specification written in prose. This could introduce them to the concept of Test-Driven Development (although we normally tend to write the tests iteratively, and not in batches).
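As for the mechanics the asker sketched (compile each submission against the shared tests and run it), a small driver script is usually enough. Here is one possible shape in Python; the directory layout (submissions/<student>/solution.cpp next to a shared tests.cpp) is an assumption made purely for illustration:

    # Sketch of a grading driver: build each submission with the shared unit tests, run it,
    # and report pass/fail per student.
    import subprocess
    from pathlib import Path

    for student_dir in sorted(Path("submissions").iterdir()):
        binary = student_dir / "tests"
        build = subprocess.run(
            ["g++", "-std=c++17", "-Wall", "-Wextra",
             str(student_dir / "solution.cpp"), "tests.cpp", "-o", str(binary)],
            capture_output=True, text=True)
        if build.returncode != 0:
            print(f"{student_dir.name}: BUILD FAILED\n{build.stderr}")
            continue
        result = subprocess.run([str(binary)], capture_output=True, text=True, timeout=30)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"{student_dir.name}: {status}")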
Yes, unit tests are the obvious solution for most cases.
Compiler warnings and static analysis are also useful.
Timing the program's execution for a given set of parameters is another fairly automated option; it depends on what you are interested in evaluating.
Creating base classes with good diagnostics (whose implementation you can swap out for your evaluation if you prefer) is another option. You can also provide interfaces the students must use and keep two implementations, then exercise the programs as usual using the diagnostic implementation. It depends on what you are looking for.
Maybe I'm missing something obvious, but wouldn't it be sufficient just to run the code several times with boundary-testing parameter values?

Unit testing - can't move from theory to practice

It seems like every unit test example I've encountered is incredibly obvious and canned. Things like assert that x + 3 == 8, and whatnot. I just have a hard time seeing how I'd unit test real world things, like SQL queries, or if a regEx used for form validation is actually working properly.
Case in point: I'm working on two ASP.NET MVC 2 sites that are DB driven. I have a test unit solution for each of them, but have no idea what kind of tests would be useful. Most of the work the site will do is writing data to, or retrieving and organizing data from the DB. Would I simply test that the various queries successfully accessed the DB? How would I test for correctness (e.g., data being written to the proper fields, or correct data being retrieved)?
I'm just having a hard time transforming my own informal manner of testing and debugging into the more formalized, assert(x) kind of testing.
For unit testing to be feasible, your code will have to apply the principles of cohesion and decoupling. In fact, it will force those principles on your code as you apply it. Meaning: if your code is not well factored (i.e., OO design principles applied correctly), unit testing will be next to impossible and/or useless.
So probably the better way for you to think about this would be: "How can I divide up all the work of my application into smaller, more cohesive pieces of code that only do one or two things, and use those to assemble my application?"
Until you have internalized this mindset in terms of how you think about your code, unit testing will probably not make sense.
First, ask yourself "Why are unit tests hard to write for my real code?" Perhaps the answer is that your real code is doing too much. If you have a single module of code filled with "new" statements and "if" statements and "switch" statements and clever math statements and database access, it's going to be painful to write one test, let alone adequately test the logic and the math. But if you pulled the "new" statements out into a factory method, you could easily provide mock objects to test with. If you pulled the "if" clauses and "switch" statements out into state machine patterns, you wouldn't have so many combinations to test. If you moved the database access out into an external data provider object, you could provide simple test data to exercise your math statements. Now you're testing object creation, state transitions, and data access all separately from your clever math statements. All of these steps got easier by simplifying them.
A key reason code is hard to test is that it contains "internal dependencies", such as dependencies that it creates, or dependencies on libraries. If your code says "Foo theFoo = new Foo();" you can't easily substitute a MockFoo to test with. But if your constructor or method asks for theFoo to be passed in instead of constructing itself, your test harness can easily pass in a MockFoo.
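A minimal sketch of that idea in Python (all names here are hypothetical): the function asks for its data provider instead of constructing or querying one itself, so the test can hand it canned data with no database in sight:

    # The dependency is passed in, so the test can substitute a fake for the real provider.
    def average_order_value(order_provider):
        orders = order_provider.fetch_orders()          # real code would hit the DB here
        if not orders:
            return 0
        return sum(o["total"] for o in orders) / len(orders)

    class FakeOrderProvider:
        def fetch_orders(self):
            return [{"total": 10.0}, {"total": 30.0}]

    class EmptyOrderProvider:
        def fetch_orders(self):
            return []

    def test_average_order_value():
        assert average_order_value(FakeOrderProvider()) == 20.0

    def test_average_with_no_orders_is_zero():
        assert average_order_value(EmptyOrderProvider()) == 0

    test_average_order_value()
    test_average_with_no_orders_is_zero()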
When you write code, ask yourself "how can I write a unit test for this code?" If the answer is "it's hard", you might consider altering your code to make it easier to test. What this does is it makes your unit test the first actual consumer of your code - you're testing the interface to your code by writing the test.
By altering your interfaces to make them easier to test, you will find yourself better adhering to the object oriented principles of "tight cohesion" and "loose coupling".
Unit testing isn't just about the tests. Writing unit tests actually improves your designs. Get a little further down the path, and you end up with Test Driven Development.
Good luck!
Well, if x + 3 == 8 isn't enough of a hint, what about x == y?
Said differently, what you're testing for is the correct and incorrect behaviour of types or functions, not just when used under regular conditions, but also under unexpected conditions. With a class, for example, you need to recognize that just instantiating it isn't enough. Are the preconditions of the class met? What about postconditions? What should happen when these aren't met? This is where you set the boundaries between you and the person using the class (which could also be you, of course), to differentiate between a bug in the class and a bug in the usage of the class by the user. Do instances of your class change state when used with particular coding patterns? If so, how; if not, why not? And, ideally under all possible usage conditions, is this behaviour correct?
Unit tests are also a good place for a user of a (for example) class to see how the class is expected to be used, how to avoid using it, and what could happen under exceptional circumstances (where if something goes wrong, your class is supposed to react in some particular way instead of simply breaking). Sort of like built-in documentation for the developer.
Perhaps learning from an example would be most useful for you. You could take a look at the NerdDinner sample app and see what kind of testing it does. You could also look at the MVC source code itself and see how it is tested.

Does YAGNI also apply when writing tests?

When I write code I only write the functions I need as I need them.
Does this approach also apply to writing tests?
Should I write a test in advance for every use-case I can think of just to play it safe or should I only write tests for a use-case as I come upon it?
I think that when you write a method you should test both expected and potential error paths. This doesn't mean that you should expand your design to encompass every potential use -- leave that for when it's needed, but you should make sure that your tests have defined the expected behavior in the face of invalid parameters or other conditions.
YAGNI, as I understand it, means that you shouldn't develop features that are not yet needed. In that sense, you shouldn't write a test that drives you to develop code that's not needed. I suspect, though, that's not what you are asking about.
In this context I'd be more concerned with whether you should write tests that cover unexpected uses -- for example, errors due to passing null or out-of-range parameters -- or repeat tests that only differ with respect to the data, not the functionality. In the former case, as I indicated above, I would say yes. Your tests will document the expected behavior of your method in the face of errors. This is important information to people who use your method.
In the latter case, I'm less able to give you a definitive answer. You certainly want your tests to remain DRY -- don't write a test that simply repeats another test even if it has different data. On the other hand, you may not discover potential design issues unless you exercise the edge cases of your data. A simple example is a method that computes the sum of two integers: what happens if you pass it maxint as both parameters? If you only have one test, then you may miss this behavior. Obviously, this is related to the previous point. Only you can be sure when a test is really needed or not.
Yes YAGNI absolutely applies to writing tests.
As an example, I, for one, do not write tests to check any Properties. I assume that properties work a certain way, and until I come to one that does something different from the norm, I won't have tests for them.
You should always consider the validity of writing any test. If there is no clear benefit to you in writing the test, then I would advise that you don't. However, this is clearly very subjective, since what you might think is not worth it someone else could think is very worth the effort.
Also, would I write tests to validate input? Absolutely. However, I would do it up to a point. Say you have a function with 3 parameters that are ints and it returns a double. How many tests are you going to write around that function? I would use YAGNI here to determine which tests are going to get you a good ROI, and which are useless.
Write the test as you need it. Tests are code. Writing a bunch of (initially failing) tests up front breaks the red/fix/green cycle of TDD, and makes it harder to identify valid failures vs. unwritten code.
You should write the tests for the use cases you are going to implement during this phase of development.
This gives the following benefits:
Your tests help define the functionality of this phase.
You know when you've completed this phase because all of your tests pass.
You should write tests that cover all your code, ideally. Otherwise, the rest of your tests lose value, and you will in the end debug that piece of code repeatedly.
So, no. YAGNI does not include tests :)
There is of course no point in writing tests for use cases you're not sure will get implemented at all - that much should be obvious to anyone.
For use cases you know will get implemented, test cases are subject to diminishing returns, i.e. trying to cover each and every possible obscure corner case is not a useful goal when you can cover all important and critical paths with half the work - assuming, of course, that the cost of overlooking a rarely occurring error is endurable; I would certainly not settle for anything less than 100% code and branch coverage when writing avionics software.
You'll probably get some variance here, but generally, the goal of writing tests (to me) is to ensure that all your code is functioning as it should, without side effects, in a predictable fashion and without defects. In my mind, then, the approach you discuss of only writing tests for use cases as they are come upon does you no real good, and may in fact cause harm.
What if the particular use case for the unit under test that you ignore causes a serious defect in the final software? Has the time spent developing tests bought you anything in this scenario beyond a false sense of security?
(For the record, this is one of the issues I have with using code coverage to "measure" test quality -- it's a measurement that, if low, may give an indication that you're not testing enough, but if high, should not be used to assume that you are rock-solid. Get the common cases tested, the edge cases tested, then consider all the ifs, ands and buts of the unit and test them, too.)
Mild Update
I should note that I'm coming from possibly a different perspective than many here. I often find that I'm writing library-style code, that is, code which will be reused in multiple projects, for multiple different clients. As a result, it is generally impossible for me to say with any certainty that certain use cases simply won't happen. The best I can do is either document that they're not expected (and hence may require updating the tests afterward), or -- and this is my preference :) -- just write the tests. I often find option #2 is far more livable on a day-to-day basis, simply because I have much more confidence when I'm reusing component X in new application Y. And confidence, in my mind, is what automated testing is all about.
You should certainly hold off writing test cases for functionality you're not going to implement yet. Tests should only be written for existing functionality or functionality you're about to put in.
However, use cases are not the same as functionality. You only need to test the valid use cases that you've identified, but there's going to be a lot of other things that might happen, and you want to make sure those inputs get a reasonable response (which could well be an error message).
Obviously, you aren't going to get all the possible use cases; if you could, there'd be no need to worry about computer security. You should get at least the more plausible ones, and as problems come up you should add them to the use cases to test.
I think the answer here is, as it is in so many places, it depends. If the contract that a function presents states that it does X, and I see that it's got associated unit tests, etc., I'm inclined to think it's a well-tested unit and use it as such, even if I don't use it that exact way elsewhere. If that particular usage pattern is untested, then I might get confusing or hard-to-trace errors. For this reason, I think a test should cover all (or most) of the defined, documented behavior of a unit.
If you choose to test more incrementally, I might add to the doc comments that the function is "only tested for [certain kinds of input], results for other inputs are undefined".
I frequently find myself writing tests, TDD, for cases that I don't expect the normal program flow to invoke. The "fake it 'til you make it" approach has me starting, generally, with a null input - just enough to have an idea in mind of what the function call should look like, what types its parameters will have and what type it will return. To be clear, I won't just send null to the function in my test; I'll initialize a typed variable to hold the null value; that way when Eclipse's Quick Fix creates the function for me, it already has the right type. But it's not uncommon that I won't expect the program normally to send a null to the function. So, arguably, I'm writing a test that I AGN. But if I start with values, sometimes it's too big a chunk. I'm both designing the API and pushing its real implementation from the beginning. So, by starting slow and faking it 'til I make it, sometimes I write tests for cases I don't expect to see in production code.
If you're working in a TDD or XP style, you won't be writing anything "in advance" as you say, you'll be working on a very precise bit of functionality at any given moment, so you'll be writing all the necessary tests in order make sure that bit of functionality works as you intend it to.
Test code is similar to the code itself: you won't write code in advance for every use case your app has, so why would you write test code in advance?

Can invariant testing replace unit testing?

As a programmer, I have bought whole-heartedly into the TDD philosophy and take the effort to make extensive unit tests for any nontrivial code I write. Sometimes this road can be painful (behavioral changes causing cascading multiple unit test changes; high amounts of scaffolding necessary), but on the whole I refuse to program without tests that I can run after every change, and my code is much less buggy as a result.
Recently, I've been playing with Haskell and its resident testing library, QuickCheck. In a fashion distinctly different from TDD, QuickCheck has an emphasis on testing invariants of the code, that is, certain properties that hold over all (or substantive subsets) of inputs. A quick example: a stable sorting algorithm should give the same answer if we run it twice, should have increasing output, should be a permutation of the input, etc. Then, QuickCheck generates a variety of random data in order to test these invariants.
It seems to me, at least for pure functions (that is, functions without side effects -- and if you do mocking correctly you can convert dirty functions into pure ones), that invariant testing could supplant unit testing as a strict superset of those capabilities. Each unit test consists of an input and an output (in imperative programming languages, the "output" is not just the return of the function but also any changed state, but this can be encapsulated). One could conceivably create a random input generator that is good enough to cover all of the unit test inputs that you would have created manually (and then some, because it would generate cases that you wouldn't have thought of); if you find a bug in your program due to some boundary condition, you improve your random input generator so that it generates that case too.
The challenge, then, is whether or not it's possible to formulate useful invariants for every problem. I'd say it is: it's a lot simpler once you have an answer to see if it's correct than it is to calculate the answer in the first place. Thinking about invariants also helps clarify the specification of a complex algorithm much better than ad hoc test cases, which encourage a kind of case-by-case thinking of the problem. You could use a previous version of your program as a model implementation, or a version of a program in another language. Etc. Eventually, you could cover all of your former test-cases without having to explicitly code an input or an output.
Have I gone insane, or am I on to something?
A year later, I now think I have an answer to this question: No! In particular, unit tests will always be necessary and useful for regression tests, in which a test is attached to a bug report and lives on in the codebase to prevent that bug from ever coming back.
However, I suspect that any unit test can be replaced with a test whose inputs are randomly generated. Even in the case of imperative code, the “input” is the order of imperative statements you need to make. Of course, whether or not it’s worth creating the random data generator, and whether or not you can make the random data generator have the right distribution is another question. Unit testing is simply a degenerate case where the random generator always gives the same result.
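A sketch of how the two coexist, using Python's hypothesis library as a stand-in for QuickCheck (my_stable_sort and the pinned example input are invented for illustration): the property runs against random data, while an @example decorator keeps the input from an old bug report around as a permanent regression test:

    # Property test plus a pinned regression case.
    from collections import Counter
    from hypothesis import example, given, strategies as st

    def my_stable_sort(xs):
        return sorted(xs)                     # stand-in implementation

    @given(st.lists(st.integers()))
    @example([3, 1, 2, 1])                    # input from an old bug report, kept forever
    def test_sort_invariants(xs):
        result = my_stable_sort(xs)
        assert all(a <= b for a, b in zip(result, result[1:]))   # output is ordered
        assert Counter(result) == Counter(xs)                    # it's a permutation of the input

    test_sort_invariants()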
What you've brought up is a very good point - when only applied to functional programming. You stated a means of accomplishing this all with imperative code, but you also touched on why it's not done - it's not particularly easy.
I think that's the very reason it won't replace unit testing: it doesn't fit for imperative code as easily.
Doubtful
I've only heard of (not used) these kinds of tests, but I see two potential issues. I would love to have comments about each.
Misleading results
I've heard of tests like:
reverse(reverse(list)) should equal list
unzip(zip(data)) should equal data
It would be great to know that these hold true for a wide range of inputs. But both these tests would pass if the functions just return their input.
It seems to me that you'd want to verify that, e.g., reverse([1 2 3]) equals [3 2 1] to prove correct behavior in at least one case, then add some testing with random data.
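That combination is easy to express with a property-based library. A sketch in Python using hypothesis (my_reverse is a stand-in implementation, not anyone's real code): one concrete case that the identity function would fail, plus the involution property over random lists:

    from hypothesis import given, strategies as st

    def my_reverse(xs):
        return list(reversed(xs))             # stand-in implementation

    def test_concrete_example():
        # Would fail for the identity function, unlike reverse(reverse(xs)) == xs alone.
        assert my_reverse([1, 2, 3]) == [3, 2, 1]

    @given(st.lists(st.integers()))
    def test_involution(xs):
        assert my_reverse(my_reverse(xs)) == xs

    test_concrete_example()
    test_involution()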
Test complexity
An invariant test that fully describes the relationship between the input and output might be more complex than the function itself. If it's complex, it could be buggy, but you don't have tests for your tests.
A good unit test, by contrast, is too simple to screw up or misunderstand as a reader. Only a typo could create a bug in "expect reverse([1 2 3]) to equal [3 2 1]".
What you wrote in your original post reminded me of this problem, which is an open question as to what the loop invariant is that proves the loop correct...
Anyway, I am not sure how much you have read in formal specification, but you are heading down that line of thought. David Gries's book is one of the classics on the subject; I still haven't mastered the concept well enough to use it rapidly in my day-to-day programming. The usual response to formal specification is that it's hard and complicated, and only worth the effort if you are working on safety-critical systems. But I think there are back-of-the-envelope techniques, similar to what QuickCheck exposes, that can be used.