The question might seem a bit weird, but I'll explain it.
Consider the following:
We have a service FirstNameValidator, which I created for other developers so they have a consistent way to validate a person's first name. I want to test it, but because the full set of possible inputs is infinite (or very, very big), I only test a few cases:
Assert.IsTrue(FirstNameValidator.Validate("John"))
Assert.IsFalse(FirstNameValidator.Validate("$$123"))
I also have LastNameValidator, which is 99% identical, and I wrote a test for it too:
Assert.IsTrue(LastNameValidator.Validate("Doe"))
Assert.IsFalse(LastNameValidator.Validate("__%%"))
But later a new structure appeared: PersonName, which consists of a first name and a last name. We want to validate it too, so I create a PersonNameValidator. Obviously, for reusability, I just call FirstNameValidator and LastNameValidator. Everything is fine until I want to write a test for it.
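Roughly, the composition looks like this (just a sketch; the PersonName shape and property names here are for illustration only):
public class PersonName
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class PersonNameValidator
{
    // Reuse the existing validators instead of re-implementing the rules.
    public static bool Validate(PersonName name)
    {
        return FirstNameValidator.Validate(name.FirstName)
            && LastNameValidator.Validate(name.LastName);
    }
}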
What should I test?
The fact that FirstNameValidator.Validate was actually called with the correct argument?
Or do I need to create a few cases and test them?
That is actually the question: should we test what the service is expected to do? It is expected to validate a PersonName; how it does that, we don't really care. So we pass a few valid and invalid inputs and expect the corresponding return values.
Or, maybe, what it actually does? It actually just calls the other validators, so test that (a .NET mocking framework allows it).
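To make the two options concrete, here is a rough sketch of each (NUnit-style asserts as in the snippets above; the mock-based variant assumes the validators sit behind interfaces and are constructor-injected, which the static classes above do not allow without a refactoring, and uses Moq purely as an example):
// Option 1: test the observable behaviour - pass inputs, check return values.
[Test]
public void Validate_returns_true_for_valid_person_name()
{
    var name = new PersonName { FirstName = "John", LastName = "Doe" };
    Assert.IsTrue(PersonNameValidator.Validate(name));
}

[Test]
public void Validate_returns_false_for_invalid_first_name()
{
    var name = new PersonName { FirstName = "$$123", LastName = "Doe" };
    Assert.IsFalse(PersonNameValidator.Validate(name));
}

// Option 2: test the interaction - verify that the inner validators were called.
// Assumes hypothetical IFirstNameValidator/ILastNameValidator interfaces and
// constructor injection; Moq is used only as an example mocking framework.
[Test]
public void Validate_delegates_to_the_inner_validators()
{
    var first = new Mock<IFirstNameValidator>();
    var last = new Mock<ILastNameValidator>();
    first.Setup(v => v.Validate("John")).Returns(true);
    last.Setup(v => v.Validate("Doe")).Returns(true);
    var validator = new PersonNameValidator(first.Object, last.Object);

    validator.Validate(new PersonName { FirstName = "John", LastName = "Doe" });

    first.Verify(v => v.Validate("John"), Times.Once());
    last.Verify(v => v.Validate("Doe"), Times.Once());
}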
Unit tests should be acceptance criteria for a properly functioning unit of code...
they should test what the code should and shouldn't do; you will often find corner cases while you are writing the tests.
If you refactor code, you often will have to refactor tests... This should be viewed as part of the original effort, and should bring glee to your soul as you have made the product and process an improvement of such magnitude.
Of course, if this is a library with outside (or internal, depending on company culture) consumers, you have documentation to consider before you are completely done.
Edit: also, those tests are pretty weak. You should have a definition of what is legal in each, and actually test inclusion and exclusion of at least all of the classes of glyphs. The validators can still share code for testing; i.e. isValidUsername(name, allowsSpace) could work for both a first name and a whole name, depending on whether spaces are allowed.
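A rough sketch of that shared-helper idea (the method name follows the one above; the allowed character classes are placeholders, since you still need a real definition of what a legal name is):
// Hypothetical shared rule; the exact character classes are placeholders.
public static bool IsValidUsername(string name, bool allowsSpace)
{
    if (string.IsNullOrEmpty(name))
        return false;
    foreach (char c in name)
    {
        if (char.IsLetter(c) || c == '-' || c == '\'')
            continue;                      // letters, hyphen, apostrophe allowed
        if (allowsSpace && c == ' ')
            continue;                      // space only allowed for full names
        return false;
    }
    return true;
}
// FirstNameValidator and LastNameValidator can both delegate to this,
// and the tests can then cover inclusion/exclusion of each character class once.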
You have formulated your question a bit strangely: both options that you describe would test that the function behaves as it should, but each at a different level of granularity. In one case you would test the behaviour based on the API that is available to a user of the function; whether and how the function implements its functionality with the help of other functions/components is not relevant. In the second case you test the behaviour in isolation, including the way the function interacts with its depended-on components.
On a general level it is not possible to say which is better; depending on the circumstances, each option may be the best. In general, isolating a piece of software usually requires more effort to implement the tests and makes the tests more fragile against implementation changes. That means going for isolation should only be done in situations where there are good reasons for it. Before getting to your specific case, I will describe some situations where isolation is recommendable.
With the original depended-on component (DOC), you may not be able to test everything you want. Assume your code does error handling for the case where the DOC returns an error code. But if the DOC cannot easily be made to return an error code, you will have difficulty testing your error handling. In this case, if you double the DOC, you can make the double return an error code and thus also test your error handling.
The DOC may have non-deterministic or system-specific behaviour. Some examples are random number generators or date and time functions. If this makes testing your functions difficult, it would be an argument to replace the DOC with some double, so you have control over what is provided to your functions.
The DOC may require a very complex setup. Imagine a complex database or some complex XML document that needs to be provided. For one thing, this can make your setup quite complicated; for another, your tests become fragile and will likely break if the DOC changes (imagine the XML schema changing...).
The setup of the DOC or the calls to the DOC are very time consuming (imagine reading data from a network connection, computing the next chess move, solving the TSP, ...). Or the use of the DOC prolongs compilation or linking significantly. With a double you can possibly shorten the execution or build time significantly, which becomes more interesting the more often you execute or build the tests.
You may not have a working version of the DOC - possibly the DOC is still under development and is not yet available. Then, with doubles you can start testing nevertheless.
The DOC may be immature, such that with the version you have, your tests are unstable. In such a case it is likely that you will lose trust in your test results and start ignoring failing tests.
The DOC itself may have other dependencies which have some of the problems described above.
These criteria can help you come to an informed decision about whether isolation is necessary. Considering your specific example: the way you have described the situation, I get the impression that none of the above criteria is fulfilled. That leads me to the conclusion that I would not isolate the function PersonNameValidator from its DOCs FirstNameValidator and LastNameValidator.
I'm using functions instead of classes, and I find that I can't tell whether another function that one of them relies on is a dependency that should be individually unit-tested, or an internal implementation detail that should not be. How can you tell which one it is?
A little context: I'm writing a very simple Lisp interpreter which has an eval() function. It's going to have a lot of responsibilities (too many, actually), such as evaluating symbols differently from lists (everything else evaluates to itself). When evaluating symbols, it has its own complex workflow (environment lookup), and when evaluating lists, it's even more complicated, since the list can be a macro, function, or special form, each of which has its own complex workflow and set of responsibilities.
I can't tell if my eval_symbol() and eval_list() functions should be considered internal implementation details of eval() which should be tested through eval()'s own unit tests, or genuine dependencies in their own right which should be unit-tested independently of eval()'s unit tests.
A significant motivation for the "unit test" concept is to control the combinatorial explosion of required test cases. Let's look at the examples of eval, eval_symbol and eval_list.
In the case of eval_symbol, we will want to test contingencies where the symbol's binding is:
missing (i.e. the symbol is unbound)
in the global environment
is directly within the current environment
inherited from a containing environment
shadowing another binding
... and so on
In the case of eval_list, we will want to test (among other things) what happens when the list's function position contains a symbol with:
no function or macro binding
a function binding
a macro binding
eval_list will invoke eval_symbol whenever it needs a symbol's binding (assuming a LISP-1, that is). Let's say that there are S test cases for eval_symbol and L symbol-related test cases for eval_list. If we test each of these functions separately, we could get away with roughly S + L symbol-related test cases. However, if we wish to treat eval_list as a black box and to test it exhaustively without any knowledge that it uses eval_symbol internally, then we are faced with S x L symbol-related test cases (e.g. global function binding, global macro binding, local function binding, local macro binding, inherited function binding, inherited macro binding, and so on). That's a lot more cases. eval is even worse: as a black box the number of combinations can become incredibly large -- hence the term combinatorial explosion.
So, we are faced with a choice of theoretical purity versus actual practicality. There is no doubt that a comprehensive set of test cases that exercises only the "public API" (in this case, eval) gives the greatest confidence that there are no bugs. After all, by exercising every possible combination we may turn up subtle integration bugs. However, the number of such combinations may be so prohibitively large as to preclude such testing. Not to mention that the programmer will probably make mistakes (or go insane) reviewing vast numbers of test cases that only differ in subtle ways. By unit-testing the smaller internal components, one can vastly reduce the number of required test cases while still retaining a high level of confidence in the results -- a practical solution.
So, I think the guideline for identifying the granularity of unit testing is this: if the number of test cases is uncomfortably large, start looking for smaller units to test.
In the case at hand, I would absolutely advocate testing eval, eval-list and eval-symbol as separate units precisely because of the combinatorial explosion. When writing the tests for eval-list, you can rely upon eval-symbol being rock solid and confine your attention to the functionality that eval-list adds in its own right. There are likely other testable units within eval-list as well, such as eval-function, eval-macro, eval-lambda, eval-arglist and so on.
My advice is quite simple: "Start somewhere!"
If you see the name of some def (or defun) that looks like it might be fragile, well, you probably want to test it, don't you?
If you're having some trouble trying to figure out how your client code can interface with some other code unit, well, you probably want to write some tests somewhere that let you create examples of how to properly use that function.
If some function seems sensitive to data values, well, you might want to write some tests that not only verify it can handle any reasonable inputs properly, but also specifically exercise boundary conditions and odd or unusual data inputs.
Whatever seems bug-prone should have tests.
Whatever seems unclear should have tests.
Whatever seems complicated should have tests.
Whatever seems important should have tests.
Later, you can go about increasing your coverage to 100%. But you'll find that you will probably get 80% of your real results from the first 20% of your unit test coding (Inverted "Law of the Critical Few").
So, to review the main point of my humble approach, "Start somewhere!"
Regarding the last part of your question, I would recommend you think about any possible recursion or any additional possible reuse by "client" functions that you or subsequent developers might create in the future that would also call eval_symbol() or eval_list().
Regarding recursion, the functional programming style uses it a lot and it can be difficult to get right, especially for those of us who come from procedural or object-oriented programming, where recursion seems rarely encountered. The best way to get recursion right is to precisely target any recursive features with unit tests to make certain all possible recursive use cases are validated.
Regarding reuse, if your functions are likely to be invoked by anything other than a single use by your eval() function, they should probably be treated as genuine dependencies that deserve independent unit tests.
As a final hint, the term "unit" has a technical definition in the domain of unit testing: "the smallest piece of software that can be tested in isolation." That is a very old, fundamental definition that may quickly clarify your situation for you.
This is somewhat orthogonal to the content of your question, but directly addresses the question posed in the title.
Idiomatic functional programming involves mostly side effect-free pieces of code, which makes unit testing easier in general. Defining a unit test typically involves asserting a logical property about the function under test, rather than building large amounts of fragile scaffolding just to establish a suitable test environment.
As an example, let's say we're testing extendEnv and lookupEnv functions as part of an interpreter. A good unit test for these functions would check that if we extend an environment twice with the same variable bound to different values, only the most recent value is returned by lookupEnv.
In Haskell, a test for this property might look like:
test =
let env = extendEnv "x" 5 (extendEnv "x" 6 emptyEnv)
in lookupEnv env "x" == Just 5
This test gives us some assurance, and doesn't require any setup or teardown other than creating the env value that we're interested in testing. However, the values under test are very specific. This only tests one particular environment, so a subtle bug could easily slip by. We'd rather make a more general statement: for all variables x and values v and w, an environment env extended with x bound to v after x was bound to w satisfies lookupEnv env x == Just v.
In general, we need a formal proof (perhaps mechanized with a proof assistant like Coq, Agda, or Isabelle) in order to show that a property like this holds. However, we can get much closer than specifying test values by using QuickCheck, a library available for most functional languages that generates large amounts of arbitrary test input for properties we define as boolean functions:
prop_test x v w env' =
let env = extendEnv x v (extendEnv x w env')
in lookupEnv env x == Just v
At the prompt, we can have QuickCheck generate arbitrary inputs to this function, and see whether it remains true for all of them:
*Main> quickCheck prop_test
+++ OK, passed 100 tests.
*Main> quickCheckWith (stdArgs { maxSuccess = 1000 }) prop_test
+++ OK, passed 1000 tests.
QuickCheck uses some very nice (and extensible) magic to produce these arbitrary values, but it's functional programming that makes having those values useful. By making side effects the exception (sorry) rather than the rule, unit testing becomes less of a task of manually specifying test cases, and more a matter of asserting generalized properties about the behavior of your functions.
This process will surprise you frequently. Reasoning at this level gives your mind extra chances to notice flaws in your design, making it more likely that you'll catch errors before you even run your code.
I'm not really aware of any particular rule of thumb for this. But it seems like you should be asking yourself two questions:
Can you define the purpose of eval_symbol and eval_list without needing to say "part of the implementation of eval"?
If you see a test fail for eval, would it be useful to see whether any tests for eval_symbol and eval_list also fail?
If the answer to either of those is yes, I would test them separately.
A few months ago I wrote a simple "almost Lisp" interpreter in Python for an assignment. I designed it using the Interpreter design pattern and unit tested the evaluation code. Then I added the printing and parsing code and transformed the test fixtures from the abstract syntax representation (objects) to concrete syntax strings. Part of the assignment was to program simple recursive list-processing functions, so I added them as functional tests.
To answer your question in general, the rules are pretty much the same as for OO. You should have all your public functions covered. In OO, public methods are part of a class or an interface; in functional programming you most often have visibility control based around modules (similar to interfaces). Ideally, you would have full coverage for all functions, but if this isn't possible, consider a TDD approach: start by writing tests for what you know you need and implement it. Auxiliary functions will be the result of refactoring, and since you wrote tests for everything important beforehand, if the tests still pass after refactoring, you are done and can write another test (and iterate).
Good luck!
I would like to know from those who document unit tests how they document them. I understand that most TDD followers claim "the code speaks" and thus test documentation is not very important because the code should be self-descriptive. Fair enough, but I would like to know how to document unit tests, not whether to document them at all.
My experience as a developer tells me that understanding old code (this includes unit tests) is difficult.
So what is important in a test documentation? When is the test method name not descriptive enough so that documentation is justified?
As requested by Thorsten79, I'll elaborate on my comments as an answer. My original comment was:
"The code speaks" is unfortunately
completely wrong, because a
non-developer cannot read the code,
while he can at least partially read
and understand generated
documentation, and this way he can
know what the tests test. This is
especially important in cases where
the customer fully understands the
domain and just can't read code, and
gets even more important when the unit
tests also test hardware, like in the
embedded world, because then you test
things that can be seen.
When you're doing unit tests, you have to know whether you're writing them just for you (or for your co-workers), or if you're also writing them for other people. Many times, you should be writing code for your readers, rather than for your convenience.
In mixed hardware/software development like in my company, the customers know what they want. If their field device has to do a reset when receiving a certain bus command, there must be a unit test that sends that command and checks whether the device was reset. We're doing this here right now with NUnit as the unit test framework, and some custom software and hardware that makes sending and receiving commands (and even pressing buttons) possible. It's great, because the only alternative would be to do all that manually.
The customer absolutely wants to know which tests are there, and he even wants to run the tests himself. If the tests are not properly documented, he doesn't know what a test does and can't check whether all the tests he thinks he'll need are there, and when running a test, he doesn't know what it will do, because he can't read the code. He knows the bus system in use better than our developers, but he just can't read the code. If a test fails, he does not know why and cannot even say what he thinks the test should do. That's not a good thing.
Having documented the unit tests properly, we have
code documentation for the developers
test documentation for the customer, which can be used to prove that the device does what it should do, i.e. what the customer ordered
the ability to generate the documentation in any format, which can even be passed to other involved parties, like the manufacturer
Properly in this context means: Write clear language that can be understood by non-developers. You can stay technical, but don't write things only you can understand. The latter is of course also important for any other comments and any code.
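As an illustration only, one way to surface such documentation with NUnit is to put the customer-readable description in XML doc comments and in the test's Description attribute, so a documentation generator or the test runner can display it. The requirement number, bus command, and harness helper below are invented for the example:
/// <summary>
/// Verifies requirement 4.2.1: the field device performs a reset
/// when it receives the RESET bus command.
/// </summary>
[Test(Description = "Device resets after receiving the RESET bus command")]
public void Device_resets_when_reset_command_is_received()
{
    // Send the command through the test harness (hypothetical helper).
    busHarness.Send(BusCommand.Reset);

    // The device must report a completed reset within the allowed time.
    Assert.IsTrue(busHarness.WaitForDeviceReset(timeoutMs: 2000),
        "Device did not reset after the RESET command");
}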
Independent of our exact situation, I think that's what I would want in unit tests all the time, even if they're pure software. A customer can ignore a unit test he doesn't care about, like basic function tests. But having the docs there never hurts.
As I've written in a comment to another answer: In addition, the generated documentation is also a good starting point if you (or your boss, or co-worker, or the testing department) wants to examine which tests are there and what they do, because you can browse it without digging through the code.
In the test code itself:
With method-level comments explaining what the test is testing / covering.
At the class level, a comment indicating the actual class being tested (which could actually be inferred from the test class name so that's actually less important than the comments at the method level).
With test coverage reports
Such as Cobertura. That's also documentation, since it indicates what your tests are covering and what they're not.
Comment complex tests or scenarios if required but favour readable tests in the first place.
On the other hand, I try and make my tests speak for themselves. In other words:
[Test]
public void person_should_say_hello() {
// Arrange.
var person = new Person();
// Act.
string result = person.SayHello();
// Assert.
Assert.AreEqual("Hello", result, "Person did not say hello");
}
If I were to look at this test, I'd see that it uses Person (though it would be in PersonTest.cs as a clue ;)) and that if anything breaks, it will occur in the SayHello method. The assert message is useful as well, not only when reading the test, but also when tests are run, since it's easier to spot the failure in GUIs.
Following the AAA style of Arrange, Act and Assert makes the test essentially document itself. If this test was more complex, you could add comments above the test function explaining what's going on. As always, you should ensure these are kept up to date.
As a side note, using underscore notation for test names makes them much more readable; compare this to:
public void PersonShouldSayHello()
Which, for long method names, can make reading the test more difficult. Though this point is often subjective.
When I come back to an old test and don't understand it right away:
I refactor if possible
or write that comment that would have made me understand it right away
When you are writing your test cases, it is the same as when you are writing your code: everything is crystal clear to you. That makes it difficult to envision what you should write to make the code clearer.
Note that this does not mean I never write any comments. There are still plenty of situations where I just know that I am going to have a hard time figuring out what a particular piece of code does.
I usually start with point 1 in these situations...
Improving unit tests into an executable specification is the point of Behaviour-Driven Development: BDD is an evolution of TDD where unit tests use a Ubiquitous Language (a language based on the business domain and shared by the developers and the stakeholders) and expressive names (testCannotCreateDuplicateEntry) to describe what the code is supposed to do. Some BDD frameworks, for example, push the idea very far and produce executable specifications written in almost natural language.
I would advise against any detailed documentation separate from the code. Why? Because whenever you need it, it will most likely be very outdated. The best place for detailed documentation is the code itself (including comments). BTW, anything you need to say about a specific unit test is very detailed documentation.
A few pointers on how to write self-documenting tests (a short example follows this list):
Follow a standard way to write all tests, like AAA pattern. Use a blank line to separate each part. That makes it much easier for the reader to identify the important bits.
You should include, in every test name: what is being tested, the situation under test and the expected behavior. For example: test__getAccountBalance__NullAccount__raisesNullArgumentException()
Extract out common logic into set up/teardown or helper methods with descriptive names.
Whenever possible use samples from real data for input values. This is much more informative than blank objects or made up JSON.
Use variables with descriptive names.
Think about your future self or a teammate: if you remembered nothing about this code, what additional information would you want when the test fails? Write that down as comments.
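Putting several of these pointers together, a test might look like the following sketch (the account service and helper are invented for the example; the blank lines mark the arrange / act / assert parts):
[Test]
public void test__GetAccountBalance__NullAccount__raisesArgumentNullException()
{
    var service = CreateAccountServiceWithSampleData();   // descriptive helper, defined elsewhere

    TestDelegate act = () => service.GetAccountBalance(null);

    // Failing here points straight at the unit, the situation, and the expected behaviour.
    Assert.Throws<ArgumentNullException>(act);
}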
And to complement what other answers have said:
It's great if your customer/Product Owner/boss has a very good idea as to what should be tested and is eager to help, but unit tests are not the best place to do it. You should use acceptance tests for this.
Unit tests should cover specific units of code (methods/functions within classes/modules); if you cover more ground, they will quickly turn into integration tests, which are fine and needed too, but if you do not separate them explicitly, people will just get them confused and you will lose some of the benefits of unit testing. For example, when a unit test fails you should get instant bug detection (especially if you follow the naming convention above). When an integration test fails, you know there is a problem and you know some of its effects, but you might need to debug, sometimes for a long time, to find what it is.
You can use unit testing frameworks for integration tests if you want, but you should know you are not doing unit testing, and you should keep them in separate files/directories.
There are good acceptance/behavior testing frameworks (FitNesse, Robot, Selenium, Cucumber, etc.) that can help business/domain people not just read, but also write the tests themselves. Sure, they will need help from coders to get them working (especially when starting out), but they will be able to do it, and they do not need to know anything about your modules, classes, or functions.
Having just read the first four chapters of Refactoring: Improving the Design of Existing Code, I embarked on my first refactoring and almost immediately came to a roadblock. It stems from the requirement that before you begin refactoring, you should put unit tests around the legacy code. That allows you to be sure your refactoring didn't change what the original code did (only how it did it).
So my first question is this: how do I unit-test a method in legacy code? How can I put a unit test around a 500 line (if I'm lucky) method that doesn't do just one task? It seems to me that I would have to refactor my legacy code just to make it unit-testable.
Does anyone have any experience refactoring using unit tests? And, if so, do you have any practical examples you can share with me?
My second question is somewhat hard to explain. Here's an example: I want to refactor a legacy method that populates an object from a database record. Wouldn't I have to write a unit test that compares an object retrieved using the old method, with an object retrieved using my refactored method? Otherwise, how would I know that my refactored method produces the same results as the old method? If that is true, then how long do I leave the old deprecated method in the source code? Do I just whack it after I test a few different records? Or, do I need to keep it around for a while in case I encounter a bug in my refactored code?
Lastly, since a couple of people have asked...the legacy code was originally written in VB6 and then ported to VB.NET with minimal architectural changes.
For instructions on how to refactor legacy code, you might want to read the book Working Effectively with Legacy Code. There's also a short PDF version available here.
A good example of theory meeting reality. Unit tests are meant to test a single operation, and many pattern purists insist on Single Responsibility, so we have lovely clean code and tests to go with it. However, in the real (messy) world, code (especially legacy code) does lots of things and has no tests. What this needs is a dose of refactoring to clean up the mess.
My approach is to build tests, using the unit test tools, that test lots of things in a single test. In one test, I may be checking that the DB connection is open, changing lots of data, and doing a before/after check on the DB. I inevitably find myself writing helper classes to do the checking, and more often than not those helpers can then be added into the code base, as they have encapsulated emergent behaviour/logic/requirements. I don't mean I have a single huge test; what I do mean is that many tests are doing work which a purist would call an integration test - does such a thing still exist? I've also found it useful to create a test template and then create many tests from it, to check boundary conditions, complex processing, etc.
BTW which language environment are we talking about? Some languages lend themselves to refactoring better than others.
From my experience, I'd write tests not for particular methods in the legacy code, but for the overall functionality it provides. These might or might not map closely to existing methods.
Write tests at whatever level of the system you can (if you can); if that means running a database etc. then so be it. You will need to write a lot more code to assert what the code is currently doing, as a 500+ line method is going to have a lot of behaviour wrapped up in it. As for comparing the old versus the new: if you write the tests against the old code, they pass, and they cover everything it does, then when you run them against the new code you are effectively checking the old against the new.
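For the old-versus-new comparison specifically, a characterization-style sketch could look like this (the loader classes, fields, and record ids are placeholders, not from the question's code):
[Test]
public void Refactored_loader_returns_the_same_objects_as_the_legacy_loader()
{
    // A handful of representative record ids, including known awkward ones.
    int[] sampleRecordIds = { 1, 42, 1001, 99999 };

    foreach (int id in sampleRecordIds)
    {
        var expected = LegacyCustomerLoader.Load(id);   // old method, kept around for now
        var actual = RefactoredCustomerLoader.Load(id); // new implementation

        // Compare field by field (or via a suitable Equals override).
        Assert.AreEqual(expected.Name, actual.Name);
        Assert.AreEqual(expected.Balance, actual.Balance);
    }
}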
I did this to test a complex SQL trigger I wanted to refactor. It was a pain and took time, but a month later, when we found another issue in that area, it was worth having the tests there to rely on.
In my experience this is the reality when working on legacy code. The book (Working Effectively with Legacy Code) mentioned by Esko is an excellent work describing various approaches that can take you there.
I have seen similar issues with our unit tests themselves, which have grown to become system/functional tests. The most important thing when developing tests for legacy or existing code is to define the term "unit". It can even be a functional unit, like "reading from the database", etc. Identify the key functional units and maintain the tests that add value.
As an aside, there was a recent talk between Joel S. and Martin F. on TDD/unit tests. My take is that it is important to define the unit and keep focus on it! URLs: Open Letter, Joel's transcript and podcast
That really is one of the key problems of trying to retrofit tests onto legacy code. Are you able to break the problem domain down into something more granular? Does that 500+ line method do anything other than make system calls to JDK/Win32/.NET Framework JARs/DLLs/assemblies? I.e. are there more granular function calls within that 500+ line behemoth that you could unit test?
The following book: The Art of Unit Testing contains a couple of chapters with some interesting ideas on how to deal with legacy code in terms of developing Unit Tests.
I found it quite helpful.
As a programmer, I have bought whole-heartedly into the TDD philosophy and take the effort to make extensive unit tests for any nontrivial code I write. Sometimes this road can be painful (behavioral changes causing cascading multiple unit test changes; high amounts of scaffolding necessary), but on the whole I refuse to program without tests that I can run after every change, and my code is much less buggy as a result.
Recently, I've been playing with Haskell and its resident testing library, QuickCheck. In a fashion distinctly different from TDD, QuickCheck has an emphasis on testing invariants of the code, that is, certain properties that hold over all (or substantial subsets) of inputs. A quick example: a stable sorting algorithm should give the same answer if we run it twice, should have increasing output, should be a permutation of the input, etc. Then, QuickCheck generates a variety of random data in order to test these invariants.
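Outside of Haskell the same idea can be approximated by hand; the rough sketch below (C#/NUnit, to match the earlier examples in this document) checks two of those sort invariants, idempotence and permutation-of-input, over randomly generated lists, using LINQ's OrderBy as a stand-in for the sort under test:
// Needs System, System.Linq and NUnit.Framework.
[Test]
public void Sort_invariants_hold_for_randomly_generated_lists()
{
    var rng = new Random(12345); // fixed seed so failures are reproducible

    for (int run = 0; run < 100; run++)
    {
        var input = Enumerable.Range(0, rng.Next(0, 50))
                              .Select(_ => rng.Next(-1000, 1000))
                              .ToList();

        var sortedOnce = input.OrderBy(x => x).ToList();       // stand-in for the sort under test
        var sortedTwice = sortedOnce.OrderBy(x => x).ToList();

        // Sorting an already sorted list must not change it.
        CollectionAssert.AreEqual(sortedOnce, sortedTwice);
        // The output must be a permutation of the input.
        CollectionAssert.AreEquivalent(input, sortedOnce);
        // The output must be non-decreasing.
        Assert.IsTrue(sortedOnce.Zip(sortedOnce.Skip(1), (a, b) => a <= b).All(ok => ok));
    }
}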
It seems to me, at least for pure functions (that is, functions without side effects -- and if you do mocking correctly you can convert dirty functions into pure ones), that invariant testing could supplant unit testing as a strict superset of those capabilities. Each unit test consists of an input and an output (in imperative programming languages, the "output" is not just the return value of the function but also any changed state, though this can be encapsulated). One could conceivably create a random input generator that is good enough to cover all of the unit test inputs that you would have created manually (and then some, because it would generate cases that you wouldn't have thought of); if you find a bug in your program due to some boundary condition, you improve your random input generator so that it generates that case too.
The challenge, then, is whether or not it's possible to formulate useful invariants for every problem. I'd say it is: it's a lot simpler, once you have an answer, to check whether it's correct than it is to calculate the answer in the first place. Thinking about invariants also helps clarify the specification of a complex algorithm much better than ad hoc test cases, which encourage a kind of case-by-case thinking about the problem. You could use a previous version of your program as a model implementation, or a version of the program in another language. Etc. Eventually, you could cover all of your former test cases without having to explicitly code an input or an output.
Have I gone insane, or am I on to something?
A year later, I now think I have an answer to this question: No! In particular, unit tests will always be necessary and useful for regression tests, in which a test is attached to a bug report and lives on in the codebase to prevent that bug from ever coming back.
However, I suspect that any unit test can be replaced with a test whose inputs are randomly generated. Even in the case of imperative code, the “input” is the order of imperative statements you need to make. Of course, whether or not it’s worth creating the random data generator, and whether or not you can make the random data generator have the right distribution is another question. Unit testing is simply a degenerate case where the random generator always gives the same result.
What you've brought up is a very good point - when only applied to functional programming. You stated a means of accomplishing this all with imperative code, but you also touched on why it's not done - it's not particularly easy.
I think that's the very reason it won't replace unit testing: it doesn't fit for imperative code as easily.
Doubtful
I've only heard of (not used) these kinds of tests, but I see two potential issues. I would love to have comments about each.
Misleading results
I've heard of tests like:
reverse(reverse(list)) should equal list
unzip(zip(data)) should equal data
It would be great to know that these hold true for a wide range of inputs. But both these tests would pass if the functions just return their input.
It seems to me that you'd want to verify that, e.g., reverse([1 2 3]) equals [3 2 1] to prove correct behavior in at least one case, and then add some testing with random data.
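A minimal sketch of that combination, one concrete anchor case plus a random round-trip check (Reverse here is just a stand-in for the function under test):
[Test]
public void Reverse_has_a_concrete_anchor_case_and_a_random_round_trip()
{
    // One concrete expectation rules out "just return the input".
    CollectionAssert.AreEqual(new[] { 3, 2, 1 }, Reverse(new[] { 1, 2, 3 }));

    // The round-trip property over random data then adds breadth.
    var rng = new Random(1);
    for (int run = 0; run < 100; run++)
    {
        var data = Enumerable.Range(0, rng.Next(0, 20)).Select(_ => rng.Next(100)).ToArray();
        CollectionAssert.AreEqual(data, Reverse(Reverse(data)));
    }
}

// Stand-in for the function under test.
static int[] Reverse(int[] xs) => xs.Reverse().ToArray();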
Test complexity
An invariant test that fully describes the relationship between the input and output might be more complex than the function itself. If it's complex, it could be buggy, but you don't have tests for your tests.
A good unit test, by contrast, is too simple to screw up or misunderstand as a reader. Only a typo could create a bug in "expect reverse([1 2 3]) to equal [3 2 1]".
What you wrote in your original post reminded me of this problem, which is an open question as to what loop invariant proves the loop correct...
Anyway, I am not sure how much you have read on formal specification, but you are heading down that line of thought. David Gries's book is one of the classics on the subject; I still haven't mastered the concept well enough to use it rapidly in my day-to-day programming. The usual response to formal specification is that it's hard and complicated, and only worth the effort if you are working on safety-critical systems. But I think there are back-of-the-envelope techniques, similar to what QuickCheck exposes, that can be used.