Unit testing: why is the expected argument always first in equality tests? - unit-testing

Why is it that every unit testing framework (that I know of) requires the expected value in equality tests to always be the first argument:
Assert.AreEqual(42, Util.GetAnswerToLifeTheUniverseAndEverything());
assertEquals(42, Util.GetAnswerToLifeTheUniverseAndEverything());
etc.
I'm quite used to it now, but every coder I try to teach unit testing makes the mistake of reversing the arguments, which I understand perfectly. Google didn't help, maybe one of the hard-core unit-testers here knows the answer?

It seems that most early frameworks put expected before actual (for no known reason; a dice roll, perhaps?). Yet as programming languages developed and code became more fluent, that order got reversed. Most fluent interfaces try to mimic natural language, and unit testing frameworks are no different.
In an assertion, we want to ensure that some object matches some condition. This is the natural-language form: if you were to explain your test code, you'd probably say
"In this test, I make sure that computed value is equal to 5"
instead of
"In this test, I make sure that 5 is equal to computed value".
Difference may not be huge, but let's push it further. Consider this:
Assert.That(Roses, Are(Red));
Sounds about right. Now:
Assert.That(Red, Are(Roses));
Hm..? You probably wouldn't be too surprised if somebody told you that roses are red. Other way around, red are roses, raises suspicious questions. Yoda, anybody?
Yoda's making an important point - reversed order forces you to think.
It gets even more unnatural when your assertions are more complex:
Assert.That(Forest, Has.MoreThan(15, Trees));
How would you reverse that one? More than 15 trees are being had by forest?
This claim (fluency as a driving factor for the change) is reflected to some degree in the change NUnit has gone through: originally (Assert.AreEqual) it used expected before actual (old style). The fluent extensions (or, to use NUnit's terminology, the constraint-based model, Assert.That) reversed that order.
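To make the two argument orders concrete, here is a minimal Python sketch. The assert_equals and assert_that helpers are hypothetical and written only for illustration; they are not the API of NUnit or of any other framework.

```python
# Hypothetical helpers for illustration only; not any real framework's API.

def assert_equals(expected, actual):
    """Classic style: the expected value comes first."""
    if expected != actual:
        raise AssertionError(f"expected {expected!r} but was {actual!r}")


class _Subject:
    """Wraps the actual value so the check reads like a sentence."""

    def __init__(self, actual):
        self.actual = actual

    def is_equal_to(self, expected):
        if self.actual != expected:
            raise AssertionError(f"expected {expected!r} but was {self.actual!r}")
        return self


def assert_that(actual):
    """Fluent style: the actual value comes first."""
    return _Subject(actual)


def get_answer():
    return 42


assert_equals(42, get_answer())            # "42 is equal to the answer"
assert_that(get_answer()).is_equal_to(42)  # "the answer is equal to 42"
```

Swapping the arguments in the classic style does not change whether the check passes, but it does make the failure message label the two values the wrong way round, which is why the order matters at all.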

I think it is just a convention now and as you said it is adopted by "every unit testing framework (I know of)". If you are using a framework it would be annoying to switch to another framework that uses the opposite convention. So (if you are writing a new unit testing framework for example) it would be preferable for you as well to follow the existing convention.
I believe this comes from the way some developers prefer to write their equality tests:
if (4 == myVar)
This avoids an unwanted assignment caused by mistakenly writing one "=" instead of "==". With the constant on the left, the compiler catches the error, and you avoid a lot of trouble hunting down a weird runtime bug.

Nobody knows, and it is a source of never-ending confusion. However, not all frameworks follow this pattern (which adds to the confusion):
FEST-Assert uses normal order:
assertThat(Util.GetAnswerToLifeTheUniverseAndEverything()).isEqualTo(42);
Hamcrest:
assertThat(Util.GetAnswerToLifeTheUniverseAndEverything(), equalTo(42))
ScalaTest doesn't really make a distinction:
Util.GetAnswerToLifeTheUniverseAndEverything() should equal (42)

I don't know but I've been part of several animated discussions about the order of arguments to equality tests in general.
There are a lot of people who think
if (42 == answer) {
doSomething();
}
is preferable to
if (answer == 42) {
doSomething();
}
in C-based languages. The reason for this is that accidentally putting a single equals sign, as in
if (42 = answer) {
doSomething();
}
will give you a compiler error, but
if (answer = 42) {
doSomething();
}
might not, and would definitely introduce a bug that might be hard to track down. So who knows, maybe the person/people who set up the unit testing framework were used to thinking of equality tests in this way -- or they were copying other unit testing frameworks that were already set up this way.

I think it's because JUnit was the precursor of most unit testing frameworks (not that it was the first unit testing framework, but it kicked off an explosion in unit testing). Since JUnit did it that way, all the subsequent frameworks copied this form and it became a convention.
Why did JUnit do it that way? I don't know, ask Kent Beck!

My view is that it avoids exceptions, e.g. 42.equals(null) vs null.equals(42)
where 42 is the expected value and
null is the actual value.
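The asymmetry can be sketched in Python, with None playing the role of null. The Answer class and its equals method are hypothetical stand-ins for a Java-style equals; whether this was the actual motivation for the convention is not established.

```python
class Answer:
    """Hypothetical value type with a Java-style equals method."""

    def __init__(self, value):
        self.value = value

    def equals(self, other):
        # A None argument is handled gracefully and simply compares unequal.
        return isinstance(other, Answer) and other.value == self.value


expected = Answer(42)
actual = None  # the code under test returned nothing

# Calling equals on the expected value is safe even when actual is None.
print(expected.equals(actual))

# Calling it on the actual value blows up before any comparison happens.
try:
    actual.equals(expected)
except AttributeError as error:
    print("comparison never ran:", error)
```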

Well they had to pick one convention. If you want to reverse it try the Hamcrest matchers. They are meant to help increase readability. Here is a basic sample:
import org.junit.Test;
import static org.junit.Assert.assertThat;
import static org.hamcrest.core.Is.is;
public class HamcrestTest {
    @Test
    public void matcherShouldWork() {
        // Math.pow returns a double, so compare against 8.0 rather than the int 8
        assertThat( Math.pow( 2, 3 ), is( 8.0 ) );
    }
}

Surely it makes logical sense to put the expected value first, as it's the first known value.
Think about it in the context of manual tests. A manual test will have the expected value written in, with the actual value recorded afterwards.

Related

Unit testing checking for nulls

This is a very basic question but I still cannot find the appropriate answer. In my test there is a possibility to have null values and because of that the last stage (Act) starts looking a little bit strange (it is no longer act only). What I mean is the following:
Assert.IsNotNull(variable);
var newVariable = variable.Property;
Assert.IsNotNull(newVariable);
var finalVariable = newVariable.AnotherProperty;
Assert.AreEqual(3, finalVariable.Count);
Now they are obviously related and I have to be sure that the values are not null, but also there are three asserts in one test and the act part starts to look not right.
So what is the general solution in such cases? Is there anything smarter than 3 tests with one assert each and checks for null before the asserts of the last 2?
Basically there are two ways of dealing with your problem:
Guard assertions: extra asserts making sure data is in known state before proper test takes place (that's what you're doing now).
Moving guard assertions to their own tests.
Which option to choose largely depends on the code under test. If the preconditions would be duplicated in other tests, that hints at the separate-test approach. If the precondition is mirrored in production code, that again hints at the separate-test approach.
On the other hand, if it's only something you do to boost your confidence, maybe separate test is too much (yet as noted in other answers, it might be a sign that you're not in full control of your test or that you're testing too many things at once).
I think you should split this test into three tests and name them accordingly to what's happening. It's perfectly sensible even if your acts in those tests are same, you are testing different scenarios by checking return value of the method.
Nulls are royal pain. The question is, can they legitimately exist?
Let's separate our discussion to code and tests.
If the null shouldn't exist then the code itself, not the tests, should check and verify that they are not null. For this reason each and every method of my code is built using a snippet that checks the arguments:
public VideoPosition(FrameRate theFrameRate, TimeSpan theAirTime)
{
Logger.LogMethod("theVideoMovie", theFrameRate, "theAirTime", theAirTime);
try
{
#region VerifyInputs
Validator.Verify(theFrameRate);
Validator.Verify(theAirTime);
Validator.VerifyTrue(theAirTime.Ticks >= 0, "theAirTime.Ticks >= 0");
If null ARE legitimate in the code, but you are testing a scenario where the returned values shouldn't be null, then of course you have to verify this in your testing code.
In your Unit Test you should be able to control every input to your class under test. This means that you control if your variable has a value or not.
So you would have one unit test that forces your variable to be null and then asserts this.
You will then have another test where you can be sure that your variable has a value and you only need the other asserts.
I wrote a blog about this some time ago. Maybe it can help: Unit Testing, hell or heaven?
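As a rough sketch of the split-test idea using Python's unittest: the Inner, Outer and load names are hypothetical stand-ins for the code under test, where load may return None.

```python
import unittest


class Inner:
    def __init__(self, items):
        self.items = items


class Outer:
    def __init__(self, inner):
        self.inner = inner


def load(found):
    """Hypothetical code under test: returns None when nothing is found."""
    return Outer(Inner([1, 2, 3])) if found else None


class LoadTests(unittest.TestCase):
    def test_returns_none_when_nothing_is_found(self):
        # The test controls the input, so the None case is forced on purpose.
        self.assertIsNone(load(found=False))

    def test_returns_three_items_when_found(self):
        result = load(found=True)
        # Guard assertion: fails with a clear message instead of an AttributeError.
        self.assertIsNotNone(result)
        self.assertEqual(3, len(result.inner.items))
```

Each test now has one reason to fail, and the guard assertion in the second test only documents a precondition that the first test already covers separately.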

Unit testing with random data

I've read that generating random data in unit tests is generally a bad idea (and I do understand why), but testing on random data and then constructing a fixed unit test case from random tests which uncovered bugs seems nice. However I don't understand how to organize it nicely. My question is not related to a specific programming language or to a specific unit test framework actually, so I'll use python and some pseudo unit test framework. Here's how I see coding it:
def random_test_cases():
datasets = [
dataset1,
dataset2,
...
datasetn
]
for dataset in datasets:
assertTrue(...)
assertEquals(...)
assertRaises(...)
# and so on
The problem is: when this test case fails I can't figure out which dataset caused failure. I see two ways of solving it:
Create a single test case per dataset — the problem is load of test cases and code duplication.
Usually a test framework lets us pass a message to assert functions (in my example I could do something like assertTrue(..., message = str(dataset))). The problem is that I would have to pass such a message to each assert, which does not look elegant either.
Is there a simpler way of doing it?
I still think it's a bad idea.
Unit tests need to be straightforward. Given the same piece of code and the same unit test, you should be able to run it infinitely and never get a different response unless there's an external factor coming in to play. A goal contrary to this will increase maintenance cost of your automation, which defeats the purpose.
Outside of the maintenance aspect, to me it seems lazy. If you put thought into your functionality and understand the positive as well as the negative test cases, developing unit tests is straightforward.
I also disagree with the user who shows how to do multiple tests cases inside of the same test case. When a test fails, you should be able to tell immediately which test failed and know why it failed. Tests should be as simple as you can make them and as concise/relevant to the code under test as possible.
You could define tests by extension instead of enumeration, or you could call multiple test cases from a single case.
calling multiple test cases from a single test case:
MyTest()
{
MyTest(1, "A")
MyTest(1, "B")
MyTest(2, "A")
MyTest(2, "B")
MyTest(3, "A")
MyTest(3, "B")
}
And there are sometimes elegant ways to achieve this with some testing frameworks. Here is how to do it in NUnit:
[Test, Combinatorial]
public void MyTest(
[Values(1,2,3)] int x,
[Values("A","B")] string s)
{
...
}
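For comparison, the same expansion can be sketched in Python with itertools.product from the standard library. The my_test function here is a hypothetical placeholder for a real parameterized check.

```python
import itertools


def my_test(x, s):
    """Hypothetical parameterized check; replace the body with real assertions."""
    assert isinstance(x, int) and isinstance(s, str), (x, s)


# product expands the two value lists into all six (x, s) pairs,
# in the same order as the hand-written calls above.
CASES = list(itertools.product([1, 2, 3], ["A", "B"]))

for x, s in CASES:
    my_test(x, s)
```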
I also think it's a bad idea.
Mind you, not throwing random data at your code, but having unit tests doing that. It all boils down to why you unit test in the first place. The answer is "to drive the design of the code". Random data doesn't drive the design of the code, because it depends on a very rigid public interface. Mind you, you can find bugs with it, but that's not what unit tests are about. And let me note that I'm talking about unit tests, and not tests in general.
That being said, I strongly suggest taking a look at QuickCheck. It's Haskell, so it's a bit dodgy on presentation and a bit PhD-ish on documentation, but you should be able to figure it out. I'm going to summarize how it works, though.
After you pick the code you want to test (let's say the sort() function), you establish invariants which should hold. In this example, you can have the following invariants if result = sort(input):
Every element in result should be smaller than or equal to the next one.
Every element in input should be present in result the same number of times.
result and input should have the same length (this repeats the previous one, but let's have it for illustration).
You encode each invariant in a simple function that takes the input and the result and checks whether the invariant holds.
Then, you tell QuickCheck how to generate input. Since this is Haskell and the type system kicks ass, it can see that the function takes a list of integers and it knows how to generate those. It basically generates random lists of random integers and random length. Of course, it can be more fine-grained if you have a more complex data type (for example, only positive integers, only squares, etc.).
Finally, when you have those two, you just run QuickCheck. It generates all that stuff randomly and checks the invariants. If some fail, it will show you exactly which ones. It would also tell you the random seed, so you can rerun this exact failure if you need to. And as an extra bonus, whenever it gets a failed invariant, it will try to reduce the input to the smallest possible subset that fails the invariant (if you think of a tree structure, it will reduce it to the smallest subtree that fails the invariant).
And there you have it. In my opinion, this is how you should go about testing stuff with random data. It's definitely not unit tests and I even think you should run it differently (say, have CI run it every now and then, as opposed to running it on every change (since it will quickly get slow)). And let me repeat, it's a different benefit from unit testing - QuickCheck finds bugs, while unit testing drives design.
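The workflow described above can be approximated without QuickCheck, using only Python's standard library. This is a hand-rolled sketch under stated assumptions: check_sort_invariants and run_random_checks are hypothetical names, the built-in sorted stands in for the function under test, and the input shrinking that QuickCheck performs is not implemented.

```python
import random
from collections import Counter


def check_sort_invariants(data, result):
    """Return the names of the invariants that result violates for data."""
    failures = []
    if any(a > b for a, b in zip(result, result[1:])):
        failures.append("ordered")
    if Counter(data) != Counter(result):
        failures.append("same elements")
    if len(data) != len(result):
        failures.append("same length")
    return failures


def run_random_checks(sort, seed, runs=200):
    """Feed random integer lists to sort; return the first failure or None."""
    rng = random.Random(seed)  # a fixed seed makes every run reproducible
    for _ in range(runs):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        failures = check_sort_invariants(data, sort(list(data)))
        if failures:
            return {"seed": seed, "input": data, "failed": failures}
    return None


# The built-in sorted satisfies all three invariants, so this prints None.
print(run_random_checks(sorted, seed=12345))
# An identity "sort" should be reported with the input that exposed it.
print(run_random_checks(lambda xs: xs, seed=12345))
```

Returning the seed and the failing input together is what makes a failure reproducible, which is the property the answer singles out.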
Usually the unit test frameworks support 'informative failures' as long as you pick the right assertion method.
However, if nothing else works, you could simply trace the dataset to the console or an output file. Low tech, but it should work.
[TestCaseSource("GetDatasets")]
public Test.. (Dataset d)
{
Console.WriteLine(PrettyPrintDataset(d));
// proceed with checks
Console.WriteLine("Worked!");
}
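In Python's unittest, the subTest context manager attaches the dataset to the failure report, so the message does not have to be passed to each assert. A minimal sketch, in which the datasets and the total function are hypothetical:

```python
import unittest


def total(values):
    """Hypothetical function under test."""
    return sum(values)


DATASETS = [
    {"values": [1, 2, 3], "expected": 6},
    {"values": [], "expected": 0},
    {"values": [-1, 1], "expected": 0},
]


class TotalTest(unittest.TestCase):
    def test_datasets(self):
        for dataset in DATASETS:
            # A failure inside this block is reported together with dataset,
            # and the remaining datasets still run.
            with self.subTest(dataset=dataset):
                self.assertEqual(dataset["expected"], total(dataset["values"]))
```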
In quickcheck for R we tried to solve this problem as follows
the tests are actually pseudo-random (the seed is fixed) so you can always reproduce your tests results (barring external factors, of course)
the test function returns enough data to reproduce the error, including the assertion that failed and the data that made it fail. A convenience function, repro, called on the return value of test will land you in the debugger at the beginning of the failing assertion, with arguments set to the witnesses of the failure. If the tests are executed in batch mode, equivalent information is stored in a file and the command to retrieve it is printed to stderr. Then you can call repro as before. Whether or not you program in R, I would love to know if this starts to address your requirements. Some aspects of this solution may be hard to implement in languages that are less dynamic or don't have first-class functions.

how do I avoid re-implementing the code being tested when I write tests?

(I'm using rspec in RoR, but I believe this question is relevant to any testing system.)
I often find myself doing this kind of thing in my test suite:
actual_value = object_being_tested.tested_method(args)
expected_value = compute_expected_value(args)
assert(actual_value == expected_value, ...)
The problem is that my implementation of compute_expected_value() often ends up mimicking object_being_tested.tested_method(), so it's really not a good test because they may have identical bugs.
This is a rather open-ended question, but what techniques do people use to avoid this trap? (Points awarded for pointers to good treatises on the topic...)
Usually (for manually written unit tests) you would not compute the expected value. Rather, you would just assert against what you expect to be the result from the tested method for the given args. That is, you would have something like this:
actual_value = object_being_tested.tested_method(args)
expected_value = what_you_expect_to_be_the_result
assert(actual_value == expected_value, ...)
In other testing scenarios where the arguments (or even test methods being executed) are generated automatically, you need to devise a simple oracle which will give you the expected result (or an invariant that should hold for the expected result). Tools like Pex, Randoop, ASTGen, and UDITA enable such testing.
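An invariant-based oracle can be sketched as follows in Python. The integer_sqrt function is a hypothetical unit under test; the point is that the check verifies a defining property of the result rather than recomputing it the same way.

```python
def integer_sqrt(n):
    """Hypothetical unit under test: floor of the square root, by Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return n
    guess = n
    while guess * guess > n:
        guess = (guess + n // guess) // 2
    return guess


def is_valid_integer_sqrt(n, root):
    """Oracle: checks the defining property without computing a square root."""
    return root * root <= n < (root + 1) * (root + 1)


# Inputs could equally be generated at random; a range keeps the sketch simple.
for n in range(1000):
    assert is_valid_integer_sqrt(n, integer_sqrt(n)), n
```

Because the oracle only multiplies and compares, a bug in the iteration above is unlikely to be mirrored in the check.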
Well here are my two cents
a) If the calculation of the expected value is simple and does not encompass any business rules or conditions beyond the test case it generates the expected result for, then it should be good enough. Remember that your actual code will be as generic as possible.
There will be cases where the bug is in the expected-value method itself, but you can easily pinpoint the cause of the failure and fix it.
b) There are cases where the expected value cannot be calculated easily. In that case, keep flat files with the results, or some kind of constant expected value, which is what you would naturally want.
Then there are tests where you just want to verify whether a particular method was called, and you are done testing that unit. Remember to use all these different paradigms while testing, and always remember: KEEP IT SIMPLE.
You would not do that.
You do not compute the expected value; you know it already. It should be a constant value defined in your test (or constructed from other functions that have already been tested).

Automated testing feels a lot like duplicating the tested logic, am I doing it right?

I'm implementing automated testing with CppUTest in C++.
I realize I end up almost copying and pasting the logic to be tested on the tests themselves, so I can check the expected outcomes.
Am I doing it right? should it be otherwise?
edit: I'll try to explain better:
The unit being tested takes input A, makes some processing and returns output B
So apart from making some black box checks, like checking that the output lies in an expected range, I would also like to see if the output B that I got is the right outcome for input A, i.e. whether the logic is working as expected.
So for example if the unit just makes A times 2 to yield B, then in the test I have no other way of checking than making again the calculation of A times 2 to check against B to be sure it went alright.
That's the duplication I'm talking about.
// Actual function being tested:
int times2( int a )
{
return a * 2;
}
.
// Test:
int test_a = 4; // any sample input; reading an uninitialized int is undefined behavior
int expected_b = test_a * 2; // here I'm duplicating times2()'s logic
int actual_b = times2( test_a );
CHECK( actual_b == expected_b );
.
PS: I think I will reformulate this in another question with my actual source code.
If your goal is to build automated tests for your existing code, you're probably doing it wrong. Hopefully you know what the result of frobozz.Gonkulate() should be for various inputs and can write tests to check that Gonkulate() is returning the right thing. If you have to copy Gonkulate()'s convoluted logic to figure out the answer, you might want to ask yourself how well you understand the logic to begin with.
If you're trying to do test-driven development, you're definitely doing it wrong. TDD consists of many quick cycles of:
Writing a test
Watching it fail
Making it pass
Refactoring as necessary to improve the overall design
Step 1 - writing the test first - is an essential part of TDD. I infer from your question that you're writing the code first and the tests later.
So for example if the unit just makes A times 2 to yield B, then in
the test I have no other way of checking than making again the
calculation of A times 2 to check against B to be sure it went
alright.
Yes you do! You know how to calculate A times two, so you don't need to do this in code. If A is 4, then you know the answer is 8, so you can just use that as the expected value.
CHECK( actual_b == 8 )
If you are worried about magic numbers, don't be. Nobody will be confused about the meaning of the hard-coded numbers in the following line:
CHECK( times_2(4) == 8 )
If you don't know what the result should be, then your unit test is useless. If you need to calculate the expected result, then you are either using the same logic as the function or using an alternate algorithm to work out the result. In the first case, if the logic that you duplicate is incorrect, your test will still pass! In the second case, you are introducing another place for a bug to occur. If a test fails, you will need to work out whether it failed because the function under test has a bug or because your test method has a bug.
I think this one is a tough nut to crack because it's essentially a mentality shift. It was somewhat hard for me.
The thing about tests is to have your expectations nailed down and check whether your code really does what you think it does. Think in terms of exercising it, not checking its logic so directly, but as a whole. If that's too hard, maybe your function/method just does too much.
Try to think of your tests as working examples of what your code can do, not as a mathematical proof.
The programming language shouldn't matter.
var ANY_NUMBER = 4;
Assert.That(times_2(ANY_NUMBER), Is.EqualTo(ANY_NUMBER*2));
In this case, I wouldn't mind duplicating the logic. First, the expected value is more readable than a bare 8. Second, this logic doesn't look like a change magnet; it is relatively static.
For cases, where the logic is more involved (chunky) and prone to change, duplicating the logic in the test is definitely not recommended. Duplication is evil. Any change to the logic would ripple changes to the test. In that case, I'd use hardcoded input-expected output pairs with some readable pair-names.
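The hardcoded input/expected pairs mentioned above might look like this in Python. The times2 function mirrors the one in the question; the case names and the run_cases helper are made up for illustration.

```python
def times2(a):
    """Function under test, mirroring the C++ example in the question."""
    return a * 2


# Each pair is written out by hand; nothing here repeats the a * 2 logic.
CASES = {
    "zero stays zero": (0, 0),
    "small positive": (4, 8),
    "negative": (-3, -6),
}


def run_cases():
    """Return the names of the cases whose actual value differs from the expected one."""
    return [name for name, (a, expected) in CASES.items() if times2(a) != expected]


print(run_cases())  # an empty list means every case passed
```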

Unit testing specific values

Consider the following code (from a requirement that says that 3 is special for some reason):
bool IsSpecial(int value)
if (value == 3)
return true
else
return false
I would unit test this with a couple of functions - one called TEST(3IsSpecial) that asserts that when passed 3 the function returns true and another that passes some random value other than 3 and asserts that the function returns false.
When the requirement changes and say it now becomes 3 and 20 are special, I would write another test that verifies that when called with 20 this function returns true as well. That test would fail and I would then go and update the if condition in the function.
Now, what if there are people on my team who do not believe in unit testing and they make this change? They will go straight to the code and change it, and since my second unit test might not test for 20 (it could be picking an int at random or have some other int hardcoded), nothing fails. Now my tests aren't in sync with the code. How do I ensure that when they change the code, some unit test or other fails?
I could be doing something grossly wrong here so any other techniques to get around this are also welcome.
That's a good question. As you note a Not3IsNotSpecial test picking a random non-3 value would be the traditional approach. This wouldn't catch a change in the definition of "special".
In a .NET environment you can use the new code contracts capability to write the test predicate (the postcondition) directly in the method. The static analyzer would catch the defect you proposed. For example:
Contract.Ensures(value == 3 || Contract.Result<bool>() == false);
I think anybody that's a TDD fan is experimenting with contracts now to see use patterns. The idea that you have tools to prove correctness is very powerful. You can even specify these predicates for an interface.
The only testing approach I've seen that would address this is Model Based Testing. The idea is similar to the contracts approach. You set up the Not3IsNotSpecial condition abstractly (e.g., IsSpecial(x) == false for every x != 3) and let a model execution environment generate concrete tests. I'm not sure, but I think these environments do static analysis as well. Anyway, you let the model execution environment run continuously against your SUT. I've never used such an environment, but the concept is interesting.
Unfortunately, that specific scenario is something that is difficult to guard against. With a function like IsSpecial, it's unrealistic to test all four billion negative test cases, so, no, you're not doing something grossly wrong.
Here's what comes to me off the top of my head. Many repositories have hooks that allow you to run some process on each check-in, such as running the unit tests. It's possible to set a criterion that newly checked in code must reach some threshold of code coverage under unit tests. If the commit does not meet certain metrics, it is rejected.
I've never had to set one of these systems up, so I don't know what is involved, but I do know it's possible.
And believe me, I feel your pain. I work with people who are similarly resistant to unit testing.
One thing you need to think about is why 3 is a special value and others are not. If it defines some aspect of your application, you can extract that aspect and make an enum out of it.
The method then checks whether the value exists in the enum, and its test should fail if the value doesn't. For the enum itself, write a test that checks its possible values; if a new possible value is added, that test should fail.
So your method will become:
bool IsSpecial(int value)
if (SpecialValues.has(value))
return true
else
return false
and your SpecialValues will be an enum like:
enum SpecialValues {
Three(3), Twenty(20)
public int value;
}
And now you should write tests for the possible values of the enum. A simple test can check the total number of possible values, and another test can check the possible values themselves.
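A sketch of the same idea in Python, using the standard enum module. The SpecialValue name and the is_special helper are assumptions for illustration. The second test is the one that fails when someone adds a member without updating the tests.

```python
import unittest
from enum import Enum


class SpecialValue(Enum):
    THREE = 3
    TWENTY = 20


def is_special(value):
    """True when value matches one of the SpecialValue members."""
    return any(value == member.value for member in SpecialValue)


class SpecialValueTests(unittest.TestCase):
    def test_known_values_are_special(self):
        self.assertTrue(is_special(3))
        self.assertTrue(is_special(20))

    def test_set_of_special_values_is_pinned(self):
        # Adding or removing a member breaks this test until it is updated.
        self.assertEqual({3, 20}, {member.value for member in SpecialValue})
```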
The other point to make is that in a less contrived example:
20 might have been some valid condition to test for based on knowledge of the business domain. Writing tests in a BDD style based on knowledge of the business problem might have helped you explicitly catch it.
4 might have been a good value to test for due to its status as a boundary condition. This may have been more likely to change in the real world so would more likely show up in a full test case.
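Boundary-value checks around the special value can be sketched like this in Python. The is_special function is a hypothetical port of the original single-value pseudocode from the question.

```python
def is_special(value):
    """Hypothetical Python port of the IsSpecial pseudocode from the question."""
    return value == 3


# The neighbours of 3 are the inputs most likely to expose an off-by-one
# mistake such as writing value >= 3 instead of value == 3.
BOUNDARY_CASES = [(2, False), (3, True), (4, False)]

for value, expected in BOUNDARY_CASES:
    assert is_special(value) == expected, value
```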