Say I have the following two functions:
add_five (number) -> number + 2
add_six (number) -> add_five(number) + 1
As you can see, add_five has a bug.
If I now test add_six it would fail because the result is incorrect, but the code is correct.
Imagine you have a large tree of functions calling each other. It would be hard to find out which function contains the bug, because all the functions that depend on it will fail (and not only the one with the bug).
So my question is: should unit tests fail because of incorrect behaviour (wrong results) or because of incorrect code (bugs)?
should unit tests fail because of incorrect behaviour (wrong results) or because of incorrect code (bugs)?
Unit tests usually fail because of wrong results. That's what you write in assertions: you call a method and you define the expected result.
Unit tests cannot identify incorrect code. If the operation is return number+5 and your CPU or your RAM has a hardware fault and returns something different, then the test will fail as well, even though the code is correct.
Also consider:
public int add_five(int number)
{
    Thread.Sleep(5000);
    return number+5;
}
How should the unit test know whether the Sleep is intended or not?
So, if any unit test fails, it's your job to look at it and find out why it fails. If the cause lies in a different method, write a new unit test for that method so you can rule that method out next time.
Presumably you have a test for add_five/1 and a test for add_six/1. In this case, the test for add_six/1 will fail alongside the test for add_five/1.
Let's assume you decide to check add_six/1 first. You see that it depends on add_five/1, which is also failing. You can immediately assume that the bug in add_five/1 is cascading up.
Your module dependencies form a directed (hopefully acyclic) graph. If a dependency of your function or module is broken, that should be what you target for debugging first.
Another option is to mock out the add_five function when testing your add_six function, but this quickly creates a lot of extra typing and logic duplication. If you change the spec of add_five, you have to change every place you reimplemented it as a mock.
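As a minimal sketch of the mocking option, here is a hypothetical C++ rendering of the question's two functions. The injected parameter add_five_impl and the stub add_five_stub are assumptions made for illustration; they are not part of the original pseudocode.

```cpp
#include <cassert>
#include <functional>

// Hypothetical C++ version of the question's add_five, bug kept on purpose.
int add_five(int number) { return number + 2; }

// add_six receives its dependency as a parameter (an assumption for this
// sketch) so that a test can substitute a stub for the real add_five.
int add_six(int number,
            const std::function<int(int)>& add_five_impl = add_five) {
    return add_five_impl(number) + 1;
}

// Stub that behaves the way add_five is specified to behave.
int add_five_stub(int number) { return number + 5; }
```

Calling add_six(1, add_five_stub) exercises only the logic of add_six, so that test passes even while add_five is broken. The cost is the one described above: the stub duplicates the spec of add_five and has to change whenever that spec changes.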
If you use a quickcheck-style testing library, you can test for certain logic errors based on the properties of what you're testing. These bugs are detected using randomly generated cases that produce incorrect results, but all you as a tester write are library-specific definitions of the properties you're testing for. However, this will also suffer from dependency breakages unless you've mocked out dependent modules/functions.
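The idea can be sketched without any particular library. The helper holds_offset_property below is hypothetical and hand-rolled, not the API of a real quickcheck-style framework; it only checks that a function adds a fixed offset to pseudo-random inputs.

```cpp
#include <cassert>
#include <random>

// Hypothetical C++ versions of the question's functions, bug included.
int add_five(int number) { return number + 2; }
int add_six(int number) { return add_five(number) + 1; }

// Property: f(n) == n + offset for every generated n.
// Returns false as soon as one generated case violates the property.
bool holds_offset_property(int (*f)(int), int offset, int trials = 100) {
    std::mt19937 rng(12345);  // fixed seed so that runs are repeatable
    std::uniform_int_distribution<int> dist(-1000, 1000);
    for (int i = 0; i < trials; ++i) {
        const int n = dist(rng);
        if (f(n) != n + offset) {
            return false;
        }
    }
    return true;
}
```

With the buggy add_five, both holds_offset_property(add_five, 5) and holds_offset_property(add_six, 6) report a violation, which illustrates the dependency breakage mentioned above: the property for add_six fails although its own code is correct.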
Is it possible to write a compile-time assertion that a function is called in at least one place in the code?
It is not related to how many times the function is executed.
//C.h
class C{ //Callee
public:
    void f();
};
//C.cpp
void C::f(){ //note: non static
    assert_called_at_least_once(); //<--- I expected something like this $$
    // ............. some complex thing ......
}
//D.cpp
void D::f2(){ //Caller
    C c;
    if(false){
        //^ This line is to emphasize that I don't care whether c.f() will be executed
        c.f(); //<--- if I comment out this line, an error should occur.
        //( Assume that this is the only occurrence of a call to C::f(). )
    }
}
It is for debugging purposes.
Edit
As requested, I will provide more information why I want this feature.
My game has many sub-systems. C is one of them.
I want to make sure that my game calls a certain function in a certain sub-system (e.g. the Bullet System updating the bullets' positions) at least once.
Here is the reason.
Sometimes, when I want to narrow down where a bug can occur, I disable the function by commenting out this line in my big System (D):
bulletSystem->update();
After the bug is found, I sometimes forget to re-enable it.
In some cases, it is easier to insert an assert line inside the function than to set a breakpoint.
Edit2:
I prefer a solution that explains a C++ feature rather than one that solves my specific game example.
If such a solution simply does not exist (as suggested by M.M and n.m.), please post that as an answer, so I can sadly accept it.
Edit3:
#define, #ifdef ... (suggested by rezdm) or other preprocessor directives seem to be useful.
However, I need a solution in which
- the $$-line is replaced by at most one line of simple code, because more complexity = more bugs
- no modification of the caller (D) is required
Based on a poorly stated and recently deleted SO question ("Is it possible to call a function without calling it?") I have a similar question, hopefully put in a more logical perspective.
Is it possible to disable a function call across a codebase, and what are the best practices for doing so? By disabling I don't mean grepping through the whole code to manually comment out the function (which is valid but somewhat tedious). The only ways I can think of are:
1. Returning as soon as the function is entered
ret_type foo()
{
    return ret_type();
    // actual implementation is not allowed to run
}
which would be a bit dangerous when the return value is used by the callers.
2. Replacing the declaration with an idle macro
ret_type foo();
#define foo() do { } while (0)
Is there a standard way, maybe a compiler hook, a pragma directive to do this and if not what are some other ways?
Let's just think for a minute, together. Let's consider two main cases:
the function returns void
the function returns something
In the first case you can simply take the body of the function and comment it out. BOOM: disabled.
In the second case you have a return value. Let's consider two more cases:
the returned value is used
the returned value is not used
In the first case you should ask yourself: can I return a dummy value and get away with it? If the answer is yes, then do so. If not, then you can't do anything about it except refactor your entire code.
In the second case you can comment it out, but then why are you returning a value in the first place?
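A minimal sketch of the dummy-value case, assuming a function foo that returns int. The names kFooDisabled and foo_side_effects are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Single switch for disabling foo(); hypothetical name.
constexpr bool kFooDisabled = true;

// Stands in for whatever observable work foo() really does.
int foo_side_effects = 0;

int foo() {
    if (kFooDisabled) {
        return 0;  // dummy value: only safe if every caller tolerates it
    }
    ++foo_side_effects;  // the real implementation would go here
    return 42;
}
```

Every call site keeps compiling and running unchanged, which is the point of the approach. Whether 0 is an acceptable dummy value still has to be checked caller by caller.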
Regarding the classic test pattern of Arrange-Act-Assert, I frequently find myself adding a counter-assertion that precedes Act. This way I know that the passing assertion is really passing as the result of the action.
I think of it as analogous to the red in red-green-refactor, where only if I've seen the red bar in the course of my testing do I know that the green bar means I've written code that makes a difference. If I write a passing test, then any code will satisfy it; similarly, with respect to Arrange-Assert-Act-Assert, if my first assertion fails, I know that any Act would have passed the final Assert - so that it wasn't actually verifying anything about the Act.
Do your tests follow this pattern? Why or why not?
Update Clarification: the initial assertion is essentially the opposite of the final assertion. It's not an assertion that Arrange worked; it's an assertion that Act hasn't yet worked.
This is not the most common thing to do, but still common enough to have its own name. This technique is called Guard Assertion. You can find a detailed description of it on page 490 in the excellent book xUnit Test Patterns by Gerard Meszaros (highly recommended).
Normally, I don't use this pattern myself, since I find it more correct to write a specific test that validates whatever precondition I feel the need to ensure. Such a test should always fail if the precondition fails, and this means that I don't need it embedded in all the other tests. This gives a better isolation of concerns, since one test case only verifies one thing.
There may be many preconditions that need to be satisfied for a given test case, so you may need more than one Guard Assertion. Instead of repeating those in all tests, having one (and only one) test for each precondition keeps your test code more maintainable, since you will have less repetition that way.
It could also be specified as Arrange-Assume-Act-Assert.
There is a technical handle for this in NUnit, as in the example here:
http://nunit.org/index.php?p=theory&r=2.5.7
Here's an example.
public void testEncompass() throws Exception {
    Range range = new Range(0, 5);
    assertFalse(range.includes(7));
    range.encompass(7);
    assertTrue(range.includes(7));
}
It could be that I wrote Range.includes() to simply return true. I didn't, but I can imagine that I might have. Or I could have written it wrong in any number of other ways. I would hope and expect that with TDD I actually got it right - that includes() just works - but maybe I didn't. So the first assertion is a sanity check, to ensure that the second assertion is really meaningful.
Read by itself, assertTrue(range.includes(7)); is saying: "assert that the modified range includes 7". Read in the context of the first assertion, it's saying: "assert that invoking encompass() causes it to include 7". And since encompass is the unit we're testing, I think that's of some (small) value.
I'm accepting my own answer; a lot of the others misconstrued my question to be about testing the setup. I think this is slightly different.
An Arrange-Assert-Act-Assert test can always be refactored into two tests:
1. Arrange-Assert
and
2. Arrange-Act-Assert
The first test will only assert on that which was set up in the Arrange phase, and the second test will only assert for that which happened in the Act phase.
This has the benefit of giving more precise feedback on whether it's the Arrange or the Act phase that failed, while in the original Arrange-Assert-Act-Assert these are conflated and you would have to dig deeper and examine exactly what assertion failed and why it failed in order to know if it was the Arrange or Act that failed.
It also satisfies the intention of unit testing better, as you are separating your test into smaller independent units.
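As a sketch of that split, here is the range example written as two separate tests. The C++ Range class is a hypothetical stand-in for the Java class used elsewhere in this thread, and the tests return bool instead of relying on any particular test framework.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical C++ stand-in for the Range class from the Java example.
class Range {
public:
    Range(int low, int high) : low_(low), high_(high) {}

    bool includes(int value) const { return low_ <= value && value <= high_; }

    // Widens the range just enough to contain value.
    void encompass(int value) {
        low_ = std::min(low_, value);
        high_ = std::max(high_, value);
    }

private:
    int low_;
    int high_;
};

// Test 1 (Arrange-Assert): a freshly built range does not include 7.
bool test_new_range_excludes_7() {
    Range range(0, 5);
    return !range.includes(7);
}

// Test 2 (Arrange-Act-Assert): encompass(7) makes the range include 7.
bool test_encompass_includes_7() {
    Range range(0, 5);
    range.encompass(7);
    return range.includes(7);
}
```

If includes() were wrongly written to always return true, the first test would fail on its own and point at the precondition rather than at encompass().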
I am now doing this. A-A-A-A of a different kind
Arrange - setup
Act - what is being tested
Assemble - what is optionally needed to perform the assert
Assert - the actual assertions
Example of an update test:
Arrange:
    New object as NewObject
    Set properties of NewObject
    Save the NewObject
    Read the object as ReadObject
Act:
    Change the ReadObject
    Save the ReadObject
Assemble:
    Read the object as ReadUpdated
Assert:
    Compare ReadUpdated with ReadObject properties
The reason ACT does not contain the reading of ReadUpdated is that the read is not part of the act. The act is only changing and saving. So rather than calling it ARRANGE ReadUpdated for the assertion, I call it ASSEMBLE for the assertion. This avoids confusion with the ARRANGE section.
ASSERT should only contain assertions. That leaves ASSEMBLE between ACT and ASSERT, which sets up the assert.
Lastly, if you are failing in the Arrange, your tests are not correct, because you should have other tests to prevent or find these trivial bugs. For the scenario I present, there should already be other tests that cover READ and CREATE. If you create a "Guard Assertion", you may be breaking DRY and creating a maintenance burden.
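The update test outlined above could look roughly like this. Store and Record are hypothetical in-memory stand-ins for whatever persistence layer is really under test.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record type with a single property.
struct Record {
    std::string name;
};

// Hypothetical in-memory store standing in for a database.
class Store {
public:
    void save(int id, const Record& record) { rows_[id] = record; }
    Record read(int id) const { return rows_.at(id); }

private:
    std::map<int, Record> rows_;
};

bool test_update() {
    Store store;

    // Arrange: create, save, and read back the object.
    Record new_object{"before"};
    store.save(1, new_object);
    Record read_object = store.read(1);

    // Act: change and save, nothing else.
    read_object.name = "after";
    store.save(1, read_object);

    // Assemble: fetch what the assertion needs.
    const Record read_updated = store.read(1);

    // Assert: only the comparison.
    return read_updated.name == read_object.name;
}
```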
I don't use that pattern, because I think doing something like:
Arrange
Assert-Not
Act
Assert
may be pointless, because supposedly you know your Arrange part works correctly, which means that whatever is in the Arrange part must either be tested as well or be simple enough not to need tests.
Using your answer's example:
public void testEncompass() throws Exception {
    Range range = new Range(0, 5);
    assertFalse(range.includes(7)); // <-- Pointless and against DRY if there
                                    // are unit tests for Range(int, int)
    range.encompass(7);
    assertTrue(range.includes(7));
}
Tossing in a "sanity check" assertion to verify state before you perform the action you're testing is an old technique. I usually write them as test scaffolding to prove to myself that the test does what I expect, and remove them later to avoid cluttering tests with test scaffolding. Sometimes, leaving the scaffolding in helps the test serve as narrative.
I've already read about this technique, possibly from you by the way, but I do not use it, mostly because I'm used to the triple-A form for my unit tests.
Now I'm getting curious and have some questions: how do you write your test? Do you cause this assertion to fail, following a red-green-red-green-refactor cycle, or do you add it afterwards?
Does it sometimes fail, perhaps after you refactor the code? What does this tell you? Perhaps you could share an example where it helped. Thanks.
I have done this before when investigating a test that failed.
After considerable head scratching, I determined that the cause was that the methods called during "Arrange" were not working correctly. The test failure was misleading. I added an assert after the Arrange. This made the test fail in a place that highlighted the actual problem.
I think there is also a code smell here if the Arrange part of the test is too long and complicated.
In general, I like "Arrange, Act, Assert" very much and use it as my personal standard. The one thing it fails to remind me to do, however, is to dis-arrange what I have arranged when the assertions are done. In most cases, this doesn't cause much annoyance, as most things auto-magically go away via garbage collection, etc. If you have established connections to external resources, however, you will probably want to close those connections when you're done with your assertions, or you may have a server or expensive resource out there somewhere holding on to connections or vital resources that it should be able to give away to someone else. This is particularly important if you're one of those developers who does not use TearDown or TestFixtureTearDown to clean up after one or more tests. Of course, "Arrange, Act, Assert" is not responsible for my failure to close what I open; I only mention this "gotcha" because I have not yet found a good "A-word" synonym for "dispose" to recommend! Any suggestions?
Have a look at Wikipedia's entry on Design by Contract. The Arrange-Act-Assert holy trinity is an attempt to encode some of the same concepts and is about proving program correctness. From the article:
The notion of a contract extends down to the method/procedure level; the
contract for each method will normally contain the following pieces of
information:
Acceptable and unacceptable input values or types, and their meanings
Return values or types, and their meanings
Error and exception condition values or types that can occur, and their meanings
Side effects
Preconditions
Postconditions
Invariants
(more rarely) Performance guarantees, e.g. for time or space used
There is a tradeoff between the amount of effort spent on setting this up and the value it adds. A-A-A is a useful reminder for the minimum steps required but shouldn't discourage anyone from creating additional steps.
It depends on your testing environment and language, but usually if something in the Arrange part fails, an exception is thrown and the test fails by reporting it instead of starting the Act part. So no, I usually don't use a second Assert part.
Also, if your Arrange part is quite complex and doesn't always throw an exception when it goes wrong, you might consider wrapping it in a method and writing a separate test for that method, so you can be sure it won't fail silently (without throwing an exception).
If you really want to test everything in the example, try more tests... like:
public void testIncludes7() throws Exception {
    Range range = new Range(0, 5);
    assertFalse(range.includes(7));
}
public void testIncludes5() throws Exception {
    Range range = new Range(0, 5);
    assertTrue(range.includes(5));
}
public void testIncludes0() throws Exception {
    Range range = new Range(0, 5);
    assertTrue(range.includes(0));
}
public void testEncompassInc7() throws Exception {
    Range range = new Range(0, 5);
    range.encompass(7);
    assertTrue(range.includes(7));
}
public void testEncompassInc5() throws Exception {
    Range range = new Range(0, 5);
    range.encompass(7);
    assertTrue(range.includes(5));
}
public void testEncompassInc0() throws Exception {
    Range range = new Range(0, 5);
    range.encompass(7);
    assertTrue(range.includes(0));
}
Because otherwise you are missing so many possibilities for error, e.g. that after encompass the range only includes 7, etc.
There are also tests for length of range (to ensure it didn't also encompass a random value), and another set of tests entirely for trying to encompass 5 in the range... what would we expect - an exception in encompass, or the range to be unaltered?
Anyway, the point is if there are any assumptions in the act that you want to test, put them in their own test, yes?
I use:
1. Setup
2. Act
3. Assert
4. Teardown
Because a clean setup is very important.
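In C++ the teardown step can be tied to scope exit. The sketch below assumes a hypothetical Connection class and a global open_count that stands in for an external resource; neither comes from a real library.

```cpp
#include <cassert>

// Number of connections currently open; stands in for an external resource.
int open_count = 0;

// Hypothetical resource wrapper: the constructor is the setup,
// the destructor is the teardown.
class Connection {
public:
    Connection() { ++open_count; }
    ~Connection() { --open_count; }  // runs on every exit path from the scope
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

    int query() const { return 42; }
};

bool test_query() {
    Connection conn;                  // 1. Setup
    const int result = conn.query();  // 2. Act
    return result == 42;              // 3. Assert
}                                     // 4. Teardown: conn is destroyed here
```

After test_query() returns, open_count is back to 0 whether the assertion held or not, so the next test starts from a clean state.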
In most of my C++ projects I have made heavy use of ASSERT statements, as follows:
int doWonderfulThings(const int* fantasticData)
{
    ASSERT(fantasticData);
    if(!fantasticData)
        return -1;
    // ...
    return WOW_VALUE;
}
But the TDD community seems to prefer doing something like this:
int doMoreWonderfulThings(const int* fantasticData)
{
    if(!fantasticData)
        return ERROR_VALUE;
    // ...
    return AHA_VALUE;
}
TEST(TDD_Enjoy)
{
    const int validData = 42; // any valid int will do
    ASSERT_EQ(ERROR_VALUE, doMoreWonderfulThings(nullptr));
    ASSERT_EQ(AHA_VALUE, doMoreWonderfulThings(&validData));
}
In my experience, the first approach has let me remove many subtle bugs.
But the TDD approach is a very smart way to handle legacy code.
Google compares the first method to "walking the shore with a life vest, but swimming the ocean without any safeguard".
Which one is better?
Which one makes software more robust?
In my (limited) experience the first option is quite a bit safer. In a test case you only test predefined inputs and compare the outcomes; this works well as long as every possible edge case has been checked. The first option checks every input and thus tests the 'live' values. It filters out bugs quickly, but it comes with a performance penalty.
In Code Complete, Steve McConnell teaches us that the first method can be used successfully to filter out bugs in a debug build. In a release build you can strip out all assertions (for instance with a compiler flag) to get the extra performance.
In my opinion the best way is to use both methods:
Method 1 to catch illegal values
int doWonderfulThings(const int* fantasticData)
{
    ASSERT(fantasticData);
    ASSERTNOTEQUAL(0, *fantasticData); // guard against division by zero
    return WOW_VALUE / *fantasticData;
}
and method 2 to test edge-cases of an algorithm.
int doMoreWonderfulThings(const int fantasticNumber)
{
    int count = 100;
    for(int i = 0; i < fantasticNumber; ++i) {
        count += 10 * fantasticNumber;
    }
    return count;
}
TEST(TDD_Enjoy)
{
    // Test lower edge: the loop does not run, so count stays at 100
    ASSERT_EQ(100, doMoreWonderfulThings(-1));
    ASSERT_EQ(100, doMoreWonderfulThings(0));
    ASSERT_EQ(110, doMoreWonderfulThings(1));
    // Test some random values
    ASSERT_EQ(350, doMoreWonderfulThings(5));
    ASSERT_EQ(2350, doMoreWonderfulThings(15));
    ASSERT_EQ(225100, doMoreWonderfulThings(150));
}
Both mechanisms have value. Any decent test framework will catch the standard assert() anyway, so a test run that causes the assert to fail will result in a failed test.
I typically have a series of asserts at the start of each C++ method with a comment '// preconditions'; it's just a sanity check on the state I expect the object to have when the method is called. These dovetail nicely into any TDD framework because they work not only at runtime when you're exercising functionality but also at test time.
There is no reason why your test package cannot catch asserts such as the one in doMoreWonderfulThings. This can be done either by having your ASSERT handler support a callback mechanism, or your test asserts contain a try/catch block.
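A minimal sketch of the callback variant, assuming a project-specific macro. MY_ASSERT and g_assert_handler are hypothetical names, not part of any standard library or test framework, and the function body is simplified to return the pointed-to value.

```cpp
#include <cassert>
#include <functional>

// Hypothetical hook: production code could install a handler that aborts,
// while a test installs one that records the failure instead.
std::function<void(const char* expr, const char* file, int line)>
    g_assert_handler;

// Reports a failed condition to the installed handler, if there is one.
#define MY_ASSERT(cond)                                      \
    do {                                                     \
        if (!(cond) && g_assert_handler) {                   \
            g_assert_handler(#cond, __FILE__, __LINE__);     \
        }                                                    \
    } while (0)

int doWonderfulThings(const int* fantasticData) {
    MY_ASSERT(fantasticData);
    if (!fantasticData) {
        return -1;
    }
    return *fantasticData;
}
```

A test can set g_assert_handler to a function that counts or stores failures, call doWonderfulThings(nullptr), and then check both the returned -1 and the recorded assertion.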
I don't know which particular TDD subcommunity you're referring to, but the TDD patterns I've come across either use Assert.AreEqual() for positive results or otherwise use an ExpectedException mechanism (e.g., attributes in .NET) to declare the error that should be observed.
In C++, I prefer method 2 when using most testing frameworks. It usually makes for easier-to-understand failure reports. This is invaluable when a test fails months or years after it was written.
My reason is that most C++ testing frameworks will print out the file and line number of where the assert occurred without any kind of stack trace information. So most of the time you will get the reporting line number inside of the function or method and not inside of the test case.
Even if the assert is caught and re-asserted from the caller, the reported line will be that of the catch statement and may not be anywhere close to the test case line that called the method or function that asserted. This can be really annoying when the function that asserted is used multiple times in the test case.
There are exceptions though. For example, Google's test framework has a scoped trace statement (SCOPED_TRACE) whose message is printed with any failure that occurs inside its scope. So you can wrap a call to a generalized test function in a trace scope and easily tell, within a line or two, which line in the exact test case failed.