cyclomatic complexity = 1 + #if statements?

I found the following paragraph regarding cyclomatic complexity on Wikipedia:
It can be shown that the cyclomatic complexity of any structured program with only one entrance point and one exit point is equal to the number of decision points (i.e., "if" statements or conditional loops) contained in that program plus one.
That would imply a cyclomatic complexity of 3 for two arbitrary nested if statements:
if (a)
{
    if (b)
    {
        foo();
    }
    else
    {
        bar();
    }
}
else
{
    baz();
}
Since exactly one of the three functions is going to be called, my gut agrees with 3.
However, two arbitrary if statements could also be written in sequence instead of nesting them:
if (a)
{
    foo();
}
else
{
    bar();
}
if (b)
{
    baz();
}
else
{
    qux();
}
Now there are four paths through the code:
foo, baz
foo, qux
bar, baz
bar, qux
Shouldn't the cyclomatic complexity of this fragment hence be 4 instead of 3?
Am I misunderstanding the quoted paragraph?

Cyclomatic complexity is defined as the number of linearly independent paths through the code.
In your second example we have the following paths:
| # | A | B | Nodes hit |
| 1 | true | true | foo() baz() |
| 2 | true | false | foo() qux() |
| 3 | false | true | bar() baz() |
| 4 | false | false | bar() qux() |
You are completely correct that the number of execution paths here is 4. And yet the cyclomatic complexity is 3.
The key is in understanding what cyclomatic complexity measures:
Definition:
A linearly independent path is any path through the program that
introduces at least one new edge that is not included in any other
linearly independent paths.
from http://www.ironiacorp.com/
The 4th path is not linearly independent of the first three paths, as it does not introduce any new edges (nodes / program statements) that were not already covered by the first three paths.
As mentioned in the Wikipedia article, the cyclomatic complexity is always less than or equal to the number of theoretical unique control-flow paths, and always greater than or equal to the minimum number of actually achievable execution paths.
(To verify the second statement, imagine that b == a always held when entering the code block you described: only two of the four paths would then be achievable, yet the cyclomatic complexity would remain 3.)
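To make the count concrete, build the control-flow graph and apply the standard formula V(G) = E - N + 2P. Drawing one node per decision and statement (a common convention; exact node counts vary by convention, but the result does not), the second fragment has N = 7 nodes (the two decisions, foo, bar, baz, qux, and the exit) and E = 8 edges, with P = 1 connected component, so V(G) = 8 - 7 + 2*1 = 3. The nested fragment has N = 6 and E = 7, giving V(G) = 7 - 6 + 2*1 = 3 as well: both arrangements have the same cyclomatic complexity even though their execution-path counts differ.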

I agree with the explanation of perfectionist. Here is a more informal explanation in the case of the Java language:
McCabe's Cyclomatic Complexity (McCC) for a method is expressed as the number of independent control flow paths in it. It represents a lower bound for the number of possible execution paths in the source code and at the same time it is an upper bound for the minimum number of test cases needed for achieving full branch test coverage. The value of the metric is calculated as the number of the following instructions plus 1: if, for, foreach, while, do-while, case label (which belongs to a switch instruction), catch, conditional statement (?:). Moreover, logical “and” (&&) and logical “or” (||) expressions also add 1 to the value because their short-circuit evaluation can cause branching depending on the first operand. The following instructions are not included: else, switch, default label (which belongs to a switch instruction), try, finally.
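As a quick illustration of that counting rule, here is a made-up function (written in C++, where the same constructs exist, rather than Java) with each contributing construct annotated:

#include <stdexcept>

// Hypothetical example; each marked construct adds 1 to McCC, starting from 1.
int classify(int n, bool strict) {
    if (n < 0 && strict)              // +1 for "if", +1 for "&&"
        throw std::invalid_argument("negative");
    for (int i = 0; i < n; ++i) {     // +1 for "for"
        switch (i % 3) {
            case 0:                   // +1 per case label
            case 1:                   // +1
                break;
            default:                  // "default" adds nothing
                break;
        }
    }
    return strict ? n : -n;           // +1 for "?:"
}
// McCC = 1 + 6 = 7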

Related

Confusion while evaluating conditions with more than one logical NOT operator

I'm confused when code includes more than one NOT operator like:
if (!x != !0)
;
or similar. I can read it as: if NOT x is NOT equal to NOT zero,
but in my mind I'm totally confused about what it actually means.
Do you have any advice regarding this? I.e., how to understand such code, where to start reading, etc.?
Another example:
if(!x == !1)
You can use a truth table if you are not sure. For instance:
x | 0 | x!=0 | !x | !0 | !x != !0
0 | 0 | 0 | 1 | 1 | 0
1 | 0 | 1 | 0 | 1 | 1
If you have problems with many && and ||, use De Morgan's laws
to simplify (for example, !(a && b) becomes !a || !b). Evaluate the ! operators first, then read left to right.
Things to remember:
!0 = 1 // true
!1 = 0 // false
so your condition can be simplified to:
if (!x != true) // !0
if (!x == false) // !1
Now, any non-zero value when inverted will be zero.
int x = 10;
!x // zero
Zero when inverted is true.
int x = 0;
!x // one
In C or C++, true is 1 and false is 0
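If in doubt, you can also have the compiler print the table for you; a minimal C++ check:

#include <iostream>

// Prints both rows of the truth table above.
int main() {
    for (int x = 0; x <= 1; ++x)
        std::cout << "x=" << x << "  !x=" << !x << "  !0=" << !0
                  << "  (!x != !0)=" << ((!x) != (!0)) << '\n';
}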
I was troubled by similar syntax as well when I started developing in PowerBuilder; then I realized I just need to read it as a nested check against false.
For example, !x becomes "x == false", which makes it clearer that the expression is true when x is false or zero.
In C, 0 is false and whatever is not zero is true.
By the same logic, !1 is always false and !0 is always true, though I cannot see the reason to write it in this confusing way; maybe the code you are looking at comes from some sort of automatic generator / converter.
First take a look at operator precedence. You will see that logical operators like ! take precedence over relational operators like !=.
Secondly, what is !0? This suspiciously sounds like there is an implicit conversion from int to bool there; otherwise !0 would make no sense at all.
In your example, you need to evaluate the logical operators first (i.e. !x and !0), then check whether the results are unequal (!=). That said, this kind of code is really hard to read: avoid writing code like this if possible, and consider refactoring it (while covered by unit tests) if you encounter it in "the wild".
I assume that the question is about C, and therefore I will give the answer in C.
You need to know the precedence rules - the ! binds stronger than the !=. Therefore the expression can be clarified using parentheses as (!x) != (!0). Now, the next thing is to know what ! will do - the result will be 0 if the operand is non-zero, 1 if it is zero. Therefore, !0 can be constant-folded to 1.
Now we have (!x) != 1. Since the !x is 1 iff x is zero, the result of this entire expression will be 1 iff x is non-zero, 0 otherwise.
We can therefore reduce this expression to the more idiomatic double negation: !!x. However, the if clause itself tests whether the expression is non-zero, so the entire if statement can be changed to if (x) ; (and since the expression guards only a null statement, it can be elided altogether).
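A minimal C++ spot-check of that chain of reductions:

#include <cassert>
#include <initializer_list>

int main() {
    // (!x != !0)  ==  !!x  ==  (x != 0), for a few sample values
    for (int x : {-5, -1, 0, 1, 42}) {
        assert(((!x) != (!0)) == (!!x));
        assert((!!x) == (x != 0));
    }
}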
This kind of code is designed to trick you. Your brain has difficulty processing double or triple negation (and you are not the only one).
You need to know the precedence rules and apply them:
(!x == !1) is equal to ((!x) == (!1))
If you see this kind of code during a code review, you should definitely highlight it, and ask for an update.
Please note that in C++ you can also use not instead of !. It can make things easier to understand:
(!x == !1) is equal to ((not x) == (not 1))
In C and C++, a value of zero is considered "false", and a nonzero value is considered "true". But it can occasionally be confusing, and occasionally causes minor problems, that two different values can both be considered true even though they're, well, different.
For example, suppose you had a function is_rich() that told you whether a person was rich or not. And suppose you wrote code like this:
int harry_is_rich = is_rich("Harry");
int sally_is_rich = is_rich("Sally");
Now the variable harry_is_rich is zero or nonzero according to whether Harry is poor or rich, and sally_is_rich is zero or nonzero according to whether Sally is poor or rich.
Now suppose you're interested in knowing whether Harry and Sally are both rich, or both poor. Your first thought might be
if(harry_is_rich == sally_is_rich)
But wait. Since any nonzero value is considered "true" in C, what if, for some reason, the is_rich() function returned 2 when Harry was rich, and returned 3 when Sally was rich? That wouldn't be a bug, per se -- the is_rich() function perfectly meets its specification of returning nonzero if the person is rich -- but it leads to the situation that you can't write
if(harry_is_rich == sally_is_rich)
(Well, of course you can write it, but like any buggy code, it might not work right.)
So what can you do instead? Well, one possibility is to write
if(!harry_is_rich == !sally_is_rich)
You can read this as "If it's not the case that Harry is rich has the same truth value as not the case that Sally is rich". And, while it's obviously a little contorted, you can kind of convince yourself that it "means" the same thing.
And, although it's a little confusing, it has the advantage of working. It works because of the other confusing aspect of true/false values in C and C++.
Although, as I said, zero is considered false and any nonzero value is considered true, the built-in operators that generate true/false values -- the ones like &&, ||, and ! -- are in fact guaranteed to give you exactly 0 or 1. (That is, the built-in operators are significantly different from functions like is_rich() in this regard. In general, is_rich() might return 2, or 3, or 37 for "true". But &&, ||, and ! are guaranteed to return 1 for true, and 0 for false.)
So when you say !harry_is_rich, you'll get 0 if Harry is rich, and 1 if Harry is not rich. When you say !sally_is_rich, you'll get 0 if Sally is rich, and 1 if Sally is not rich. And if you say
if(!harry_is_rich == !sally_is_rich)
you'll correctly discover whether Harry and Sally are both rich, or both poor, regardless of what values is_rich() chooses to return.
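Here is a compilable toy version of the pitfall; the is_rich() body is a made-up stand-in that returns different nonzero values for "true":

#include <iostream>

// Made-up stand-in: both people are rich, but with different "true" values.
int is_rich(const char* who) {
    return who[0] == 'H' ? 2 : 3;
}

int main() {
    int harry_is_rich = is_rich("Harry");   // 2
    int sally_is_rich = is_rich("Sally");   // 3
    std::cout << (harry_is_rich == sally_is_rich) << '\n';   // 0: naive compare misfires
    std::cout << (!harry_is_rich == !sally_is_rich) << '\n'; // 1: normalized compare is right
}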
One more thing, though. In all of this I've been considering integers (or other types that might have lots of values). But both C and C++ have bool types, that more succinctly represent true/false values. For a bool type, there are exactly two values, true and false (which are represented as 1 and 0). So for a bool type, you can't have a true value that's anything other than 1. So if the is_rich() function was declared as returning bool, or if you declared your variables as bool, that would be another way of forcing the values to 0/1, and in that case the condition
if(harry_is_rich == sally_is_rich)
would work fine. (Furthermore, when I said that operators like &&, ||, and ! are guaranteed to return exactly 1 or 0, that's basically because these operators don't "return" arbitrary integers, but instead "return" bools.)
See also the C FAQ list, question 9.2.
(Oh, wait. One more thing. The opposite of "rich" isn't necessarily "poor", of course. :-) )

Cyclomatic complexity and basis path

Let's consider the following method:
public void testIt(boolean a, boolean b){
    if (a && b){
        ...
    }
    if (a){
        ....
    }
}
The cyclomatic complexity of this method is 3. So, according to basis path testing, we should need 3 tests in order to achieve statement and decision coverage.
However, I see that I can use only two tests, (true, true) and (false, false), to achieve statement and decision coverage. Where is my mistake?
Yes, you are right. The cyclomatic complexity is 3, and the cases you should verify are:
a is false, b - we don't care (explained below) -> nothing happens
a is true, b is false -> only the second block is executed
a and b are true -> both the first and the second blocks are executed
If you look only at the arguments, the first option I mentioned has 2 different inputs (b true/false); however, in both cases the same thing should happen, so I suggest you verify it only once, or use the equivalent of C#'s TestCase attribute.
The answer from @OldFox is correct, just some additions:
Where is my mistake?
CC is an upper bound for branch coverage, but a lower bound for path coverage (not the same as line/statement coverage, see below). So you need at most 3 tests to cover all branches/conditions and at least 3 tests to cover all paths.
Here is the control-flow graph of your function (shown as a diagram in the original answer): the two decision nodes, the two if-bodies, and the exit node, for N = 5 nodes and E = 6 edges.
CC = E - N + 2P = 6 - 5 + 2*1 = 3 according to the definition.
To cover all branches you need at most 3 tests, and here all 3 are in fact needed: (true, true), (true, false), and (false, *).
To cover all paths you need at least 3 tests; there are exactly 3 independent paths in the graph, so 3 tests suffice.
There could be some confusion about the number of different inputs, which is 4, but not all the paths formed by these inputs are distinct: two of them (when a is false) are the same path, so there are only 3 independent paths.
In conclusion, 3 test cases are necessary and sufficient to provide both branch and path coverage for your function.
Now about line/statement coverage: you need to execute every line at least once. To achieve this you need only one test, (true, true), but clearly that is not enough for the other types of coverage.
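To see the three tests concretely, here is the method translated to C++ with the elided bodies stubbed out as prints (the stubs are invented for illustration):

#include <iostream>

void testIt(bool a, bool b) {
    if (a && b) { std::cout << "first block "; }   // stub for the elided body
    if (a)      { std::cout << "second block"; }   // stub for the elided body
    std::cout << '\n';
}

int main() {
    testIt(true,  true);    // path 1: both blocks execute
    testIt(true,  false);   // path 2: only the second block
    testIt(false, false);   // path 3: neither block (covers the (false, *) case)
}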

Calculating combinations with duplicated values and stacks that hold more than one value

Brace yourself. Given a set of numbers with various values repeated (e.g. 1,2,2,2,3,4,5,5,6,7,7) and special slots that can each hold an unlimited number of values, how would one calculate the possible combinations where each value is distributed into one of the slots once and only once? Another restriction is that each slot has to have at least one value. The final restriction is that the combination of values in one slot cannot be repeated in another slot within a single trial. For instance:
1 | 2,7 | 2,3,4,5 | 2,7 | 6
This would be illegal because "2,7" is repeated within a single set (meaning one combination). The numbers above act as a single combination, or one trial. Pressing the return key would start a second trial (combination), in which "2,7" could appear again without error. Whereas:
1,2 | 2,2,3,4| 5,5,6 | 7,7
and
1,2 | 2,2,3| 5,5,6 | 4,7,7
would be legal because "5,5,6" is only repeated in a separate trial. Above are two separate combinations (two trials). "5,5,6" is indeed repeated, but the repetition clause only applies, and makes a set illegal, when the repetition occurs within one combination.
I'm not sure how to apply basic combination arithmetic to this problem, or even whether basic formulas can apply. How would this be calculated? Help.

Initiating Short circuit rule in Bison for && and || operations

I'm programming a simple calculator in Bison & Flex, using C/C++ (the logic is done in Bison, and the C/C++ part is responsible for the data structures, e.g. STL and more).
I have the following problem:
In my calculator the dollar sign $ means i++ and ++i (both prefix and postfix), e.g.:
int y = 3;
-> $y = 4
-> y$ = 4
When the user enters int_expression1 && int_expression2, if int_expression1 evaluates to 0 (i.e. false), then I don't want Bison to evaluate int_expression2!
For example :
int a = 0 ;
int x = 2 ;
and the user enters: int z = a&&x$ ...
So the variable a is evaluated to 0, hence I don't want to evaluate x, yet it still grows by 1... Here is the Bison/C++ code:
%union
{
    int int_value;
    double double_value;
    char* string_value;
}
%type <int_value> int_expr
%type <double_value> double_expr
%type <double_value> cmp_expr
int_expr:
    | int_expr '&&' int_expr { /* AND operation between two integers */
        if ($1 == 0)
            $$ = 0;
        else // calc
            $$ = $1 && $3;
    }
How can I tell Bison not to evaluate the second expression if the first one has already evaluated to false (i.e. 0)?
Converting extensive commentary into an answer:
How can I tell Bison to not evaluate the second expression if the first one was already evaluated to false?
It's your code that does the evaluation, not Bison; put the 'blame' where it belongs.
You need to detect that you're dealing with an && rule before the RHS is evaluated. The chances are that you need to insert some code after the && and before the second int_expr that suspends evaluation if the first int_expr evaluates to 0. You'll also need to modify all the other evaluation code to check for and obey a 'do not evaluate' flag.
Alternatively, have Bison do the parsing and create a program that you execute when the parse is complete, rather than evaluating as you parse. That is a much bigger set of changes.
Are you sure about putting some code before the second int_expr? I can't seem to find a plausible way to do that. It's a nice trick, but I can't find a way to actually tell Bison not to evaluate the second int_expr without ruining the entire evaluation.
You have to write your code so that it does not evaluate when it is not supposed to evaluate. The Bison syntax is:
| int_expr '&&' {...code 1...} int_expr {...code 2...}
'Code 1' will check $1 and arrange to stop evaluating (set a global variable or something similar). 'Code 2' will conditionally evaluate $4 (4 because 'code 1' is now $3). All evaluation code must obey the dictates of 'code 1': it must not evaluate if 'code 1' says 'do not evaluate'. Or you can do what aselle and I suggested: parse first and evaluate separately.
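A minimal sketch of how that could look; the suppress counter and the action bodies here are illustrative, not taken from the question's grammar:

%{
static int suppress = 0;   /* > 0 means "keep parsing, but do not evaluate" */
%}

int_expr:
    int_expr '&&'
        { if ($1 == 0) suppress++; }   /* code 1: left side is false */
    int_expr
        {                              /* code 2: the mid-rule action was $3 */
            if ($1 == 0) { suppress--; $$ = 0; }
            else         { $$ = ($4 != 0); }
        }
;

Every action with a side effect (the $ increment, assignments, and so on) then starts with a check like if (suppress) and skips its work, so x$ still parses but x is left untouched.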
I second aselle's suggestion about The UNIX Programming Environment. There's a whole chapter in there about developing a calculator (they call it hoc for higher-order calculator) which is worth reading. Be aware, though, that the book was published in 1984, and pre-dates the C standard by a good margin. There are no prototypes in the C code, and (by modern standards) it takes a few liberties. I do have hoc6 (the last version of hoc they describe; also versions 1-3) in modern C — contact me if you want it (see my profile).
That's the problem: I can't stop evaluating in the middle of the rule, since I cannot use return (I can, but it is of no use; it causes the program to exit). With | intExpr '&&' { if ($1 == 0) { /* turn off a flag */ } } intExpr { /* code */ }, after I exit $3, $4 is still evaluated automatically.
You can stop evaluating in the middle of a rule, but you have to code your expression evaluation code block to take the possibility into account. And when I said 'stop evaluating', I meant 'stop doing the calculations', not 'stop the parser in its tracks'. The parsing must continue; your code that calculates values must only evaluate when evaluation is required, not when no evaluation is required. This might be an (ugh!) global flag, or you may have some other mechanism.
It's probably best to convert your parser into a code generator and execute the code after you've parsed it. This sort of complication is why that is a good strategy.
@JonathanLeffler: You're indeed the king! This should be an answer!!!
Now it is an answer.
You almost assuredly want to generate some other representation before evaluating in your calculator. A parse tree or AST is the classic method, but a simple stack machine is also popular. There are many great examples of how to do this, but my favorite is
http://www.amazon.com/Unix-Programming-Environment-Prentice-Hall-Software/dp/013937681X
That shows how to take a simple direct-evaluation tool like the one you have made in yacc (old Bison) and take it all the way to a programming language that is almost as powerful as BASIC, all in very few pages. It's a very old book but well worth the read.
You can also look at SeExpr http://www.disneyanimation.com/technology/seexpr.html
which is a simple expression language calculator for scalars and 3 vectors. If you look at https://github.com/wdas/SeExpr/blob/master/src/SeExpr/SeExprNode.cpp
on line 313 you will see the && implementation in the eval() function:
void
SeExprAndNode::eval(SeVec3d& result) const
{
    // operands and result must be scalar
    SeVec3d a, b;
    child(0)->eval(a);
    if (!a[0]) {
        result[0] = 0;
    } else {
        child(1)->eval(b);
        result[0] = (b[0] != 0.0);
    }
}
That file contains all objects that represent operations in the parse tree. These objects are generated as the code is parsed (these are the actions in yacc). Hope this helps.

improving performance of a dpll algorithm

I'm implementing a DPLL algorithm in C++ as described on Wikipedia:
function DPLL(Φ)
    if Φ is a consistent set of literals
        then return true;
    if Φ contains an empty clause
        then return false;
    for every unit clause l in Φ
        Φ ← unit-propagate(l, Φ);
    for every literal l that occurs pure in Φ
        Φ ← pure-literal-assign(l, Φ);
    l ← choose-literal(Φ);
    return DPLL(Φ ∧ l) or DPLL(Φ ∧ not(l));
but it performs awfully. In this step:
return DPLL(Φ ∧ l) or DPLL(Φ ∧ not(l));
currently I'm trying to avoid creating copies of Φ by instead adding l or not(l) to the one and only copy of Φ and removing them when/if the DPLL() calls return false. This seems to break the algorithm, giving wrong results (UNSATISFIABLE even though the set is SATISFIABLE).
Any suggestions on how to avoid explicit copies in this step?
A less naive approach to DPLL avoids copying the formula by recording the variable assignments and the changes made to the clauses in the unit-propagation and pure-literal assignment steps and then undoes the changes (backtracks) when an empty clause is produced. So when a variable x is assigned true, you would mark all clauses containing a positive literal of x as inactive (and ignore them thereafter since they are satisfied) and remove -x from all clauses that contain it. Record which clauses had -x in them so you can backtrack later. Also record which clauses you marked inactive, for the same reason.
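A minimal sketch of that bookkeeping; the clause representation here (signed ints as literals) is an assumption for illustration, not your code:

#include <cstddef>
#include <utility>
#include <vector>

// CNF formula: each clause is a list of literals; +v means "variable v is true",
// -v means "variable v is false".
struct Formula {
    std::vector<std::vector<int>> clauses;
    std::vector<bool> satisfied;   // clause marked inactive once satisfied

    // Everything one call to assign() changed, so it can be undone later.
    struct Undo {
        std::vector<std::size_t> satisfiedClauses;
        std::vector<std::pair<std::size_t, int>> removedLits;  // (clause, literal)
    };

    // Make literal lit true, logging all changes in undo.  Returns false if an
    // empty clause is produced (conflict); the caller must then backtrack(undo).
    bool assign(int lit, Undo& undo) {
        for (std::size_t i = 0; i < clauses.size(); ++i) {
            if (satisfied[i]) continue;
            auto& c = clauses[i];
            for (std::size_t j = 0; j < c.size(); ++j) {
                if (c[j] == lit) {             // clause is now satisfied: deactivate
                    satisfied[i] = true;
                    undo.satisfiedClauses.push_back(i);
                    break;
                }
                if (c[j] == -lit) {            // literal is now false: remove it
                    undo.removedLits.push_back({i, c[j]});
                    c.erase(c.begin() + static_cast<std::ptrdiff_t>(j));
                    if (c.empty()) return false;
                    --j;                       // re-examine the shifted element
                }
            }
        }
        return true;
    }

    // Revert the logged changes; this replaces copying Φ at every branch.
    void backtrack(const Undo& undo) {
        for (auto it = undo.removedLits.rbegin(); it != undo.removedLits.rend(); ++it)
            clauses[it->first].push_back(it->second);
        for (std::size_t i : undo.satisfiedClauses)
            satisfied[i] = false;
    }
};

DPLL then calls assign() for the decision literal and for each propagated unit, and calls backtrack() with the saved logs whenever a branch returns false, before trying the opposite literal.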
Another approach is to keep track of the number of unassigned variables in each unsatisfied clause. Record when the number decreases so you can backtrack later. Do unit propagation if the count reaches 1, backtrack if the number reaches 0 and all the literals are false.
I wrote "less naive" above because there are still better approaches. Modern DPLL-type SAT solvers use a lazy clause updating scheme called "two watched literals" that has the advantage of not needing to remove literals from clauses and thus not needing to restore them when a bad assignment is found. The variable assignments still have to be recorded and backtracked, but not having to update the clause-related structures makes two watched literals faster than any other known backtracking scheme for SAT solvers. You'll no doubt learn about this later in your class.