How are if statements executed in a function? - c++

I stumbled across an interesting case that makes me question what I know about if statements.
I have this code for insertion for binary trees:
void insert(Node<T>* &currentNode, const T& x)
{
if (currentNode == nullptr)
{
currentNode = new Node<T>(x);
}
if (x >= binTree<T>::root->data)
{
insert(currentNode->right, x);
}
else if (x < binTree<T>::root->data)
{
insert(currentNode->left, x);
}
}
The problem I was facing was that whenever I called this function to insert into the tree, it gave me a segmentation fault. After running valgrind, it told me that a stack overflow had occurred. I tested whether the first if block was causing the problem by adding a cout statement to it, and lo and behold, my cout statement was executed endlessly.
However, after changing that second if block to one single unified if statement like this:
if {}
else if {}
else if {}
The code worked perfectly and was not stuck in an infinite loop. How does one explain this behavior? Doesn't the if statement just test the condition and if it is false it continues on the rest of the block?

It should be quite obvious.
Without the else, each call to insert always makes at least one more call to insert, leading to an infinite number of calls. The second if is always executed, and either way it calls insert.
With the else, it is possible for insert not to call insert -- if currentNode is null.

Not expecting any credit for this answer since it's coming in late, but wanted to add another perspective. The problem here is that you've created (intentionally or not) a recursive function without a base case (which equals stack overflow).
It's guaranteed that one of the following statements will be true:
if (x >= binTree<T>::root->data)
{
insert(currentNode->right, x);
}
else if (x < binTree<T>::root->data) //could just as well have been only "else"
{
insert(currentNode->left, x);
}
By implementing the solution you found and which @David Schwartz confirmed, you're essentially converting the if (currentNode == nullptr) block into the base case, which solves the problem.
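For illustration, here is a minimal sketch of the insert with the null check acting as the base case. The Node struct is a hypothetical stand-in, since the question doesn't show the real Node<T>. One change that goes beyond the answers above: this sketch compares against currentNode->data rather than binTree<T>::root->data, on the assumption that an ordinary binary search tree is what's wanted.

```cpp
#include <cassert>

// Hypothetical minimal node; the question's Node<T> is not shown in full.
template <typename T>
struct Node {
    T data;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(const T& x) : data(x) {}
};

// Recursive insert. Because the three branches form one if / else-if / else
// chain, the nullptr branch ends the recursion instead of falling through.
template <typename T>
void insert(Node<T>*& currentNode, const T& x) {
    if (currentNode == nullptr) {
        currentNode = new Node<T>(x);      // base case: no recursive call
    } else if (x >= currentNode->data) {   // assumption: compare with this node
        insert(currentNode->right, x);
    } else {
        insert(currentNode->left, x);
    }
}

// Usage sketch: builds a three-node tree and checks its shape.
// Nodes are deliberately leaked to keep the example short.
inline void demoInsert() {
    Node<int>* root = nullptr;
    insert(root, 5);
    insert(root, 3);
    insert(root, 8);
    assert(root->left->data == 3 && root->right->data == 8);
}
```

With separate if statements instead of the chain, the call that creates the node would go straight on to the comparison and recurse again, which is the runaway recursion described in the question.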

Related

Else keyword in non void function in C++ [duplicate]

I am always in the habit of using if, else-if statement instead of multiple if statements.
Example:
int val = -1;
if (a == b1) {
return c1;
} else if (a == b2) {
return c2;
} ...
...
} else {
return c11;
}
How does it compare to example 2:
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
....
if (a == b11) {
return c11;
}
I know functionality-wise they are the same. But is it best practice to do if else-if, or not? It was raised by one of my friends when I pointed out he could structure the code base differently to make it cleaner. It has been a habit of mine for a long time, but I have never asked why.
An if-elseif-else chain stops doing comparisons as soon as it finds one that's true. if-if-if does every comparison. The first is more efficient.
Edit: It's been pointed out in comments that you do a return within each if block. In these cases, or in cases where control will leave the method (exceptions), there is no difference between doing multiple if statements and doing if-elseif-else statements.
However, it's best practice to use if-elseif-else anyhow. Suppose you change your code such that you don't do a return in every if block. Then, to remain efficient, you'd also have to change to an if-elseif-else idiom. Having it be if-elseif-else from the beginning saves you edits in the future, and is clearer to people reading your code (witness the misinterpretation I just gave you by doing a skim-over of your code!).
What about the case where b1 == b2? (And if a == b1 and a == b2?)
When that happens, generally speaking, the following two chunks of code will very likely have different behavior:
if (a == b1) {
/* do stuff here, and break out of the test */
}
else if (a == b2) {
/* this block is never reached */
}
and:
if (a == b1) {
/* do stuff here */
}
if (a == b2) {
/* do this stuff, as well */
}
If you want to clearly delineate functionality for the different cases, use if-else or switch-case to make one test.
If you want different functionality for multiple cases, then use multiple if blocks as separate tests.
It's not a question of "best practices" so much as defining whether you have one test or multiple tests.
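A small C++ sketch of the overlapping case described above. The function names chained and separate are made up for illustration; each returns a string recording which blocks ran.

```cpp
#include <cassert>
#include <string>

// One test: at most one block runs.
std::string chained(int a, int b1, int b2) {
    std::string ran;
    if (a == b1) {
        ran += "1";
    } else if (a == b2) {
        ran += "2";   // skipped whenever the first condition was true
    }
    return ran;
}

// Two independent tests: both blocks may run.
std::string separate(int a, int b1, int b2) {
    std::string ran;
    if (a == b1) {
        ran += "1";
    }
    if (a == b2) {
        ran += "2";
    }
    return ran;
}

// Usage sketch: with b1 == b2 == a the two forms disagree.
inline void demoOverlap() {
    assert(chained(7, 7, 7) == "1");
    assert(separate(7, 7, 7) == "12");
}
```

When b1 and b2 differ, at most one condition can hold and the two functions return the same string.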
They are NOT functionally equivalent.
The only way it would be functionally equivalent is if you did an "if" statement for every single possible value of a (ie: every possibly int value, as defined in limits.h in C; using INT_MIN and INT_MAX, or equivalent in Java).
The else statement allows you to cover every possible remaining value without having to write millions of "if" statements.
Also, it's better coding practice to use if...else if...else, just like how in a switch/case statement, your compiler will nag you with a warning if you don't provide a "default" case statement. This prevents you from overlooking invalid values in your program. eg:
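#include <math.h>
#include <stdio.h>

double square_root(double x) {
if (x > 0.0) {
return sqrt(x);
} else if (x == 0.0) {
return x;
} else {
/* negative input (or NaN): report it and return a fallback value */
printf("INVALID VALUE: x must not be negative\n");
return 0.0;
}
}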
Do you want to type millions of if statements for each possible value of x in this case? Doubt it :)
Cheers!
This totally depends on the condition you're testing. In your example it will make no difference eventually but as best practice, if you want ONE of the conditions to be eventually executed then you better use if else
if (x > 1) {
System.out.println("Hello!");
}else if (x < 1) {
System.out.println("Bye!");
}
Also note that if the first condition is TRUE the second will NOT be checked at all but if you use
if (x > 1) {
System.out.println("Hello!");
}
if (x < 1) {
System.out.println("Bye!");
}
The second condition will be checked even if the first condition is TRUE. This might be resolved by the optimizer eventually, but as far as I know it behaves that way. Also, the first form is the one that states the intent directly, so it is always the best choice for me unless the logic requires otherwise.
An if followed by else if is different from two consecutive if statements. In the first, when the CPU takes the first if branch, the else if won't be checked. With two consecutive if statements, even if the first if is checked and taken, the next if will also be checked, and taken if its condition is true.
I tend to think that using else if is easier and more robust in the face of code changes. If someone were to adjust the control flow of the function and replace a return with a side effect, or a function call with a try-catch, the else-if would fail hard if all conditions are truly exclusive. It really depends too much on the exact code you are working with to make a general judgment, and you need to consider the possible trade-offs with brevity.
With return statements in each if branch.
In your code, you have return statements in each of the if conditions. When you have a situation like this, there are two ways to write this. The first is how you've written it in Example 1:
if (a == b1) {
return c1;
} else if (a == b2) {
return c2;
} else {
return c11;
}
The other is as follows:
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
return c11; // no if or else around this return statement
These two ways of writing your code are identical.
The way you wrote your code in example 2 wouldn't compile in Java, because the compiler doesn't know that you've covered all possible values of a, so it sees a code path that reaches the end of the function without returning a value. C and C++ compilers accept it, usually with a warning, but actually reaching the end without a return is undefined behavior in C++ (and in C whenever the caller uses the result).
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
...
if (a == b11) {
return c11;
}
// what if you set a to some value c12?
Without return statements in each if branch.
Without return statements in each if branch, your code would be functionally identical only if the following statements are true:
You don't mutate the value of a in any of the if branches.
== is an equivalence relation (in the mathematical sense) and none of the b1 thru b11 are in the same equivalence class.
== doesn't have any side effects.
To clarify further about point #2 (and also point #3):
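For integer and pointer (or reference) operands, == in C or Java is always an equivalence relation, and the operator itself never has side effects. Floating-point NaN is the built-in exception: NaN == NaN is false, so == is not reflexive there.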
In languages that let you override the == operator, such as C++, Ruby, or Scala, the overridden == operator may not be an equivalence relation, and it may have side effects. We certainly hope that whoever overrides the == operator was sane enough to write an equivalence relation that doesn't have side effects, but there's no guarantee.
In JavaScript and certain other programming languages with loose type conversion rules, there are cases built into the language where == is not transitive, or not symmetric. (In JavaScript, === is an equivalence relation, apart from NaN, which is not equal to itself.)
In terms of performance, example #1 is guaranteed not to perform any comparisons after the one that matches. It may be possible for the compiler to optimize #2 to skip the extra comparisons, but it's unlikely. In the following example, it probably can't, and if the strings are long, the extra comparisons aren't cheap.
if (strcmp(str, "b1") == 0) {
...
}
if (strcmp(str, "b2") == 0) {
...
}
if (strcmp(str, "b3") == 0) {
...
}
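To make the comparison count concrete, here is a hedged C++ sketch. The Outcome struct, the two classify functions and the counting lambda are all made up for illustration; the lambda simply wraps strcmp so the number of calls is visible. The branches assign instead of returning, since with a return in every branch the two forms would behave the same.

```cpp
#include <cassert>
#include <cstring>

// Result of a classification plus how many strcmp calls it took.
struct Outcome {
    int result;
    int compares;
};

// if / else-if chain: stops comparing after the first match.
Outcome classifyChained(const char* s) {
    Outcome o{0, 0};
    auto same = [&](const char* lit) { ++o.compares; return std::strcmp(s, lit) == 0; };
    if (same("b1")) o.result = 1;
    else if (same("b2")) o.result = 2;
    else if (same("b3")) o.result = 3;
    return o;
}

// Separate ifs: every comparison is made, even after a match.
Outcome classifySeparate(const char* s) {
    Outcome o{0, 0};
    auto same = [&](const char* lit) { ++o.compares; return std::strcmp(s, lit) == 0; };
    if (same("b1")) o.result = 1;
    if (same("b2")) o.result = 2;
    if (same("b3")) o.result = 3;
    return o;
}

// Usage sketch: same answer, different amount of work.
inline void demoCompares() {
    assert(classifyChained("b1").compares == 1);
    assert(classifySeparate("b1").compares == 3);
}
```

The counter has a visible side effect, so the compiler has to keep all three calls in the second function here. Whether it could drop plain strcmp calls in real code depends on the compiler and what it can prove about the strings.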
I prefer if/else structures, because it's much easier to evaluate all possible states of your problem in every variation together with switches. I find it more robust and quicker to debug, especially when you do multiple Boolean evaluations in a weakly typed environment such as PHP. An example of why elseif is bad (exaggerated for demonstration):
if(a && (c == d))
{
} elseif ( b && (!d || a))
{
} elseif ( d == a && ( b^2 > c))
{
} else {
}
Four boolean variables already give 2^4 = 16 states, and this problem goes beyond that, because the weak-typing effects make things even worse. It isn't so hard to imagine a three-state-variable, three-variable problem handled in an "if ab elseif bc" type of way.
Leave optimization to the compiler.
In most cases, using if-elseif-else and switch statements over if-if-if statements is more efficient (since it makes it easier for the compiler to create jump/lookup tables) and better practice since it makes your code more readable, plus the compiler makes sure you include a default case in the switch. This answer, along with this table comparing the three different statements was synthesized using other answer posts on this page as well as those of a similar SO question.
I think these code snippets are equivalent for the simple reason that you have many return statements. If you had a single return statements, you would be using else constructs that here are unnecessary.
Your comparison relies on the fact that the body of the if statements return control from the method. Otherwise, the functionality would be different.
In this case, they perform the same functionality. The latter is much easier to read and understand in my opinion and would be my choice as which to use.
They potentially do different things.
If a is equal to b1 and b2, you enter two if blocks. In the first example, you only ever enter one. I imagine the first example is faster as the compiler probably does have to check each condition sequentially as certain comparison rules may apply to the object. It may be able to optimise them out... but if you only want one to be entered, the first approach is more obvious, less likely to lead to developer mistake or inefficient code, so I'd definitely recommend that.
CanSpice's answer is correct. An additional consideration for performance is to find out which conditional occurs most often. For example, if a==b1 only occurs 1% of the time, then you get better performance by checking the other case first.
Gir Loves Tacos' answer is also good. Best practice is to ensure you have all cases covered.

Difference between iterating to next node in recursive function vs recursive function call

In this piece of code, I am comparing two linked lists and checking whether the items in the linked lists are equal or not.
bool Check(Node *listA, Node *listB) {
if (listA == NULL && listB == NULL)
return true;
if (listA->item == listB->item)
{
listA = listA->link;
listB = listB->link;
return Check(listA, listB);
}
return false;
}
I was wondering what the difference between this code:
listA = listA->link;
listB = listB->link;
return Check(listA, listB);
and this is:
return Check(listA->link, listB->link);
Both pieces of code produce the correct answer, but I can't seem to understand what the difference is.
There is no difference; they do exactly the same thing. The only difference is that with the first form you could change something in the next node, if you needed to, before calling Check(). But in your case they are exactly the same. The second option is cleaner though, so I recommend that one.
In general, modifying an IN parameter value makes your function's code and intent less clear, so it's best to avoid it.
Also consider that if you are using a debugger and step back to a prior recursive call, you will not be able to see the correct node that was inspected, since its pointer was already overwritten. Thus it will be more confusing to debug.
Practically speaking, the outcome of both functions will be the same. The second one may be infinitesimally faster due to skipping the two pointless assignment operations, unless that is optimized away.
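Here is a sketch of the second form, with the parameters left unmodified. The Node struct is a guess at the question's type. The check for "only one list has ended" is an addition not discussed above: the posted function would dereference a null pointer when the lists have different lengths.

```cpp
#include <cassert>

// Hypothetical node type; the question only shows the item and link members.
struct Node {
    int item;
    Node* link;
};

// Returns true if both lists hold the same items in the same order.
bool Check(const Node* listA, const Node* listB) {
    if (listA == nullptr && listB == nullptr)
        return true;                          // both lists ended together
    if (listA == nullptr || listB == nullptr)
        return false;                         // lengths differ (guard added in this sketch)
    if (listA->item != listB->item)
        return false;
    return Check(listA->link, listB->link);   // listA and listB keep their values
}

// Usage sketch with two short stack-allocated lists.
inline void demoCheck() {
    Node a2{2, nullptr}, a1{1, &a2};
    Node b2{2, nullptr}, b1{1, &b2};
    assert(Check(&a1, &b1));
    assert(!Check(&a1, &b2));
}
```

Because listA and listB are never reassigned, a debugger stepping back up the call stack still shows the node each call actually inspected.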

Should I use ELSE IF? for better performance?

Newbie here and I just want to know should I use ELSE IF for something like below:
(function)
IF x==1
IF x==2
IF x==3
That is the way I am doing it, because x will not be anything else. However, I think that if x is equal to 1, the program is still going to run through the following checks (which turn out to be FALSE FALSE FALSE ...). Should I use ELSE IF so it won't have to run the rest? Will that help the performance?
Why don't I want to use ELSE IF? because I'd like each code block (IF x==n) to be similar, not like this:
IF x==1
ELSE IF x==2
ELSE IF x==3
(each ELSE IF block is part of the block above it)
But the program will repeatedly call this function so I am worried about the performance or delay.
Short answer: If you do not need to handle a case where multiple conditions might be true at the same time, always use
if (condition) {
//do something
}
else if (other_condition) {
//do something else
}
else { //in all other conditions
//default behaviour
}
Long answer:
As others have already stated, performance is not really a big concern (unless you are writing production code for enterprise software targeted at colossal businesses). In case performance is indeed crucial though, you should go for the above format anyway. So that might be a good practice/habit to get used to (especially if you are now starting your code journey)
Switch could be an alternative, but since you haven't specified the language I would avoid suggesting it since, in some languages, it defaults to fall-through (which might get you where you started in the first place and confuse you even more)
Performance might not be a concern. But keep in mind that logic errors are a huge enemy to programming, and your solution is prone to them if you don't actually need it to be able to match more than one case. Consider the following case.
if (x == 1) {
x = x + 1
}
if (x == 2) {
x = x + 2
}
if (x >= 3) {
print("Error: x should only be 1 or 2!")
}
In this case, you would expect that if x >= 3 you would warn about an error in value, since you only had in mind handling the values 1 or 2. Actually though, even if the value of x is 1 or 2 (which you have considered to be valid), the same error message would be printed! That's because you have allowed more than one condition to be checked, and the respective code block to be executed, each time. Note that this is an oversimplified example. At times, this can be a great pain, especially if you collaborate with others, share the code, and are aiming for extendable and maintainable code.
To conclude, do not use a simpler solution if you haven't thought it through. Go for the complete one instead and keep in mind all possible outcomes (usually the worst case scenarios and even future features and code).
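The cascade can be reproduced with a short C++ sketch. The two function names and the log strings are made up for illustration; each function records which blocks it entered.

```cpp
#include <cassert>
#include <string>

// Separate ifs: changing x in one block can make a later condition true.
std::string handleSeparate(int x) {
    std::string log;
    if (x == 1) {
        x = x + 1;
        log += "one;";
    }
    if (x == 2) {
        x = x + 2;
        log += "two;";
    }
    if (x >= 3) {
        log += "error;";
    }
    return log;
}

// One chain: exactly one block runs, whatever the blocks do to x.
std::string handleChained(int x) {
    std::string log;
    if (x == 1) {
        x = x + 1;
        log += "one;";
    } else if (x == 2) {
        x = x + 2;
        log += "two;";
    } else if (x >= 3) {
        log += "error;";
    }
    return log;
}

// Usage sketch: a valid input of 1 trips the error check in the first form only.
inline void demoCascade() {
    assert(handleSeparate(1) == "one;two;error;");
    assert(handleChained(1) == "one;");
}
```

In handleSeparate, an input of 1 becomes 2 and then 4, so all three blocks run.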
Best Regards!
If the value being tested is expected to be able to match multiple conditions in a single call, then test each (IF, IF, ...).
If the value is expected to only match one, then check for it and stop if you find it (IF, ELSE IF, ELSE IF...).
If the values are expected to be one of a known set, then go right to it (switch).
Assuming this is javascript, but this should be about the same for anything else.
The code inside the if statement will only be run if the condition you provide it is true. For example, if you declare x = 1, we could have something like:
function something() {
if(x == 1) {
//do this
}
if(x == 2) {
//do that
}
if(x == 3) {
//do this and that
}
}
With x = 1, only the first block's body would run; the other two conditions are still evaluated, but they are false, so their bodies are skipped. An else-if condition is evaluated only if the preceding if condition was false.
function something() {
if(x == 1) {
//do this
}
else if(x == 2) {
//do that
}
}
So if x == 1 was false, the next statement would be evaluated.
As for performance, the difference is way too little for you to care about. If you have many conditions you need to test, you may want to look into a switch statement.

Recursive Backtracking Sudoku Solver Problems, c++

It's my first time dealing with recursion as an assignment in a low level course. I've looked around the internet and I can't seem to find anybody using a method similar to the one I've come up with (which probably says something about why this isn't working). The error is a segmentation fault in std::__copy_move... which I'm assuming is something in the c++ STL.
Anywho, my code is as follows:
bool sudoku::valid(int x, int y, int value)
{
if (x < 0) {cerr << "No valid values exist./n";}
if (binary_search(row(x).begin(), row(x).end(), value))
{return false;} //if found in row x, exit, otherwise:
else if (binary_search(col(y).begin(), col(y).end(), value))
{return false;} //if found in col y, exit, otherwise:
else if (binary_search(box((x/3), (y/3)).begin(), box((x/3), (y/3)).end(), value))
{return false;} //if found in box x,y, exit, otherwise:
else
{return true;} //the value is valid at this index
}
int sudoku::setval(int x, int y, int val)
{
if (y < 0 && x > 0) {x--; y = 9;} //if y gets decremented past 0 go to previous row.
if (y > 8) {y %= 9; x++;} //if y get incremented past 8 go to next row.
if (x == 9) {return 0;} //base case, puzzle done.
else {
if (valid(x,y,val)){ //if the input is valid
matrix[x][y] = val; //set the element equal to val
setval(x,y++,val); //go to next element
}
else {
setval(x,y,val++); //otherwise increment val
if(val > 9) {val = value(x,y--); setval(x,y--,val++); }
} //if val gets above 9, set val to prev element,
} //and increment the last element until valid and start over
}
I've been trying to wrap my head around this thing for a while and I can't seem to figure out what's going wrong. Any suggestions are highly appreciated! :)
sudoku::setval is supposed to return an int but there are at least two paths where it returns nothing at all. You should figure out what it needs to return in those other paths because otherwise you'll be getting random undefined behavior.
Without more information, it's impossible to tell. Things like the data
structures involved, and what row and col return, for example.
Still, there are a number of obvious problems:
In sudoku::valid, you check for what is apparently an error
condition (x < 0), but you don't return; you still continue your
tests, using the negative value of x.
Also in sudoku::valid: do row and col really return references to
sorted values? If the values aren't sorted, then binary_search will
have undefined behavior (and if they are, the names are somewhat
misleading). And if they return values (copies of something), rather
than a reference to the same object, then the begin() and end()
functions will refer to different objects—again, undefined
behavior.
Finally, I don't see any backtracking in your algorithm, and I don't
see how it progresses to a solution.
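On the binary_search point above: if the cells of a row, column or box are not kept sorted, a linear std::find avoids the sortedness requirement. A minimal sketch follows; the helper name containsValue and the use of std::vector<int> are assumptions, since the post doesn't show what row and col return.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: linear search over an unsorted group of cells.
bool containsValue(const std::vector<int>& cells, int value) {
    // begin() and end() are taken from the same named object,
    // never from two separate temporaries.
    return std::find(cells.begin(), cells.end(), value) != cells.end();
}

// Usage sketch: works on unsorted data, where binary_search would not be valid.
inline void demoContains() {
    const std::vector<int> cells{5, 3, 0, 0, 7, 0, 0, 0, 0};
    assert(containsValue(cells, 7));
    assert(!containsValue(cells, 4));
}
```

If row(x) returns by value, storing the result in a local variable first and passing that variable in keeps both iterators on one object.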
FWIW: when I wrote something similar, I used a simple array of 81
elements for the board, then created static arrays which mapped the
index (0–80) to the appropriate row, column and box. And for each of
the nine rows, columns and boxes, I kept a set of used values (a
bitmap); this made checking for legality very trivial, and it meant that
I could increment to the next square to test just by incrementing the
index. The resulting code was extremely simple.
Independently of the data representation used, you'll need: some
"global" (probably a member of sudoku) means of knowing whether you've
found the solution or not; a loop somewhere trying each of the nine
possible values for a square (stopping when the solution has been
found), and the recursion. If you're not using a simple array for the
board, as I did, I'd suggest a class or a struct for the index, with a
function which takes care of the incrementation once and for all.
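The layout described above might look roughly like the following sketch. It is not the answerer's code: all names are invented, the row, column and box of an index are computed arithmetically instead of read from static lookup arrays, and the "found" state is carried by the return value of solve rather than by a member flag. It assumes the clues do not already conflict with each other.

```cpp
#include <array>
#include <cassert>
#include <cstring>

struct Sudoku {
    std::array<int, 81> board{};                        // 0 means empty
    std::array<unsigned, 9> rowUsed{}, colUsed{}, boxUsed{};  // bit v set => v is used

    static int rowOf(int i) { return i / 9; }
    static int colOf(int i) { return i % 9; }
    static int boxOf(int i) { return (i / 27) * 3 + (i % 9) / 3; }

    // Loads 81 characters; '1'..'9' are clues, anything else is an empty square.
    explicit Sudoku(const char* clues) {
        assert(std::strlen(clues) == 81);
        for (int i = 0; i < 81; ++i) {
            int v = (clues[i] >= '1' && clues[i] <= '9') ? clues[i] - '0' : 0;
            board[i] = v;
            if (v != 0) toggle(i, v);
        }
    }

    bool canPlace(int i, int v) const {
        unsigned bit = 1u << v;
        return ((rowUsed[rowOf(i)] | colUsed[colOf(i)] | boxUsed[boxOf(i)]) & bit) == 0;
    }

    // Flips the "used" bit for v in the row, column and box of square i.
    void toggle(int i, int v) {
        unsigned bit = 1u << v;
        rowUsed[rowOf(i)] ^= bit;
        colUsed[colOf(i)] ^= bit;
        boxUsed[boxOf(i)] ^= bit;
    }

    // Recursive backtracking over squares i..80.
    bool solve(int i = 0) {
        while (i < 81 && board[i] != 0) ++i;   // skip squares that are already filled
        if (i == 81) return true;              // base case: nothing left to fill
        for (int v = 1; v <= 9; ++v) {         // try each of the nine values
            if (!canPlace(i, v)) continue;
            board[i] = v;
            toggle(i, v);
            if (solve(i + 1)) return true;     // solution found: stop trying values
            board[i] = 0;                      // backtrack: undo and try the next value
            toggle(i, v);
        }
        return false;                          // no value fits; the caller must backtrack
    }
};
```

The recursion depth is bounded by the 81 squares, so stack use stays small however much backtracking happens.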
All of the following is for Unix not Windows.
std::__copy_move... is STL alright. But STL doesn't do anything by itself, some function call from your code would've invoked it with wrong arguments or in wrong state. You need to figure that out.
If you have a core dump from the seg-fault then just do a pstack <core file name>, you will see the full call stack of the crash. Then just see which part of your code was involved in it and start debugging (add traces/couts/...) from there.
Usually you'll get this core file with nice readable names, but in case you don't you can use nm or c++filt etc to demangle the names.
Finally, pstack is just a small cmd line utility, you can always load the binary (that produced the core) and the core file into a debugger like gdb, Sun Studio or debugger built into your IDE and see the same thing along with lots of other info and options.
HTH
It seems like your algorithm is a bit "brute forcy". This is generally not a good tactic with Constraint Satisfaction Problems (CSPs). I wrote a sudoku solver a while back (wish I still had the source code, it was before I discovered github) and the fastest algorithm that I could find was Simulated Annealing:
http://en.wikipedia.org/wiki/Simulated_annealing
It's probabilistic, but it was generally orders of magnitude faster than other methods for this problem IIRC.
HTH!
A segmentation fault may (and will) happen if you enter a function recursively too many times.
I noted one scenario which leads to it, but I'm pretty sure there are more.
Tip: write in your words the purpose of any function - if it is too complicated to write - the function should probably be split...

stack overflow problem in program

So I am currently getting a strange stack overflow exception when I try to run this program, which reads numbers from a list in a data/text file and inserts them into a binary search tree. The weird thing is that the program works when I have a list of 4095 numbers in random order. However, when I have a list of 4095 numbers in increasing order (so it makes a linear search tree), it throws a stack overflow message. The problem is not the static count variable, because even when I removed it and put t=new BinaryNode(x,1), it still gave a stack overflow exception. I tried debugging it, and it broke at if (t == NULL){ t = new BinaryNode(x,count); Here is the insert function.
BinaryNode *BinarySearchTree::insert(int x, BinaryNode *t) {
static long count=0;
count++;
if (t == NULL){
t = new BinaryNode(x,count);
count=0;
}
else if (x < t->key){
t->left = insert(x, t->left);
}
else if (x > t->key){
t->right = insert(x, t->right);
}
else
throw DuplicateItem();
return t;
}
In a language like C++, you cannot use recursive algorithms on tall trees because each function call uses additional space on a limited stack. You must either change your algorithm (use iteration rather than recursion) or use a balanced binary tree structure.
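As a sketch of the "use iteration" option: the insert below walks down the tree in a loop, so stack use does not grow with the height of the tree. The BinaryNode here is a simplified stand-in (no count field), and the function returns false on a duplicate rather than throwing DuplicateItem. Both are simplifications for the example, not the poster's design.

```cpp
#include <cassert>

// Simplified stand-in for the question's BinaryNode.
struct BinaryNode {
    int key;
    BinaryNode* left = nullptr;
    BinaryNode* right = nullptr;
    explicit BinaryNode(int k) : key(k) {}
};

// Iterative insert. slot always points at the link (root, left or right)
// where the new node would have to be attached.
bool insertIterative(BinaryNode*& root, int x) {
    BinaryNode** slot = &root;
    while (*slot != nullptr) {
        if (x < (*slot)->key) {
            slot = &(*slot)->left;
        } else if (x > (*slot)->key) {
            slot = &(*slot)->right;
        } else {
            return false;             // duplicate key: nothing inserted
        }
    }
    *slot = new BinaryNode(x);
    return true;
}

// Usage sketch: increasing keys build a right-leaning chain without recursion.
// Nodes are deliberately leaked to keep the example short.
inline void demoIterativeInsert() {
    BinaryNode* root = nullptr;
    for (int i = 1; i <= 100; ++i) {
        bool inserted = insertIterative(root, i);
        assert(inserted);
    }
    assert(root->key == 1 && root->right->key == 2);
}
```

Sorted input still produces a degenerate, list-shaped tree, so each insert takes time proportional to the number of nodes. Only a balanced tree addresses that part.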
If you have a bounded input (as it appears you do in this case), you can relieve stack pressure by either making the stack bigger (as Andreas suggests) or putting less data on the stack. It seems as though insert is a member function of the BinarySearchTree class even though it doesn't reference any other members of the class. If you make insert a static method (or a regular function not in a class), it won't have to push a this pointer on the stack for every function call, and you will be able to get more recursive calls before overflowing.
You can increase the size of the stack. Depending on which compiler you're working with this is done in different ways. For instance in Visual Studio the stack size can be set with the command line option:
/F stacksize