Weird behaviour in for-loop in C++ on Mac

I have the following function:
std::vector<std::vector<Surfel>> testAddingSift(
        const Groundtruth &groundtruth,
        SurfelHelper &surfelHelper) {
    for (int k = 0; k < 10; k++) {
        std::cout << "hej" << k << std::endl;
    }
}
When I forgot to return a vector<vector<Surfel>> I got an infinite loop:
hej1025849
hej1025850
hej1025851
hej1025852
When I return a vector<vector<Surfel>> I get:
hej0
hej1
hej2
hej3
hej4
hej5
hej6
hej7
hej8
hej9
Of course it was a mistake to forget to return the vector, but why is the for-loop affected?
I am using a MacBook Pro with Sierra and CLion, and I think the compiler is Clang.

Failing to return a value from a function with a non-void return type is undefined behavior. That makes it, by definition, nearly impossible to reason about the resulting behavior, and it can affect the behavior of code that comes before the point where the undefined behavior would be expected to be encountered.

The proper answer here is that this is, of course, undefined behavior, so anything could happen. A more interesting question is, how in the world would something silly, like forgetting a return statement, lead to an infinite loop?
It is not the first time I have forgotten to return something, but it has never caused this kind of problem before.
My best guess is that the result you see has to do with returning an object by value when the object has a non-trivial copy constructor. In your case, the copy constructor is rather non-trivial, because it needs to deal with nested vectors. In particular, I suspect that the infinite loop would go away if you changed the return type to int (the behavior would remain undefined, though).
My guess is that when your loop calls operator <<, it places a return address on the stack. Once operator << returns, the stack frame becomes unused, but its content remains intact. I suspect that the code for copying the returned vector re-interprets the content of that "garbage" stack frame as a vector with lots of elements, and ends up invoking the loop's body instead of copying vector elements.
This is just one possibility. If you would like to find out what is happening, the proper way would be to dig through the disassembly.
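For completeness, a minimal sketch of the fixed function; the empty stub types below are stand-ins for the asker's Surfel, Groundtruth and SurfelHelper (assumptions, not the real definitions):

#include <iostream>
#include <vector>

// Stub stand-ins so the sketch is self-contained; the real types differ.
struct Surfel {};
struct Groundtruth {};
struct SurfelHelper {};

std::vector<std::vector<Surfel>> testAddingSift(
        const Groundtruth &groundtruth,
        SurfelHelper &surfelHelper) {
    std::vector<std::vector<Surfel>> result;
    for (int k = 0; k < 10; k++) {
        std::cout << "hej" << k << std::endl;
    }
    return result;  // every control path now returns a value
}

Both GCC and Clang diagnose the original version under -Wall (specifically -Wreturn-type: "control reaches end of non-void function"), and -Werror=return-type turns the mistake into a hard error instead of undefined behavior.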

Related

Difference in erasing an element from a vector and a vector of pairs

#include <bits/stdc++.h>
using namespace std;

int main()
{
    vector<int> p;
    p.push_back(30);
    p.push_back(60);
    p.push_back(20);
    p.erase(p.end());
    for (int i = 0; i < p.size(); ++i)
        cout << p[i] << " ";
}
The above code throws an error, as it is understood that p.end() points to a null pointer.
While the code below runs fine and the output is 30 60. Can anyone explain this?
#include <bits/stdc++.h>
using namespace std;
#define mp make_pair

int main()
{
    vector<pair<int,int>> p;
    p.push_back(mp(30,2));
    p.push_back(mp(60,5));
    p.push_back(mp(20,7));
    p.erase(p.end());
    for (int i = 0; i < p.size(); ++i)
        cout << p[i].first << " ";
}
From std::vector::erase:
The iterator pos must be valid and dereferenceable. Thus the end() iterator (which is valid, but is not dereferenceable) cannot be used as a value for pos.
So your code is invalid in both cases and invokes undefined behaviour. This means anything can happen, including crashing, appearing to work, or anything else.
The above code throws an error, as it is understood that ...
That code is not guaranteed to "throw an error". Rather, the behaviour is undefined. Throwing an error is one possible behaviour. If it does throw an error, you can count yourself lucky, as it might have been difficult to find your bug otherwise.
... as it is understood that p.end() points to a null pointer.
No, p.end() does not "point to a null pointer". It points to the end of the vector, where the end of the vector is defined as the position after the last element.
While the code below runs fine and the output is 30 60. Can anyone explain this?
"Running fine" and "output is 30 60" are possible behaviours when behaviour is undefined. Everything is a possible behaviour when it is undefined. But of course, there is no guarantee that it will run fine. As far as the language is concerned, the program could just as well not be running fine tomorrow.
I have checked it on many online compilers, but the output is the same!
The output being the same on many online compilers is also possible behaviour when behaviour is undefined. There is no guarantee that some compiler would behave differently, just as there is no guarantee that they all behave the same.
No matter how many compilers you try, it is impossible to verify that a program is correct simply by executing it and observing the output you hoped for. The only way to prove a program correct is to verify that all pre-conditions and invariants imposed on the program are satisfied.
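For reference, a minimal sketch of a well-defined version of the erase: pass an iterator to the last element (which is dereferenceable), or simply use pop_back():

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> p{30, 60, 20};

    p.erase(p.end() - 1);  // end() - 1 refers to the last element: valid
    // p.pop_back();       // equivalent, and simpler, for the last element

    for (std::size_t i = 0; i < p.size(); ++i)
        std::cout << p[i] << " ";  // prints: 30 60
}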

Can writing into reserved space of std::vector result in a segmentation fault?

I hope this is not too controversial a question, but I cannot find a proper, full answer on SO. This is also not a question about the difference between the methods reserve and resize, or the difference between capacity and size, which are (hopefully) clear to me and have been asked often enough on SO. Nor is it a question of whether this is good practice at all, which it is not!
Consider the following situation:
#include <vector>
#include <iostream>

struct Foo
{
    double a, b;
};

int main(int argc, char* argv[])
{
    std::vector<Foo> Vec;
    Vec.reserve(100);

    Foo foo;
    foo.a = -13.131;
    foo.b = 3.141;

    for (int i = 0; i < 100; ++i)
        Vec[i] = foo;
    for (int i = 0; i < 100; ++i)
        std::cout << Vec[i].a << std::endl;

    return 0;
}
I first create a std::vector of Foo and then reserve memory, but don't resize the vector. Clearly size() == 0, BUT the memory for 100 elements has been allocated and may now be freely used by my program. So technically, writing to and reading from any of these element positions in memory cannot result in a segmentation fault, is that correct?
I have tried to run this code on Ubuntu 14.04 and everything works as expected: all 100 elements are written successfully and all outputs are -13.131, even though the vector size remains 0. If I look through many answers on SO, they all correctly point out that it results in undefined behaviour, because the elements are not initialized, but could it actually result in a segmentation fault in any way (not talking about accessing elements of uninitialized pointers in a vector etc.)?
A question similar to this has been asked here, and that seems to confirm my thought, but would it in principle work across all platforms that support compilation of C++?
Once you have undefined behaviour, it is, well, undefined behaviour.
One of the key aspects of undefined behaviour is that you can't be sure what the behaviour will be on a different system or compiler. Now, you could look at the code of a specific compiler and a specific library implementation, and you would see that it acts as you expect it to.
But I don't think you will find anyone who is willing to bet that this will work across all different systems, compilers and library implementations.
Just for instance, what if a specific vector implementation decided to use the reserved memory for internal information? Maybe it is unlikely, but how can you be sure no implementation is actually doing it?
Let us consider a concrete example: a std::vector implementation that, when reserve() is called, allocates the new memory but then performs the copy on a background thread, because it can (shrugs; who knows what will happen in the near future!). While it is copying, all reads are unlocked and go straight to the old memory area, because that is still good for reading.
Now, attempting to read something out of range would be reading random memory, not the newly allocated memory you are asserting it should be.
So, as the comments and the other answer say, undefined is undefined.
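As a contrast, a minimal sketch of the well-defined alternative: resize() (or push_back) actually constructs the elements, so indexing them afterwards is valid:

#include <iostream>
#include <vector>

struct Foo
{
    double a, b;
};

int main()
{
    std::vector<Foo> Vec;
    Vec.resize(100);  // constructs 100 elements; size() == 100, not just capacity

    Foo foo;
    foo.a = -13.131;
    foo.b = 3.141;

    for (std::size_t i = 0; i < Vec.size(); ++i)
        Vec[i] = foo;  // well-defined: the elements actually exist

    std::cout << Vec[99].a << std::endl;  // prints -13.131
    return 0;
}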

Is allocating specific memory for a void pointer undefined behaviour?

I've met a situation that I think is undefined behavior: there is a structure that has some members, and one of them is a void pointer (it is not my code and it is not public; I suppose the void pointer is there to make it more generic). At some point, some char memory is allocated through this pointer:
void fooTest(ThatStructure * someStrPtr) {
    try {
        someStrPtr->voidPointer = new char[someStrPtr->someVal + someStrPtr->someOtherVal];
    } catch (std::bad_alloc& ba) {
        std::cerr << ba.what() << std::endl;
    }
    // ...
and at some point it crashes in the allocation (operator new) with a segmentation fault (a few times it works; there are more calls of this function, more cases). I have seen this in the debugger.
I also know that on Windows (my machine runs Linux) there is also a segmentation fault, right at the beginning (I suppose in the first call of the function that allocates the memory).
Moreover, if I add a print of the values:
std::cout << someStrPtr->someVal << " " << someStrPtr->someOtherVal << std::endl;
before the try block, it runs through to the end. I added this print to see whether there was some other problem with the structure pointer, but the values are printed and are not 0 or negative.
I've seen these topics: topic1, topic2, topic3, and I am thinking there is some UB linked to the void pointer. Can anyone help me pin down the issue here so I can solve it? Thanks.
No, that in itself is not undefined behavior. In general, when code "crashes at the allocation part", it's because something earlier messed up the heap, typically by writing past one end of an allocated block or releasing the same block more than once. In short: the bug isn't in this code.
Using a void pointer is a perfectly fine thing to do in C/C++, and you can usually cast it from/to other pointer types.
When you get a seg-fault during initialization, it means some of the parameters being used are themselves invalid:
Is someStrPtr valid?
Are someStrPtr->someVal and someStrPtr->someOtherVal valid?
Are the printed values what you were expecting?
Also, if this is a multithreaded application, make sure that no other thread is accessing those variables (especially between your print and the allocation statement). That kind of bug is really difficult to catch.
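To illustrate the first answer's point, here is a hedged sketch (a deliberately broken toy program, not the asker's code) of how an earlier out-of-bounds write can make a later, perfectly correct new crash. A tool such as AddressSanitizer (-fsanitize=address on GCC/Clang) or Valgrind will usually point at the real culprit directly:

#include <cstring>

int main()
{
    char *buf = new char[8];

    // Bug: writes 16 bytes into an 8-byte block, trampling the allocator's
    // bookkeeping data that typically lives right next to the block.
    std::memset(buf, 0, 16);  // undefined behaviour: heap corruption

    // The crash, if any, often happens here, in an unrelated allocation,
    // because operator new walks the damaged heap metadata.
    char *other = new char[32];

    delete[] other;
    delete[] buf;
    return 0;
}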

Optimizer bug or programming error?

First of all: I know that most optimization bugs are due to programming errors or to relying on facts which may change depending on optimization settings (floating-point values, multithreading issues, ...).
However, I experienced a very hard-to-find bug and am somewhat unsure whether there is any way to prevent this kind of error without turning optimization off. Am I missing something? Could this really be an optimizer bug? Here's a simplified example:
struct Data {
    int a;
    int b;
    double c;
};

struct Test {
    void optimizeMe();
    Data m_data;
};

void Test::optimizeMe() {
    Data * pData; // Note that this pointer is not initialized!
    bool first = true;
    for (int i = 0; i < 3; ++i) {
        if (first) {
            first = false;
            pData = &m_data;
            pData->a = i * 10;
            pData->b = i * pData->a;
            pData->c = pData->b / 2;
        } else {
            pData->a = ++i;
        } // end if
    } // end for
}

int main(int argc, char *argv[]) {
    Test test;
    test.optimizeMe();
    return 0;
}
The real program of course has a lot more to do than this. But it all boils down to the fact that instead of accessing m_data directly, a (previously uninitialized) pointer is used. As soon as I add enough statements to the if (first) part, the optimizer seems to change the code to something along these lines:
if (first) {
    first = false;
    // pData-assignment has been removed!
    m_data.a = i * 10;
    m_data.b = i * m_data.a;
    m_data.c = m_data.b / 2;
} else {
    pData->a = ++i; // This will crash - pData has not been set yet.
} // end if
As you can see, it replaces the unnecessary pointer dereferences with direct writes to the member struct. However, it does not do this in the else-branch. It also removes the pData assignment. Since the pointer is still uninitialized at that point, the program will crash in the else-branch.
Of course there are various things which could be improved here, so you might blame it on the programmer:
Forget about the pointer and do what the optimizer does - use m_data directly.
Initialize pData to nullptr - that way the optimizer knows that the else-branch will fail if the pointer is never assigned. At least it seems to solve the problem in my test-environment.
Move the pointer assignment in front of the loop, effectively initializing pData with &m_data (which could then also be a reference instead of a pointer, for good measure). This makes sense because pData is needed in all cases, so there is no reason to do the assignment inside the loop.
The code is obviously smelly, to say the least, and I'm not trying to "blame" the optimizer for doing this. But I'm asking: What am I doing wrong? The program might be ugly, but it's valid code...
I should add that I'm using VS2012 with C++/CLI and the v110_xp toolset. Optimization is set to /O2. Please also note that if you really want to reproduce the problem (that's not really the point of this question, though), you need to play around with the complexity of the program. This is a very simplified example, and the optimizer sometimes doesn't remove the pointer assignment. Hiding &m_data behind a function seems to "help".
EDIT:
Q: How do I know that the compiler is optimizing it to something like the example provided?
A: I'm not very good at reading assembler, but I have looked at it and have made three observations which make me believe that it's behaving this way:
As soon as optimization kicks in (adding more assignments usually does the trick), the pointer assignment has no associated assembler statement. It also hasn't been moved up to the declaration, so it really does seem to be left uninitialized (at least to me).
In cases where the program crashes, the debugger skips the assignment statement. In cases where the program runs without problems, the debugger stops there.
If I watch the contents of pData and the contents of m_data while debugging, it clearly shows that all assignments in the if-branch have an effect on m_data, and m_data receives the correct values. The pointer itself is still pointing to the same uninitialized value it had from the beginning. Therefore I have to assume that it is in fact not using the pointer to make the assignments at all.
Q: Does it have anything to do with i (loop unrolling)?
A: No. The actual program uses do { ... } while() to loop over a SQL SELECT result set, so the iteration count is completely runtime-specific and cannot be predetermined by the compiler.
It sure looks like a bug to me. It's fine for the optimizer to eliminate the unnecessary indirection, but it should not eliminate the assignment to pData.
Of course, you can work around the problem by assigning to pData before the loop (at least in this simple example). I gather that the problem in your actual code isn't as easily resolved.
I also vote for an optimizer bug if it is really reproducible in this example. To overrule the optimizer, you could try declaring pData as volatile.
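Reusing the Data and Test definitions from the question, here is a minimal sketch of the third workaround listed there (hoisting the assignment in front of the loop), which removes the uninitialized pointer altogether:

void Test::optimizeMe() {
    Data *pData = &m_data;  // assigned before the loop: never uninitialized
    bool first = true;
    for (int i = 0; i < 3; ++i) {
        if (first) {
            first = false;
            pData->a = i * 10;
            pData->b = i * pData->a;
            pData->c = pData->b / 2;
        } else {
            pData->a = ++i;  // pData is now valid on this path as well
        }
    }
}

Since pData now unconditionally aliases m_data, the optimizer is free to replace every dereference with a direct member access without changing the program's behaviour; a reference (Data &data = m_data;) would express the same intent even more clearly.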

C++ return value

I have the following code in C++:
int fff(int a, int b)
{
    if (a > b)
        return 0;
    else
        a + b;
}
Although I didn't write 'return' after the else, it does not produce an error!
In main(), when I wrote:
cout << fff(1,2);
it printed 1!
How did that happen? Can anyone explain that?
This is what is called undefined behavior. Anything can happen.
C++ does not require you to always return a value at the end of a function, because it's possible to write code that never gets there:
int fff(int a, int b)
{
    if (a > b)
        return 0;
    else
        return a + b;
    // still no return at the end of the function
    // syntactically, just as bad as the original example
    // semantically, nothing bad can happen
}
However, the compiler cannot always determine whether you can reach the end of the function, and the most it can do is give a warning. It's up to you to avoid falling off the end without a return.
And if you do, you might get a random value, or you might crash.
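For instance, even when every case is covered, the compiler usually cannot prove it; a small sketch (the sign function here is just an illustration):

int sign(int x)
{
    if (x > 0) return 1;
    if (x < 0) return -1;
    if (x == 0) return 0;
    // Every int falls into exactly one of the cases above, yet most
    // compilers still warn: "control reaches end of non-void function".
}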
§6.6.3/2: "Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function."
A compiler may or may not diagnose such a condition.
Here,
else a + b;
is treated as an expression statement without any side effects.
the "random" return vaule is determined by the CPU register value after the call, since the register is 1 after the call, so the value is 1.
If you change you code, the function will return diffrent value.
A good compiler (e.g. gcc) will issue a warning if you make such a mistake, and has a command-line switch to return a non-zero error status if any warnings were encountered. This is undefined behaviour: the result you're seeing is whatever value happened to be in the place the compiler would normally expect a function returning int to use, for example the accumulator register or some spot on the stack. Your code doesn't copy a+b into that location, so whatever was last put there is seen instead. Still, you're not even guaranteed to get a result: some compiler/architecture might do something that can crash the machine if the function has no return statement. For example, it might pop() a value from the stack on the assumption that return pushed one; future uses of the stack (including reading function-return addresses) could then read from the memory address above or below the intended one.
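As a hedged illustration of the register explanation above (assuming a typical x86 calling convention, where an int return value travels in EAX; details vary by platform):

int fff(int a, int b)
{
    if (a > b)
        return 0;  // a real return: writes 0 into the return register
    else
        a + b;     // evaluated (or discarded), but never *returned*:
                   // nothing guarantees the result ever reaches EAX
}                  // falling off the end: EAX holds whatever is left over

// The caller of fff(1, 2) simply reads EAX after the call, so it "sees"
// whatever happened to be there; on the asker's machine that was 1.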