std::optional & compilers - shouldn't this crash? - c++

I'm trying to create a hierarchy of classes in which some parts are optional. I would like things to be created automatically as soon as variables are set.
For that I use the C++17 std::optional feature.
Now in the example below I forgot to set the "parent" (test2_inst) first, yet g++, clang and msvc all compile and run fine, although with the "not set" output.
My questions now are: am I indeed doing the wrong thing in this example? And what would be the proper way of resolving this?
Or are the compilers doing the wrong thing?
#include <cstdio>
#include <optional>
class test1 {
public:
class test2 {
public:
int a, b;
class test3 {
public:
int c, d;
};
test3 test3_inst;
};
std::optional<test2> test2_inst;
};
int main(int argc, char *argv[])
{
test1 *test1_inst = new test1();
// can set value
test1_inst->test2_inst->test3_inst.c = 3;
// yet optional says it is not set?
if (test1_inst->test2_inst.has_value())
printf("set\n");
else
printf("not set\n");
return 0;
}

The behaviour of optional::operator* and optional::operator-> is undefined if the optional does not contain a value.
Accesses the contained value.
Returns a pointer to the contained value.
Returns a reference to the contained value.
The behavior is undefined if *this does not contain a value.
Source: https://en.cppreference.com/w/cpp/utility/optional/operator*

shouldn't this crash?
Could. Undefined behavior can do anything. Crashing is one possibility. Not crashing and appearing to work is also a possibility.
am I indeed doing the wrong thing in this example?
Yes.
what would be the proper way of resolving this?
Depends what you are trying to do. Check the optional...
if (test1_inst->test2_inst)
test1_inst->test2_inst->test3_inst.c = 3;
Or, assign its value...
test1_inst->test2_inst = test1::test2{1, 2, {3, 4}};
Or are the compilers doing the wrong thing?
No, the C++ standard gives the compilers a lot of latitude.
C++ is not a nanny language, and gives programmers enough rope to shoot themselves in the foot.
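If the goal is "create the parent automatically on first use", one possible approach is sketched below. The names Inner, Outer, set_c and read_throws are made up for this illustration and are not from the question. It relies on two documented std::optional members: emplace(), which constructs the contained value in place, and value(), which throws std::bad_optional_access instead of invoking undefined behaviour when the optional is empty.

```cpp
#include <cassert>
#include <optional>

struct Inner { int c = 0; int d = 0; };
struct Outer { int a = 0; int b = 0; Inner inner; };

// Hypothetical helper: creates the Outer on first use, then sets inner.c.
void set_c(std::optional<Outer>& slot, int value) {
    if (!slot) {
        slot.emplace();        // default-construct the contained Outer in place
    }
    assert(slot.has_value());  // operator-> is only valid on an engaged optional
    slot->inner.c = value;
}

// Checked access: value() throws std::bad_optional_access on an empty optional.
bool read_throws(const std::optional<Outer>& slot) {
    try {
        (void)slot.value();
        return false;
    } catch (const std::bad_optional_access&) {
        return true;
    }
}
```

Calling set_c on an empty optional leaves it engaged with inner.c set, and read_throws reports true only while the optional is still empty.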

Related

What happens to a reference when the object is deleted?

I did a bit of an experiment to try to understand references in C++:
#include <iostream>
#include <vector>
#include <set>
struct Description {
int a = 765;
};
class Resource {
public:
Resource(const Description &description) : mDescription(description) {}
const Description &mDescription;
};
void print_set(const std::set<Resource *> &resources) {
for (auto *resource: resources) {
std::cout << resource->mDescription.a << "\n";
}
}
int main() {
std::vector<Description> descriptions;
std::set<Resource *> resources;
descriptions.push_back({ 10 });
resources.insert(new Resource(descriptions.at(0)));
// Same as description (prints 10)
print_set(resources);
// Same as description (prints 20)
descriptions.at(0).a = 20;
print_set(resources);
// Why? (prints 20)
descriptions.clear();
print_set(resources);
// Object is written to the same address (prints 50)
descriptions.push_back({ 50 });
print_set(resources);
// Create new array
descriptions.reserve(100);
// Invalid address
print_set(resources);
for (auto *res : resources) {
delete res;
}
return 0;
}
https://godbolt.org/z/TYqaY6Tz8
I don't understand what is going on here. I have found this excerpt from C++ FAQ:
Important note: Even though a reference is often implemented using an address in the underlying assembly language, please do not think of a reference as a funny looking pointer to an object. A reference is the object, just with another name. It is neither a pointer to the object, nor a copy of the object. It is the object. There is no C++ syntax that lets you operate on the reference itself separate from the object to which it refers.
This raises some questions for me. So, if a reference is the object itself and I create a new object at the same memory address, does this mean that the reference "becomes" the new object? In the example above, vectors are linear arrays; so, as long as the array points to the same memory range, the object will be valid. However, this becomes a lot trickier when other data structures are being used (e.g. sets, maps, linked lists) because each "node" typically points to different parts of memory.
Should I treat references as undefined if the original object is destroyed? If yes, is there a way to identify that the reference is destroyed other than a custom mechanism that tracks the references?
Note: Tested this with GCC, LLVM, and MSVC
The note is misleading; treating references as syntactic sugar for pointers is fine as a mental model. In every way a pointer can dangle, a reference can dangle too. Accessing a dangling pointer or reference is undefined behaviour (UB).
int* p = new int{42};
int& i = *p;
delete p;
void f(int);
f(*p); // UB
f(i); // UB, with the exact same reason
This also extends to the standard containers and their rules about pointer/reference invalidation. The reason any surprising behaviour happens in your example is simply UB.
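The language offers no way to ask a plain reference whether its target is still alive. If that check is really needed, one option is shared ownership with std::weak_ptr, sketched below. This is a swapped-in technique, not what the question's code does: the Resource class here is a hypothetical variant, try_read is an invented name, and it assumes the Description objects are created with std::make_shared.

```cpp
#include <cassert>
#include <memory>

struct Description { int a = 765; };

// Hypothetical variant of Resource that observes its Description
// through std::weak_ptr instead of holding a raw reference.
class Resource {
public:
    explicit Resource(const std::shared_ptr<Description>& description)
        : mDescription(description) {
        assert(description != nullptr); // must observe a live object at creation
    }

    // Returns true and writes the value only while the Description is alive.
    bool try_read(int& out) const {
        if (auto locked = mDescription.lock()) {
            out = locked->a;
            return true;
        }
        return false;
    }

private:
    std::weak_ptr<Description> mDescription;
};
```

Once the last shared_ptr to the Description is reset, lock() returns an empty pointer and try_read reports failure instead of touching freed memory.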
The way I explain this to myself is:
A pointer is like a finger on your hand. It can point to memory blocks; think of them as the keys of a keyboard. So a pointer literally points at a key that holds something or does something.
A reference is a nickname for something. Your name may be, for example, Michael Johnson, but people may call you Mike, MJ, Mikeson, etc. Any time you hear your nickname, the person who called REFERRED to the same thing: you. If you do something to yourself, the reference will show the change too. If you point your finger at something else, it won't affect what you previously pointed at (unless you're doing something weird); it simply points at something new. That being said, as in the accepted answer above, if you do something weird with your fingers and your nicknames, you'll see weird things happening.
References are probably one of the most important C++ features for beginners to understand. Many schools today start with MATLAB, which is very slow when you want to do serious work. One of the reasons is the lack of control over references in MATLAB (yes, it has them: make a class and derive from handle) compared with C++.
Look at these two functions:
double fun1(std::valarray<double> &array)
{
return array.max();
}
double fun2(std::valarray<double> array)
{
return array.max();
}
These two simple functions are very different. When you have a std::valarray and call fun1, the function expects a nickname for that array and processes it directly without making a copy. fun2, on the other hand, takes the input array, creates a copy of it, and processes the copy.
Naturally, it is much more efficient to use references when writing functions that process large inputs in C++. That being said, you must be careful not to change your input in any way, because that will affect the original array in the code that created it: you are processing the same thing, just under a different name.
This makes references useful for a somewhat controversial coding technique: side effects.
In C++ you can't return multiple outputs from a function directly without wrapping them in some aggregate type (a custom struct, std::pair, or std::tuple). One workaround is a side effect, as in this example:
#include <stdio.h>
#include <valarray>
#include <iostream>
double fun3(std::valarray<double> &array, double &min)
{
min = array.min();
return array.max();
}
int main()
{
std::valarray<double> a={1, 2, 3, 4, 5};
double sideEffectMin;
double max = fun3(a,sideEffectMin);
std::cout << "max of array is " << max << " min of array is " <<
sideEffectMin<<std::endl;
return 0;
}
So fun3 expects a reference to a double. In other words, it wants the second argument to be a nickname for another double variable. The function then assigns to the reference, and this also alters the caller's variable. Both the name and the nickname change, because they are the same "thing".
In the main function, the variable sideEffectMin is declared without an initial value, but it gets a value when fun3 is called. Therefore, you get 2 outputs from fun3.
The example shows the side-effect trick, but it is also a warning: beware of altering your inputs, especially when they are references to something else, unless you know what you are doing.
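For comparison, here is a sketch of the same computation without a side effect. It is an alternative to fun3, not the answer's own code, and the name min_max is invented. The const reference still avoids the copy but makes accidental modification a compile error, and both results come back by value in a std::pair.

```cpp
#include <cassert>
#include <utility>
#include <valarray>

// Hypothetical alternative to fun3: the const reference avoids a copy and
// forbids modifying the caller's array; both results are returned by value.
// The pair holds {minimum, maximum}.
std::pair<double, double> min_max(const std::valarray<double>& array) {
    assert(array.size() > 0); // min()/max() are undefined for an empty valarray
    return {array.min(), array.max()};
}
```

The caller reads the results as .first and .second, or in C++17 unpacks them with a structured binding such as auto [lo, hi] = min_max(a);.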

How to make an uninitialized pointer not equal to 0/null?

I am a C++ learner. Others told me "an uninitialized pointer may point anywhere". How can I prove that with code? I wrote a little test program, but my uninitialized pointers always point to 0. In which case do they not point to 0? Thanks
#include <cstdio>
#include <iostream>
using namespace std;
int main() {
int* p;
printf("%d\n", p);
char* p1;
printf("%d\n", p1);
return 0;
}
Any uninitialized variable by definition has an indeterminate value until a value is supplied, and even reading it is undefined behaviour. Because this is the grey area of undefined behaviour, there's no way you can guarantee that an uninitialized pointer will be anything other than 0.
Anything you write to demonstrate this would be dictated by the compiler and system you are running on.
If you really want to, you can try writing a function that fills up a local array with garbage values, and create another function that defines an uninitialized pointer and prints it. Run the second function after the first in your main() and you might see it.
Edit: For your curiosity, I reproduced the behaviour with VS2015 on my system using this code:
#include <cstdint>
#include <iostream>
void f1()
{
// junk
char arr[24];
for (char& c : arr) c = 1;
}
void f2()
{
// uninitialized
int* ptr[4];
std::cout << (std::uintptr_t)ptr[1] << std::endl;
}
int main()
{
f1();
f2();
return 0;
}
Which prints 16843009 (0x01010101). But again, this is all undefined behaviour.
Well, I don't think it is worth proving this, because good coding style says: initialise all variables! One example: when you free a pointer, give it a value again, as in this example:
char *p=NULL; // yes, this is not needed, but do it! Later you may change your program and add code beneath this line...
p=(char *)malloc(512);
...
free(p);
p=NULL;
That is a safe and good style. If you call free(p) again by accident, it will not crash your program, because free(NULL) does nothing. If you don't set p to NULL after calling free(), you can use the pointer again by mistake and your program would try to access already freed memory; this will crash your program or (worse) produce strange results.
So don't waste time on your question about cases where pointers do not point to NULL. Just give your variables (pointers) a value! :-)
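In C++ the same discipline can be delegated to std::unique_ptr, which starts out null and becomes null again after reset(). The sketch below is a substitute for the malloc/free pattern above, not part of the original answer. The helper names make_buffer and release are invented, and std::make_unique requires C++14.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: allocates a buffer owned by a unique_ptr.
std::unique_ptr<char[]> make_buffer(std::size_t size) {
    return std::make_unique<char[]>(size); // value-initialized, so zero-filled
}

// Releases the buffer; afterwards the pointer is guaranteed to be null,
// and calling release again is harmless.
void release(std::unique_ptr<char[]>& buffer) {
    buffer.reset();
    assert(buffer == nullptr);
}
```

A default-constructed std::unique_ptr<char[]> compares equal to nullptr, so there is no window in which the pointer holds an indeterminate value.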
It depends on the compiler. Your code executed on an old MSVC2008 displays in release mode (plain random):
1955116784
1955116784
and in debug mode (after complaining about the use of an uninitialized pointer):
-858993460
-858993460
because that implementation sets uninitialized pointers to 0xcccccccc in debug mode to detect their usage.
The standard says that using an uninitialized pointer leads to undefined behaviour. That means that, as far as the standard is concerned, anything can happen. But a particular implementation is free to do whatever it wants:
yours happens to set the pointers to 0 (but you should not rely on it unless it is stated in the implementation's documentation)
MSVC sets the pointer to 0xcccccccc in debug mode but AFAIK does not document it (*), so we still cannot rely on it
(*) at least I could not find any reference...

Optimizer bug or programming error?

First of all: I know that most optimization bugs are due to programming errors or relying on facts which may change depending on optimization settings (floating point values, multithreading issues, ...).
However, I experienced a very hard-to-find bug and am unsure whether there is any way to prevent this kind of error without turning optimization off. Am I missing something? Could this really be an optimizer bug? Here's a simplified example:
struct Data {
int a;
int b;
double c;
};
struct Test {
void optimizeMe();
Data m_data;
};
void Test::optimizeMe() {
Data * pData; // Note that this pointer is not initialized!
bool first = true;
for (int i = 0; i < 3; ++i) {
if (first) {
first = false;
pData = &m_data;
pData->a = i * 10;
pData->b = i * pData->a;
pData->c = pData->b / 2;
} else {
pData->a = ++i;
} // end if
} // end for
};
int main(int argc, char *argv[]) {
Test test;
test.optimizeMe();
return 0;
}
The real program of course has a lot more to do than this. But it all boils down to the fact that instead of accessing m_data directly, a (previously uninitialized) pointer is being used. As soon as I add enough statements to the if (first)-part, the optimizer seems to change the code to something along these lines:
if (first) {
first = false;
// pData-assignment has been removed!
m_data.a = i * 10;
m_data.b = i * m_data.a;
m_data.c = m_data.b / 2;
} else {
pData->a = ++i; // This will crash - pData is not set yet.
} // end if
As you can see, it replaces the unnecessary pointer dereference with a direct write to the member struct. However, it does not do this in the else-branch. It also removes the pData-assignment. Since the pointer is now still uninitialized, the program will crash in the else-branch.
Of course there are various things which could be improved here, so you might blame it on the programmer:
Forget about the pointer and do what the optimizer does - use m_data directly.
Initialize pData to nullptr - that way the optimizer knows that the else-branch will fail if the pointer is never assigned. At least it seems to solve the problem in my test-environment.
Move the pointer assignment in front of the loop, effectively initializing pData with &m_data (it could then also be a reference instead of a pointer, for good measure). This makes sense because pData is needed in all cases, so there is no reason to do this inside the loop.
The code is obviously smelly, to say the least, and I'm not trying to "blame" the optimizer for doing this. But I'm asking: What am I doing wrong? The program might be ugly, but it's valid code...
I should add that I'm using VS2012 with C++/CLI and v110_xp-Toolset. Optimization is set to /O2. Please also note that if you really want to reproduce the problem (that's not really the point of this question though) you need to play around with the complexity of the program. This is a very simplified example and the optimizer sometimes doesn't remove the pointer assignment. Hiding &m_data behind a function seems to "help".
EDIT:
Q: How do I know that the compiler is optimizing it to something like the example provided?
A: I'm not very good at reading assembler, I have looked at it however and have made 3 observations which make me believe that it's behaving this way:
As soon as optimization kicks in (adding more assignments usually does the trick) the pointer assignment has no associated assembler statement. It also hasn't been moved up to the declaration, so it's really left uninitialized it seems (at least to me).
In cases where the program crashes, the debugger skips the assignment statement. In cases where the program runs without problems, the debugger stops there.
If I watch the content of pData and the content of m_data while debugging, it clearly shows that all assignments in the if-branch have an effect on m_data and m_data receives the correct values. The pointer itself is still pointing to the same uninitialized value it had from the beginning. Therefore I have to assume that it is in fact not using the pointer to make the assignments at all.
Q: Does it have to do anything with i (Loop unrolling)?
A: No, the actual program actually uses do { ... } while() to loop over a SQL SELECT-resultset so the iteration count is completely runtime-specific and cannot be predetermined by the compiler.
It sure looks like a bug to me. It's fine for the optimizer to eliminate the unnecessary redirection, but it should not eliminate the assignment to pData.
Of course, you can work around the problem by assigning to pData before the loop (at least in this simple example). I gather that the problem in your actual code isn't as easily resolved.
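A minimal sketch of that workaround for the simplified example, assuming the real code can be restructured the same way: the pointer is replaced by a reference bound before the loop, so there is no point at which it could be unset. The = {} initializer on m_data is an addition for this sketch, not part of the question's code.

```cpp
struct Data {
    int a;
    int b;
    double c;
};

struct Test {
    void optimizeMe();
    Data m_data{}; // zero-initialized here for the sketch
};

void Test::optimizeMe() {
    Data& data = m_data; // bound once, before the loop
    bool first = true;
    for (int i = 0; i < 3; ++i) {
        if (first) {
            first = false;
            data.a = i * 10;
            data.b = i * data.a;
            data.c = data.b / 2;
        } else {
            data.a = ++i; // also advances the loop counter, as in the original
        }
    }
}
```

Whether this avoids the miscompilation in the asker's VS2012 setup is not something the sketch can show; it only removes the uninitialized pointer from the picture.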
I also vote for an optimizer bug if it is really reproducible in this example. To overrule the optimizer you could try to declare pData as volatile.

Why is the return value of queue::front() valid after queue::pop()?

I am new to C++ and ran into the following supposed bug, but somehow my program just works.
Here is the code
#include<iostream>
#include<queue>
#include <string>
int main()
{
std::string s ("cat");
std::queue<std::string> _queue;
_queue.push(s);
std::string & s1 = _queue.front();
_queue.pop();
// at this point s1 should be invalid, as pop destroyed the queue's copy of s
std::cout << s1 << std::endl;
return 0;
}
It just works, even though s1 is a reference to an invalid object. Is there a way I can assert that s1 truly refers to an invalid object?
Trying to access a destroyed object the way you do it in your code results in undefined behavior. And no, there's no language-provided way to perform a run-time check for this situation. It is entirely your responsibility to make sure things like that do not happen in your code.
The fact that "it just works" in your experiment is just an accident (with certain degree of typical computer determinism, as usual). Something completely unrelated might change in your program, and this code will no longer "work".
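Since the dangling reference cannot be detected afterwards, the usual way out is to avoid creating it: take the element by value before popping. A minimal sketch, with take_front as an invented helper name:

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>

// Hypothetical helper: moves the front element out, then pops it.
// The returned string is an independent object, so it stays valid.
std::string take_front(std::queue<std::string>& queue) {
    assert(!queue.empty()); // front()/pop() on an empty queue is undefined
    std::string value = std::move(queue.front());
    queue.pop();
    return value;
}
```

In the question's program this would replace the reference s1 with std::string s1 = take_front(_queue);.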

Unexpected output

#include <iostream>
int main()
{
const int i=10;
int *p =(int *) &i;
*p = 5;
std::cout<<&i<<" "<<p<<"\n";
std::cout<<i<<" "<<*p;
return 0;
}
Output:
0x22ff44 0x22ff44
10 5
Please Explain.
Well, your code obviously contains undefined behaviour, so anything can happen.
In this case, I believe what happens is this:
In C++, a const int initialized with a constant expression is treated as a compile-time constant. In your example, the compiler basically replaces each use of "i" with the number 10.
You've attempted to modify a const object, so the behavior is undefined. The compiler has the right to suppose that the const object's value doesn't change, which probably explains the symptoms you see. The compiler also has the right to put the const object in read only memory. It generally won't do so for a variable with auto lifetime, but a lot will if the const has static lifetime; in that case, the program will crash (on most systems).
I'll take a shot at it: since there's no logical reason for that output, the compiler must have optimised that wretched cout<<i<<" " to a simple cout<<"10 ". But it's just a hunch.
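For contrast, casting away const is only undefined when the object itself was declared const. The sketch below (function name invented for illustration) writes through a const_cast pointer to an object that is not const, which is well defined, so the variable and the pointer agree afterwards.

```cpp
// Well-defined: the object i is not const; only the pointer view is.
int modify_non_const_object() {
    int i = 10;
    const int* view = &i;            // read-only view of a mutable object
    int* p = const_cast<int*>(view); // legal: the underlying object is non-const
    *p = 5;
    return i;                        // i and *p are the same object, now 5
}
```

Declaring i as const int, as in the question, is exactly what turns the same write into undefined behaviour.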