What happens when I do int *p = p in C/C++? - c++

The code below compiles in MinGW. How does it get compiled? How is it possible to assign to a variable which has not yet been created?
int main()
{
    int *p = p;
    return 0;
}

How does it get compiled?
The point of declaration of a variable starts at the end of its declarator, but before its initialiser. This allows more legitimate self-referential declarations like
void * p = &p;
as well as undefined initialisations like yours.
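For contrast, here is a minimal sketch of a legitimate self-referential initialisation (the Node struct is a hypothetical example): taking the address of the variable is fine because its storage already exists, whereas reading its indeterminate value, as your code does, is not.
#include <cstdio>

struct Node {
    Node *next;
};

int main()
{
    Node n = { &n };   // legal: only the address of n is taken, its value is never read
    void *p = &p;      // same idea with a raw pointer
    std::printf("%d\n", n.next == &n);   // prints 1
    return 0;
}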
How is it possible to assign a variable which is not yet created?
There is no assignment here, just initialisation.
The variable has been created (in the sense of having storage allocated for it), but not initialised. You initialise it from whatever indeterminate value happened to be in that storage, with undefined behaviour.
Most compilers will give a warning or error about using uninitialised values, if you ask them to.

Let's take a look at what happens with the int*p=p; statement:
The compiler allocates space on the stack to hold the as-yet-uninitialized variable p
Then the compiler initializes p with its own indeterminate value
So, essentially there should be no problem with the code except that it initializes the variable with an indeterminate value.
In practice, it is not much different from the following code:
int *q;     // define a pointer and do not initialize it
int *p = q; // initialize another pointer from the uninitialized pointer's value

The likely result ("what it compiles to") will be the declaration of a pointer variable that is not initialized at all (which is subsequently optimized out since it is not used, so the net result would be "empty main").
The pointer is declared and initialized. So far, this is an ordinary and legal thing. However, it is initialized to itself, and its value is only in a valid, initialized state after the end of the statement (that is, at the location of the semicolon).
This, unsurprisingly, makes the statement undefined behavior.
By definition, invoking undefined behavior could in principle cause just about everything (although often quoted dramatic effects like formatting your harddrive or setting the computer on fire are exaggerated).
The compiler might actually generate an instruction that moves a register (or memory location) to itself, which would be a no-op instruction on most architectures, but could cause a hardware exception killing your process on some exotic architectures which have special validating registers for pointers (in case the "random" value is incidentally an invalid address).
The compiler will however not insert any "format harddisk" statements.
In practice, optimizing compilers will nowadays often assume "didn't happen" when they encounter undefined behavior, so it is most likely that the compiler will simply honor the declaration, and do nothing else.
This is, in every sense, perfectly allowable in the light of undefined behavior. Further, it is the easiest and least troublesome option for the compiler.

Related

Why does C++ recognize an uninitialized raw pointer as true?

Why does the following code produce a seg-fault
// somewhere in main
...
int *pointer;
if (pointer)
    cout << *pointer;
...
But the following slightly changed code doesn't
// somewhere in main
...
int *pointer = nullptr;
if (pointer)
    cout << *pointer;
...
The question is: what in C++ makes an uninitialized pointer evaluate to true and leads to a crash?
Why does C++ recognize an uninitialized raw pointer (or, say, a daemon) as true?
The behaviour may appear to be so, because the behaviour of the program is undefined.
Why the following code produces a SEGMENTATION FAULT!!!
Because the behaviour of the program is undefined, and that is one of the possible behaviours.
But the following slightly changed code doesn't
Because you don't read an indeterminate value in the changed program, and the behaviour of that program is well defined, and the defined behaviour is that the if-statement won't be entered.
In conclusion: Don't read uninitialised variables. Otherwise you'll end up with a broken, useless program.
Although a compiler isn't required to diagnose undefined behaviour for you, luckily high-quality compilers are able to detect such a simple mistake. Here is example output:
warning: 'pointer' is used uninitialized [-Wuninitialized]
if(pointer)
^~
Compilers generally cannot detect all complex violations. However, runtime sanitisers can detect even complex cases. Example output:
==1==WARNING: MemorySanitizer: use-of-uninitialized-value
Aside from reading uninitialised values, even if it was initialised, if (pointer) doesn't necessarily mean that you're allowed to indirect through the pointer. It only means that the pointer isn't null. Besides null, other pointer values can be unsafe to indirect through.
Because your uninitialized pointer gets implicitly converted to a boolean.
A null pointer converts to false and every other value to true.
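For completeness, a minimal sketch of the well-defined version of this pattern: give the pointer a known value before testing it, and the conversion to bool behaves predictably.
#include <iostream>

int main()
{
    int value = 42;

    int *pointer = nullptr;   // known state: definitely null
    if (pointer)              // well-defined: evaluates to false, branch skipped
        std::cout << *pointer;

    pointer = &value;         // now points at a valid object
    if (pointer)              // well-defined: evaluates to true
        std::cout << *pointer << '\n';   // prints 42
    return 0;
}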

What are the dangers of uninitialised variables?

In a program I am writing I currently have several uninitialised variables in my .h files, all of which are initialised at run-time. However, in Visual Studio it warns me every time I do this to "Always initialise a member variable" despite how seemingly pointless it feels to do so. I am well aware that attempting to use a variable when uninitialised will lead to undefined behaviour, but as far as I know, this can be avoided by not doing so. Am I overlooking something?
Thanks.
These variables could contain any value if you don't initialize them, and reading them in an uninitialized state is undefined behavior (except if they are zero-initialized).
And if you forgot to initialize one of them, and reading from it by accident results in the value you expect it should have on your current system configuration (due to undefined behavior), then your program might behave unpredictably or unexpectedly after a system update, on a different system, or when you make changes in your code.
And these kinds of errors are hard to debug. So even if you set them at runtime it is suggested to initialize them to known values so that you have a controlled environment with predictable behavior.
There are a few exceptions, e.g. if you set the variable right after you declared it and you can't set it directly, like if you set its value using a streaming operator.
You have not included the source so we have to guess about why it happens, and I can see possible reasons with different solutions (except just zero-initializing everything):
You don't initialize at the start of the constructor, but you combine member initialization with some other code that calls some functions for the not fully initialized object. That's a mess - and you never know when some functions will call another function using some non-initialized member. If you really need this, don't send in the entire object - but only the parts you need (might need more refactoring).
You have the initialization in an Init-function. Just use the C++11 feature of delegating constructors (one constructor calling another) instead; see the sketch after this list.
You don't initialize some members in the constructor, but even later. If you really don't want to initialize them there, consider having a pointer (or std::unique_ptr) holding that data and creating it when needed; or don't have it in the object at all.
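A minimal sketch of the delegating-constructor suggestion from the second point; Widget and its members are hypothetical placeholders.
#include <string>
#include <utility>

class Widget {
public:
    Widget(int size, std::string name)
        : size_(size), name_(std::move(name)) {}

    // Delegates to the two-argument constructor instead of a separate Init() function.
    Widget() : Widget(0, "unnamed") {}

private:
    int size_;
    std::string name_;
};

int main()
{
    Widget w;              // fully initialised via delegation
    Widget custom(3, "x"); // fully initialised directly
    return 0;
}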
It's a safety measure to not allow uninitialized variables, which is a good thing, but if you are sure of what you are doing and you make sure your variables are always initialized before use, you can turn this off: right-click on your project in Solution Explorer -> Properties -> C/C++ -> SDL checks and set it to NO. It comes as YES by default.
Note that these compile-time checks do more than just check for unitialized variables, so before you turn this off I advise reading https://learn.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=vs-2019
You can also disable a specific warning in your code using a warning pragma.
Personally I keep these on because IMO in the tradeoff safety/annoyance I prefer safety, but I reckon that someone else can have a different opinion.
There are two parts to this question: first, is reading uninitialized variables dangerous, and second, is defining variables uninitialized dangerous even if I make sure I never access them while uninitialized.
What are the dangers of accessing uninitialized variables?
With very few exceptions, accessing an uninitialized variable makes the whole program have Undefined Behavior. There is a common misconception (which unfortunately is taught) that uninitialized variables have "garbage values" and so reading an uninitialized variable will result in reading some value. This is completely false. Undefined Behavior means the program can have any behavior: it can crash, it can behave as the variable has some value, it can pretend the variable doesn't even exist or all sorts of weird behaviors.
For instance:
void foo();
void bar();

void test(bool cond)
{
    int a; // uninitialized
    if (cond)
    {
        a = 24;
    }

    if (a == 24)
    {
        foo();
    }
    else
    {
        bar();
    }
}
What is the result of calling the above function with true? What about with false?
test(true) will clearly call foo().
What about test(false)? If you answer: "Well it depends on what garbage value is in variable a, if it is 24 it will call foo, else it will call bar" Then you are completely wrong.
If you call test(false) the program accesses an uninitialized variable and has Undefined Behavior, it is an illegal path and so the compilers are free to assume cond is never false (because otherwise the program would be illegal). And surprise surprise both gcc and clang with optimizations enabled actually do this and generate this assembly for the function:
test(bool):
jmp foo()
So don't do this! Never access an uninitialized variable! It is undefined behavior and it's much, much worse than "the variable has some garbage value". Furthermore, while it could work as you expect on your system, on other systems or with other compiler flags it can behave in unexpected ways.
What are the dangers of defining uninitialized variables if I make sure I always initialize them later, before accessing them?
Well, the program is correct in that respect, but the source code is prone to errors. You have to mentally burden yourself with always checking whether you actually initialized the variable somewhere. And if you did forget to initialize a variable, finding the bug will be difficult because you have a lot of variables in your code that are defined uninitialized.
In contrast, if you always initialize your variables, you and the programmers after you have a much easier job and peace of mind.
It's just very good practice.
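As an illustration, a minimal sketch of that practice using default member initialisers; the Config class and its members are hypothetical, but the idea is that every object starts in a known state even if a constructor forgets a field.
#include <string>

class Config {
public:
    int retries = 0;          // known value instead of garbage
    bool verbose = false;
    std::string logPath{};    // empty string by default
};

int main()
{
    Config c;                 // all members have well-defined values
    return c.retries;         // returns 0, never an indeterminate value
}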

Does taking address of member variable through a null pointer yield undefined behavior?

The following code (or its equivalent, which uses explicit casts of the null literal to get rid of the temporary variable) is often used to calculate the offset of a specific member variable within a class or struct:
class Class {
public:
    int first;
    int second;
};

Class* ptr = 0;
size_t offset = reinterpret_cast<char*>(&ptr->second) -
                reinterpret_cast<char*>(ptr);
&ptr->second looks like it is equivalent to the following:
&(ptr->second)
which in turn is equivalent to
&((*ptr).second)
which dereferences an object instance pointer and yields undefined behavior for null pointers.
So is the original fine or does it yield UB?
Despite the fact that it does nothing, char* foo = 0; *foo; could be undefined behavior.
Dereferencing a null pointer could be undefined behavior. And yes, ptr->foo is equivalent to (*ptr).foo, and *ptr dereferences a null pointer.
There is currently an open issue in the working groups about if *(char*)0 is undefined behavior if you don't read or write to it. Parts of the standard imply it is, other parts imply it is not. The current notes there seem to lean towards making it defined.
Now, this is in theory. How about in practice?
Under most compilers, this works because no checks are done at dereferencing time: memory around where the null pointer points is guarded against access, and the above expression simply takes the address of something near null; it does not read or write the value there.
This is why cppreference's offsetof page lists pretty much that trick as a possible implementation. The fact that some (many? most? every one I've checked?) compilers implement offsetof in a similar or equivalent manner does not mean that the behavior is well defined under the C++ standard.
However, given the ambiguity, compilers are free to add checks at every instruction that dereferences a pointer, and execute arbitrary code (fail fast error reporting, for example) if null is indeed dereferenced. Such instrumentation might even be useful to find bugs where they occur, instead of where the symptom occurs. And on systems where there is writable memory near 0 such instrumentation could be key (pre-OSX MacOS had some writable memory that controlled system functions near 0).
Such compilers could still write offsetof that way, and introduce pragmas or the like to block the instrumentation in the generated code. Or they could switch to an intrinsic.
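In ordinary user code, the portable route is simply to use the standard offsetof macro from <cstddef> instead of hand-rolling the null-pointer trick; a minimal sketch, reusing the Class from the question (which is standard-layout, where offsetof is guaranteed to work):
#include <cstddef>
#include <iostream>

class Class {
public:
    int first;
    int second;
};

int main()
{
    std::size_t offset = offsetof(Class, second);
    std::cout << offset << '\n';   // typically 4 on common platforms
    return 0;
}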
Going a step further, C++ leaves lots of latitude on how non-standard-layout data is arranged. In theory, classes could be implemented as rather complex data structures and not the nearly standard-layout structures we have grown to expect, and the code would still be valid C++. Accessing member variables to non-standard-layout types and taking their address could be problematic: I do not know if there is any guarantee that the offset of a member variable in a non-standard layout type does not change between instances!
Finally, some compilers have aggressive optimization settings that find code that executes undefined behavior (at least under certain branches or conditions), and uses that to mark that branch as unreachable. If it is decided that null dereference is undefined behavior, this could be a problem. A classic example is gcc's aggressive signed integer overflow branch eliminator. If the standard dictates something is undefined behavior, the compiler is free to consider that branch unreachable. If the null dereference is not behind a branch in a function, the compiler is free to declare all code that calls that function to be unreachable, and recurse.
And it would be free to do this in not the current, but the next version of your compiler.
Writing code that is standards-valid is not just about writing code that compiles today cleanly. While the degree to which dereferencing and not using a null pointer is defined is currently ambiguous, relying on something that is only ambiguously defined is risky.

Assigning a reference by dereferencing a NULL pointer

int& fun()
{
    int *temp = NULL;
    return *temp;
}
In the above method, I am trying to dereference a NULL pointer. When I call this function, it does not throw an exception. I found that when the return type is by reference it does not throw, but if it is by value then it does. Even when the dereferenced NULL pointer is assigned to a reference (like the lines below), it still does not throw.
int* temp = NULL;
int& temp1 = *temp;
My question is: does the compiler not perform the dereference in the case of a reference?
Dereferencing a null pointer is Undefined Behavior.
Undefined Behavior means anything can happen, so it is not possible to define a behavior for this.
Admittedly, I am going to add this C++ standard quote for the nth time, but it seems it needs to be.
Regarding Undefined Behavior,
C++ Standard section 1.3.24 states:
Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
NOTE:
Also, just to bring it to your notice:
Using a returned reference or pointer to a local variable inside a function is also Undefined Behavior. You should be allocating the object on the free store (heap) using new and then returning a reference/pointer to it.
EDIT:
As @James McNellis appropriately points out in the comments,
If the returned pointer or reference is not used, the behavior is well defined.
When you dereference a null pointer, you don't necessarily get an exception; all that is guaranteed is that the behavior is undefined (which really means that there is no guarantee at all as to what the behavior is).
Once the *temp expression is evaluated, it is impossible to reason about the behavior of the program.
You are not allowed to dereference a null pointer, so the compiler can generate code assuming that you don't do that. If you do it anyway, the compiler might be nice and tell you, but it doesn't have to. It's your part of the contract that says you must not do it.
In this case, I bet the compiler will be nice and tell you the problem already at compile time, if you just set the warning level properly.
Don't * a null pointer, it's UB. (undefined behavior, you can never assume it'll do anything short of lighting your dog on fire and forcing you to take shrooms which will lead to FABULOUS anecdotes)
Some history and information of null pointers in the Algol/C family: http://en.wikipedia.org/wiki/Pointer_(computing)#Null_pointer
Examples and implications of undefined behavior: http://en.wikipedia.org/wiki/Undefined_behavior#Examples_in_C
I'm not sure I understand what you're trying to do. Dereferencing a NULL pointer is not defined.
In case you want to indicate that your method does not always return a value, you can declare it as:
bool fun(int &val);
or the STL way (similar to std::map::insert):
std::pair<int, bool> fun();
or the Boost way:
boost::optional<int> fun();
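A minimal sketch of the std::pair variant; fun and its haveValue parameter are hypothetical placeholders, and the bool signals whether the int is meaningful:
#include <iostream>
#include <utility>

std::pair<int, bool> fun(bool haveValue)
{
    if (haveValue)
        return { 42, true };   // a value is available
    return { 0, false };       // no value to return
}

int main()
{
    std::pair<int, bool> result = fun(true);
    if (result.second)
        std::cout << result.first << '\n';   // prints 42
    return 0;
}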

Confusing output?

What I fail to understand is why the code prints '3' in VS2010 (release build), whether I leave the declaration of 'r' in or comment it out.
#include <iostream>
using std::cout;

int main() {
    int arr1[2];
    int &r = arr1[0];
    int arr2[2];
    cout << (&arr1[1] - &arr2[0]);
}
So, three questions:
a. why does the code print 3?
b. why does it print 3 even if the declaration of 'r' is present? (Is it because that in C++ whether a reference occupies storage or not is implementation defined?)
c. Does this code have undefined behavior or implementation defined behavior?
Because in a Release build the r variable is removed. An unused variable of a built-in type is removed because a Release build is done with optimizations. Try to use it later, and the result will change. Some variables may be placed in a CPU register rather than on the stack, which also changes the distance between local variables.
On the other hand, an unused class instance is not removed, because creating a class instance may have side effects, since its constructor is invoked.
This is both undefined and implementation-defined behavior, because the compiler is free to place variables wherever it deems appropriate.
a. The order of the variables in memory is:
arr2[0]
arr2[1]
arr1[0]
arr1[1]
The code prints 3 because it is using pointer arithmetic. Subtracting &arr2[0] from &arr1[1] gives a difference of 3 ints.
b. Since r is never referenced, the C++ compiler is free to optimize it out.
c. I'm not positive, but I don't believe the C++ standard defines an explicit order for variables on the stack. Therefore the compiler is free to reorder these variables, even putting extra space between them as it sees fit. So, yes, it's implementation specific. A different compiler could just as easily have given -1 as the answer.
&arr1[1] - &arr2[0]
Pointer arithmetic is only well-defined within the same array. It does not matter what you think this code snippet does or should do; you are invoking undefined behavior. Your program could do anything.
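For contrast, a minimal sketch of the well-defined case, where both pointers refer to elements of the same array:
#include <cstddef>
#include <iostream>

int main()
{
    int arr[4] = { 0, 1, 2, 3 };
    std::ptrdiff_t d = &arr[3] - &arr[0];   // same array: the difference is the element distance
    std::cout << d << '\n';                 // prints 3
    return 0;
}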