I am new to programming and am currently studying about address typecasting. I don't seem to understand why I am getting this : *** stack smashing detected ***: terminated Aborted (core dumped) when I run the following code??
#include<iostream>
using namespace std;
void updateValue(int *p){
*p = 610 % 255;
}
int main(){
char ch = 'A';
updateValue((int*)&ch);
cout << ch;
}
Here's what I understand about the code:
The address of ch is typecasted to int* and passed into the function updateValue(). Now, inside the updateValue() stack, an integer pointer p is created which points to ch. When p is dereferenced, it interprets ch as an int and reads 4(or 8) bytes of contiguous memory instead of 1. So, 'A'(65) along with some garbage value gets assigned to 610%255 i.e. 20.
But I don't understand, what and where things are going wrong?
when p is dereferenced, it interprets ...
When you indirect through the reinterpreted p and access an object of the wrong type, the behaviour of the program is undefined.
what and where things are going wrong?
Things started going wrong when you reinterpreted a pointer to one type as a pointer to an unrelated type.
Some rules of thumb:
Don't use reinterpret casts until you know what it does. They are difficult to get right, and are rarely useful.
Don't use reinterpret casts when it would result in undefined behaviour.
Don't use C-style casts at all.
If you think that you need to reinterpret cast, then take a few steps back, and consider why you think that you need it.
The problem is that you're typecasting a char* to an int* and then dereferencing p which leads to undefined behavior.
Undefined behavior means anything1 can happen including but not limited to the program giving your expected output. But never rely(or make conclusions based) on the output of a program that has undefined behavior. The program may just crash.
So the output that you're seeing(maybe seeing) is a result of undefined behavior. And as i said don't rely on the output of a program that has UB. The program may just crash which happens in your case.
For example, here the program crashes, but here it doesn't crash.
So the first step to make the program correct would be to remove UB. Then and only then you can start reasoning about the output of the program.
1For a more technically accurate definition of undefined behavior see this where it is mentioned that: there are no restrictions on the behavior of the program.
But I don't understand, what and where things are going wrong?
In this statement
*p = 610 % 255;
the memory that does not belong to the object ch that has the type char is overwritten. That is instead of one byte occupied by the object ch there are overwritten 4 bytes that correspond to the allocated memory for an object of the type int.
Related
This question already has answers here:
Where exactly does C++ standard say dereferencing an uninitialized pointer is undefined behavior?
(7 answers)
Closed 4 years ago.
#include <iostream>
using namespace std;
int main() {
int *a;
int *b;
int c=12;
a=&c;
*b=*b;
cout << *b << endl;
}
The above code works fine, but the following code returns a segmentation fault error
#include <iostream>
using namespace std;
int main() {
int *a;
int *b;
int c=12;
//a=&c;
*b=*b;
cout << *b << endl;
}
Why?
gcc (Ubuntu 8.2.0-7ubuntu1) 8.2.0
Here's a step-by-step breakdown of what your code is doing:
int *a;
int *b;
This declares two pointers to int named a and b. It does not initialize them. That means that their values are unspecified, and you should expect them to be complete garbage. You can think of them as "wild" pointers at this moment, which is to say that they don't point to valid objects, and dereferencing them will cause Undefined Behavior and introduce a plethora of weird bugs, if not a simple crash.
int c=12;
This creates a simple local variable c of type int, which is initialized, with a value of 12. If you hadn't initialized it, as in int c; then it would also be full of garbage.
a=&c;
This snippet sets the pointer a to point to c, which is to say that the address of c is assigned to a. Now a is no longer uninitialized, and points to a well-defined location. After this, you can safely dereference a and be assured that there is a valid int at the other end.
*b=*b;
Here, you are dereferencing b, which means that you are reaching into your programs memory to grab whatever is pointed to by b. But b is uninitialized; it is garbage. What is it pointing to? Who knows? To read from the address it points to is like Russian roulette, you might kill your program immediately if you get really unlucky and the Operating System or runtime environment notices you doing something that's obviously wrong. But you also might get away with it, only for weird and unpredictable bugs to emerge later. This weirdness and unpredictability is why a good C++ programmer avoids Undefined Behavior at all costs, and ensures that variables are initialized before they are used, and makes sure that pointers are pointing to valid objects before dereferencing them.
Why does there appear to be a difference depending on a=&c;?
As to why your program apparently crashes or doesn't crash depending on how you initialize the other pointer, the answer is that it doesn't matter. In both cases, you're causing Undefined Behavior; you are breaking the language's rules and you should not expect the language to behave correctly for you thereafter, and all bets are off.
"*b=*b" - Is broken code. b is uninitialised, so you are dereferencing an uninitialised pointer. That's Undefined Behaviour. The program is broken and has no meaning according to the standard and the compiler is allowed to generate whatever code it feels like (no diagnostic required).
To be clear, I am aware of pointer to pointer concept in C and of dereferencing double, triple pointers. The only doubt I have is in the following program which I wrote:
#include<stdio.h>
int main(){
int a;
int* p;
a=5;
p=&a;
int **q;
printf("*p=%d\n",*p);
printf("*q=%d\n",*q);
}
Now I know, the program is quite stupid and makes no sense, but that's not the problem. The question is WHY?
Why is the Output like this:
*p=5
*q=1
Why is *q=1 on every run?
Also it is to keep in mind, if I now declare a ***r;
And add the following line:
printf("*r=%d\n",*r);
Now the output is :
*p=5
*q=-40821602 //garbage
*r=1
Now, *r=1. WHY?
Same goes for ****s. In that case, *q,*r is a garbage and *s=1. Why?
Evaluation of an uninitialized pointer is undefined behavior, and that is what you did. cppreference states the following on undefined behavior:
Undefined Behavior:
undefined behavior - there are no restrictions on the behavior of the program. Examples of undefined behavior are memory accesses outside of array bounds, signed integer overflow, null pointer dereference, modification of the same scalar more than once in an expression without sequence points, access to an object through a pointer of a different type, etc. Compilers are not required to diagnose undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.
Hence, you cannot expect anything meaningful coming out of your program. It may print 1, but it could also do anything else. In my case, for example, it simply crashed.
So the question "why?" is simply not a valid question.
From C++ Primer 18.1.1:
If the [thrown] expression has an array or function type, the expression is
converted to its corresponding pointer type.
How is it that this program can produce correct output of 9876543210 (g++ 5.2.0)?
#include <iostream>
using namespace std;
int main(){
try{
int a[10] = {9,8,7,6,5,4,3,2,1,0};
throw a;
}
catch(int* b) { for(int i = 0; i < 10; ++i) cout << *(b+i); }
}
From the quote, throw a would create an exception object of type int* which is a pointer to the first element of the array. But surely the array elements of a would be destroyed when we exit the try block and enter the catch clause since we change block scope? Am I getting a false positive or are the array elements "left alone" (not deleted) during the duration of the catch clause?
Dereferencing a dangling pointer is Undefined Behaviour.
When a program encounters Undefined Behaviour, it is free to do anything. Which includes crashing and making daemons fly out of your nose, but it also includes doing whatever you expected.
In this case, there are no intervening operations that would overwrite that part of memory and access past the stack pointer (below it, because stack grows down on most platforms) is normally not checked.
So yes, this is a false positive. The memory is not guaranteed to contain the values any more and is not guaranteed to be accessible at all, but it still happens to contain them and be accessible.
Also note, that gcc optimizations are rather infamous for relying on the program not invoking undefined behaviour. Quite often if you have an undefined behaviour, the unoptimized version appears to work, but once you turn on optimizations, it starts doing something completely unexpected.
But surely the array elements of a would be destroyed when we exit the try block and enter the catch clause since we change block scope?
Correct.
Am I getting a false positive or are the array elements "left alone" (not deleted) during the duration of the catch clause?
They're not "left alone". The array is destroyed as you correctly assumed. The program is accessing invalid memory. The behaviour is undefined.
How is it that this program can produce correct output of 9876543210 (g++ 5.2.0)?
There is no correct output when the behaviour is undefined. The program can produce what it did because it can produce any output when the behaviour is undefined.
Reasoning about UB is usually pointless, but in this case, the contents of the invalid memory could be identical if the part of the memory had not yet been overwritten by the program.
You will find later in this chapter a Warning:
Throwing a pointer requires that the object to which the pointer
points exist wherever the corresponding handler resides.
so your example would be valid if a array were static or global, otherwise its UB. Or (as Jan Hudec writes in comment) in the enclosing block of the try/catch statement.
Originally being the topic of this question, it emerged that the OP just overlooked the dereference. Meanwhile, this answer got me and some others thinking - why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), and tried the following:
#include <iostream>
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
std::cout << *pc << "\n";
c1 = 'B';
std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?
Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as memory region that you reinterpret as another type.
As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached reference c to the memory occupied by pointer variable pc. When you did c = 'B', you forcefully wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such case is a matter of pure luck. The program might crash, since the pointer is generally non-defererencable. Or you might get lucky and make your pointer to point to some unpredictable yet valid location. In that case you program will output something. No one knows what it will output and there's no meaning in it whatsoever.
One can rewrite your second program into an equivalent program without references
int main(){
char* pc = new char('A');
char* c = (char *) &pc;
std::cout << *pc << "\n";
*c = 'B';
std::cout << *pc << "\n";
}
From the practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not make the pointer to point too far away from its original location. So, the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, thus throwing it wildly to point to a totally different location, thus making your program more likely to crash.
It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
we take the address of the argument to the cast, yielding a value of type char**.
we reinterpret_cast this value to char*
we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.
I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have a int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;) what actually happens is that a is "decayed" to mean pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
PS - std::ref is also really just a pointer when you drill down into it.
Its allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
pc is a 4 byte pointer
(char)pc tries to interpret the pointer as a byte, in particular the last of the four bytes
(char&)pc is the same, but returns a reference to that byte
When you first print pc, nothing has happened and you see the letter you stored
c = 'B' modifies the last byte of the 4 byte pointer, so it now points to something else
When you print again, you are now pointing to a different location which explains your result.
Since the last byte of the pointer is modified the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a seg-fault. The actual value obtained is undefined, but is highly likely to be a zero, which explains the blank output when its interpreted as a char.
when you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...
What is the meaning of
*(int *)0 = 0;
It does compile successfully
It has no meaning. That's an error. It's parsed as this
(((int)0) = 0)
Thus, trying to assign to an rvalue. In this case, the right side is a cast of 0 to int (it's an int already, anyway). The result of a cast to something not a reference is always an rvalue. And you try to assign 0 to that. What Rvalues miss is an object identity. The following would work:
int a;
(int&)a = 0;
Of course, you could equally well write it as the following
int a = 0;
Update: Question was badly formatted. The actual code was this
*(int*)0 = 0
Well, now it is an lvalue. But a fundamental invariant is broken. The Standard says
An lvalue refers to an object or function
The lvalue you assign to is neither an object nor a function. The Standard even explicitly says that dereferencing a null-pointer ((int*)0 creates such a null pointer) is undefined behavior. A program usually will crash on an attempt to write to such a dereferenced "object". "Usually", because the act of dereferencing is already declared undefined by C++.
Also, note that the above is not the same as the below:
int n = 0;
*(int*)n = 0;
While the above writes to something where certainly no object is located, this one will write to something that results from reinterpreting n to a pointer. The mapping to the pointer value is implementation defined, but most compilers will just create a pointer referring to address zero here. Some systems may keep data on that location, so this one may have more chances to stay alive - depending on your system. This one is not undefined behavior necessarily, but depends on the compiler and runtime-environment it is invoked in.
If you understand the difference between the above dereference of a null pointer (only constant expressions valued 0 converted to pointers yield null pointers!) and the below dereference of a reinterpreted zero value integer, i think you have learned something important.
It will usually cause an access violation at runtime. The following is done: first 0 is cast to an int * and that yields a null pointer. Then a value 0 is written to that address (null address) - that causes undefined behaviour, usually an access violation.
Effectively it is this code:
int* address = reinterpret_cast<int*>( 0 );
*address = 0;
Its a compilation error. You cant modify a non-lvalue.
It puts a zero on address zero. On some systems you can do this. Most MMU-based systems will not allow this in run-time. I once saw an embedded OS writing to address 0 when performing time(NULL).
there is no valid lvalue in that operation so it shouldn't compile.
the left hand side of an assignment must be... err... assignable