What is the meaning of
*(int *)0 = 0;
It does compile successfully
It has no meaning. That's an error. It's parsed as this
(((int)0) = 0)
Thus, trying to assign to an rvalue. In this case, the right side is a cast of 0 to int (it's an int already, anyway). The result of a cast to something not a reference is always an rvalue. And you try to assign 0 to that. What Rvalues miss is an object identity. The following would work:
int a;
(int&)a = 0;
Of course, you could equally well write it as the following
int a = 0;
Update: Question was badly formatted. The actual code was this
*(int*)0 = 0
Well, now it is an lvalue. But a fundamental invariant is broken. The Standard says
An lvalue refers to an object or function
The lvalue you assign to is neither an object nor a function. The Standard even explicitly says that dereferencing a null-pointer ((int*)0 creates such a null pointer) is undefined behavior. A program usually will crash on an attempt to write to such a dereferenced "object". "Usually", because the act of dereferencing is already declared undefined by C++.
Also, note that the above is not the same as the below:
int n = 0;
*(int*)n = 0;
While the above writes to something where certainly no object is located, this one will write to something that results from reinterpreting n to a pointer. The mapping to the pointer value is implementation defined, but most compilers will just create a pointer referring to address zero here. Some systems may keep data on that location, so this one may have more chances to stay alive - depending on your system. This one is not undefined behavior necessarily, but depends on the compiler and runtime-environment it is invoked in.
If you understand the difference between the above dereference of a null pointer (only constant expressions valued 0 converted to pointers yield null pointers!) and the below dereference of a reinterpreted zero value integer, i think you have learned something important.
It will usually cause an access violation at runtime. The following is done: first 0 is cast to an int * and that yields a null pointer. Then a value 0 is written to that address (null address) - that causes undefined behaviour, usually an access violation.
Effectively it is this code:
int* address = reinterpret_cast<int*>( 0 );
*address = 0;
Its a compilation error. You cant modify a non-lvalue.
It puts a zero on address zero. On some systems you can do this. Most MMU-based systems will not allow this in run-time. I once saw an embedded OS writing to address 0 when performing time(NULL).
there is no valid lvalue in that operation so it shouldn't compile.
the left hand side of an assignment must be... err... assignable
Related
I never thought I will be going to ask this question but I have no idea why this happens.
const int a = 3;
int *ptr;
ptr = (int*)( &a );
printf( "A=%d\n", &a );
*ptr = 5;
printf( "A=%d\n", ptr );
printf( "A=%d\n", a );
printf( "A=%d\n", *ptr );
Output
A=6945404
A=6945404
A=3
A=5
How can this happen? How can one memory location hold two different values? I searched around and all I find is undefined behavior is undefined. Well that does not make any sense. There must be an explanation.
Edit
I get it, Marks answer makes alot of sense but still I wonder that const was added into the language so that user does not change the value unintentionally. I get that old compilers allows you to do that but I tried this on VS 2012 and I got the same behavior. Then again as haccks said, one memory location can't hold two values it looks like it does, then where is the second value stored?
The optimizer can determine that a is a constant value, and replace any reference to it with the literal 3. That explains what you see, although there's no guarantee that's what's actually happening. You'd need to study the generated assembly output for that.
Modifying a const variable through a non-const pointer results in undefined behavior. Most ikely the optimizer is substituting the original value in this line:
printf( "A=%d\n", a );
Look at the disassembly to verify this.
The C Standard, subclause 6.7.3, paragraph 6 [ISO/IEC 9899:2011], states:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
In fact your program invokes undefined behavior because of two reasons:
1.You are printing an address with wrong specifier %d. Correct specifier for that is %p.
2.You are modifying a variable with const specifier.
If the behavior is undefined then anything could happen. You may get either expected or unexpected result.
Standard says about it;
3.4.3 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
The problem is that the type of ptr is "pointer to int" not "pointer to const int".
You are then casting the address of 'a' (a const int) to be of type "pointer to int" and storing that address in ptr. The effect of this is that you are casting away the const-ness of a const variable.
This results in undefined behavior so your results may vary from compiler to compiler.
It is possible for the compiler to store 'a' in program ROM since it knows 'a' is a const value that can never be changed. When you lie to the compiler and cast away the const-ness of 'a' so that you can modify it through ptr, it may be invalid for ptr to actually modify the value of 'a' since that data may be stored in program ROM. Instead of giving you a crash, this compiler this time decided to point ptr to a different location with a different value this time. But anything could have happened since this behavior is undefined.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have just started using c++ and saw that their is a null value for pointers. I am curious as to what this is used for. It seems like it would be pointless to add a pointer to point to nothing.
Well, the null pointer value has the remarkable property that, despite it being a well-defined and unique constant value, the exact value depending on machine-architecture and ABI (on most modern ones all-bits-zero, not that it matters), it never points to (or just behind) an object.
This allows it to be used as a reliable error-indicator when a valid pointer is expected (functions might throw an exception or terminate execution instead), as well as a sentinel value, or to mark the absence of something optional.
On many implementations accessing memory through a nullpointer will reliably cause a hardware exception (some even trap on arithmetic), though on many others, especially those without paging and / or segmentation it will not.
Generally it's a placeholder. If you just declare a pointer, int *a;, there's no guarantee what is in the pointer when you want to access it. So if your code may or may not set the pointer later, there's no way to tell if the pointer is valid or just pointing to garbage memory. But if you declare it as NULL, such as int *a = NULL; you can then check later to see if the pointer was set, like if(a == NULL).
Most of the time during initialization we assign null value to a pointer so that we can check whether it is still null or a address has been assign to it or not.
It seems like it would be pointless to add a pointer to point to
nothing.
No, it is not. Suppose you have a function returning optional dynamically allocated value. When you want to return "nothing" you return null. The caller can check for null and distinguish between 2 different cases: when the return value is "nothing" and when the return value is some valid usable object.
null value in C and C++ is equal to 0. But nullptr in C++ is different from it, nullptr is always a pointer type in C++. We assign a null value to a pointer variable for various reason.
To check whether a memory has been allocated to the pointer or not
To neutralize a dangling pointer so that it should not create any side effect
To check whether a return address is a valid address or not etc.
Most of the time during initialization we assign null value to a pointer so that we can check whether it is still null or a address has been assign to it or not.
Basically, pointers are just integers. The null pointer is a pointer with a value of 0. It doesn't strictly point to nothing, it points to absolute address 0, which generally isn't accessible to your program; dereferencing it causes a fault.
It's generally used as a flag value, so that you can, for example, use it to end a loop.
Update:
There seem to be a lot of people confused by this answer, which is, strictly, completely correct. See C11(ISO/IEC 9899:201x) §6.3.2.3 Pointers Section 3:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
So, what's an address? It's a number n where 0 ≤ n ≤ max_address. And how do we represent such a number? Why, it's an integer, just like the standard says.
The C11 standard makes it clear that there's never anything to reference at address 0, because in some old pathologically non-portable code in BSD 4.2, you often saw code like this:
/* DON'T TRY THIS AT HOME */
int
main(){
char target[100] ;
char * tp = &target ;
char * src = "This won't do what you think." ;
void exit(int);
while((*tp++ = *src++))
;
exit(0);
}
This is still valid C:
$ gcc -o dumb dumb.c
dumb.c:6:12: warning: incompatible pointer types initializing 'char *' with an
expression of type 'char (*)[100]' [-Wincompatible-pointer-types]
char * tp = &target ;
^ ~~~~~~~
1 warning generated.
$
In 4.2BSD on a VAX, you could get away with that nonsense, because address 0 reliably contained the value 0, so the assignment evaluated to 0, which is of course FALSE.
Now, to demonstrate:
/* Very simple program dereferencing a NULL pointer. */
int
main() {
int * a_pointer ;
int a_value ;
void exit(int); /* To avoid any #includes */
a_pointer = ((void*)0);
a_value = *a_pointer ;
exit(0);
}
Here's the results:
$ gcc -o null null.c
$ ./null
Segmentation fault: 11
$
This question already has answers here:
Is incrementing a null pointer well-defined?
(9 answers)
Closed 7 years ago.
This is a related question to the discussion around Example of error caused by UB of incrementing a NULL pointer
Suppose I define this data structure:
union UPtrMem
{
void* p;
char ach[sizeof(void*)];
}
UPtrMem u;
u.p = nullptr;
u.p++; // UB according to standards
u.ach[0]++; // why is this OK then??
p and ach share the same memory, so is merely the act of modifying a memory location (that happens to contain a pointer) UB? I would think it only gets undefined once you try to dereference the pointer.
This is still UB because
it's undefined behavior to read from the member of the union that wasn't most recently written.
(from here). So you have UB, regardless of the value of p. To conclude:
why is this OK then??
It is not.
Your example doesn't contain any UB, because you don't get that far: it's invalid code that just won't compile.
To have the kind of UB you're thinking about, the title's “UB when manipulating nullptr”, you need to have the code executed.
That doesn't happen when it doesn't compile.
Just in case the question is changed after I answer, which isn't uncommon with these kinds of apparently designed-to-trap-the-responder questions, this is the code presented as I'm writing this:
union UPtrMem
{
void* p;
char ach[sizeof(void*)];
}
UPtrMem u;
u.p = nullptr;
u.p++; // UB according to standards
u.ach[0]++; // why is this OK then??
Incrementing a void* is just invalid, not a supported operation, and won't compile.
The reason why the standard makes incrementing a null pointer undefined is because it is not always the case that a null pointer contains an arithmetically meaningful value like 0. It could contain a specific bit pattern that indicates non-addressable memory to the CPU.
Your example has other problems too.
When you increment an allocated pointer it adds the size of the thing it points to to its value.
So on a 32bit computer and int* will likely advance 4 places (sizeof(int)) when you add 1 to it.
The problem with void* is the compiler has no size information and so can not know how far to increment its value.
In your example you then do this:
u.ach[0]++;
That doesn't increment a pointer at all, it increments whatever char value is contained in the first element of the char array. This, of course, is undefined so, even though it works, you can not rely on it having any specific value.
Seems to me u.p++; isn't even valid because void has no size so - nothing to increment. But u.ach[0]++; is valid because your incrementing a char.
edit yes it takes up space in the structure... but what it points to has no size... what would it increment by?
I want to parse UTF-8 in C++. When parsing a new character, I don't know in advance if it is an ASCII byte or the leader of a multibyte character, and also I don't know if my input string is sufficiently long to contain the remaining characters.
For simplicity, I'd like to name the four next bytes a, b, c and d, and because I am in C++, I want to do it using references.
Is it valid to define those references at the beginning of a function as long as I don't access them before I know that access is safe? Example:
void parse_utf8_character(const string s) {
for (size_t i = 0; i < s.size();) {
const char &a = s[i];
const char &b = s[i + 1];
const char &c = s[i + 2];
const char &d = s[i + 3];
if (is_ascii(a)) {
i += 1;
do_something_only_with(a);
} else if (is_twobyte_leader(a)) {
i += 2;
if (is_safe_to_access_b()) {
do_something_only_with(a, b);
}
}
...
}
}
The above example shows what I want to do semantically. It doesn't illustrate why I want to do this, but obviously real code will be more involved, so defining b,c,d only when I know that access is safe and I need them would be too verbose.
There are three takes on this:
Formally
well, who knows. I could find out for you by using quite some time on it, but then, so could you. Or any reader. And it's not like that's very practically useful.
EDIT: OK, looking it up, since you don't seem happy about me mentioning the formal without looking it up for you. Formally you're out of luck:
N3280 (C++11) §5.7/5 “If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.”
Two situations where this can produce undesired behavior: (1) computing an address beyond the end of a segment, and (2) computing an address beyond an array that the compiler knows the size of, with debug checks enabled.
Technically
you're probably OK as long as you avoid any lvalue-to-rvalue conversion, because if the references are implemented as pointers, then it's as safe as pointers, and if the compiler chooses to implement them as aliases, well, that's also ok.
Economically
relying needlessly on a subtlety wastes your time, and then also the time of others dealing with the code. So, not a good idea. Instead, declare the names when it's guaranteed that what they refer to, exists.
Before going into the legality of references to unaccessible memory, you have another problem in your code. Your call to s[i+x] might call string::operator[] with a parameter bigger then s.size(). The C++11 standard says about string::operator[] ([string.access], §21.4.5):
Requires: pos <= size().
Returns: *(begin()+pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
This means that calling s[x] for x > s.size() is undefined behaviour, so the implementation could very well terminate your program, e.g. by means of an assertion, for that.
Since string is now guaranteed to be continous, you could go around that problem using &s[i]+x to get an address. In praxis this will probably work.
However, strictly speaking doing this is still illegal unfortunately. The reason for this is that the standard allows pointer arithmetic only as long as the pointer stays inside the same array, or one past the end of the array. The relevant part of the (C++11) standard is in [expr.add], §5.7.5:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Therefore generating references or pointers to invalid memory locations might work on most implementations, but it is technically undefined behaviour, even if you never dereference the pointer/use the reference. Relying on UB is almost never a good idea , because even if it works for all targeted systems, there are no guarantees about it continuing to work in the future.
In principle, the idea of taking a reference for a possibly illegal memory address is itself perfectly legal. The reference is only a pointer under the hood, and pointer arithmetic is legal until dereferencing occurs.
EDIT: This claim is a practical one, not one covered by the published standard. There are many corners of the published standard which are formally undefined behaviour, but don't produce any kind of unexpected behaviour in practice.
Take for example to possibility of computing a pointer to the second item after the end of an array (as #DanielTrebbien suggests). The standard says overflow may result in undefined behaviour. In practice, the overflow would only occur if the upper end of the array is just short of the space addressable by a pointer. Not a likely scenario. Even when if it does happen, nothing bad would happen on most architectures. What is violated are certain guarantees about pointer differences, which don't apply here.
#JoSo If you were working with a character array, you can avoid some of the uncertainty about reference semantics by replacing the const-references with const-pointers in your code. That way you can be certain no compiler will alias the values.
Originally being the topic of this question, it emerged that the OP just overlooked the dereference. Meanwhile, this answer got me and some others thinking - why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), and tried the following:
#include <iostream>
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
std::cout << *pc << "\n";
c1 = 'B';
std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?
Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as memory region that you reinterpret as another type.
As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached reference c to the memory occupied by pointer variable pc. When you did c = 'B', you forcefully wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such case is a matter of pure luck. The program might crash, since the pointer is generally non-defererencable. Or you might get lucky and make your pointer to point to some unpredictable yet valid location. In that case you program will output something. No one knows what it will output and there's no meaning in it whatsoever.
One can rewrite your second program into an equivalent program without references
int main(){
char* pc = new char('A');
char* c = (char *) &pc;
std::cout << *pc << "\n";
*c = 'B';
std::cout << *pc << "\n";
}
From the practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not make the pointer to point too far away from its original location. So, the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, thus throwing it wildly to point to a totally different location, thus making your program more likely to crash.
It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
we take the address of the argument to the cast, yielding a value of type char**.
we reinterpret_cast this value to char*
we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.
I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have a int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;) what actually happens is that a is "decayed" to mean pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
PS - std::ref is also really just a pointer when you drill down into it.
Its allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
pc is a 4 byte pointer
(char)pc tries to interpret the pointer as a byte, in particular the last of the four bytes
(char&)pc is the same, but returns a reference to that byte
When you first print pc, nothing has happened and you see the letter you stored
c = 'B' modifies the last byte of the 4 byte pointer, so it now points to something else
When you print again, you are now pointing to a different location which explains your result.
Since the last byte of the pointer is modified the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a seg-fault. The actual value obtained is undefined, but is highly likely to be a zero, which explains the blank output when its interpreted as a char.
when you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...