Address held by pointer changes after pointer is deleted - c++

In the following code, why is the address held by pointer x changing after the delete? As I understand it, the delete call should free up the allocated memory on the heap, but it shouldn't change the pointer's address.
#include <iostream>
#include <cstdlib>
using namespace std;

int main()
{
    int* x = new int;
    *x = 2;
    cout << x << endl << *x << endl;
    delete x;
    cout << x << endl;
    system("Pause");
    return 0;
}
OUTPUT:
01103ED8
2
00008123
Observations: I'm using Visual Studio 2013 and Windows 8. Reportedly this doesn't behave the same under other compilers. Also, I understand this is bad practice and that I should just reassign the pointer to NULL after its deletion; I'm simply trying to understand what is driving this weird behaviour.

As I understand it, the delete call should free up the allocated memory on the heap, but it shouldn't change the pointer's address.
Well, why not? It's perfectly legal output -- reading a pointer after having deleted it leads to undefined behavior. And that includes the pointer's value changing. (In fact, that doesn't even need UB; a deleted pointer can really point anywhere.)
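For completeness, the practice the question alludes to looks roughly like this (a minimal sketch; assigning nullptr does not change what delete did, it only gives the variable a value that is well-defined to read afterwards):

#include <iostream>

int main()
{
    int* x = new int(2);
    std::cout << x << std::endl;  // address of the allocation
    delete x;
    x = nullptr;                  // x now holds a well-defined null value
    std::cout << x << std::endl;  // reading x is fine; typically prints 0 or an all-zero address
    return 0;
}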

Having read relevant bits of both C++98 and C++11 [N3485], and all the stuff H2CO3 pointed to:
Neither edition of the standard adequately describes what an "invalid pointer" is, under what circumstances they are created, or what their semantics are. Therefore, it is unclear to me whether or not the OP's code was intended to provoke undefined behavior, but de facto it does (since anything that the standard does not clearly define is, tautologically, undefined). The text is improved in C++11 but is still inadequate.
As a matter of language design, the following program certainly does exhibit unspecified behavior as marked, which is fine. It may, but should not, also exhibit undefined behavior as marked; in other words, to the extent that this program exhibits undefined behavior, that is IMNSHO a defect in the standard. Concretely, copying the value of an "invalid" pointer, and performing equality comparisons on such pointers, should not be UB. I specifically reject the argument to the contrary from hypothetical hardware that traps on merely loading a pointer to unmapped memory into a register. (Note: I cannot find text in C++11 corresponding to C11 6.5.2.3 footnote 95, regarding the legitimacy of writing one union member and reading another; this program assumes that the result of this operation is unspecified but not undefined (except insofar as it might involve a trap representation), as it is in C.)
#include <string.h>
#include <stdio.h>

union ptr {
    int *val;
    unsigned char repr[sizeof(int *)];
};

int main(void)
{
    ptr a, b, c, d, e;
    a.val = new int(0);
    b.val = a.val;
    memcpy(c.repr, a.repr, sizeof(int *));
    delete a.val;
    d.val = a.val; // copy may, but should not, provoke UB
    memcpy(e.repr, a.repr, sizeof(int *));

    // accesses to b.val and d.val may, but should not, provoke UB
    // result of comparison is unspecified (may, but should not, be undefined)
    printf("b %c= d\n", b.val == d.val ? '=' : '!');

    // result of comparison is unspecified
    printf("c %c= e\n", memcmp(c.repr, e.repr, sizeof(int *)) ? '!' : '=');
}
This is all of the relevant text from C++98:
[3.7.3.2p4] If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined. [footnote: On some implementations, it causes a system-generated runtime fault.]
The problem is that there is no definition of "using an invalid pointer value", so we get to argue about what qualifies. There is a clue to the committee's intent in the discussion of iterators (a category which is defined to include bare pointers):
[24.1p5] ... Iterators can also have singular values that are not associated with any container. [Example: After the declaration of an uninitialized pointer x (as with int* x; [sic]), x must always be assumed to have a singular value of a pointer.] Results of most expressions are undefined for singular values; the only exception is an assignment of a non-singular value to an iterator that holds a singular value. In this case the singular value is overwritten the same way as any other value. Dereferenceable and past-the-end values are always non-singular.
It seems at least plausible to assume that an "invalid pointer" is also meant to be an example of a "singular iterator", but there is no text to back this up; going in the opposite direction, there is no text confirming the (equally plausible) assumption that an uninitialized pointer value is meant to be an "invalid pointer" as well as a "singular iterator". So the hair-splitters among us might not accept "results of most expressions are undefined" as clarifying what qualifies as use of an invalid pointer.
C++11 has changed the text corresponding to 3.7.3.2p4 somewhat:
[3.7.4.2p4] ... Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior. [footnote: Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.]
(The text elided by the ellipsis is unchanged.) We now have somewhat more clarity as to what is meant by "use of an invalid pointer value", and we can now say that the OP's code's semantics are definitely implementation-defined (but might be implementation-defined to be undefined). There is also a new paragraph in the discussion of iterators:
[24.2.1p10] An invalid iterator is an iterator that may be singular.
which confirms that "invalid pointer" and "singular iterator" are effectively the same thing. The remaining confusion in C++11 is largely about the exact circumstances that produce invalid/singular pointers/iterators; there should be a detailed chart of pointer/iterator lifecycle transitions (like there is for *values). And, as with C++98, the standard is defective to the extent it does not guarantee that copying-from and equality comparison upon such values are valid (not undefined).

Related

optimization based on alignment requirements

According to [basic.align]
Object types have alignment requirements ([basic.fundamental], [basic.compound]) which place restrictions on the addresses at which an object of that type may be allocated.
I would expect that a C++ compiler is allowed to assume that any pointer to T points to a correctly aligned T.
e.g. (if alignof(int) == 4)
int* p = /* ... */;
auto val = p[0];
if ((reinterpret_cast<std::uintptr_t>(p) & 3) != 0) {
    // assumed to be unreachable
}
However, testing GCC and clang on Compiler Explorer, I don't see either of them making this assumption. Even if I dereference the pointer (which surely would cause UB if it were not correctly aligned), this pointer-value assumption is not made.
I recognise that the mapping of pointers to integer values is implementation-defined, and that perhaps the compiler has a mapping other than what this code assumes (though I don't think so).
I also recognise that there may be a lot of code in production that would break under this assumption, which would explain why it's not currently done by these compilers.
My question is: According to the standard, would it be a valid assumption a compiler could make?
If not, please cite the standard!
EDIT: clarified that the pointer is dereferenced.
EDIT2: say alignof() (instead of sizeof())
Yes, compilers may assume the pointer is properly aligned.
Let us assume for the time being there's a function is_aligned which tests whether the address pointed to by a pointer has proper alignment for its type.
int* p = /* ... */;
auto val = p[0];
if (!is_aligned(p)) {
    // assumed to be unreachable
}
The branch can then be assumed to be unreachable, because is_aligned(p) will always be true. We deduce this from [basic.life]:
The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained [...]
Thus no object that has improper alignment exists. It then follows that the pointer is either a null pointer, invalid or pointing to an object of another type.
Accessing through a null pointer is illegal, access through invalid pointers is also illegal, and so is access to an object of another type†, which makes access through misaligned pointers UB.
† with exception of unsigned char, char and std::byte
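For illustration, here is one possible sketch of the is_aligned helper assumed above (the helper is this answer's own assumption, not a standard facility); it simply checks that the address is a multiple of the type's alignment:

#include <cstdint>

template <typename T>
bool is_aligned(const T* p)
{
    // The mapping of pointers to integers is implementation-defined, but on
    // common implementations this checks divisibility by alignof(T).
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}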

Pointers in c++ after delete

After reading many posts about this, I want to clarify the next point:
A* a = new A();
A* b = a;
delete a;
A* c = a; //illegal - I know it (in c++ 11)
A* d = b; //I suppose it's legal, is it true?
So the question is about using the value of copy of deleted pointer.
I've read, that in c++ 11 reading the value of a leads to undefined behaviour - but what about reading the value of b?
Trying to read the value of the pointer (note: this is different to dereferencing it) causes implementation-defined behaviour since C++14, which may include generating a runtime fault. (In C++11 it was undefined behaviour.)
What happens to the pointer itself after delete?
Both:
A* c = a;
A* d = b;
are undefined in C++11 and implementation-defined in C++14. This is because a and b are both "invalid pointer values" (as they point to deallocated storage space), and "using an invalid pointer value" is either undefined or implementation-defined, depending on the C++ version. ("Using" includes "copying the value of".)
The relevant section ([basic.stc.dynamic.deallocation]/4) in C++11 reads (emphasis added):
If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined.
with a non-normative note stating:
On some implementations, it causes a system-generated runtime fault.
In C++14 the same section reads:
If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
with a non-normative note stating:
Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault
These 2 lines do not have any difference (meaning legality for C++):
A* c = a; //illegal - I know it (in c++ 11)
A* d = b; //I suppose it's legal, is it true?
Your mistake (and it is a pretty common one) is to think that calling delete on a makes it any different from b. Remember that when you call delete on a pointer, you pass the argument by value, so the memory that a points to is no longer usable after the delete, but the call does not make a any different from b in your example.
You should not use the pointer after delete. My example below, which accesses a, relies on implementation-defined behaviour.
(Thanks to M.M and Mankarse for pointing this out.)
I feel that it is not the variable a (or b, c, d) that is important here, but the value (that is, the memory address of a deallocated block), which on some implementations can trigger a runtime fault when used in a 'pointer context'.
This value may be an rvalue/expression, not necessarily the value stored in a variable - so I do not believe the value of a ever changes. (I am using the loose term 'pointer context' to distinguish this from using the same value, i.e. the same set of bits, in non-pointer-related expressions - which will not cause a runtime fault.)
------------My original post is below.---------------
Well, you are almost there with your experiment. Just add some cout's like here:
#include <iostream>

class A {};

int main()
{
    A* a = new A();
    A* b = a;
    std::cout << a << std::endl; // <--- added here
    delete a;
    std::cout << a << std::endl; // <--- added here. Note 'a' can still be used!
    A* c = a;
    A* d = b;
}
Calling delete a does not do anything to the variable a. This is just a library call. The library that manages dynamic memory allocation keeps a list of allocated memory blocks and uses the value passed by variable a to mark one of the previously allocated blocks as freed.
While what Mankarse cites from the standard is true - "rendering invalid all pointers referring to any part of the deallocated storage" - note that the value of the variable a remains untouched (you did not pass it by reference, but by value!).
So to sum up and to try to answer your question:
Variable a still exists in the scope after delete. The variable a still contains the same value, which is the address of the beginning of the memory block allocated (and now already deallocated) for an object of class A. This value of a technically can be used - you can e.g. print it like in my above example – however it is hard to find a more reasonable use for it than printing/logging the past...
What you should not do is try to dereference this value (which you also keep in variables b, c, and d), as it is no longer a valid memory pointer.
You should never rely on the object still being present in the deallocated storage (while it is quite probable that it will remain there for a while, as C++ does not require the freed storage to be cleared) - you have no guarantees and no safe way to check this.
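The pass-by-value point can be illustrated with an ordinary function (a sketch only; delete is of course not a function, but the argument-passing intuition is the same - a callee that receives the pointer by value cannot change the caller's variable):

#include <iostream>

void pretend_delete(int* p) // p is a copy of the caller's pointer
{
    p = nullptr;            // only the local copy changes
}

int main()
{
    int* a = new int(5);
    pretend_delete(a);
    std::cout << a << std::endl; // a still holds the original address
    delete a;                    // actually release the memory
    return 0;
}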

Use of a deleted pointer address

(*) As far as I know, the Standard allows an implementation to modify the operand of the delete operator; however, most implementations do not do that.
int* ptr = new int(0);
delete ptr; //delete is allowed to modify ptr, for example set it to 0
std::cout << ptr; // UB?
Acknowledging (*), is the reading of ptr (in the form of printing it) well-defined?
If delete does modify ptr, is it allowed to set a trap value, which would make reading ptr UB?
In C++14 this is implementation-defined behaviour, [basic.stc.dynamic.deallocation]/4:
If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value, the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage.
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
There is a footnote:
Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.
This changed since C++11, where the corresponding text said the behaviour was undefined and there was no footnote.
So to answer your question, delete ptr; is allowed to set a trap value that would cause a runtime fault for std::cout << ptr. The compiler documentation must specify the behaviour. This is a narrower restriction than UB, under which any behaviour at all would be permissible.
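If the goal is only to log the address, one way to avoid relying on implementation-defined behaviour entirely is to capture the address as an integer before the delete (a sketch; nothing in the quoted wording requires this, it simply sidesteps the question):

#include <cstdint>
#include <iostream>

int main()
{
    int* ptr = new int(0);
    // Convert while the pointer value is still valid.
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    delete ptr;
    // Printing the integer never uses the (now invalid) pointer value.
    std::cout << std::hex << addr << '\n';
    return 0;
}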
In this example, std::cout << ptr is NOT undefined behavior by default, because ptr is not being dereferenced at all, so it doesn't matter what its value is actually set to.
By default, the STL does not define an operator<< for int* pointers. It defines:
an operator<< for (signed|unsigned) char*, used for printing null-terminated text.
a generic operator<< for void*, which simply prints the memory address itself that the pointer is set to, not the data that is being pointed at.
Since int* is implicitly convertible to void*, std::cout << ptr actually calls the void* overload (effectively std::cout.operator<<((const void*)ptr)), and so prints the memory address that ptr holds, as-is.
The code would have undefined behavior only if your app defines its own operator<< for int* and then tries to dereference the pointer after it has been deleted.
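For illustration, a hypothetical user-defined overload like the one below (the overload itself is an assumption for the example; the standard library provides nothing like it) would be chosen over the void* conversion and would dereference the deleted pointer, turning the same print statement into undefined behaviour:

#include <iostream>

// Hypothetical overload: prints the pointed-to value instead of the address.
std::ostream& operator<<(std::ostream& os, int* p)
{
    return os << *p; // dereferences p
}

int main()
{
    int* ptr = new int(42);
    std::cout << ptr << '\n'; // fine: prints 42 via the overload above
    delete ptr;
    // std::cout << ptr << '\n'; // would dereference a deleted pointer: UB
    return 0;
}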

C++: Memory allocation

#include <iostream>
using namespace std;

int main()
{
    int a[100], n;
    cout << &n << " " << &a[100] << endl;
    if (&n != &a[100])
    {
        cout << " What is wrong with C++?";
    }
}
It prints the same address for &n and &a[100]. But when I compare the two values in the if statement, it says they are not equal.
What does this mean?
When I change the value of n, a[100] also changes, so doesn't that mean n and a[100] are the same?
First, let's remember that there is no a[100]. It does not exist! If you tried to access a[100]'s "value" then, according to the abstract machine called C++, anything can happen. This includes blowing up the sun or dyeing my hair purple; now, I like my hair the way it is, so please don't!
Anyway, what you're doing is playing with the array's "one-past-the-end" pointer. You are allowed to obtain this pointer, which is a fake pointer of sorts, as long as you do not dereference it.
It is available only because the "101st element" would be "one past the end" of the array. (And there is debate as to whether you are allowed to write &a[100], rather than a+100, to get such a pointer; I am in the "no" camp.)
However, that still says nothing about comparing it to the address of some entirely different object. You cannot assume anything about the relative location of local variables in memory. You simply have no idea where n will live with respect to a.
The results you're observing are unpredictable, unreliable and meaningless side effects of the undefined behaviour exhibited by your program, caused by a combination of compiler optimisations and data locality.
For the array declaration int a[100], valid indices run from 0 to 99. When you take the address of the 101st element, which is out of range, it may happen to overlap with the next object on the stack (in your case the variable n). However, this is undefined behavior.
For a bit of fact, here is the relevant text from the specification:
Equality operator (==,!=)
Pointers to objects of the same type can be compared for equality with the 'intuitive' expected results:
From § 5.10 of the C++11 standard:
Pointers of the same type (after pointer conversions) can be compared for equality. Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address (3.9.2).
(leaving out details on comparison of pointers to members and/or the null pointer constants - they continue down the same line of 'Do What I Mean':)
[...] If both operands are null, they compare equal. Otherwise if only one is null, they compare unequal.[...]
The most 'conspicuous' caveat has to do with virtuals, and it does seem to be the logical thing to expect too:
[...] if either is a pointer to a virtual member function, the result is unspecified. Otherwise they compare equal if and only if they would refer to the same member of the same most derived object (1.8) or the same subobject if they were dereferenced with a hypothetical object of the associated class type. [...]
Maybe the problem is that an array of int is not the same as an int!
By writing &a[100] you're invoking undefined behavior, since there is no element with index 100 in the array a. To get the same address safely, you could instead write a+100 (or &a[0]+100), in which case your program would be well-defined, but whether or not the if condition will hold cannot be predicted or relied upon on any implementation.
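A sketch of that safer form: forming a + 100 (the one-past-the-end pointer) is fine, but, as the answer notes, whether the comparison holds still cannot be relied upon:

#include <iostream>
using namespace std;

int main()
{
    int a[100], n;
    // a + 100 is the one-past-the-end pointer; it may be formed and compared,
    // just never dereferenced.
    cout << &n << " " << (a + 100) << endl;
    if (&n != a + 100)
    {
        cout << "The two addresses need not be the same";
    }
}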

Can I safely create references to possibly invalid memory as long as I don't use it?

I want to parse UTF-8 in C++. When parsing a new character, I don't know in advance if it is an ASCII byte or the leader of a multibyte character, and also I don't know if my input string is sufficiently long to contain the remaining characters.
For simplicity, I'd like to name the four next bytes a, b, c and d, and because I am in C++, I want to do it using references.
Is it valid to define those references at the beginning of a function as long as I don't access them before I know that access is safe? Example:
void parse_utf8_character(const string s) {
    for (size_t i = 0; i < s.size();) {
        const char &a = s[i];
        const char &b = s[i + 1];
        const char &c = s[i + 2];
        const char &d = s[i + 3];
        if (is_ascii(a)) {
            i += 1;
            do_something_only_with(a);
        } else if (is_twobyte_leader(a)) {
            i += 2;
            if (is_safe_to_access_b()) {
                do_something_only_with(a, b);
            }
        }
        ...
    }
}
The above example shows what I want to do semantically. It doesn't illustrate why I want to do this, but obviously real code will be more involved, so defining b,c,d only when I know that access is safe and I need them would be too verbose.
There are three takes on this:
Formally
well, who knows. I could find out for you by spending quite some time on it, but then, so could you. Or any reader. And it's not like that's very useful in practice.
EDIT: OK, looking it up, since you don't seem happy about me mentioning the formal without looking it up for you. Formally you're out of luck:
N3280 (C++11) §5.7/5 “If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.”
Two situations where this can produce undesired behavior: (1) computing an address beyond the end of a segment, and (2) computing an address beyond an array that the compiler knows the size of, with debug checks enabled.
Technically
you're probably OK as long as you avoid any lvalue-to-rvalue conversion, because if the references are implemented as pointers, then it's as safe as pointers, and if the compiler chooses to implement them as aliases, well, that's also ok.
Economically
relying needlessly on a subtlety wastes your time, and then also the time of others dealing with the code. So, not a good idea. Instead, declare the names when it's guaranteed that what they refer to exists.
Before going into the legality of references to inaccessible memory, you have another problem in your code. Your call to s[i+x] might call string::operator[] with a parameter bigger than s.size(). The C++11 standard says about string::operator[] ([string.access], §21.4.5):
Requires: pos <= size().
Returns: *(begin()+pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
This means that calling s[x] for x > s.size() is undefined behaviour, so the implementation could very well terminate your program, e.g. by means of an assertion, for that.
Since std::string is now guaranteed to be contiguous, you could get around that problem by using &s[i] + x to get an address. In practice this will probably work.
However, strictly speaking, doing this is unfortunately still illegal. The reason is that the standard allows pointer arithmetic only as long as the pointer stays inside the same array, or points one past the end of the array. The relevant part of the (C++11) standard is [expr.add], §5.7/5:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Therefore generating references or pointers to invalid memory locations might work on most implementations, but it is technically undefined behaviour, even if you never dereference the pointer/use the reference. Relying on UB is almost never a good idea, because even if it works for all targeted systems, there are no guarantees about it continuing to work in the future.
In principle, the idea of taking a reference for a possibly illegal memory address is itself perfectly legal. The reference is only a pointer under the hood, and pointer arithmetic is legal until dereferencing occurs.
EDIT: This claim is a practical one, not one covered by the published standard. There are many corners of the published standard which are formally undefined behaviour, but don't produce any kind of unexpected behaviour in practice.
Take for example the possibility of computing a pointer to the second item after the end of an array (as #DanielTrebbien suggests). The standard says overflow may result in undefined behaviour. In practice, the overflow would only occur if the upper end of the array is just short of the end of the space addressable by a pointer. Not a likely scenario. Even if it does happen, nothing bad would happen on most architectures. What is violated are certain guarantees about pointer differences, which don't apply here.
@JoSo: If you are working with a character array, you can avoid some of the uncertainty about reference semantics by replacing the const references with const pointers in your code. That way you can be certain no compiler will alias the values.
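A sketch along the lines of that suggestion, restructured so that no out-of-range pointer is ever formed (the helper functions is_ascii, is_twobyte_leader and do_something_only_with are the question's own placeholders and are assumed to be declared elsewhere):

#include <string>

bool is_ascii(char c);
bool is_twobyte_leader(char c);
void do_something_only_with(char a);
void do_something_only_with(char a, char b);

void parse_utf8_character(const std::string& s)
{
    const char* p   = s.data();
    const char* end = s.data() + s.size(); // one-past-the-end may be formed
    while (p != end) {
        const char a = *p;
        if (is_ascii(a)) {
            do_something_only_with(a);
            p += 1;
        } else if (is_twobyte_leader(a)) {
            if (end - p < 2) {
                break;                     // truncated sequence at end of input
            }
            do_something_only_with(a, *(p + 1));
            p += 2;
        } else {
            p += 1;                        // placeholder for the remaining cases ("...")
        }
    }
}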