After reading many posts about this, I want to clarify the following point:
A* a = new A();
A* b = a;
delete a;
A* c = a; //illegal - I know it (in c++ 11)
A* d = b; //I suppose it's legal, is it true?
So the question is about using the value of a copy of a deleted pointer.
I've read, that in c++ 11 reading the value of a leads to undefined behaviour - but what about reading the value of b?
Trying to read the value of the pointer (note: this is different to
dereferencing it) causes implementation-defined behaviour since C++14,
which may include generating a runtime fault. (In C++11 it was
undefined behaviour)
What happens to the pointer itself after delete?
Both:
A* c = a;
A* d = b;
are undefined in C++11 and implementation defined in C++14. This is because a and b are both "invalid pointer values" (as they point to deallocated storage space), and "using an invalid pointer value" is either undefined or implementation defined, depending on the C++ version. ("Using" includes "copying the value of").
The relevant section ([basic.stc.dynamic.deallocation]/4) in C++11 reads (emphasis added):
If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined.
with a non-normative note stating:
On some implementations, it causes a system-generated runtime fault.
In C++14 the same section reads:
If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
with a non-normative note stating:
Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault
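A minimal sketch of what this means in practice, assuming the goal is simply to avoid ever reading an invalid pointer value: overwrite every copy of the pointer right after the delete. The helper name delete_and_reset is hypothetical and only serves the illustration.

```cpp
#include <cassert>

struct A {};

// Hypothetical helper: frees the object and overwrites every copy of the
// pointer before anything can read the now-invalid value.
bool delete_and_reset() {
    A* a = new A();
    A* b = a;        // copied while the pointer value is still valid
    delete a;        // from here on, a and b both hold an invalid pointer value
    a = nullptr;     // assigning a new value does not read the old one
    b = nullptr;
    assert(a == b);  // both are null now, so comparing them is well-defined
    return a == nullptr && b == nullptr;
}
```

Assigning to a or b is fine in every C++ version; only reading the stale value (as in A* c = a; or A* d = b;) falls under the wording quoted above.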
These two lines are no different from each other as far as legality in C++ is concerned:
A* c = a; //illegal - I know it (in c++ 11)
A* d = b; //I suppose it's legal, is it true?
Your mistake (and it is a pretty common one) is to think that calling delete on a makes it any different from b. Remember that when you call delete on a pointer you pass the argument by value, so the memory that a points to is no longer usable after delete, but the call does not make a any different from b in your example.
You should not use the pointer after delete. My example below, which accesses a, relies on implementation-defined behaviour.
(thanks to M.M and Mankarse for pointing this out)
I feel that it is not the variable a (or b, c, d) that is important here, but the value (= the memory address of a deallocated block), which in some implementations can trigger a runtime fault when used in some 'pointer context'.
This value may be an rvalue/expression, not necessarily the value stored in a variable - so I do not believe the value of a ever changes (I am using the loose 'pointer context' to distinguish from using the same value, i.e. the same set of bits, in non-pointer related expressions - which will not cause a runtime fault).
------------My original post is below.---------------
Well, you are almost there with your experiment. Just add some cout's like here:
class A {};
A* a = new A();
A* b = a;
std::cout << a << std::endl; // <--- added here
delete a;
std::cout << a << std::endl; // <--- added here. Note 'a' can still be used!
A* c = a;
A* d = b;
Calling delete a does not do anything to the variable a. This is just a library call. The library that manages dynamic memory allocation keeps a list of allocated memory blocks and uses the value passed by variable a to mark one of the previously allocated blocks as freed.
While it is true what Mankarse cites from C++ documentation, about: "rendering invalid all pointers referring to any part of the deallocated storage" - note that the value of variable a remains untouched (you did not pass it by reference, but by value !).
So to sum up and to try to answer your question:
Variable a still exists in the scope after delete. The variable a still contains the same value, which is the address of the beginning of the memory block allocated (and now already deallocated) for an object of class A. This value of a technically can be used - you can e.g. print it like in my above example – however it is hard to find a more reasonable use for it than printing/logging the past...
What you should not do is trying to de-reference this value (which you also keep in variables b, c, and d) – as this value is not a valid memory pointer any longer.
You should never rely on the object still being in the deallocated storage. It is quite probable that it will remain there for a while, as C++ does not require freed storage to be cleared, but you have no guarantees and no safe way to check this.
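If the only goal is to print or log where the object used to live, one way to stay clear of the implementation-defined territory is to convert the pointer to an integer while it is still valid. This is a sketch under that assumption; the function name address_for_logging is made up for the example.

```cpp
#include <cassert>
#include <cstdint>

struct A {};

// Hypothetical helper: records the address as an integer before the delete,
// so nothing ever reads the pointer value after it has become invalid.
std::uintptr_t address_for_logging() {
    A* a = new A();
    // Converted while the pointer is still valid; the result is a plain integer.
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(a);
    delete a;
    // On mainstream platforms a non-null pointer converts to a non-zero integer.
    assert(address != 0);
    return address;  // safe to print or log; it is no longer a pointer
}
```

The returned integer can be streamed to std::cout like any other number, and it carries no "invalid pointer value" baggage.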
Arrays of any type are implicit-lifetime objects, and it is possible to begin the lifetime of an implicit-lifetime object without beginning the lifetime of its subobjects.
As far as I am aware, the possibility of creating arrays without beginning the lifetime of their elements, in a way that doesn't result in UB, was one of the motivations for implicit-lifetime objects; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html.
Now, what is the proper way to do it? Is allocating memory and returning a pointer to an array enough? Or is there something else one needs to be aware of?
Namely, is this code valid, and does it create an array with uninitialized members, or do we still have UB?
// implicitly creates an array of size n and returns a pointer to it
auto arrPtr = reinterpret_cast<T(*)[]>(::operator new(sizeof(T) * n, std::align_val_t{alignof(T)}) );
// is there a difference between reinterpret_cast<T(*)[]> and reinterpret_cast<T(*)[n]>?
auto arr = *arrPtr; // de-reference of the result in previous line.
The question can be restated as follows.
According to https://en.cppreference.com/w/cpp/memory/allocator/allocate, the allocate function creates an array of type T[n] in the storage and starts its lifetime, but does not start the lifetime of any of its elements.
A simple question - how is it done? (ignoring the constexpr part, but I wouldn't mind if constexpr part is explained in the answer as well).
PS: The provided code is valid (assuming it is correct) for c++20, but not for earlier standards as far as I am aware.
I believe that an answer to this question should answer two similar questions I have asked earlier as well.
Arrays and implicit-lifetime object creation.
Is it possible to allocate uninitialized array in a way that does not result in UB.
EDIT: I am adding a few code snippets to make my question clearer. I would appreciate an answer explaining which ones are valid and which ones are not.
PS: feel free to replace malloc with an aligned version, or an ::operator new variation. As far as I am aware it doesn't matter.
Example #1
T* allocate_array(std::size_t n)
{
return reinterpret_cast<T*>( malloc(sizeof(T) * n) );
// does it return an implicitly constructed array (as long as
// subsequent usage is valid) or a T* pointer that does not "point"
// to a T object that was constructed, hence UB
// Edit: if we take n = 1 in this example, and T is not implicit-lifetime
// type, then we have a pointer to an object that has not yet been
// constructed and doesn't have implicit lifetime - which is bad
}
Example #2.
T* allocate_array(std::size_t n)
{
// malloc implicitly constructs - reinterpret_cast should return a pointer to
// suitably created object (a T array), hence, no UB here.
T(*array_pointer)[] = reinterpret_cast<T(*)[]>(malloc(sizeof(T) * n) );
// The pointer in the previous line is a pointer to valid array, de-reference
// is supposed to give me that array
T* array = *array_pointer;
return array;
}
Example #3 - same as 2 but size of array is known.
T* allocate_array(std::size_t n)
{
// malloc implicitly constructs - reinterpret_cast should return a pointer to
// suitably created object (a T array), hence, no UB here.
T(*n_array_pointer)[n] = reinterpret_cast<T(*)[n]>(malloc(sizeof(T) * n) );
// The pointer in the previous line is a pointer to valid array, de-reference
// is supposed to give me that array
T* n_array = *n_array_pointer;
return n_array;
}
Are any of these valid?
The answer
While the wording of the standard is not 100% clear, after reading the paper more carefully, the motivation is to make casts to T* legal and not casts to T(*)[] (see "Dynamic construction of arrays"). Also, the changes to the standard made by the authors of the paper imply that the cast should be to T* and not to T(*)[]. Hence, I am accepting the answer by Nicol Bolas as the correct answer to my question.
The whole point of implicit object creation is that it is implicit. That is, you don't do anything to get it to happen. Once IOC occurs on a piece of memory, you may use the memory as if the object in question exists, and so long as you do that, your code works.
When you get your T* back from allocator_traits<>::allocate, if you add 1 to the pointer, then the function has returned an array of at least 1 element (the new pointer could be the past-the-end pointer for the array). If you add 1 again, then the function has returned an array of at least 2 elements. Etc. None of this is undefined behavior.
If you do something inconsistent with this (casting to a different pointer type and acting as though there is an array there), or if you act as though the array extends beyond the size of the storage that IOC applies to, then you get UB.
So allocator_traits::allocate doesn't really have to do anything, so long as the memory that the allocator allocated implicitly creates objects.
// does it return an implicitly constructed array (as long as
// subsequent usage is valid) or a T* pointer that does not "point"
// to a T object that was constructed, hence UB
Neither. It returns a pointer (to type T) to storage into which objects may have been implicitly created already. Which objects have been implicitly created depends on how you use this storage. And merely doing a cast doesn't constitute "using" the storage.
It isn't the reinterpret_cast that causes UB; it's using the pointer returned by an improper reinterpret_cast that's the problem. And since IOC works based on the operation that would have caused UB, IOC doesn't care what you cast the pointer to.
Part and parcel of the IOC rules is the corollary "suitable created object" rule. This rule says that certain operations (like malloc and operator new) return a pointer to a "suitable created object". Essentially it's back to quantum superposition: if IOC retroactively creates an object to make your code work, then these functions retroactively return a pointer to whichever object was created to make your code work.
So if your code uses the pointer as a T* and does pointer arithmetic on that pointer, then malloc returned a pointer to the first element of an array of Ts. How big is that array? That depends: how big was the allocation, and how far did you do your pointer arithmetic? Does it have live Ts in them? That depends: do you try to access any Ts in the array?
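To make that concrete, here is a small sketch, assuming C++20 (P0593) semantics: the storage from std::malloc is used as a T* with pointer arithmetic, and each element's lifetime is started explicitly with placement new. Widget, allocate_array and sum_widgets are hypothetical names chosen for the illustration, and the sketch assumes alignof(Widget) does not exceed what std::malloc guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical element type. The user-provided constructor and destructor
// mean it is not an implicit-lifetime type, so elements must be constructed.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
    ~Widget() {}
};

// Under C++20 rules, std::malloc implicitly creates a Widget[n] array object
// in the storage if that makes the later pointer arithmetic valid, but it
// does not create the Widget elements themselves.
Widget* allocate_array(std::size_t n) {
    return static_cast<Widget*>(std::malloc(sizeof(Widget) * n));
}

int sum_widgets(std::size_t n) {
    Widget* p = allocate_array(n);
    if (p == nullptr) return -1;  // allocation failed (or n == 0)
    for (std::size_t i = 0; i < n; ++i)
        ::new (static_cast<void*>(p + i)) Widget(static_cast<int>(i) + 1);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += p[i].value;        // every element read here has been constructed
    for (std::size_t i = 0; i < n; ++i)
        std::destroy_at(p + i);   // end each element's lifetime before freeing
    std::free(p);
    return sum;
}

// Usage sketch: four elements holding 1, 2, 3, 4.
void check_sum_widgets() {
    assert(sum_widgets(4) == 10);
}
```

Note that the cast is to Widget* and not to Widget(*)[], in line with the reasoning above; no element is read before its constructor has run.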
Let's say we have this legacy code from C++98:
bool expensiveCheck();
struct Foo;
bool someFunc()
{
Foo *ptr = 0;
if( expensiveCheck() )
ptr = new Foo;
// doing something irrelevant here
...
if( ptr ) {
// using foo
}
delete ptr;
return ptr; // here we have UB(Undefined Behavior) in C++11
}
So basically the pointer here is used to hold dynamically allocated data and to serve as a flag at the same time. For me it is readable code, and I believe it is legal C++98 code. Now, according to these questions:
Pointers in c++ after delete
What happens to the pointer itself after delete?
this code has UB in C++11. Is it true?
If yes, another question comes to mind: I have heard that the committee puts significant effort into not breaking existing code in a new standard. If I am not mistaken, in this case that is not true. What is the reason? Is such code already considered harmful, so nobody cares that it would be broken? Did they not think about the consequences? Is this optimization so important? Something else?
Your example exhibits undefined behavior under C++98, too. From the C++98 standard:
[basic.stc.dynamic.deallocation]/4 If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined.33)
Footnote 33) On some implementations, it causes a system-generated runtime fault.
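One possible way to keep the intent of the legacy function without touching the pointer after delete is to read the flag first. This is only a sketch: the Foo definition is a stand-in, and the expensive check is replaced by a hypothetical bool parameter so the example stays self-contained.

```cpp
#include <cassert>

struct Foo { int payload = 0; };  // stand-in definition for the example

// The bool parameter is a hypothetical replacement for expensiveCheck().
bool some_func(bool expensive_check_result) {
    Foo* ptr = nullptr;
    if (expensive_check_result)
        ptr = new Foo;
    // ... use *ptr here if it is non-null ...
    const bool had_foo = (ptr != nullptr);  // read while the value is still valid
    delete ptr;                             // deleting a null pointer is a no-op
    return had_foo;                         // ptr itself is never read again
}

// Usage sketch: the flag reflects whether a Foo was ever allocated.
void check_some_func() {
    assert(some_func(true));
    assert(!some_func(false));
}
```

The behaviour is the same as what the original code intended, and it is well-defined in C++98, C++11 and later.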
Let's consider below program:
int main ()
{
int *p, *r;
p = (int*)malloc(sizeof(int));
cout<<"Addr of p = "<<p <<endl;
cout<<"Value of p = "<<*p <<endl;
free(p);
cout<<"After free(p)"<<endl;
r = (int*)malloc(sizeof(int));
cout<<"Addr of r = "<<r <<endl;
cout<<"Value of r = "<<*r <<endl;
*p = 100;
cout<<"Value of p = "<<*p <<endl;
cout<<"Value of r = "<<*r <<endl;
return 0;
}
Output:
Addr of p = 0x2f7630
Value of p = 3111728
free(p)
Addr of r = 0x2f7630
Value of r = 3111728
*p = 100
Value of p = 100
Value of r = 100
In the above code, p and r are dynamically created.
p is created and freed. r is created after p is freed.
On changing the value in p, r's value also gets changed. But I have already freed p's memory, then why on changing p's value, r's value also gets modified with the same value as that of p?
I have come to below conclusion. Please comment if I am right?
Explanation:
Pointer variables p and r are dynamically declared. Garbage values are stored initially. Pointer variable p is freed/deleted. Another pointer variable r is declared. The address allocated for r is the same as that of p (p still points to the old address). Now if the value of p is modified, r’s value also gets modified with the same value as that of p (since both variables are pointing to the same address).
The operator free() only frees the memory address from the pointer variable and returns the address to the operating system for re-use, but the pointer variable (p in this case) still points to the same old address.
The free() function and the delete operator do not change the content of a pointer, as the pointer is passed by value.
However, the stuff in the location pointed to by the pointer may not be available after using free() or delete.
So if we have memory location 0x1000:
+-----------------+
0x1000 | |
| stuff in memory |
| |
+-----------------+
Let's assume that the pointer variable p contains 0x1000, or points to the memory location 0x1000.
After the call to free(p), the operating system is allowed to reuse the memory at 0x1000. It may not use it immediately or it could allocate the memory to another process, task or program.
However, the variable p was not altered, so it still points to the memory area. In this case, the variable p still has a value, but you should not dereference (use the memory) because you don't own the memory any more.
Your analysis is superficially close in some ways but not correct.
p and r are defined to be pointers in the first statement of main(). They are not dynamically created. They are defined as variables of automatic storage duration within main(), so they cease to exist when (actually if, in the case of your program) main() returns.
It is not p that is created and freed. malloc() dynamically allocates memory and, if it succeeds, returns a pointer which identifies that dynamically allocated memory (or a NULL pointer if the dynamic allocation fails) but does not initialise it. The value returned by malloc() is (after conversion into a pointer to int, which is required in C++) assigned to p.
Your code then prints the value of p.
(I have highlighted the next para in italic, since I'll refer back to it below).
The next statement prints the value of *p. Doing that means accessing the value at the address pointed to by p. However, that memory is uninitialised, so the result of accessing *p is undefined behaviour. With your implementation (compiler and library), at this time, that happens to result in a "garbage value", which is then printed. However, that behaviour is not guaranteed - it could actually do anything. Different implementations could give different results, such as abnormal termination (crash of your program), reformatting a hard drive, or [markedly less likely in practice] playing the song "Crash" by the Primitives through your computer's loud speakers.
After calling free(p) your code goes through a similar sequence with the pointer r.
The assignment *p = 100 has undefined behaviour, since p holds the value returned by the first malloc() call, but that has been passed to free(). So, as far as your program is concerned, that memory is no longer guaranteed to exist.
The first cout statement after that accesses *p. Since the memory p points to no longer exists as far as your program is concerned (p having been passed to free()), that gives undefined behaviour.
The second cout statement after that accesses *r. That operation has undefined behaviour, for exactly the same reason I described in the italic paragraph above (for p, as it was then).
Note, however, that there have been five occurrences of undefined behaviour in your code. When even a single instance of undefined behaviour occurs, all bets are off for being able to predict behaviour of your program. With your implementation, the results happen to be printing p and r with the same value (since malloc() returns the same value 0x2f7630 in both cases), printing a garbage value in both cases, and then (after the statement *p = 100) printing the value of 100 when printing *p and *r.
However, none of those results are guaranteed. The reason for no guarantee is that the meaning of "undefined behaviour" in the C++ standard is that the standard describes no limits on what is permitted, so an implementation is free to do anything. Your analysis might be correct, for your particular implementation, at the particular time you compiled, linked, and ran your code. It might even be correct next week, but be incorrect a month from now after updating your standard library (e.g. applying bug fixes). It is probably incorrect for other implementations.
Lastly, a couple of minor points.
Firstly, your code is incomplete, and would not even compile in the form you have described it. In discussion above, I have assumed your code is actually preceded by
#include <iostream>
#include <cstdlib>
using namespace std;
Second, malloc() and free() are functions in the standard library. They are not operators.
Your analysis of what actually happened is correct; however, the program is not guaranteed to behave this way reliably. Every use of p after free(p) "provokes undefined behavior". (This also happens when you access *p and *r without having written anything there first.) Undefined behavior is worse than just producing an unpredictable result, and worse than just potentially causing the program to crash, because the compiler is explicitly allowed to assume that code that provokes undefined behavior will never execute. For instance, it would be valid for the compiler to treat your program as identical to
int main() {}
because there is no control flow path in your program that does not provoke undefined behavior, so it must be the case that the program will never run at all!
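For contrast, here is a sketch of the same sequence of allocations with the undefined behaviour removed: every int is written before it is read, and p is never dereferenced after free(p). The function name write_then_read and the values 42 and 100 are arbitrary choices for the illustration.

```cpp
#include <cassert>
#include <cstdlib>

int write_then_read() {
    int* p = static_cast<int*>(std::malloc(sizeof(int)));
    if (p == nullptr) return -1;
    *p = 42;               // initialise the memory before reading it
    const int first = *p;  // well-defined: we just wrote this value
    std::free(p);
    p = nullptr;           // drop the stale address so it cannot be reused by accident

    int* r = static_cast<int*>(std::malloc(sizeof(int)));
    if (r == nullptr) return -1;
    *r = 100;              // r may or may not reuse p's old address; it does not matter
    assert(*r == 100);     // we wrote it, so reading it back is well-defined
    const int result = first + *r;
    std::free(r);
    return result;
}
```

Whether malloc hands back the same address for r is an allocator detail that the program no longer depends on.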
free() frees the heap memory to be re-used by OS. But the contents present in the memory address are not erased/removed.
I wonder why I would need the second version?
int* p; // version 1
int* p = new int; // version 2
In the first version, the pointer isn't pointing at anything, it is undefined. Version 2 allocated memory and points p to that new memory. You are not allocating space for the pointer itself but memory for the pointer to point at. (In both versions the pointer itself is on the stack)
Assuming that the code appears in a function:
The first one defines a local variable of type int* (that is, a pointer). The variable is not initialized, which means the pointer doesn't have a value. It doesn't point at anything. It's nearly useless, about the only thing you can do with it is assign a pointer value to it[*]. So you think to yourself, "can I hold off defining the variable until I have a value to assign to it?"
The second one defines a local variable of type int* (that is a pointer), and also dynamically allocates an object of type int and assigns the address of that object to the pointer variable. So the pointer points to the int.
Dynamically allocating one int is nearly always a bad idea. It's not useless in the sense that you do at least have an int and a means to access it. But you've created a problem for yourself in that you have to keep track of it and free it.
[*] other things you can do with an uninitialized int* variable: take the address of the variable; bind it to a reference of type int*&; convert the address of the variable to char* and examine the memory one byte at a time, just to see what your implementation has put in that uninitialized variable. Nothing exciting and, crucially, nothing involving any int objects because you have none.
The first pointer
The first pointer, declared as:
int* p;
only allocates the memory needed to store a pointer to int. The actual size is implementation defined. What the p object contains is indeterminate as per 8.5/12:
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17).
This means that dereferencing the pointer will lead to undefined behavior.
The second pointer
The second pointer, declared as:
int* p = new int;
dynamically allocates an int. This means that the lifetime of the object will terminate either at the exit of the program (not sure if the standard actually enforces this, but I'm pretty sure the underlying OS will take back the unused memory once the program is done executing) or when you free it.
This pointer can be dereferenced safely, unless operator new failed to allocate memory (in which case it will throw std::bad_alloc or, since C++11, another exception derived from std::bad_alloc).
Why the second pointer shouldn't be used in most cases
Memory management is a hard topic. The main tip that I can give you is to avoid new and delete like the plague. Whenever you can do something in any other standard way, you should prefer it.
For example, in this case, the only reason I can come up with to justify such a technique is to have an optional parameter. You could, and should, use std::optional instead.
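As a rough illustration of that suggestion, assuming C++17 is available, an "int that may be absent" can be expressed without any dynamic allocation. The functions find_first_even and first_even_or are hypothetical examples, not part of any library.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Returns the first even value, or an empty optional if there is none.
std::optional<int> find_first_even(const std::vector<int>& values) {
    for (int v : values)
        if (v % 2 == 0)
            return v;
    return std::nullopt;  // "no value", without new/delete or a null pointer
}

// Falls back to a caller-supplied default when nothing was found.
int first_even_or(const std::vector<int>& values, int fallback) {
    return find_first_even(values).value_or(fallback);
}

// Usage sketch.
void check_first_even() {
    assert(first_even_or({1, 3, 4}, -1) == 4);
    assert(first_even_or({1, 3}, -1) == -1);
}
```

There is nothing to free and nothing to leak; the optional owns its int directly.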
What happens when you dereference a pointer when passing by reference to a function?
Here is a simple example
int& returnSame( int &example ) { return example; }
int main()
{
int inum = 3;
int *pinum = & inum;
std::cout << "inum: " << returnSame(*pinum) << std::endl;
return 0;
}
Is there a temporary object produced?
Dereferencing the pointer doesn't create a copy; it creates an lvalue that refers to the pointer's target. This can be bound to the lvalue reference argument, and so the function receives a reference to the object that the pointer points to, and returns a reference to the same. This behaviour is well-defined, and no temporary object is involved.
If it took the argument by value, then that would create a local copy, and returning a reference to that would be bad, giving undefined behaviour if it were accessed.
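One way to convince yourself that no temporary is involved is to compare addresses: the reference that comes back should have the same address as inum. This is a small sketch; the wrapper name refers_to_same_object is made up for the example.

```cpp
#include <cassert>

int& returnSame(int& example) { return example; }

bool refers_to_same_object() {
    int inum = 3;
    int* pinum = &inum;
    int& ref = returnSame(*pinum);  // *pinum is an lvalue naming inum itself
    assert(&ref == &inum);          // same address: no copy, no temporary
    ref = 10;                       // writes through to inum
    return &ref == pinum && inum == 10;
}
```

If a temporary had been created, the addresses would differ and the assignment would not be visible through inum.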
The Answer To Your Question As Written
No, this behavior is defined. No constructors are called when pointer types are dereferenced or reference types used, unless explicitly specified by the programmer, as with the following snippet, in which the new operator calls the default constructor for the int type.
int* variable = new int;
As for what is really happening, as written, returnSame(*pinum) is the same variable as inum. If you feel like verifying this yourself, you could use the following snippet:
returnSame(*pinum) = 10;
std::cout << "inum: " << inum << std::endl;
Further Analysis
I'll start by correcting your provided code, which it doesn't look like you tried to compile before posting it. After edits, the one remaining error is on the first line:
int& returnSame( int &example ) { return example; } // semi instead of colon
Pointers and References
Pointers and references are treated in the same way by the compiler, they differ in their use, not so much their implementation. Pointer types and reference types store, as their value, the location of something else. Pointer dereferencing (using the * or -> operators) instructs the compiler to produce code to follow the pointer and perform the operation on the location it refers to rather than the value itself. No new data is allocated when you dereference a pointer (no constructors are called).
Using references works in much the same way, except the compiler automatically assumes that you want the value at the location rather than the location itself. As a matter of fact, it is impossible to refer to the location specified by a reference in the same way pointers allow you to: once assigned, a reference cannot be reseated (changed) (that is, without relying on undefined behavior), however you can still get its value by using the & operator on it. It's even possible to have a NULL reference, though handling of these is especially tricky and I don't recommend using them.
Snippet analysis
int *pinum = & inum;
Creates a pointer pointing to an existing variable, inum. The value of the pointer is the memory address that inum is stored in. Creating and using pointers will NOT call a constructor for a pointed-to object implicitly, EVER. This task is left to the programmer.
*pinum
Dereferencing a pointer effectively produces a regular variable. This variable may conceptually occupy the same space that another named variable uses, or it may not. In this case, *pinum and inum are the same variable. When I say "produces", it's important to note that no constructors are called. This is why you MUST initialize pointers before using them: pointer dereferencing will NEVER allocate storage.
returnSame(*pinum)
This function takes a reference and returns the same reference. It's helpful to realize that this function could be written with pointers as well, and behave exactly the same way. References do not perform any initialization either, in that they do not call constructors. However, it is illegal to have an uninitialized reference, so running into uninitialized memory through them is not as common a mistake as with pointers. Your function could be rewritten to use pointers in the following way:
int* returnSamePointer( int *example ) { return example; }
In this case, you would not need to dereference the pointer before passing it, but you would need to dereference the function's return value before printing it:
std::cout << "inum: " << *(returnSamePointer(pinum)) << std::endl;
NULL References
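Declaring a NULL reference is dangerous, since attempting to use it will automatically attempt to dereference it, which will typically cause a segmentation fault. Strictly speaking, creating one by dereferencing a null pointer is already undefined behaviour, so the check below happens to work on many implementations but is not guaranteed by the standard. Again, I highly recommend not using these ever.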
int& nullRef = *((int *) NULL); // creates a reference to nothing
bool isRefNull = (&nullRef == NULL); // true
Summary
Pointer and Reference types are two different ways to accomplish the same thing
Most of the gotchas that apply to one apply to the other
Neither pointers nor references will call constructors or destructors for referenced values implicitly under any circumstances
Declaring a reference to a dereferenced pointer is perfectly legal, as long as the pointer is initialized properly
A compiler doesn't "call" anything. It just generates code. Dereferencing a pointer would at the most basic level correspond to some sort of load instruction, but in the present code the compiler can easily optimize this away and just print the value directly, or perhaps shortcut directly to loading inum.
Concerning your "temporary object": Dereferencing a pointer always gives an lvalue.
Perhaps there's a more interesting question hidden in your question, though: How does the compiler implement passing function arguments as references?