main.cpp:
#include <iostream>
static constexpr bool f1()
{
auto p = new int(1);
delete p;
auto q = new int(2);
delete q;
return p == q;
}
static bool f2() // Same body as f1
{
auto p = new int(1);
delete p;
auto q = new int(2);
delete q;
return p == q;
}
int main()
{
constexpr bool i1 = f1();
std::cout << i1 << std::endl;
auto i2 = f2();
std::cout << i2 << std::endl;
}
Compilation command line:
clang++ -std=c++20 -pedantic-errors main.cpp -o prog
Output from running prog (this is what I got, but may be different for you):
0
1
How is this possible? How is it even possible that I am allowed to define f1 that way given that it has unspecified behaviour?
Deleting a pointer invalidates it.
Any use of an invalid pointer value has implementation-defined behavior (except for indirecting through and passing to a deallocation function, which have undefined behaviour; neither is done in the example).
In the example that behaviour happened to be different in two slightly different cases.
How is this possible?
The compiler produced a program that outputs "0\n1". It is possible.
If you want to know if this conforms to the standard: Yes.
Whether this is intentional by the implementation... I suspect not directly, but rather by coincidence. My entirely hypothetical guess about the implementation:
There may be a piece of logic that sets invalid pointers to null. This has the useful side-effect that programs that have a "use after free" bug (undefined behaviour) are less likely to read/write arbitrary memory (heap smashing) and instead avoid that due to a null pointer check, or outright crash due to indirecting through a null pointer. This would potentially reduce the severity of security vulnerabilities caused by such bugs. As a side-effect, two unspecified values that happen to be null pointers would also happen to compare equal.
But in the constexpr case there may be another piece of logic which determines that the pointers never point to the same object and therefore are never equal, and constant-folds the return value to false before the null "protection" occurs.
Standard quote:
[basic.stc]
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior. [Footnote 31]
Footnote 31: Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.
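One way to stay clear of invalid pointer values altogether is to reset a pointer as soon as its object is deleted. The following is only a minimal sketch of that idea; delete_and_reset and compare_after_reset are hypothetical names, not something from the question or from any library.

```cpp
#include <cassert>

// Hypothetical helper (an assumption for illustration): deletes the object
// and resets the caller's pointer, so no invalid pointer value is left behind.
template <typename T>
void delete_and_reset(T*& p)
{
    delete p;     // deleting a null pointer is a no-op
    p = nullptr;  // overwrite the now-invalid value without reading it
}

// Same shape as f1/f2, but every pointer that is compared is a null pointer,
// and comparing null pointers is fully specified.
bool compare_after_reset()
{
    int* p = new int(1);
    delete_and_reset(p);
    int* q = new int(2);
    delete_and_reset(q);
    assert(p == nullptr && q == nullptr);
    return p == q; // two null pointers always compare equal
}
```

With this shape the result no longer depends on how the implementation treats invalid pointer values: compare_after_reset() returns true on every conforming implementation.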
Related
C++20 allows heap allocation in constexpr functions as long as the memory does not leak. However GCC and Clang disagree on whether comparing the addresses of two dynamically allocated objects is a constant expression or not.
The following snippet can be compiled with Clang but not gcc.
constexpr bool foo() {
int* a = new int(4);
int* b = new int(4);
bool result = a == b;
delete a;
delete b;
return result;
}
constexpr bool x = foo(); // GCC: error: '(((int*)(& heap deleted)) == ((int*)(& heap deleted)))' is not a constant expression
The following works fine on both compilers
constexpr bool foo2() {
int a = 4;
int b = 5;
bool result = &a == &b;
return result;
}
constexpr bool x = foo2();
I'd assume that in order to delete the dynamic objects correctly the compiler must know whether the pointers point to the same objects or not, so I'd assume this is a GCC bug (or not yet fully implemented). Can anyone confirm this assumption? Or am I wrong?
Live example here.
Edit: Strangely, when I open the live example through the link provided, it suddenly compiles on gcc, too. But if I copy-paste it to a new compiler explorer instance, it fails again. Or if I reload it multiple times it fails every second time and compiles every other second time...
This is a gcc bug (#85428).
There's nothing in [expr.const]/5 that would cause evaluation of a == b to fail to be a constant expression. The only one there in which there is any question would be the one about undefined behavior. So we could go look at [expr.eq] to see what that says about pointer comparison:
If at least one of the operands is a pointer, pointer conversions, function pointer conversions, and qualification conversions are performed on both operands to bring them to their composite pointer type.
Comparing pointers is defined as follows:
If one pointer represents the address of a complete object, and another pointer represents the address one past the last element of a different complete object, the result of the comparison is unspecified.
Otherwise, if the pointers are both null, both point to the same function, or both represent the same address, they compare equal.
Otherwise, the pointers compare unequal.
Both pointers represent the addresses of different complete objects, with neither being null, so we fall into the third bullet point and the pointers just compare unequal. No undefined or unspecified behavior here.
a == b should just yield false.
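As a small sketch of the third bullet: pointers to two distinct, live complete objects compare unequal. The function names below are hypothetical. The compile-time check uses local variables only (the form the question reports as accepted by both compilers), and the heap variant is evaluated at run time, so it does not depend on how a given compiler handles allocation during constant evaluation.

```cpp
#include <cassert>
#include <memory>

// Two distinct local objects, compared during constant evaluation.
constexpr bool distinct_locals_equal()
{
    int a = 4;
    int b = 5;
    return &a == &b;
}
static_assert(!distinct_locals_equal(), "distinct objects must compare unequal");

// Two distinct heap objects, compared at run time while both are alive.
bool distinct_heap_objects_equal()
{
    auto a = std::make_unique<int>(4);
    auto b = std::make_unique<int>(4);
    assert(a && b); // make_unique throws on failure, it never returns null
    return a.get() == b.get();
}
```

Both objects are still alive when the comparison happens, so no invalid pointer values are involved and the result is false.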
#include <iostream>
struct X
{
bool isNull() { return this == nullptr; }
bool isNullConst() const { return this == nullptr; }
};
bool isNull(X& x) { return &x == nullptr; }
bool isNullConst(const X& x) { return &x == nullptr; }
// always false or exception.
bool isNullCopy(X x) { return &x == nullptr; }
int main()
{
X* x = nullptr;
std::cout << x->isNull() << '\n';
std::cout << (*x).isNull() << '\n';
std::cout << isNull(*x) << '\n';
// std::cout << isNullCopy(*x) << '\n'; // exception.
}
Here, I know that X::isNull() is equivalent to isNull(X&) and that X::isNullConst() is equivalent to isNullConst(const X&).
What I did not know is that it's normal to dereference a null pointer. I thought that any dereferencing for a null pointer would result in an exception.
After playing with pointers for a bit, I concluded that dereferencing a null pointer itself is not the problem, the problem is trying to read or write to the address pointed to by the null pointer.
And since the functions are in a well known location in memory, dereferencing a null pointer to a class and calling one of its functions will just result in calling the function with the null object as the first parameter.
That was new to me, but that's probably not the complete picture.
At first I thought that this was an OOP concept and thus should work in Java, for example, but it didn't work there and caused an exception (which makes me wonder why it doesn't work in Java):
class X
{
boolean isNull() { return this == null; }
}
public class Main {
public static void main(String[] args) {
X x = null;
System.out.println(x.isNull());
}
}
So, clearly this is something related to C++ and not OOP in general.
What are all of the situations under which dereferencing a null pointer will be valid and won't cause exceptions?
Is there something else other than pointers of structs and classes that can be dereferenced successfully even if they're null pointers?
Also, why does calling a function on a null pointer without accessing its fields raise an exception in other languages like Java?
One case where dereferencing a null pointer makes sense is in Red-Black trees for example. Null pointers are considered to be black.
#define RED true
#define BLACK false
struct Node
{
bool color;
bool isRed()
{
return this != nullptr && this->color == RED;
}
};
bool isRed(Node* node)
{
return node != nullptr && node->color == RED;
}
Here, I believe it makes more sense to include the function in the Node class itself since it's related to it. It's not very convenient to include all of the logic related to the node inside it except for the one that checks for it being null.
I thought that any dereferencing for a null pointer would result in an exception.
No. Dereferencing a null pointer is undefined behavior in C++.
C++ is not Java. C++ does have exceptions, but they are only for exceptional cases, not used all over the place (as in Java). You are supposed to know that dereferencing a null pointer is not allowed, and a compiler assumes that it never happens in correct code. If it still happens, your code is invalid.
Read about undefined behavior. It is essential to know about it when you want to do anything serious in C++.
What are the rules for a valid dereferencing of a null pointer?
The rule is: You shall not do it. When you do it, your code has undefined behavior. The compiler is not required to issue an error or warning, and when you ask a compiler to compile your wrong code the result can be anything.
In Java your object declarations are references. So you can deliver a null reference to a method and it won't harm since the method can check if the reference points to a null object.
But calling a method on a null reference won't work, because the method is called on the object behind the reference. Since the reference is null, there is no object to call the method on, so a NullPointerException is thrown.
What are the rules for a valid dereferencing of a null pointer [in C++]?
The C++ standard is actually somewhat non-specific about whether indirecting through a null pointer is valid by itself or not. It is not disallowed explicitly. The standard used to use "dereferencing the null pointer" as an example of undefined behaviour, but this example has since been removed.
There is an active core language issue, CWG-232, titled "Is indirection through a null pointer undefined behavior?" where this is discussed. It has a proposed change of wording to explicitly allow indirection through a null pointer, and even to allow "empty" references in the language. The issue was created 20 years ago and was last updated 15 years ago, when the proposed wording was found insufficient.
Here are a few examples:
X* ptr = nullptr;
*ptr;
Above, the result of the indirection is discarded. This is a case where standard is not explicit about its validity one way or another. The proposed wording would have allowed this explicitly. This is also a fairly pointless operation.
X& x = *ptr;
X* ptr2 = &x; // ptr2 == nullptr?
Above, the result of indirection through null is bound to an lvalue reference. This is explicitly undefined behaviour now, but the proposed wording would have allowed this.
ptr->member_function();
Above, the result of indirection goes through lvalue-to-rvalue conversion. This has undefined behaviour regardless of what the function does, and would remain undefined in the proposed resolution of CWG-232. Same applies to all of your examples.
One consequence of this is that return this == nullptr; can be optimised to return false; because this can never be null in a well defined program.
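For the red-black tree case from the question, the null check can still live inside the class without ever calling a member function through a null pointer. A minimal sketch, assuming a static member function is acceptable; the Color enum, the left/right links and checkIsRed are illustrative assumptions and not part of the original code:

```cpp
#include <cassert>

struct Node
{
    enum class Color { Red, Black };
    Color color = Color::Black;
    Node* left = nullptr;
    Node* right = nullptr;

    // Static member: the node arrives as an ordinary pointer argument, so
    // passing nullptr is well defined and the check cannot be optimised away.
    static bool isRed(const Node* node)
    {
        return node != nullptr && node->color == Color::Red;
    }
};

// Small self-check showing the intended use.
void checkIsRed()
{
    Node red;
    red.color = Node::Color::Red;
    Node black;
    assert(Node::isRed(&red));
    assert(!Node::isRed(&black));
    assert(!Node::isRed(nullptr)); // null links count as black
}
```

The call site becomes Node::isRed(node->left) instead of node->left->isRed(), which keeps the logic in the class while avoiding indirection through a null pointer.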
Dereferencing a nullptr in C++ is an undefined behaviour, so technically anything can happen when you try to dereference a nullptr (and I mean: anything :)).
This code:
#include <iostream>
using namespace std;
int* fun()
{
int a = 5;
int* pointerA = &a;
cout << pointerA << endl;
return pointerA;
}
int main()
{
int* p = fun();
cout << p << endl;
return 0;
}
Prints the following:
0x[some address]
0
I understand that the variable a is deallocated when the function fun() returns, but why does cout << p << endl; print 0? Shouldn't it still point to the same address in memory, even though the variable is technically not there anymore? Is this a compiler feature or undefined behavior?
repro case
EDIT: I found the culprit. I am using CodeBlocks, and in this project's build options, there is a flag "optimize even more (for speed) [-O2]". If it is checked, I get 0, and if I uncheck the flag, I get the same address 0x[some address], which is expected behavior.
I apologize for not mentioning my IDE.
Using the return value of fun has implementation-defined behavior, because it is an invalid pointer value (the quote below explains why). On some platforms, it may even generate a runtime fault. So p's value is implementation-defined as well. Most likely it will also be an invalid pointer value, so using it is implementation-defined too.
[basic.stc]/4:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
It is probably a compiler feature. In this case it is very easy to see that the pointer returned by fun will be invalid and thus further usage of the pointer will result in undefined behaviour. If you try a different compiler it might be different. E.g. for me in Visual Studio 2012 it does return the actual address instead of 0.
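If the goal is to get the value (or a still-valid object) out of the function, two common alternatives avoid the dangling pointer entirely. This is only a sketch; fun_by_value, fun_owned and sum_results are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <memory>

// Return by value: the caller receives its own copy of a.
int fun_by_value()
{
    int a = 5;
    return a;
}

// Return an owning pointer: the int lives on the heap until the
// returned unique_ptr is destroyed.
std::unique_ptr<int> fun_owned()
{
    return std::make_unique<int>(5);
}

// Example caller using both variants.
int sum_results()
{
    std::unique_ptr<int> p = fun_owned();
    assert(p != nullptr);
    return fun_by_value() + *p;
}
```

In both variants the result does not depend on optimisation flags such as -O2, because no pointer outlives the object it points to.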
A variable introduces an identifier that is associated with a region of memory. Now, if we have a reference to that variable, is the reference's identifier also associated with that variable's region of memory? I created an image to better explain what I mean:
int variable = 0;
int &rVariable = variable;
Now, after the above code has been executed the result might look like the scheme below?
So we can think of a reference as an abstraction based on pointers. Thus, any operation (including taking the address of the reference) will actually be applied to the object that is referenced. The compiler may allocate memory for storing the reference, but that is not guaranteed.
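A short sketch of what "applied to the referenced object" means in practice, reusing the variable and rVariable names from the question; the function name is hypothetical:

```cpp
#include <cassert>

bool reference_aliases_object()
{
    int variable = 0;
    int& rVariable = variable;

    rVariable = 42; // writes to variable itself

    // sizeof applied to a reference yields the size of the referenced type.
    static_assert(sizeof(rVariable) == sizeof(int), "sizeof sees the referenced type");

    // Taking the address of the reference yields the address of variable.
    assert(&rVariable == &variable);
    return variable == 42;
}
```

Nothing here reveals whether the compiler reserved any storage for rVariable itself, which is consistent with that being left to the implementation.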
Your question is quite broad and thus hard to answer correctly. In concept a reference is an alias for the object. How a specific compiler implements this concept can vary, especially after optimization. In many cases the implementation essentially uses a pointer to the object, in which case the reference requires memory and what is stored there is probably just the address of the object.
struct S1{
int & ri;
S1(int &i):ri{i}{}
};
In this case I would expect the compiler to need memory in order to hold the reference.
struct S2{
int i;
int& ri;
S2():ri{i}{}
};
In this case the reference may not need memory.
At the end of the day one cannot rely on the way a particular compiler in a particular version handles references.
Whether, in your own understanding of a piece of code, you think of a reference as an alias or as a pointer that must always point to an object (i.e. is never nullptr), always points to the same object (i.e. is const), and is automatically dereferenced, is probably not important.
Although one can create a "null reference" like
int *i= nullptr;
int& ri=*i;
this is undefined behavior as explained here: Assigning a reference by dereferencing a NULL pointer
Undefined behavior means anything can happen. It might seem that on your compiler the reference still acts like a null pointer; however, here are a few examples where the compiler could act quite oddly (and is allowed to, with no error or warning):
int i;
//...
if(&i) {} //always true
else {
//optimized away
}
int* i;
//...
if(i){}
else{
//not optimized away
}
int *i = nullptr;
int& ri = *i;
//...
if(&ri){} //could be assumed always true!
else{
//could be optimized away !!
}
int *i = nullptr;
(*i)++; //runtime error, convention says user should have checked for nullness
int& ri = *i;
ri++; //probably runtime error, user does not know to check for nullness and may not be able to because of optimization assuming &ri != nullptr
struct S{
int i[1001];
};
S* s = nullptr;
S& rs = *s;
if(&rs){ //could be assumed always true
rs.i[1000] = 4; //may not result in runtime error, address could be 4000 (0xFA0, with 4-byte int),
// probably still not valid on a pc, on an embedded processor you could be
// changing clock speed or something nasty even though you checked for
// nullness!!!! Undefined behavior sucks!
}
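The safe counterpart to the snippets above is to do the null check on the pointer, before any reference is formed. A minimal sketch, with increment_if_present as a hypothetical helper name:

```cpp
#include <cassert>

// Checks the pointer first and only then binds a reference to the object.
bool increment_if_present(int* p)
{
    if (p == nullptr) {
        return false; // nothing to increment, and no indirection happened
    }
    int& r = *p;      // well defined: p is known to point to an object
    assert(&r == p);  // the reference aliases exactly that object
    ++r;
    return true;
}
```

Because the check is performed on a pointer, where null is a legitimate value, the compiler has no licence to assume it away, unlike a check on &ri after a reference has already been bound.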