Will the program:
#include <stdio.h>
struct foo
{
void blah() {printf("blah\n");}
int i;
};
int main(int, char**)
{
((foo*)NULL)->blah();
}
Ever crash, or do anything other than output blah, on any compiler you are aware of? Will any function crash, when called via a NULL pointer, if it doesn't access any members (including the vtable)?
There have been other questions on this topic, for instance Accessing class members on a NULL pointer and Is it legal/well-defined C++ to call a non-static method that doesn't access members through a null pointer?, and it is always pointed out that this results in undefined behavior. But is this undefined in the real world, or only in the standard's world? Does any extant compiler not behave as expected? Can you think of any plausible reason why any future compiler wouldn't behave as expected?
What if the function does modify members, but the NULL ptr is guarded against. For instance,
void foo::blah()
{
foo* pThis = this ? this : new foo();
pThis->i++;
}
Edit:
For the record, the reason I wanted this was to make the interface to my linked list class as easy and concise as possible. I wanted to initialize the list to NULL and have idiomatic usage look like:
pList = pList->Insert(elt);
pList = pList->Remove(elt);
...
Where all the operators return the new head element. Somehow I didn't realize that using a container class would make things even easier, with no downside.
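For comparison, the same "start from NULL, reassign the head" usage can be had without calling anything through a null pointer. The following is only a sketch under assumed names (Node, list_insert, list_remove, list_length and list_demo are hypothetical and not from the original post): the head is passed as an ordinary parameter, so a null value is legal.

```cpp
#include <cassert>

// Hypothetical singly linked list node; all names here are illustrative.
struct Node {
    int value;
    Node* next;
};

// 'head' is an ordinary parameter, so nullptr is a perfectly valid argument.
// Returns the new head of the list.
Node* list_insert(Node* head, int value) {
    return new Node{value, head};
}

// Removes the first node holding 'value', if any, and returns the new head.
Node* list_remove(Node* head, int value) {
    if (head == nullptr) return nullptr;
    if (head->value == value) {
        Node* rest = head->next;
        delete head;
        return rest;
    }
    head->next = list_remove(head->next, value);
    return head;
}

// Counts the nodes; an empty (null) list has length 0.
int list_length(const Node* head) {
    int n = 0;
    for (; head != nullptr; head = head->next) ++n;
    return n;
}

// Usage sketch mirroring pList = pList->Insert(elt) from the post.
void list_demo() {
    Node* pList = nullptr;
    pList = list_insert(pList, 1);
    pList = list_insert(pList, 2);
    assert(list_length(pList) == 2);
    pList = list_remove(pList, 1);
    pList = list_remove(pList, 2);
    assert(pList == nullptr);
}
```

The call sites are one token longer than the member-function form, but nothing here depends on how a particular compiler treats a null this.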
Can you think of any plausible reason why any future compiler wouldn't behave as expected?
A helpful compiler might add code to access the real object under the hood in debug builds in the hope of helping you catch this issue in your code early in the development cycle.
What if the function does modify members, but the NULL ptr is guarded against. For instance,
void foo::blah()
{
foo* pThis = this ? this : new foo();
pThis->i++;
}
Since it is undefined behavior to call that function with a null pointer, the compiler can assume that the test will always pass and optimize that function to:
void foo::blah()
{
this->i++;
}
Note that this is correct, since if this is not null, it behaves as-if the original code was executed, and if this was null, it would be undefined behavior and the compiler does not need to provide any particular behavior at all.
Undefined behavior means you can't rely on what will happen. However it's sometimes useful to know what's happening under the covers while you're debugging so that you're not surprised when the impossible happens.
Most compilers will code this as a simple function with a hidden this parameter, and if the this parameter is never referenced the code will work as expected.
Checking for this == NULL might not work, depending on how aggressively your compiler optimizes. Since a well-formed program couldn't possibly have this == NULL, the compiler is free to pretend that it will never happen and optimize away the if statement entirely. I know, though, that Microsoft's C++ compiler will not make this optimization, because MFC's GetSafeHwnd function relies on it working as expected.
Trying to guard for this == NULL wouldn't give you any real desirable effect. Mainly, dereferencing a NULL pointer is, AFAIK, undefined, and it works differently on different compilers. Even if it does work in one scenario (like this), it doesn't work in this scenario or this one (virtual functions). The second and third scenarios are understandable, since there is no instance whose vtable could be consulted to decide which virtual function to call. But I'm not sure the same can be said for the first.
The other thing that you need to consider is that any invalid pointer can also give the same type of error you'd want to guard against, like this. Notice that it successfully printed 'Foo' and then went into a runtime error trying to access a. This is because the memory location being pointed to by Test* t is invalid. The same behavior is seen here, when Test* t is NULL.
So, in general, avoid such behavior and such designs in your code. It's not predictable, and it would cause undesirable effects if someone comes after you and changes your code thinking it should behave as it did previously.
Related
As I understand Valgrind should report errors when code contains usage of uninitialized variables. In this toy example below printer is uninitialized, but program "happily" prints message anyway.
#include <iostream>
class Printer {
public:
void print() {
std::cout<<"I PRINT"<<std::endl;
}
};
int main() {
Printer* printer;
printer->print();
}
When I test this program with Valgrind it doesn't report any errors.
Is it expected behavior? And if yes, why so?
The variable is actually never used.
The method call is inlined [1], so the variable is not passed as an argument.
The method itself doesn't use this in any way, so the variable is not used at all.
Above is independent of turning optimizations on or off.
As a matter of fact, in optimized code the variable will never exist at all - not even as memory allocation.
Question about a similar case: Extern variable only in header unexpectedly working, why?
[1] All methods defined in the class body are implicitly declared inline.
Is it an Undefined Behavior?
Yes it is. Calling the method requires this to point at an actual, initialized instance of the object to be well-defined. As Nir Friedman points out, the compiler is free to assume that and optimize on that basis (and IIRC this kind of optimization can happen even with -O0!).
I'd personally expect the specific code in question to work in any practical conditions (as the pointer value is really irrelevant), but I would never rely on that. You should fix your code right now.
Detection
To detect usage of uninitialized variables in Clang/GCC, use option -Wuninitialized (or simply use -Wall, which includes this flag).
-Wuninitialized should mostly cover use of stack-allocated memory, though I guess some uses of stack-allocated arrays may still slip through. Some compilers may support including extra runtime checks for uninitialized reads with -fsanitize=... options, like -fsanitize=memory in Clang (thx, chtz). These checks should cover the edge cases as well as the use of heap-allocated memory.
The main() function has undefined behaviour, since printer is uninitialised and the statement printer->print() both accesses the value of printer and dereferences it via -> and the call of the member function.
Practically, however, a compiler is permitted to handle undefined behaviour by simply assuming it is not present. Then the compiler can, if it chooses, follow a chain of logic;
When it sees a statement like printer->print() this means it is allowed to reason that printer has a value that can be accessed and dereferenced without introducing undefined behaviour.
Based on this reasoning, it is then permitted to assume that printer must have been initialised (by some means invisible to the compiler) to point at a valid object.
Based on this assumption, it can reason that the statement printer->print() will result in a call of Printer::print().
Since the compiler can see the definition of Printer::print(), it can simply inline it, and execute the statement std::cout<<"I PRINT"<<std::endl.
Since it doesn't need to access printer at all to produce that output, it can optimise out any reference to the variable named printer in main().
If a compiler follows the above sequence of logic, the program will simply print I PRINT and exit, without accessing any memory in a way that might trigger a report from Valgrind.
If you think the above sounds far-fetched, then you are mistaken. LLVM/Clang is one compiler that notionally follows a chain of logic very similar to what I have described. For more information, have a look at the LLVM Project Blog: first article, second article, and third article.
In my early days with C++, I seem to recall you could call a member function with a NULL pointer, and check for that in the member function:
class Thing { public: void x(); };
void Thing::x()
{
    if (this == NULL) return; // nothing to do
    // ...do stuff...
}
Thing* p = NULL; //nullptr these days, of course
p->x(); //no crash
Doing this may seem silly, but it was absolutely wonderful when writing recursive functions to traverse data structures, where navigating could easily run into the blind alley of a NULL; navigation functions could do a single check for NULL at the top and then blithely call themselves to try to navigate deeper without littering the code with additional checks.
According to g++ at least, the freedom (if it ever existed) has been revoked. The compiler warns about it, and if compiling optimized, it causes crashes.
Question 1: does the C++ standard (any flavor) disallow a NULL this? Or is g++ just getting in my face?
Question 2. More philosophically, why? 'this' is just another pointer. The glory of pointers is that they can be nullptr, and that's a useful condition.
I know I can get around this by making static functions, passing as first parameter a pointer to the data structure (hellllo Days of C) and then check the pointer. I'm just surprised I'd need to.
Edit: To upvote an answer I'd like to see chapter and verse from the standard on why this is disallowed. Note that my example at NO POINT dereferences NULL. Nothing is virtual here, and p is copied to "argument this" but then checked before use. No dereference occurs, so dereference of NULL can't be used as a claim of UB.
People are making a knee-jerk reaction to *p and assuming it isn't valid if p is NULL. But it is, and the evidence is here:
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
In fact it calls out two cases when a pointer, p, is surprisingly valid as *p: when p is null or when p points one element past the end of an array. What you must never do is USE the value of *p... other than to take the address of it. &*p where p == nullptr for any pointer type p IS valid. It's fine to point out that p->x() is really (*p).x(), but at the end of the day that translates to x(&*p) and that is perfectly well formed and valid. For p=nullptr... it simply becomes x(nullptr).
I think my debate should be with the standards community; in their haste to undercut the concept of a null reference, they left wording unclear. Since no one here has demanded p->x() is UB without trying to demand that it's UB because *p is UB; and because *p is definitely not UB because no aspect of x() uses the referenced value, I'm going to put this down to g++ overreaching on a standard ambiguity. The absolutely identical mechanism using a static function and extra parameter is well defined, so it's not like it stops my refactor effort. Consider the question withdrawn; portable code can't assume this==nullptr will work but there's a portable solution available, so in the end it doesn't matter.
To be in a situation where this is nullptr implies you called a non-static member function without using a valid instance such as with a pointer set to nullptr. Since this is forbidden, to obtain a null this you must already be in undefined behavior. In other words, this is never nullptr unless you have undefined behavior. Due to the nature of undefined behavior, you can simplify the statement to simply be "this is never nullptr" since no rule needs to be upheld in the presence of undefined behavior.
Question 1: does the C++ standard (any flavor) disallow a NULL this? Or is g++ just getting in my face?
The C++ standard disallows it -- calling a method on a NULL pointer is officially 'undefined behavior' and you must avoid doing it or you will get bit. In particular, optimizers will assume that the this-pointer is non-NULL when making optimizations, leading to strange/unexpected behaviors at runtime (I know this from experience :))
Question 2. More philosophically, why? 'this' is just another pointer. The glory of pointers is that they can be nullptr, and that's a useful condition.
I'm not sure it matters, really; it's what is specified in the C++ standard, and they probably had their reasons (philosophical or otherwise), but since the standard specifies it, the compilers expect it, therefore as programmers we have to abide by it, or face undefined behavior. (One can imagine an alternate universe where NULL this-pointers are allowed, but we don't live there)
The question has already been answered - it is undefined behavior to dereference a null pointer, and using *obj or obj-> are both dereferencing.
Now (since I assume you have a question on how to work around this) the solution is to use static function:
class Foo {
public:
    void bar();
    static void bar_st(Foo* foo) { if (foo) foo->bar(); } // null is checked before any member call
};
Having said that, I do think that gcc's decision of eliminating all branches for nullptr this was not a wise one. Nobody gained by that, and a lot of people suffered. What's the benefit?
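The static-function workaround also covers the recursive traversal case described in the question: one null check at the top, and the function can recurse into children without further guards. This is a minimal sketch with assumed names (TreeNode, depth and tree_demo are hypothetical), not code from the thread.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical binary tree node, for illustration only.
struct TreeNode {
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;

    // Static member: 'node' is an explicit parameter, so comparing it with
    // nullptr is well defined (unlike testing 'this').
    static int depth(const TreeNode* node) {
        if (node == nullptr) return 0;  // the single check at the top
        return 1 + std::max(depth(node->left), depth(node->right));
    }
};

// Usage sketch: a root with one left child has depth 2; an empty tree has depth 0.
void tree_demo() {
    TreeNode leaf;
    TreeNode root;
    root.left = &leaf;
    assert(TreeNode::depth(nullptr) == 0);
    assert(TreeNode::depth(&root) == 2);
}
```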
C++ does not allow calling member functions on a null object. Objects need identity, and that cannot be stored in a null pointer. What would happen if a member function read or wrote a field of an object referenced by a null pointer?
It sounds like you could use the null object pattern in your code to get the result you want.
The null pointer is recognised as a problematic entity in object-oriented languages because in most languages it is not an object. This creates a need for code that specifically handles the case of something being null. While checking for a special null pointer is the norm, there are other approaches. In Smalltalk, nil is itself an object (an instance of UndefinedObject) with its own methods, and like all objects it can be extended. The Go programming language does allow calling methods on a nil pointer receiver (which sounds like what the question asks for).
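As a rough illustration of the null object pattern in C++ (all names below, such as Logger, NullLogger and logger_or_null, are hypothetical and chosen only for this sketch): instead of handing out a null pointer, the code hands out a real object whose methods do nothing interesting, so callers never need a null check.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface, for illustration only.
struct Logger {
    virtual ~Logger() = default;
    virtual std::string describe() const = 0;
};

struct ConsoleLogger : Logger {
    std::string describe() const override { return "console"; }
};

// Null object: a real instance with harmless default behaviour.
struct NullLogger : Logger {
    std::string describe() const override { return "none"; }
};

// Returns a shared do-nothing instance instead of a null pointer.
Logger& null_logger() {
    static NullLogger instance;
    return instance;
}

// Converts a possibly-null pointer into a reference that is always valid.
Logger& logger_or_null(Logger* maybe) {
    return maybe ? *maybe : null_logger();
}

// Usage sketch: virtual calls are safe because an object always exists.
void logger_demo() {
    ConsoleLogger console;
    assert(logger_or_null(&console).describe() == "console");
    assert(logger_or_null(nullptr).describe() == "none");
}
```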
this can also be invalid without being null: after delete this (which is possible but not recommended), the pointer is left dangling, so a null check would not catch it.
CallingClass::CallingFunc()
{
SomeClass obj;
obj.Construct(*Singleton::GetInstance()); // passing the listener
// Singleton::GetInstance() returns a static pointer.
//Singleton is derived from IListener
}
SomeClass::Construct(const IListener &listener)
{
IListener* pListener = const_cast<IListener*>(&listener);
}
After const_cast pListener is null.
Is it possible to perform such typecasting?
Thanks
So let me see. You have two-phase initialization, a Singleton, and casting away const, and you're de-referencing an object just to take its address again? A stray NULL pointer is the least of your concerns, my friend.
Throw it away and write it again from scratch. And pick up a C++ book first.
Just so you know, const_cast cannot produce a null pointer unless it was passed one. GetInstance() must be returning NULL to produce this behaviour, which is formally UB as soon as you de-reference it.
const_cast is basically an instruction to the compiler to ignore the constness of something. Use of it is to be avoided, because you are overriding the compiler protection, and it can lead to a crash as you write something that attempts to update read-only memory.
However, it doesn't actually cause any code to be generated.
Therefore, if this:
IListener* pListener = const_cast<IListener*>(&listener);
results in pListener being NULL, then &listener is NULL, which is impossible (or you are returning a null reference for your singleton, or you are missing something out from your description of the problem).
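A small sketch of that claim, using a hypothetical Listener type in place of IListener (strip_const and const_cast_demo are also assumed names): the pointer obtained through const_cast compares equal to the address of the original object, so it can only be null if the input already was.

```cpp
#include <cassert>

struct Listener { int calls = 0; };  // hypothetical stand-in for IListener

// Casting away const yields a pointer to the very same object;
// only the type changes, not the address.
Listener* strip_const(const Listener& listener) {
    return const_cast<Listener*>(&listener);
}

// Usage sketch: 'original' is not const, so writing through the result is fine.
void const_cast_demo() {
    Listener original;
    Listener* p = strip_const(original);
    assert(p == &original);
    p->calls++;
    assert(original.calls == 1);
}
```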
Having said which I agree strongly with the answer from DeadMG.
Creating an empty object and doing an Init on it (2-phase construction) is to be avoided. Properly created objects should be valid, and if you have an Init method, it isn't.
Removing the constness from anything is to be avoided - it is extremely likely to produce surprising behaviour.
The amount of de-and-rereferencing in that code is going to give anyone a headache.
Two questions:
What are you trying to achieve here?
How much control have you got over the code? (i.e. what are you able to change?)
Without wishing to be unkind I would honestly say that it might be better to start again. There are a couple of issues I would have with this code:
Firstly, the Singleton pattern should ensure that only one of a specific object is ever created, therefore it is usually returned by pointer, reference or some derivative thereof (i.e. boost shared pointer etc.) It need not necessarily be const though and the fact that it is here indicates that the author did not intend it to be used in a non-const way.
Second, you're then passing this object by reference into a function. No need. That's one of the major features (and drawbacks) of the singleton pattern: You can access it from anywhere. So you could just as easily write:
SomeClass::Construct()
{
IListener* pListener = const_cast<IListener*>(static_cast<const IListener*>(Singleton::GetInstance()));
}
Although this still doesn't really help you. One thing it does do is make your interface a bit clearer. You see, when you write SomeClass::Construct(const IListener& listener), anyone reading your code could reasonably infer that listener is treated as const within the function, and by using const_cast you've broken that implied contract. This is a very good reason that you should not use const_cast - at least not in these circumstances.
The fundamental question that you need to ask yourself is when your IListener is const, why do you need to use it in a non-const way within Construct? Either the singleton should not return a const object or your function should not need it to be non-const.
This is a design issue that you need to sort out before you take any further steps.
I was hoping that someone could clarify exactly what is meant by undefined behaviour in C++. Given the following class definition:
class Foo
{
public:
explicit Foo(int Value): m_Int(Value) { }
void SetValue(int Value) { m_Int = Value; }
private:
Foo(const Foo& rhs);
const Foo& operator=(const Foo& rhs);
private:
int m_Int;
};
If I've understood correctly the two const_casts to both a reference and a pointer in the following code will remove the const-ness of the original object of type Foo, but any attempts made to modify this object through either the pointer or the reference will result in undefined behaviour.
int main()
{
const Foo MyConstFoo(0);
Foo& rFoo = const_cast<Foo&>(MyConstFoo);
Foo* pFoo = const_cast<Foo*>(&MyConstFoo);
//MyConstFoo.SetValue(1); //Error as MyConstFoo is const
rFoo.SetValue(2); //Undefined behaviour
pFoo->SetValue(3); //Undefined behaviour
return 0;
}
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined. I know that const_casts are, broadly speaking, frowned upon, but I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed, for example:
Foo& rAnotherFoo = (Foo&)MyConstFoo;
Foo* pAnotherFoo = (Foo*)&MyConstFoo;
rAnotherFoo.SetValue(4);
pAnotherFoo->SetValue(5);
In what circumstances might this behaviour cause a fatal runtime error? Is there some compiler setting that I can set to warn me of this (potentially) dangerous behaviour?
NB: I use MSVC2008.
I was hoping that someone could clarify exactly what is meant by undefined behaviour in C++.
Technically, "Undefined Behaviour" means that the language defines no semantics for doing such a thing.
In practice, this usually means "don't do it; it can break when your compiler performs optimisations, or for other reasons".
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
In this specific example, attempting to modify any non-mutable object may "appear to work", or it may overwrite memory that doesn't belong to the program or that belongs to [part of] some other object, because the non-mutable object might have been optimised away at compile-time, or it may exist in some read-only data segment in memory.
The factors that may lead to these things happening are simply too complex to list. Consider the case of dereferencing an uninitialised pointer (also UB): the "object" you're then working with will have some arbitrary memory address that depends on whatever value happened to be in memory at the pointer's location; that "value" is potentially dependent on previous program invocations, previous work in the same program, storage of user-provided input etc. It's simply not feasible to try to rationalise the possible outcomes of invoking Undefined Behaviour so, again, we usually don't bother and instead just say "don't do it".
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
As a further complication, compilers are not required to diagnose (emit warnings/errors) for Undefined Behaviour, because code that invokes Undefined Behaviour is not the same as code that is ill-formed (i.e. explicitly illegal). In many cases, it's not tractable for the compiler to even detect UB, so this is an area where it is the programmer's responsibility to write the code properly.
The type system — including the existence and semantics of the const keyword — presents basic protection against writing code that will break; a C++ programmer should always remain aware that subverting this system — e.g. by hacking away constness — is done at your own risk, and is generally A Bad Idea.™
I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed.
Absolutely. With warning levels set high enough, a sane compiler may choose to warn you about this, but it doesn't have to and it may not. In general, this is a good reason why C-style casts are frowned upon, but they are still supported for backwards compatibility with C. It's just one of those unfortunate things.
Undefined behaviour depends on the way the object was born; you can see Stephan explaining it at around 00:10:00, but essentially, follow the code below:
void f(int const &arg)
{
int &danger(const_cast<int&>(arg));
danger = 23; // When is this UB?
}
Now there are two cases for calling f
int K(1);
f(K); // OK
const int AK(1);
f(AK); // triggers undefined behaviour
To sum up, K was born a non const, so the cast is ok when calling f, whereas AK was born a const so ... UB it is.
Undefined behaviour literally means just that: behaviour which is not defined by the language standard. It typically occurs in situations where the code is doing something wrong, but the error can't be detected by the compiler. The only way to catch the error would be to introduce a run-time test - which would hurt performance. So instead, the language specification tells you that you mustn't do certain things and, if you do, then anything could happen.
In the case of writing to a constant object, using const_cast to subvert the compile-time checks, there are three likely scenarios:
it is treated just like a non-constant object, and writing to it modifies it;
it is placed in write-protected memory, and writing to it causes a protection fault;
it is replaced (during optimisation) by constant values embedded in the compiled code, so after writing to it, it will still have its initial value.
In your test, you ended up in the first scenario - the object was (almost certainly) created on the stack, which is not write protected. You may find that you get the second scenario if the object is static, and the third if you enable more optimisation.
In general, the compiler can't diagnose this error - there is no way to tell (except in very simple examples like yours) whether the target of a reference or pointer is constant or not. It's up to you to make sure that you only use const_cast when you can guarantee that it's safe - either when the object isn't constant, or when you're not actually going to modify it anyway.
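The two safe situations named above can be sketched as follows. This is an illustration under assumptions: legacy_length stands in for some hypothetical older API that takes a non-const pointer but only reads through it, and set_to_23 mirrors the "born non-const" case; neither comes from a real library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical legacy function: non-const parameter, but it never writes.
std::size_t legacy_length(char* text) {
    std::size_t n = 0;
    while (text[n] != '\0') ++n;
    return n;
}

// Safe case 1: const is cast away, but nothing is modified.
std::size_t length_of(const char* text) {
    return legacy_length(const_cast<char*>(text));
}

// Safe case 2: the write is fine only if the referenced object was not
// declared const; calling this with a truly const object would be UB.
void set_to_23(const int& arg) {
    const_cast<int&>(arg) = 23;
}

// Usage sketch covering both cases with well-defined inputs.
void safe_cast_demo() {
    assert(length_of("blah") == 4);
    int k = 1;          // born non-const
    set_to_23(k);
    assert(k == 23);
}
```

The compiler cannot check either precondition; both rest on the caller's knowledge of how the object was created and what the callee does with it.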
What is puzzling me is why this appears to work
That is what undefined behavior means.
It can do anything including appear to work.
If you increase your optimization level to its top value it will probably stop working.
but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
At the point where it does the modification, the object is not const. In the general case it cannot tell that the object was originally const, therefore it is not possible to warn you. Even if it could, each statement is evaluated on its own without reference to the others (when looking at that kind of warning generation).
Secondly, by using a cast you are telling the compiler "I know what I am doing; override all your safety features and just do it".
For example, the following works just fine (or will seem to, in the nasal-demon kind of way):
float aFloat;
int& anIntRef = (int&)aFloat; // I know what I am doing; ignore the fact that this is not sensible
int* anIntPtr = (int*)&aFloat;
anIntRef = 12;
*anIntPtr = 13;
I know that const_casts are, broadly speaking, frowned upon
That is the wrong way to look at them. They are a way of documenting in the code that you are doing something strange that needs to be validated by smart people (as the compiler will obey the cast without question). The reason you need a smart person to validate is that it can lead to undefined behavior, but the good thing is that you have now explicitly documented this in your code (and people will definitely look closely at what you have done).
but I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed, for example:
In C++ there is no need to use a C style cast.
In the worst case the C-style cast can be replaced by reinterpret_cast<>, but when porting code you want to see if you could have used static_cast<>. The point of the C++ casts is to make them stand out so you can see them and at a glance spot the difference between the dangerous casts and the benign casts.
A classic example would be trying to modify a const string literal, which may exist in a protected data segment.
Compilers may place const data in read only parts of memory for optimization reasons and attempt to modify this data will result in UB.
Static and const data are often stored in a different part of your program than local variables. For const variables, these areas are often in read-only mode to enforce the constness of the variables. Attempting to write to read-only memory results in "undefined behavior" because the reaction depends on your operating system. "Undefined behavior" means that the language doesn't specify how this case is to be handled.
If you want a more detailed explanation about memory, I suggest you read this. It's an explanation based on UNIX, but similar mechanisms are used on all operating systems.
Ok, so I know that technically this is undefined behavior, but nonetheless, I've seen this more than once in production code. And please correct me if I'm wrong, but I've also heard that some people use this "feature" as a somewhat legitimate substitute for a lacking aspect of the current C++ standard, namely, the inability to obtain the address (well, offset really) of a member function. For example, this is out of a popular implementation of a PCRE (Perl-compatible Regular Expression) library:
#ifndef offsetof
#define offsetof(p_type,field) ((size_t)&(((p_type *)0)->field))
#endif
One can debate whether the exploitation of such a language subtlety in a case like this is valid or not, or even necessary, but I've also seen it used like this:
struct Result
{
void stat()
{
if(this)
// do something...
else
// do something else...
}
};
// ...somewhere else in the code...
((Result*)0)->stat();
This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right? So the question remains: Is there a practical use case, where one would benefit from using such a construct? I'm especially concerned about the second case, since the first case is more of a workaround for a language limitation. Or is it?
PS. Sorry about the C-style casts, unfortunately people still prefer to type less if they can.
The first case is not calling anything. It's taking the address. That's a defined, permitted, operation. It yields the offset in bytes from the start of the object to the specified field. This is a very, very, common practice, since offsets like this are very commonly needed. Not all objects can be created on the stack, after all.
The second case is reasonably silly. The sensible thing would be to declare that method static.
I don't see any benefit of ((Result*)0)->stat(); - it is an ugly hack which will likely break sooner than later. The proper C++ approach would be using a static method Result::stat() .
offsetof() on the other hand is legal, as the offsetof() macro never actually calls a method or accesses a member, but only performs address calculations.
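For new code there is usually no need for the hand-rolled null-pointer macro at all, since the standard library already provides offsetof in <cstddef>. A minimal sketch, with Record and the helper functions as hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout struct, for illustration only.
struct Record {
    int id;
    double score;
};

// The standard macro yields the byte offset as a std::size_t constant.
std::size_t id_offset() { return offsetof(Record, id); }
std::size_t score_offset() { return offsetof(Record, score); }

// The first member of a standard-layout struct sits at offset 0.
static_assert(offsetof(Record, id) == 0, "first member sits at offset 0");

// Usage sketch: the second member lies after the first and inside the struct.
void offset_demo() {
    assert(score_offset() >= sizeof(int));
    assert(score_offset() + sizeof(double) <= sizeof(Record));
}
```

The exact offset of score depends on the platform's alignment rules, which is why the sketch only checks bounds rather than a fixed number.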
Everybody else has done a good job of reiterating that the behavior is undefined. But let's pretend it wasn't, and that p->member is allowed to behave in a consistent manner under certain circumstances even if p isn't a valid pointer.
Your second construct would still serve almost no purpose. From a design perspective, you've probably done something wrong if a single function can do its job both with and without accessing members, and if it can then splitting the static portion of the code into a separate, static function would be much more reasonable than expecting your users to create a null pointer to operate on.
From a safety perspective, you've only protected against a small portion of the ways an invalid this pointer can be created. There's uninitialized pointers, for starters:
Result* p;
p->stat(); //Oops, 'this' is some random value
There's pointers that have been initialized, but are still invalid:
Result* p = new Result;
delete p;
p->stat(); //'this' points to "safe" memory, but the data doesn't belong to you
And even if you always initialize your pointers, and absolutely never accidentally reuse free'd memory:
struct Struct {
int i;
Result r;
};
int main() {
((Struct*)0)->r.stat(); //'this' is likely sizeof(int), not 0
}
So really, even if it weren't undefined behavior, it is worthless behavior.
Although libraries targeting specific C++ implementations may do this, that doesn't mean it's "legitimate" generally.
This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?
No, because although it might work fine on some C++ implementations, it is perfectly okay for it to not work on any conforming C++ implementation.
Dereferencing a null-pointer is undefined behavior and anything can happen if you do it. Don't do it if you want a program that works.
Just because it doesn't immediately crash in one specific test case doesn't mean that it won't get you into all kinds of trouble.
Undefined behaviour is undefined behaviour. Do these tricks "work" for your particular compiler? Well, possibly. Will they work for the next iteration of it, or for another compiler? Possibly not. You pays your money and you takes your choice. I can only say that in nearly 25 years of C++ programming I've never felt the need to do any of these things.
Regarding the statement:
It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?
The code is not legitimate. There is no guarantee that the compiler and/or runtime will actually call the method when the pointer is NULL. The check inside the method is of no help, because you can't assume that the method will actually end up being called with a NULL this pointer.