Let's say whe have
class Foo{
public:
bool error;
......
bool isValid(){return error==false;}
};
and somewhere
Foo *aFoo=NULL;
I usually would do if (aFoo!=NULL && aFoo->isValid()).....
But what if in the isValid method I test the nullity:
bool isValid(){return this!=NULL && error==false)
That would simplify the external testing with simply calling if (aFoo->isValid())
I've tested it in some compilers and it works but I wonder if it is standard and could cause problems when porting to other environments.
The compiler is free to optimize away the check -- calling any non-static member of any class through an invalid (or NULL pointer) is undefined behavior. Please don't do this.
Why not simply a namespace-scope function like this?
bool isValid(Foo* f) {return f && f->isValid();}
An if-Statement like
if (aFoo->isValid())
Implies that the pointer is pointing to a valid object. It would be a huge source of confusion and very error prone.
Finally, your code would indeed invoke undefined behavior - aFoo->isValid is per definition equivalent to (*aFoo).isValid:
N3337, §5.2.5/2
The expression E1->E2 is converted to the equivalent form
(*(E1)).E2;
which would dereference a null pointer to obtain a null reference, which is clearly undefined:
N3337, §8.3.2/5
[ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the
“object” obtained by dereferencing a null pointer, which causes
undefined behavior. […] — end note ]
Generally it would be bad design and in standard C++ it doesn't make much sense as your internal NULL check implies that you would call a null pointer, which is undefined behavior.
This topic was discusses here:
Checking if this is null
Related
It is, as far as I have known, been a good rule that a pointer like argument type to a function should be a pointer if the argument can sensible be null and it should be a reference if the argument should never be null.
Based on that "rule", I have naiively expected that doing something like
someMethodTakingAnIntReference(*aNullPointer) would fail when trying to make the call, but to my surprise the following code is running just fine which kinda makes "the rule" less usable. A developer can still read meaning from the argument type being reference, but the compiler doesn't help and the location of the runtime error does not either.
Am I misunderstanding the point of this rule, or is this undefined behavior, or...?
int test(int& something1, int& something2)
{
return something2;
}
int main()
{
int* i1 = nullptr;
int* i2 = new int{ 7 };
//this compiles and runs fine returning 7.
//I expected the *i1 to cause an error here where test is called
return test(*i1, *i2);
}
While the above works, obviously the following does not, but the same would be true if the references were just pointers; meaning that the rule and the compiler is not really helping.
int test(int& something1, int& something2)
{
return something1+something2;
}
int main()
{
int* i1 = nullptr;
int* i2 = new int{ 7 };
//this compiles and runs returning 7.
//I expected the *i1 to cause an error here where test is called
return test(*i1, *i2);
}
Writing test(*i1, *i2) causes undefined behaviour; specifically the part *i1. This is covered in the C++ Standard by [expr.unary.op]/1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
This defines the behaviour of *X only for the case where X points to an object or function. Since i1 does not point to an object or function, the standard does not define the behaviour of *i1, therefore it is undefined behaviour. (This is sometimes known as "undefined by omission", and this same practice handles many other uses of lvalues that don't designate objects).
As described in the linked page, undefined behaviour does not necessitate any sort of diagnostic message. The runtime behaviour could literally be anything. The compiler could, but is not required to, generate a compilation warning or error. In general, it's up to the programmer to comply with the rules of the language. The compiler helps out to some extent but it cannot cover all cases.
You're better off thinking of references as little more than a handy notation for pointers.
They are still pointers, and the runtime error occurs when you use (dereference) a null pointer, not when you pass it to a function.
(An added advantage of references is that they can not be changed to reference something else, once initialized.)
I know that casting away const-ness should be done with care, and any attempt to remove const-ness from an initially const object followed by modifying the object results in undefined behaviour. What if we want to remove const-ness so that we may invoke a non-const function which doesn't modify the object? I know that we should actually mark such function const, but suppose I'm using a "bad" code that doesn't have the const version available.
So, to summarize, is the code below "safe"? My guess is that as long as you don't end up modifying the object you're ok, but I'm not 100% sure.
#include <iostream>
struct Foo
{
void f() // doesn't modify the instance, although is not marked const
{
std::cout << "Foo::f()" << std::endl;
}
};
int main()
{
const Foo foo;
const_cast<Foo&>(foo).f(); // is this safe?
}
Undefined behavior with respect to const_cast is defined by the C++11 standard's §3.8/9 (§3.8 is “Object lifetime”):
” Creating a new object at the storage location that a const object with static, thread, or automatic storage duration occupies or, at the storage location that such a const object used to occupy before its lifetime ended results in undefined behavior.
and §7.1.6.1/4 (§7.1.6.1 is “The cv-qualifiers”)
” Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const
object during its lifetime (3.8) results in undefined behavior.
In other words, you get UB if you modify an originally const object, and otherwise 1not.
The const_cast itself doesn't introduce UB.
There is additionally a non-normative note at §5.2.11/7 that “depending on the type” a write through the pointer or reference obtained from a const_cast, may have undefined behavior.
This non-normative note is so wooly that it has its own non-normative footnote, that explains that “const_cast is not limited to conversions that cast away a const-qualifier.”.
However, still with that clarification I fail to think of any case where the write could be well-defined or not depending on the type, i.e., I fail to make sense of this note. Two other answers here focus on the word “write” in this note, and that's necessary to get into UB-land via §3.8/9, yes. The to me rather fishy aspect is the “depending on the type”, which appears to be the significant part of that note.
1) Except insofar as UB-rules about other non-const_cast-related things get into play, e.g. nulling a pointer that's later dereferenced in a context other than as typeid-expression.
This particular example happens to be safe (have well defined behavior) because there is no write to an object which is declared const.
We have this in [dcl.type.cv]:
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const object during its lifetime (3.8) results in undefined behavior.
And there is a note (non-normative) in [expr.const.cast] which states:
[Note: Depending on the type of the object, a write operation through the pointer, lvalue or pointer
to data member resulting from a const_cast that casts away a const-qualifier may produce undefined
behavior (7.1.6.1). —end note ]
An attempt to modify an object or a write operation after a const_cast [may] result in undefined behavior. Here, we have no write operation.
In one of the projects I'm working on, I'm seeing this code
struct Base {
virtual ~Base() { }
};
struct ClassX {
bool isHoldingDerivedObj() const {
return typeid(1 ? *m_basePtr : *m_basePtr) == typeid(Derived);
}
Base *m_basePtr;
};
I have never seen typeid used like that. Why does it do that weird dance with ?:, instead of just doing typeid(*m_basePtr)? Could there be any reason? Base is a polymorphic class (with a virtual destructor).
EDIT: At another place of this code, I'm seeing this and it appears to be equivalently "superfluous"
template<typename T> T &nonnull(T &t) { return t; }
struct ClassY {
bool isHoldingDerivedObj() const {
return typeid(nonnull(*m_basePtr)) == typeid(Derived);
}
Base *m_basePtr;
};
I think it is an optimisation! A little known and rarely (you could say "never") used feature of typeid is that a null dereference of the argument of typeid throws an exception instead of the usual UB.
What? Are you serious? Are you drunk?
Indeed. Yes. No.
int *p = 0;
*p; // UB
typeid (*p); // throws
Yes, this is ugly, even by the C++ standard of language ugliness.
OTOH, this does not work anywhere inside the argument of typeid, so adding any clutter will cancel this "feature":
int *p = 0;
typeid(1 ? *p : *p); // UB
typeid(identity(*p)); // UB
For the record: I am not claiming in this message that automatic checking by the compiler that a pointer is not null before doing a dereference is necessarily a crazy thing. I am only saying that doing this check when the dereference is the immediate argument of typeid, and not elsewhere, is totally crazy. (Maybe is was a prank inserted in some draft, and never removed.)
For the record: I am not claiming in the previous "For the record" that it makes sense for the compiler to insert automatic checks that a pointer is not null, and to to throw an exception (as in Java) when a null is dereferenced: in general, throwing an exception on a null dereference is absurd. This is a programming error so an exception will not help. An assertion failure is called for.
The only effect I can see is that 1 ? X : X gives you X as an rvalue instead of plain X which would be an lvalue. This can matter to typeid() for things like arrays (decaying to pointers) but I don't think it would matter if Derived is known to be a class. Perhaps it was copied from someplace where the rvalue-ness did matter? That would support the comment about "cargo cult programming"
Regarding the comment below I did a test and sure enough typeid(array) == typeid(1 ? array : array), so in a sense I'm wrong, but my misunderstanding could still match the misunderstanding that lead to the original code!
This behaviour is covered by [expr.typeid]/2 (N3936):
When typeid is applied to a glvalue expression whose type is a polymorphic class type, the result refers to a std::type_info object representing the type of the most derived object (that is, the dynamic type) to which the glvalue refers. If the glvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value, the typeid expression throws an exception of a type that would match a handler of type std::bad_typeid exception.
The expression 1 ? *p : *p is always an lvalue. This is because *p is an lvalue, and [expr.cond]/4 says that if the second and third operand to the ternary operator have the same type and value category, then the result of the operator has that type and value category also.
Therefore, 1 ? *m_basePtr : *m_basePtr is an lvalue with type Base. Since Base has a virtual destructor, it is a polymorphic class type.
Therefore, this code is indeed an example of "When typeid is applied to a glvalue expression whose type is a polymorphic class type" .
Now we can read the rest of the above quote. The glvalue expression was not "obtained by applying the unary * operator to a pointer" - it was obtained via the ternary operator. Therefore the standard does not require that an exception be thrown if m_basePtr is null.
The behaviour in the case that m_basePtr is null would be covered by the more general rules about dereferencing a null pointer (which are a bit murky in C++ actually but for practical purposes we'll assume that it causes undefined behaviour here).
Finally: why would someone write this? I think curiousguy's answer is the most plausible suggestion so far: with this construct, the compiler does not have to insert a null pointer test and code to generate an exception, so it is a micro-optimization.
Presumably the programmer is either happy enough that this will never be called with a null pointer, or happy to rely on a particular implementation's handling of null pointer dereference.
I suspect some compiler was, for the simple case of
typeid(*m_basePtr)
returning typeid(Base) always, regardless of the runtime type. But turning it to an expression/temporary/rvalue made the compiler give the RTTI.
Question is which compiler, when, etc. I think GCC had problems with typeid early on, but it is a vague memory.
Is this piece of code valid (and defined behavior)?
int &nullReference = *(int*)0;
Both g++ and clang++ compile it without any warning, even when using -Wall, -Wextra, -std=c++98, -pedantic, -Weffc++...
Of course the reference is not actually null, since it cannot be accessed (it would mean dereferencing a null pointer), but we could check whether it's null or not by checking its address:
if( & nullReference == 0 ) // null reference
References are not pointers.
8.3.2/1:
A reference shall be initialized to
refer to a valid object or function.
[Note: in particular, a null reference
cannot exist in a well-defined
program, because the only way to
create such a reference would be to
bind it to the “object” obtained by
dereferencing a null pointer, which
causes undefined behavior. As
described in 9.6, a reference cannot
be bound directly to a bit-field. ]
1.9/4:
Certain other operations are described
in this International Standard as
undefined (for example, the effect of
dereferencing the null pointer)
As Johannes says in a deleted answer, there's some doubt whether "dereferencing a null pointer" should be categorically stated to be undefined behavior. But this isn't one of the cases that raise doubts, since a null pointer certainly does not point to a "valid object or function", and there is no desire within the standards committee to introduce null references.
The answer depends on your view point:
If you judge by the C++ standard, you cannot get a null reference because you get undefined behavior first. After that first incidence of undefined behavior, the standard allows anything to happen. So, if you write *(int*)0, you already have undefined behavior as you are, from a language standard point of view, dereferencing a null pointer. The rest of the program is irrelevant, once this expression is executed, you are out of the game.
However, in practice, null references can easily be created from null pointers, and you won't notice until you actually try to access the value behind the null reference. Your example may be a bit too simple, as any good optimizing compiler will see the undefined behavior, and simply optimize away anything that depends on it (the null reference won't even be created, it will be optimized away).
Yet, that optimizing away depends on the compiler to prove the undefined behavior, which may not be possible to do. Consider this simple function inside a file converter.cpp:
int& toReference(int* pointer) {
return *pointer;
}
When the compiler sees this function, it does not know whether the pointer is a null pointer or not. So it just generates code that turns any pointer into the corresponding reference. (Btw: This is a noop since pointers and references are the exact same beast in assembler.) Now, if you have another file user.cpp with the code
#include "converter.h"
void foo() {
int& nullRef = toReference(nullptr);
cout << nullRef; //crash happens here
}
the compiler does not know that toReference() will dereference the passed pointer, and assume that it returns a valid reference, which will happen to be a null reference in practice. The call succeeds, but when you try to use the reference, the program crashes. Hopefully. The standard allows for anything to happen, including the appearance of pink elephants.
You may ask why this is relevant, after all, the undefined behavior was already triggered inside toReference(). The answer is debugging: Null references may propagate and proliferate just as null pointers do. If you are not aware that null references can exist, and learn to avoid creating them, you may spend quite some time trying to figure out why your member function seems to crash when it's just trying to read a plain old int member (answer: the instance in the call of the member was a null reference, so this is a null pointer, and your member is computed to be located as address 8).
So how about checking for null references? You gave the line
if( & nullReference == 0 ) // null reference
in your question. Well, that won't work: According to the standard, you have undefined behavior if you dereference a null pointer, and you cannot create a null reference without dereferencing a null pointer, so null references exist only inside the realm of undefined behavior. Since your compiler may assume that you are not triggering undefined behavior, it can assume that there is no such thing as a null reference (even though it will readily emit code that generates null references!). As such, it sees the if() condition, concludes that it cannot be true, and just throw away the entire if() statement. With the introduction of link time optimizations, it has become plain impossible to check for null references in a robust way.
TL;DR:
Null references are somewhat of a ghastly existence:
Their existence seems impossible (= by the standard),
but they exist (= by the generated machine code),
but you cannot see them if they exist (= your attempts will be optimized away),
but they may kill you unaware anyway (= your program crashes at weird points, or worse).
Your only hope is that they don't exist (= write your program to not create them).
I do hope that will not come to haunt you!
clang++ 3.5 even warns on it:
/tmp/a.C:3:7: warning: reference cannot be bound to dereferenced null pointer in well-defined C++ code; comparison may be assumed to
always evaluate to false [-Wtautological-undefined-compare]
if( & nullReference == 0 ) // null reference
^~~~~~~~~~~~~ ~
1 warning generated.
If your intention was to find a way to represent null in an enumeration of singleton objects, then it's a bad idea to (de)reference null (it C++11, nullptr).
Why not declare static singleton object that represents NULL within the class as follows and add a cast-to-pointer operator that returns nullptr ?
Edit: Corrected several mistypes and added if-statement in main() to test for the cast-to-pointer operator actually working (which I forgot to.. my bad) - March 10 2015 -
// Error.h
class Error {
public:
static Error& NOT_FOUND;
static Error& UNKNOWN;
static Error& NONE; // singleton object that represents null
public:
static vector<shared_ptr<Error>> _instances;
static Error& NewInstance(const string& name, bool isNull = false);
private:
bool _isNull;
Error(const string& name, bool isNull = false) : _name(name), _isNull(isNull) {};
Error() {};
Error(const Error& src) {};
Error& operator=(const Error& src) {};
public:
operator Error*() { return _isNull ? nullptr : this; }
};
// Error.cpp
vector<shared_ptr<Error>> Error::_instances;
Error& Error::NewInstance(const string& name, bool isNull = false)
{
shared_ptr<Error> pNewInst(new Error(name, isNull)).
Error::_instances.push_back(pNewInst);
return *pNewInst.get();
}
Error& Error::NOT_FOUND = Error::NewInstance("NOT_FOUND");
//Error& Error::NOT_FOUND = Error::NewInstance("UNKNOWN"); Edit: fixed
//Error& Error::NOT_FOUND = Error::NewInstance("NONE", true); Edit: fixed
Error& Error::UNKNOWN = Error::NewInstance("UNKNOWN");
Error& Error::NONE = Error::NewInstance("NONE");
// Main.cpp
#include "Error.h"
Error& getError() {
return Error::UNKNOWN;
}
// Edit: To see the overload of "Error*()" in Error.h actually working
Error& getErrorNone() {
return Error::NONE;
}
int main(void) {
if(getError() != Error::NONE) {
return EXIT_FAILURE;
}
// Edit: To see the overload of "Error*()" in Error.h actually working
if(getErrorNone() != nullptr) {
return EXIT_FAILURE;
}
}
I came across this code:
void f(const std::string &s);
And then a call:
f( *((std::string*)NULL) );
And I was wondering what others think of this construction, it is used to signal that function f() should use some default value (which it computes) instead of some user provided value.
I am not sure what to think of it, it looks weird but what do you think of this construction?
No. It is undefined behaviour and can lead to code to do anything (including reformatting you hard disk, core dumping or insulting your mother).
If you need to be able to pass NULL, then use pointers. Code that takes a reference can assume it refers to a valid object.
Addendum: The C++03 Standard (ISO/IEC 14882, 2nd edition 2003) says, in §8.3.2 "References", paragraph 4:
A reference shall be initialized to refer to a valid object
or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only
way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer,
which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]
[Bold added for emphasis]
You will sometimes see constructions like this in fairly esoteric template library code, but only inside a sizeof() where it is harmless.
Supposing you wanted to know the size of the return type of a function-like type F if it was passed a reference to a type T as an argument (both of those being template parameters). You could write:
sizeof(F(T()))
But what if T happens to have no public default constructor? So you do this instead:
sizeof(F(*((T *)0)))
The expression passed to sizeof never executes - it just gets analyzed to the point where the compiler knows the size of the result.
I'm curious - does function 'f' actually check for this condition? Because if it doesn't, and it tries to use the string, then this is clearly going to crash when you try to use it.
And if 'f' does check the reference for NULL, then why isn't it just using a pointer? Is there some hard and fast rule that you won't use pointers and some knucklehead obeyed the letter of the law without thinking about what it meant?
I'd just like to know...
Is using NULL references OK?
No, unless you do not like your boss and your job ;)
This is something VERY bad. One of most important point of reference that it
can't be NULL (unless you force it)
for the case you can make "empty object", which will play the role of the zero pointer
class Foo
{
static Foo empty;
public:
static bool isEmpty( const Foo& ref )
{
return &ref==∅
}
}
As others already said: A reference has to be valid. That's why it's a reference instead of a pointer.
If you want to make f() have a default behavior, you might want to use this:
static const std::string default_for_f;
void f(const std::string &s = default_for_f)
{
if (&s == &default_for_f)
{
// make default processing
}
else
...
}
...
void bar()
{
f(); // call with default behavior
f(default_for_f); // call with default behavior
f(std::string()); // call with other behavior
}
You can spare the default parameter for f(). (Some people hate default parameters.)
f( *((std::string*)NULL) );
This is essentially dereferencing NULL, which on most systems is #defined to be 0. Last I checked 0x00000000 is an invalid memory address for doing anything.
Whatever happened to just checking
if (std::string.length() > 0) ....