I want to check to see whether something is null, e.g.:
string xxx(const NotMyClass& obj) {
    if (obj == NULL) {
        //...
    }
}
But the compiler complains about this: there are 5 possible overloads of ==.
So I tried this:
if (obj == static_cast<NotMyClass>(NULL)) {
This crashes because NotMyClass's == overload doesn't handle nulls.
edit: for everyone telling me it can't be NULL, I'm certainly getting something NULL-like in my debugger:
In a well-formed C++ program, references are never NULL (more accurately, the address of an object to which you have a reference may never be NULL).
So not only is the answer "no, there's no way", a corollary is "this makes no sense".
Your statement regarding C makes no sense either, since C does not have references.
And as for Java, its "references" are more like C++ pointers in many ways, including this one.
Comparing such specific behaviours between different languages is something of a fool's errand.
If you need this "optional object" behaviour, then you're looking for pointers:
std::string xxx(const NotMyClass* ptr) {
    if (ptr == NULL)
        throw SomeException();
    const NotMyClass& ref = *ptr;
    /* ... */
}
But consider whether you really need this; a decent alternative might be boost::optional if you really do.
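For illustration, here is a minimal sketch of that "optional object" idea using std::optional (C++17; boost::optional has essentially the same interface). NotMyClass and the strings here are placeholders, not anything from your code:

#include <optional>
#include <string>

struct NotMyClass { std::string name; };

std::string xxx(const std::optional<NotMyClass>& obj) {
    if (!obj)               // the "no object" case, no null reference involved
        return "<none>";
    return obj->name;       // safe: an engaged optional holds a real object
}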
What you're asking makes no sense. References in C++ can never be "null", since they can only ever be created by aliasing an existing object, and they cannot be rebound. Once a reference to x, always a reference to x.
(A reference may become "dangling" if the original object's lifetime ends before that of the reference, but that's a programming error and not a checkable runtime condition.)
You don't need to test this, as references in C++ can't be NULL. Pointers can be NULL, but you're not using them here.
As others said, well-defined code never has NULL references, so it's not your responsibility to test for them.
That doesn't strictly mean they aren't ever created in practice though (but hopefully in intermediate, rather than production code). It's possible in some compilers, though definitely not standard C++, to get a reference whose address is NULL:
int * p = NULL;
int & x = *p;
This often won't crash (yet), although by the C++ standard it's undefined behavior from the second line onwards. This is a side-effect of references typically being implemented with pointers "behind the scenes." It will probably crash later down the line when someone uses x.
If you're trying to debug such a situation, you can test if the address of x is not NULL:
#include <cassert>
// ...
assert(&x != NULL);
As people have said, references in C++ should never be null (NULL or nullptr); however, it is still possible to get null references, especially if you do some evil casting. (A long time ago I did such a thing when I didn't know any better.)
To test whether a reference is null (NULL or nullptr), take its address and test that as a pointer. So:
if (&obj == nullptr)
is what you are effectively looking for.
But now since you know how to do it, don't. Just assume that references can never be null and let the application crash if they are, because by then something else must have gone horribly wrong and the program should be terminated.
Related
I have code that calculates an array index, and if it is valid accesses that array item. Something like:
int b = rowCount() - 1;
if (b == -1) return;
const BlockInfo& bi = blockInfo[b];
I am worried that this might be triggering undefined behavior. For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result? Does it change if blockInfo is not an actual array, but an container like a vector? If this is unsafe, could I fix it by putting the access in an else clause?
if (b == -1) {
    return;
} else {
    const BlockInfo& bi = blockInfo[b];
}
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
For clarification: My concern is specifically because of a different issue, where you intend to test whether a pointer is non-null before accessing it. The compiler turns this around and reasons that, since you are dereferencing it, it cannot have been null! Something like this (untested):
void someFunc(struct MyStruct *s) {
    if (s != NULL) {
        cout << s->someField << endl;
        delete s;
    }
}
I recall hearing that simply forming an out-of-bounds array access is UB in C++. Thus the compiler could legally assume the array index is not out of bounds, and remove checks to the contrary.
There is no access to blockInfo[-1] in your program. Your code specifically prohibits that.
For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
No, it cannot do that, precisely because an access to index -1 (or, rather, (std::size_t)-1) may or may not be a valid index. The language does let you pass -1 as an index; it'll just be converted first to a std::size_t with the well-defined unsigned wrap-around logic that comes with doing so. So there is not, and cannot be, any rule whereby the compiler is permitted to assume that you will never pass int -1 as an index.
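To illustrate just the conversion itself (a small sketch; the printed value is SIZE_MAX, whatever that is on your platform):

#include <cstddef>
#include <cstdio>

int main() {
    int b = -1;
    std::size_t i = b;        // well-defined unsigned wrap-around
    std::printf("%zu\n", i);  // prints SIZE_MAX, e.g. 18446744073709551615 on a 64-bit platform
}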
Even if there were, it'd still make no sense to let the compiler completely ignore the if statement. If it could, if our if statements were not reliable, every program in the world would be unsafe! There'd be no way to enforce any of your operations' preconditions.
The compiler may only skip or re-order things when it can prove that doing so results in a well-defined program with the same behaviour as your original instructions, given any possible input.
In fact, this is where UB comes from: where proving correctness is really difficult, that's usually where the standard throws compilers a bone and says something is "undefined" and the compiler can just do whatever it likes.
One interesting example of this is kind of the opposite of your case, where a check is [erroneously] placed after the access, and the compiler therefore assumes the check passes, whether it actually did or not:
void foo(char* ptr)
{
    char x = *ptr;
    if (ptr)
        bar();
    else
        baz();
}
The function foo may call bar() even if ptr is null! That might sound unlikely to you, but it actually does happen (e.g. this crash in a widely-used library).
could I fix it by putting the access in an else clause?
Those two pieces of code are semantically equivalent; it's the same program.
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
The compiler already does the obvious thing, as long as "obvious" is "according to the C++ standard".
the compiler might assume
If the compiler proceeds from a wrong assumption, then it's wrong and defective.
Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result?
It is never safe to access an array out of bounds, because that produces UB before you have a chance to use or not-use the result. However, an untaken branch in the code doesn't count as an access, as in your first or second examples. So, if I understand your last question, there's no need for a special flag.
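For instance, the common short-circuit idiom relies on exactly this guarantee; a small illustrative sketch (the names here are made up, not from your question):

#include <cstddef>

// i < n is checked first; when it is false, a[i] is never evaluated,
// so no out-of-bounds access takes place.
bool elementIsZero(const int* a, std::size_t n, std::size_t i)
{
    return i < n && a[i] == 0;
}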
In maintaining a large legacy code base I came across this function which serves as an accessor to an XML tree.
std::string DvCfgNode::getStringValue() const
{
    xmlChar *val = xmlNodeGetContent(mpNode);
    if (val == 0)
        return 0;
    std::string value = (const char *)val;
    xmlFree(val);
    return value;
}
How can this function return '0'? In C you could return a zero pointer as char * to indicate that no data was found, but is this possible in C++ with std::string? I wouldn't think so, but I'm not knowledgeable enough with C++ to be sure.
The (10-year-old) code compiles and runs under C++98.
EDIT I
Thanks to everyone for the comments. To answer a few questions:
a) Code is actually 18 years old and about, umm, 500K-1M lines (?)
b) exception handling is turned on, but there are no try-catch blocks anywhere except for a few in main(), which result in immediate program termination.
c) Poor design in the calling code which seems to "trust" getStringValue() to return a proper value, so lots of something like:
std::string s = pTheNode->getStringValue()
Probably just lucky it never returned zero (or if it did, nobody found the bug until now).
Your intuition about the "zero pointer as char*" is correct. What is happening is that 0 is being interpreted as the null pointer, resulting in the returned string being initialized from a const char* null pointer.
However, that is undefined behaviour in C++. The std::string(const char*) constructor requires a pointer to a null-terminated string. So you have found a bug. The fix really depends on the requirements of the function, but throwing an exception would be an improvement over undefined behaviour*.
* That is a massive understatement. Code should not have undefined behaviour
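For concreteness, a sketch of the accessor with the null case turned into an exception, reusing the names from the function in the question (std::runtime_error is just one possible choice; the right error type depends on the surrounding code):

#include <stdexcept>
#include <string>

std::string DvCfgNode::getStringValue() const
{
    xmlChar* val = xmlNodeGetContent(mpNode);
    if (val == 0)
        throw std::runtime_error("node has no content"); // instead of returning a null char*
    std::string value = reinterpret_cast<const char*>(val);
    xmlFree(val);
    return value;
}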
It depends on how you want to signal that there was no data. If "no data" means that there is an empty string value in the XML tree, you can just return an empty string.
In case you want to model e.g. that there is no data item and thus no data in the tree, you have several options depending on your data semantics.
If the data is mandatory and shall be present, you have an object with a violated invariant, i.e. an object in an illegal state. Using that object for anything is illegal. I would either std::terminate the program (or use some other termination mechanism that is suitable, e.g. an error reporter) or throw something that is guaranteed not to be caught and handled.
If the data is optional you can return something that models this. In C, you would probably go with a pointer to an object which can be null, but this introduces ownership issues. In C++, you can return an std::optional<std::string> which exactly describes this.
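A sketch of how the accessor from the question could look with that approach, assuming the code base can move to C++17 and keeping the libxml2 calls as they are:

#include <optional>
#include <string>

std::optional<std::string> DvCfgNode::getStringValue() const
{
    xmlChar* val = xmlNodeGetContent(mpNode);
    if (val == 0)
        return std::nullopt;                  // "no data" is now explicit in the return type
    std::string value = reinterpret_cast<const char*>(val);
    xmlFree(val);
    return value;
}

Callers then have to state explicitly what "no data" means to them, e.g. pTheNode->getStringValue().value_or("").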
According to the C++ grammar, const int* const p means that neither what p points to nor p itself can be modified. But today I found that if I write code like this:
void f(const int* const p)
{
    char* ch = (char*) p;
    int* q = (int*) ch;
    (*q) = 3; // I can modify the integer that p points to
}
In this case, the keyword "const" loses its effect. Is there any point in using "const" at all?
You are casting away the constness here:
char* ch = (char*) p;
Effectively, you are saying "I know what I am doing, forget you are const, I accept the consequences." C++ allows you to do stuff like this because sometimes it can be useful/necessary. But it is fraught with danger.
Note that if the argument passed to the function were really const, then your code would result in undefined behaviour (UB). And you have no way of knowing from inside the function.
Note also that in C++ it is preferable to make your intent clear,
int* pi = const_cast<int*>(p);
This makes it clear that your intention is to cast away the const. It is also easier to search for. The same caveats about danger and UB apply.
Your example will crash the app if const int* const p points to a compile-time constant; when casting away constness you need to be sure of what you are doing.
C++ is a language for adults, extra power will never be sacrificed for ephemeral safety, it is the code author's choice whether to stay in a safe zone or to use cast operators and move one step closer to C/asm.
C/C++ will let you do many things that allow you to 'hurt' yourself. Here, casting away the const of p is "legal" because the compiler assumes you know what you are doing. Whether this is good coding style or not is another matter.
When you do something like this, you assume responsibility for any side effects and issues it could create. Here, if the memory pointed to in the parameter is static memory, the program will likely crash.
In short, just because you can do something doesn't mean it is a good idea.
The const keyword is a way to use the compiler to catch programming mistakes, nothing more. It makes no claims about the mutability of memory at runtime, only that the compiler should shout at you if you directly modify the value (without using C-style casts).
A C-style cast is pretty much a marker saying 'I know what I'm doing here'. Which in most instances is false.
Here you change the type of the pointer. Using such a (C-style) cast you can change it to any pointer type with no problem.
In the same way you can use the C++ cast const_cast<...>(...):
int* p_non_const = const_cast<int*>(p);
In this case (I hope) you see immediately what is happening - you simply get rid of the constness.
Note that in your program you also don't need the temporary variable ch. You can do it directly:
int* q = (int*) p;
In principle you should not use such a cast, because a correctly designed and written program doesn't need it. Usually const_cast is used for quick and temporary changes in a program (to check something) or to call a function from some improperly designed library.
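That library case often looks like this. A sketch; legacy_print is a made-up stand-in for a badly declared C-style function that only reads its argument:

#include <cstdio>
#include <string>

// Hypothetical legacy interface we cannot change: it takes char* even though it only reads.
void legacy_print(char* text) { std::puts(text); }

void printMessage(const std::string& msg)
{
    // msg.c_str() returns const char*; the badly declared parameter forces the cast.
    // This is fine only because legacy_print never writes through the pointer.
    legacy_print(const_cast<char*>(msg.c_str()));
}

int main() { printMessage("hello"); }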
I read on the wikipedia page for Null_pointer that Bjarne Stroustrup suggested defining NULL as
const int NULL = 0;
if "you feel you must define NULL." I instantly thought, hey.. wait a minute, what about const_cast?
After some experimenting, I found that
int main() {
    const int MyNull = 0;
    const int* ToNull = &MyNull;
    int* myptr = const_cast<int*>(ToNull);
    *myptr = 5;
    printf("MyNull is %d\n", MyNull);
    return 0;
}
would print "MyNull is 0", but if I make the const int belong to a class:
class test {
public:
    test() : p(0) { }
    const int p;
};

int main() {
    test t;
    const int* pptr = &(t.p);
    int* myptr = const_cast<int*>(pptr);
    *myptr = 5;
    printf("t.p is %d\n", t.p);
    return 0;
}
then it prints "t.p is 5"!
Why is there a difference between the two? Why is "*myptr = 5;" silently failing in my first example, and what action is it performing, if any?
First of all, you're invoking undefined behavior in both cases by trying to modify a constant variable.
In the first case the compiler sees that MyNull is declared as a constant and replaces all references to it within main() with a 0.
In the second case, since p is within a class the compiler is unable to determine that it can just replace all classInstance.p with 0, so you see the result of the modification.
Firstly, what happens in the first case is that the compiler most likely translates your
printf("MyNull is %d\n", MyNull);
into the immediate
printf("MyNull is %d\n", 0);
because it knows that const objects never change in a valid program. Your attempts to change a const object leads to undefined behavior, which is exactly what you observe. So, ignoring the undefined behavior for a second, from the practical point of view it is quite possible that your *myptr = 5 successfully modified your Null. It is just that your program doesn't really care what you have in your Null now. It knows that Null is zero and will always be zero and acts accordingly.
Secondly, in order to define NULL per the recommendation you were referring to, you have to define it specifically as an Integral Constant Expression (ICE). Your first variant is indeed an ICE. Your second variant is not. Class member access is not allowed in an ICE, meaning that your second variant is significantly different from the first. The second variant does not produce a viable definition for NULL, and you will not be able to initialize pointers with your test::p even though it is declared as const int and set to zero:
SomeType *ptr1 = Null; // OK
test t;
SomeType *ptr2 = t.p; // ERROR: cannot use an `int` value to initialize a pointer
As for the different output in the second case... undefined behavior is undefined behavior. It is unpredictable. From the practical point of view, your second context is more complicated, so the compiler was unable to perform the above optimization, i.e. you have indeed succeeded in breaking through the language-level restrictions and modifying a const-qualified variable. The language specification does not make it easy (or possible) for compilers to optimize out const members of a class, so at the physical level that p is just another member of the class that resides in memory, in each object of that class. Your hack simply modifies that memory. It doesn't make it legal though. The behavior is still undefined.
This all, of course, is a rather pointless exercise. It looks like it all began from the "what about const_cast" question. So, what about it? const_cast has never been intended to be used for that purpose. You are not allowed to modify const objects. With const_cast, or without const_cast - doesn't matter.
Your code is modifying a variable declared constant, so anything can happen. Discussing why a certain thing happens instead of another is completely pointless unless you are discussing unportable compiler internals... from a C++ point of view that code simply doesn't make any sense.
About const_cast, one important thing to understand is that const_cast is not for messing with variables declared constant, but with references and pointers declared constant.
In C++ a const int * is often understood to be a "pointer to a constant integer" while this description is completely wrong. For the compiler it's instead something quite different: a "pointer that cannot be used for writing to an integer object".
This may apparently seem a minor difference but indeed is a huge one because
The "constness" is a property of the pointer, not of the pointed-to object.
Nothing is said about the fact that the pointed to object is constant or not.
The word "constant" has nothing to do with the meaning (this is why I think that using const was a bad naming choice). const int * is not talking about the constness of anything but only about "read only" or "read/write".
const_cast allows you to convert between pointers and references that can be used for writing and pointers or references that cannot because they are "read only". The pointed-to object is never part of this process, and the standard simply says that it's legal to take a const pointer and use it for writing after "casting away" const-ness, but only if the pointed-to object has not been declared constant.
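A small sketch of that distinction:

int x = 1;        // an object that is not declared const
const int cx = 2; // an object that is declared const

void demo()
{
    const int* px = &x;
    *const_cast<int*>(px) = 10;  // OK: only the pointer was read-only, the object is not const

    const int* pcx = &cx;
    *const_cast<int*>(pcx) = 20; // undefined behaviour: the object itself is const
}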
Constness of a pointer or a reference never affects the machine code that will be generated by a compiler (another common misconception is that a compiler can produce better code if const references and pointers are used, but this is totally bogus... for the optimizer a const reference and a const pointer are just a reference and a pointer).
Constness of pointers and references has been introduced to help programmers, not optimizers (btw I think that this alleged help for programmers is also quite questionable, but that's another story).
const_cast is a weapon that helps programmers fighting with broken const-ness declarations of pointers and references (e.g. in libraries) and with the broken very concept of constness of references and pointers (before mutable for example casting away constness was the only reasonable solution in many real life programs).
Misunderstanding of what is a const reference is also at the base of a very common C++ antipattern (used even in the standard library) that says that passing a const reference is a smart way to pass a value. See this answer for more details.
Is this piece of code valid (and defined behavior)?
int &nullReference = *(int*)0;
Both g++ and clang++ compile it without any warning, even when using -Wall, -Wextra, -std=c++98, -pedantic, -Weffc++...
Of course the reference is not actually null, since it cannot be accessed (it would mean dereferencing a null pointer), but we could check whether it's null or not by checking its address:
if( & nullReference == 0 ) // null reference
References are not pointers.
8.3.2/1:
A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]
1.9/4:
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer)
As Johannes says in a deleted answer, there's some doubt whether "dereferencing a null pointer" should be categorically stated to be undefined behavior. But this isn't one of the cases that raise doubts, since a null pointer certainly does not point to a "valid object or function", and there is no desire within the standards committee to introduce null references.
The answer depends on your view point:
If you judge by the C++ standard, you cannot get a null reference because you get undefined behavior first. After that first incidence of undefined behavior, the standard allows anything to happen. So, if you write *(int*)0, you already have undefined behavior as you are, from a language standard point of view, dereferencing a null pointer. The rest of the program is irrelevant, once this expression is executed, you are out of the game.
However, in practice, null references can easily be created from null pointers, and you won't notice until you actually try to access the value behind the null reference. Your example may be a bit too simple, as any good optimizing compiler will see the undefined behavior, and simply optimize away anything that depends on it (the null reference won't even be created, it will be optimized away).
Yet, that optimizing away depends on the compiler to prove the undefined behavior, which may not be possible to do. Consider this simple function inside a file converter.cpp:
int& toReference(int* pointer) {
    return *pointer;
}
When the compiler sees this function, it does not know whether the pointer is a null pointer or not. So it just generates code that turns any pointer into the corresponding reference. (Btw: This is a noop since pointers and references are the exact same beast in assembler.) Now, if you have another file user.cpp with the code
#include "converter.h"

void foo() {
    int& nullRef = toReference(nullptr);
    cout << nullRef; // crash happens here
}
the compiler does not know that toReference() will dereference the passed pointer, and assume that it returns a valid reference, which will happen to be a null reference in practice. The call succeeds, but when you try to use the reference, the program crashes. Hopefully. The standard allows for anything to happen, including the appearance of pink elephants.
You may ask why this is relevant; after all, the undefined behavior was already triggered inside toReference(). The answer is debugging: null references may propagate and proliferate just as null pointers do. If you are not aware that null references can exist, and don't learn to avoid creating them, you may spend quite some time trying to figure out why your member function seems to crash when it's just trying to read a plain old int member (answer: the instance in the call of the member function was a null reference, so this is a null pointer, and your member is computed to be located at address 8).
So how about checking for null references? You gave the line
if( & nullReference == 0 ) // null reference
in your question. Well, that won't work: according to the standard, you have undefined behavior if you dereference a null pointer, and you cannot create a null reference without dereferencing a null pointer, so null references exist only inside the realm of undefined behavior. Since your compiler may assume that you are not triggering undefined behavior, it can assume that there is no such thing as a null reference (even though it will readily emit code that generates null references!). As such, it sees the if() condition, concludes that it cannot be true, and just throws away the entire if() statement. With the introduction of link-time optimizations, it has become plain impossible to check for null references in a robust way.
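What remains practical is to validate the pointer before the reference ever comes into existence, e.g. a defensive variant of the toReference() above (a sketch; the exception type is just an example):

#include <stdexcept>

int& toReference(int* pointer) {
    if (pointer == nullptr)
        throw std::invalid_argument("null pointer passed to toReference");
    return *pointer; // from here on the reference is guaranteed to refer to a real int
}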
TL;DR:
Null references are somewhat of a ghastly existence:
Their existence seems impossible (= by the standard),
but they exist (= by the generated machine code),
but you cannot see them if they exist (= your attempts will be optimized away),
but they may kill you unaware anyway (= your program crashes at weird points, or worse).
Your only hope is that they don't exist (= write your program to not create them).
I do hope that will not come to haunt you!
clang++ 3.5 even warns on it:
/tmp/a.C:3:7: warning: reference cannot be bound to dereferenced null pointer in well-defined C++ code; comparison may be assumed to always evaluate to false [-Wtautological-undefined-compare]
if( & nullReference == 0 ) // null reference
^~~~~~~~~~~~~ ~
1 warning generated.
If your intention was to find a way to represent null in an enumeration of singleton objects, then it's a bad idea to (de)reference null (in C++11, nullptr).
Why not declare a static singleton object that represents NULL within the class as follows, and add a cast-to-pointer operator that returns nullptr?
Edit: Corrected several typos and added an if-statement in main() to test that the cast-to-pointer operator actually works (which I forgot to.. my bad) - March 10 2015 -
// Error.h
#pragma once

#include <memory>
#include <string>
#include <vector>

using std::shared_ptr;
using std::string;
using std::vector;

class Error {
public:
    static Error& NOT_FOUND;
    static Error& UNKNOWN;
    static Error& NONE; // singleton object that represents null
public:
    static vector<shared_ptr<Error>> _instances;
    static Error& NewInstance(const string& name, bool isNull = false);
private:
    string _name;
    bool _isNull;
    Error(const string& name, bool isNull = false) : _name(name), _isNull(isNull) {}
    Error() : _isNull(false) {}
    Error(const Error& src) = delete;            // singletons are not copyable
    Error& operator=(const Error& src) = delete; // nor assignable
public:
    operator Error*() { return _isNull ? nullptr : this; }
};
// Error.cpp
#include "Error.h"

vector<shared_ptr<Error>> Error::_instances;

Error& Error::NewInstance(const string& name, bool isNull) // default argument lives in the declaration only
{
    shared_ptr<Error> pNewInst(new Error(name, isNull));
    Error::_instances.push_back(pNewInst);
    return *pNewInst.get();
}
Error& Error::NOT_FOUND = Error::NewInstance("NOT_FOUND");
//Error& Error::NOT_FOUND = Error::NewInstance("UNKNOWN"); Edit: fixed
//Error& Error::NOT_FOUND = Error::NewInstance("NONE", true); Edit: fixed
Error& Error::UNKNOWN = Error::NewInstance("UNKNOWN");
Error& Error::NONE = Error::NewInstance("NONE", true); // isNull = true, so NONE converts to nullptr
// Main.cpp
#include <cstdlib> // EXIT_FAILURE, EXIT_SUCCESS

#include "Error.h"

Error& getError() {
    return Error::UNKNOWN;
}

// Edit: To see the overload of "Error*()" in Error.h actually working
Error& getErrorNone() {
    return Error::NONE;
}

int main(void) {
    if (getError() != Error::NONE) {
        return EXIT_FAILURE;
    }

    // Edit: To see the overload of "Error*()" in Error.h actually working
    if (getErrorNone() != nullptr) {
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}