Is using NULL references OK? - c++

I came across this code:
void f(const std::string &s);
And then a call:
f( *((std::string*)NULL) );
And I was wondering what others think of this construction. It is used to signal that function f() should use some default value (which it computes) instead of some user-provided value.
I am not sure what to think of it. It looks weird, but what do you think of this construction?

No. It is undefined behaviour and can lead the code to do anything (including reformatting your hard disk, core dumping or insulting your mother).
If you need to be able to pass NULL, then use pointers. Code that takes a reference can assume it refers to a valid object.
Addendum: The C++03 Standard (ISO/IEC 14882, 2nd edition 2003) says, in §8.3.2 "References", paragraph 4:
A reference shall be initialized to refer to a valid object
or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only
way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer,
which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]

You will sometimes see constructions like this in fairly esoteric template library code, but only inside a sizeof() where it is harmless.
Supposing you wanted to know the size of the return type of a function-like type F if it was passed a reference to a type T as an argument (both of those being template parameters). You could write:
sizeof(F(T()))
But what if T happens to have no public default constructor? So you do this instead:
sizeof(F(*((T *)0)))
The expression passed to sizeof never executes - it just gets analyzed to the point where the compiler knows the size of the result.

I'm curious - does function 'f' actually check for this condition? Because if it doesn't, and it tries to use the string, then this is clearly going to crash when you try to use it.
And if 'f' does check the reference for NULL, then why isn't it just using a pointer? Is there some hard and fast rule that you won't use pointers and some knucklehead obeyed the letter of the law without thinking about what it meant?
I'd just like to know...

Is using NULL references OK?
No, unless you do not like your boss and your job ;)
This is something VERY bad. One of the most important properties of a reference is that it can't be NULL (unless you force it, and forcing it is undefined behavior).

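For the cases where you need one, you can make an "empty object" that plays the role of the null pointer:
class Foo
{
public:
    static Foo empty; // the shared "empty" object, in the role of a null pointer
    static bool isEmpty( const Foo& ref )
    {
        return &ref == &empty; // compares addresses, not contents
    }
};
Foo Foo::empty; // definition of the static member, in exactly one .cpp file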

As others already said: A reference has to be valid. That's why it's a reference instead of a pointer.
If you want to make f() have a default behavior, you might want to use this:
#include <string>

static const std::string default_for_f;

void f(const std::string &s = default_for_f)
{
    if (&s == &default_for_f)
    {
        // make default processing
    }
    else
    {
        // use the caller-supplied string s
    }
}

void bar()
{
    f();              // call with default behavior
    f(default_for_f); // call with default behavior
    f(std::string()); // call with other behavior: a different (empty) object
}
You can also drop the default parameter of f() and always pass default_for_f explicitly. (Some people hate default parameters.)

f( *((std::string*)NULL) );
This is essentially dereferencing NULL, which on most systems is #defined to be 0. Last I checked, 0x00000000 is an invalid memory address for doing anything, and in any case the C++ standard makes dereferencing a null pointer undefined behavior.
Whatever happened to just checking whether the string is empty?
if (s.length() > 0) ....

Related

C++ returns objects by value

The topic is pretty much in the title of the question. I saw this in Meyers' book "Effective C++":
the fact that C++ returns objects by value
What does that mean, and how does the C++ standard support that statement? For instance, say that we have something like this:
int foo()
{
int a = 1;
return a;
}
That's pretty clear: the phrase would mean that we return a copy of the value stored in the local variable. But consider this:
int& foo()
{
int a = 1;
return a;
}
A compiler should warn us about returning a reference to a local variable. How does that "returning by value fact" apply to that example?
Meyers is correct in the main, though you have to take that wording with a pinch of salt when dealing with references. At a certain level of abstraction, here you're passing the reference itself "by value".
But what he's really trying to say is that, beyond that, C++ passes by value by default, and that this contrasts with languages such as Java in which objects are always chucked around with reference semantics instead.
In fact, one could argue that the passage doesn't apply to your code at all, because the reference is not an "object".
When the book says that "C++ returns objects by value", it explains what happens when you use a "plain" class name as the return type without additional "decorations", such as ampersands or asterisks, e.g.
struct MyType {
    int value; // Some members go here
};
MyType foo() {
    return MyType(); // build an object and return it by value
}
In the example above foo() returns an object by value.
This quote should not suggest that C++ lacks other ways of returning data from a function: you can easily write a function that returns a reference or a pointer.
Note that returning by pointer or by reference leads to undefined behavior only when the pointer or reference refers to a local object (or any other object whose lifetime ends before the caller uses it). Accessing an object past its lifetime always causes undefined behavior, and returning a local by reference or by pointer is perhaps the most common mistake that leads to it.
Returning by reference here gives you a dangling reference (compilers typically warn about it rather than reject it): the reference is handed back to the caller of foo(), but the local variable a that it refers to goes out of scope as soon as foo() returns.

Method in null class pointer (c++)

Let's say we have
class Foo{
public:
bool error;
// ...
bool isValid(){return error==false;}
};
and somewhere
Foo *aFoo=NULL;
I usually would do if (aFoo!=NULL && aFoo->isValid()).....
But what if in the isValid method I test the nullity:
bool isValid(){return this!=NULL && error==false;}
That would simplify the external testing with simply calling if (aFoo->isValid())
I've tested it in some compilers and it works but I wonder if it is standard and could cause problems when porting to other environments.
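The compiler is free to optimize away the check: calling any non-static member function of any class through an invalid (or NULL) pointer is undefined behavior, so inside a member function the compiler may assume that this is never null (GCC 6 and later, for example, do exactly that). Please don't do this.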
Why not simply a namespace-scope function like this?
bool isValid(Foo* f) {return f && f->isValid();}
An if-statement like
if (aFoo->isValid())
implies that the pointer is pointing to a valid object. It would be a huge source of confusion and very error prone.
Finally, your code would indeed invoke undefined behavior: aFoo->isValid is by definition equivalent to (*aFoo).isValid:
N3337, §5.2.5/2
The expression E1->E2 is converted to the equivalent form
(*(E1)).E2;
which would dereference a null pointer to obtain a null reference, which is clearly undefined:
N3337, §8.3.2/5
[ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the
“object” obtained by dereferencing a null pointer, which causes
undefined behavior. […] — end note ]
Generally it would be bad design, and in standard C++ it doesn't make much sense, since your internal NULL check implies that you would call a member function through a null pointer, which is undefined behavior.
This topic was discussed here:
Checking if this is null

const casting an int in a class vs outside a class

I read on the Wikipedia page for Null_pointer that Bjarne Stroustrup suggested defining NULL as
const int NULL = 0;
if "you feel you must define NULL." I instantly thought, hey... wait a minute, what about const_cast?
After some experimenting, I found that
#include <cstdio>
int main() {
const int MyNull = 0;
const int* ToNull = &MyNull;
int* myptr = const_cast<int*>(ToNull);
*myptr = 5;
printf("MyNull is %d\n", MyNull);
return 0;
}
would print "MyNull is 0", but if I make the const int belong to a class:
#include <cstdio>
class test {
public:
test() : p(0) { }
const int p;
};
int main() {
test t;
const int* pptr = &(t.p);
int* myptr = const_cast<int*>(pptr);
*myptr = 5;
printf("t.p is %d\n", t.p);
return 0;
}
then it prints "t.p is 5"!
Why is there a difference between the two? Why is "*myptr = 5;" silently failing in my first example, and what action is it performing, if any?
First of all, you're invoking undefined behavior in both cases by trying to modify a constant variable.
In the first case the compiler sees that MyNull is declared as a constant and replaces all references to it within main() with a 0.
In the second case, since p is within a class the compiler is unable to determine that it can just replace all classInstance.p with 0, so you see the result of the modification.
Firstly, what happens in the first case is that the compiler most likely translates your
printf("MyNull is %d\n", MyNull);
into the immediate
printf("MyNull is %d\n", 0);
because it knows that const objects never change in a valid program. Your attempt to change a const object leads to undefined behavior, which is exactly what you observe. So, ignoring the undefined behavior for a second, from the practical point of view it is quite possible that your *myptr = 5 successfully modified MyNull. It is just that your program doesn't really care what MyNull holds now: it knows that MyNull is zero and will always be zero, and acts accordingly.
Secondly, in order to define NULL per the recommendation you were referring to, you have to define it specifically as an Integral Constant Expression (ICE). Your first variant is indeed an ICE. Your second variant is not: class member access is not allowed in an ICE, which makes your second variant significantly different from the first. The second variant does not produce a viable definition for NULL, and you will not be able to initialize pointers with your test::p even though it is declared as const int and set to zero:
SomeType *ptr1 = MyNull; // OK in C++03
test t;
SomeType *ptr2 = t.p; // ERROR: cannot use an `int` value to initialize a pointer
(Since C++14, via CWG issue 903, which many compilers also apply in C++11 mode, a null pointer constant must be an integer literal with value zero or a prvalue of type std::nullptr_t, so even ptr1 = MyNull is rejected by modern compilers; nullptr is the modern replacement for such a NULL constant.)
As for the different output in the second case... undefined behavior is undefined behavior. It is unpredictable. From the practical point of view, your second context is more complicated, so the compiler was unable to perform the above optimization, i.e. you did succeed in breaking through the language-level restrictions and modifying a const-qualified variable. The language specification does not make it easy (or possible) for compilers to optimize out const members of a class, so at the physical level that p is just another member of the class that resides in memory, in each object of that class. Your hack simply modifies that memory. It doesn't make it legal, though. The behavior is still undefined.
This all, of course, is a rather pointless exercise. It looks like it all began from the "what about const_cast" question. So, what about it? const_cast has never been intended to be used for that purpose. You are not allowed to modify const objects. With const_cast, or without const_cast - doesn't matter.
Your code is modifying a variable declared constant, so anything can happen. Discussing why one particular thing happens instead of another is completely pointless unless you are discussing non-portable compiler internals... from a C++ point of view that code simply doesn't make sense.
About const_cast, one important thing to understand is that const_cast is not for messing with variables declared constant, but with references and pointers declared constant.
In C++ a const int * is often understood to be a "pointer to a constant integer" while this description is completely wrong. For the compiler it's instead something quite different: a "pointer that cannot be used for writing to an integer object".
This may seem a minor difference, but it is actually a huge one, because:
The "constness" is a property of the pointer, not of the pointed-to object.
Nothing is said about the fact that the pointed to object is constant or not.
The word "constant" has nothing to do with the meaning (this is why I think that const was a bad naming choice). const int * is not talking about the constness of anything, but only about "read only" versus "read/write".
const_cast allows you to convert between pointers and references that can be used for writing and pointers or references that cannot, because they are "read only". The pointed-to object is never part of this process, and the standard simply says that it's legal to take a const pointer and use it for writing after "casting away" constness, but only if the pointed-to object has not been declared constant.
Constness of a pointer and a reference never affects the machine code that will be generated by a compiler (another common misconception is that a compiler can produce better code if const references and pointers are used, but this is total bogus... for the optimizer a const reference and a const pointer are just a reference and a pointer).
Constness of pointers and references was introduced to help programmers, not optimizers (btw, I think that this alleged help for programmers is also quite questionable, but that's another story).
const_cast is a weapon that helps programmers fight broken constness declarations of pointers and references (e.g. in libraries) and the broken concept of constness of references and pointers itself (before mutable, for example, casting away constness was the only reasonable solution in many real-life programs).
Misunderstanding what a const reference is also lies at the root of a very common C++ antipattern (used even in the standard library) that says that passing a const reference is a smart way to pass a value. See this answer for more details.

Is null reference possible?

Is this piece of code valid (and defined behavior)?
int &nullReference = *(int*)0;
Both g++ and clang++ compile it without any warning, even when using -Wall, -Wextra, -std=c++98, -pedantic, -Weffc++...
Of course the reference is not actually null, since it cannot be accessed (it would mean dereferencing a null pointer), but we could check whether it's null or not by checking its address:
if( & nullReference == 0 ) // null reference
References are not pointers.
8.3.2/4 (C++03):
A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]
1.9/4:
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer)
As Johannes says in a deleted answer, there's some doubt whether "dereferencing a null pointer" should be categorically stated to be undefined behavior. But this isn't one of the cases that raise doubts, since a null pointer certainly does not point to a "valid object or function", and there is no desire within the standards committee to introduce null references.
The answer depends on your view point:
If you judge by the C++ standard, you cannot get a null reference because you get undefined behavior first. After that first incidence of undefined behavior, the standard allows anything to happen. So, if you write *(int*)0, you already have undefined behavior as you are, from a language standard point of view, dereferencing a null pointer. The rest of the program is irrelevant, once this expression is executed, you are out of the game.
However, in practice, null references can easily be created from null pointers, and you won't notice until you actually try to access the value behind the null reference. Your example may be a bit too simple, as any good optimizing compiler will see the undefined behavior, and simply optimize away anything that depends on it (the null reference won't even be created, it will be optimized away).
Yet, that optimizing away depends on the compiler to prove the undefined behavior, which may not be possible to do. Consider this simple function inside a file converter.cpp:
int& toReference(int* pointer) {
return *pointer;
}
When the compiler sees this function, it does not know whether the pointer is a null pointer or not. So it just generates code that turns any pointer into the corresponding reference. (Btw: This is a noop since pointers and references are the exact same beast in assembler.) Now, if you have another file user.cpp with the code
#include "converter.h"
void foo() {
int& nullRef = toReference(nullptr);
cout << nullRef; //crash happens here
}
the compiler does not know that toReference() will dereference the passed pointer, and assumes that it returns a valid reference, which will happen to be a null reference in practice. The call succeeds, but when you try to use the reference, the program crashes. Hopefully. The standard allows for anything to happen, including the appearance of pink elephants.
You may ask why this is relevant; after all, the undefined behavior was already triggered inside toReference(). The answer is debugging: Null references may propagate and proliferate just as null pointers do. If you are not aware that null references can exist, and learn to avoid creating them, you may spend quite some time trying to figure out why your member function seems to crash when it's just trying to read a plain old int member (answer: the instance in the call of the member was a null reference, so this is a null pointer, and your member is computed to be located at address 8).
So how about checking for null references? You gave the line
if( & nullReference == 0 ) // null reference
in your question. Well, that won't work: According to the standard, you have undefined behavior if you dereference a null pointer, and you cannot create a null reference without dereferencing a null pointer, so null references exist only inside the realm of undefined behavior. Since your compiler may assume that you are not triggering undefined behavior, it can assume that there is no such thing as a null reference (even though it will readily emit code that generates null references!). As such, it sees the if() condition, concludes that it cannot be true, and just throws away the entire if() statement. With the introduction of link time optimizations, it has become plain impossible to check for null references in a robust way.
TL;DR:
Null references are somewhat of a ghastly existence:
Their existence seems impossible (= by the standard),
but they exist (= by the generated machine code),
but you cannot see them if they exist (= your attempts will be optimized away),
but they may kill you unaware anyway (= your program crashes at weird points, or worse).
Your only hope is that they don't exist (= write your program to not create them).
I do hope that will not come to haunt you!
clang++ 3.5 even warns on it:
/tmp/a.C:3:7: warning: reference cannot be bound to dereferenced null pointer in well-defined C++ code; comparison may be assumed to
always evaluate to false [-Wtautological-undefined-compare]
if( & nullReference == 0 ) // null reference
^~~~~~~~~~~~~ ~
1 warning generated.
If your intention was to find a way to represent null in an enumeration of singleton objects, then it's a bad idea to (de)reference null (or, in C++11, nullptr).
Why not declare a static singleton object that represents NULL within the class, as follows, and add a cast-to-pointer operator that returns nullptr?
Edit (March 10, 2015): corrected several typos and added an if-statement in main() to test that the cast-to-pointer operator actually works (which I had forgotten, my bad).
// Error.h
#pragma once
#include <memory>
#include <string>
#include <vector>

class Error {
public:
    static Error& NOT_FOUND;
    static Error& UNKNOWN;
    static Error& NONE; // singleton object that represents null
public:
    static std::vector<std::shared_ptr<Error>> _instances;
    static Error& NewInstance(const std::string& name, bool isNull = false);
private:
    std::string _name;
    bool _isNull;
    Error(const std::string& name, bool isNull) : _name(name), _isNull(isNull) {}
    Error(const Error&) = delete;            // singletons are not copyable
    Error& operator=(const Error&) = delete;
public:
    // NONE converts to nullptr; every other instance converts to its own address
    operator Error*() { return _isNull ? nullptr : this; }
};

// Error.cpp
#include "Error.h"

std::vector<std::shared_ptr<Error>> Error::_instances;

// The default argument appears only in the declaration, not here
Error& Error::NewInstance(const std::string& name, bool isNull)
{
    std::shared_ptr<Error> pNewInst(new Error(name, isNull));
    Error::_instances.push_back(pNewInst);
    return *pNewInst;
}

// Defined after _instances, so within this file _instances is initialized first
Error& Error::NOT_FOUND = Error::NewInstance("NOT_FOUND");
Error& Error::UNKNOWN = Error::NewInstance("UNKNOWN");
Error& Error::NONE = Error::NewInstance("NONE", true); // the null object

// Main.cpp
#include <cstdlib>
#include "Error.h"

Error& getError() {
    return Error::UNKNOWN;
}

// Edit: To see the overload of "Error*()" in Error.h actually working
Error& getErrorNone() {
    return Error::NONE;
}

int main() {
    if (&getError() != &Error::NONE) { // some error other than NONE was reported
        return EXIT_FAILURE;
    }
    // Edit: To see the overload of "Error*()" in Error.h actually working
    if (static_cast<Error*>(getErrorNone()) != nullptr) {
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Pointer vs. Reference

What would be better practice when giving a function the original variable to work with:
unsigned long x = 4;
void func1(unsigned long& val) {
val = 5;
}
func1(x);
or:
void func2(unsigned long* val) {
*val = 5;
}
func2(&x);
IOW: Is there any reason to pick one over another?
My rule of thumb is:
Use pointers if you want to do pointer arithmetic with them (e.g. incrementing the pointer address to step through an array) or if you ever have to pass a NULL-pointer.
Use references otherwise.
I really think you will benefit from establishing the following function calling coding guidelines:
1. As in all other places, always be const-correct.
Note: This means, among other things, that only out-values (see item 3) and values passed by value (see item 4) can lack the const specifier.
2. Only pass a value by pointer if the value 0/NULL is a valid input in the current context.
Rationale 1: As a caller, you see that whatever you pass in must be in a usable state.
Rationale 2: As the callee, you know that whatever comes in is in a usable state. Hence, no NULL check or error handling needs to be done for that value.
Rationale 3: Rationales 1 and 2 will be compiler-enforced. Always catch errors at compile time if you can.
3. If a function argument is an out-value, then pass it by reference.
Rationale: We don't want to break item 2...
4. Choose "pass by value" over "pass by const reference" only if the value is a POD (Plain Old Data) or small enough (memory-wise) or in other ways cheap enough (time-wise) to copy.
Rationale: Avoid unnecessary copies.
Note: "small enough" and "cheap enough" are not absolute measures.
This ultimately ends up being subjective. The discussion thus far is useful, but I don't think there is a correct or decisive answer to this. A lot will depend on style guidelines and your needs at the time.
While there are some different capabilities (whether or not something can be NULL) with a pointer, the largest practical difference for an output parameter is purely syntax. Google's C++ Style Guide (https://google.github.io/styleguide/cppguide.html#Reference_Arguments), for example, mandated only pointers for output parameters and allowed only references that are const (later revisions of the guide relaxed this and now recommend references for non-optional output parameters). The reasoning is one of readability: something with value syntax should not have pointer semantic meaning. I'm not suggesting that this is necessarily right or wrong, but I think the point here is that it's a matter of style, not of correctness.
Pointers
A pointer is a variable that holds a memory address.
A pointer declaration consists of a base type, an *, and the variable name.
A pointer can point to any number of variables over its lifetime.
A pointer that does not currently point to a valid memory location should be given the null value (nullptr in C++11 and later; NULL or 0 before that).
BaseType* ptrBaseType;
BaseType objBaseType;
ptrBaseType = &objBaseType;
The & is a unary operator that returns the memory address of its operand.
The dereference operator (*) is used to access the value stored in the variable the pointer points to.
int nVar = 7;
int* ptrVar = &nVar;
int nVar2 = *ptrVar;
Reference
A reference (&) is like an alias to an existing variable.
A reference (&) is like a constant pointer that is automatically dereferenced.
It is usually used for function argument lists and function return values.
A reference must be initialized when it is created.
Once a reference is initialized to an object, it cannot be changed to refer to another object.
You cannot have NULL references.
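A const reference can also be bound to a value that is not an object, such as a literal (const int& r = 5;). The compiler then creates a temporary variable holding that value, and the temporary lives as long as the reference.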
int i = 3; //integer declaration
int * pi = &i; //pi points to the integer i
int& ri = i; //ri refers to integer i – creation of reference and initialization
You should pass a pointer if you are going to modify the value of the variable.
Even though passing a reference and passing a pointer are technically the same, passing a pointer in your use case is more readable, as it "advertises" the fact that the value will be changed by the function.
If you have a parameter where you may need to indicate the absence of a value, it's common practice to make the parameter a pointer value and pass in NULL.
A better solution in most cases (from a safety perspective) is to use boost::optional. This allows you to pass in optional values by reference and also as a return value.
#include <iostream>
#include <string>
#include <boost/optional.hpp>

// Sample method using optional as input parameter
void PrintOptional(const boost::optional<std::string>& optional_str)
{
    if (optional_str)
    {
        std::cout << *optional_str << std::endl;
    }
    else
    {
        std::cout << "(no string)" << std::endl;
    }
}
// Sample method using optional as return value
boost::optional<int> ReturnOptional(bool return_nothing)
{
    if (return_nothing)
    {
        return boost::optional<int>();
    }
    return boost::optional<int>(42);
}
Use a reference when you can, use a pointer when you have to.
From C++ FAQ: "When should I use references, and when should I use pointers?"
A reference is an implicit pointer. Basically you can change the value the reference points to but you can't change the reference to point to something else. So my 2 cents is that if you only want to change the value of a parameter pass it as a reference but if you need to change the parameter to point to a different object pass it using a pointer.
Consider C#'s out keyword. The compiler requires the caller of a method to apply the out keyword to any out args, even though it knows already if they are. This is intended to enhance readability. Although with modern IDEs I'm inclined to think that this is a job for syntax (or semantic) highlighting.
Pass by const reference unless there is a reason you wish to change/keep the contents you are passing in.
This will be the most efficient method in most cases.
Make sure you use const on each parameter you do not wish to change: this not only protects you from doing something stupid in the function, it also gives other users a good indication of what the function does to the passed-in values. This includes making a pointer itself const when you only want to change what's pointed to...
Pointers:
Can be assigned nullptr (or NULL).
At the call site, you must use & if your type is not a pointer itself, making it explicit that you are modifying your object.
Pointers can be rebound.
References:
Cannot be null.
Once bound, cannot change.
Callers don't need to explicitly use &. This is sometimes considered bad, because you must go to the implementation of the function to see if your parameter is modified.
A reference is similar to a pointer, except that you don’t need to use a prefix * to access the value referred to by the reference. Also, a reference cannot be made to refer to a different object after its initialization.
References are particularly useful for specifying function arguments.
For more information, see "A Tour of C++" by Bjarne Stroustrup (2014), pages 11-12.