Do classes with uninitialized pointers have undefined behavior?

class someClass
{
public:
int* ptr2Int;
};
Is this a valid class (yes it compiles)? Provided one assigns a value to ptr2Int before dereferencing it, is the class guaranteed to work as one would expect?

An uninitialized pointer inside a class is in no way different from a standalone uninitialized pointer. As long as you are not using the pointer in any dangerous way, you are fine.
Keep in mind though that "dangerous ways" of using an uninitialized pointer include a mere attempt to read its value (no dereference necessary). The implicit compiler-provided copy-constructor and copy-assignment operators present in your class might perform such an attempt if you use these implicit member functions before the pointer gets assigned a valid value.
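A minimal sketch of the safe usage pattern described above, assuming C++11 or later. The helper function names are hypothetical and exist only for illustration. The pointer is assigned before the implicit copy constructor reads it, and empty-brace initialization yields a null pointer instead of an indeterminate one.

```cpp
// Sketch: handling a class whose only member is an uninitialized pointer.
class someClass
{
public:
    int* ptr2Int;   // no initializer: indeterminate after default-initialization
};

// Hypothetical helper: assign the pointer before the object is read, copied, or dereferenced.
int copyAfterAssign(int* target)
{
    someClass a;           // a.ptr2Int is indeterminate here; do not read or copy it yet
    a.ptr2Int = target;    // now it holds a valid value
    someClass b = a;       // the implicit copy constructor reads a determinate value
    return *b.ptr2Int;     // safe: b.ptr2Int points at *target
}

// Hypothetical helper: someClass is an aggregate, so {} initializes the member to a null pointer.
bool valueInitIsNull()
{
    someClass c{};
    return c.ptr2Int == nullptr;
}
```

The alternative is to give the member a default initializer (`int* ptr2Int = nullptr;`), which removes the indeterminate state entirely.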
Actually, if I'm not mistaken, this issue was a matter of some discussion at the level of the standardization committee. Are the implicitly generated member functions allowed to trip over trap representations possibly present in the non-initialized members of the class? I don't remember what the verdict was. (Or maybe I saw that discussion in the context of C99?)

Yes, it's fine. The pointer itself exists; its value is just unknown, so dereferencing it is unsafe. Having an uninitialized variable is perfectly fine, and pointers aren't any different.

Yes, this is exactly the same as a struct with a single uninitialized pointer, and both are guaranteed to work just fine (as long as you set the pointer before any use of it, of course).

Until you dereference the pointer it's all good, then it's undefined territory.
Some compilers will set pointers to default values (like null) depending on whether you compile in debug or release mode. So things could work in one mode and suddenly everything falls apart in another.

Related

Does a member have to be initialized to take its address?

Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
#include <string>
class Klass {
public:
Klass()
: ptr_str{&str}
, str{}
{}
private:
std::string *ptr_str;
std::string str;
};
This question is similar to mine, but the order is correct there, and the answer says
I'd advise against coding like this in case someone changes the order of the members in your class.
That seems to imply that reversing the order would be illegal, but I couldn't be sure.

Does a member have to be initialized to take its address?
No.
Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
Yes. Yes.
There is no requirement that the operand of unary & be initialised. There is an example in the standard in the specification of the unary & operator:
int a;
int* p1 = &a;
Here, the value of a is indeterminate and it is OK to point to it.
What that example doesn't demonstrate is pointing to an object before its lifetime has begun, which is what happens in your example. Using a pointer to an object before and after its lifetime is explicitly allowed as long as the storage has been allocated and not yet reused or released. The standard draft says:
[basic.life] Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways ...
The rule goes on to list how the usage is restricted, but you can mostly get by with common sense. In short, you can treat it as you could treat a void*, except that violating these restrictions is UB rather than ill-formed. A similar rule exists for references.
There are also restrictions on computing the address of non-static members specifically. Standard draft says:
[class.cdtor] ... To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
In the constructor of Klass, the construction of Klass has started and destruction hasn't completed, so the above rule is satisfied.
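To make the point concrete, here is a sketch built on the question's Klass, with hypothetical accessor functions added for illustration. The pointer is formed while str is still unconstructed, but it is only dereferenced after the constructor has finished.

```cpp
#include <string>

class Klass {
public:
    Klass()
        : ptr_str{&str}   // str's lifetime has not begun yet; only its address is taken
        , str{}           // str is constructed afterwards (declaration order)
    {}

    // Hypothetical accessors: after construction, pointer and member are fully usable.
    const std::string* pointer() const { return ptr_str; }
    const std::string* address() const { return &str; }
    void append(const char* s) { ptr_str->append(s); }   // dereference only after construction
    const std::string& value() const { return str; }

private:
    std::string* ptr_str;   // declared, and therefore initialized, first
    std::string str;
};
```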
P.S. Your class is copyable, but the copy will have a pointer to the member of another instance. Consider whether that makes sense for your class. If not, you will need to implement custom copy and move constructors and assignment operators. A self-reference like this is a rare case where you may need custom definitions for those, but not a custom destructor, so it is an exception to the rule of five (or three).
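A sketch of what such custom definitions might look like, assuming the intent is that every Klass object's ptr_str points at its own str. The accessor functions are hypothetical. Each special member copies or moves the string but re-points (or simply keeps) the pointer at the object's own member, and no destructor is declared because nothing is owned through ptr_str.

```cpp
#include <string>
#include <utility>

class Klass {
public:
    Klass() : ptr_str{&str}, str{} {}

    // Copy/move the string contents, but always point at this object's own member.
    Klass(const Klass& other) : ptr_str{&str}, str{other.str} {}
    Klass(Klass&& other) noexcept : ptr_str{&str}, str{std::move(other.str)} {}

    // ptr_str already points at our own str, so assignment only transfers the string.
    Klass& operator=(const Klass& other) { str = other.str; return *this; }
    Klass& operator=(Klass&& other) noexcept { str = std::move(other.str); return *this; }

    // No custom destructor: ptr_str does not own anything.

    // Hypothetical accessors for illustration.
    bool pointsToOwnMember() const { return ptr_str == &str; }
    std::string& target() { return *ptr_str; }

private:
    std::string* ptr_str;
    std::string str;
};
```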
P.P.S. If your intention is to point to one of the members, and no object other than a member, then you might want to use a pointer to member instead of pointer to object.
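A short sketch of that alternative. The struct below is a simplified, hypothetical stand-in for Klass with two public members. A pointer to member records *which* member to use rather than the address inside one particular object, so copying an object can never leave it pointing into another instance.

```cpp
#include <string>
#include <type_traits>

// Hypothetical, simplified stand-in for Klass.
struct Klass {
    std::string first;
    std::string second;

    // Inside a member function, &second is an ordinary object pointer.
    std::string* addressOfSecond() { return &second; }
};

// &Klass::second (a qualified-id) forms a pointer to member.
static_assert(std::is_same<decltype(&Klass::second), std::string Klass::*>::value,
              "a qualified-id operand of & yields a pointer to member");

// Select a member once, then apply it to any instance with .*
std::string& member(Klass& obj, std::string Klass::* which)
{
    return obj.*which;
}
```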

Funny question.
It is legitimate and will "work", though barely. There is a little "but" related to types which makes the whole thing a bit awkward and leaves a bad taste (though it is not illegitimate), and which might make it illegal in some border cases involving inheritance.
You can, of course, take the address of any object whether it's initialized or not, as long as it exists in the scope and has a name which you can prepend operator& to. Dereferencing the pointer is a different thing, but that wasn't the question.
Now, the subtle problem is that the standard defines the result of operator& for non-static struct members as “pointer to member of class C of type T”, a prvalue designating C::m.
Which basically means that ptr_str{&str} will take the address of str, but the type is not pointer-to, but pointer-to-member-of. It is then implicitly and silently cast to pointer-to.
In other words, although you do not need to explicitly write &this->str, that's nevertheless what its type is -- it's what it is and what it means [1].
Is this valid, and is it safe to use it within the initializer list? Well yes, just... barely. It's safe to use it as long as it's not being used to access uninitialized members or virtual functions, directly or indirectly. Which, as it happens, is the case here (it might not be the case in a different, arguably contrived case).
[1] Funnily, paragraph 4 starts with a clause that says that no member pointer is formed when you put stuff in parentheses. That's remarkable because most people would probably do that just to be 100% sure they got operator precedence right. But if I read correctly, then &this->foo and &(this->foo) are not in any way the same!

c++: why can't 'this' be a nullptr?

In my early days with C++, I seem to recall you could call a member function with a NULL pointer, and check for that in the member function:
class Thing {public: void x();};
void Thing::x()
{ if (this == NULL) return; //nothing to do
// ...do stuff...
}
Thing* p = NULL; //nullptr these days, of course
p->x(); //no crash
Doing this may seem silly, but it was absolutely wonderful when writing recursive functions to traverse data structures, where navigating could easily run into the blind alley of a NULL; navigation functions could do a single check for NULL at the top and then blithely call themselves to try to navigate deeper without littering the code with additional checks.
According to g++ at least, the freedom (if it ever existed) has been revoked. The compiler warns about it, and if compiling optimized, it causes crashes.
Question 1: does the C++ standard (any flavor) disallow a NULL this? Or is g++ just getting in my face?
Question 2. More philosophically, why? 'this' is just another pointer. The glory of pointers is that they can be nullptr, and that's a useful condition.
I know I can get around this by making static functions, passing as first parameter a pointer to the data structure (hellllo Days of C) and then check the pointer. I'm just surprised I'd need to.
Edit: To upvote an answer I'd like to see chapter and verse from the standard on why this is disallowed. Note that my example at NO POINT dereferences NULL. Nothing is virtual here, and p is copied to "argument this" but then checked before use. No dereference occurs! So dereference of NULL can't be used as a claim of UB.
People are making a knee-jerk reaction to *p and assuming it isn't valid if p is NULL. But it is, and the evidence is here:
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
In fact it calls out two cases when a pointer, p, is surprisingly valid as *p: when p is null or when p points one element past the end of an array. What you must never do is USE the value of *p... other than to take the address of it. &*p where p == nullptr for any pointer type p IS valid. It's fine to point out that p->x() is really (*p).x(), but at the end of the day that translates to x(&*p) and that is perfectly well formed and valid. For p=nullptr... it simply becomes x(nullptr).
I think my debate should be with the standards community; in their haste to undercut the concept of a null reference, they left the wording unclear. No one here has argued that p->x() is UB except on the grounds that *p is UB, and *p is definitely not UB because no aspect of x() uses the referenced value, so I'm going to put this down to g++ overreaching on an ambiguity in the standard. The absolutely identical mechanism using a static function and an extra parameter is well defined, so it doesn't stop my refactoring effort. Consider the question withdrawn; portable code can't assume this==nullptr will work, but there's a portable solution available, so in the end it doesn't matter.

To be in a situation where this is nullptr implies you called a non-static member function without using a valid instance, such as with a pointer set to nullptr. Since this is forbidden, to obtain a null this you must already be in undefined behavior. In other words, this is never nullptr unless you have undefined behavior. Due to the nature of undefined behavior, you can simplify the statement to "this is never nullptr", since no rule needs to be upheld in the presence of undefined behavior.

Question 1: does the C++ standard (any flavor) disallow a NULL this?
Or is g++ just getting in my face?
The C++ standard disallows it -- calling a method on a NULL pointer is officially 'undefined behavior' and you must avoid doing it or you will get bit. In particular, optimizers will assume that the this-pointer is non-NULL when making optimizations, leading to strange/unexpected behaviors at runtime (I know this from experience :))
Question 2. More philosophically, why? 'this' is just another pointer.
The glory of pointers is that they can be nullptr, and that's a useful
condition.
I'm not sure it matters, really; it's what is specified in the C++ standard, and they probably had their reasons (philosophical or otherwise), but since the standard specifies it, the compilers expect it, therefore as programmers we have to abide by it, or face undefined behavior. (One can imagine an alternate universe where NULL this-pointers are allowed, but we don't live there)

The question has already been answered - it is undefined behavior to dereference a null pointer, and using *obj or obj-> are both dereferencing.
Now (since I assume you have a question on how to work around this) the solution is to use a static function:
class Foo {
public:
void bar();
static auto bar_st(Foo* foo) { if (foo) return foo->bar(); }
};
Having said that, I do think that gcc's decision to eliminate all branches that check for a nullptr this was not a wise one. Nobody gained by that, and a lot of people suffered. What's the benefit?
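The static-function workaround also covers the recursive traversal use case from the question. A minimal sketch, using a hypothetical binary-tree Node type: the single null check still sits at the top of the function, but it tests an ordinary parameter rather than this, so no member function is ever called through a null pointer.

```cpp
// Hypothetical binary tree node used only for illustration.
struct Node {
    int value;
    Node* left;
    Node* right;

    // Static member: navigating into a null child is handled by one check at the top.
    static int sum(const Node* node)
    {
        if (node == nullptr) return 0;   // the "blind alley" case
        return node->value + sum(node->left) + sum(node->right);
    }
};
```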

C++ does not allow calling member functions through a null pointer. Objects need an identity, and that cannot be represented by a null pointer. What would happen if a member function read or wrote a field of an object referenced by a null pointer?
It sounds like you could use the null object pattern in your code to get the result you want.
The null pointer is recognised as a problematic entity in object-oriented languages because in most languages it is not an object. This creates a need for code that specifically handles the case of something being null. While checking for a null pointer is the norm, there are other approaches. Smalltalk's nil is actually an object with its own methods, and like all objects it can be extended. The Go programming language allows calling a method with a pointer receiver on a nil pointer (which sounds like what the question is asking for).
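A minimal sketch of the null object pattern in C++, using a hypothetical Logger interface. Instead of passing a null pointer, callers pass a real object whose operations do nothing, so no null check (and no null this) is ever needed.

```cpp
#include <string>

// Hypothetical interface used only to illustrate the pattern.
class Logger {
public:
    virtual ~Logger() = default;
    virtual void log(const std::string& msg) = 0;
    virtual int count() const = 0;
};

// Real implementation: counts the messages it receives.
class CountingLogger : public Logger {
public:
    void log(const std::string&) override { ++n_; }
    int count() const override { return n_; }
private:
    int n_ = 0;
};

// Null object: a valid object whose operations are deliberately no-ops.
class NullLogger : public Logger {
public:
    void log(const std::string&) override {}
    int count() const override { return 0; }
};

// Callers never test for null; they always receive a usable object.
void doWork(Logger& logger)
{
    logger.log("step 1");
    logger.log("step 2");
}
```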

this might be null too if you delete this (which is possible but not recommended)

Does taking address of member variable through a null pointer yield undefined behavior?

The following code (or its equivalent, which uses explicit casts of the null literal to get rid of the temporary variable) is often used to calculate the offset of a specific member variable within a class or struct:
class Class {
public:
int first;
int second;
};
Class* ptr = 0;
size_t offset = reinterpret_cast<char*>(&ptr->second) -
reinterpret_cast<char*>(ptr);
&ptr->second looks like it is equivalent to the following:
&(ptr->second)
which in turn is equivalent to
&((*ptr).second)
which dereferences an object instance pointer and yields undefined behavior for null pointers.
So is the original fine or does it yield UB?

Despite the fact that it does nothing, char* foo = 0; *foo; could be undefined behavior.
Dereferencing a null pointer could be undefined behavior. And yes, ptr->foo is equivalent to (*ptr).foo, and *ptr dereferences a null pointer.
There is currently an open issue in the working groups about whether *(char*)0 is undefined behavior if you don't read or write to it. Parts of the standard imply it is, other parts imply it is not. The current notes there seem to lean towards making it defined.
Now, this is in theory. How about in practice?
Under most compilers, this works because no checks are done at dereferencing time: memory around where a null pointer points is guarded against access, and the above expression simply takes the address of something near null; it does not read or write the value there.
This is why cppreference's offsetof page lists pretty much that trick as a possible implementation. The fact that some (many? most? every one I've checked?) compilers implement offsetof in a similar or equivalent manner does not mean that the behavior is well defined under the C++ standard.
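For the member-offset use case, the standard offsetof macro from <cstddef> is the portable tool, because the implementation is allowed to use whatever trick or intrinsic it knows is safe on that platform. A minimal sketch reusing the question's Class; offsetOfSecond is a hypothetical name.

```cpp
#include <cstddef>       // offsetof, std::size_t, std::ptrdiff_t
#include <type_traits>

class Class {
public:
    int first;
    int second;
};

// offsetof is only guaranteed to be supported for standard-layout types.
static_assert(std::is_standard_layout<Class>::value, "offsetof requires standard layout");

// No null pointer is formed in user code; the macro yields a constant expression.
constexpr std::size_t offsetOfSecond = offsetof(Class, second);

// In a standard-layout class the first member is at offset 0.
static_assert(offsetof(Class, first) == 0, "first member starts the object");
```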
However, given the ambiguity, compilers are free to add checks at every instruction that dereferences a pointer, and execute arbitrary code (fail fast error reporting, for example) if null is indeed dereferenced. Such instrumentation might even be useful to find bugs where they occur, instead of where the symptom occurs. And on systems where there is writable memory near 0 such instrumentation could be key (pre-OSX MacOS had some writable memory that controlled system functions near 0).
Such compilers could still write offsetof that way, and introduce pragmas or the like to block the instrumentation in the generated code. Or they could switch to an intrinsic.
Going a step further, C++ leaves lots of latitude on how non-standard-layout data is arranged. In theory, classes could be implemented as rather complex data structures and not the nearly standard-layout structures we have grown to expect, and the code would still be valid C++. Accessing member variables of non-standard-layout types and taking their address could be problematic: I do not know if there is any guarantee that the offset of a member variable in a non-standard-layout type does not change between instances!
Finally, some compilers have aggressive optimization settings that find code that executes undefined behavior (at least under certain branches or conditions), and uses that to mark that branch as unreachable. If it is decided that null dereference is undefined behavior, this could be a problem. A classic example is gcc's aggressive signed integer overflow branch eliminator. If the standard dictates something is undefined behavior, the compiler is free to consider that branch unreachable. If the null dereference is not behind a branch in a function, the compiler is free to declare all code that calls that function to be unreachable, and recurse.
And it would be free to do this in not the current, but the next version of your compiler.
Writing code that is standards-valid is not just about writing code that compiles today cleanly. While the degree to which dereferencing and not using a null pointer is defined is currently ambiguous, relying on something that is only ambiguously defined is risky.

Is there a practical benefit to casting a NULL pointer to an object and calling one of its member functions?

Ok, so I know that technically this is undefined behavior, but nonetheless, I've seen this more than once in production code. And please correct me if I'm wrong, but I've also heard that some people use this "feature" as a somewhat legitimate substitute for a lacking aspect of the current C++ standard, namely, the inability to obtain the address (well, offset really) of a data member. For example, this is out of a popular implementation of a PCRE (Perl-compatible Regular Expression) library:
#ifndef offsetof
#define offsetof(p_type,field) ((size_t)&(((p_type *)0)->field))
#endif
One can debate whether the exploitation of such a language subtlety in a case like this is valid or not, or even necessary, but I've also seen it used like this:
struct Result
{
void stat()
{
if(this) {
// do something...
} else {
// do something else...
}
}
};
// ...somewhere else in the code...
((Result*)0)->stat();
This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right? So the question remains: Is there a practical use case, where one would benefit from using such a construct? I'm especially concerned about the second case, since the first case is more of a workaround for a language limitation. Or is it?
PS. Sorry about the C-style casts, unfortunately people still prefer to type less if they can.

The first case is not calling anything. It's taking the address. That's a defined, permitted operation. It yields the offset in bytes from the start of the object to the specified field. This is a very, very common practice, since offsets like this are very commonly needed. Not all objects can be created on the stack, after all.
The second case is reasonably silly. The sensible thing would be to declare that method static.

I don't see any benefit in ((Result*)0)->stat(); - it is an ugly hack which will likely break sooner rather than later. The proper C++ approach would be to use a static method, Result::stat().
offsetof() on the other hand is legal, as the offsetof() macro never actually calls a method or accesses a member, but only performs address calculations.
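A sketch of the static alternative for the question's Result, assuming stat() has to handle both the "object present" and "no object" cases. The code member and the statOf helper are hypothetical names. The null check moves onto an ordinary parameter, so nothing is ever called through a null this.

```cpp
#include <string>

struct Result
{
    int code = 0;   // hypothetical data member

    // Instance work: only ever invoked on a real object.
    std::string statOf() const { return "result code " + std::to_string(code); }

    // Static entry point: callers pass a possibly-null pointer explicitly.
    static std::string stat(const Result* r)
    {
        if (r) return r->statOf();   // "do something..."
        return "no result";          // "do something else..."
    }
};
```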

Everybody else has done a good job of reiterating that the behavior is undefined. But let's pretend it wasn't, and that p->member is allowed to behave in a consistent manner under certain circumstances even if p isn't a valid pointer.
Your second construct would still serve almost no purpose. From a design perspective, you've probably done something wrong if a single function can do its job both with and without accessing members, and if it can then splitting the static portion of the code into a separate, static function would be much more reasonable than expecting your users to create a null pointer to operate on.
From a safety perspective, you've only protected against a small portion of the ways an invalid this pointer can be created. There's uninitialized pointers, for starters:
Result* p;
p->stat(); //Oops, 'this' is some random value
There's pointers that have been initialized, but are still invalid:
Result* p = new Result;
delete p;
p->stat(); //'this' points to "safe" memory, but the data doesn't belong to you
And even if you always initialize your pointers, and absolutely never accidentally reuse free'd memory:
struct Struct {
int i;
Result r;
};
int main() {
((Struct*)0)->r.stat(); //'this' is likely sizeof(int), not 0
}
So really, even if it weren't undefined behavior, it is worthless behavior.

Although libraries targeting specific C++ implementations may do this, that doesn't mean it's "legitimate" generally.
This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?
No, because although it might work fine on some C++ implementations, it is perfectly okay for it to not work on any conforming C++ implementation.

Dereferencing a null-pointer is undefined behavior and anything can happen if you do it. Don't do it if you want a program that works.
Just because it doesn't immediately crash in one specific test case doesn't mean that it won't get you into all kinds of trouble.

Undefined behaviour is undefined behaviour. Do these tricks "work" for your particular compiler? Well, possibly. Will they work for the next iteration of it, or for another compiler? Possibly not. You pays your money and you takes your choice. I can only say that in nearly 25 years of C++ programming I've never felt the need to do any of these things.

Regarding the statement:
It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?
The code is not legitimate. There is no guarantee that the compiler and/or runtime will actually call the method when the pointer is NULL. The check in the method is of no help because you can't assume that the method will actually end up being called with a NULL this pointer.

C++: member pointer initialised?

Code sample should explain things:
class A
{
B* pB;
C* pC;
D d;
public :
A(int i, int j) : d(j)
{
pC = new C(i, "abc");
} // note pB is not initialised, e.g. pB(NULL)
...
};
Obviously pB should be initialised to NULL explicitly to be safe (and clear), but, as it stands, what is the value of pB after construction of A? Is it default initialised (which is zero?) or not (i.e. indeterminate and whatever was in memory). I realise initialisation in C++ has a fair few rules.
I think it isn't default initialised; as running in debug mode in Visual Studio it has set pB pointing to 0xcdcdcdcd - which means the memory has been new'd (on the heap) but not initialised. However in release mode, pB always points to NULL. Is this just by chance, and therefore not to be relied upon; or are these compilers initialising it for me (even if it's not in the standard)? It also seems to be NULL when compiled with Sun's compiler on Solaris.
I'm really looking for a specific reference to the standard to say one way or the other.
Thanks.

Here is the relevant passage from the standard:
12.6.2 Initializing bases and members [class.base.init]
4 If a given nonstatic data member or base class is not named by a mem-initializer-id in the mem-initializer-list, then
-- If the entity is a nonstatic data member of (possibly cv-qualified) class type (or array thereof) or a base class, and the entity class is a non-POD class, the entity is default-initialized (dcl.init). If the entity is a nonstatic data member of a const-qualified type, the entity class shall have a user-declared default constructor.
-- Otherwise, the entity is not initialized. If the entity is of const-qualified type or reference type, or of a (possibly cv-qualified) POD class type (or array thereof) containing (directly or indirectly) a member of a const-qualified type, the program is ill-formed.
After the call to a constructor for class X has completed, if a member of X is neither specified in the constructor's mem-initializers, nor default-initialized, nor initialized during execution of the body of the constructor, the member has indeterminate value.

According to the C++0x standard section 12.6.2.4, in the case of your pointer variable, if you don't include it in the initializer list and you don't set it in the body of the constructor, then it has indeterminate value. 0xCDCDCDCD and 0 are two possible such values, as is anything else. :-)
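A sketch of the safer version of the question's class, assuming C++11 or later for the default member initializer. B, C and D are hypothetical minimal stand-ins, the new C is moved from the constructor body into the mem-initializer list, and the destructor and deleted copy operations are additions so the sketch neither leaks nor double-deletes.

```cpp
// Hypothetical stand-ins for the question's B, C and D types.
struct B {};
struct C { C(int, const char*) {} };
struct D { explicit D(int) {} };

class A
{
    B* pB = nullptr;   // default member initializer (C++11): never indeterminate
    C* pC;             // initialized in the mem-initializer list below
    D d;
public:
    A(int i, int j) : pC(new C(i, "abc")), d(j) {}
    ~A() { delete pC; }

    // Sketch only: copying would make two objects delete the same C.
    A(const A&) = delete;
    A& operator=(const A&) = delete;

    bool hasB() const { return pB != nullptr; }
};
```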

I believe this is an artifact from the good old C days, when you could not have expectations about what alloc'd memory contains. As the standards progressed to C++ this "convention" was maintained. As the C++ compilers developed, the individual authors took it upon themselves to "fix" this problem. Therefore your mileage may vary depending on your compiler of choice.
The "0xcdcdcdcd" looks to be a readily identifiable pattern that "helps" in debugging your code. That is why it doesn't show in release mode.
I hope this helped in a little way and good luck.

Uninitialised pointers are allowed to contain basically any value, although some compilers tend to fill them with 0 or some other recognisable value, especially in debug mode.
IMHO this is due to C++'s "don't pay for what you don't use" design. If you don't consider it important, the compiler does not need to go through the expense of initialising the variable for you. Of course, once you've chased a random pointer you might find it prudent to initialise it the next time around...

The value of pB is indeterminate. It may or may not be consistently the same value; it usually depends on what was previously at the same place in memory prior to the allocation of a particular instance of A.

Uninitialised pointers can point to anything. Some compiler vendors will help you out and make them point to 0 or 0xcdcdcdcd or whatever.
To make sure your code is safe and portable you should always initialise your pointers, either to 0 or to a valid value.
e.g.
C* pc = 0;
or
C* pc = new C(...);
If you always initialise pointers to 0 then this is safe :
if (!pc)
pc = new C(...);
If you don't initialise then you've got no way of telling initialised and uninitialised pointers apart.
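A minimal sketch of that lazy-initialisation idea inside a class, assuming C++11 or later. The Holder class and the constructor argument 42 are hypothetical. Because the member always starts as a null pointer, the `if (!pc)` test reliably distinguishes "not created yet" from "created".

```cpp
// Hypothetical type standing in for the answer's C.
struct C {
    explicit C(int v) : value(v) {}
    int value;
};

class Holder {
public:
    Holder() = default;
    ~Holder() { delete pc; }
    Holder(const Holder&) = delete;              // sketch only: avoid double delete
    Holder& operator=(const Holder&) = delete;

    // Create the object the first time it is needed.
    C& get()
    {
        if (!pc)              // meaningful only because pc starts as null
            pc = new C(42);
        return *pc;
    }

private:
    C* pc = nullptr;          // always initialised, so the check above is reliable
};
```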
As an aside, there's no such keyword in C++ as NULL; it is a macro. Most compilers define NULL as 0, but it's not considered portable to use it. The new C++0x standard will introduce a new keyword, nullptr, so when that comes out we'll finally have a portable null pointer constant.

It's rare that I'll recommend not learning something about the language you're using, but in this case, whether or not pB is initialized isn't useful information. Just initialize it. If it's automatically initialized, the compiler will optimize out the extra initialization. If it isn't, you've added one extra processor instruction and prevented a whole slew of potential bugs.
