Are there any uses for uninitialized variables? - c++

In C++ Primer Plus (6th. edition), page 73, it states:
If you don't initialize a variable that is defined inside a function,
the variable's value is indeterminate. That means the value is
whatever happened to be sitting at that memory location prior to the
creation of that variable.
Does that mean I can use an uninitialized variable to get data on the memory location at that point in the program? If true, are there any instances where this property is useful?

The standard's wording for what you quoted is as follows (§8.5 [dcl.init]):
If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value.
The book is a little misleading about the previous value being available at that location. It is often true, but not necessarily.
The important point is that accessing the value of such an object results in undefined behaviour. Accessing the value of an object is formally known as lvalue-to-rvalue conversion, which is specified as follows (§4.1 [conv.lval]):
If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
This occurs only when an operator in an expression requires an rvalue operand, which is usually the case. However, the unary & operator requires an lvalue operand, so lvalue-to-rvalue conversion is not applied. That means taking the address of an uninitialized variable is fine. This makes logical sense, because the object exists and has a valid address, it just isn't initialized. Taking the address doesn't require accessing the object's value.
Why would you do this? It's hard to think of a specific example because the idea is very broad and because we don't often leave our variables uninitialized (if at all). If you needed to store the address of an object before you assigned a value to it though, this is what you would need to do. You could later access the object through that pointer (once it has been assigned to). In fact, you could assign to it through the pointer.

While the standard effectively forbids ever using the value of an uninitialized variable, you are usually able to read it anyway, and it will usually indeed contain whatever value was last stored at that location.
However, this information is not useful in any way. Since your program always runs in its own private virtual address space, anything you might read has been written by your own program. I.e., it's information that you have already anyway, and which you can pass around in much more secure ways. This is especially true if you are using an uninitialized variable within a function: all you will ever be able to see that way is information that was written to a local variable by some function that you called earlier.
You might be able to find out some internals of the standard C library and other libraries that you use because their code runs within your address space. But most information will not be stored on the stack anyway, so you would need to read data from some other memory locations than an uninitialized variable. You would need to dereference pointers to stuff that you don't own, and that is really deep down in undefined behaviour. If you try this, you will most likely get segfaults.
So, yes, it's possible, but...

Related

Does a member have to be initialized to take its address?

Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
#include <string>

class Klass {
public:
    Klass()
        : ptr_str{&str}
        , str{}
    {}
private:
    std::string *ptr_str;
    std::string str;
};
this question is similar to mine, but the order is correct there, and the answer says
I'd advise against coding like this in case someone changes the order of the members in your class.
This seems to imply that reversing the order would be illegal, but I couldn't be sure.
Does a member have to be initialized to take its address?
No.
Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
Yes. Yes.
There is no restriction that the operand of unary & needs to be initialised. There is an example in the standard's specification of the unary & operator:
int a;
int* p1 = &a;
Here, the value of a is indeterminate and it is OK to point to it.
What that example doesn't demonstrate is pointing to an object before its lifetime has begun, which is what happens in your example. Using a pointer to an object before and after its lifetime is explicitly allowed if the storage is occupied. Standard draft says:
[basic.life] Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways ...
The rule goes on to list how the usage is restricted; you can mostly get by with common sense. In short, you can treat such a pointer as you could treat a void*, except that violating these restrictions is UB rather than making the program ill-formed. A similar rule exists for references.
There are also restrictions on computing the address of non-static members specifically. Standard draft says:
[class.cdtor] ... To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
In the constructor of Klass, the construction of Klass has started and destruction hasn't completed, so the above rule is satisfied.
P.S. Your class is copyable, but the copy will have a pointer to the member of another instance. Consider whether that makes sense for your class. If not, you will need to implement custom copy and move constructors and assignment operators. A self-reference like this is a rare case where you may need custom definitions for those, but not a custom destructor, so it is an exception to the rule of five (or three).
P.P.S If your intention is to point to one of the members, and no object other than a member, then you might want to use a pointer to member instead of pointer to object.
Funny question.
It is legitimate and will "work", though barely. There is a little "but" related to types which makes the whole thing a bit awkward with a bad taste (but not illegitimate), and which might make it illegal in some border cases involving inheritance.
You can, of course, take the address of any object whether it's initialized or not, as long as it exists in the scope and has a name which you can prepend operator& to. Dereferencing the pointer is a different thing, but that wasn't the question.
Now, the subtle problem is that the standard defines the result of operator& for non-static struct members as having type “pointer to member of class C of type T” and being a prvalue designating C::m.
Which basically means that ptr_str{&str} will take the address of str, but the type is not pointer-to, but pointer-to-member-of. It is then implicitly and silently cast to pointer-to.
In other words, although you do not need to explicitly write &this->str, that's nevertheless what its type is -- it's what it is and what it means [1].
Is this valid, and is it safe to use it within the initializer list? Well yes, just... barely. It's safe to use it as long as it's not being used to access uninitialized members or virtual functions, directly or indirectly. Which, as it happens, is the case here (it might not be the case in a different, arguably contrived case).
[1] Funnily, paragraph 4 starts with a clause that says that no member pointer is formed when you put stuff in parentheses. That's remarkable because most people would probably do that just to be 100% sure they got operator precedence right. But if I read correctly, then &this->foo and &(this->foo) are not in any way the same!

What is the rationale behind returning unique addresses for allocations of zero size in C++?

What is the rationale behind returning unique addresses for allocations of zero size in C++?
Background: the C11 standard says about malloc (7.20.3 Memory management functions):
If the size of the space requested is zero, the behavior is implementation defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
That is, as I see it, malloc always succeeds for allocations of zero size, since the only thing you can do with the pointer from a zero-sized allocation is pass it to some other memory-management function like free:
if malloc returns NULL, free(NULL) is ok so this can be considered a success,
if it returns some other value, that's also a success (because it isn't NULL), the only condition is that free on the value should also work.
Also, C11 (also 7.20.3) does not specify that the addresses returned from malloc must be unique, only that returned pointers must point to disjoint memory regions:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.
All objects of zero size are disjoint AFAICT, and that would mean that malloc can return the same pointer for multiple zero-sized allocations (e.g. NULL would be fine), or different pointers each time, or the same pointer for some, etc.
Then C++98 came along with two raw memory allocation functions:
void* operator new(std::size_t size);
void* operator new(std::size_t size, std::align_val_t alignment);
Note that these functions only return raw memory: they do not create or initialize any objects of any type AFAICT.
You call them like this:
#include <iostream>
#include <new>

int main() {
    void* ptr = operator new(std::size_t{0});
    std::cout << ptr << std::endl;
    operator delete(ptr, std::size_t{0});
    return 0;
}
The [new.delete.single] section of the C++17 standard explains them, but the key guarantee as I see it is given in [basic.stc.dynamic.allocation]:
Even if the size of the space requested is zero, the request can fail. If the request succeeds, the value returned shall be a non-null pointer value (7.11) p0 different from any previously returned value p1, unless that value p1 was subsequently passed to an operator delete. Furthermore, for the library allocation functions in 21.6.2.1 and 21.6.2.2, p0 shall represent the address of a block of storage disjoint from the storage for any other object accessible to the caller. The effect of indirecting through a pointer returned as a request for zero size is undefined.38
That is, they must always return distinct pointers on success. That's a big change from malloc.
My question is: What is the rationale behind this change? (that is, behind returning unique addresses for allocations of zero size in C++)
Ideally the answer would be just a link to the paper (or some other source) that explored the alternatives and motivated their semantics. Typically I go for The Design and Evolution of C++ for these C++98 questions, but Section 10 (Memory Management) does not mention anything about it. Otherwise, some sort of authoritative reference would be nice.
Disclaimer: I asked this on reddit, but I did not ask nicely enough, so I did not get any useful answer. I would like to kindly ask you: if you only have a hypothesis, please feel free to post it as an answer, but mention that it is only a hypothesis.
Also, on reddit people went on and on about zero-sized types, whether I have a proposal to change the standard, etc. This question is about the semantics of the raw memory allocation functions when passed a size equal to zero. If topics like zero-sized types are relevant for your answer, please include them! But try not to get too derailed with tangential issues.
Also, on reddit people also threw arguments like "that's for optimization purposes" without really being able to mention anything more concrete. I'd expect something more concrete than "because optimizations" in an answer. For example, one redditor mentioned aliasing optimizations, but I wondered which kind of aliasing optimizations apply to pointers that cannot be dereferenced, and wasn't able to get anyone to comment on that. So maybe if you are going to mention optimizations, a small example that shows it would enrich the discussion.
The problem is that objects (no matter their size) in C++ must have a unique identity. So different coexisting objects (no matter their size) must have different addresses, since two pointers that compare equal are assumed to point to the same object.
If you admit that zero-sized objects can share an address, you can no longer distinguish whether two addresses refer to the same object.
Many comments about the "new does not return objects" issue.
Please FORGET OOP terminology in this context:
The C++ specification has a precise definition of what the word "Object" means.
CPP Reference:Object
In particular:
C++ programs create, destroy, refer to, access, and manipulate objects.
An object, in C++, is a region of storage that has
size (can be determined with sizeof);
alignment requirement (can be determined with alignof);
storage duration (automatic, static, dynamic, thread-local);
lifetime (bounded by storage duration or temporary);
type;
value (which may be indeterminate, e.g. for default-initialized non-class types);
optionally, a name.
The following entities are not objects: value, reference, function,
enumerator, type, non-static class member, bit-field, template, class or
function template specialization, namespace, parameter pack, and this.
A variable is an object or a reference that is not a non-static data member,
that is introduced by a declaration.
Objects are created by definitions, new-expressions, throw-expressions, when
changing the active member of a union, and where temporary objects are
required.
The reason is simply that code should not require special handling of boundary conditions. Many, I would say most, algorithms have to deal with zero-sized objects as boundary conditions. Less common is the algorithm that compares pointers to objects to see if they are the same object, but this still should work even for zero-sized objects.
However, your question assumes that this is a change. Apart from a brief hiatus in the late 1980's all C and C++ implementations that I am aware of have always behaved like this.
The original C compiler by dmr behaved like this, but then around 1987 the draft C standard specified that malloc of a zero-sized object return NULL. This was truly bizarre, and even the final C89 standard made it implementation-defined, but I have never since encountered an implementation that did this horrible thing.
I talk more about this in my blog in the section "Malloc Madness".

Volatile Pointer to Non Volatile Data

Suppose I have the following declaration:
int* volatile x;
I believe that this defines a volatile pointer to a "normal" (non-volatile) variable.
To me this could mean one of two things:
First Guess
The pointer can change, but the number will not change without notice. This means that some other thread (that the compiler doesn't know about) can change the pointer, but if the old pointer was pointing to a "12" then the new pointer (the new value of the pointer, because the thread changes it) would point to another "12".
To me this seems fairly useless, and I would assume that this is not what the real operation is.
Second Guess
The pointer can change, and thus if the pointer changes, the compiler must reload the value in the pointer before using it. But if it verifies that the pointer did not change (with an added check), it can then assume that the value it points to remained the same also.
So my question is this:
What does declaring a volatile pointer to non volatile data actually do?
int *volatile x; declares a volatile pointer to a non-volatile int.
Whenever the pointer is accessed, the volatile qualifier guarantees that its value (the value of the pointer) is re-read from memory.
Since the pointed-to int is non-volatile, the compiler is allowed to reuse a previously cached value at the address pointed to by the current value of the pointer. Technically this is allowed regardless of whether the pointer has changed or not, as long as there exists a cached value originally retrieved from the current address.
[ EDIT ] To address @DavidSchwartz's comment, I should note that "re-read from memory" is a (not pedantically precise, but AFAIK commonly used) shorthand for "as if it were re-read from memory in the abstract machine".
For example, C11 draft N1570 6.7.3/7 says:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously (134). What constitutes an access to an object that has volatile-qualified type is implementation-defined.
The same draft has a footnote for 6.5.16/3 (assignment operators):
The implementation is permitted to read the object to determine the value but is not required to, even when the object has volatile-qualified type
So in the end volatile does not require a physical memory read, but the observable behavior of a compliant implementation must be as if one was made regardless.
The volatile means that the value of the pointer (i.e., the memory location that it points to) can change; consequently, the compiler must either keep its cached copies of the pointer coherent, or load the pointer from memory for every read and store it to memory for every write.
The volatile says nothing about the pointed-to value, however. So it can change and may have different values in different threads.

Is it legal to compare dangling pointers?

Is it legal to compare dangling pointers?
int *p, *q;
{
    int a;
    p = &a;
}
{
    int b;
    q = &b;
}
std::cout << (p == q) << '\n';
Note how both p and q point to objects that have already vanished. Is this legal?
Introduction: The first issue is whether it is legal to use the value of p at all.
After a has been destroyed, p acquires what is known as an invalid pointer value. Quote from N4430 (for discussion of N4430's status see the "Note" below):
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of the deallocated storage become invalid pointer values.
The behaviour when an invalid pointer value is used is also covered in the same section of N4430 (and almost identical text appears in C++14 [basic.stc.dynamic.deallocation]/4):
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
[ Footnote: Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault. — end footnote ]
So you will need to consult your implementation's documentation to find out what should happen here (since C++14).
The term use in the above quotes means necessitating lvalue-to-rvalue conversion, as in C++14 [conv.lval]/2:
When an lvalue-to-rvalue conversion is applied to an expression e, and [...] the object to which the glvalue refers contains an invalid pointer value, the behaviour is implementation-defined.
History: In C++11 this said undefined rather than implementation-defined; it was changed by DR1438. See the edit history of this post for the full quotes.
Application to p == q: Supposing we have accepted in C++14+N4430 that the result of evaluating p and q is implementation-defined, and that the implementation does not define that a hardware trap occurs; [expr.eq]/2 says:
Two pointers compare equal if they are both null, both point to the same function, or both represent the same address (3.9.2), otherwise they compare unequal.
Since it's implementation-defined what values are obtained when p and q are evaluated, we can't say for sure what will happen here. But it must be either implementation-defined or unspecified.
g++ appears to exhibit unspecified behaviour in this case; depending on the -O switch I was able to have it say either 1 or 0, corresponding to whether or not the same memory address was re-used for b after a had been destroyed.
Note about N4430: This is a proposed defect resolution to C++14, that hasn't been accepted yet. It cleans up a lot of wording surrounding object lifetime, invalid pointers, subobjects, unions, and array bounds access.
In the C++14 text, it is defined under [basic.stc.dynamic.deallocation]/4 and subsequent paragraphs that an invalid pointer value arises when delete is used. However it's not clearly stated whether or not the same principle applies to static or automatic storage.
There is a definition of "valid pointer" in [basic.compound]/3, but it is too vague to use sensibly. The footnote in [basic.life]/5 refers to the same text to define the behaviour of pointers to objects of static storage duration, which suggests that it was meant to apply to all types of storage.
In N4430 the text is moved from that section up one level so that it does clearly apply to all storage durations. There is a note attached:
Drafting note: this should apply to all storage durations that can end, not just to dynamic storage duration. On an implementation supporting threads or segmented stacks, thread and automatic storage may behave in the same way that dynamic storage does.
My opinion: I don't see any consistent way to interpret the standard (pre-N4430) other than to say that p acquires an invalid pointer value. The behaviour doesn't seem to be covered by any other section besides what we have already looked at. So I am happy to treat the N4430 wording as representing the intent of the standard in this case.
Historically, there have been some systems where using a pointer as an rvalue might cause the system to fetch some information identified by some bits in that pointer. For example, if a pointer could contain the address of an object's header along with an offset into the object, fetching a pointer could cause the system to also fetch some information from that header. If the object has ceased to exist, the attempt to fetch information from its header could fail with arbitrary consequences.
That having been said, in the vast majority of C implementations, all pointers that were alive at some particular moment in time will forever hold the same relationships with regard to the relational and subtraction operators as they had at that particular time. Indeed, in most implementations if one has char *p, one may determine whether it identifies part of an object identified by char *base; size_t size; by checking whether (size_t)(p-base) < size; such comparison will work even retrospectively if there is any overlap in the objects' lifetime.
Unfortunately, the Standard defines no means by which code can indicate that it requires any of the latter guarantees, nor is there a standard means by which code can ask whether a particular implementation can promise any of the latter behaviors and refuse compilation if it does not. Further, some hyper-modern implementations will regard any use of relational or subtraction operators on two pointers as a promise by the programmer that the pointers in question will always identify the same live object, and omit any code which would only be relevant if that assumption didn't hold. Consequently, even though many hardware platforms would be able to offer guarantees that would be useful to many algorithms, there's no safe way by which code can exploit any such guarantees even if code will never need to run on hardware which does not naturally provide them.
The pointers contain the addresses of the variables they reference. The addresses are valid even when the variables that used to be stored there are released / destroyed / unavailable.
As long as you don't try to use the values at those addresses you are safe; dereferencing them (*p and *q) is undefined behaviour.
Obviously the result is implementation-defined, so this code example can be used to study the behaviour of your compiler if one doesn't want to dig into the assembly code.
Whether this is a meaningful practice is totally different discussion.

What happens when I do int *p = p in C/C++?

Below code is getting compiled in MinGw. How does it get compiled? How is it possible to assign a variable which is not yet created?
int main()
{
    int *p = p;
    return 0;
}
How does it get compiled?
The point of declaration of a variable starts at the end of its declarator, but before its initialiser. This allows more legitimate self-referential declarations like
void * p = &p;
as well as undefined initialisations like yours.
How is it possible to assign a variable which is not yet created?
There is no assignment here, just initialisation.
The variable has been created (in the sense of having storage allocated for it), but not initialised. You initialise it from whatever indeterminate value happened to be in that storage, with undefined behaviour.
Most compilers will give a warning or error about using uninitialised values, if you ask them to.
Let's take a look at what happens with the int*p=p; statement:
The compiler allocates space on the stack to hold the yet uninitialized value of variable p
Then the compiler initializes p with its uninitialized value
So, essentially there should be no problem with the code except that it assigns a variable an uninitialized value.
Actually there is not much difference from the following code:
int *q; // define a pointer and do not initialize it
int *p = q; // assign the value of the uninitialized pointer to another pointer
The likely result ("what it compiles to") will be the declaration of a pointer variable that is not initialized at all (which is subsequently optimized out since it is not used, so the net result would be "empty main").
The pointer is declared and initialized. So far, this is an ordinary and legal thing. However, it is initialized to itself, and its value is only in a valid, initialized state after the end of the statement (that is, at the location of the semicolon).
This, unsurprisingly, makes the statement undefined behavior.
By definition, invoking undefined behavior could in principle cause just about anything (although the often-quoted dramatic effects like formatting your hard drive or setting the computer on fire are exaggerated).
The compiler might actually generate an instruction that moves a register (or memory location) to itself, which would be a no-op instruction on most architectures, but could cause a hardware exception killing your process on some exotic architectures which have special validating registers for pointers (in case the "random" value is incidentially an invalid address).
The compiler will however not insert any "format harddisk" statements.
In practice, optimizing compilers will nowadays often assume "didn't happen" when they encounter undefined behavior, so it is most likely that the compiler will simply honor the declaration, and do nothing else.
This is, in every sense, perfectly allowable in the light of undefined behavior. Further, it is the easiest and least troublesome option for the compiler.