[1]
Are there any cases in which the addition of p0593r6 into C++20 (§ 6.7.2.11 Object model [intro.object]) made std::launder not necessary, where the same use case in C++17 required std::launder, or are they completely orthogonal?
[2]
The example in the spec for [ptr.launder] is:
struct X { int n; };
const X *p = new const X{3};
const int a = p->n;
new (const_cast<X*>(p)) const X{5}; // p does not point to new object ([basic.life]) because its type is const
const int b = p->n; // undefined behavior
const int c = std::launder(p)->n; // OK
Another example is given by @Nicol Bolas in this SO answer, using a pointer that points to valid storage but has a different type:
aligned_storage<sizeof(int), alignof(int)>::type data;
new(&data) int;
int *p = std::launder(reinterpret_cast<int*>(&data));
Are there other use cases, not related to allowing casting of two objects which are not transparently replaceable, for using std::launder?
Specifically:
Can a reinterpret_cast from A* to B*, where the two objects are pointer-interconvertible, ever require std::launder? (That is, can two objects be pointer-interconvertible and yet not transparently replaceable? The spec does not relate these two terms.)
Does reinterpret_cast from void* to T* require using std::launder?
Does the following code below require use of std::launder? If so, under which case in the spec does it fall to require that?
A struct with reference member, inspired by this discussion:
struct A {
constexpr A(int &x) : ref(x) {}
int &ref;
};
int main() {
int n1 = 1, n2 = 2;
A a { n1 };
a.~A();
new (&a) A {n2};
a.ref = 3; // do we need to launder somebody here?
std::cout << a.ref << ' ' << n1 << ' ' << n2 << std::endl;
}
Before C++17, a pointer with a given address and type always pointed to an object of that type located at that address, provided that the code respects the rules of [basic.life]. (see: Is a pointer with the right address and type still always a valid pointer since C++17?).
But the C++17 standard added a new quality to a pointer value. This quality is not encoded within the pointer type but qualifies the value directly, independently of the type (this is also the case for traceability). It is described in [basic.compound]/3:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value for that type, or
an invalid pointer value.
This quality of a pointer value has its own semantics (transition rules), and for the case of reinterpret_cast it is described in the next paragraph:
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.
In [basic.life], we can find another rule that describes how this quality transitions when an object's storage is reused:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, [...]
As you can see, the quality "pointer to an object" is attached to a specific object.
That means that in the variation below of the first example you give, the reinterpret_cast does not spare us the pointer optimization barrier:
struct X { int n; };
const X *p = new const X{3};
const int a = p->n;
new (const_cast<X*>(p)) const X{5}; // p does not point to new object ([basic.life]) because its type is const
const int b = *reinterpret_cast<const int*>(p); // undefined behavior
const int c = *std::launder(reinterpret_cast<const int*>(p));
A reinterpret_cast is not a pointer optimization barrier: reinterpret_cast<const int*>(p) points to the member of the destroyed object.
Another way to conceive of it is that the "pointer to" quality is conserved by reinterpret_cast as long as the objects are pointer-interconvertible, or if the pointer is cast to void* and then back to a pointer-interconvertible type (see [expr.static.cast]/13). So reinterpret_cast<const int*>(reinterpret_cast<const void*>(p)) still points to the destroyed object.
For the last example you give, the name a refers to a non-const complete object, so the original a is transparently replaceable by the new object.
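The replacement of an object that has a reference member can be sketched as follows. This is only an illustration under stated assumptions: RefHolder, rebind and rebind_demo are hypothetical names, not part of the question, and std::launder requires C++17 or later. The pointer returned by placement new points to the new object under every standard version, and std::launder(&a) makes access through the old name safe in C++17 as well.

```cpp
#include <cassert>
#include <new>

struct RefHolder {                 // hypothetical stand-in for struct A
    explicit RefHolder(int& x) : ref(x) {}
    int& ref;
};

// Ends the lifetime of `obj` and creates a new RefHolder in the same storage.
// The returned pointer comes straight from placement new, so it points to
// the new object regardless of the standard version.
inline RefHolder* rebind(RefHolder& obj, int& target) {
    obj.~RefHolder();
    return ::new (static_cast<void*>(&obj)) RefHolder{target};
}

// Returns n1 * 10 + n2 after writing 3 through the rebound reference.
inline int rebind_demo() {
    int n1 = 1, n2 = 2;
    RefHolder a{n1};
    RefHolder* fresh = rebind(a, n2);
    fresh->ref = 3;                       // writes n2, not n1
    assert(std::launder(&a)->ref == 3);   // access via the old name, laundered
    return n1 * 10 + n2;
}
```

With the C++20 rules, a.ref could be used directly after the replacement; the laundered access is only there to keep the sketch valid for C++17 too.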
For the first question you ask: "Are there any cases in which the addition of p0593r6 into C++20 (§ 6.7.2.11 Object model [intro.object]) made std::launder not necessary, where the same use case in C++17 required std::launder, or are they completely orthogonal?"
Honestly, I have not been able to find any case where implicit-lifetime objects could stand in for std::launder. But I found an example where implicit-lifetime objects make std::launder useful:
class my_buffer {
alignas(int) std::byte buffer[2 * sizeof(int)];
public:
int* begin() {
// An array of int is implicitly created inside the buffer.
// Nevertheless, to get a pointer to this array,
// std::launder is necessary, as the buffer is not
// pointer-interconvertible with that array.
return *std::launder(reinterpret_cast<int(*)[2]>(&buffer));
}
void create_int(std::size_t index, int value) {
new (begin() + index) auto{value};
}
};
I'm still struggling to understand what's allowed and not allowed with strict aliasing. With this concrete example is it violation of strict aliasing rule? If not, why? Is it because I placement new a different type into a char* buffer?
template <typename T>
struct Foo
{
struct ControlBlock { unsigned long long numReferences; };
Foo()
{
char* buffer = new char[sizeof(T) + sizeof(ControlBlock)];
// Construct control block
new (buffer) ControlBlock{};
// Construct the T after the control block
this->ptr = buffer + sizeof(ControlBlock);
new (this->ptr) T{};
}
char* ptr;
T* get() {
// Here I cast the char* to T*.
// Is this OK because T* can alias char* or because
// I placement newed a T at char*
return (T*)ptr;
}
};
For the record, a void* can alias any other type pointer, and any type pointer can alias a void*. A char* can alias any type pointer, but is the reverse true? Can any type alias a char* assuming the alignment is correct? So is the following allowed?
char* buffer = (char*)malloc(16);
float* pFloat = buffer;
*pFloat = 6; // Can any type pointer alias a char pointer?
// If the above is illegal, then how about:
new (pFloat) float; // Placement new construct a float at pointer
*pFloat = 7; // What about now?
Once I've assigned char* buffer pointer to the new allocation, in order to use it as a float buffer do I need to loop through and placement new a float at each place? If I had not assigned the allocation to a char* in the first place, but a float* to begin with, I'd be able to use it immediately as a float buffer, right?
Strict aliasing means that to dereference a T* ptr, there must be a T object at that address, alive obviously. Effectively this means you cannot naively bit-cast between two incompatible types and also that a compiler can assume that no two pointers of incompatible types point to the same location.
The exceptions are unsigned char, char and std::byte, meaning you can reinterpret_cast any object pointer to a pointer to one of these three types and dereference it.
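As a small sketch of that exception, the hypothetical helper byte_sum below reads the object representation of any object through a pointer to unsigned char. The function names are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums the bytes of an object's representation. Reading through
// `const unsigned char*` is allowed for any object type.
template <typename T>
unsigned byte_sum(const T& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        sum += bytes[i];
    }
    return sum;
}

inline void byte_sum_example() {
    const std::uint32_t v = 0x01020304u;
    assert(byte_sum(v) == 10u);   // 1 + 2 + 3 + 4, whatever the byte order
}
```

The reverse direction, reading a char buffer through an int* or float*, is not covered by this exception.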
(T*)ptr; is valid because at ptr there exists a T object. That is all that is required, it does not matter how you got that pointer*, through how many casts it went. There are some more requirements when T has constant members but that has to do more with placement new and object resurrection - see this answer if you are interested.
*It probably does matter even when there are no const members, though I am not sure; see the relevant question. @eerorika's answer is more correct in suggesting std::launder or assigning from the placement-new expression.
For the record, a void* can alias any other type pointer, and any type pointer can alias a void*.
That is not true, void is not one of the three allowed types. But I assume you are just misinterpreting the word "alias": strict aliasing only applies when a pointer is dereferenced. You are of course free to have as many pointers pointing wherever you want, as long as you do not dereference them. Since void* cannot be dereferenced, it's a moot point.
Addressing your second example:
char* buffer = (char*)malloc(16); //OK
// Assigning pointers is always defined; the rules only say when
// it is safe to dereference such a pointer.
// You are missing a cast here: pointers cannot be converted implicitly in C++ (C only produces a warning).
float* pFloat = buffer;
// -> float* pFloat =reinterpret_cast<float*>(buffer);
// NOT OK, there is no float at `buffer` - violates strict aliasing.
*pFloat = 6;
// Now there is a float
new (pFloat) float;
// Yes, now it is OK.
*pFloat = 7;
Is this strict aliasing violation?
Yes.
Can any type pointer alias a char pointer?
No.
You can launder the pointer:
T* get() {
return std::launder(reinterpret_cast<T*>(ptr)); // OK
}
Or, you could store the result of the placement new:
Foo()
{
...
this->ptr = new (buffer + sizeof(ControlBlock)) T{};
}
T* ptr;
T* get() {
return ptr; // OK
}
do I need to loop through and placement new a float at each place
Not since the proposal P0593R6 was accepted into the language (C++20). Prior to that, placement new was required by the standard. You don't necessarily have to write that loop yourself, since there are function templates for that in the standard library: std::uninitialized_fill_n, std::uninitialized_default_construct_n, etc. Also, you can rest assured that a decent optimiser will compile such a loop to zero instructions.
constexpr std::size_t N = 4;
float* pFloat = static_cast<float*>(malloc(N * sizeof(float)));
// OK since P0593R6, C++20
pFloat[0] = 6;
// OK prior to P0593R6, C++20 (to the extent it can be OK)
std::uninitialized_default_construct_n(pFloat, N);
pFloat[0] = 7;
// don't forget
free(pFloat);
P.S. Don't use std::malloc in C++ unless you need it for interacting with a C API that requires it (which is a somewhat rare requirement even in C). I also recommend against reusing a new char[] buffer, as it is unnecessary for the demonstrated purpose. Instead, use ::operator new, which allocates storage without creating objects (even trivial ones). Or, even better, since you already have a template, let the user of the template provide an allocator of their own to make your template more generally useful.
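A minimal sketch of the ::operator new suggestion follows. Widget and make_use_destroy are hypothetical names chosen for the illustration; the point is only the order of steps: allocate raw storage, start the object's lifetime with placement new, end it with an explicit destructor call, then release the storage.

```cpp
#include <cassert>
#include <new>

struct Widget {   // hypothetical payload type
    int value;
};

inline int make_use_destroy(int v) {
    void* raw = ::operator new(sizeof(Widget));  // storage only, no Widget yet
    Widget* w = ::new (raw) Widget{v};           // lifetime of the Widget starts
    int result = w->value;                       // use the pointer from placement new
    w->~Widget();                                // lifetime ends
    ::operator delete(raw);                      // storage released
    return result;
}
```

Because the pointer returned by placement new is kept and used, no std::launder is needed here.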
This is a code example from the C++20 spec ([basic.life]/8):
struct C {
int i;
void f();
const C& operator=( const C& );
};
const C& C::operator=( const C& other) {
if ( this != &other ) {
this->~C(); // lifetime of *this ends
new (this) C(other); // new object of type C created
f(); // well-defined
}
return *this;
}
int main() {
C c1;
C c2;
c1 = c2; // well-defined
c1.f(); // well-defined; c1 refers to a new object of type C
}
Would the following be legal or undefined behavior:
struct C {
int& i; // <= the field is now a reference
void foo(const C& other) {
if ( this != &other ) {
this->~C();
new (this) C(other);
}
}
};
int main() {
int i = 3, j = 5;
C c1 {.i = i};
std::cout << c1.i << std::endl;
C c2 {.i = j};
c1.foo(c2);
std::cout << c1.i << std::endl;
}
In case it is illegal, would std::launder make it legal? where should it be added?
Note: p0532r0 (page 5) uses launder for a similar case.
In case it is legal, how can it work without "Pointer optimization barrier" (i.e. std::launder)? how do we avoid the compiler from caching the value of c1.i?
The question relates to an old ISO thread regarding Implementability of std::optional.
The question applies also, quite similarly, to a constant field (i.e. if above i in struct C is: const int i).
EDIT
It seems, as @Language Lawyer points out in an answer below, that the rules were changed in C++20, in response to the RU007/US042 NB comments.
C++17 Specifications [ptr.launder] (§ 21.6.4.4): --emphasis mine--
[ Note: If a new object is created in storage occupied by an existing
object of the same type, a pointer to the original object can be used
to refer to the new object unless the type contains const or reference
members; in the latter cases, this function can be used to obtain a
usable pointer to the new object. ...— end note ]
C++17 [ptr.launder] code example in the spec (§ 21.6.4.5):
struct X { const int n; };
X *p = new X{3};
const int a = p->n;
new (p) X{5}; // p does not point to new object (6.8) because X::n is const
const int b = p->n; // undefined behavior
const int c = std::launder(p)->n; // OK
C++20 [ptr.launder] Specifications (§ 17.6.4.5):
[ Note: If a new object is created in storage occupied by an existing
object of the same type, a pointer to the original object can be used
to refer to the new object unless its complete object is a const
object or it is a base class subobject; in the latter cases, this
function can be used to obtain a usable pointer to the new object.
...— end note ]
Note that the part:
unless the type contains const or reference members;
that appeared in C++17 was removed in C++20, and the example was changed accordingly.
C++20 [ptr.launder] code example in the spec (§ 17.6.4.6):
struct X { int n; };
const X *p = new const X{3};
const int a = p->n;
new (const_cast<X*>(p)) const X{5}; // p does not point to new object ([basic.life])
// because its type is const
const int b = p->n; // undefined behavior
const int c = std::launder(p)->n; // OK
Thus, apparently the code in question is legal in C++20 as is, while with C++17 it requires using std::launder when accessing the new object.
Open Questions:
What is the case of such code in C++14 or before (when std::launder didn't exist)? Probably it is UB - this is why std::launder was brought to the game, right?
If in C++20 we do not need std::launder for such a case, how the compiler can understand that the reference is being manipulated without our help (i.e. without "Pointer optimization barrier") to avoid caching of the reference value?
Similar questions here, here, here and here got contradicting answers, some see that as a valid syntax but advise to rewrite it. I'm focusing on the validity of the syntax and the need (yes or no) for std::launder, in the different C++ versions.
It is legal to replace objects with const-qualified and reference non-static data members. And now, in C++20, [the name of|a [pointer|reference] to] the original object will refer to the new object after replacement. The rules have been changed in response to RU007/US042 NB comments http://wg21.link/p1971r0#RU007:
RU007. [basic.life].8.3 Relax pointer value/aliasing rules
...
Change 6.7.3 [basic.life] bullet 8.3 as follows:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
...
the [deleted: type of the] original object is [deleted: not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type] [inserted: neither a complete object that is const-qualified nor a subobject of such an object], and
...
To answer the currently open questions:
First question:
What is the case of such code in C++14 or before (when std::launder didn't exist)? Probably it is UB - this is why std::launder was brought to the game, right?
Yes, it was UB. This is mentioned explicitly in the NB issues @Language Lawyer referred to:
Because of that issue all the standard libraries have undefined behaviors in widely used types. The only way to fix that issue is to adjust the lifetime rules to auto-launder the placement new.
(https://github.com/cplusplus/nbballot/issues/7)
Second question:
If in C++20 we do not need std::launder for such a case, how the compiler can understand that the reference is being manipulated without our help (i.e. without "Pointer optimization barrier") to avoid caching of the reference value?
Compilers already know not to cache the value of an object (or subobject) this way if a non-const member function was called between two uses of the object, or if any function was called with the object as a parameter (passed by reference), because the value may be changed by those functions. This change to the standard just added a few more cases where such an optimization is illegal.
I was reading about strict aliasing, but it's still kinda foggy and I am never sure where the line between defined and undefined behaviour is. The most detailed post I found concentrates on C. So it would be nice if you could tell me if this is allowed and what has changed since C++98/11/...
#include <iostream>
#include <cstring>
template <typename T> T transform(T t);
struct my_buffer {
char data[128];
unsigned pos;
my_buffer() : pos(0) {}
void rewind() { pos = 0; }
template <typename T> void push_via_pointer_cast(const T& t) {
*reinterpret_cast<T*>(&data[pos]) = transform(t);
pos += sizeof(T);
}
template <typename T> void pop_via_pointer_cast(T& t) {
t = transform( *reinterpret_cast<T*>(&data[pos]) );
pos += sizeof(T);
}
};
// actually do some real transformation here (and actually also needs an inverse)
// ie this restricts allowed types for T
template<> int transform<int>(int x) { return x; }
template<> double transform<double>(double x) { return x; }
int main() {
my_buffer b;
b.push_via_pointer_cast(1);
b.push_via_pointer_cast(2.0);
b.rewind();
int x;
double y;
b.pop_via_pointer_cast(x);
b.pop_via_pointer_cast(y);
std::cout << x << " " << y << '\n';
}
Please don't pay too much attention to a possible out-of-bounds access and the fact that maybe there is no need to write something like that. I know that char* is allowed to point to anything, but I also have a T* that points to a char*. And maybe there is something else I am missing.
Here is a complete example also including push/pop via memcpy, which afaik isn't affected by strict aliasing.
TL;DR: Does the above code exhibit undefined behaviour (neglecting an out-of-bounds access for the moment), and if yes, why? Did anything change with C++11 or one of the newer standards?
Aliasing is a situation when two entities refer to the same object. It may be either references or pointers.
int x;
int* p = &x;
int& r = x;
// aliases: x, r and *p refer to the same object.
It's important for the compiler to expect that a value written through one name may be read through another.
int foo(int* a, int* b) {
*a = 0;
*b = 1;
return *a;
// *a might be 0, might be 1, if b points at same object.
// Compiler can't short-circuit this to "return 0;"
}
Now, if the pointers are of unrelated types, there is no reason for the compiler to expect that they point to the same address. This is the simplest UB:
int foo( float *f, int *i ) {
*i = 1;
*f = 0.f;
return *i;
}
int main() {
int a = 0;
std::cout << a << std::endl;
int x = foo(reinterpret_cast<float*>(&a), &a);
std::cout << a << "\n";
std::cout << x << "\n"; // Surprise?
}
// Output 0 0 0 or 0 0 1 , depending on optimization.
Simply put, strict aliasing means that the compiler expects names of unrelated types to refer to objects of different types, and therefore to be located in separate storage. When the addresses used to access that storage are in fact the same, the result of accessing the stored value is undefined, and in practice it usually depends on optimization flags.
memcpy() circumvents this: it takes the addresses as untyped pointers and copies the stored data byte by byte (as if through pointers to char) inside the library function.
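The memcpy route can be sketched like this. The helper name float_bits is an assumption for the illustration, and the static_asserts state the platform assumptions (32-bit IEEE 754 float) that the example relies on.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 floats");

// Copies the bytes of a float into an integer of the same size.
// No float is ever accessed through an integer lvalue, so the aliasing
// rule is not involved.
inline std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

inline void float_bits_example() {
    assert(float_bits(1.0f) == 0x3F800000u);  // IEEE 754 encoding of 1.0f
}
```

Compare this with the foo(float*, int*) example above, where the same storage is accessed through two unrelated pointer types.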
Strict aliasing also applies to union members, which are described separately, but the reason is the same: writing to one member of a union doesn't guarantee that the values of the other members change. The exception is a common initial sequence of fields shared by structs stored within the union. Thus, type punning through a union is prohibited. (Most compilers do not enforce this, for historical reasons and for the convenience of maintaining legacy code.)
From 2017 Standard: 6.10 Lvalues and rvalues
8 If a program attempts to access the stored value of an object
through a glvalue of other than one of the following types the
behavior is undefined
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the
object,
(8.4) — a type that is the signed or unsigned type corresponding to
the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a
cv-qualified version of the dynamic type of the object,
(8.6) — an aggregate or union type that includes one of the
aforementioned types among its elements or nonstatic data members
(including, recursively, an element or non-static data member of a
subaggregate or contained union),
(8.7) — a type that is a (possibly cv-qualified) base class type of
the dynamic type of the object,
(8.8) — a char, unsigned char, or std::byte type.
In 7.5
1 A cv-decomposition of a type T is a sequence of cvi and Pi such that T is “cv0 P0 cv1 P1 · · · cvn−1 Pn−1 cvn U” for n > 0, where each
cvi is a set of cv-qualifiers (6.9.3), and each Pi is “pointer to”
(11.3.1), “pointer to member of class Ci of type” (11.3.3), “array of
Ni”, or “array of unknown bound of” (11.3.4). If Pi designates an
array, the cv-qualifiers cvi+1 on the element type are also taken as
the cv-qualifiers cvi of the array. [ Example: The type denoted by the
type-id const int ** has two cv-decompositions, taking U as “int” and
as “pointer to const int”. —end example ] The n-tuple of cv-qualifiers
after the first one in the longest cv-decomposition of T, that is,
cv1, cv2, . . . , cvn, is called the cv-qualification signature of T.
2 Two types T1 and T2 are similar if they have cv-decompositions with
the same n such that corresponding Pi components are the same and the
types denoted by U are the same.
The outcome is: while you can reinterpret_cast the pointer to a different, unrelated and not similar type, you can't use that pointer to access the stored value:
char* pc = new char[100]{1,2,3,4,5,6,7,8,9,10}; // Note, initialized.
int* pi = reinterpret_cast<int*>(pc); // no problem.
int i = *pi; // UB
char* pc2 = reinterpret_cast<char*>(pi+2); // *(pi+2) would be UB
char c = *pc2; // no problem, provided the increment did not put us beyond the array bound.
// c equals 9 (assuming sizeof(int) == 4)
reinterpret_cast doesn't create objects. Dereferencing a pointer to a non-existent object is undefined behaviour, so you can't write through the result of the cast if the class it points to is not trivial.
I know that char* is allowed to point to anything, but I also have a T* that points to a char*.
Right, and that is a problem. While the pointer cast itself has defined behaviour, using it to access a non-existent object of type T is not.
Unlike C, C++ does not allow impromptu creation of objects*. You cannot simply assign to some memory location as type T and have an object of that type be created, you need an object of that type to be there already. This requires placement new. Previous standards were ambiguous on it, but currently, per [intro.object]:
1 [...] An object is created by a definition (6.1), by a new-expression (8.3.4), when implicitly changing the active member of a union (12.3), or when a temporary object is created (7.4, 15.2). [...]
Since you are not doing any of these things, no object is created.
Furthermore, C++ does not implicitly consider pointers to different objects at the same address as equivalent. Your &data[pos] computes a pointer to a char object. Casting it to T* does not make it point to any T object residing at that address, and dereferencing that pointer has undefined behaviour. C++17 adds std::launder, which is a way to let the compiler know that you want to access a different object at that address than the one you have a pointer to.
When you modify your code to use placement new and std::launder, and ensure you have no misaligned accesses (I presume you left that out for brevity), your code will have defined behaviour.
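One possible shape of that modification, as a sketch only: slot_buffer, push, pop and align_up are hypothetical names, the transform step from the question is left out, and the example is restricted to trivially destructible types so that no destructor bookkeeping is needed. It requires C++17 for std::launder.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

struct slot_buffer {   // hypothetical variant of my_buffer
    alignas(std::max_align_t) unsigned char data[128];
    std::size_t pos = 0;

    void rewind() { pos = 0; }

    // Rounds n up to the next multiple of a.
    static std::size_t align_up(std::size_t n, std::size_t a) {
        return (n + a - 1) / a * a;
    }

    template <typename T>
    void push(const T& t) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "sketch never runs destructors");
        pos = align_up(pos, alignof(T));                // avoid misaligned objects
        assert(pos + sizeof(T) <= sizeof data);
        ::new (static_cast<void*>(data + pos)) T(t);    // a T now lives here
        pos += sizeof(T);
    }

    template <typename T>
    T pop() {
        pos = align_up(pos, alignof(T));
        assert(pos + sizeof(T) <= sizeof data);
        // data + pos points to an unsigned char; launder yields a pointer
        // to the T object that push created at this address.
        T* obj = std::launder(reinterpret_cast<T*>(data + pos));
        pos += sizeof(T);
        return *obj;
    }
};
```

As in the question, the caller has to pop with the same types, in the same order, as it pushed.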
* There is discussion on allowing this in a future version of C++.
Short answer:
You may not do this: *reinterpret_cast<T*>(&data[pos]) = until there has been an object of type T constructed at the pointed-to address. Which you can accomplish by placement new.
Even then, you might need to use std::launder as of C++17, since you access the created object (of type T) through a pointer &data[pos] of type char*.
"Direct" reinterpret_cast is allowed only in some special cases, e.g., when T is std::byte, char, or unsigned char.
Before C++17 I would use the memcpy-based solution. Compiler will likely optimize away any unnecessary copies.
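A memcpy-based variant might look like the sketch below. byte_buffer and its member names are assumptions for the illustration, the transform step from the question is omitted, and the types are restricted to trivially copyable ones, which is what memcpy requires. It compiles as C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct byte_buffer {   // hypothetical memcpy-based variant of my_buffer
    char data[128];
    std::size_t pos = 0;

    void rewind() { pos = 0; }

    template <typename T>
    void push(const T& t) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "memcpy needs a trivially copyable type");
        assert(pos + sizeof(T) <= sizeof data);
        std::memcpy(data + pos, &t, sizeof(T));   // copy the bytes in
        pos += sizeof(T);
    }

    template <typename T>
    T pop() {
        static_assert(std::is_trivially_copyable<T>::value,
                      "memcpy needs a trivially copyable type");
        assert(pos + sizeof(T) <= sizeof data);
        T t;
        std::memcpy(&t, data + pos, sizeof(T));   // copy the bytes back out
        pos += sizeof(T);
        return t;
    }
};
```

No T object is ever accessed inside the char array, so neither alignment nor the aliasing rule is a concern here.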
For example:
I could make a constant pointer, which points to an object that I can change through my pointer. The pointer cannot be reassigned:
MyObj const *ptrObj = MyObj2
Why would I use this over:
MyObj &ptrObj = MyObj2
What you have there isn't a const pointer, it's a pointer to a const object - that is, the pointer can be changed but the object can't. A const pointer would be:
MyObj *const ptrObj = &MyObj2;
As to why you might prefer it over a reference, you might want the flexibility of using the NULL special value for something - you don't get that with a reference.
You got it wrong. What you have is a mutable pointer to a constant object:
T const * p;
p = 0; // OK, p isn't const
p->mutate(); // Error! *p is const
T const & r = *p; // "same thing"
What you really want is a constant pointer to mutable object:
T * const p = &x; // OK, cannot change p
T & r = x; // "same thing"
p->mutate(); // OK, *p is mutable
Indeed, references are morally equivalent to constant pointers, i.e. T & vs T * const, and the constant version T const & vs T const * const.
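The four combinations can be put side by side in one short sketch. Gauge and const_placement_demo are hypothetical names used only for this illustration.

```cpp
#include <cassert>

struct Gauge {   // hypothetical type with one mutating member
    int level = 0;
    void raise() { ++level; }
};

inline int const_placement_demo() {
    Gauge x;

    Gauge* const p = &x;    // constant pointer to a mutable object
    Gauge& r = x;           // the reference counterpart
    p->raise();             // OK, *p is mutable
    r.raise();              // OK
    // p = nullptr;         // error: p itself is const

    const Gauge* q = &x;    // mutable pointer to a constant object
    const Gauge& cr = x;    // roughly `const Gauge* const`
    int seen = q->level;    // reading is fine
    // q->raise();          // error: *q is const
    q = nullptr;            // OK, q itself is not const

    assert(p == &r && q == nullptr);
    return seen + cr.level; // 2 + 2
}
```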
If you insist on getting some advice, then I'd say, "don't use pointers".
The important difference between a pointer and a reference is how many objects they may refer to. A reference always refers to exactly one object. A pointer may refer to zero (when the pointer is null), one (when the pointer was assigned the location of a single object) or n objects (when the pointer was assigned to some point inside an array).
The ability of pointers to refer to 0 to n objects means that a pointer is more flexible in what it can represent. When the extra flexibility of a pointer is not necessary it is generally better to use a reference. That way someone reading your code doesn't have to work out whether the pointer refers to zero, one or n objects.
I have a fairly good understanding of the dereferencing operator, the address of operator, and pointers in general.
I however get confused when I see stuff such as this:
int* returnA() {
int *j = &a;
return j;
}
int* returnB() {
return &b;
}
int& returnC() {
return c;
}
int& returnC2() {
int *d = &c;
return *d;
}
In returnA() I'm asking to return a pointer; just to clarify this works because j is a pointer?
In returnB() I'm asking to return a pointer; since a pointer points to an address, the reason why returnB() works is because I'm returning &b?
In returnC() I'm asking for an address of int to be returned. When I return c is the & operator automatically "appended" c?
In returnC2() I'm asking again for an address of int to be returned. Does *d work because pointers point to an address?
Assume a, b, c are initialized as integers as Global.
Can someone validate if I am correct with all four of my questions?
Although Peter answered your question, one thing that's clearly confusing you is the symbols * and &. The tough part about getting your head around these is that they both have two different meanings that have to do with indirection (even excluding the third meanings of * for multiplication and & for bitwise-and).
* when used as part of a type indicates that the type is a pointer: int is a type, so int* is a pointer-to-int type, and int** is a pointer-to-pointer-to-int type.
& when used as part of a type indicates that the type is a reference. int is a type, so int& is a reference-to-int (there is no such thing as reference-to-reference). References and pointers are used for similar things, but they are quite different and not interchangeable. A reference is best thought of as an alias, or alternate name, for an existing variable. If x is an int, then you can simply write int& y = x to create a new name y for x. Afterwards, x and y can be used interchangeably to refer to the same integer. The two main implications of this are that references cannot be NULL (since there must be an original variable to reference), and that you don't need to use any special operator to get at the original value (because it's just an alternate name, not a pointer). References also cannot be reassigned.
* when used as a unary operator performs an operation called dereference (which has nothing to do with reference types!). This operation is only meaningful on pointers. When you dereference a pointer, you get back what it points to. So, if p is a pointer-to-int, *p is the int being pointed to.
& when used as a unary operator performs an operation called address-of. That's pretty self-explanatory; if x is a variable, then &x is the address of x. The address of a variable can be assigned to a pointer to the type of that variable. So, if x is an int, then &x can be assigned to a pointer of type int*, and that pointer will point to x. E.g. if you assign int* p = &x, then *p can be used to retrieve the value of x.
So remember, the type suffix & is for references, and has nothing to do with the unary operator &, which has to do with getting addresses for use with pointers. The two uses are completely unrelated. And * as a type suffix declares a pointer, while * as a unary operator performs an action on pointers.
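All four uses fit into a few lines. The function name symbols_demo is a hypothetical name for this sketch.

```cpp
#include <cassert>

inline int symbols_demo() {
    int x = 5;
    int* p = &x;   // `*` in a type: pointer-to-int; unary `&`: address-of
    int& y = x;    // `&` in a type: reference-to-int, another name for x
    *p = 7;        // unary `*`: dereference, writes to x
    y += 1;        // no operator needed with a reference; x is now 8
    assert(&y == &x && p == &x);   // all three designate the same int
    return x;
}
```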
In returnA() I'm asking to return a pointer; just to clarify this works because j is a pointer?
Yes, int *j = &a initializes j to point to a. Then you return the value of j, that is the address of a.
In returnB() I'm asking to return a pointer; since a pointer points to an address, the reason why returnB() works is because I'm returning &b?
Yes. Here the same thing happens as above, just in a single step. &b gives the address of b.
In returnC() I'm asking for an address of int to be returned. When I return c is the & operator automatically appended?
No, it is a reference to an int which is returned. A reference is not an address the same way as a pointer is - it is just an alternative name for a variable. Therefore you don't need to apply the & operator to get a reference of a variable.
In returnC2() I'm asking again for an address of int to be returned. Does *d work because pointers point to an address?
Again, it is a reference to an int which is returned. *d refers to the original variable c (whatever that may be), pointed to by d. And this can implicitly be turned into a reference, just as in returnC.
Pointers do not in general point to an address (although they can - e.g. int** is a pointer to pointer to int). Pointers are an address of something. When you declare the pointer like something*, that something is the thing your pointer points to. So in my above example, int** declares a pointer to an int*, which happens to be a pointer itself.
Tyler, that was a very helpful explanation. I did some experiments using the Visual Studio debugger to clarify this difference even further:
int sample = 90;
int& alias = sample;
int* pointerToSample = &sample;
Name               Value                          Type
&alias             0x0112fc1c {90}                int *
&sample            0x0112fc1c {90}                int *
pointerToSample    0x0112fc1c {90}                int *
*pointerToSample   90                             int
alias              90                             int &
&pointerToSample   0x0112fc04 {0x0112fc1c {90}}   int * *
Memory Layout
  pointerToSample             sample / alias
 +----------------+   ...   +----------------+
 |   0x0112fc1c   |         |       90       |
 +----------------+   ...   +----------------+
  [0x0112fc04]                [0x0112fc1c]
In returnC() and returnC2() you are not asking to return the address.
Both these functions return references to objects.
A reference is not the address of anything; it is an alternative name for something (the compiler may or may not, depending on the situation, use an address to represent the object; alternatively, it may know to keep it in a register).
All you know is that a reference refers to a specific object.
A reference itself is not an object, just an alternative name.
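If a, b and c were local (non-static) variables, all of your examples would produce undefined run-time behavior: you would be returning pointers or references to items that disappear after execution leaves the function. With globals, as the question assumes, or with static locals, the objects outlive the call.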
Let me clarify:
int * returnA()
{
static int a; // The static keyword keeps the variable from disappearing.
int * j = 0; // Declare a pointer to an int and initialize it to null.
j = &a; // j now points to a.
return j; // return the location of the static variable (evil).
}
Without static, and with a declared inside the function, j would be assigned to point to a's temporary location. On exit from the function the variable a would disappear, but its former location would still be returned via j. Since a would no longer exist at the location pointed to by j, accessing *j would be undefined behavior.
Local non-static variables must not be accessed via a reference or pointer by other code after the function has returned. The compiler will accept it, but the behavior is undefined.
Being pedantic, the pointers returned should be declared as pointing to constant data. The references returned should be const:
const char * Hello()
{
static const char text[] = "Hello";
return text;
}
The above function returns a pointer to constant data. Other code can access (read) the static data but cannot modify it.
const unsigned int& Counter()
{
static unsigned int value = 0;
value = value + 1;
return value;
}
In the above function, value is initialized to zero on the first entry. Each subsequent call increments it by one. The function returns a reference to a constant value. This means that other functions can read the value (from afar) as if it were a variable (without having to dereference a pointer).
In my thinking, a pointer is used for an optional parameter or object. A reference is passed when the object must exist. Inside the function, a referenced parameter means that the value exists, however a pointer must be checked for null before dereferencing it. Also, with a reference, there is more guarantee that the target object is valid. A pointer could point to an invalid address (not null) and cause undefined behavior.
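That convention can be sketched with a hypothetical function scale: the mandatory argument is taken by reference, the optional one by pointer with a null default.

```cpp
#include <cassert>

// `value` must exist, so it is a reference. `factor` is optional, so it is
// a pointer that may be null and is checked before being dereferenced.
inline int scale(const int& value, const int* factor = nullptr) {
    if (factor == nullptr) {
        return value;
    }
    return value * *factor;
}

inline void scale_example() {
    int three = 3;
    assert(scale(4) == 4);            // optional argument omitted
    assert(scale(4, &three) == 12);   // optional argument supplied
}
```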
Semantically, references do act as addresses. However, syntactically, they are the compiler's job, not yours: you can treat a reference as if it were the original object it refers to, including binding other references to it and having them refer to the original object too. Say goodbye to pointer arithmetic in this case.
The downside is that you can't change which object a reference refers to: it is bound at construction time.
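A short sketch of that difference, using the hypothetical function name rebinding_demo: assigning to a reference writes to the object it was bound to, while assigning to a pointer reseats it.

```cpp
#include <cassert>

inline int rebinding_demo() {
    int a = 1, b = 2;

    int& r = a;
    r = b;            // copies b's value into a; r still refers to a
    b = 5;            // does not affect a or r

    int* p = &a;
    p = &b;           // a pointer can be reseated to another object

    assert(&r == &a); // the reference was never rebound
    return a * 10 + *p;   // a == 2, *p == 5
}
```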