How can a reference require no storage? - c++

From this question, and consequently, from the Standard (ISO C++-03):
It is unspecified whether or not a reference requires storage (3.7).
In some answers in that thread, it's said that references have, internally, the same structure as a pointer, and thus the same size (32/64 bits).
What I'm struggling to grasp is: how would a reference come not to require storage?
Any sample code exemplifying this would be greatly appreciated.
Edit:
From @JohannesSchaub-litb's comment: is there a rule like "if I'm not using a const &, or if I'm using a const & with a default value, the reference requires allocation"? It seems to me that there should be no allocations for references at all, except, of course, when explicit allocations are involved, like:
A& new_reference(*(new A())); // Only A() instance would be allocated,
// not the new_reference itself
Is there any case like this?

Take something simple:
int foo() {
int x = 5;
int& r = x;
r = 10;
return x;
}
The implementation may use a pointer to x behind the scenes to implement that reference, but there's no reason it has to. It could just as well translate the code to the equivalent form of:
int foo() {
int x = 10;
return x;
}
Then no pointers are needed whatsoever. The compiler can just bake it right into the executable that r is the same as x, without storing and dereferencing a pointer that points at x.
The point is, whether the reference requires any storage is an implementation detail that you shouldn't need to care about.

I believe the key point to understanding is that reference types are not object types.
An object type is a (possibly cv-qualified) type that is not a function type, not a reference type, and not a void type (§3.9[basic.types]/8)
Objects require storage ("An object is a region of storage." -- §1.8[intro.object]/1)
Moreover, C++ programs operate on objects: "The constructs in a C++ program create, destroy, refer to, access, and manipulate objects." -- same paragraph
So, when the compiler encounters a reference in the program, it is up to the compiler whether to synthesize an object (typically of a pointer type), and therefore use some storage, or to find some other way to implement the desired semantics in terms of the object model (which may involve no storage).

Related

When is it safe to re-use memory from a trivially destructible object without laundering

Regarding the following code:
#include <iostream>
#include <new>
#include <type_traits>
class One {
public:
double number{};
};
class Two {
public:
int integer{};
};
class Mixture {
public:
double& foo() {
new (&storage) One{1.0};
return reinterpret_cast<One*>(&storage)->number;
}
int& bar() {
new (&storage) Two{2};
return reinterpret_cast<Two*>(&storage)->integer;
}
std::aligned_storage_t<8> storage;
};
int main() {
auto mixture = Mixture{};
std::cout << mixture.foo() << std::endl;
std::cout << mixture.bar() << std::endl;
}
I haven't called the destructor for the types because they are trivially destructible. My understanding of the standard is that for this to be safe, we would need to launder the pointer to storage before passing it to the reinterpret_cast. However, std::optional's implementation in libstdc++ does not seem to use std::launder() and simply constructs the object right into the union storage. https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/optional.
Is my example above well defined behavior? What do I need to do to make it work? Would a union make this work?
In your code, you do need std::launder in order to make your reinterpret_cast do what you want it to do. This is a separate issue from that of re-using memory. According to the standard ([expr.reinterpret.cast]/7), your expression
reinterpret_cast<One*>(&storage)
is equivalent to:
static_cast<One*>(static_cast<void*>(&storage))
However, the outer static_cast does not succeed in producing a pointer to the newly created One object because according to [expr.static.cast]/13,
if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible (6.9.2) with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
That is, the resulting pointer still points to the storage object, not to the One object nested within it, and using it as a pointer to a One object would violate the strict aliasing rule. You must use std::launder to force the resulting pointer to point to the One object. Or, as pointed out in the comments, you could simply use the pointer returned by placement new directly, rather than the one obtained from reinterpret_cast.
If, as suggested in the comments, you used a union instead of aligned_storage,
union {
One one;
Two two;
};
you would sidestep the pointer-interconvertibility issue, so std::launder would not be needed on account of non-pointer-interconvertibility. However, there is still the issue of re-use of memory. In this particular case, std::launder is not needed on account of re-use because your One and Two classes do not contain any non-static data members of const-qualified or reference type ([basic.life]/8).
Finally, there was the question of why libstdc++'s implementation of std::optional does not use std::launder, even though std::optional may contain classes that contain non-static data members of const-qualified or reference type. As pointed out in comments, libstdc++ is part of the implementation, and may simply elide std::launder when the implementers know that GCC will still compile the code properly without it. The discussion that led up to the introduction of std::launder (see CWG 1776 and the linked thread, N4303, P0137) seems to indicate that, in the opinion of people who understand the standard much better than I do, std::launder is indeed required in order to make the union-based implementation of std::optional well-defined in the presence of members of const-qualified or reference type. However, I am not sure that the standard text is clear enough to make this obvious, and it might be worth having a discussion about how it might be clarified.

Can Aliasing Problems be Avoided with const Variables

My company uses a messaging server which gets a message into a const char* and then casts it to the message type.
I've become concerned about this after asking this question. I'm not aware of any bad behavior in the messaging server. Is it possible that const variables do not incur aliasing problems?
For example say that foo is defined in MessageServer in one of these ways:
As a parameter: void MessageServer(const char* foo)
Or as const variable at the top of MessageServer: const char* foo = PopMessage();
Now MessageServer is a huge function, but it never assigns anything to foo; however, at one point in MessageServer's logic, foo will be cast to the selected message type.
auto bar = reinterpret_cast<const MessageJ*>(foo);
bar will only be read from subsequently, but will be used extensively for object setup.
Is an aliasing problem possible here, or does the fact that foo is only initialized, and never modified save me?
EDIT:
Jarod42's answer finds no problem with casting from a const char* to a MessageJ*, but I'm not sure this makes sense.
We know this is illegal:
MessageX* foo = new MessageX;
const auto bar = reinterpret_cast<MessageJ*>(foo);
Are we saying this somehow makes it legal?
MessageX* foo = new MessageX;
const auto temp = reinterpret_cast<char*>(foo);
auto bar = reinterpret_cast<const MessageJ*>(temp);
My understanding of Jarod42's answer is that the cast to temp makes it legal.
EDIT:
I've gotten some comments with relation to serialization, alignment, network passing, and so on. That's not what this question is about.
This is a question about strict aliasing.
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other).
What I'm asking is: Will the initialization of a const object, by casting from a char*, ever be optimized below where that object is cast to another type of object, such that I am casting from uninitialized data?
First of all, casting pointers does not cause any aliasing violations (although it might cause alignment violations).
Aliasing refers to the process of reading or writing an object through a glvalue of different type than the object.
If an object has type T, and we read/write it via a X& and a Y& then the questions are:
Can X alias T?
Can Y alias T?
It does not directly matter whether X can alias Y or vice versa, as you seem to focus on in your question. But, the compiler can infer if X and Y are completely incompatible that there is no such type T that can be aliased by both X and Y, therefore it can assume that the two references refer to different objects.
So, to answer your question, it all hinges on what PopMessage does. If the code is something like:
const char *PopMessage()
{
static MessageJ foo = .....;
return reinterpret_cast<const char *>(&foo);
}
then it is fine to write:
const char *ptr = PopMessage();
auto bar = reinterpret_cast<const MessageJ*>(ptr);
auto baz = *bar; // OK, accessing a `MessageJ` via glvalue of type `MessageJ`
auto ch = ptr[4]; // OK, accessing a `MessageJ` via glvalue of type `char`
and so on. The const has nothing to do with it. In fact if you did not use const here (or you cast it away) then you could also write through bar and ptr with no problem.
On the other hand, if PopMessage was something like:
const char *PopMessage()
{
static char buf[200];
return buf;
}
then the line auto baz = *bar; would cause UB because char cannot be aliased by MessageJ. Note that you can use placement-new to change the dynamic type of an object (in that case, char buf[200] is said to have stopped existing, and the new object created by placement-new exists and its type is T).
My company uses a messaging server which gets a message into a const char* and then casts it to the message type.
So long as you mean that it does a reinterpret_cast (or a C-style cast that devolves to a reinterpret_cast):
MessageJ *j = new MessageJ();
MessageServer(reinterpret_cast<char*>(j));
// or PushMessage(reinterpret_cast<char*>(j));
and later takes that same pointer and reinterpret_cast's it back to the actual underlying type, then that process is completely legitimate:
void MessageServer(char *foo)
{
if (somehow figure out that foo is actually a MessageJ*)
{
MessageJ *bar = reinterpret_cast<MessageJ*>(foo);
// operate on bar
}
}
// or
void MessageServer()
{
char *foo = PopMessage();
if (somehow figure out that foo is actually a MessageJ*)
{
MessageJ *bar = reinterpret_cast<MessageJ*>(foo);
// operate on bar
}
}
Note that I specifically dropped the const's from your examples as their presence or absence doesn't matter. The above is legitimate when the underlying object that foo points at actually is a MessageJ, otherwise it is undefined behavior. The reinterpret_cast'ing to char* and back again yields the original typed pointer. Indeed, you could reinterpret_cast to a pointer of any type and back again and get the original typed pointer. From this reference:
Only the following conversions can be done with reinterpret_cast ...
6) An lvalue expression of type T1 can be converted to reference to another type T2. The result is an lvalue or xvalue referring to the same object as the original lvalue, but with a different type. No temporary is created, no copy is made, no constructors or conversion functions are called. The resulting reference can only be accessed safely if allowed by the type aliasing rules (see below) ...
Type aliasing
When a pointer or reference to object of type T1 is reinterpret_cast (or C-style cast) to a pointer or reference to object of a different type T2, the cast always succeeds, but the resulting pointer or reference may only be accessed if both T1 and T2 are standard-layout types and one of the following is true:
T2 is the (possibly cv-qualified) dynamic type of the object ...
Effectively, reinterpret_cast'ing between pointers of different types simply instructs the compiler to reinterpret the pointer as pointing at a different type. More importantly for your example though, round-tripping back to the original type again and then operating on it is safe. That is because all you've done is instructed the compiler to reinterpret a pointer as pointing at a different type and then told the compiler again to reinterpret that same pointer as pointing back at the original, underlying type.
So, the round trip conversion of your pointers is legitimate, but what about potential aliasing problems?
Is an aliasing problem possible here, or does the fact that foo is only initialized, and never modified save me?
The strict aliasing rule allows compilers to assume that references (and pointers) to unrelated types do not refer to the same underlying memory. This assumption allows lots of optimizations because it decouples operations on unrelated reference types as being completely independent.
#include <iostream>
int foo(int *x, long *y)
{
// foo can assume that x and y do not alias the same memory because they have unrelated types
// so it is free to reorder the operations on *x and *y as it sees fit
// and it need not worry that modifying one could affect the other
*x = -1;
*y = 0;
return *x;
}
int main()
{
long a;
int b = foo(reinterpret_cast<int*>(&a), &a); // violates strict aliasing rule
// the above call has UB because it both writes and reads a through an unrelated pointer type
// on return b might be either 0 or -1; a could similarly be arbitrary
// technically, the program could do anything because it's UB
std::cout << b << ' ' << a << std::endl;
return 0;
}
In this example, thanks to the strict aliasing rule, the compiler can assume in foo that setting *y cannot affect the value of *x. So, it can decide to just return -1 as a constant, for example. Without the strict aliasing rule, the compiler would have to assume that altering *y might actually change the value of *x. Therefore, it would have to enforce the given order of operations and reload *x after setting *y. In this example it might seem reasonable enough to enforce such paranoia, but in less trivial code doing so will greatly constrain reordering and elimination of operations and force the compiler to reload values much more often.
Here are the results on my machine when I compile the above program differently (Apple LLVM v6.0 for x86_64-apple-darwin14.1.0):
$ g++ -Wall test58.cc
$ ./a.out
0 0
$ g++ -Wall -O3 test58.cc
$ ./a.out
-1 0
In your first example, foo is a const char * and bar is a const MessageJ * reinterpret_cast'ed from foo. You further stipulate that the object's underlying type actually is a MessageJ and that no reads are done through the const char *. Instead, it is only cast to the const MessageJ *, from which only reads are then done. Since you neither read nor write through the const char * alias, there can be no aliasing optimization problem with your accesses through your second alias in the first place. This is because no potentially conflicting operations are performed on the underlying memory through your aliases of unrelated types. However, even if you did read through foo, there could still be no potential problem, as such accesses are allowed by the type aliasing rules (see below) and any ordering of reads through foo or bar would yield the same results because no writes occur here.
Let us now drop the const qualifiers from your example and presume that MessageServer does do some write operations on bar and furthermore that the function also reads through foo for some reason (e.g. - prints a hex dump of memory). Normally, there might be an aliasing problem here as we have reads and writes happening through two pointers to the same memory through unrelated types. However, in this specific example, we are saved by the fact that foo is a char*, which gets special treatment by the compiler:
Type aliasing
When a pointer or reference to object of type T1 is reinterpret_cast (or C-style cast) to a pointer or reference to object of a different type T2, the cast always succeeds, but the resulting pointer or reference may only be accessed if both T1 and T2 are standard-layout types and one of the following is true: ...
T2 is char or unsigned char
The strict-aliasing optimizations that are allowed for operations through references (or pointers) of unrelated types are specifically disallowed when a char reference (or pointer) is in play. The compiler instead must be paranoid that operations through the char reference (or pointer) can affect and be affected by operations done through other references (or pointers). In the modified example where reads and writes operate on both foo and bar, you can still have defined behavior because foo is a char*. Therefore, the compiler is not allowed to optimize to reorder or eliminate operations on your two aliases in ways that conflict with the serial execution of the code as written. Similarly, it is forced to be paranoid about reloading values that may have been affected by operations through either alias.
The answer to your question is that, so long as your functions are properly round tripping pointers to a type through a char* back to its original type, then your function is safe, even if you were to interleave reads (and potentially writes, see caveat at end of EDIT) through the char* alias with reads+writes through the underlying type alias.
These two technical references (3.10/10) are useful for answering your question. These other references help give a better understanding of the technical information.
====
EDIT: In the comments below, zmb objects that while char* can legitimately alias a different type, the converse is not true, as several sources seem to say in varying forms: the char* exception to the strict aliasing rule is an asymmetric, "one-way" rule.
Let us modify my above strict-aliasing code example and ask would this new version similarly result in undefined behavior?
#include <iostream>
char foo(char *x, long *y)
{
// can foo assume that x and y cannot alias the same memory?
*x = -1;
*y = 0;
return *x;
}
int main()
{
long a;
char b = foo(reinterpret_cast<char*>(&a), &a); // explicitly allowed!
// if this is defined behavior then what must the values of b and a be?
std::cout << (int) b << ' ' << a << std::endl;
return 0;
}
I argue that this is defined behavior and that both a and b must be zero after the call to foo. From the C++ standard (3.10/10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:^52
the dynamic type of the object ...
a char or unsigned char type ...
^52: The intent of this list is to specify those circumstances in which an object may or may not be aliased.
In the above program, I am accessing the stored value of an object through both its actual type and a char type, so it is defined behavior and the results have to comport with the serial execution of the code as written.
Now, there is no general way for the compiler to always statically know in foo that the pointer x actually aliases y or not (e.g. - imagine if foo was defined in a library). Maybe the program could detect such aliasing at run time by examining the values of the pointers themselves or consulting RTTI, but the overhead this would incur wouldn't be worth it. Instead, the better way to generally compile foo and allow for defined behavior when x and y do happen to alias one another is to always assume that they could (i.e. - disable strict alias optimizations when a char* is in play).
Here's what happens when I compile and run the above program:
$ g++ -Wall test59.cc
$ ./a.out
0 0
$ g++ -O3 -Wall test59.cc
$ ./a.out
0 0
This output is at odds with that of the earlier, similar strict-aliasing program. This is not dispositive proof that I'm right about the standard, but the different results from the same compiler provide decent evidence that I may be right (or, at least, that one important compiler seems to understand the standard the same way).
Let's examine some of the seemingly conflicting sources:
The converse is not true. Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule. In other words, casting from a pointer of one type to pointer of an unrelated type through a char* is undefined.
The bolded bit is why this quote doesn't apply to the problem addressed by my answer nor the example I just gave. In both my answer and the example, the aliased memory is being accessed both through a char* and the actual type of the object itself, which can be defined behavior.
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of type char). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
Again, the bolded bit is why this statement doesn't apply to my answers. In this and similar counter-examples, an array of characters is being accessed through a pointer of an unrelated type. Even in C, this is UB because the character array might not be aligned according to the aliased type's requirements, for example. In C++, this is UB because such access does not meet any of the type aliasing rules as the underlying type of the object actually is char.
In my examples, we first have a valid pointer to a properly constructed type that is then aliased by a char* and then reads and writes through these two aliased pointers are interleaved, which can be defined behavior. So, there seems to be some confusion and conflation out there between the strict aliasing exception for char and not accessing an underlying object through an incompatible reference.
int value;
int *p = &value;
char *q = reinterpret_cast<char*>(&value);
Both p and q refer to the same address; they are aliasing the same memory. What the language does is provide a set of rules defining the behaviors that are guaranteed: write through p, read through q is fine; the other way around is not.
The standard and many examples clearly state that "write through q, then read through p (or value)" can be well defined behavior. What is not as abundantly clear, but what I'm arguing for here, is that "write through p (or value), then read through q" is always well defined. I claim even further, that "reads and writes through p (or value) can be arbitrarily interleaved with reads and writes to q" with well defined behavior.
Now there is one caveat to the previous statement and why I kept sprinkling the word "can" throughout the above text. If you have a type T reference and a char reference that alias the same memory, then arbitrarily interleaving reads+writes on the T reference with reads on the char reference is always well defined. For example, you might do this to repeatedly print out a hex dump of the underlying memory as you modify it multiple times through the T reference. The standard guarantees that strict aliasing optimizations will not be applied to these interleaved accesses, which otherwise might give you undefined behavior.
But what about writes through a char reference alias? Well, such writes may or may not be well defined. If a write through the char reference violates an invariant of the underlying T type, then you can get undefined behavior. If such a write improperly modified the value of a T member pointer, then you can get undefined behavior. If such a write modified a T member value to a trap value, then you can get undefined behavior. And so on. However, in other instances, writes through the char reference can be completely well defined. Rearranging the endianness of a uint32_t or uint64_t by reading+writing to them through an aliased char reference is always well defined, for example. So, whether such writes are completely well defined or not depends on the particulars of the writes themselves. Regardless, the standard guarantees that its strict aliasing optimizations will not reorder or eliminate such writes w.r.t. other operations on the aliased memory in a manner that itself could lead to undefined behavior.
So my understanding is that you are doing something like that:
enum MType { J,K };
struct MessageX { MType type; };
struct MessageJ {
MType type{ J };
int id{ 5 };
//some other members
};
const char* popMessage() {
return reinterpret_cast<char*>(new MessageJ());
}
void MessageServer(const char* foo) {
const MessageX* msgx = reinterpret_cast<const MessageX*>(foo);
switch (msgx->type) {
case J: {
const MessageJ* msgJ = reinterpret_cast<const MessageJ*>(foo);
std::cout << msgJ->id << std::endl;
}
}
}
int main() {
const char* foo = popMessage();
MessageServer(foo);
}
If that is correct, then the expression msgJ->id is OK (as would be any access through foo), as msgJ has the correct dynamic type. msgx->type, on the other hand, does incur UB, because msgx has an unrelated type. The fact that the pointer to MessageJ was cast to const char* in between is completely irrelevant.
As was cited by others, here is the relevant part in the standard (the "glvalue" is the result of dereferencing the pointer):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:^52
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
As far as the discussion "cast to char*" vs "cast from char*" is concerned:
You might know that the standard doesn't talk about strict aliasing as such, it only provides the list above. Strict aliasing is one analysis technique based on that list for compilers to determine which pointers can potentially alias each other. As far as optimizations are concerned, it doesn't make a difference, if a pointer to a MessageJ object was cast to char* or vice versa. The compiler cannot (without further analysis) assume that a char* and MessageX* point to distinct objects and will not perform any optimizations (e.g. reordering) based on that.
Of course that doesn't change the fact that accessing a char array via a pointer to a different type would still be UB in C++ (I assume mostly due to alignment issues) and the compiler might perform other optimizations that could ruin your day.
EDIT:
What I'm asking is: Will the initialization of a const object, by casting from a char*, ever be optimized below where that object is cast to another type of object, such that I am casting from uninitialized data?
No it will not. Aliasing analysis doesn't influence how the pointer itself is handled, but the access through that pointer. The compiler will NOT reorder the write access (store memory address in the pointer variable) with the read access (copy to other variable / load of address in order to access the memory location) to the same variable.
There is no aliasing problem as you use (const)char* type, see the last point of:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
The other answer answered the question well enough (it's a direct quotation from the C++ standard in https://isocpp.org/files/papers/N3690.pdf page 75), so I'll just point out other problems in what you're doing.
Note that your code may run into alignment problems. For example, if the alignment of MessageJ is 4 or 8 bytes (typical on 32-bit and 64-bit machines), strictly speaking, it is undefined behaviour to access an arbitrary character array pointer as a MessageJ pointer.
You won't run into any problems on x86/AMD64 architectures as they allow unaligned access. However, someday you may find that the code you're developing is ported to a mobile ARM architecture and the unaligned access would be a problem then.
It therefore seems you're doing something you shouldn't be doing. I would consider using serialization instead of accessing a character array as a MessageJ type. Alignment isn't the only potential problem: the data may also have a different representation on 32-bit and 64-bit architectures.

Why does sizeof a reference type give you the sizeof the type?

According to the standard, in [expr.sizeof] (5.3.3/2) we get:
When applied to a reference or a reference type, the result is the size of the referenced type.
This seems to go along with the fact that it is unspecified whether references require storage [dcl.ref] (8.3.2/4):
It is unspecified whether or not a reference requires storage
But it seems pretty strange to me to have this kind of inconsistency within the language. Regardless of whether or not the reference requires storage, wouldn't it be important to be able to determine how much size the reference uses? Seeing these results just seems wrong:
sizeof(vector<int>) == 24
sizeof(vector<int>*) == 8
sizeof(vector<int>&) == 24
sizeof(reference_wrapper<vector<int>>) == 8
What is the reasoning behind wanting sizeof(T&) == sizeof(T) by definition?
The choice is somewhat arbitrary, and trying to fully justify either option will lead to circular metaphysical arguments.
The intent of a reference is to be (an alias for) the object itself; under that reasoning it makes sense for them both to have the same size (and address), and that is what the language specifies.
The abstraction is leaky - sometimes a reference has its own storage, separate from the object - leading to anomalies like those you point out. But we have pointers for when we need to deal with a "reference" as a separate entity to the object.
Argument 1: A reference should be a synonym of your object hence the interface of the reference should be exactly the same as the interface of the object, also all operators should work in the same way on object and on reference (except type operators).
It will make sense in the following code:
MyClass a;
MyClass& b = a;
char a_buf[sizeof(a)];
char b_buf[sizeof(b)]; // you want b_buf to be the same size as a_buf
memcpy(&a, a_buf, sizeof(a));
memcpy(&b, b_buf, sizeof(b)); // you want this line to work like the above line
Argument 2: From the C++ standard's point of view, references are something under the hood; it doesn't even say whether they occupy memory, so it cannot say how to get their size.
How to get the size of a reference: in practice, compilers implement a reference that does occupy memory (such as a class member) as a constant pointer, so you can measure it by wrapping it in a class:
class A_ref
{A& ref;};
sizeof(A_ref);
It's not particularly important to know how much storage a reference requires, only the change in storage requirements caused by adding a reference. And that you can determine:
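#include <cstddef>
// T from the description is taken to be int here
struct with
{
    char c;
    int& ref;
};
struct without
{
    char c;
};
std::size_t reference_cost()
{
    return sizeof (with) - sizeof (without); // includes any padding the reference forces
}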

Why do parameters passed by reference in C++ not require a dereference operator?

I'm new to the C++ community, and just have a quick question about how C++ passes variables by reference to functions.
When you want to pass a variable by reference in C++, you add an & to the parameter you want passed by reference. Why, when you assign a value to a parameter passed by reference, do you write variable = value; instead of *variable = value?
#include <iostream>
void add_five_to_variable(int &value) {
    // If passing by reference uses pointers,
    // then why wouldn't you say *value += 5?
    // Or does C++ do some behind the scenes stuff here?
    value += 5;
}
int main() {
    int i = 1;
    add_five_to_variable(i);
    std::cout << i << std::endl; // i = 6
    return 0;
}
If C++ is using pointers to do this with behind the scenes magic, why aren't dereferences needed like with pointers? Any insight would be much appreciated.
When you write,
int *p = ...;
*p = 3;
That is syntax for assigning 3 to the object referred to by the pointer p. When you write,
int &r = ...;
r = 3;
That is syntax for assigning 3 to the object referred to by the reference r. The syntax and the implementation are different. References are usually implemented using pointers (except when they're optimized out), but the syntax is different.
So you could say that the dereferencing happens automatically, when needed.
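As a sketch, here are two hypothetical versions of the question's function, one taking a pointer and one taking a reference. A compiler may well generate the same machine code for both; what differs is only where the source code has to spell out `&` and `*`.

```cpp
// Hypothetical pointer and reference versions of add_five_to_variable.
void add_five_ptr(int* value) { *value += 5; }  // explicit dereference
void add_five_ref(int& value) { value += 5; }   // dereference happens implicitly

int both_ways() {
    int i = 1;
    add_five_ptr(&i);  // caller must take the address explicitly
    add_five_ref(i);   // caller just names the object
    return i;          // 1 + 5 + 5
}
```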
C++ uses pointers behind the scenes but hides all that complication from you. Passing by reference also lets you avoid many of the problems associated with invalid pointers.
When you pass an object to a function by reference, you manipulate the object directly in the function, without referring to its address as you would with pointers. Thus, when manipulating this variable, you don't dereference it with the *variable syntax. It is good practice to pass objects by reference because:
A reference can't be reseated to refer to another object
It can't be null: you have to pass a valid object of that type to the function
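A minimal sketch of the first point, using hypothetical names: assigning to a reference never re-seats it, it assigns to the object it was bound to, whereas a pointer can be re-pointed.

```cpp
// Hypothetical result type so the outcome can be checked.
struct Result { int x; int y; bool p_points_to_y; };

Result reseat_demo() {
    int x = 1, y = 2;
    int& r = x;   // r is bound to x for its whole lifetime
    r = y;        // copies y's value into x; r still refers to x
    int* p = &x;
    p = &y;       // the pointer itself changes: p now points at y
    *p = 3;       // writes into y
    return {x, y, p == &y};
}
```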
How the compiler achieves the "pass by reference" is not really relevant in your case.
The article in Wikipedia is a good resource.
There are two questions in one, it seems:
one question is about syntax: the difference between pointer and reference
the other is about mechanics and implementation: the in-memory representation of a reference
Let's address the two separately.
Syntax of references and pointers
A pointer is, conceptually, a "sign" (as road sign) toward an object. It allows 2 kind of actions:
actions on the pointee (or object pointed to)
actions on the pointer itself
The operator* and operator-> allow you to access the pointee, to differentiate it from your accesses to the pointer itself.
A reference is not a "sign", it's an alias. For the duration of its life, come hell or high water, it will refer to the same object, and there is nothing you can do about it. Therefore, since you cannot access the reference itself, there is no point in bothering you with the weird * or -> syntax. Ironically, not using weird syntax is called syntactic sugar.
Mechanics of a reference
The C++ Standard is silent on the implementation of references; it merely allows the compiler to remove them when it can. For example, in the following case:
int main() {
int a = 0;
int& b = a;
b = 1;
return b;
}
A good compiler will realize that b is just a proxy for a, with no room for doubt, and thus simply access a directly and optimize b out.
As you guessed, a likely representation of a reference is (under the hood) a pointer, but do not let it bother you, it does not affect the syntax or semantics. It does mean however that a number of woes of pointers (like access to objects that have been deleted for example) also affect references.
The explicit dereference is not required by design - that's for convenience. When you use . on a reference the compiler emits code necessary to access the real object - this will often include dereferencing a pointer, but that's done without requiring an explicit dereference in your code.
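To make the member-access point concrete, here is a small hypothetical function: through a reference you use `.` exactly as on the object itself, while through a pointer you need `->` (shorthand for `(*p).`).

```cpp
#include <string>

// Hypothetical helper: the same member function called three ways.
std::string::size_type lengths(std::string& r, std::string* p) {
    return r.size()      // reference: plain member access
         + p->size()     // pointer: arrow operator
         + (*p).size();  // pointer: explicit dereference, then member access
}
```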

When is the right time to use *, & or const in C++?

I was studying pointers and references and came across different ways to feed in parameters. Can someone explain what each one actually means?
I think the first one is simple, it's that x is a copy of the parameter fed in so another variable is created on the stack.
As for the others I'm clueless.
void doSomething1(int x){
//code
}
void doSomething2(int *x){
//code
}
void doSomething3(int &x){
//code
}
void doSomething3(int const &x){
//code
}
I also see stuff like this when variables are declared. I don't understand the differences between them. I know that the first one will put 100 into the variable y on the stack. It won't create a new address or anything.
//example 1
int y = 100;
//example 2
int *y = 100;
//Example 3: epic confusion!
int *y = &z;
Question 1: How do I use these methods? When is it most appropriate?
Question 2: When do I declare variables in that way?
Examples would be great.
P.S. This is one of the main reasons I didn't learn C++, as Java just has garbage collection. But now I have to get into C++.
I think the problem for most students is that in C++ both & and * have different meanings, depending on the context in which they are used.
If either of them appears after a type within an object declaration (T* or T&), they are type modifiers and change the type from plain T to a reference to a T (T&) or a pointer to a T (T*).
If they appear in front of an object (&obj or *obj), they are unary prefix operators invoked on the object. The prefix & returns the address of the object it is invoked for, * dereferences a pointer, iterator etc., yielding the value it references.
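A short sketch of both roles side by side, with hypothetical names: in a declaration the symbols modify the type, while in an expression they are unary operators.

```cpp
// Hypothetical demo of '&' and '*' as type modifiers versus operators.
int operators_demo() {
    int z = 7;
    int* p = &z;  // declaration: p has type int*; expression &z takes z's address
    int& r = z;   // declaration: r has type int&, an alias for z
    *p = 8;       // unary * dereferences p, writing into z
    r += 1;       // no operator needed: r is z
    return z;
}
```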
It doesn't help against confusion that the type modifiers apply to the object being declared, not the type. That is, T* a, b; defines a T* named a and a plain T named b, which is why many people prefer to write T *a, b; instead (note the placement of the type-modifying * adjacent to the object being defined, instead of the type modified).
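The declaration trap can be checked at compile time. This sketch uses `int` in place of `T` and a made-up namespace name.

```cpp
#include <type_traits>

namespace declarator_demo {
// The '*' belongs to a's declarator only.
int* a, b;
static_assert(std::is_same<decltype(a), int*>::value, "a is a pointer to int");
static_assert(std::is_same<decltype(b), int>::value, "b is a plain int");
}  // namespace declarator_demo
```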
Also unhelpful is that the term "reference" is overloaded. For one thing it means a syntactic construct, as in T&. But there's also the broader meaning of a "reference" being something that refers to something else. In this sense, both a pointer T* and a reference (other meaning T&) are references, in that they reference some object. That comes into play when someone says that "a pointer references some object" or that a pointer is "dereferenced".
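So in your specific cases, #1 defines a plain int; #2 tries to define a pointer to an int and initialize it with the address 100 (C++ does not implicitly convert an integer to a pointer, so this only compiles with an explicit cast such as reinterpret_cast<int*>(100), and whatever lives there is probably best left untouched); and #3 defines another pointer and initializes it with the address of an object z (necessarily an int, too).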
As for how to pass objects to functions in C++, here is an old answer of mine on that.
From Scott Meyers, More Effective C++, Item 1:
First, recognize that there is no such thing as a null reference. A reference must always refer to some object. Because a reference must refer to an object, C++ requires that references be initialized.
Pointers are subject to no such restriction. The fact that there is no such thing as a null reference implies that it can be more efficient to use references than to use pointers. That's because there's no need to test the validity of a reference before using it.
Another important difference between pointers and references is that pointers may be reassigned to refer to different objects. A reference, however, always refers to the object with which it is initialized.
In general, you should use a pointer whenever you need to take into account the possibility that there's nothing to refer to (in which case you can set the pointer to null) or whenever you need to be able to refer to different things at different times (in which case you can change where the pointer points). You should use a reference whenever you know there will always be an object to refer to and you also know that once you're referring to that object, you'll never want to refer to anything else.
References, then, are the feature of choice when you know you have something to refer to, when you'll never want to refer to anything else, and when implementing operators whose syntactic requirements make the use of pointers undesirable. In all other cases, stick with pointers.
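A sketch of that guideline with two hypothetical functions: a search returns a pointer because "nothing found" is a legitimate outcome, while a function that always needs an object takes a reference.

```cpp
#include <vector>

// Pointer: there may be nothing to refer to, so nullptr signals "not found".
const int* find_first_even(const std::vector<int>& v) {
    for (const int& x : v)
        if (x % 2 == 0) return &x;
    return nullptr;
}

// Reference: the caller must always supply an object.
void reset(int& counter) { counter = 0; }
```

Callers of `find_first_even` have to check for `nullptr`; callers of `reset` never need such a check.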
Read Stanley Lippman's C++ Primer or any other good C++ book.
As for passing the parameters, generally when copying is cheap we pass by value. For mandatory out parameters we use references, for optional out parameters pointers, and for input parameters where copying is costly we pass by const reference.
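Those conventions can be sketched with a couple of hypothetical functions; `split_name` and `square` are illustrative names, not from any library.

```cpp
#include <string>

// Cheap to copy: pass by value.
int square(int x) { return x * x; }

void split_name(const std::string& full,     // costly input: const reference
                std::string& first,          // mandatory output: reference
                std::string* last = nullptr) // optional output: pointer
{
    const auto space = full.find(' ');
    first = full.substr(0, space);           // whole string if there is no space
    if (last != nullptr)
        *last = (space == std::string::npos) ? "" : full.substr(space + 1);
}
```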
That's a really complicated topic. Please read here: http://www.goingware.com/tips/parameters/.
Also, Scott Meyers's "Effective C++" is a top book on such things.
void doSomething1(int x){
//code
}
This one passes the variable by value: whatever happens inside the function, the original variable doesn't change.
void doSomething2(int *x){
//code
}
Here you pass a variable of type pointer to integer, so inside the function you use *x to access the value and x for the address.
void doSomething3(int &x){
//code
}
This looks like the first one, but whatever happens to x inside the function also happens to the original variable.
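To see the difference, here are hypothetical bodies for the three signatures, each simply adding 1 to its parameter.

```cpp
void doSomething1(int x)  { x += 1; }   // changes a local copy only
void doSomething2(int* x) { *x += 1; }  // changes the caller's int through the pointer
void doSomething3(int& x) { x += 1; }   // changes the caller's int through the alias
```

Only the pointer and reference versions leave a visible effect on the caller's variable.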
int y = 100;
normal integer
//example 2
int *y = 100;
pointer to address 100 (in C++ this only compiles with an explicit cast, e.g. reinterpret_cast<int*>(100))
//Example 3: epic confusion!
int *y = &z;
pointer to the address of z
void doSomething1(int x){
//code
}
void doSomething2(int *x){
//code
}
void doSomething3(int &x){
//code
}
And I am really getting confused between them.
The first is using pass-by-value and the argument to the function will retain its original value after the call.
The latter two are using pass-by-reference. Essentially they are two ways of achieving the same thing. The argument is not guaranteed to retain its original value after the call.
Most programmers prefer to pass large objects by const reference to improve the performance of their code and provide a constraint that the value will not change. This ensures the copy constructor is not called.
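A quick way to check the copy-constructor claim is a hypothetical type that counts its copies; passing it by value copies it, passing it by const reference does not.

```cpp
// Hypothetical type that counts how often it is copied.
struct Big {
    static int copies;
    Big() = default;
    Big(const Big&) { ++copies; }
};
int Big::copies = 0;

void by_value(Big) {}            // parameter is copy-constructed from the argument
void by_const_ref(const Big&) {} // binds directly to the argument, no copy
```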
Your confusion might be due to the '&' operator having two meanings. The one you seem to be familiar with is the 'reference operator'. It is also used as the 'address operator'. In the example you give you are taking the address of z.
A good book to check out that covers all of this in detail is 'Accelerated C++' by Andrew Koenig and Barbara Moo.
The best time to use those methods is when it's more efficient to pass around references as opposed to entire objects. Sometimes, some data structure operations are also faster using references (inserting into a linked list for example). The best way to understand pointers is to read about them and then write programs to use them (and compare them to their pass-by-value counterparts).
And for the record, knowledge of pointers makes you considerably more valuable in the workplace. (all too often, C++ programmers are the "mystics" of the office, with knowledge of how those magical boxes under the desks process code /semi-sarcasm)