As far as I can tell, in C++ references are implemented as constant pointers.
Say you have y, which is a reference to a variable x.
Why would it not be more performant and efficient (especially when passing variables to functions) to either:
A. replace every mention of y with x in a preprocessing stage, or
B. have both x and y refer to the same memory address in the compiler's symbol table?
As far as I can tell, in c++ references are implemented as constant pointers.
That might be correct or it might not; I don't know for sure. In any case, that's an implementation detail. What matters is how the C++ standard specifies references, and it does not say that they must be implemented as constant pointers.
why would it not be more performant and efficient...
When it is possible and more performant the compiler will do that. The so-called as-if rule allows the compiler to perform any optimization that does not change observable behavior of the program in accordance with the C++ standard. The standard does not specify how references are implemented in detail. It is up to the compiler to implement them in the most efficient way.
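As a rough illustration of the as-if rule applied to option A from the question, the two functions below have the same observable behaviour, so a compiler is free to emit identical code for both. The function names are made up for this sketch, and whether a particular compiler actually does this depends on the compiler and its optimization settings.

```cpp
// Both functions compute the same value. In the second one, y is only
// another name for x, so an optimizer may treat it exactly like the first.
int increment_direct() {
    int x = 5;
    x += 1;
    return x;
}

int increment_via_reference() {
    int x = 5;
    int& y = x;  // y aliases x; the language does not require a separate object
    y += 1;      // modifies x through the alias
    return x;
}
```

Comparing the assembly generated for the two functions with optimizations enabled is an easy way to check what your own compiler does.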
Related
If I create a reference to a variable inside the scope of a function like this:
{
int x = 5;
int & ref = x;
}
Will it always create an implicit pointer? Creating a pointer is needed if the reference is a function parameter, but in this case, it is the same as using x directly.
Not necessarily. How your compiler implements references is down to it, so long as it follows the C++ standard.
Remember that the compiler follows the as-if rule. You program the intended behaviour; the compiler generates the code. A good compiler will omit your code snippet entirely since it has no observable effect.
See What exactly is the "as-if" rule?
That's an unspecified implementation detail. (Function parameters might be passed in registers, which would mean no pointer either.)
But in this (automatic) scope, ref is just an alias for x, so no pointer is needed for the compiler.
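A small sketch of what "just an alias" means at the language level; the function names are invented for illustration. Taking the address of the reference gives the address of x, and writing through the reference changes x itself. This shows the semantics only and says nothing about what storage the compiler chooses to use.

```cpp
// Returns true when the reference and the variable name the same object.
bool alias_has_same_address() {
    int x = 5;
    int& ref = x;
    return &ref == &x;  // &ref is the address of x, not of some hidden pointer
}

// Writing through the reference changes x itself.
int write_through_alias() {
    int x = 5;
    int& ref = x;
    ref = 7;
    return x;
}
```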
Others have already given the formally correct answer. I will try a more practical perspective. In principle, yes, the compiler will "always" create an implicit pointer. Frequently, a pointer is the only way a reference can be implemented.
However, the compiler employs many optimization strategies and hence, frequently, the implicit pointer can and will be optimized away.
Some examples:
In your example above, since the variables are never used, everything, even the variable x, will be optimized away.
If you pass the reference to a function that cannot be inlined, the reference most likely will be kept. If the function can be inlined, the references probably can be optimized away as well.
void swap(int &a, int &b) {
int c=a; a=b; b=c;
}
The above function is typically equivalent to one using pointers. If you ask your compiler to produce the assembly code, it will produce the same code apart from some minor differences. In many cases the function can be inlined, which means your call to swap will be replaced by the body of the function. As a consequence, the references will probably be optimized away (the same would be the case if you had been using pointers).
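For comparison, here is the swap function from this answer next to a pointer-based counterpart. The names swap_ref and swap_ptr are chosen for this sketch; the claim that both compile to essentially the same code is something to verify with your own compiler rather than a guarantee.

```cpp
// Reference version, same body as the swap shown in the answer.
void swap_ref(int& a, int& b) {
    int c = a;
    a = b;
    b = c;
}

// Pointer version: the caller passes addresses and the body dereferences them.
void swap_ptr(int* a, int* b) {
    int c = *a;
    *a = *b;
    *b = c;
}
```

A call reads swap_ref(x, y) in one case and swap_ptr(&x, &y) in the other; the difference is in notation at the call site, not in the work done.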
If your question goes deeper and is whether there is a difference in using pointers versus references, they are equally expensive. A reference cannot magically replace the necessity for a pointer. On the other hand, even though they are the same, references are not redundant from a code readability point of view.
In the end, as the others have explained, use whatever makes your program more readable and don't worry about the difference.
Looking at references in C++ I noticed that all implementations I looked at used a pointer internally.
Does the C++ Standard guarantee that a reference will use a pointer internally, or would it be OK for an implementation to use a more "efficient" solution? (I currently do not see how it could be done "better": when a new stack frame is created, there is no easy, bulletproof way to know at what offset from the stack base pointer the referenced variable lives, because the stack is quite dynamic.)
Note: I do understand the difference between a pointer and a reference in C++ (This question has nothing to do with that)
If you mean that a reference requires the compiler to allocate storage for a pointer, then that's unspecified.
§ 8.3.2/4
It is unspecified whether or not a reference requires storage.
EDIT: To record Martin Bonner's comment as a useful, practical note,
[F]or debugging purposes it can be quite useful to know what is going on "under the hood". (E.g. to answer questions like "why hasn't this gone completely off the rails?"). In practice, compilers all implement references as pointers (unless they can optimize the reference completely away).
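One way to observe the "unspecified storage" point on a given implementation is to look at object sizes. In the sketch below (struct and function names are hypothetical), the only guaranteed fact is that sizeof applied to a reference type yields the size of the referenced type. That a struct with a reference member usually has the same size as one with a pointer member is a common observation, not something the standard promises.

```cpp
#include <cstddef>

struct HoldsReference { int& r; };
struct HoldsPointer   { int* p; };

// Guaranteed by the language: sizeof on a reference type gives the size of
// the referenced type, so it reveals nothing about the reference's storage.
static_assert(sizeof(int&) == sizeof(int), "sizeof(T&) is sizeof(T)");

// Implementation-dependent values; on many platforms these two are equal.
std::size_t reference_member_size() { return sizeof(HoldsReference); }
std::size_t pointer_member_size()   { return sizeof(HoldsPointer); }
```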
No, it does not make any guarantees about how references are implemented. The C++ language only defines the semantics of references, not their implementation.
The standard doesn't say how a reference is implemented, just how it works.
It also doesn't say anything about stack frames, that's another implementation detail.
As far as I know, when two pointers (or references) do not type alias each other, it is legal for the compiler to assume that they address different locations and to make certain optimizations based on that, e.g., reordering instructions. Therefore, having pointers to different types hold the same value may be problematic. However, I think this issue only applies when the two pointers are passed to functions. Within the function body where the two pointers are created, the compiler should be able to determine whether they address the same location. Am I right?
As far as I know, when two pointers (or references) do not type alias
each other, it is legal for the compiler to make the assumption
that they address different locations and to make certain
optimizations thereof, e.g., reordering instructions.
Correct. GCC, for example, does perform optimizations of this form which can be disabled by passing the flag -fno-strict-aliasing.
However, I think this issue only applies when the two pointers are
passed to functions. Within the function body where the two pointers
are created, the compiler should be able to make sure the relationship
between them as to whether they address the same location. Am I right?
The standard doesn't distinguish between where those pointers came from. If your operation has undefined behavior, the program has undefined behavior, period. The compiler is in no way obliged to analyze the operands at compile time, but it may give you a warning.
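To make the aliasing rule concrete, here is a minimal sketch of reading the bits of a float without accessing it through an lvalue of another type. The function name float_bits is made up, and the sketch assumes a 32-bit float; std::memcpy is the commonly recommended way to copy an object representation without violating the aliasing rules.

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation of a float into an integer of the same
// size. No float object is ever read through a std::uint32_t lvalue.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// For contrast only, not to be used: the cast below reads a float object
// through an lvalue of an unrelated type, which is undefined behavior no
// matter where the pointer was created.
//   std::uint32_t bad = *reinterpret_cast<std::uint32_t*>(&f);
```

Compilers commonly optimize the memcpy into a plain register move, so the well-defined version need not cost anything extra.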
Implementations which are designed and intended to be suitable for low-level programming should have no particular difficulty recognizing common patterns where storage of one type is reused or reinterpreted as another in situations not involving aliasing, provided that:
Within any particular function or loop, all pointers or lvalues used to access a particular piece of storage are derived from lvalues of a common type which identify the same object or elements of the same array, and
Between the creation of a derived-type pointer and the last use of it or any pointer derived from it, all operations involving the storage are performed only using the derived pointer or other pointers derived from it.
Most low-level programming scenarios requiring reuse or reinterpretation of storage fit these criteria, and handling code that fits these criteria will typically be rather straightforward in an implementation designed for low-level programming. If an implementation caches lvalues in registers and performs loop hoisting, for example, it could support the above semantics reasonably efficiently by flushing all cached values of type T whenever T or T* is used to form a pointer or lvalue of another type. Such an approach may not be optimal, but it would degrade performance much less than having to block all type-based optimizations entirely.
Note that it is probably in many cases not worthwhile for even an implementation intended for low-level programming to try to handle all possible scenarios involving aliasing. Doing that would be much more expensive than handling the far more common scenarios that don't involve aliasing.
Implementations which are specialized for other purposes are, of course, not required to make any attempt whatsoever to support any exceptions to 6.5p7--not even those that are often treated as part of the Standard. Whether such an implementation should be able to support such constructs would depend upon the particular purposes for which it is designed.
According to Wikipedia:
C++11 defines conditions under which pointer values are "safely
derived" from other values. An implementation may specify that it
operates under "strict pointer safety," in which case pointers that
are not derived according to these rules can become invalid.
As I read it you can get the safety model used by an implementation, however that's fixed for the compiler (possibly variable with a command line switch).
Suppose I have code that hides pointers. Such code definitely would not run with a naive bolt-on garbage collector. However, collectors like my own and Boehm's provide hooks for finding pointers in certain objects.
I am in particular thinking about JudyArrays. These are digital tries which necessarily hide the keys. My question is basically whether using such data structures would render the behaviour of a program undefined in C++11.
I hope not (since Judy Arrays outperform everything else). Also, as it happens, I'm using them to implement a garbage collector. I am concerned, however, because "minimal requirements" don't generally work at all and were strongly opposed in the original debate on the C++ conformance model (by the UK and Australia). Parametric requirements are better. But the C++11 GC-related text seems to be a bit of both, so I'm confused!
It's implementation defined whether an implementation provides relaxed pointer safety (what you seem to want) or strict pointer safety (pointers remain valid only when safely derived). As you've implied, you can call get_pointer_safety to find out what the policy is, but the standard provides no way to specify/change the policy.
You may, however, be able to side-step this question. If you can make a call to declare_reachable (passing that pointer value) before you hide the pointer, it remains valid until a matching call to undeclare_reachable (and here "matching" means calls nest).
The discussion on my recent question (Why is a c++ reference considered safer than a pointer?) raised another question in my mind: what exactly was the rationale behind introducing references in C++?
Section 3.7 of Stroustrup's Design and Evolution of C++ describes the introduction of references into the language. If you're interested in the rationale behind any feature of C++, I highly recommend this book.
References were introduced primarily to support operator overloading. Doug McIlroy recalls that once I was explaining some problems with a precursor to the current operator overloading scheme to him. He used the word reference with the startling effect that I muttered "Thank you," and left his office to reappear the next day with the current scheme essentially complete. Doug had reminded me of Algol68.
C passes every function argument by value, and where passing an object by value would be inefficient or inappropriate the user can pass a pointer. This strategy doesn't work where operator overloading is used. In that case, notational convenience is essential because users cannot be expected to insert address-of operators if the objects are large. For example:
a = b - c;
is acceptable (that is, conventional) notation, but
a = &b - &c;
is not. Anyway, &b - &c already has a meaning in C, and I didn't want to change that.
It is not possible to change what a reference refers to after initialization. That is, once a C++ reference is initialized, it cannot be re-bound. I had in the past been bitten by Algol68 references where r1 = r2 can either assign through r1 to the object referred to or assign a new reference value to r1 (re-binding r1) depending on the type of r2. I wanted to avoid such problems in C++.
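The no-rebinding rule can be seen in a few lines. In this sketch (function names invented for illustration), assigning to a reference always assigns through it to the object it was bound to at initialization.

```cpp
// Returns the value of a after assigning b to a reference bound to a.
int assign_through_reference() {
    int a = 1;
    int b = 2;
    int& r = a;  // r is bound to a for its whole lifetime
    r = b;       // copies the value of b into a; does not rebind r to b
    return a;
}

// Returns true if r still names a (and not b) after the assignment.
bool still_bound_to_first_object() {
    int a = 1;
    int b = 2;
    int& r = a;
    r = b;
    return &r == &a && &r != &b;
}
```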
You need them for operator overloading (of course we can now go down the rabbit hole of "what was the rationale for introducing operator overloading?")
How would you type std::auto_ptr::operator*() without references? Or std::vector::operator[]?
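A minimal sketch of both points, using made-up types Vec2 and IntBuffer rather than any standard library class: operator- takes its operands by const reference so the call site keeps the conventional a = b - c notation, and operator[] returns a reference so that the result can be assigned to.

```cpp
#include <cstddef>

struct Vec2 {
    double x;
    double y;
};

// Operands are passed by const reference: no copies, and no & at the call site.
Vec2 operator-(const Vec2& lhs, const Vec2& rhs) {
    return Vec2{lhs.x - rhs.x, lhs.y - rhs.y};
}

// A tiny fixed-size container; no bounds checking, for illustration only.
class IntBuffer {
public:
    // Returning int& lets callers write buf[i] = value.
    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }

private:
    int data_[4] = {0, 0, 0, 0};
};
```

With a pointer-returning operator[] the caller would have to write *buf[i] = value, which is the notational cost the answers here refer to.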
References bind to objects implicitly. This has large advantages when you consider things like binding to temporaries or operator overloading; without them, C++ programs would be full of & and *. When you think about it, the basic use case of a pointer is actually to behave as a reference. In addition, it's much harder to screw up references: you don't perform any pointer arithmetic yourself, can't automatically convert from arrays (a terrible thing), etc.
References are cleaner, easier, and safer than pointers.
It's interesting because most other languages don't have references like C++ has them (aliases), they just have pointer-style references.
If code takes the address of a variable and passes it to a routine, the compiler has no way of knowing whether that address might get stored someplace and used long after the called routine has exited, and possibly after the variable has ceased to exist. By contrast, if code gives a routine a reference to a variable, it has somewhat more assurance that the reference will only be used while that routine is running. Once that routine returns, the reference will no longer be used.
Things end up getting a little 'broken' by the fact that C++ allows code to take the address of a reference. This ability was provided to allow compatibility with older routines which expected pointers rather than references. If a reference is passed to a routine which takes its address and stores it someplace, all bets are off. On the other hand, if as a matter of policy one forbids using the address of a reference in any way that might be persisted, one can pretty well gain the assurances that references provide.
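The loophole described here looks roughly like the following sketch. The names saved_address and remember are hypothetical; the point is only that a routine receiving a reference can still take its address and keep it beyond the call.

```cpp
// Storage that outlives any single call.
int* saved_address = nullptr;

// Receives a reference, but persists the address of the referenced object.
// After this returns, saved_address is only valid while that object lives.
void remember(int& value) {
    saved_address = &value;
}
```

If the object passed to remember is a local variable of the caller, saved_address dangles as soon as the caller returns, which is exactly the situation the policy above is meant to rule out.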
To allow for operator overloading. They wanted operators to be overloadable both for objects and pointers, so they needed a way to refer to an object by something other than a pointer. Hence the reference was introduced. It is in "The Design and Evolution of C++".