It seems that when I pass different integers directly to a function, C++ assigns them the same address as opposed to assigning different addresses to different values. Is this by design, or an optimization that can be turned off? See the code below for an illustration.
#include <iostream>
const int *funct(const int &x) { return &x; }
int main() {
int a = 3, b = 4;
// different addresses
std::cout << funct(a) << std::endl;
std::cout << funct(b) << std::endl;
// same address
std::cout << funct(3) << std::endl;
std::cout << funct(4) << std::endl;
}
The bigger context of this question is that I am trying to construct a list of pointers to integers that I would add one by one (similar to funct(3)). Since I cannot modify the method definition (similar to funct's), I thought of storing the address of each argument, but they all ended up having the same address.
The function const int *funct(const int &x) takes in a reference that is bound to an int variable.
a and b are int variables, so x can be bound to them, and they will have distinct memory addresses.
Since the function accepts a const reference, that means the compiler will also allow x to be bound to a temporary int variable as well (whereas a non-const reference cannot be bound to a temporary).
When you pass a numeric literal to x, as in funct(3), the compiler creates a temporary int object to hold the literal's value. That temporary lives only until the end of the full-expression containing the function call (in practice, the end of the statement), and is then destroyed.
As such, when you make multiple calls to funct() in separate statements, the compiler is free to reuse the same memory for those temporaries, e.g.:
// same address
std::cout << funct(3) << std::endl;
std::cout << funct(4) << std::endl;
Is effectively equivalent to this:
// same address
int temp;
{
temp = 3;
std::cout << funct(temp) << std::endl;
}
{
temp = 4;
std::cout << funct(temp) << std::endl;
}
However, if you make multiple calls to funct() in a single statement, both temporaries are alive at the same time, so the compiler is forced to create separate temporary variables, e.g.:
// different addresses
std::cout << funct(3) << std::endl << funct(4) << std::endl;
Is effectively equivalent to this:
// different addresses
{
int temp1 = 3;
int temp2 = 4;
std::cout << funct(temp1) << std::endl << funct(temp2) << std::endl;
}
The function
const int *funct(const int &x) { return &x; }
will return the address of whatever x is referencing.
So this will, as you expected, print the address of a:
std::cout << funct(a) << std::endl;
The problem with the expression funct(3) is that a reference cannot be bound directly to a literal. A literal such as 3 is a pure value, not an object, so it has no address to refer to. What C++ does instead is create a temporary object, initialize it with the value 3, and bind the reference to that object.
Basically, the compiler will, in this case, translate this:
std::cout << funct(3) << std::endl;
into something equivalent to this:
{
int tmp = 3;
std::cout << funct(tmp) << std::endl;
}
Unless you do something to extend the lifetime of a temporary object, it is destroyed at the end of the full-expression that created it (here, at the semicolon that ends the statement).
Since the temporary created by 3 goes out of scope before you create a temporary from 4, the memory used by the first temporary may be reused for the second temporary.
Related
I began learning C++ this week, and currently I am reading about compound types and constant variables. Unlike in most cases, references to const support type conversion by creating a temporary variable. But if so, then what's the difference in behaviour between:
int i = 42;
double di = i;
and
int i = 42;
const double &di = i;
Don't we end up with two independent variables that can end up having different values if we try to change i? Is the only difference that in the example with the const reference, the reference cannot be changed? The thing that bugs me the most is that when the types of a non-const variable and a const ref match, the reference points to the same address in memory and changes along with the change in the original variable, whereas this does not happen for a non-typematching const ref to a non-const variable:
#include <iostream>
int main() {
int i = 42;
const int &ri = i;
const double &dri = i;
++i;
std::cout << i << " at " << &i << ", " << ri << " at "
<< &ri << ", " << dri << " at " << &dri << std::endl;
int j = i;
int jj = ri;
int djj = dri;
std::cout << j << " at " << &j << ", " << jj << " at "
<< &jj << ", " << djj << " at " << &dri << std::endl;
return 0;
}
Output:
43 at %Address1%, 43 at %Address1%, 42 at %Address2%
43 at %Address3%, 43 at %Address4%, 42 at %Address2%
This seems to me like a major difference in behavior that is easy to overlook from simply looking at the syntax, on top of the fact that such behavior seems counter-intuitive to the entire idea of references. Also, why is jj allocated a separate space, but not djj, which references the same address as dri?
Let's say you have a function of the form:
void foo(double const& d);
And now, let's say you have a float somewhere, and you want to pass it to this function via foo(f);. If a T const& could not bind to any object convertible to T, then this wouldn't work. Every user of this function who doesn't have a double would have to write foo(static_cast<double>(f)) or an equivalent.
You might say that maybe foo should take double by value. And for double specifically, maybe it should.
But what about if it's std::string, and I want to call foo("some string"). Well, "some string" is not a std::string; it is a string literal which is convertible to std::string. So we allow that conversion.
Again, you might say that it should take the string by value. But what about the cases when the caller really does have a std::string? They'd have to copy that string, a copy that is discarded and is therefore unnecessary.
Of course, C++'s rules should be uniform. So if we want this to work for function arguments and parameters, it also has to work for named variables. But even then, it could be useful. You might call a function that you expect to return a string of some form, but aren't especially picky about which form, just so long as it is convertible to a std::string. This might be in template code:
template<typename T>
void foo(T t)
{
std::string const& data = t.get_a_string();
}
Do you really care if get_a_string returns std::string exactly, or just some string type convertible to std::string? Probably the latter.
After compilation, what does the reference become, an address, or a constant pointer?
I know the difference between pointers and references, but I want to know the difference between the underlying implementations.
#include <iostream>
using std::cout;
using std::endl;
int main()
{
int a = 1;
int &b = a;
int *ptr = &a;
cout << b << " " << *ptr << endl; // 1 1
cout << "&b: " << &b << endl; // 0x61fe0c
cout << "ptr: " << ptr << endl; // 0x61fe0c
return 0;
}
The pedantic answer is: Whatever the compiler feels like, all that matters is that it works as specified by the language's semantics.
To get the actual answer, you have to look at the resulting assembly, or make heavy use of Undefined Behavior. At that point, it becomes a compiler-specific question, not a "C++ in general" question.
In practice, references that need to be stored essentially become pointers, while local references tend to get compiled out of existence. The latter is generally the case because the guarantee that references never get reassigned means that if you can see it being initialized, then you know full well what it refers to. However, you should not rely on this for correctness.
For the sake of completeness
It is possible to get some insight into what the compiler is doing from within valid code by memcpying the contents of a struct containing a reference into a char buffer:
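#include <array>
#include <cstring>
#include <iomanip>
#include <iostream>
struct X {
int& ref;
};
int main() {
constexpr std::size_t x_size = sizeof(X);
int val = 12;
X val_ref = {val};
std::array<unsigned char, x_size> raw;
// Copy the object representation of the struct into a byte buffer.
std::memcpy(raw.data(), &val_ref, x_size);
std::cout << &val << std::endl;
std::cout << "0x" << std::hex << std::setfill('0');
for (const unsigned char c : raw) {
// setw(2) keeps leading zeros so every byte prints as two hex digits.
std::cout << std::setw(2) << static_cast<int>(c);
}
std::cout << std::endl;
}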
When I ran this on my compiler, I got the (endian flipped) address of val stored within the struct.
It depends heavily on the compiler. The compiler may decide to optimize the reference away entirely (for example, by using the value directly), but as far as I know references are otherwise compiled like pointers: if you look at the resulting assembly, they are handled the same way a pointer would be.
int a = 1;
int &b = a;
Here, the reference b has the type int, but what is the purpose of it having a type when it is not an object? What if that type were different from that of the object it refers to?
The purpose of having typed references (i.e. pointers) is to enable type checking (which helps to catch bugs). If you were to declare a reference as a different type, you will get a type error (you can cast it, but that needs to be done explicitly).
According to Sumita Arora's book 'Computer Science with C++', reference variables are often treated as a derived data type that has the property of storing variable addresses. A reference is a means of providing an alias for an existing variable; that is, the existing variable can be referred to by this alternate name.
Suppose we want to swap two variables using references.
#include <iostream>
using std::cout;
using std::endl;
// function definition to swap the values.
void swap(int &x, int &y) {
int temp;
temp = x; // save the value of x
x = y; // put y into x
y = temp; // put the saved value of x into y
return;
}
int main() {
// local variable declaration:
int a = 100;
int b = 200;
cout << "Before swap, value of a :" << a << endl;
cout << "Before swap, value of b :" << b << endl;
/* calling a function to swap the values using variable reference.*/
swap(a, b);
cout << "After swap, value of a :" << a << endl;
cout << "After swap, value of b :" << b << endl;
return 0;
}
Here, swapping is performed using call by reference, so the changes are reflected in the actual arguments as well. Modifying the passed arguments this easily is one of the main purposes of references. If a reference had no type, mixing it with an integer variable during the swap could lead to a type mismatch, since the compiler would not know what kind of object the reference refers to. Declaring the references as int& states that they can refer only to int variables, which lets the compiler check that the referenced object really is an int (or whatever type the reference specifies). References also rule out many wild-pointer cases and often provide an easier-to-use interface.
According to " How to get around the warning "rvalue used as lvalue"? ", Visual Studio will merely warn on code such as this:
int bar() {
return 3;
}
void foo(int* ptr) {
}
int main() {
foo(&bar());
}
In C++ it is not allowed to take the address of a temporary (or, at least, of an object referred to by an rvalue expression?), and I thought that this was because temporaries are not guaranteed to even have storage.
But then, although diagnostics may be presented in any form the compiler chooses, I'd still have expected MSVS to error rather than warn in such a case.
So, are temporaries guaranteed to have storage? And if so, why is the above code disallowed in the first place?
Actually, in the original language design it was allowed to take the address of a temporary. As you have noticed correctly, there is no technical reason for not allowing this, and MSVC still allows it today through a non-standard language extension.
The reason why C++ made it illegal is that binding references to temporaries clashes with another C++ language feature that was inherited from C: Implicit type conversion.
Consider:
void CalculateStuff(long& out_param) {
long result;
// [...] complicated calculations
out_param = result;
}
int stuff;
CalculateStuff(stuff); //< this won't compile in ISO C++
CalculateStuff() is supposed to return its result via the output parameter. But what really happens is this: The function accepts a long& but is given an argument of type int. Through C's implicit type conversion, that int is now implicitly converted to a variable of type long, creating an unnamed temporary in the process.
So instead of the variable stuff, the function really operates on an unnamed temporary, and all side-effects applied by that function will be lost once that temporary is destroyed. The value of the variable stuff never changes.
References were introduced to C++ to allow operator overloading, because from the caller's point of view, they are syntactically identical to by-value calls (as opposed to pointer calls, which require an explicit & on the caller's side). Unfortunately it is exactly that syntactical equivalence that leads to troubles when combined with C's implicit type conversion.
Since Stroustrup wanted to keep both features (references and C-compatibility), he introduced the rule we all know today: Unnamed temporaries only bind to const references. With that additional rule, the above sample no longer compiles. Since the problem only occurs when the function applies side-effects to a reference parameter, it is still safe to bind unnamed temporaries to const references, which is therefore still allowed.
This whole story is also described in Chapter 3.7 of Design and Evolution of C++:
The reason to allow references to be initialized by non-lvalues was to allow the distinction between call-by-value and call-by-reference to be a detail specified by the called function and of no interest to the caller. For const references, this is possible; for non-const references it is not. For Release 2.0 the definition of C++ was changed to reflect this.
I also vaguely remember reading in a paper who first discovered this behavior, but I can't remember right now. Maybe someone can help me out?
Certainly temporaries have storage. You could do something like this:
#include <iostream>
template<typename T>
const T *get_temporary_address(const T &x) {
return &x;
}
int bar() { return 42; }
int main() {
std::cout << (const void *)get_temporary_address(bar()) << std::endl;
}
In C++11, you can do this with non-const rvalue references too:
#include <iostream>
template<typename T>
T *get_temporary_address(T &&x) {
return &x;
}
int bar() { return 42; }
int main() {
std::cout << (const void *)get_temporary_address(bar()) << std::endl;
}
Note, of course, that dereferencing the pointer in question (outside of get_temporary_address itself) is a very bad idea; the temporary only lives to the end of the full expression, and so having a pointer to it escape the expression is almost always a recipe for disaster.
Further, note that no compiler is ever required to reject an invalid program. The C and C++ standards merely call for diagnostics (i.e., an error or warning), upon which the compiler may reject the program, or it may compile the program, with undefined behavior at runtime. If you would like your compiler to strictly reject programs which produce diagnostics, configure it to convert warnings to errors.
You're right in saying that "temporaries are not guaranteed to even have storage", in the sense that the temporary may not be stored in addressable memory. In fact, functions compiled for RISC architectures (e.g. ARM) very often return values in general-purpose registers and expect their inputs in those registers as well.
MSVS, producing code for x86 architectures, may always produce functions that return their values on the stack. Therefore they're stored in addressable memory and have a valid address.
Temporary objects do have memory. Sometimes the compiler creates temporaries as well. In both cases these objects are about to go away, i.e. they shouldn't pick up important changes by accident. Thus, you can get hold of a temporary only via an rvalue reference or a const reference, but not via a non-const reference. Taking the address of an object which is about to go away also feels like a dangerous thing and thus isn't supported.
If you are sure you really want a non-const reference or a pointer from a temporary object, you can return it from a corresponding member function: you can call non-const member functions on temporaries, and you can return this (or *this) from such a member. However, note that the type system is trying to help you. When you trick it, you had better know that what you are doing is the Right Thing.
As others mentioned, we all agreed temporaries do have storage.
why is it illegal to take the address of a temporary?
Because temporaries are allocated on the stack, the compiler is free to use that address for any other purpose it wants.
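#include <stdio.h>
int *foo()
{
int myvar=5;
return &myvar; // returns the address of a local that is about to be destroyed
}
int main()
{
int *p=foo();
printf("%d", *p); // undefined behavior: myvar no longer exists
return 0;
}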
Let's say the address of 'myvar' is 0x1000. This program may well print 5 even though it's illegal to access 0x1000 in main(). Though, not necessarily all the time.
With a slight change to the above main():
#include <stdio.h>
void fun(int *q);
int *foo()
{
int myvar=5;
return &myvar; // address of myvar is 0x1000
}
int main()
{
int *p=foo(); //illegal to access 0x1000 here
printf("%d", *p);
fun(p); // passing *that address* to fun()
return 0;
}
void fun(int *q)
{
int a,b; //some variables
printf("%d", *q);
}
The second printf is very unlikely to print '5', as the compiler might have allocated the same portion of the stack (which contains 0x1000) for fun() as well. Whether it prints '5' for both printfs or for only one of them, that is purely an unintentional side effect of how stack memory is being used and allocated. That's why it's illegal to access an object through its address after its lifetime has ended.
Temporaries do have storage. They are allocated on the stack of the caller (note: might be subject of calling convention, but I think they all use caller's stack):
caller()
{
callee1( Tmp() );
callee2( Tmp() );
}
The compiler will allocate space for the result of Tmp() on the stack of the caller. You can take the address of this memory location; it will be some address on the caller's stack. What the compiler does not guarantee is that it will preserve the value at this stack address after the callee returns. For example, the compiler can place another temporary there.
EDIT: I believe it's disallowed in order to rule out code like this:
T bar();
T * ptr = &bar();
because it will very likely lead to problems.
EDIT: here is a little test:
#include <iostream>
typedef long long int T64;
T64 ** foo( T64 * fA )
{
std::cout << "Address of tmp inside callee : " << &fA << std::endl;
return ( &fA );
}
int main( void )
{
T64 lA = -1;
T64 lB = -2;
T64 lC = -3;
T64 lD = -4;
T64 ** ptr_tmp = foo( &lA );
std::cout << "**ptr_tmp = *(*ptr_tmp ) = lA\t\t\t\t**" << ptr_tmp << " = *(" << *ptr_tmp << ") = " << **ptr_tmp << " = " << lA << std::endl << std::endl;
foo( &lB );
std::cout << "**ptr_tmp = *(*ptr_tmp ) = lB (compiler override)\t**" << ptr_tmp << " = *(" << *ptr_tmp << ") = " << **ptr_tmp << " = " << lB << std::endl
<< std::endl;
*ptr_tmp = &lC;
std::cout << "Manual override" << std::endl << "**ptr_tmp = *(*ptr_tmp ) = lC (manual override)\t\t**" << ptr_tmp << " = *(" << *ptr_tmp << ") = " << **ptr_tmp
<< " = " << lC << std::endl << std::endl;
*ptr_tmp = &lD;
std::cout << "Another attempt to manually override" << std::endl;
std::cout << "**ptr_tmp = *(*ptr_tmp ) = lD (manual override)\t\t**" << ptr_tmp << " = *(" << *ptr_tmp << ") = " << **ptr_tmp << " = " << lD << std::endl
<< std::endl;
return ( 0 );
}
Program output GCC:
Address of tmp inside callee : 0xbfe172f0
**ptr_tmp = *(*ptr_tmp ) = lA **0xbfe172f0 = *(0xbfe17328) = -1 = -1
Address of tmp inside callee : 0xbfe172f0
**ptr_tmp = *(*ptr_tmp ) = lB (compiler override) **0xbfe172f0 = *(0xbfe17320) = -2 = -2
Manual override
**ptr_tmp = *(*ptr_tmp ) = lC (manual override) **0xbfe172f0 = *(0xbfe17318) = -3 = -3
Another attempt to manually override
**ptr_tmp = *(*ptr_tmp ) = lD (manual override) **0xbfe172f0 = *(0x804a3a0) = -5221865215862754004 = -4
Program output VC++:
Address of tmp inside callee : 00000000001EFC10
**ptr_tmp = *(*ptr_tmp ) = lA **00000000001EFC10 = *(000000013F42CB10) = -1 = -1
Address of tmp inside callee : 00000000001EFC10
**ptr_tmp = *(*ptr_tmp ) = lB (compiler override) **00000000001EFC10 = *(000000013F42CB10) = -2 = -2
Manual override
**ptr_tmp = *(*ptr_tmp ) = lC (manual override) **00000000001EFC10 = *(000000013F42CB10) = -3 = -3
Another attempt to manually override
**ptr_tmp = *(*ptr_tmp ) = lD (manual override) **00000000001EFC10 = *(000000013F42CB10) = 5356268064 = -4
Notice that both GCC and VC++ reserve hidden local variable(s) on the stack of main for temporaries and MIGHT silently reuse them. Everything goes normally until the last manual override: after it, we have an additional, separate call to std::cout. That call uses the stack space we just wrote to, and as a result we get garbage.
Bottom line: both GCC and VC++ allocate space for temporaries on the stack of the caller. They might have different strategies for how much space to allocate and how to reuse it (this might depend on optimizations as well). Both might reuse this space at their discretion, and therefore it is not safe to take the address of a temporary: we might try to access, through this address, a value we assume is still there (say, write something there directly and then try to retrieve it), while the compiler might have already reused the space and overwritten our value.
#include <iostream>
#include <tchar.h>
void output(int *param)
{
std::cout << "Value: " << *param << std::endl;
};
int _tmain(int argc, _TCHAR* argv[])
{
int i = 34;
output(&i);
return 0;
}
obviously writes "Value: 34" to the console.
But if I make the following changes
...
void output(int **param)
{
std::cout << "Value: " << **param << std::endl;
}
...
output(&(&i));
...
I get a compile error "'&' requires l-value".
By the way, I even tried to make the following change:
output(&34);
Indeed this feels wrong ... somehow.
My question is: Why is it not allowed to use & on an r-value? Is there some reason at the assembler level?
You are trying to take the address of an r-value, and that is not defined, since an r-value is a temporary value that is not guaranteed to have an address on the stack or heap. C++11 introduced r-value references, but that is a totally different subject from your question.
To get your code to compile you need to do the following:
int i = 34;
int* pi = &i;
output(&pi);
By "grounding" the address in pi, you give the compiler a real object on the stack whose address can be passed to output.
You're trying to get the address of an address, and the result of & is not an l-value. (You can very roughly think of l-values as values that can stand on the left side of an assignment. Variables and other "named values" are l-values, for example.)
Store the first address somewhere.
int number = 4;
int* firstAddress = &number;
int** secondAddress = &firstAddress;
output(secondAddress);