returning tmp using references behaving differently - c++

I know that returning temporary variables using references doesn't work since the temporary object is lost after the function terminates, but the following piece of code works since the returned temporary is assigned to another object.
I assume the temporary objects get destroyed after the line of function call. If it is so, why isn't this working for this kind of method chaining?
Counter& Counter::doubler()
{
Counter tmp;
tmp.i = this->i * 2;
return tmp;
}
int main()
{
Counter d(2);
Counter d1, d2;
d1 = d.doubler(); // normal function call
std::cout << "d1=" << d1.get() << std::endl; // Output : d1=4
d2 = d.doubler().doubler(); // Method chaining
std::cout << "d2=" << d2.get() << std::endl; // Output : d2=0
return 0;
}

If a function returns a reference to a local object, the object will be destroyed as soon as the function returns (as local objects are). It does not persist to the end of the line of the function call.
Accessing an object after it has been destroyed will yield unpredictable results. Sometimes it may work, for some definition of "work", and sometimes it may not. Just don't do it.
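A minimal sketch of the usual fix, assuming a Counter like the one in the question: return the result by value, so that each call hands back a complete object rather than a reference into a dead stack frame. The helper name chained_result is made up for this illustration.

```cpp
#include <cassert>

class Counter {
public:
    Counter() : i(0) {}
    explicit Counter(int k) : i(k) {}
    int get() const { return i; }

    // Returns a new Counter by value; nothing refers to the local afterwards.
    Counter doubler() const {
        Counter tmp;
        tmp.i = i * 2;
        return tmp;
    }

private:
    int i;
};

// Hypothetical helper: chains two calls, as in the question.
int chained_result(int start) {
    Counter d(start);
    // The temporaries returned by doubler() live until the end of this full expression.
    Counter d2 = d.doubler().doubler();
    assert(d.get() == start);  // the original object is untouched
    return d2.get();
}
```

With this version, chaining from a Counter holding 2 gives 8, and the result does not depend on what happens to be left on the stack.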

Counter& doubler()
{
Counter tmp;
tmp.i=this->i*2;
return tmp;
}
This is undefined behaviour. Once the function returns, the reference dangles, because the Counter destructor has already been called for the local object tmp.

The real question is not "why this kind of method chaining is not working?", but instead "why the first ('normal') function call works?"
The answer is there's no way to tell, because it might as well break your program.
To state it clearly: returning a local object by reference and then using it is undefined behavior. That, of course, means it might work by coincidence today and stop working tomorrow. All bets are off.

When a function returns, the stack is only rolled back logically: the stack pointer is set to a different value, and nothing is erased. If the function returned a reference to a local variable, the memory that held the local may still belong to the process and still contain the same bits. None of this is guaranteed, though. After a few more calls the contents will have been overwritten, and reading through the reference is undefined behavior in any case.

The others are all right: just do not fiddle around with references to local objects.
But as to why it appears to work in one case and not in the other:
When you call it singly and the function returns, the object is still lying on the stack. Granted, it is a 'destructed' object, but whatever space it used to take is still there on the stack. If you have a simple object, say with a single int member, then nothing disturbs it on the stack unless your code allocates something else there, or the destructor does an unusually thorough job and wipes the integer member (which most destructors do not do). Until the very next line, not much is going to happen that would remove it from the stack. Your reference points to a still-mapped memory location where the (destructed) object's bits remain. That is why it appears to work for you.
When you call it chained, the first call returns a reference to that tmp on the stack. As explained above, no problem so far: your (destructed) tmp is still very much there on the stack. But notice the moment you call the second doubler. Where is the tmp inside that second call going to be placed? Right where the tmp from your first call was! The second call overwrites the object (the tmp with value 4) with a freshly default-constructed tmp with value 0. The second call is in effect made on a Counter whose value is 0, hence you get 0. Extremely tricky, which is why you should just forget about returning references to local variables.
Now purists may scream "undefined, just don't do it", and I am with them; I have said twice (now thrice) not to do it. But people may try it. I bet that for a 'simple' object like the following, and code exactly as in the question (so that nothing disturbs the stack), a typical unoptimized build will print a consistent 4, 0, although nothing in the language guarantees it.
class Counter
{
public:
Counter()
{
i = 0;
}
Counter(int k)
{
i = k;
}
int get()
{
return i;
}
int i;
Counter& doubler();
};
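For contrast, here is a sketch of a chaining style where returning a reference is fine, because the reference is to the object the method was called on rather than to a local. The member name doubleInPlace and the helper chained_in_place are hypothetical; unlike the question's doubler, this variant modifies the Counter itself.

```cpp
#include <cassert>

class Counter {
public:
    explicit Counter(int k = 0) : i(k) {}
    int get() const { return i; }

    // Hypothetical in-place variant: modifies this object and returns a reference to it.
    // The reference stays valid for as long as the object itself does.
    Counter& doubleInPlace() {
        i *= 2;
        return *this;
    }

private:
    int i;
};

// Hypothetical helper: chains two in-place calls on a named object.
int chained_in_place(int start) {
    Counter d(start);
    Counter& r = d.doubleInPlace().doubleInPlace();
    assert(&r == &d);  // the chain always refers back to d
    return d.get();
}
```

The trade-off is that the original value is lost, so this only fits when mutating the object is what you want.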

Related

What happens if an object resizes its own container?

This is not a question about why you would write code like this, but more as a question about how a method is executed in relation to the object it is tied to.
If I have a struct like:
struct F
{
// some member variables
void doSomething(std::vector<F>& vec)
{
// do some stuff
vec.push_back(F());
// do some more stuff
}
};
And I use it like this:
std::vector<F> vec(10);
vec[0].doSomething(vec);
What happens if the push_back(...) in doSomething(...) causes the vector to expand? This means that vec[0] would be copied then deleted in the middle of executing its method. This would be no good.
Could someone explain what exactly happens here?
Does the program instantly crash? Does the method just try to operate on data that doesn't exist?
Does the method operate "orphaned" of its object until it runs into a problem like changing the object's state?
I'm interested in how a method call is related to the associated object.
Yes, it's bad. It's possible for your object to be copied (or moved in C++11, if the distinction is relevant to your code) while you are inside doSomething(). So after the push_back() returns, the this pointer may no longer point to the location of your object. For the specific case of vector::push_back(), it's possible that the memory pointed to by this has been freed and the data copied to a new array somewhere else. For other containers (list, for example) that leave their elements in place, this is (probably) not going to cause problems at all.
In practice, it's unlikely that your code is going to crash immediately. The most likely circumstance is a write to free memory and a silent corruption of the state of your F object. You can use tools like valgrind to detect this kind of behavior.
But basically you have the right idea: don't do this, it's not safe.
Could someone explain what exactly happens here?
Yes. If you access the object after a push_back, resize or insert has reallocated the vector's contents, it's undefined behavior, meaning what actually happens is up to your compiler, your OS, what the "do some more stuff" part does, and maybe a number of other factors like the phase of the moon, air humidity in some distant location,... you name it ;-)
In short, this is (indirectly, via the std::vector implementation) calling the destructor of the object itself, so the lifetime of the object has ended. Further, the memory previously occupied by the object has been released by the vector's allocator. Therefore the use of the object's nonstatic members results in undefined behavior, because the this pointer passed to the function does not point to an object any more. You can however access/call static members of the class:
struct F
{
static int i;
static int foo();
double d;
void bar();
// some member variables
void doSomething(std::vector<F>& vec)
{
vec.push_back(F());
int n = foo(); //OK
i += n; //OK
std::cout << d << '\n'; //UB - may crash with an access violation
bar(); //UB - what actually happens depends on the
// implementation of bar
}
};
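One way to sidestep the problem, sketched under the assumption that the element can be identified by its index: have the function take the index instead of relying on this, and index into the vector again after anything that may reallocate. The member value, the static signature and the helper run_demo are illustrative and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct F {
    int value = 0;

    // Hypothetical safe variant: works through an index, never through `this`.
    static void doSomething(std::vector<F>& vec, std::size_t self) {
        vec[self].value += 1;  // fine: nothing has reallocated yet
        vec.push_back(F());    // may reallocate; old pointers and references are invalid now
        vec[self].value += 1;  // index again into the (possibly moved) storage
    }
};

// Hypothetical helper mirroring the call site from the question.
int run_demo() {
    std::vector<F> vec(10);
    F::doSomething(vec, 0);
    assert(vec.size() == 11);
    return vec[0].value;
}
```

An index survives reallocation because it is resolved against the vector's current storage each time it is used, which a pointer or reference is not.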

Returning a reference can work?

I used to think returning a reference is bad as our returned reference will refer to some garbage value. But this code works (matrix is a class):
const int max_matrix_temp = 7;
matrix&get_matrix_temp()
{
static int nbuf = 0;
static matrix buf[max_matrix_temp];
if(nbuf == max_matrix_temp)
nbuf = 0;
return buf[nbuf++];
}
matrix& operator+(const matrix&arg1, const matrix&arg2)
{
matrix& res = get_matrix_temp();
//...
return res;
}
What is buf doing here and how does it save us from having garbage values?
buf is declared as static, meaning it retains its value between calls to the function:
static matrix buf[max_matrix_temp];
i.e. it's not created on the stack as int i = 0; would be (a non-static local variable), so returning a reference to it is perfectly safe.
This following code is dangerous, because the memory for the variable's value is on the stack, so when the function returns and we move back up the stack to the previous function, all of the memory reservations local to the function cease to exist:
int * GetAnInt()
{
int i = 0; // create i on the stack
return &i; // return a pointer to that memory address
}
Once we've returned, we have a pointer to a piece of memory on the stack, and with dumb luck it will hold the value we want because it hasn't been overwritten yet. The pointer is invalid, though, as the memory is now free for reuse whenever space on the stack is required.
I see no buf declared anywhere, which means it doesn't go out of scope with function return, so it's okay. (it actually looks like it's meant to be matrixbuf which is also fine, because it's static).
EDIT: Thanks to R. Martinho Fernandes for the guess. Of course it is matrix buf, so buf is a static array in which the temporary is allocated. That ensures it doesn't get freed when the function returns, and therefore the returned reference is still valid.
This is safe up to a point, but very dangerous. The returned reference
can't dangle, but if client code keeps it around, at some future point,
the client is in for a big surprise, as his value suddenly changes to a
new return value. And if you call get_matrix_temp more than
max_matrix_temp times in a single expression, you're going to end up
overwriting data as well.
In the days before std::string, in code using printf, I used this
technique for returning conversions of user defined types, where a
"%s" specifier was used, and the argument was a call to a formatting
function. Again, max_matrix_temp was the weak point: a single
printf which formatted more instances of my type would output wrong
data. It was a bad idea then, and it's a worse idea now.
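The overwriting hazard described above can be sketched with a scaled-down version of the same rotating buffer. This is an illustration only: int stands in for matrix, the buffer has just two slots so the wrap-around is easy to trigger, and the names get_temp and third_call_reuses_first_slot are made up.

```cpp
#include <cassert>

const int max_temp = 2;  // deliberately tiny so the wrap-around is easy to trigger

// Same rotating static buffer idea as get_matrix_temp, with int standing in for matrix.
int& get_temp() {
    static int nbuf = 0;
    static int buf[max_temp];
    if (nbuf == max_temp)
        nbuf = 0;
    return buf[nbuf++];
}

// Returns true when the third call handed back the slot given out by the first one.
bool third_call_reuses_first_slot() {
    int& a = get_temp();
    a = 1;
    int& b = get_temp();
    b = 2;
    int& c = get_temp();  // wraps around to the slot that a refers to
    c = 3;
    assert(&a != &b);
    return &a == &c && a == 3;  // a was silently overwritten through c
}
```

Nothing here dangles, so the behaviour is well defined; the surprise is purely that a value the caller still holds changes underneath it.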

C++ transfer ownership of a struct to a function

Is it possible to transfer the ownership of a local variable in C++ (or C++0x) to a function, leaving it undefined after the return, so optimizations can be done?
struct A {
int a[100000];
};
int func(A& s){
//s should now be "owned" by func and be undefined in the calling function
s.a[2] += 4;
return s.a[2];
}
int main(){
A s;
printf("%d\n", func(s));
//s is now undefined
}
I want the function "func" to be optimized to simply return s.a[2]+4, but not change the actual value in memory, just like if "s" had been a local variable in "func".
If it can't be done in standard C++, is it possible with some extension in g++?
No, it's not possible, either Standard or through an extension, and that's because there is no optimization value. Compilers can trivially prove that there are no more references to local variables in such situations. Failing all else, you could trivially mimic the effect by doing
int main() {
{
A s;
printf("%d\n", func(s));
}
}
Being able to do that kind of thing would be hideously dangerous for no benefit.
Leave optimization to compiler - in simple cases it probably can do it.
Don't forget - premature optimization is the root of all evil.
I would expect a reasonable optimizing compiler to be able to make an optimization like this without any special hints if the local variable s is really not referenced after the function returns, assuming the variable and the function were in the same compilation unit or you had some form of link time code generation enabled.
You might be able to help the optimizer by scoping your local variable to make it explicit that it can't be accessed beyond the one reference after the function call:
int main() {
{
A s;
printf("%d\n", func(s));
} //s is now undefined
}
If you have a specific case that doesn't appear to be optimized as effectively as you think it should then perhaps you can provide more detail about your situation. I'm a little unclear what exactly you mean by the function 'owning' the local variable in this case since you do actually want to access it after the function returns.
You are mixing distinct issues like ownership, stack unwinding and value parameters.
Is it possible to transfer the ownership of a local variable
No. Local variables are local to the scope in which they are defined, and nobody else can see them or manipulate them in any fashion. You can pass to a function the value of a local variable, a reference to a local variable, or the address of a local variable. When a reference or a pointer to a local variable is passed, the callee can manipulate the content of the variable, but by no means can it influence the scope of the variable in the caller's frame. The most common transfer of 'ownership' means passing a pointer by value and relying on the callee to take ownership of the allocated memory. All forms of variable passing (by value, by reference, by pointer) can handle this; the issue of memory allocation ownership is distinct.
I want the function "func" to be optimized to simply return s.a[2]+4,
but not change the actual value in memory,
Then do exactly that, why make it any more complicated?
int func(const A& s){
return s.a[2] + 4;
}
This will do exactly what you describe, but it is very unlikely that this is what you're actually asking. Making a leap of faith and invoking some psychic powers, one would guess that what you're really asking is: can an object be changed in the callee scope and at the same time left intact in the caller scope? The answer is obviously no, because memory cannot have different values depending on who looks at it. You can pass a copy of the object (pass by value), which allows the callee to manipulate its own copy as it sees fit without affecting the original in the caller scope. Or you can pass a const reference, preventing the callee from modifying it, and have the callee copy out whatever it needs to modify.
I'm surprised that nobody else has posted this, but it sounds like what's closest to what you want is simply to remove the reference and put the function call in a separate scope:
int func(A s){ //removed the &
//s is "owned" by func and changes don't touch anyone else's A objects
s.a[2] += 4;
return s.a[2];
}
int main(){
{
A s;
printf("%d\n", func(s));
// s hasn't changed, func had its own copy.
} // s goes out of scope and is destroyed
//s is now undefined
}
You could also make it pointer based if you prefer that. The code sample below changes your func(A&) to take a reference to a pointer, so the pointer can safely be set to null after deallocation.
It's a little bit hacky, but it is possible if you absolutely require it. It may be a premature optimization that can be avoided, however.
int func(A*& s) {
int retVal = s->a[2] + 4;
delete s;
s = NULL;
return retVal;
}
int main() {
A* s = new A();
printf("%d\n", func(s));
}
You can use the old C++03 auto_ptr for this (it was deprecated in C++11 and removed in C++17). The calling function is left with an auto_ptr that holds a null pointer, while the function took over and deleted s.
int func(std::auto_ptr<A> s){
//s is "owned" by func and is left empty in the calling function
s->a[2] += 4;
return s->a[2];
}
int main(){
std::auto_ptr<A> s(new A()); //the constructor is explicit, so "= new A()" would not compile
printf("%d\n", func(s)); //destructive copy of pointer
//s now holds a null pointer
}
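Since C++11 the same hand-over is usually written with std::unique_ptr and std::move. The following is a sketch under that assumption, reusing struct A and func from the question; the helper call_and_release is made up to wrap the call site.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct A {
    int a[100000];
};

// Taking the unique_ptr by value means func owns the object and frees it on return.
int func(std::unique_ptr<A> s) {
    s->a[2] += 4;
    return s->a[2];
}

// Hypothetical helper wrapping the call site from the question.
int call_and_release() {
    std::unique_ptr<A> s(new A());    // new A() value-initializes, so the array starts zeroed
    int result = func(std::move(s));  // ownership moves into func
    assert(s == nullptr);             // a moved-from unique_ptr is guaranteed to be empty
    return result;
}
```

The explicit std::move at the call site makes the transfer visible to the reader, which the silent destructive copy of auto_ptr did not.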

In C++ if a pointer is returned and immediately dereferenced, will the two operations be optimized away?

In C++ if I get and return the address of a variable and the caller then immediately dereferences it, will the compiler reliably optimize out the two operations?
The reason I ask is I have a data structure where I'm using an interface similar to std::map where find() returns a pointer (iterator) to a value, and returns NULL (there is no trivial .end() equivalent) to indicate that the value has not been found.
I happen to know that the variables being stored are pointers, so returning NULL works fine even if I returned the value directly, but it seems that returning a pointer to the value is more general. Otherwise if someone tried to store an int there that was actually 0 the data structure would claim it isn't there.
However, I'm wondering if there's even any loss in efficiency here, seeing as the compiler should optimize away actions that just undo the effect of each other. The problem is that the two are separated by a function return so maybe it wouldn't be able to detect that they just undo each other.
Lastly, what about having one private member function that just returns the value and an inline public member function that just takes the address of the value. Then at least the address/dereference operations would take place together and have a better chance of being optimized out, while the whole body of the find() function is not inlined.
private:
V _find(key) {
... // a few dozen lines...
}
public:
inline V* find(key) {
return &_find(key);
}
std::cout << *find(a_key);
This would return a pointer to a temporary, which I didn't think about. The only thing that can be done similar to this is to do a lot of processing in the _find() and do the last step and the return of the pointer in find() to minimize the amount of inlined code.
private:
W* _find(key) {
... // a few dozen lines...
}
public:
inline V* find(key) {
return some_func(_find(key)); // last steps on W to get V*
}
std::cout << *find(a_key);
Or as yet another responder mentioned, we could return a reference to V in the original version (again, not sure why we're all blind to the trivial stuff at first glance... see discussion.)
private:
V& _find(key) {
... // a few dozen lines...
}
public:
inline V* find(key) {
return &_find(key);
}
std::cout << *find(a_key);
_find returns a temporary object of type V. find then attempts to take the address of the temporary and return it. Temporary objects don't last very long, hence the name. So the temporary returned by _find will be destroyed after getting its address. And therefore find will return a pointer to a previously destroyed object, which is bad.
I've seen it go either way. It really depends on the compiler and the optimization level. Even when it does get inlined, I've seen cases where the compiler will not optimize this out.
The only way to see whether it gets optimized out is to actually look at the disassembly.
What you should probably do is to make a version where you manually inline them. Then benchmark it to see if you actually get a noticeable performance gain. If not, then this whole question is moot.
Your code (even in its second incarnation) is broken. _find returns a V, which find destroys immediately before returning its address.
If _find returned a V& to an object that outlives the call (thus producing a correct program), then the dereference would be a no-op, since a reference is no different to a pointer at the machine code level.
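A sketch of the pointer-returning find() that the question is aiming for, where the pointer refers to an element stored in the container rather than to a temporary. TinyMap, its linear search and the helper function are hypothetical stand-ins for the asker's data structure, which is not shown in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the asker's structure: a linear-search map from int to V.
template <typename V>
class TinyMap {
public:
    void insert(int key, const V& value) {
        items_.push_back(std::make_pair(key, value));
    }

    // Returns a pointer to the stored value, or nullptr when the key is absent.
    V* find(int key) {
        for (std::size_t n = 0; n < items_.size(); ++n) {
            if (items_[n].first == key)
                return &items_[n].second;  // address of an element that outlives the call
        }
        return nullptr;
    }

private:
    std::vector<std::pair<int, V> > items_;
};

// A stored 0 and a missing key give different answers, which is the point of returning V*.
bool zero_is_distinguishable_from_missing() {
    TinyMap<int> m;
    m.insert(1, 0);
    int* hit = m.find(1);
    int* miss = m.find(2);
    assert(hit != nullptr);
    return *hit == 0 && miss == nullptr;
}
```

In this particular sketch the returned pointer is only good until the next insert, since the underlying vector may reallocate; a node-based container would not have that restriction.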

Return by reference

Please see the following code snippets. In the second function I am returning a reference. I declare a local variable in the function and return a reference to it. As the variable is local, I believe its life ends when the function exits. My question is: why is it possible to access the value from the caller without any exceptions even though the original variable is deleted?
int& b=funcMulRef(20,3);
int* a= funcMul(20,3);
int* funcMul(int x,int y)
{
int* MulRes = new int;
*MulRes = (x*y);
return MulRes;
}
int& funcMulRef(int x,int y)
{
int MulRes ;
MulRes = (x*y);
return MulRes;
}
Regards,
JOhn
The behaviour of the second function is simply undefined; anything can happen, and in many circumstances, it will appear to work, simply because nothing has overwritten where the result used to be stored on the stack.
You are accessing data that is no longer in scope.
The memory probably still has the data in it though so it appears to work properly but is likely to be reused at any time and the value will be overwritten.
The next time you call any function or allocate a local stack variable, it's very likely that the memory will be reused for the new data and what you had there before will be overwritten. It's undefined behaviour.
The original value isn't actively erased, since wiping it would cost extra computation for no benefit.
The value is still there, but the memory space is no longer yours, and is actually undefined.
You are pointing to a space in memory that can be overrun by the program.
No, you shouldn't do this. The result of accessing residual data on the stack is undefined. Beside that, if your return value is of class type, its destructor will have already been called.
Are you trying to avoid temporary objects? If so, you might be interested in this:
http://en.wikipedia.org/wiki/Return_value_optimization
It most likely won't work in these cases:
funcMulRef(10,3) + funcMulRef(100,500)
or, in a nastier way:
std::cout << "10*3=" << funcMulRef(10,3) << " 100*500=" << funcMulRef(100,500) << std::endl;
gcc will warn about this kind of error if you use -Wall.
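A minimal sketch of the straightforward fix for funcMulRef, assuming nothing requires a reference: return the int by value, so the result is copied out before the local goes away. The names funcMulValue and sum_of_products are made up for this illustration.

```cpp
#include <cassert>

// Returning by value copies the result out before the local variable is gone.
int funcMulValue(int x, int y) {
    int mulRes = x * y;
    return mulRes;
}

// Hypothetical helper: uses two results in one expression, the case that
// tends to break with the reference-returning version.
int sum_of_products() {
    int total = funcMulValue(10, 3) + funcMulValue(100, 500);
    assert(total == 50030);
    return total;
}
```

For a plain int there is nothing to gain from returning a reference in the first place, so the by-value version costs nothing extra.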