Is it possible to transfer the ownership of a local variable in C++ (or C++0x) to a function, leaving it undefined after the return, so optimizations can be done?
struct A {
int a[100000];
};
int func(A& s){
//s should now be "owned" by func and be undefined in the calling function
s.a[2] += 4;
return s.a[2];
}
int main(){
A s;
printf("%d\n", func(s));
//s is now undefined
}
I want the function "func" to be optimized to simply return s.a[2]+4, but not change the actual value in memory, just like if "s" had been a local variable in "func".
If it can't be done in standard C++, is it possible with some extension in g++?
No, it's not possible, either in standard C++ or through an extension, because there would be no optimization value. Compilers can trivially prove that there are no more references to local variables in such situations. Failing all else, you could trivially mimic the effect by doing
int main() {
{
A s;
printf("%d\n", func(s));
}
}
Being able to do that kind of thing would be hideously dangerous for no benefit.
Leave optimization to the compiler; in simple cases it can probably do this itself.
Don't forget: premature optimization is the root of all evil.
I would expect a reasonable optimizing compiler to be able to make an optimization like this without any special hints if the local variable s is really not referenced after the function returns, assuming the variable and the function were in the same compilation unit or you had some form of link time code generation enabled.
You might be able to help the optimizer by scoping your local variable to make it explicit that it can't be accessed beyond the one reference after the function call:
int main() {
{
A s;
printf("%d\n", func(s));
} //s is now undefined
}
If you have a specific case that doesn't appear to be optimized as effectively as you think it should then perhaps you can provide more detail about your situation. I'm a little unclear what exactly you mean by the function 'owning' the local variable in this case since you do actually want to access it after the function returns.
You are mixing distinct issues like ownership, stack unwinding and value parameters.
Is it possible to transfer the ownership of a local variable
No. Local variables are local to the scope in which they are defined, and nothing outside that scope can see or manipulate them directly. You can pass a function the value of a local variable, a reference to it, or its address. When a reference or a pointer to a local variable is passed, the callee can manipulate the content of the variable, but by no means can it influence the variable's scope in the caller's frame. The most common transfer of 'ownership' means passing a pointer by value and relying on the callee to take ownership of the allocated memory. All forms of variable passing (by value, by reference, by pointer) can handle this; the issue of memory allocation ownership is distinct.
I want the function "func" to be optimized to simply return s.a[2]+4,
but not change the actual value in memory,
Then do exactly that, why make it any more complicated?
int func(const A& s){
return s.a[2] + 4;
}
This will do exactly what you describe, but it is very unlikely that this is what you're actually asking. Making a leap of faith and invoking some psychic powers, one would guess that what you're really asking is: can an object be changed in the callee's scope and at the same time be left intact in the caller's scope? The answer is obviously no, because the same memory cannot hold different values depending on who looks at it. You can pass a copy of the object (pass by value), which allows the callee to manipulate its own copy as it sees fit without affecting the original in the caller's scope. Or you can pass a const reference, which prevents the callee from modifying the object, and have the callee copy out whatever it needs to modify.
I'm surprised that nobody else has posted this, but it sounds like the closest thing to what you want is simply to remove the reference and put the function call in a separate scope:
int func(A s){ //removed the &
//s is "owned" by func and changes don't touch anyone else's A objects
s.a[2] += 4;
return s.a[2];
}
int main(){
{
A s;
printf("%d\n", func(s));
// s hasn't changed, func had its own copy.
} // s goes out of scope and is destroyed
//s is now undefined
}
You could also make it pointer-based if you prefer. The code sample below changes your func to take a reference to a pointer, so the pointer can safely be set to null after deallocation.
It's a little bit hokey, but it is possible if you absolutely require it. It may, however, be a premature optimization that can be avoided.
int func(A*& s) {
int retVal = s->a[2] + 4;
delete s;
s = NULL;
return retVal;
}
int main() {
A* s = new A();
printf("%d\n", func(s));
}
You can use the old C++03 auto_ptr for this (it was deprecated in C++11 and removed in C++17). The calling function is left with an auto_ptr that holds a null pointer, while the function takes over s and deletes it on return.
int func(std::auto_ptr<A> s){
//s is "owned" by func and is null in the calling function
s->a[2] += 4;
return s->a[2];
}
int main(){
std::auto_ptr<A> s(new A()); //the constructor is explicit, so "= new A()" would not compile
printf("%d\n", func(s)); //destructive copy of pointer
//s now holds a null pointer
}
Related
I'm trying to ensure an object - wrapped by a shared_ptr - is alive as long as a function is executed by passing it as value. However inside the function the object is not used at all, so I just want to use it for 'pinning':
void doSomething(std::shared_ptr<Foo>) {
// Perform some operations unrelated to the passed shared_ptr.
}
int main() {
auto myFoo{std::make_shared<Foo>()};
doSomething(std::move(myFoo)); // Is 'myFoo' kept alive until doSomething returns?
return 0;
}
I did check the behavior on different optimization-levels (GCC) and it seems that it works as intended, however I don't know whether the compiler still may optimize it away in certain scenarios.
You don't need to worry - the lifetime of the function argument at the call site is guaranteed to survive the function call. (This is why things like foo(s.c_str()) for a std::string s work.)
A compiler is not allowed to break that rule, subject to the usual flexibility of the as-if rule.
This very much depends on what the body of doSomething and Foo will actually look like. For instance, consider the following example:
struct X
{
~X() { std::cout << "2"; };
};
void f(std::shared_ptr<X>) { std::cout << "1"; }
int main()
{
auto p = std::make_shared<X>();
f(std::move(p));
}
This program has the very same observable effect as:
int main()
{
std::cout << "12";
}
and the order "12" is guaranteed. So the generated assembly may not use a shared pointer at all. However, most compilers will likely not perform such aggressive optimizations, since dynamic memory allocations and virtual function calls are involved internally, and those are not easy to optimize away.
The compiler could optimise away the copying of an object into a function argument if the function is being inlined and if the copying has no side effects.
Copying a shared_ptr increments its reference count, so it does have side effects, and the compiler can't optimise the copy away (unless it can prove that leaving the reference count unmodified has no effect on the program).
How can we return a variable by reference while the scope of the returning function has gone and its vars have been destroyed as soon as returning the var?
And if we do the following to avoid that:
int fr = 9;
int& foo() {
//const int& k = 5;
return fr;
};
then my question is: must we declare the returned variable as a global variable?
You can return a function local static variable instead of a global variable, of course:
int& foo() {
static int rc = 9;
return rc;
}
Note, however, that you still effectively have a global variable with all its problems, e.g., potentially concurrent access from multiple threads. At least, starting with C++11, the initialization of a function-local static variable is thread-safe: it is initialized the first time execution reaches its declaration.
Use the static keyword so that the variable's lifetime lasts for the whole program.
Example:
#include <iostream>
int& fun(){
static int a = 5; // created once, lives until the program ends
return a;
}
int main()
{
int &b = fun();
std::cout << b;
}
You can create a class and introduce a member that you return by reference. This is more transparent than the function-local static solution but requires more overhead, so it is only reasonable if you need a class anyway.
class Foo {
public:
Foo() : myFoo(0) {}
int& getFoo() {return myFoo;}
private:
int myFoo;
};
Note: OP and the other answers suggest variations on returning a pre-existing object (global, static in function, member variable). This answer, however, discusses returning a variable whose lifetime starts in the function, which I thought was the spirit of the question, i.e.:
how can we return a variable by reference while the scope of the returning function has gone and its vars have been destroyed as soon as returning the var.
The only way to return by reference a new object is by dynamically allocating it:
int& foo() {
return *(new int);
}
Then, later on:
delete &myref;
Now, of course, that is not the usual way of doing things, nor what people expect when they see a function that returns a reference. See all the caveats at Deleting a reference.
It could make some sense, though, if the object is one of those that "commits suicide" later by calling delete this. Again, this is not typical C++ either. More information about that at Is delete this allowed?.
Instead, when you want to return an object that is constructed inside a function, what you usually do is either:
Return by value (possibly taking advantage of copy elision).
Return a dynamically allocated object (either returning a raw pointer to it or a class wrapping it, e.g. a smart pointer).
But neither of these two approaches returns the actual object by reference.
I'm assuming it's an academic example to examine a principle because the obvious way to code it would otherwise be to return by value.
With this precondition in mind, this looks like a use case for smart pointers. You would wrap the variable in a smart pointer and return it by value. This is similar to @Acorn's answer, but the variable deletes itself once it is no longer referred to, so there is no need for an explicit delete.
I know that returning temporary variables using references doesn't work since the temporary object is lost after the function terminates, but the following piece of code works since the returned temporary is assigned to another object.
I assume the temporary objects get destroyed after the line of function call. If it is so, why isn't this working for this kind of method chaining?
Counter& Counter::doubler()
{
Counter tmp;
tmp.i = this->i * 2;
return tmp;
}
int main()
{
Counter d(2);
Counter d1, d2;
d1 = d.doubler(); // normal function call
std::cout << "d1=" << d1.get() << std::endl; // Output : d1=4
d2 = d.doubler().doubler(); // Method chaining
std::cout << "d2=" << d2.get() << std::endl; // Output : d2=0
return 0;
}
If a function returns a reference to a local object, the object will be destroyed as soon as the function returns (as local objects are). It does not persist to the end of the line of the function call.
Accessing an object after it has been destroyed will yield unpredictable results. Sometimes it may work, for some definition of "work", and sometimes it may not. Just don't do it.
Counter& doubler()
{
Counter tmp;
tmp.i=this->i*2;
return tmp;
}
It's undefined behaviour. After the function returns, your reference will be dangling, since the Counter destructor will have been called for the local object tmp.
The real question is not "why this kind of method chaining is not working?", but instead "why the first ('normal') function call works?"
The answer is there's no way to tell, because it might as well break your program.
To state it clearly: accessing a local object through a returned reference after the function has ended is undefined behavior. This means, of course, that it might work by coincidence today and stop working tomorrow. All bets are off.
When a function returns, the stack is rolled back only logically: the stack pointer is simply set to a different value. If the function returned a reference to a local variable, the memory that held that local may still belong to the process and still contain the same bits. This is not guaranteed, however; after a few more calls the contents will no longer be valid, and using the reference is undefined behavior.
The others are all right in saying 'just do not fiddle around with references to local objects'.
But as to why it works in one case and not in the other:
When you call it singly and the function returns, the object is still lying on the stack. Granted, it is a 'destructed' object, but whatever space the object used to take is still there on the stack. If you have a simple object, like one with a single int member, then there is NOTHING disturbing it on the stack, unless your code allocates something else on the stack or the destructor decides to do a much more thorough job and obliterates the integer member (which most destructors do not do). Granted, yada yada, but until the very next line not much is going to happen that would remove it from the stack. Your reference points to a valid memory location, and your (destructed) object is still there. That is why it works for you.
When you call it chained, the first call returns a reference to that tmp on the stack. As explained above, no problem so far: your (destructed) tmp is still very much there on the stack. But notice the moment you call that second doubler. Where is the tmp inside that second doubler call going to be placed? Right where the tmp from your first call was!!! The second call overwrites the object (the tmp with value 4) with a tmp with value 0 (the default-constructed one). The second call is in effect made on a Counter which has value 0, hence you get 0. Extremely tricky, and that is why you should just forget about returning references to local variables.
Now purists may scream 'undefined, no no, just don't do it', and I am with them; I have myself said twice (now thrice) not to do it. But people may try it. I bet that for a 'simple' object like the following, AND code exactly as in the question (so that nothing disturbs the stack), everyone is going to get a consistent 4, 0: no randomness, no undefined ....
class Counter
{
public:
Counter()
{
i = 0;
}
Counter(int k)
{
i = k;
}
int get()
{
return i;
}
int i;
Counter& doubler();
};
I used to think returning a reference is bad as our returned reference will refer to some garbage value. But this code works (matrix is a class):
const int max_matrix_temp = 7;
matrix&get_matrix_temp()
{
static int nbuf = 0;
static matrix buf[max_matrix_temp];
if(nbuf == max_matrix_temp)
nbuf = 0;
return buf[nbuf++];
}
matrix& operator+(const matrix&arg1, const matrix&arg2)
{
matrix& res = get_matrix_temp();
//...
return res;
}
What is buf doing here and how does it save us from having garbage values?
buf is declared as static, meaning it retains its value between calls to the function:
static matrix buf[max_matrix_temp];
i.e. it's not created on the stack as int i = 0; would be (a non-static local variable), so returning a reference to it is perfectly safe.
The following code is dangerous, because the memory for the variable's value is on the stack, so when the function returns and we move back up the stack to the previous function, all of the memory reservations local to the function cease to exist:
int * GetAnInt()
{
int i = 0; // create i on the stack
return &i; // return a pointer to that memory address
}
Once we've returned, we have a pointer to a piece of memory on the stack, and with dumb luck it will hold the value we want because it has not been overwritten yet. But the pointer is invalid, as the memory is now free for use as and when space on the stack is required.
I see no buf declared anywhere, which means it doesn't go out of scope with function return, so it's okay. (it actually looks like it's meant to be matrixbuf which is also fine, because it's static).
EDIT: Thanks to R. Martinho Fernandes for the guess. Of course it is matrix buf, which makes buf a static array in which the temporary is allocated, so that it doesn't get freed when the function returns and the return value is therefore still valid.
This is safe up to a point, but very dangerous. The returned reference
can't dangle, but if client code keeps it around, at some future point,
the client is in for a big surprise, as his value suddenly changes to a
new return value. And if you call get_matrix_temp more than
max_matrix_temp times in a single expression, you're going to end up
overwriting data as well.
In the days before std::string, in code using printf, I used this
technique for returning conversions of user defined types, where a
"%s" specifier was used, and the argument was a call to a formatting
function. Again, max_matrix_temp was the weak point: a single
printf which formatted more instances of my type would output wrong
data. It was a bad idea then, and it's a worse idea now.
I came across an issue today regarding local variables. I learned that...
int * somefunc()
{
int x = 5;
return &x;
}
int * y = somefunc();
//do something
is bad, unsafe, etc. I'd imagine that the case is the same for...
int * somefunc()
{
int * x = new int;
x = 5;
return x;
}
int * y = somefunc();
//do something
delete y;
I've been under the impression for the longest time that this would be safe, as the address of x stays in scope when it's returned. However, I'm having second thoughts now, and I'm thinking this would lead to memory leaks and other problems, just as the first example would. Can someone confirm this for me?
As it stands, the second example is wrong. You probably meant this:
int * somefunc()
{
int * x = new int;
*x = 5; // note the dereferencing of x here
return x;
}
Now this is technically fine, but it is prone to errors. First, if after the allocation of x an exception happens, you have to catch it, delete x and then rethrow, or you get a memory-leak. Second, if you return a pointer, the caller has to delete it - callers forget.
The recommended way would be to return a smart pointer, like boost::shared_ptr. This would solve the problems mentioned above. To understand why, read about RAII.
Yes, you're taking the risk of leaking memory. (compile errors aside.)
Doing this for an int is silly, but the principle is the same even if it's a large structure.
But understand: you've written C-style code, where you have a function that allocates storage.
If you're trying to learn C++, you should put somefunc() and the data it operates on into a class. Methods and data together. A class can also do RAII as Space_C0wb0y pointed out.
You might be making int * as just an example, but really, in the case you noted, there is not a reason to return int *, just return int, the actual value is more than good enough. I see these situations all the time, getting overly complicated, when, what is actually needed, is just to simplify.
In the case of 'int *', I can only really think of a realistic case of returning an array of ints, if so, then you need to allocate that, return that, and hopefully, in your documentation, note that it has to be released.
The first approach certainly leads to problems, as you are now well aware.
The second is kind of OK, but it demands attention from the programmer, who needs to explicitly delete the returned pointer (as you did). This gets harder as your application grows: with this method the programmer will find it difficult to keep track of every single variable that needs deallocating, which will probably cause problems (memory leaks).
A 3rd approach for this scenario, is to pass a variable by reference to be used inside the function, which is way safer.
void somefunc(int& value)
{
value = 5;
}
// some code that calls somefunc()
int a_value = 0;
somefunc(a_value);
// printing a_value will display 5
(Edited)
Yes, the second is fine, so long as you dereference that 'x' before assigning!
Ok, I would analyze this by answering these questions:
What does x contain? - A memory location (since it is a pointer variable).
What is the scope of x? - Since it is an auto variable, its scope is limited to the function somefunc().
What happens to auto variables once they exit the local scope? - They are removed from the stack.
So what happens to x after the return from somefunc()? - Since it is an auto variable declared on the stack, its scope (lifetime) is limited to somefunc(), and hence it will be destroyed.
Ok, so now what happens to the value pointed to by x? - We have a memory leak, as the value is allocated on the heap and we have just lost the address when x is destroyed.
What does y get? - No idea.
What happens when y is deleted? - No idea.
The point is not to return a pointer or reference to a local variable, because once the function returns, locals don't exist.
However, the return value still exists, and dynamically allocated memory certainly exists as well.
In C++, we prefer to avoid raw pointers whenever possible. To "return a value that already exists" (i.e. the function does not create a new value), use a reference. To "return a value that didn't already exist" (i.e. the function creates a new value, in the idiomatic sense, not the new keyword sense) use a value, or if necessary, some kind of smart pointer wrapper.
It's both a memory leak and a crash (because of the delete).