returning local variables - c++

I came across an issue today regarding local variables. I learned that...
int * somefunc()
{
int x = 5;
return &x;
}
int * y = somefunc();
//do something
is bad, unsafe, etc. I'd imagine that the case is the same for...
int * somefunc()
{
int * x = new int;
x = 5;
return x;
}
int * y = somefunc();
//do something
delete y;
I've been under the impression for the longest time that this would be safe, as the address of x stays in scope when it's returned. However, I'm having second thoughts now and I'm thinking this would lead to memory leaks and other problems, just as the first example would. Can someone confirm this for me?

As it stands, the second example is wrong. You probably meant this:
int * somefunc()
{
int * x = new int;
*x = 5; // note the dereferencing of x here
return x;
}
Now this is technically fine, but it is prone to errors. First, if after the allocation of x an exception happens, you have to catch it, delete x and then rethrow, or you get a memory-leak. Second, if you return a pointer, the caller has to delete it - callers forget.
The recommended way would be to return a smart pointer, like boost::shared_ptr. This would solve the problems mentioned above. To understand why, read about RAII.
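As a minimal sketch of that advice, here is the same function returning a smart pointer. It assumes a C++11 compiler and uses std::unique_ptr from the standard library in place of boost::shared_ptr; the caller function use_somefunc() is hypothetical and only there to show the call site.

```cpp
#include <cassert>
#include <memory>

// Hypothetical rewrite of somefunc(): the unique_ptr return type makes the
// caller the owner, and the int is freed when that owner goes out of scope.
std::unique_ptr<int> somefunc()
{
    std::unique_ptr<int> x(new int(5)); // owned from the moment it is allocated
    // If anything after this point threw, x's destructor would free the int.
    return x; // ownership moves to the caller; nothing is leaked
}

// Hypothetical caller: note that there is no delete anywhere.
int use_somefunc()
{
    std::unique_ptr<int> y = somefunc();
    assert(y != nullptr);
    return *y; // y is destroyed on return, which deletes the int
}
```

Both problems mentioned above go away: an exception cannot leak the allocation, and the caller cannot forget the delete because there is none to write.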

Yes, you're taking the risk of leaking memory. (compile errors aside.)
Doing this for an int is silly, but the principle is the same even if it's a large structure.
But understand: you've written C-style code, where you have a function that allocates storage.
If you're trying to learn C++, you should put somefunc() and the data it operates on into a class. Methods and data together. A class can also do RAII as Space_C0wb0y pointed out.

You might be using int * just as an example, but in the case you showed there is no reason to return int *; just return int, since the actual value is more than good enough. I see situations like this all the time: code gets overly complicated when what is actually needed is to simplify.
The only realistic case I can think of for returning int * is returning an array of ints. If so, you need to allocate the array, return it, and note in your documentation that the caller has to release it.
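One alternative that this answer does not mention is to return a std::vector<int> instead of a raw array, which removes the need to document who releases it. The sketch below is only an illustration; make_squares and sum_of_squares are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function: builds n ints and returns them by value.
// The vector owns its storage, so the caller has nothing to release.
std::vector<int> make_squares(std::size_t n)
{
    std::vector<int> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(static_cast<int>(i * i));
    }
    return result; // moved or elided, not deep-copied
}

// Hypothetical caller: the storage is released automatically on return.
int sum_of_squares(std::size_t n)
{
    const std::vector<int> squares = make_squares(n);
    assert(squares.size() == n);
    int total = 0;
    for (int v : squares) {
        total += v;
    }
    return total;
}
```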

The first approach certainly leads to problems, as you are now well aware.
The second is kind of OK, but it demands attention from the programmer, who needs to explicitly delete the returned pointer (as you did). This gets harder as your application grows: the approach will probably cause memory leaks, because it becomes difficult to keep track of every variable that needs to be deallocated.
A 3rd approach for this scenario, is to pass a variable by reference to be used inside the function, which is way safer.
void somefunc(int& value)
{
value = 5;
}
// some code that calls somefunc()
int a_value = 0;
somefunc(a_value);
// printing a_value will display 5

(Edited)
Yes, the second is fine, so long as you dereference that 'x' before assigning!

OK, I would analyze this by answering these questions:
What does x contain? - A memory location (since it is a pointer variable).
What is the scope of x? - Since it is an auto variable, its scope is limited to the function somefunc().
What happens to auto variables once they exit the local scope? - They are deleted from the stack space.
So what happens to x after the return from somefunc()? - Since it is an auto variable declared on the stack, its scope (lifetime) is limited to somefunc(), and hence it will be deleted.
OK, so now what happens to the value pointed to by x? - We have a memory leak, as the value is allocated on the heap and we have just lost the address when x is deleted.
What does y get? - No idea.
What happens when y is deleted? - No idea.

The point is not to return a pointer or reference to a local variable, because once the function returns, locals don't exist.
However, the return value still exists, and dynamically allocated memory certainly exists as well.
In C++, we prefer to avoid raw pointers whenever possible. To "return a value that already exists" (i.e. the function does not create a new value), use a reference. To "return a value that didn't already exist" (i.e. the function creates a new value, in the idiomatic sense, not the new keyword sense) use a value, or if necessary, some kind of smart pointer wrapper.
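The two cases can be sketched side by side. The functions largest and total below are hypothetical examples, not taken from the question: the first hands back an element that already lives in the caller's vector, and the second computes something new.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a value that already exists (an element owned by the caller's
// vector), so a reference is appropriate. Precondition: v is not empty.
int& largest(std::vector<int>& v)
{
    assert(!v.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] > v[best]) {
            best = i;
        }
    }
    return v[best]; // refers to storage that outlives this function
}

// Creates a new value, so it is returned by value.
int total(const std::vector<int>& v)
{
    int sum = 0;
    for (int x : v) {
        sum += x;
    }
    return sum; // the caller receives its own copy
}
```

Neither function returns a pointer or reference to one of its own locals, which is the rule stated above.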

It's both a memory leak and a crash (because of the delete).

Related

Pass By Value/Pointer/Reference Clarification

I need a once-and-for-all clarification on passing by value/pointer/reference.
If I have a variable such as
int SomeInt = 10;
And I want to pass it to a function like
void DoSomething(int Integer)
{
Integer = 1;
}
In my current scenario, when passing SomeInt to DoSomething() I want SomeInt's value to be updated based on whatever we do to it inside DoSomething(), and I want to be as efficient as possible on memory and performance so that I'm not copying the variable around. That being said, which of the following prototypes would accomplish this task?
void DoSomething(int* Integer);
void DoSomething(int& Integer);
How would I actually pass the variable into the function? What is the difference between the previous two prototypes?
Finally if using a function within a class
class SomeClass
{
int MyInteger;
public:
void ChangeValue(int& NewValue)
{
MyInteger = NewValue;
}
};
If I pass an integer into ChangeValue, and the integer I passed in later gets deleted, does that mean MyInteger will no longer be usable when I try to use it from within the class?
Thank you all for your time, I know this is kind of a basic question but the explanations I keep running into confuse me further.
Functionally, all three of these work:
pass an int and change the return type to int so you can return the new value, usage: x = f(x);
when you plan to set the value without needing to read the initial value, it's much better to use a function like int DoSomething(); so the caller can just say int x = f(); without having to create x on an earlier line and wondering/worrying whether it needs to be initialised to anything before the call.
pass an int& and set it inside the function, usage: int x; x = ? /* if an input */; f(x);
pass an int* and set the pointed-to int inside the function, usage: int x; x = ?; f(&x);
most efficient on memory and performance so I'm not copying the variable around
Given that the C++ Standard doesn't dictate how references should be implemented by the compiler, it's a bit dubious trying to reason about their characteristics. If you care, compile your code to assembly or machine code and see how it works out on your particular compiler (for specific compiler command-line options etc.). If you need a rule of thumb, assume that references have identical performance characteristics to pointers unless profiling or generated-code inspection suggests otherwise.
For an int you can expect the first version above to be no slower than the pointer version, and possibly be faster, because the int parameter can be passed and returned in a register without ever needing a memory address.
If/when/where the by-pointer version is inlined there's more chance that the potentially slow "needing a memory address so we can pass a pointer" / "having to dereference a pointer to access/update the value" aspect of the pass-by-pointer version can be optimised out (if you've asked the compiler to try), leaving both versions with identical performance....
Still, if you need to ask a question like this I can't imagine you're writing code where these are the important optimisation choices, so a better aim is to do what gives you the cleanest, most intuitive and robust usage for the client code. Whether that's x = f(x); (where you might forget the leading x =), or f(x) (where you might not realise x could be modified), or f(&x) (where some caller might think they can pass nullptr) is a reasonable question in its own right, but separate from your performance concerns. FWIW, the C++ FAQ Lite recommends references over pointers for this kind of situation, but I personally reject its reasoning and conclusions. It all boils down to familiarity with either convention, and how often you need to pass const pointer values, or pointer values where nullptr is a valid sentinel, that could be confused with the you-may-modify-me implication hoped for in your scenario. That depends a lot on your coding style, the libraries you use, the problem domain etc.
Both of your examples
void DoSomething(int* Integer);
void DoSomething(int& Integer);
will accomplish the task. In the first case - with pointer - you need to call the function with DoSomething(&SomeInt);, in the second case - with reference - simpler as DoSomething(SomeInt);
The recommended way is to use references whenever they are sufficient, and pointers only if they are necessary.
You can use either. Function call for first prototype would be
DoSomething(&SomeInt);
and for second prototype
DoSomething(SomeInt);
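For comparison, the three calling conventions discussed in these answers can be put next to each other. The function names SetByValue, SetByPointer and SetByReference are hypothetical; they stand in for the three possible versions of DoSomething.

```cpp
#include <cassert>

// By value: the function works on a copy; the caller must assign the result.
int SetByValue(int Integer)
{
    Integer = 1;
    return Integer;
}

// By pointer: the caller passes an address, so null has to be ruled out.
void SetByPointer(int* Integer)
{
    assert(Integer != nullptr);
    *Integer = 2;
}

// By reference: the caller passes the variable itself; there is no null case.
void SetByReference(int& Integer)
{
    Integer = 3;
}
```

The call sites would be SomeInt = SetByValue(SomeInt);, SetByPointer(&SomeInt); and SetByReference(SomeInt); respectively.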
As was already said before, you can use both. The advantage of the
void DoSomething(int* Integer)
{
*Integer=0xDEADBEEF;
}
DoSomething(&myvariable);
pattern is that it becomes obvious from the call that myvariable is subject to change.
The advantage of the
void DoSomething(int& Integer)
{
Integer=0xDEADBEEF;
}
DoSomething(myvariable);
pattern is that the code in DoSomething is a bit cleaner, DoSomething has a harder time to mess with memory in bad ways and that you might get better code out of it. Disadvantage is that it isn't immediately obvious from reading the call that myvariable might get changed.
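The SomeClass part of the question is not addressed directly by the answers above. Because MyInteger is a plain int, the assignment in ChangeValue copies the value rather than keeping the reference, so the member stays usable after the original variable is gone. A minimal sketch, where the constructor, the Value() accessor and the driver function are hypothetical additions for the demonstration:

```cpp
#include <cassert>

class SomeClass
{
    int MyInteger;
public:
    SomeClass() : MyInteger(0) {}

    void ChangeValue(int& NewValue)
    {
        MyInteger = NewValue; // copies the value; no reference is stored
    }

    int Value() const { return MyInteger; } // hypothetical accessor
};

// Hypothetical driver: the source variable dies before the member is read.
int ValueAfterSourceIsGone()
{
    SomeClass c;
    {
        int temporary = 42;
        c.ChangeValue(temporary);
    } // temporary no longer exists here
    assert(c.Value() == 42);
    return c.Value();
}
```

The situation would be different if the class stored an int& or int* member, since that member would then refer to the destroyed variable.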

What happens if an object resizes its own container?

This is not a question about why you would write code like this, but more as a question about how a method is executed in relation to the object it is tied to.
If I have a struct like:
struct F
{
// some member variables
void doSomething(std::vector<F>& vec)
{
// do some stuff
vec.push_back(F());
// do some more stuff
}
};
And I use it like this:
std::vector<F> vec(10);
vec[0].doSomething(vec);
What happens if the push_back(...) in doSomething(...) causes the vector to expand? This means that vec[0] would be copied then deleted in the middle of executing its method. This would be no good.
Could someone explain what exactly happens here?
Does the program instantly crash? Does the method just try to operate on data that doesn't exist?
Does the method operate "orphaned" of its object until it runs into a problem like changing the object's state?
I'm interested in how a method call is related to the associated object.
Yes, it's bad. It's possible for your object to be copied (or moved in C++11, if the distinction is relevant to your code) while you are inside doSomething(). So after the push_back() returns, the this pointer may no longer point to the location of your object. For the specific case of vector::push_back(), it's possible that the memory pointed to by this has been freed and the data copied to a new array somewhere else. For other containers (list, for example) that leave their elements in place, this is (probably) not going to cause problems at all.
In practice, it's unlikely that your code is going to crash immediately. The most likely circumstance is a write to free memory and a silent corruption of the state of your F object. You can use tools like valgrind to detect this kind of behavior.
But basically you have the right idea: don't do this, it's not safe.
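One way to sidestep the problem, sketched here as an assumption rather than something the answer proposes, is to identify the element by index and look it up again after the push_back, so that nothing depends on this surviving the reallocation. The counter member and the self parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct F
{
    int counter = 0; // hypothetical state, standing in for "some member variables"

    // Hypothetical variant of doSomething: a static member that is told which
    // element to work on by index, so it never relies on `this` staying valid.
    static void doSomething(std::vector<F>& vec, std::size_t self)
    {
        assert(self < vec.size());
        vec[self].counter += 1; // before the push_back: fine
        vec.push_back(F());     // may reallocate and move every element
        vec[self].counter += 1; // looked up again by index, so still valid
    }
};
```

It would be called as F::doSomething(vec, 0); instead of vec[0].doSomething(vec);. Calling vec.reserve() with enough capacity beforehand is another option, since push_back does not reallocate while the size stays within the capacity.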
Could someone explain what exactly happens here?
Yes. If you access the object, after a push_back, resize or insert has reallocated the vector's contents, it's undefined behavior, meaning what actually happens is up to your compiler, your OS, what do some more stuff is and maybe a number of other factors like maybe phase of the moon, air humidity in some distant location,... you name it ;-)
In short, this is (indirectly, via the std::vector implementation) calling the destructor of the object itself, so the lifetime of the object has ended. Further, the memory previously occupied by the object has been released by the vector's allocator. Therefore the use of the object's nonstatic members results in undefined behavior, because the this pointer passed to the function does not point to an object any more. You can, however, access or call static members of the class:
struct F
{
static int i;
static int foo();
double d;
void bar();
// some member variables
void doSomething(std::vector<F>& vec)
{
vec.push_back(F());
int n = foo(); //OK
i += n; //OK
std::cout << d << '\n'; //UB - will most likely crash with access violation
bar(); //UB - what actually happens depends on the
// implementation of bar
}
};

Declare stack variable without specifying the name and get the pointer

It's known that defining a heap variable with new gets the pointer without specifying the name:
Var *p = new Var("name", 1);
But I have to clear the variable pointed to by p with delete p later on in the program.
I want to declare a stack variable so it is automatically cleared after function exits, but I only want to get the pointer, and the following:
Var v("name", 1);
Var *p = &v;
is quite tedious, and specifier v will never be referenced.
Can I declare a stack class instance and get its pointer without specifying its name?
There are two questions hidden in here. The first one is:
Var *p = new Var("name", 1);
But I have to clear the variable pointed to by p with delete p later
on in the program.
I want to declare a stack variable so it is automatically cleared
after function exits
So here, you're asking how to allocate memory without having to explicitly clean it up afterwards. The solution is to use std::unique_ptr:
std::unique_ptr<Var> p(new Var("name", 1));
Voila! unique_ptr will automatically clean itself up, it has virtually no overhead compared to a raw pointer, and it's overloaded the * and -> operators so you can use it just like a raw pointer. Search for "C++11 smart pointers" if you want to know more.
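To make the automatic cleanup visible, here is a small sketch. The Var class below is a hypothetical stand-in for the one in the question (its real definition is not shown), with an added live-instance counter; UseVar is a hypothetical function.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the question's Var; it counts live instances so
// that the automatic cleanup can be observed.
struct Var
{
    static int live;
    std::string name;
    int value;

    Var(const std::string& n, int v) : name(n), value(v) { ++live; }
    ~Var() { --live; }

    // Copying is disabled to keep the live count simple.
    Var(const Var&) = delete;
    Var& operator=(const Var&) = delete;
};

int Var::live = 0;

int UseVar()
{
    std::unique_ptr<Var> p(new Var("name", 1));
    assert(Var::live == 1);
    return p->value; // p's destructor deletes the Var on the way out
}
```

After UseVar() returns, Var::live is back to zero even though the function contains no delete.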
The second question is:
I only want to get the pointer, and the following:
Var v("name", 1);
Var *p = &v;
is quite tedious, and specifier v will never be referenced.
The important point here is that Var *p = &v is completely unnecessary. If you have a function that requires a pointer, you can use &v on the spot:
void SomeFunc(const Var* p);
// ...
Var v("name", 1);
SomeFunc(&v);
There's no need to put &v in a separate variable before passing it into a function that requires a pointer.
The exception is if the function takes a reference to a pointer (or a pointer to a pointer):
void SomeFunc2(Var*& p);
void SomeFunc3(Var** p);
These types of functions are rare in modern C++, and when you see them, you should read the documentation for that function very carefully. More often than not, those functions will allocate memory, and you'll have to free it explicitly with some other function.
There's no way to do this by allocating on the stack. However, you can use std::make_shared for the heap:
#include <memory>
std::shared_ptr<Var> p = std::make_shared<Var>("name", 1);
At the cost/risk of being more confusing, you can avoid repeating the type in your code in the question ala:
Var v("name", 1), *p = &v;
You could also potentially use alloca, which is provided by most systems and returns a pointer to stack-allocated memory, but then you have to go through a separate painful step to placement new an object into that memory and do your own object destruction. alloca needs to be called inside the function so it's the function stack on which the object is created, and not during the preparation of function arguments (as the variable's memory may be embedded in the stack area the compiler's using to prepare function arguments), which makes it tricky to wrap into some easily reused facility. You could use macros, but they're evil (see Marshall Cline's C++ FAQ for an explanation of that). Overall - not worth the pain....
Anyway, I recommend sticking with the code in your question and not over-thinking this: using &v a few times tends to be easier, and when it's not it's normally not a big deal if there's an unnecessary identifier for the stack-based variable.
I don't think there is a way to overcome it without some overhead (like the shared_ptr). so the shortest way to write it will be:
Var v("name", 1), *p = &v;
Yes, it's possible to return an address to a temporary (i.e. stack) object and assign it to a pointer. However, the compiler might actually discard the object (i.e. cause that section in memory to be overwritten) before the end of the current scope. (TO CLARIFY: THIS MEANS DON'T DO THIS. EVER.) See the discussion in the comments below about the behavior observed in different versions of GCC on different operating systems. (I don't know whether or not the fact that version 4.5.3 only gives a warning instead of an error indicates that this will always be "safe" in the sense that the pointer will be valid everywhere within the current scope if you compile with that particular version of GCC, but I wouldn't count on it.)
Here's the code I used (modified as per Jonathan Leffler's suggestion):
#include <stdio.h>
class Class {
public:
int a;
int b;
Class(int va, int vb){a = va; b = vb;}
};
int main(){
Class *p = &Class(1, 2);
Class *q = &Class(3, 4);
printf("%p: %d,%d\n", (void *)p, p->a, p->b);
printf("%p: %d,%d\n", (void *)q, q->a, q->b);
}
When compiled using GCC 4.5.3 and run (on Windows 7 SP1), this code printed:
0x28ac28: 1,2
0x28ac30: 3,4
When compiled using GCC 4.7.1 and run (on Mac OS X 10.8.3), it printed:
0x7fff51cd04c0: 0,0
0x7fff51cd04d0: 1372390648,32767
In any case, I'm not sure why you wouldn't just declare the variable normally and use &v everywhere you need something "pointer-like" (for instance, in functions that require a pointer as an argument).

Return by reference

Please see the following code snippets. In the second function I am returning a reference: I declare a local variable in the function and return a reference to it. As the variable is local, I believe its life ends when the function exits. My question is: why is it possible to access the value from the caller without any exceptions, even though the original variable has been destroyed?
int& b=funcMulRef(20,3);
int* a= funcMul(20,3);
int* funcMul(int x,int y)
{
int* MulRes = new int;
*MulRes = (x*y);
return MulRes;
}
int& funcMulRef(int x,int y)
{
int MulRes ;
MulRes = (x*y);
return MulRes;
}
Regards,
JOhn
The behaviour of the second function is simply undefined; anything can happen, and in many circumstances, it will appear to work, simply because nothing has overwritten where the result used to be stored on the stack.
You are accessing data that is no longer in scope.
The memory probably still has the data in it though so it appears to work properly but is likely to be reused at any time and the value will be overwritten.
The next time you call any function or allocate a local stack variable, it's very likely to reuse that memory for the new data and overwrite what you had there before. It's undefined behaviour.
The original value isn't actively erased, because erasing it would cost extra computation for no benefit.
The value is still there, but the memory space is no longer yours, and is actually undefined.
You are pointing to a space in memory that can be overrun by the program.
No, you shouldn't do this. The result of accessing residual data on the stack is undefined. Beside that, if your return value is of class type, its destructor will have already been called.
Are you trying to avoid temporary objects? If so, you might be interested in this:
http://en.wikipedia.org/wiki/Return_value_optimization
It most likely won't work in these cases :
funcMulRef(10,3) + funcMulRef(100,500)
alternatively, in a more nasty way :
std::cout << "10*3=" << funcMulRef(10,3) << " 100*500=" << funcMulRef(100,500) << std::endl;
gcc will warn about this kind of error if you use -Wall
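The usual fix is to return the product by value instead of by reference. A minimal sketch, where funcMulValue and SumOfProducts are hypothetical names:

```cpp
#include <cassert>

// Returning by value: the result is copied (or constructed directly) into
// the caller's variable, so nothing refers to the dead local.
int funcMulValue(int x, int y)
{
    int MulRes = x * y;
    return MulRes;
}

// The expression that is fragile with funcMulRef is well defined here.
int SumOfProducts()
{
    int total = funcMulValue(10, 3) + funcMulValue(100, 500);
    assert(total == 50030);
    return total;
}
```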

preventing memory leak (case-specific)

Consider the following situation:
SomeType *sptr = someFunction();
// do sth with sptr
I am unaware of the internals of someFunction(). It's pretty obvious that the object that someFunction() returns a pointer to must be either malloc'ed or a static variable.
Now, I do something with sptr and quit. Clearly the object will still be on the heap, which is a possible source of a leak.
How do I avoid this?
EDIT:
Are references safer than pointers?
Would the destructor for SomeType be called if I do:
{
SomeType &sref = *sptr;
}
Any insights.
You need to read the documentation on someFunction. someFunction needs to clearly define the ownership of the returned pointer (does the caller own it and need to call delete, or does someFunction own it and make sure the object is destructed sometime in the future?).
If the code does not document its behavior, there is no safe way to use it.
What do you mean by quit? End the process? Your heap is usually destroyed when the process is destroyed. You would only have the potential for a leak after quitting the process if you asked the operating system to do something for you (like get a file or window handle) and didn't release it.
Also, functions that return pointers need to document very well whose responsibility it is to deallocate the pointer target (if at all), otherwise, you can't know whether you need to delete it yourself or you could delete it by accident (a disaster if you were not meant to do so).
If the documentation of the function doesn't tell you what to do, check the library documentation - sometimes a whole library takes the same policy rather than documenting it in each and every function. If you can't find the answer anywhere, contact the author or give up on the library, since the potential for errors is not worth it, IMHO.
In my experience most functions that return a pointer either allocate it dynamically or return a pointer that is based on the input parameter. In this case, since there are no arguments, I would bet that it is allocated dynamically and you should delete it when you're done. But programming shouldn't be a guessing game.
It's always a good habit to clean up after yourself, don't presume the OS will do it;
There's a good chance your IDE or debugger will report a memory leak when you quit your application.
What do you have to do ? Well, it depends, but normally you have to release the memory allocated by someFunction(), and the documentation will probably help you with that, either there's an API to release the memory or you have to do it manually with free or delete.
Max
The library should document this.
Either you delete it explicitly after use or you call some release method which makes sure that object (and any other resources it points to*) doesn't leak.
(Given a choice) Unless it's a huge (in terms of memory) object, I would prefer a return by value. Or pass a reference to the function.
If someFunction returns an object for me it should be normal to have a pair function like someFunctionFree which you'll call to release the resources of the SomeType object. All the things needed should be found in the documentation of someFunction (mainly how the object can be freed or if the object is automatically freed). I personally prefer the corresponding deallocation function (a CreateObject/DestroyObject pair).
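If a library does follow the CreateObject/DestroyObject convention, the pairing can be enforced by the compiler instead of by discipline. The sketch below assumes a C++11 compiler; SomeType's contents, the two library functions and the g_destroyed counter are all hypothetical, made up only to show a std::unique_ptr with a custom deleter.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style library API following the CreateObject/DestroyObject
// convention described above.
struct SomeType { int value; };

static int g_destroyed = 0; // demo-only counter of DestroyObject calls

SomeType* CreateObject() { return new SomeType{7}; }
void DestroyObject(SomeType* p) { delete p; ++g_destroyed; }

// A unique_ptr whose deleter is the library's own release function.
typedef std::unique_ptr<SomeType, void (*)(SomeType*)> SomeTypeHandle;

SomeTypeHandle MakeHandle()
{
    return SomeTypeHandle(CreateObject(), &DestroyObject);
}

int UseObject()
{
    SomeTypeHandle h = MakeHandle();
    assert(h != nullptr);
    return h->value; // DestroyObject runs when h goes out of scope
}
```

The caller never calls DestroyObject by hand, so it cannot be forgotten or called twice.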
As others note, it's up to the function to enforce its ownership assumptions in code. Here's one way to do that using what are known as smart pointers:
#include <iostream>
#include <boost/shared_ptr.hpp>
struct Foo
{
Foo( int _x ) : x(_x) {}
~Foo() { std::cout << "Destructing a Foo with x=" << std::hex << x << "\n"; }
int x;
};
typedef boost::shared_ptr<Foo> FooHandle;
FooHandle makeFoo(int _x = 0xDEADBEEF) {
return FooHandle(new Foo(_x));
}
int main()
{
{
FooHandle fh = makeFoo();
}
std::cout<<"No memory leaks here!\n";
return 0;
}