C++ vector at(#) replacement clean up memory? - c++

The pain of compiled libraries is never knowing exactly how the memory is managed.
I'm led to believe that a vector's elements are placed on the heap unless explicitly told not to.
Being placed on the heap, that memory obviously needs to be deleted when it is no longer used, which seems to happen when the vector object is deleted.
The question is: when at(#) or operator[] is used to assign a new value, does it delete the memory being replaced?
For example:
std::vector<std::string> secretlyAnArray(5);
secretlyAnArray.at(0) = std::string("Does this memory leak?");
secretlyAnArray.at(0) = std::string("When I overwrite the object?");
Happy to learn better methods to replace data at a specific index of a vector, or just pointing at the documentation that explains it.
Edit 2:
After the helpful comments of Anis and Daniel, which are much appreciated, it appears that at(#) returns a reference, so the standard reference rules apply rather than anything governed by the behaviour of vector.

std::vector<std::string> secretlyAnArray(5);
secretlyAnArray.at(0) = std::string("Does this memory leak?");
secretlyAnArray.at(0) = std::string("When I overwrite the object?");
This code assigns an object of type std::string to another object of type std::string. The std::vector itself isn't at all involved in this assignment. Here, it just provides a reference to the destination (assigned-to) object.
What assignment does for a given type depends on that type's definition; there is no generic answer. For instance, with std::string, when you copy- or move-assign, the original content of the destination object (its string of characters) needs to be "destroyed". std::string does this for you, so you don't need to care about it. If the string is "long" (that is, too long for the short string optimization, SSO, so its characters live on the heap), that memory is correctly deallocated or reused as needed.
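To make that concrete, here is a minimal sketch built on the question's own snippet. The function name overwriteFirstElement is made up for illustration; everything else is plain standard library.

```cpp
#include <string>
#include <vector>

// Overwrites element 0 several times and returns its final contents.
// Each assignment runs std::string::operator=, which releases or reuses the
// destination string's old buffer. The vector only hands out the reference.
std::string overwriteFirstElement() {
    std::vector<std::string> secretlyAnArray(5);  // five empty strings
    secretlyAnArray.at(0) = std::string("Does this memory leak?");
    secretlyAnArray.at(0) = std::string("When I overwrite the object?");
    // operator[] also returns a reference; it just skips the bounds check.
    secretlyAnArray[0] = "Still no leak";
    return secretlyAnArray.at(0);
    // When the vector goes out of scope, its destructor destroys all five strings.
}
```

Running this under a leak checker such as Valgrind or AddressSanitizer is an easy way to confirm that nothing is left behind.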

Related

Scenarios where we force to use Pointers in c++

In an interview I was asked to give an example or scenario in C++ where we can't proceed without pointers, meaning we necessarily have to use a pointer.
I gave the example of a function returning an array whose size is not known: we then need to return a pointer, namely the name of the array, which is actually a pointer. But the interviewer said that is internal to arrays and asked for some other example.
So please help me with some other scenarios for the same.
If you are using a C Library which has a function that returns a pointer, you have to use pointers then, not a reference.
There are many other cases (explicitly dealing with memory, for instance) - but these two came to my mind first:
linked data-structures
How: You need to reference parts of your structure in multiple places. You use pointers for that, because containers (which also use pointers internally) do not cover all your data-structure needs. For example,
class BinTree {
BinTree *left, *right;
public:
// ...
};
Why there is no alternative: there are no generic tree implementations in the standard (not counting the sorting ones).
pointer-to-implementation pattern (pimpl)
How: Your public .hpp file has the methods, but only refers to internal state via an opaque Whatever *; and your internal implementation actually knows what that means and can access its fields. See:
Is the pImpl idiom really used in practice?
Why there is no alternative: if you provide your implementation in binary-only form, users of the header cannot access internals without decompiling/reverse engineering. It is a much stronger form of privacy.
Anyplace you would want to use a reference, but have to allow for null values
This is common in libraries: if you pass a non-null pointer, the function writes a value through it; if you pass null, it skips that output.
It is also a convention for function arguments that will be changed to be pointers rather than references, to emphasize to the caller that the value can be changed.
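A small sketch of the nullable out-parameter case. parseDigits and its parameters are hypothetical names invented for this example, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical parser: reads leading decimal digits from `text`.
// `value` is a reference because the caller must always supply it.
// `consumed` is a pointer because it is optional: pass nullptr to ignore it.
bool parseDigits(const std::string& text, int& value, std::size_t* consumed = nullptr) {
    std::size_t i = 0;
    int result = 0;
    while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
        result = result * 10 + (text[i] - '0');
        ++i;
    }
    if (consumed != nullptr) {
        *consumed = i;  // only written when the caller asked for it
    }
    if (i == 0) {
        return false;   // no digits found; `value` is left untouched
    }
    value = result;
    return true;
}
```

A reference cannot express "no argument here", which is why the optional output has to be a pointer (or, in newer code, something like std::optional as a return value).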
Here are some cases:
Objects with a long lifetime. You create some object in a function and need that same object afterwards (not a copy of it).
If you create it on the stack, without pointers, the object dies when the function finishes. So you need to create it in dynamic memory and return a pointer to it.
Stack space is not enough. You need an object that takes a lot of memory, and allocating it on the stack won't work, since the stack usually has less space than the heap. Again you create the object in dynamic memory on the heap and return a pointer to it.
You need reference semantics. You have a structure which you pass to some function, and you want the function to modify this structure. In that case you pass a pointer to the structure; otherwise the function receives a copy and cannot modify the original.
Note: in the last case, using a pointer is not strictly necessary, since you can use a reference instead.
PS. You can browse here for more scenarios, and decide in which cases are pointer usages necessary.
Pointers are also important for performance, for example with function arguments. Normally, when you pass a value to a function, the argument is copied into the parameter,
but with a pointer the function accesses the original object indirectly, with no copy, and can do what it wants with it.

Should I forget dynamic memory allocation and pointers and always pass by v?

I have noticed that most C++ experts advise that it's better to pass by value, due to RVO. This lets me not worry too much about pointer manipulation, and makes code easier to write as well. No complaints there. This makes me wonder whether the correct approach is to not use dynamic memory allocation (on the heap) at all and always pass parameters and return results by value.
This means instead of coming up with signatures like this:
Character* getCharacter(Sprite *sprite, Action* action)
I should more or less stick to signatures like:
Character getCharacter(Sprite sprite, Action action)
Is my understanding correct? or did I juth think i thaw a putthy cath?
They each have their pros and cons. Remember that a word like "always" is an absolute. Only the Dark Side deals in absolutes.
So let's look at each way and when we would use them.
Pass by value is good when the object being passed is smaller (since a local copy gets made). It is also good if you want to be sure to not accidentally change the original data. Its shortcoming is it makes a local copy and that can be bad if it is really big.
Pass by reference only passes a memory address. Therefore, large objects can be passed for a relatively low footprint. Also, with a reference, you can modify the original (this is both good and bad). This enables you to "return" more than one variable (so to speak). So obviously, the big con here is that you can mistakenly change the original data.
Constant pass by reference is generally accepted to be a very strong candidate for doing things. It has the pros of both pass by reference and pass by value: a low footprint, since it is a reference, AND you can't change the original. There aren't many cons, except that your use of the variable in the function needs to change a little. Remember, it's const and therefore cannot be modified in the function.
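The three options side by side, as a rough sketch. The function names are invented for illustration.

```cpp
#include <numeric>
#include <vector>

// By value: the function works on its own copy; the caller's vector is untouched.
int sumAfterDoubling(std::vector<int> values) {
    for (int& v : values) v *= 2;
    return std::accumulate(values.begin(), values.end(), 0);
}

// By non-const reference: no copy is made, and the caller's vector IS modified.
void doubleInPlace(std::vector<int>& values) {
    for (int& v : values) v *= 2;
}

// By const reference: no copy is made, and the compiler rejects any attempt
// to modify `values` inside the function.
int sum(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}
```

With std::vector<int> data{1, 2, 3}, calling sumAfterDoubling(data) leaves data alone, while doubleInPlace(data) changes it to {2, 4, 6}.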
Remember, there is no magic bullet. Nothing is always better. Determine what you need and select the right tool for the job.
EDIT: also, as has been said, passing is not the same as dynamic allocation. Dynamic allocation happens when you explicitly ask for it, for example with the "new" keyword. My suggestion would be to avoid the "new" keyword for now until you have a better understanding of arguments and pointers.
Whether or not you allocate an object on the heap typically is driven by one of the following concerns:
If the new object needs to outlive the function that creates it, the object must be allocated on the heap.
If the object is very large, and does not fit on the stack, then you must allocate it on the heap.
Beyond that, the choice of pass by value or pass by reference is determined by the semantics. If you want to operate on a copy, pass by value. If you want to operate on the actual object, pass by reference.
Your statement is simply untrue. There is some light advice to pass by value instead of the mainstream const reference in the special case where the function will copy the argument into a local variable anyway.
And for passing by non-const pointer, pass by value was never an alternative. The former implies an optional out or in/out parameter, the latter an input parameter.
And the dynamic allocation mentioned in the question title doesn't fit the content at all.
Your understanding is definitely not correct.

c++ vector construct with given memory

I'd like to use a std::vector to control a given piece of memory. First of all I'm pretty sure this isn't good practice, but curiosity has the better of me and I'd like to know how to do this anyway.
The problem I have is a method like this:
vector<float> getRow(unsigned long rowIndex)
{
float* row = _m->getRow(rowIndex); // row is now a piece of memory (of a known size) that I control
vector<float> returnValue(row, row+_m->cols()); // construct a new vec from this data
delete [] row; // delete the original memory
return returnValue; // return the new vector
}
_m is a DLL interface class which returns an array of float which it is the caller's responsibility to delete. So I'd like to wrap this in a vector and return that to the user... but this implementation allocates new memory for the vector, copies the data into it, deletes the returned memory, and then returns the vector.
What I'd like to do is to straight up tell the new vector that it has full control over this block of memory so when it gets deleted that memory gets cleaned up.
UPDATE: The original motivation for this (memory returned from a DLL) has been fairly firmly squashed by a number of responders :) However, I'd love to know the answer to the question anyway... Is there a way to construct a std::vector using a given chunk of pre-allocated memory T* array, and the size of this memory?
The obvious answer is to use a custom allocator; however, you might find that is really quite a heavyweight solution for what you need. If you want to do it, the simplest way is to take the allocator defined by the implementation (the default second template argument to vector<>), copy that and make it work as required.
Another solution might be to define a template specialisation of vector, define as much of the interface as you actually need and implement the memory customisation.
Finally, how about defining your own container with a conforming STL interface, defining random access iterators etc. This might be quite easy given that underlying array will map nicely to vector<>, and pointers into it will map to iterators.
Comment on UPDATE: "Is there a way to construct a std::vector using a given chunk of pre-allocated memory T* array, and the size of this memory?"
Surely the simple answer here is "No". Provided you want the result to be a vector<>, then it has to support growing as required, such as through the reserve() method, and that will not be possible for a given fixed allocation. So the real question is really: what exactly do you want to achieve? Something that can be used like vector<>, or something that really does have to in some sense be a vector, and if so, what is that sense?
Vector's default allocator doesn't provide this type of access to its internals. You could do it with your own allocator (vector's second template parameter), but that would change the type of the vector.
It would be much easier if you could write directly into the vector:
vector<float> getRow(unsigned long rowIndex) {
vector<float> row (_m->cols());
_m->getRow(rowIndex, &row[0]); // writes _m->cols() values into &row[0]
return row;
}
Note that &row[0] is a float* and it is guaranteed for vector to store items contiguously.
The most important thing to know here is that different DLL/Modules have different Heaps. This means that any memory that is allocated from a DLL needs to be deleted from that DLL (it's not just a matter of compiler version or delete vs delete[] or whatever). DO NOT PASS MEMORY MANAGEMENT RESPONSIBILITY ACROSS A DLL BOUNDARY. This includes creating a std::vector in a dll and returning it. But it also includes passing a std::vector to the DLL to be filled by the DLL; such an operation is unsafe since you don't know for sure that the std::vector will not try a resize of some kind while it is being filled with values.
There are two options:
Define your own allocator for the std::vector class that uses an allocation function that is guaranteed to reside in the DLL/Module from which the vector was created. This can easily be done with dynamic binding (that is, make the allocator class call some virtual function). Since dynamic binding will look-up in the vtable for the function call, it is guaranteed that it will fall in the code from the DLL/Module that originally created it.
Don't pass the vector object to or from the DLL. You can use, for example, a function getRowBegin() and getRowEnd() that return iterators (i.e. pointers) in the row array (if it is contiguous), and let the user std::copy that into its own, local std::vector object. You could also do it the other way around, pass the iterators begin() and end() to a function like fillRowInto(begin, end).
This problem is very real, although many people neglect it without knowing. Don't underestimate it. I have personally suffered silent bugs related to this issue and it wasn't pretty! It took me months to resolve it.
I have checked in the source code, and boost::shared_ptr and boost::shared_array use dynamic binding (first option above) to deal with this.. however, they are not guaranteed to be binary compatible. Still, this could be a slightly better option (usually binary compatibility is a much lesser problem than memory management across modules).
Your best bet is probably a std::vector<shared_ptr<MatrixCelType>>.
Lots more details in this thread.
If you're trying to change where/how the vector allocates/reallocates/deallocates memory, the allocator template parameter of the vector class is what you're looking for.
If you're simply trying to avoid the overhead of construction, copy construction, assignment, and destruction, then allow the user to instantiate the vector, then pass it to your function by reference. The user is then responsible for construction and destruction.
It sounds like what you're looking for is a form of smart pointer. One that deletes what it points to when it's destroyed. Look into the Boost libraries or roll your own in that case.
The Boost.SmartPtr library contains a whole lot of interesting classes, some of which are dedicated to handle arrays.
For example, behold scoped_array:
int main(int argc, char* argv[])
{
boost::scoped_array<float> array(_m->getRow(atoi(argv[1])));
return 0;
}
The issue, of course, is that scoped_array cannot be copied, so if you really want a std::vector<float>, @Fred Nurk's is probably the best you can get.
In the ideal case you'd want the equivalent of unique_ptr but in array form; since C++11 the standard provides this as std::unique_ptr<T[]>.
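A minimal sketch of the array form of unique_ptr, std::unique_ptr<T[]>, which is in the standard from C++11 on and calls delete[] when it is destroyed. makeRow() is a hypothetical stand-in for _m->getRow(), assumed here to allocate with new[].

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for _m->getRow(): returns a new[]-allocated row
// that the caller is responsible for freeing.
float* makeRow(std::size_t cols) {
    float* row = new float[cols];
    for (std::size_t i = 0; i < cols; ++i) {
        row[i] = static_cast<float>(i);
    }
    return row;
}

// Takes ownership of the raw array without copying it.
// delete[] runs automatically when the returned unique_ptr is destroyed.
std::unique_ptr<float[]> getRowOwned(std::size_t cols) {
    return std::unique_ptr<float[]>(makeRow(cols));
}
```

Note that this gives ownership, not a std::vector: there is no size(), so the column count has to travel separately. The warning above about freeing memory on the other side of a DLL boundary still applies, since delete[] runs in the caller's module.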

Know what references an object

I have an object which implements reference counting mechanism. If the number of references to it becomes zero, the object is deleted.
I found that my object is never deleted, even when I am done with it. This is leading to memory overuse. All I have is the number of references to the object and I want to know the places which reference it so that I can write appropriate cleanup code.
Is there some way to accomplish this without having to grep in the source files? (That would be very cumbersome.)
A huge part of getting reference counting (refcounting) done correctly in C++ is to use Resource Acquisition Is Initialization (RAII) so it's much harder to accidentally leak references. However, this doesn't solve everything with refcounts.
That said, you can implement a debug feature in your refcounting which tracks what is holding references. You can then analyze this information when necessary, and remove it from release builds. (Use a configuration macro similar in purpose to how DEBUG macros are used.)
Exactly how you should implement it is going to depend on all your requirements, but there are two main ways to do this (with a brief overview of differences):
store the information on the referenced object itself
    accessible from your debugger
    easier to implement
output to a special trace file every time a reference is acquired or released
    still available after the program exits (even abnormally)
    possible to use while the program is running, without running in your debugger
    can be used even in special release builds and sent back to you for analysis
The basic problem, of knowing what is referencing a given object, is hard to solve in general, and will require some work. Compare: can you tell me every person and business that knows your postal address or phone number?
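A rough sketch of the first approach (storing the information on the referenced object itself). The class name, the REFCOUNT_DEBUG macro and the idea of passing a holder name to addRef()/release() are all assumptions made for this example; your own refcounting interface will differ, and deletion at zero is left out to keep the sketch short.

```cpp
#include <map>
#include <string>

#define REFCOUNT_DEBUG 1  // hypothetical configuration macro; leave undefined in release builds

class RefCounted {
public:
    // `holder` names whoever takes the reference; it is recorded only in debug builds.
    void addRef(const std::string& holder) {
        ++count_;
#ifdef REFCOUNT_DEBUG
        ++holders_[holder];
#else
        (void)holder;
#endif
    }

    void release(const std::string& holder) {
        --count_;
#ifdef REFCOUNT_DEBUG
        auto it = holders_.find(holder);
        if (it != holders_.end() && --it->second == 0) {
            holders_.erase(it);  // this holder no longer owns any reference
        }
#else
        (void)holder;
#endif
    }

    int count() const { return count_; }

#ifdef REFCOUNT_DEBUG
    // Holders that still own at least one reference; inspect this in the debugger.
    const std::map<std::string, int>& holders() const { return holders_; }
#endif

private:
    int count_ = 0;
#ifdef REFCOUNT_DEBUG
    std::map<std::string, int> holders_;
#endif
};
```

When an object refuses to die, whatever is left in holders() is the list of suspects. The trace-file variant would write the same holder names to a log inside addRef() and release() instead of keeping the map.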
One known weakness of reference counting is that it does not work when there are cyclic references, i.e. (in the simplest case) when one object has a reference to another object which in turn has a reference to the former object. This sounds like a non-issue, but in data structures such as binary trees with back-references to parent nodes, there you are.
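The cycle problem can be reproduced in a few lines. This sketch uses std::shared_ptr and std::weak_ptr (C++11) rather than a hand-written refcount, purely because they are readily available; the Node type and the parentLeaks function are invented for the demonstration.

```cpp
#include <memory>

struct Node {
    std::shared_ptr<Node> child;         // owning link down the tree
    std::shared_ptr<Node> parentStrong;  // owning back-reference: forms a cycle
    std::weak_ptr<Node> parentWeak;      // non-owning back-reference: no cycle
};

// Returns true if the parent is still alive after every outside reference is gone.
bool parentLeaks(bool useStrongBackReference) {
    std::weak_ptr<Node> observer;
    {
        auto parent = std::make_shared<Node>();
        auto child = std::make_shared<Node>();
        parent->child = child;
        if (useStrongBackReference) {
            child->parentStrong = parent;  // parent and child now keep each other alive
        } else {
            child->parentWeak = parent;    // child can reach parent without owning it
        }
        observer = parent;
    }  // the local shared_ptrs are gone here
    bool leaked = !observer.expired();
    if (auto p = observer.lock()) {
        p->child->parentStrong.reset();    // break the cycle so the demo cleans up after itself
    }
    return leaked;
}
```

With the strong back-reference the count never reaches zero on its own; with the weak one both nodes are destroyed as soon as the block ends.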
If you don't explicitly provide for a list of "reverse" references in the referenced (un-freed) object, I don't see a way to figure out who is referencing it.
In the following suggestions, I assume that you don't want to modify your source, or if so, just a little.
You could of course walk the whole heap / freestore and search for the memory address of your un-freed object, but if its address turns up, it's not guaranteed to actually be a memory address reference; it could just as well be any random floating point number, or anything else. However, if the found value lies inside a block of memory that your application allocated for an object, chances improve a little that it's indeed a pointer to another object.
One possible improvement over this approach would be to modify the memory allocator you use -- e.g. your global operator new -- so that it keeps a list of all allocated memory blocks and their sizes. (In a complete implementation of this, operator delete would have to remove the list entry for the freed block of memory.) Now, at the end of your program, you have a clue where to search for the un-freed object's memory address, since you have a list of memory blocks that your program actually used.
The above suggestions don't sound very reliable to me, to be honest; but maybe defining a custom global operator new and operator delete that does some logging / tracing goes in the right direction to solve your problem.
I am assuming you have some class with say addRef() and release() member functions, and you call these when you need to increase and decrease the reference count on each instance, and that the instances that cause problems are on the heap and referred to with raw pointers. The simplest fix may be to replace all pointers to the controlled object with boost::shared_ptr. This is surprisingly easy to do and should enable you to dispense with your own reference counting - you can just make those functions I mentioned do nothing. The main change required in your code is in the signatures of functions that pass or return your pointers. Other places to change are in initializer lists (if you initialize pointers to null) and if()-statements (if you compare pointers with null). The compiler will find all such places after you change the declarations of the pointers.
If you do not want to use the shared_ptr - maybe you want to keep the reference count intrinsic to the class - you can craft your own simple smart pointer just to deal with your class. Then use it to control the lifetime of your class objects. So for example, instead of pointer assignment being done with raw pointers and you "manually" calling addRef(), you just do an assignment of your smart pointer class which includes the addRef() automatically.
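A bare-bones sketch of such a hand-rolled smart pointer. Counted stands in for your own class, and the addRef()/release() names are the ones assumed above; CountedPtr is a made-up name.

```cpp
#include <utility>

// Stand-in for the class from the question: the reference count lives in the object.
class Counted {
public:
    void addRef() { ++refs_; }
    void release() {
        if (--refs_ == 0) delete this;  // last reference gone: the object deletes itself
    }
    int refs() const { return refs_; }
private:
    int refs_ = 0;
};

// Minimal handle that pairs every addRef() with exactly one release().
class CountedPtr {
public:
    explicit CountedPtr(Counted* p = nullptr) : p_(p) { if (p_) p_->addRef(); }
    CountedPtr(const CountedPtr& other) : p_(other.p_) { if (p_) p_->addRef(); }
    // Copy-and-swap: `other` is a copy (one addRef); its destructor releases
    // whatever this handle pointed to before the assignment.
    CountedPtr& operator=(CountedPtr other) {
        std::swap(p_, other.p_);
        return *this;
    }
    ~CountedPtr() { if (p_) p_->release(); }

    Counted* operator->() const { return p_; }
    Counted* get() const { return p_; }
private:
    Counted* p_;
};
```

Once all raw Counted* variables are replaced by CountedPtr, a forgotten release() can no longer happen, because the destructor does it for you.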
I don't think it's possible to do something without code change. With code change you can for example remember the pointers of the objects which increase reference count, and then see what pointer is left and examine it in the debugger. If possible - store more verbose information, such as object name.
I have created one for my needs. You can compare your code with this one and see what's missing. It's not perfect but it should work in most of the cases.
http://sites.google.com/site/grayasm/autopointer
when I use it I do:
util::autopointer<A> aptr=new A();
I never do it like this:
A* ptr = new A();
util::autopointer<A> aptr = ptr;
and then start fooling around with ptr later; that's not allowed.
Further I am using only aptr to refer to this object.
If I am wrong I have now the chance to get corrections. :) See ya!

When is it not a good idea to pass by reference?

This is a memory allocation issue that I've never really understood.
void unleashMonkeyFish()
{
MonkeyFish * monkey_fish = new MonkeyFish();
std::string localname = "Wanda";
monkey_fish->setName(localname);
monkey_fish->go();
}
In the above code, I've created a MonkeyFish object on the heap, assigned it a name, and then unleashed it upon the world. Let's say that ownership of the allocated memory has been transferred to the MonkeyFish object itself - and only the MonkeyFish itself will decide when to die and delete itself.
Now, when I define the "name" data member inside the MonkeyFish class, I can choose one of the following:
std::string name;
std::string & name;
When I define the prototype for the setName() function inside the MonkeyFish class, I can choose one of the following:
void setName( const std::string & parameter_name );
void setName( const std::string parameter_name );
I want to be able to minimize string copies. In fact, I want to eliminate them entirely if I can. So, it seems like I should pass the parameter by reference...right?
What bugs me is that it seems that my localname variable is going to go out of scope once the unleashMonkeyFish() function completes. Does that mean I'm FORCED to pass the parameter by copy? Or can I pass it by reference and "get away with it" somehow?
Basically, I want to avoid these scenarios:
I don't want to set the MonkeyFish's name, only to have the memory for the localname string go away when the unleashMonkeyFish() function terminates. (This seems like it would be very bad.)
I don't want to copy the string if I can help it.
I would prefer not to new localname
What prototype and data member combination should I use?
CLARIFICATION: Several answers suggested using the static keyword to ensure that the memory is not automatically de-allocated when unleashMonkeyFish() ends. Since the ultimate goal of this application is to unleash N MonkeyFish (all of which must have unique names) this is not a viable option. (And yes, MonkeyFish - being fickle creatures - often change their names, sometimes several times in a single day.)
EDIT: Greg Hewgil has pointed out that it is illegal to store the name variable as a reference, since it is not being set in the constructor. I'm leaving the mistake in the question as-is, since I think my mistake (and Greg's correction) might be useful to someone seeing this problem for the first time.
One way to do this is to have your string
std::string name;
As the data-member of your object. And then, in the unleashMonkeyFish function create a string like you did, and pass it by reference like you showed
void setName( const std::string & parameter_name ) {
name = parameter_name;
}
It will do what you want: one copy is made, to copy the string into your data member. It's not as if it has to allocate a new buffer internally every time you assign another string; often, assigning a new string just copies a few bytes. std::string has the capability to reserve bytes, so you can call "name.reserve(25);" in your constructor and it will likely not reallocate if you assign something smaller. (I have done tests, and it looks like GCC always reallocates if you assign from another std::string, but not if you assign from a C string. They say they have a copy-on-write string, which would explain that behavior.)
The string you create in the unleashMonkeyFish function will automatically release its allocated resources. That's the key feature of those objects: they manage their own stuff. Classes have a destructor that they use to free allocated resources once objects die, and std::string has one too. In my opinion, you should not worry about having that std::string local in the function. It will most likely not do anything noticeable to your performance anyway. Some std::string implementations (msvc++ afaik) have a small-buffer optimization: for up to some small limit, they keep characters in an embedded buffer instead of allocating from the heap.
Edit:
As it turns out, there is a better way to do this for classes that have an efficient swap implementation (constant time):
void setName(std::string parameter_name) {
name.swap(parameter_name);
}
The reason that this is better, is that now the caller knows that the argument is being copied. Return value optimization and similar optimizations can now be applied easily by the compiler. Consider this case, for example
obj.setName("Mr. " + things.getName());
If you had setName take a reference, then the temporary created in the argument would be bound to that reference, copied within setName, and destroyed after it returns, even though it was a throw-away product anyway. This is suboptimal, because the temporary itself could have been used instead of its copy. Making the parameter a non-reference lets the caller see that the argument is being copied anyway, and makes the optimizer's job much easier, because it doesn't have to inline the call to see that the argument is copied.
For further explanation, read the excellent article BoostCon09/Rvalue-References
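Put together as a complete class, the by-value-plus-swap version might look like this. The getName() accessor is added only so the result can be inspected; it is not part of the question.

```cpp
#include <string>

class MonkeyFish {
public:
    // By-value parameter: an lvalue argument is copied into parameter_name,
    // while a temporary can be moved or constructed in place by the compiler.
    void setName(std::string parameter_name) {
        name.swap(parameter_name);  // constant time; the old name dies with parameter_name
    }
    const std::string& getName() const { return name; }
private:
    std::string name;
};
```

Called as fish.setName(localname), this costs one copy and leaves localname intact; called as fish.setName("Mr. " + localname), the temporary is used directly. In C++11 and later, name = std::move(parameter_name) in the body achieves the same effect.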
If you use the following method declaration:
void setName( const std::string & parameter_name );
then you would also use the member declaration:
std::string name;
and the assignment in the setName body:
name = parameter_name;
You cannot declare the name member as a reference because you must initialise a reference member in the object constructor (which means you couldn't set it in setName).
Finally, your std::string implementation may use reference-counted strings (older ones did; C++11 and later effectively rule this out), in which case no copy of the actual string data is made in the assignment. If you're that concerned about performance, you had better be intimately familiar with the standard library implementation you are using.
Just to clarify the terminology, you've created MonkeyFish from the heap (using new) and localname on the stack.
Ok, so storing a reference to an object is perfectly legit, but obviously you must be aware of the scope of that object. It is much easier to pass the string by reference, then copy it to the class member variable. Unless the string is very large, or you're performing this operation a lot (and I mean a lot, a lot), there's really no need to worry.
Can you clarify exactly why you don't want to copy the string?
Edit
An alternative approach is to create a pool of MonkeyName objects. Each MonkeyName stores a pointer to a string. Then get a new MonkeyName by requesting one from the pool (which sets the name on the internal string *). Now pass that into the class by reference and perform a straight pointer swap. Of course, the MonkeyName object passed in is changed, but if it goes straight back into the pool, that won't make a difference. The only overhead is then the actual setting of the name when you get the MonkeyName from the pool.
... hope that made some sense :)
This is precisely the problem that reference counting is meant to solve. You could use the Boost shared_ptr<> to reference the string object in a way such that it lives at least as long as every pointer at it.
Personally I never trust it, though, preferring to be explicit about the allocation and lifespan of all my objects. litb's solution is preferable.
When the compiler sees ...
std::string localname = "Wanda";
... it will (barring optimization magic) emit 0x57 0x61 0x6E 0x64 0x61 0x00 [Wanda with the null terminator] and store it somewhere in the static section of your code. Then it will invoke std::string(const char *) and pass it that address. Since the author of the constructor has no way of knowing the lifetime of the supplied const char *, s/he must make a copy. In MonkeyFish::setName(const std::string &), the compiler will see std::string::operator=(const std::string &), and, if your std::string is implemented with copy-on-write semantics, the compiler will emit code to increment the reference count but make no copy.
You will thus pay for one copy. Do you need even one? Do you know at compile time what the names of the MonkeyFish shall be? Do the MonkeyFish ever change their names to something that is not known at compile time? If all the possible names of MonkeyFish are known at compile time, you can avoid all the copying by using a static table of string literals, and implementing MonkeyFish's data member as a const char *.
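If the names really are all known at compile time, the zero-copy variant could be sketched like this. The table contents and the index-based setName() are assumptions made for the example.

```cpp
#include <cstddef>

// Hypothetical fixed set of names, known at compile time.
static const char* const kMonkeyFishNames[] = { "Wanda", "Bubbles", "Finn" };
static const std::size_t kMonkeyFishNameCount =
    sizeof(kMonkeyFishNames) / sizeof(kMonkeyFishNames[0]);

class MonkeyFish {
public:
    // Stores only a pointer. String literals live for the whole program,
    // so no character data is ever copied or freed.
    void setName(std::size_t index) {
        name_ = kMonkeyFishNames[index % kMonkeyFishNameCount];
    }
    const char* name() const { return name_; }
private:
    const char* name_ = kMonkeyFishNames[0];
};
```

This only works because every possible name outlives every MonkeyFish; the moment a name is built at run time, you are back to owning a std::string.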
As a simple rule of thumb: store your data as a copy within a class, pass and return data by (const) reference, and use reference-counting pointers wherever possible.
I'm not so concerned about copying a few 1000s bytes of string data, until such time that the profiler says it is a significant cost. OTOH I do care that the data structures that hold several 10s of MBs of data don't get copied.
In your example code, yes, you are forced to copy the string at least once. The cleanest solution is defining your object like this:
class MonkeyFish {
public:
void setName( const std::string & parameter_name ) { name = parameter_name; }
private:
std::string name;
};
This will pass a reference to the local string, which is copied into a permanent string inside the object. Any solutions that involve zero copying are extremely fragile, because you would have to be careful that the string you pass stays alive until after the object is deleted. Better not go there unless it's absolutely necessary, and string copies aren't THAT expensive -- worry about that only when you have to. :-)
You could make the string in unleashMonkeyFish static but I don't think that really helps anything (and could be quite bad depending on how this is implemented).
I've moved "down" from higher-level languages (like C#, Java) and have hit this same issue recently. I assume that often the only choice is to copy the string.
If you use a temporary variable to assign the name (as in your sample code) you will eventually have to copy the string to your MonkeyFish object in order to avoid the temporary string object going end-of-scope on you.
As Andrew Flanagan mentioned, you can avoid the string copy by using a local static variable or a constant.
Assuming that that isn't an option, you can at least minimize the number of string copies to exactly one. Pass the string by const reference to setName(), and then perform the copy inside the setName() function itself. This way, you can be sure that the copy is being performed only once.