I'm an experienced coder, but am still relatively new to the STL, and have just come across this problem:
As far as I'm aware, STL containers aren't meant to copy the objects which they contain, or otherwise affect their lifecycles, yet experimentally I'm seeing different results.
In particular, string classes, which are meant to zero out the first character of their underlying storage upon destruction, are still accessible if they are stored in a container before they go out of scope. For instance, consider the following example:
#include <cstdio>
#include <iostream>
#include <queue>
#include <sstream>
#include <string>
using namespace std;
queue<string> strQueue;
const char *genStr(int i)
{
    ostringstream os;
    os << "The number i is " << i;
    strQueue.push(os.str());
    return strQueue.back().data();
}
void useStr()
{
    while(!strQueue.empty())
    {
        cout << strQueue.front() << endl;
        strQueue.pop();
    }
}
int main(int argc, char **argv)
{
    for(int i = 0; i < 40; i++)
    {
        printf("Retval is: %s\n", genStr(i));
    }
    useStr();
    return 0;
}
As the strings go out of scope when genStr() exits, I would expect the printf to just output "Retval is: ", or at the very least for the call to useStr() to give undefined results, as the memory was stomped on by the repeated allocations from the extra calls, yet both return the appropriate stored strings, without fail.
I'd like to know why this happens, but in lieu of that, I'd be happy just to know whether I can rely on this effect happening with any old object.
Thanks
As far as I'm aware, STL containers aren't meant to copy the objects which they contain
Okay, let's stop right there. STL containers do copy their contents, frequently. They copy them when they're inserted, they copy them when the container is resized either automatically or explicitly, and they copy them when the container itself is copied. Lots and lots of copying.
I'm not sure where you got the idea that STL containers don't copy their contents. The only thing that I can think of that's even close is that if you insert a pointer into an STL container, it will copy the pointer itself but not the pointed-to data.
Also, there are no references involved in your code whatsoever, so I'm puzzled as to what the title of this question refers to.
STL containers aren't meant to copy the objects which they contain
The STL is all about making copies. It will make them when you insert objects, and will sometimes make them if the underlying storage gets resized. You may get broken code if the object you are copying becomes invalidated when your function goes out of scope (for example if you add a pointer to a local variable, rather than copying the local variable).
In your case, you aren't copying a reference to a string, you're copying a string. This copied string then exists in the scope of strQueue, so the behavior you are seeing is completely valid and reliable.
Here is another misunderstanding to clear up:
In particular, string classes, which are meant to zero out the first character of their underlying storage upon destruction
C++ doesn't tend to ever do that sort of thing. It would be a hidden cost, and C++ hates hidden costs :) The string destructor won't touch the memory because once the destructor has exited, the object no longer exists. Accessing it is undefined behavior, so the C++ implementation will do whatever is fastest and least wasteful in well defined code.
All the "STL" (I hate that term) collections store copies of the objects passed to them, so the lifetime of the object in the collection is completely independent of the original object. Under normal circumstances, the collection's copy of an object will remain valid until you erase it from the collection or destroy the collection.
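What goes into the container is a copy of the object and not the actual object. Accessors such as front() and back() return a reference to that stored copy, and assigning the result to a new variable copies it again. The stored objects stay accessible until they are removed from the container or the container is destroyed.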
I want to provide a zero-copy, move-based API. I want to move a string from thread A into thread B. Ideally, a move should simply pass the data from instance A to a new instance B with minimal or no copying (mainly of addresses), so that internals such as data pointers are just copied into the new, move-constructed instance. So does std::move on a std::string guarantee that .c_str() returns the same result on the instance before the move and on the instance created via the move constructor?
No. There's no requirement for std::string to use dynamic allocation or to do anything specific with such an allocation if it has one. In fact, modern implementations usually put short strings into the string object itself and don't allocate anything; then moving is the same as copying.
It's important to keep in mind that std::string is not a container, even though it looks very similar to one. Containers make stronger guarantees with respect to their elements than std::string does.
No, it's not guaranteed.
Guaranteeing it would basically prohibit (for one example) the short string optimization, in which the entire body of a short string is stored in the string object itself, rather than being allocated separately on the heap.
At least for now, I think SSO is regarded as important enough that the committee would be extremely reluctant to prohibit it (but that could change--when the original C++98 standard was written, they went to considerable trouble to allow copy-on-write strings, but they are now prohibited).
No,
but if that is needed, an option is to put the string in std::unique_ptr. Personally I would typically not rely on the c_str() value for more than the local scope.
Example, on request:
#include <iostream>
#include <string>
#include <memory>
int main() {
std::string ss("hello");
auto u_str = std::make_unique<std::string>(ss);
std::cout << u_str->c_str() <<std::endl;
std::cout << *u_str <<std::endl;
return 0;
}
If you don't have make_unique (new in C++14):
auto u_str = std::unique_ptr<std::string>(new std::string(ss));
Or just copy the whole implementation from the proposal by S.T.L.:
Ideone example on how to do that
It is documented here, so you can assume that the c_str() result is stable under some conditions.
You cannot however assume that c_str() will remain the same after move.
In practice it will stay in case of long string, but it won't stay for short strings.
Assume the following:
template<typename Item>
class Pipeline
{
[...]
void connect(OutputSide<Item> first, InputSide<Item> second)
{
Queue<Item> queue;
first.setOutputQueue(&queue);
second.setInputQueue(&queue);
queues.push_back(std::move(queue));
}
[...]
std::vector<Queue<Item> > queues;
};
Will the pointers to queue still work in "first" and "second" after the move?
Does std::move invalidate pointers?
No. An object still exists after being moved from, so any pointers to that object are still valid. If the Queue is sensibly implemented, then moving from it should leave it in a valid state (i.e. it's safe to destroy or reassign it); but may change its state (perhaps leaving it empty).
Will the pointers to queue still work in "first" and "second" after the move?
No. They will point to the local object that's been moved from; as described above, you can't make any assumptions about that object's state after the move.
Much worse than that is that when the function returns, it's destroyed, leaving the pointers dangling. They are now invalid, not pointing to any object, and using them will give undefined behaviour.
Perhaps you want them to point to the object that's been moved into queues:
queues.push_back(std::move(queue));
first.setOutputQueue(&queues.back());
second.setInputQueue(&queues.back());
but, since queues is a vector, those pointers will be invalidated when the vector next reallocates its memory.
To fix that problem, use a container like deque or list which doesn't move its elements after insertion. Alternatively, at the cost of an extra level of indirection, you could store (smart) pointers rather than objects, as described in Danvil's answer.
The pointers will not work, because queue is a local object which will be destroyed at the end of connect. Even with std::move you still create a new object at a new memory location; the move just lets it reuse as much as possible from the "old" object.
Additionally, the approach would not work even without std::move, because push_back may have to reallocate. Thus a call to connect may invalidate all your old pointers.
A possible solution is creating Queue objects on the heap. The following suggestion uses C++11:
#include <memory>
template<typename Item>
class Pipeline
{
[...]
void connect(OutputSide<Item> first, InputSide<Item> second)
{
auto queue = std::make_shared<Queue<Item>>();
first.setOutputQueue(queue);
second.setInputQueue(queue);
queues.push_back(queue);
}
[...]
std::vector<std::shared_ptr<Queue<Item>>> queues;
};
Others provided nice and detailed explanations, but your question indicates that you do not understand fully what move does, or what it is designed to do. I'll try to describe it in simple words.
move, as the name implies, is meant to move things. But what can be moved? You cannot move an object itself once it has been allocated somewhere. Recall the move constructor added in C++11, which resembles the copy constructor.
So, it's all about the "contents" of an object. Both copy- and move-constructors are meant to operate on the "contents". So is the std::move. It is meant to move the contents of one object into the target, and in contrast to the copy, it is meant to leave no trace of contents in the original location.
It is meant to be used wherever something forces us to make a copy that we don't really care about and would rather avoid, when all we want is to have the contents in the target place.
That is, the usage of "move" indicates that a copy will be made, contents will be moved there, and original will be cleared (sometimes some steps might be skipped, but still, that's the basic idea of a move).
This clearly indicates that, even if the original survives, any pointers to the original object will not point to the destination, which received the contents. At best, they will point to the original object that was just 'cleared'. At worst, they will point to a completely unusable thing.
Now look at your code. It fills the queue and then takes the pointer to the original, then moves the queue.
I hope that's clear now what's happening and what you can do with it.
More C++ learning questions. I've been using vectors primarily with raw pointers with a degree of success, however, I've been trying to play with using value objects instead. The first issue I'm running into is compile error in general. I get errors when compiling the code below:
#include <string>
#include <vector>
class FileReference {
public:
    FileReference(const char* path) : path(path) {}
    const std::string path;
};
int main() {
    std::vector<FileReference> files;
    // error C2582: 'operator =' function is unavailable in 'FileReference'
    files.push_back(FileReference("d:\\blah\\blah\\blah"));
}
Q1: I'm assuming it's because I declared path as const and/or didn't define an assignment operator. Why wouldn't a default operator work? Does const even win me anything here?
Q2: Secondly, in a vector of these value objects, are my objects memory-safe (meaning, will they get automatically deleted for me)? I read here that vectors by default get allocated on the heap, so does that mean I need to "delete" anything?
Q3: Thirdly, to prevent copying of the entire vector, I have to create a parameter that passes the vector as a reference like:
// static
void FileReference::Query(const FileReference& reference, std::vector<FileReference>& files) {
    // push stuff into the passed in vector
}
What's the standard way of returning large objects that I don't want to die when the function returns? Would I benefit from using a shared_ptr here or something like that?
If any member variables are const, then a default assignment operator can't be created; the compiler doesn't know what you would want to happen. You would have to write your own operator overload, and figure out what behaviour you want. (For this reason, const member variables are often less useful than one might first think.)
So long as you're not taking ownership of raw memory or other resources, then there's nothing to clean up. A std::vector always correctly deletes its contained elements when its lifetime ends, so long as they in turn always correctly clean up their own resources. And in your case, your only member variable is a std::string, which also looks after itself. So you're completely safe.
You could use a shared pointer, but unless you do profiling and identify a bottleneck here, I wouldn't worry about it. In particular, you should read about copy elision, which the compiler can do in many circumstances.
Elements in a vector must be assignable. From section 23.2.4 (Class template vector) of the C++ standard:
...the stored object shall meet the requirements of Assignable.
Having a const member makes the class unassignable.
As the elements are being stored by value, they will be destructed when the vector is destroyed or when they are removed from the vector. If the elements were raw pointers, then they would have to be explicitly deleted.
As a Java developer I have the following C++ question.
If I have objects of type A and I want to store a collection of them in an array,
then should I just store pointers to the objects or is it better to store the object itself?
In my opinion it is better to store pointers because:
1) One can easily remove an object, by setting its pointer to null
2) One saves space.
Pointers or just the objects?
You can't put references in an array in C++. You can make an array of pointers, but I'd still prefer a container of actual objects rather than pointers because:
No chance to leak, exception safety is easier to deal with.
It isn't less space - if you store an array of pointers you need the memory for the object plus the memory for a pointer.
The only times I'd advocate putting pointers (or, better, smart pointers) in a container (or array if you must) are when your object isn't copy constructible and assignable (a requirement for containers, which pointers always meet) or when you need the elements to be polymorphic. E.g.
#include <vector>
struct foo {
virtual void it() {}
};
struct bar : public foo {
int a;
virtual void it() {}
};
int main() {
std::vector<foo> v;
v.push_back(bar()); // not doing what you expected! (the temporary bar gets "made into" a foo before storing as a foo and your vector doesn't get a bar added)
std::vector<foo*> v2;
v2.push_back(new bar()); // Fine
}
If you want to go down this road boost pointer containers might be of interest because they do all of the hard work for you.
Removing from arrays or containers.
Assigning NULL doesn't cause there to be any fewer pointers in your container/array (and it doesn't handle the delete either); the size remains the same, but there are now pointers you can't legally dereference. This makes the rest of your code more complex in the form of extra if statements and prohibits things like:
// need to go out of our way to make sure there's no NULL here
std::for_each(v2.begin(),v2.end(), std::mem_fun(&foo::it));
I really dislike the idea of allowing NULLs in sequences of pointers in general because you quickly end up burying all the real work in a sequence of conditional statements. The alternative is that std::vector provides an erase method that takes an iterator so you can write:
v2.erase(v2.begin());
to remove the first or v2.begin()+1 for the second. There's no easy "erase the nth element" method though on std::vector because of the time complexity - if you're doing lots of erasing then there are other containers which might be more appropriate.
For an array you can simulate erasing with:
#include <utility>
#include <iterator>
#include <algorithm>
#include <iostream>
int main() {
int arr[] = {1,2,3,4};
int len = sizeof(arr)/sizeof(*arr);
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
// remove 2nd element, without preserving order:
std::swap(arr[1], arr[len-1]);
len -= 1;
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
// and again, first element:
std::swap(arr[0], arr[len-1]);
len -= 1;
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
}
Preserving the order requires a series of shuffles instead of a single swap, which nicely illustrates the complexity of erasing that std::vector faces. Of course by doing this you've just reinvented a pretty big wheel a whole lot less usefully and flexibly than a standard library container would do for you for free!
It sounds like you are confusing references with pointers. C++ has 3 common ways of representing object handles
References
Pointers
Values
Coming from Java the most analogous way is to do so with a pointer. This is likely what you are trying to do here.
How they are stored, though, has some pretty fundamental effects on their behavior. When you store values, you are often dealing with copies of those values, whereas with pointers you are dealing with one object that may be referred to from several places. Giving a flat answer that one is better than the other is not really possible without a bit more context on what these objects do.
It completely depends on what you want to do... but you're misguided in some ways.
Things you should know are:
You can't set a reference to NULL in C++, though you can set a pointer to NULL.
A reference can only be made to an existing object - it must start initialized as such.
A reference cannot be changed (though the referenced value can be).
You wouldn't save space, in fact you would use more since you're using an object and a reference. If you need to reference the same object multiple times then you save space, but you might as well use a pointer - it's more flexible in MOST (read: not all) scenarios.
A last important one: STL containers (vector, list, etc) have COPY semantics - they cannot work with references. They can work with pointers, but it gets complicated, so for now you should always use copyable objects in those containers and accept that they will be copied, like it or not. The STL is designed to be efficient and safe with copy semantics.
Hope that helps! :)
PS (EDIT): You can use some new features in BOOST/TR1 (google them), and make a container/array of shared_ptr (reference counting smart pointers) which will give you a similar feel to Java's references and garbage collection. There's a flurry of differences but you'll have to read about it yourself - they are a great feature of the new standard.
You should always store objects when possible; that way, the container will manage the objects' lifetimes for you.
Occasionally, you will need to store pointers; most commonly, pointers to a base class where the objects themselves will be of different types. In that case, you need to be careful to manage the lifetime of the objects yourself; ensuring that they are not destroyed while in the container, but that they are destroyed once they are no longer needed.
Unlike Java, setting a pointer to null does not deallocate the object pointed to; instead, you get a memory leak if there are no more pointers to the object. If the object was created using new, then delete must be called at some point. Your best options here are to store smart pointers (shared_ptr, or perhaps unique_ptr if available), or to use Boost's pointer containers.
You can't store references in a container. You could store (naked) pointers instead, but that's prone to errors and is therefore frowned upon.
Thus, the real choice is between storing objects and smart pointers to objects. Both have their uses. My recommendation would be to go with storing objects by value unless the particular situation demands otherwise. This could happen:
if you need to NULL out the object without removing it from the container;
if you need to store pointers to the same object in multiple containers;
if you need to treat elements of the container polymorphically.
Saving space is not a reason to store pointers, since storing elements by value is likely to be more space-efficient.
To add to the answer of aix:
If you want to store polymorphic objects, you must use smart pointers because the containers make a copy, and for derived types only copy the base part (at least the standard ones, I think boost has some containers which work differently). Therefore you'll lose any polymorphic behaviour (and any derived-class state) of your objects.