Container front and back behavior - c++

Here is the quote from "Effective STL":
When you get an object from a container (via. e.g., front or back), what you set is a copy of what was
contained. Copy in, copy out. That's the STL way.
I have had a hard time understanding this part. As far as I know front returns the reference of the first element (at least for std::vector).
Could you please explain above sentence?

This was actually an error in an earlier edition of the book. From the errata:
! 6/29/01 jk 20 The first para of Item 3 is incorrect: front 7/25/04
and back do NOT return copies of elements, they
return references to elements. I
removed all mention of front and back.
So the explanation of the sentence is: woops, time to get a new edition!

The idea with a statement like that is that when you want to get an element out of the container, you don't keep a reference or pointer to the element in the container, you create a copy of it (from the reference those methods return). The function returns, for back() and front(), are secondary concerns and likely confuse the issue - even the errata removed the mention of them.
Containers can undergo reallocation (especially vector) with you not necessarily being notified by the container, the elements are moved in memory and suddenly you have an invalid reference or pointer.
Bear in mind the time of the advice, before move semantics and movable objects etc. But the general principle still applies, don't keep references or pointers to objects that could become invalid.
"Value semantics" is a strong theme that not only runs through the standard library, but the whole of C++.

Related

Sharing a `std::list` without adding a (redundant) reference to it

I have a conainter, lets say a std::list<int>, which I would like to share between objects. One of the objects is known to live longer than the others, so he will hold the container. In order to be able to access the list, the other objects may have a pointer to the list.
Since the holder object might get moved, I'll need to wrap the list with a unique_ptr:
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::list<int>& list; };
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects:
class LongLiveHolder { std::unique_ptr<NonExistentListNode<int>> back; };
class ShortLiveObject { NonExistentListNode<int>& back; };
, which would save me a redundant dereference when accessing the list, except that I would no longer have the full std::list interface to use with the shorter-lived object- just the node pointers.
Can I somehow get rid of this extra layer of indirection, while still having the std::list interface in the shorter-lived object?
Preface
You may be overthinking the cost of the extra indirection from the std::unique_ptr (unless you have a lot of these lists and you know that usages of them will be frequent and intermixed with other procedures). In general, I'd first trust my compiler to do smart things. If you want to know the cost, do performance profiling.
The main purpose of the std::unique_ptr in your use-case is just to have shared data with a stable address when other data that reference it gets moved. If you use the list member of the long-lived object multiple times in a single procedure, you can possibly help your compiler to help you (and also get some nicer-to-read code) when you use the list through the long-lived object by making a variable in the scope of the procedure that stores a reference to the std::list pointed to by the std::unique_ptr like:
void fn(LongLiveHolder& holder) {
auto& list {holder.list.get()};
list.<some_operation_1>(...);
list.<some_operation_2>(...);
list.<some_operation_3>(...);
}
But again, you should inspect the generated machine code and do performance profiling if you really want to know what kind of difference it makes.
If Context Permits, Write your own List
You said:
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects: [...]
Considering Changes in what is the First Node
What if the first node of the list is allowed to be deleted? What if a new node is allowed to be inserted at the beginning of the list? You'd need a very specific context for those to not be requirements. What you want in your short-lived object is a view abstractions which supports the same interface as the actual list but just doesn't manage the lifetime of the list contents. If you implement the view abstraction as a pointer to the list's first node, then how will the view object know about changes to what the "real"/lifetime-managing list considers to be the first node? It can't- unless the lifetime-managing list keeps an internal list of all views of itself which are alive and also updates those (which itself is a performance and space overhead), and even then, what about the reverse? If the view abstraction was used to change what's considered the first node, how would the lifetime-managing list know about that change? The simplest, sane solution is to have an extra level of indirection: make the view point to the list instead of to what was the list's first node when the view was created.
Considering Requirements on Time Complexity of getting the list size
I'm pretty sure a std::list can't just hold pointers to front and back nodes. For one thing, since c++11 requires that std::list::size() is O(1), std::list probably has to keep track of its size at all times in a counter member- either storing it in itself, or doing some kind of size-tracking in each node struct, or some other implementation-defined behaviour. I'm pretty sure the simplest and most performant way to have multiple moveable references (non-const pointers) to something that needs to do this kind of bookkeeping is to just add another level of indirection.
You could try to "skip" the indirection layer required by the bookkeeping for specific cases that don't require that information, which is the iterators/node-pointers approach, which I'll comment on later. I can't think of a better place or way to store that bookkeeping other than with the collection itself. Ie. If the list interface has requirements that require such bookkeeping, an extra layer of indirection for each user of the list implementation has a very strong design rationale.
If Context Permits
If you don't care about having O(1) to get the size of your list, and you know that what is considered the first node will not change for the lifetime of the short-lived object, then you can write your own List class list-view class and make your own context-specific optimizations. That's one of the big selling-points of languages like C++: You get a nice standard library that does commonly useful things, and when you have a specific scenario where some features of those tools aren't required and are resulting in unnecessary overhead, you can build your own tool/abstraction (or possibly use someone else's library).
Commentary on std::unique_ptr + reference
Your first snippet works, but you can probably get some better implicit constructors and such for SortLiveObject by using std::reference_wrapper, since the default implicity-declared copy-assignment and default-construct functions get deleted when there's a reference member.
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::reference_wrapper<std::list<int>> list; };
Commentary on std::shared_ptr + std::weak_ref
Like #Adrian Maire suggested, std::shared_ptr in the longer-lived, object which might move while the shorter-lived object exists, and std::weak_ptr in the shorter-lived object is a working approach, but it probably has more overhead (at least coming from the ref-count) than using std::unique_ptr + a reference, and I can't think of any generalized pros, so I wouldn't suggest it unless you already had some other reason to use a std::shared_ptr. In the scenario you gave, I'm pretty sure you do not.
Commentary on Storing iterators/node-pointers in the short-lived object
#Daniel Langr already commented about this, but I'll try to expand.
Specifically for std::list, there is a possible standard-compliant solution (with several caveats) that doesn't have the extra indirection of the smart pointer. Caveats:
You must be okay with only having an iterator interface for the shorter-lived object (which you indicated that you are not).
The front and back iterators must be stable for the lifetime of the shorter-lived object. (the iterators should not be deleted from the list, and the shorter-lived object won't see new list entries that are pushed to the front or back by someone using the longer-lived object).
From cppreference.com's page for std::list's constructors:
After container move construction (overload (8)), references, pointers, and iterators (other than the end iterator) to other remain valid, but refer to elements that are now in *this. The current standard makes this guarantee via the blanket statement in [container.requirements.general]/12, and a more direct guarantee is under consideration via LWG 2321.
From cppreference.com's page for std::list:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
But I am not a language lawyer. I could be missing something important.
Also, you replied to Daniel saying:
Some iterators get invalid when moving the container (e.g. insert_iterator) #DanielLangr
Yes, so if you want to be able to make std::input_iterators, use the std::unique_ptr + reference approach and construct short-lived std::input_iterators when needed instead of trying to store long-lived ones.
If the list owner will be moved, then you need some memory address to share somehow.
You already indicated the unique_ptr. It's a decent solution if the non-owners don't need to save it internally.
The std::shared_ptr is an obvious alternative.
Finally, you can have a std::shared_ptr in the owner object, and pass std::weak_ptr to non-owners.

C++ map::erase behaviour

I wish to take a reference to an object out of one C++ std::map and put it in another - eg to move an object from a low to a high priority tree.
If I erase the object from one tree, do I invalidate it completely? What if I have it in both trees at once and then erase (ie the object is referenced twice).
The documentation implies that the object will be invalidated in either case, but it's not 100% clear IMHO.
update here's an example - but looks to me that this just isn't going to work - using references at least - I'll have to use some form of pointer...
PartialPage oldPage = hTree->oldestPage();
lTree->insertOldPage(oldPage);
hTree->erase(oldPage);
NB oldestPage() returns a reference to a PartialPage
Containers own the elements within them, by virtue of storing their own copies. Your object is not in "both containers" at any time.
If you wish to introduced shared-resource semantics, you will have to store std::shared_ptr<PartialPage>s in your maps.

Does std::move invalidate pointers?

Assume the following:
template<typename Item>
class Pipeline
{
[...]
void connect(OutputSide<Item> first, InputSide<Item> second)
{
Queue<Item> queue;
first.setOutputQueue(&queue);
second.setInputQueue(&queue);
queues.push_back(std::move(queue));
}
[...]
std::vector<Queue<Item> > queues;
};
Will the pointers to queue still work in "first" and "second" after the move?
Does std::move invalidate pointers?
No. An object still exists after being moved from, so any pointers to that object are still valid. If the Queue is sensibly implemented, then moving from it should leave it in a valid state (i.e. it's safe to destroy or reassign it); but may change its state (perhaps leaving it empty).
Will the pointers to queue still work in "first" and "second" after the move?
No. They will point to the local object that's been moved from; as described above, you can't make any assumptions about that object's state after the move.
Much worse than that is that when the function returns, it's destroyed, leaving the pointers dangling. They are now invalid, not pointing to any object, and using them will give undefined behaviour.
Perhaps you want them to point to the object that's been moved into queues:
queues.push_back(std::move(queue));
first.setOutputQueue(&queue.back());
second.setInputQueue(&queue.back());
but, since queues is a vector, those pointers will be invalidated when the queue next reallocates its memory.
To fix that problem, use a container like deque or list which doesn't move its elements after insertion. Alternatively, at the cost of an extra level of indirection, you could store (smart) pointers rather than objects, as described in Danvil's answer.
The pointers will not work, because queue is a local object which will be deleted at the end of connect. Even by using std::move you still create a new object at a new memory location. It will just try to use as much as possible from the "old" object.
Additionally the whole thing will not work at all independent of using std::move as push_back possibly has to reallocate. Thus a call to connect may invalidate all your old pointers.
A possible solution is creating Queue objects on the heap. The following suggestion uses C++11:
#include <memory>
template<typename Item>
class Pipeline
{
[...]
void connect(OutputSide<Item> first, InputSide<Item> second)
{
auto queue = std::make_shared<Queue<Item>>();
first.setOutputQueue(queue);
second.setInputQueue(queue);
queues.push_back(queue);
}
[...]
std::vector<std::shared_ptr<Queue<Item>>> queues;
};
Others provided nice and detailed explanations, but your question indicates that you do not understand fully what move does, or what it is designed to do. I'll try to describe it in simple words.
move, as the name implies, is meant to move things. But what can be moved? You cannot move an object once it is allocated somewhere. Recall the new moving-construtcor added recently, which resembles the copy-constructor..
So, it's all about the "contents" of an object. Both copy- and move-constructors are meant to operate on the "contents". So is the std::move. It is meant to move the contents of one object into the target, and in contrast to the copy, it is meant to leave no trace of contents in the original location.
It is meant to be used everywhere where something forces us to make a copy which we don't really care about and which we really'd like to actually omit, and only have the contents already in the target place.
That is, the usage of "move" indicates that a copy will be made, contents will be moved there, and original will be cleared (sometimes some steps might be skipped, but still, that's the basic idea of a move).
This clearly indicates that, even if the original survives, any pointers to the original object will not point to the destination, which received the content. At best, they will point to the original thing that was just 'cleared'. At worst, it will point to a completely unusable thing.
Now look at your code. It fills the queue and then takes the pointer to the original, then moves the queue.
I hope that's clear now what's happening and what you can do with it.

vector copy constructor C++ : does it have to be linear time?

I have a vector containing objects of type STL map, and I do vector.push_back(some map).
This unfortunately calls the map copy constructor, and wastes a lot of time. I understand that i can get around this by keeping a vector of (smart) pointers to maps - but this got me wondering - I read that STL anyway keeps its data on the heap and not on the stack - so why is the copy ctor not O(1) time, by simply copying pointers?
If you don't need the original map anymore after pushing back a copy back into the vector, write:
some_vector.push_back(std::move(some_map));
If you don't have a C++11 compiler yet, add an empty map and then swap that with the original:
some_vector.resize(some_vector.size() + 1);
some_vector.back().swap(some_map);
To answer your question directly: to do that, it would have to start with some sort of copy on write mechanism -- when you put something into a vector, it's required to be a copy of the original (or at least act like one). For example, if I push a map onto my vector, and then remove an item from the original map, that item should still be there in the copy of the map that was pushed onto the vector.
Then it would have to keep track of all the pointers, and ensure that the pointee (the map in this case) remained valid until all those pointers were themselves destroyed. It's certainly possible to do that. Quite a few languages, for example, provide garbage collection largely for this reason. Most of those change the semantics of things, so when/if you (for example) create a vector of maps, putting a map into the vector has reference semantics -- i.e., when you modify the original map, that's supposed to change any "copies" of it that you put into other collections.
As you've observed, you can do any/all of the above in C++ if you really want. The reason it doesn't right now is that most of the C++ standard library is built around value semantics instead of reference semantics. Either is (IMO, anyway) a perfectly valid and reasonable approach -- some languages take one, others take the other. Either/both can work just fine, but value semantics happens to be the choice that was made in C++.
If you want to copy pointers, create a vector of pointers to map. You can do that.
std::vector<std::map<A,B>* > x;
It doesn't do this automatically because it can't know who you want to manage the memory. Should the objects of the map be destroyed when the vector goes out of scope. What if the original map is still in scope?

QMap/QHash operator[] returned reference validity

I was wondering for how long the reference to a value inside a Qt container, especially a QHash or a QMap is valid. By valid I mean if it is guaranteed to still point to the correct location inside the map/hash after inserting or removing other elements.
Let's the following code:
QHash<char,int> dict; // or QMap<char,int> dict;
dict.insert('a', 1);
int& val(dict['a']);
dict.insert('b', 2);
val = 3; // < will this work or lead to a segfault
Will setting the value at the last line correctly update the value associated with a to 3 or will it lead to a segfault or will it be undefined (so work sometimes, segfault other times, depending on whether the data structure had to be reorganized internally, like resizing of the hash-table array). Is the behavior the same for QMap and QHash, or will one work and the other not?
This is fully covered in the documentation — you must have missed it!
Iterators of both types are invalidated when the data in the container
is modified or detached from implicitly shared copies due to a call to
a non-const member function.
So, although I would expect iterators/references to remain valid in practice in the scenario you described above, you shall not rely on this. Using them in this way shall invoke Undefined Behaviour.
This holds for QHashIterator and QMutableHashIterator, as well as bare references. Beware of non-authoritative references claiming the opposite, relying on implementation details that may change at any time.
There is nothing wrong with using references at QMap/QHash elements unless you delete the node you are referencing to. The elements of qt containers do not get reallocated every time a new elements is inserted. However I cannot see any good reason for using references to container elements.
For more details check this excellent article about qt containers internal implementation