STL heap containing pointers to objects - c++

I have an std::list<MyObject*> objectList container that I need to sort and maintain in the following scenario:
Each object has a certain field that supplies a cost (a float value, for example). That cost value is used to compare two objects as if they were floating-point numbers.
The collection must be ordered (ascending) and must quickly find the correct position for a newly inserted element.
It is possible to delete the lowest element (in terms of cost), and it is also possible to update the cost of several arbitrarily positioned elements. The list must then be reordered as fast as possible, taking advantage of its already sorted nature.
Could I use any other stl container/mechanism to allow for the three behavioral properties? It pretty much resembles a heap and I thought using make_heap could be a good way to sort the list. I need to have a container of pointers, since there are several other data structures that rely on these pointers.
How then can I choose a better container that's also pointer friendly and allows sorting by looking at the comparison operators of the pointed types?
CLARIFICATION: I need an stl container that best fits the scenario and can successfully wrap pointers or references for that matter. (For example, I read briefly that the std::set container could be a good candidate, but I have no experience with it).
A current implementation, based on the answers below:
struct SHafleEdgeComparatorFunctor
{
    // Ordered containers may call the comparator through a const object, so operator() should be const.
    bool operator()(SHEEdge* lhs, SHEEdge* rhs) const
    {
        return (*lhs) < rhs; // uses SHEEdge::operator<(SHEEdge*) shown below
    }
};
std::multiset<SHEEdge*, SHafleEdgeComparatorFunctor> m_edges;
Of course, the SHEEdge data structure has an overloaded operator:
bool operator<(SHEEdge* rhs)
{
    return this->GetCollapseError() < rhs->GetCollapseError();
}

I would indeed use std::set. The tricky bit in your requirements is to update existing elements.
A std::set is always sorted. You will have to either wrap your pointers in a class with a useful compare operator or you have to pass a comparison predicate to the set.
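Then you get the sorted property automatically, and removing the lowest element (erase(begin())) takes amortized constant time.
You can also update an object's cost in logarithmic time: remove the object from the set, change its cost, and re-insert it. The removal has to happen before the cost changes, because the set locates elements using their current cost. This will be as fast as it can be for a sorted container.
Insertion and deletion are fast (logarithmic) in a set.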
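A minimal sketch of the set-based approach, assuming a hypothetical `Edge` type whose public `cost` field stands in for the question's object. It uses `std::multiset` because several objects may share a cost. With a multiset, `erase(key)` would remove every element with an equal cost, so the sketch erases the one matching pointer by iterator before changing the cost and re-inserting it.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the question's object: only the cost matters here.
struct Edge
{
    float cost;
};

// Orders pointers by the cost of the pointed-to objects.
struct EdgeCostLess
{
    bool operator()(const Edge* lhs, const Edge* rhs) const
    {
        return lhs->cost < rhs->cost;
    }
};

using EdgeQueue = std::multiset<Edge*, EdgeCostLess>;

// Removes and returns the lowest-cost edge (the queue must not be empty).
Edge* popLowest(EdgeQueue& queue)
{
    Edge* lowest = *queue.begin();
    queue.erase(queue.begin()); // erase by iterator: removes exactly one element
    return lowest;
}

// Changes an edge's cost while keeping the multiset ordering valid:
// the element must be taken out BEFORE its key changes.
void updateCost(EdgeQueue& queue, Edge* edge, float newCost)
{
    // equal_range finds all edges with the same (old) cost; erase only this pointer.
    auto range = queue.equal_range(edge);
    for (auto it = range.first; it != range.second; ++it)
    {
        if (*it == edge)
        {
            queue.erase(it);
            break;
        }
    }
    edge->cost = newCost;
    queue.insert(edge);
}
```

Each update is one search, one erase, and one insert, all logarithmic in the size of the set.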

I'd start using a smart pointer like shared_ptr instead of a raw pointer (raw pointers are good e.g. if they are observing pointers, like pointers passed as function parameters, but when you have ownership semantics, like in this case, it's better to use a smart pointer).
Then, I'd start with std::vector as a container.
So, try making it vector<shared_ptr<MyObject>>.
You can measure performance of it compared to list<shared_ptr<MyObject>>.
(Note also that std::list has kind of more overhead than std::vector, since it's a node-based container, and each node has some overhead; instead std::vector allocates a contiguous chunk of memory to store its data, in this case the shared_ptrs; so std::vector is also more "cache-friendly", etc.)
In general, std::vector offers very good performance, and it's a good option as a "first choice" container. In any case, your mileage may vary, and the best thing is to measure performance (speed) to get a better understanding of your particular case.
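The question mentions `make_heap`. One way to combine that idea with a `vector<shared_ptr<MyObject>>` is to use the standard heap algorithms, sketched below with a hypothetical `MyObject` that has a public `cost` field. A heap keeps only the lowest element at the front; it does not keep the whole sequence sorted. After arbitrary cost changes, this sketch simply rebuilds the heap with `std::make_heap` (linear time), so it is worth measuring against the set-based approach.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical object with a cost; stands in for the question's MyObject.
struct MyObject
{
    float cost;
};

using ObjectPtr = std::shared_ptr<MyObject>;

// The heap algorithms build a max-heap with respect to the comparator,
// so comparing with ">" puts the LOWEST cost at the front.
struct HigherCost
{
    bool operator()(const ObjectPtr& lhs, const ObjectPtr& rhs) const
    {
        return lhs->cost > rhs->cost;
    }
};

void pushObject(std::vector<ObjectPtr>& heap, ObjectPtr obj)
{
    heap.push_back(std::move(obj));
    std::push_heap(heap.begin(), heap.end(), HigherCost{});
}

// Removes and returns the lowest-cost object (the heap must not be empty).
ObjectPtr popLowest(std::vector<ObjectPtr>& heap)
{
    std::pop_heap(heap.begin(), heap.end(), HigherCost{}); // moves the minimum to the back
    ObjectPtr lowest = std::move(heap.back());
    heap.pop_back();
    return lowest;
}

// After costs of arbitrary elements change, the heap property may be broken;
// std::make_heap restores it in linear time.
void rebuild(std::vector<ObjectPtr>& heap)
{
    std::make_heap(heap.begin(), heap.end(), HigherCost{});
}
```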

If I understand correctly what you are asking, you are looking for the correct container to use.
Indeed, std::set seems to be the correct container for what you want to do, but it will depend on all the use cases:
Do you need to have O(1) access to the elements?
What is the operation used that will have the most important cost?
std::set uses a key to sort the elements and doesn't allow duplicates (if you want duplicates, have a look at std::multiset). When you add an element, it is automatically inserted in the correct position. Generally, you don't want to use raw pointers as the key, as pointers can be null.
Another alternative could be to use a std::vector<std::shared_ptr<MyObject>>. As @MikePro said, it is good practice to hold the pointers in smart pointers, so that you don't have to delete them manually (and to avoid memory leaks, for example in case of an exception). If you use a vector, you will have to use functions like std::sort and std::find from the <algorithm> header, or std::vector::insert.
Generally, this image helps you find your container. It's not perfect (you have to know a bit more than what is displayed), but it usually does its job well:
[Image: container-selection chart]

Related

How to use C++ std::sets as building blocks of a class?

I need a data structure that satisfies the following:
stores an arbitrary number of elements, where each element is described by 10 numeric metrics
allows fast (log n) search of elements by any of the metrics
allows fast (log n) insertion of new elements
allows fast (log n) removal of elements
And let's assume that the elements are expensive to construct.
I came up with the following plan:
store all elements in a vector called DATA.
use 10 std::sets, one for each of the 10 metrics. Each std::set is lightweight: it contains only integers, which are indexes into the vector DATA. The comparison operators 'look up' the appropriate element in DATA and then select the appropriate metric:
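// DATA is the global vector described above; each element has a `coords` array of 10 metrics.
template <int C>
struct Cmp
{
    bool operator()(int const a, int const b) const
    {
        // Order by metric C; break ties by index so distinct elements never compare equal.
        return (DATA[a].coords[C] != DATA[b].coords[C])
            ? (DATA[a].coords[C] < DATA[b].coords[C])
            : (a < b);
    }
};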
Elements are never modified or removed from the vector. A new element is pushed back to DATA, and then its index (DATA.size()-1) is inserted into the sets (set<int, Cmp<..> >). To remove an element, I set a flag in the element saying that it is deleted (without actually removing it from the DATA vector) and then remove the element's index from all ten std::sets.
This works fine as long as DATA is a global variable. (It also somewhat abuses the type system by making the templated struct Cmp dependent on a global variable.)
However, I was not able to enclose the DATA vector and the std::sets (set<int, Cmp<...> >) inside a class and then 'index' DATA with those std::sets. For starters, the comparison operator Cmp defined inside an outer class has no access to the outer class's fields (so it cannot access DATA). I also cannot pass the vector to the Cmp constructor, because Cmp is being constructed by std::set, and std::set expects a comparison operator with a constructor that has no arguments.
I have a feeling I'm working against C++ type system and trying to achieve something that the type system is purposely preventing me from doing. (I'm trying to make std::set depend on a variable that is going to be constructed only at runtime.) And while I understand why the type system might not like what I do, I think this is a legitimate use case.
Is there a way to implement the data structure/class I described above without providing a re-implementation of std::set/red-black tree? I hope there may be a trick I have not thought of yet. (And yes, I know that boost has something, but I'd like to stick to the standard library.)
When I read something like "look up foo by a value bar", my initial reaction is to use a map<> or something similar. There are some implications to this though:
Keys in an std::map (or values in an std::set) are unique, so no two elements can share the same key and accordingly no two data objects would be able to have the same metric. If multiple data objects can have the same metric (this isn't clear from your question), using an std::multimap (or std::multiset) would work though.
If the keys are constant and stored within the elements themselves, using a set<data*,cmp> is a common approach. The comparator then just retrieves the according field from the objects and compares them. Lookup then requires creating a temporary object and using find() with it. Some implementations also have an extension that allows searching with a different type, which would make this much easier but also make porting require actual work.
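The extension mentioned above was later standardized: since C++14, the ordered associative containers support heterogeneous lookup when the comparator declares a member type named `is_transparent`. Here is a minimal sketch, assuming a hypothetical `Data` struct whose `key` field is the metric being indexed (`ByKey` and `findByKey` are illustrative names):

```cpp
#include <cassert>
#include <set>

// Hypothetical element type; `key` plays the role of one metric.
struct Data
{
    int key;
    double payload;
};

// A "transparent" comparator: the is_transparent member lets std::set::find
// (C++14 and later) accept a plain int instead of a temporary Data object.
struct ByKey
{
    using is_transparent = void;

    bool operator()(const Data* a, const Data* b) const { return a->key < b->key; }
    bool operator()(const Data* a, int k) const { return a->key < k; }
    bool operator()(int k, const Data* b) const { return k < b->key; }
};

// Looks up an element by key without constructing a temporary Data.
const Data* findByKey(const std::set<const Data*, ByKey>& index, int key)
{
    auto it = index.find(key);
    return it == index.end() ? nullptr : *it;
}
```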
It is important that the fields used as keys remain constant, though, because if you modify them, you implicitly change the order of the set<>. This is the reason that a set<>'s elements are effectively constant, i.e. even a plain iterator dereferences to a constant value. If you store pointers, though, you can easily get around that, because a constant pointer is something different from a pointer to a constant. Don't shoot yourself in the foot with that!
If the metrics are not so much a property of the objects themselves (or you don't mind storing them redundantly), using an std::map would be a natural choice. Storing the same object under multiple keys, depending on the metric, can be done in separate containers (map<int,data*> c[10];). Alternatively, you can do that in a single map using e.g. a pair<metric,value> as key (map<pair<int,int>,data*> c;).
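A sketch of the single-map variant, assuming a hypothetical `Data` struct with ten integer metrics. It uses `std::multimap` rather than `std::map` so that, as noted above, several objects can share the same value for a metric; `Index`, `addToIndex`, and `findByMetric` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical element type holding 10 integer metrics.
struct Data
{
    int metrics[10];
};

// One container for all metrics: the key is (metric index, metric value).
// A multimap allows several elements to share the same value of a metric.
using Index = std::multimap<std::pair<int, int>, Data*>;

void addToIndex(Index& index, Data* d)
{
    for (int m = 0; m < 10; ++m)
        index.emplace(std::make_pair(m, d->metrics[m]), d);
}

// Returns an element whose metric m equals value, or nullptr if there is none.
Data* findByMetric(const Index& index, int m, int value)
{
    auto it = index.find(std::make_pair(m, value));
    return it == index.end() ? nullptr : it->second;
}
```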
Using a vector<> to store the actual elements and only referencing them as either pointers or indices in a map surely works. I'd take the pointers though, as this is what allows the above approaches using a set or map to work. Without that, the comparator would have to store a reference to the container, where at the moment it just uses the global DATA container. Getting this to work with a vector is tricky though, since it reallocates its elements when growing, as you correctly pointed out. I'd consider a different container type, like std::list or std::deque. The former would allow erasing elements, too, but it has a higher per-element overhead. The latter has a relatively low per-element overhead, only slightly above std::vector. You could then even go so far as to store iterators instead of pointers, which helps debugging provided you use a "checked STL" for that. Still, you will have to do some manual bookkeeping which object is still referenced somewhere and which one isn't.
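Storing a reference to the container in the comparator can be sketched as follows. Note that `std::set` does not require a default-constructible comparator: it has a constructor taking a comparator object, which it copies and uses for all comparisons. This sketch keeps the question's index-based design (indices stay valid when the vector reallocates) and omits deletion; `Element`, `CmpByMetric`, and `MetricIndex` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical element type: 10 numeric metrics per element.
struct Element
{
    std::array<double, 10> coords;
};

// Comparator that points at the owning vector instead of a global DATA.
// It is handed to each std::set through the set's constructor.
struct CmpByMetric
{
    const std::vector<Element>* data;
    int metric;

    bool operator()(int a, int b) const
    {
        const double va = (*data)[a].coords[metric];
        const double vb = (*data)[b].coords[metric];
        return va != vb ? va < vb : a < b; // tie-break on index, as in the question
    }
};

class MetricIndex
{
public:
    MetricIndex()
    {
        // Each set gets its own comparator instance bound to this object's data_.
        for (int m = 0; m < 10; ++m)
            sets_.emplace_back(CmpByMetric{&data_, m});
    }

    // Non-copyable: the comparators point at this instance's data_.
    MetricIndex(const MetricIndex&) = delete;
    MetricIndex& operator=(const MetricIndex&) = delete;

    int add(const Element& e)
    {
        data_.push_back(e); // may reallocate; stored indices remain valid
        const int idx = static_cast<int>(data_.size()) - 1;
        for (auto& s : sets_)
            s.insert(idx);
        return idx;
    }

    // Index of the element with the smallest value of metric m (requires at least one element).
    int smallest(int m) const { return *sets_[m].begin(); }

private:
    std::vector<Element> data_;
    std::vector<std::set<int, CmpByMetric>> sets_;
};
```

Because each comparator points at the owning object's `data_`, the class deletes its copy operations; a copy would otherwise keep comparing against the original object's data.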
Instead of using a separate container, you could also allocate the elements dynamically, although that itself has some overhead. If the overhead per element is not an issue, you could then use reference-counted smart pointers. If the application is a one-shot process, you could also use raw pointers and let the OS reclaim the memory on exit.
Note that I assume that storing multiple copies of the data objects is not an option. If that was the case, you could just as well have a map<int,data> m[10];, where each map stores its own copy of the data objects. All the bookkeeping issues would then be resolved, but at the price of a 10x overhead.

vector vs. list from stl - remove method

std::list has a remove method, while the std::vector doesn't. What is the reason for that?
std::list<>::remove is a physical removal method, which can be implemented by physically destroying list elements that satisfy certain criteria (by physical destruction I mean the end of element's storage duration). Physical removal is only applicable to lists. It cannot be applied to arrays, like std::vector<>. It simply is not possible to physically end storage duration of an individual element of an array. Arrays can only be created and destroyed as a whole. This is why std::vector<> does not have anything similar to std::list<>::remove.
The universal removal method applicable to all modifiable sequences is what one might call logical removal: the target elements are "removed" from the sequence by overwriting their values with values of elements located further down in the sequence. I.e. the sequence is shifted and compacted by copying the persistent data "to the left". Such logical removal is implemented by freestanding functions, like std::remove. Such functions are applicable in equal degree to both std::vector<> and std::list<>.
In cases where the approach based on immediate physical removal of specific elements applies, it will work more efficiently than the generic approach I referred to above as logical removal. That is why it was worth providing it specifically for std::list<>.
std::list::remove removes all items in a list that match the provided value.
std::list<int> myList;
// fill it with numbers
myList.remove(10); // physically removes all instances of 10 from the list
It has a similar function, std::list::remove_if, which allows you to specify some other predicate.
std::list::remove (which physically removes the elements) is required to be a member function as it needs to know about the memory structure (that is, it must update the previous and next pointers for each item that needs to be updated, and remove the items), and the entire function is done in linear time (a single iteration of the list can remove all of the requested elements without invalidating any of the iterators pointing to items that remain).
You cannot physically remove a single element from a std::vector. You either reallocate the entire vector, or you move every element after the removed items and adjust the size member. The "cleanest" implementation of that set of operations would be to do
// hypothetical member function inside std::vector<T>
void vector::remove(const T& t)
{
    // std::remove needs the iterator range as well as the value
    erase(std::remove(begin(), end(), t), end());
}
Which would require std::vector to depend on <algorithm> (something that is currently not required).
The "sorting" (shifting the kept elements toward the front) is needed to remove the items without requiring multiple allocations and copies. (You do not need to reorder a list to physically remove elements.)
Contrary to what others are saying, it has nothing to do with speed. It has to do with the algorithm needing to know how the data is stored in memory.
As a side note: This is also a similar reason why std::remove (and friends) do not actually remove the items from the container they operate on; they just move all the ones that are not going to be removed to the "front" of the container. Without the knowledge of how to actually remove an object from a container, the generic algorithm cannot actually do the removing.
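A small sketch contrasting the two kinds of removal on `int` containers: `std::remove` compacts the kept elements but leaves the container's size unchanged, the remove-erase idiom then lets the vector itself shrink, and `std::list::remove` unlinks the matching nodes directly. The function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// The generic algorithm: moves the elements to keep to the front and returns
// the new logical end. It cannot shrink the container, so v.size() is unchanged.
std::size_t compactWithoutErasing(std::vector<int>& v, int value)
{
    auto newEnd = std::remove(v.begin(), v.end(), value);
    return static_cast<std::size_t>(newEnd - v.begin());
}

// The remove-erase idiom: the container's own erase() performs the actual removal.
void removeFromVector(std::vector<int>& v, int value)
{
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// std::list knows its node structure, so it can unlink matching nodes itself.
void removeFromList(std::list<int>& l, int value)
{
    l.remove(value);
}
```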
Consider the implementation details of both containers. A vector has to provide a contiguous memory block for storage. In order to remove the element at index n < N-1 (with N being the vector's length), all elements from n+1 to N-1 need to be moved. The various functions in the <algorithm> header implement that behavior, like std::remove or std::remove_if. The advantage of these being free-standing functions is that they can work for any type that offers the needed iterators.
A list on the other hand, is implemented as a linked list structure, so:
It's fast to remove an element from anywhere
It's impossible to do it as efficiently using iterators (since the internal structure has to be known and manipulated).
In general in STL the logic is "if it can be done efficiently - then it's a class member. If it's inefficient - then it's an outside function"
This way they make the distinction between "correct" (i.e. "efficient") use of classes vs. "incorrect" (inefficient) use.
For example, random access iterators have a += operator, while other iterators use the std::advance function.
And in this case, removing elements from an std::list is very efficient, as you don't need to move the remaining values like you do in std::vector.
It's all about efficiency AND reference/pointer/iterator validity. List items can be removed without disturbing any other pointers and iterators. This is not true for a vector and other containers in all but the most trivial cases. Nothing prevents you from using the external strategy, but you have a superior option. That said, this fellow said it better than I could on a duplicate question.
From another poster on a duplicate question:
The question is not why std::vector does not offer the operation, but rather why does std::list offer it. The design of the STL is focused on the separation of the containers and the algorithms by means of iterators, and in all cases where an algorithm can be implemented efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can be implemented much more efficiently with knowledge of the container. That is the case of removing elements from a container. The cost of using the remove-erase idiom is linear in the size of the container (which cannot be reduced much), but that hides the fact that in the worst case all but one of those operations are copies of the objects (the only element that matches is the first), and those copies can represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity of the operation will still be linear, but the associated cost for each one of the elements removed is very low, a couple of pointer copies and releasing of a node in memory. At the same time, the implementation as part of the list can offer stronger guarantees: pointers, references and iterators to elements that are not erased do not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific container is std::list::sort, that uses mergesort that is less efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with iterators unless there is a strong reason to provide a particular implementation in a concrete container.
std::list is designed to work like a linked list. That is, it is designed (you might say optimized) for constant time insertion and removal ... but access is relatively slow (as it typically requires traversing the list).
std::vector is designed for constant-time access, like an array. So it is optimized for random access ... but insertion and removal are really only supposed to be done at the "tail" or "end", elsewhere they're typically going to be much slower.
Different data structures with different purposes ... therefore different operations.
To remove an element from a container you have to find it first. There's a big difference between sorted and unsorted vectors, so in general, it's not possible to implement an efficient remove method for the vector.

When the data structure is a template parameter, how can I tell if an operation will invalidate an iterator?

Specifically, I have a class which currently uses a vector and push_back. There is an element inside the vector I want to keep track of. Pushing back on the vector may invalidate the iterator, so I keep its index around. It's cheap to find the iterator again using the index. I can't reserve the vector as I don't know how many items will be inserted.
I've considered making the data structure a template parameter, and perhaps list may be used instead. In that case, finding an iterator from the index is not a trivial operation. Since pushing back on a list doesn't invalidate iterators to existing elements, I could just store this iterator instead.
But how can I code a generic class which handles both cases easily?
If I can find out whether push_back will invalidate the iterator, I could store the iterator and update it after each push_back by storing the distance from the beginning before the operation.
You should probably try to avoid this flexibility. Quote from Item 2 "Beware the illusion of container-independent code" from Effective STL by Scott Meyers:
Face the truth: it's not worth it. The different containers are different, and they have strengths and weaknesses that vary in significant ways. They're not designed to be interchangeable, and there's little you can do to paper that over. If you try, you're merely tempting fate, and fate doesn't like to be tempted.
If you really, positively, definitely have to maintain valid iterators, use std::list. If you also need to have random access, try Boost.MultiIndex (although you'll lose contiguous memory access).
If you look at the standard container adaptors (std::stack, std::queue), you see that they support the intersection of the adaptable containers' interfaces, not their union.
I'd create a second class, whose responsibility would be to return the iterator you are interested in.
It should also be parametrized with the same template parameter, and then you can specialize it for any type you want (vector/list etc.). Inside your specializations you can use any method you want.
So it's a traits-based solution.
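Here is a minimal sketch of such a traits-style class, assuming the only mutation is `push_back` (inserting before the tracked element would shift the stored index). The primary template stores an index and works for `std::vector` or `std::deque`, while the `std::list` specialization stores the iterator itself. `ElementTracker` is a hypothetical name.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Primary template: remembers an element by its offset from begin(), which
// survives push_back on a vector even when iterators are invalidated.
template <typename Container>
class ElementTracker
{
public:
    using iterator = typename Container::iterator;

    ElementTracker(Container& c, iterator it) : pos_(std::distance(c.begin(), it)) {}

    iterator get(Container& c) const { return std::next(c.begin(), pos_); }

private:
    typename Container::difference_type pos_;
};

// Specialization for std::list: push_back never invalidates iterators to
// existing elements, so the iterator itself can be stored.
template <typename T, typename Alloc>
class ElementTracker<std::list<T, Alloc>>
{
public:
    using Container = std::list<T, Alloc>;
    using iterator = typename Container::iterator;

    ElementTracker(Container&, iterator it) : it_(it) {}

    iterator get(Container&) const { return it_; }

private:
    iterator it_;
};
```

The generic class then holds an `ElementTracker<Container>` and never needs to know which strategy it got.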
If you really want to stick with the vector and keep that functionality, take a look at the capacity() function:
http://en.cppreference.com/w/cpp/container/vector/capacity
Wrap your push_backs in a dedicated function, or even better, wrap the whole std::vector in your class, and before each push_back compare capacity() against size() to check whether a reallocation will happen.
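A sketch of that capacity check, using a hypothetical `pushBackTracked` helper. The standard guarantees that `push_back` reallocates only when the new size would exceed `capacity()`; otherwise only `end()` is invalidated, so the tracked iterator needs to be recomputed only in the reallocating case. The tracked iterator is assumed to refer to an element, not to `end()`.

```cpp
#include <cassert>
#include <vector>

// Appends `value` and keeps `tracked` pointing at the same element.
template <typename T>
void pushBackTracked(std::vector<T>& v, const T& value, typename std::vector<T>::iterator& tracked)
{
    if (v.size() == v.capacity())
    {
        // A reallocation will happen: remember the offset while `tracked` is still valid.
        const auto offset = tracked - v.begin();
        v.push_back(value);
        tracked = v.begin() + offset;
    }
    else
    {
        v.push_back(value); // no reallocation: `tracked` remains valid
    }
}
```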

Iterating through a sequence while modifying it. Use vector or List ? C++/ STL

Suppose I have a long sequence of unordered elements S: s1, s2, s3, ... of an arbitrary but fixed data type, through which I wish to iterate and delete certain elements according to some boolean criterion.
Now, if after iterating through the sequence I am not interested in the final ordering of the sequence, I can store my sequence in two ways:
Use a plain ol' std::list to represent the sequence. Perform removal with the std::list methods.
Use a std::vector to represent the sequence. If a certain element fails the criterion and has to be deleted swap it with the last vector element and perform a pop_back.
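The second option (swap with the last element, then `pop_back`) can be sketched like this; `swapAndPopIf` is a hypothetical helper name. The order of the surviving elements changes, which the question explicitly allows. `swap` is called after `using std::swap`, so types that provide their own `swap` or move operations are exchanged cheaply rather than copied.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Removes every element for which pred is true, without preserving order.
template <typename T, typename Pred>
void swapAndPopIf(std::vector<T>& v, Pred pred)
{
    std::size_t i = 0;
    while (i < v.size())
    {
        if (pred(v[i]))
        {
            using std::swap;
            swap(v[i], v.back()); // the element moved into slot i is checked next
            v.pop_back();
        }
        else
        {
            ++i;
        }
    }
}
```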
My questions are:
1. Which would be a better/more efficient way, time-wise and/or memory-wise, to store my sequence?
2. If I had to venture a guess, I would say list, because if si's data type is large in memory, swapping would be expensive. Would this reasoning be correct?
In practice, std::vector has a great performance advantage over other containers due to its tight memory locality. If your elements are moreover movable (i.e. inexpensive to swap), then your second option should be your first try. Implement it with the standard remove/erase idiom:
v.erase(std::remove_if(v.begin(), v.end(), predicate), v.end());
You should also set up a second version with a std::list and compare the performance:
l.remove_if(predicate);
The list avoids moving any elements around, so in theory it could be efficient, however the practical effects of memory locality cannot be captured by the language standard and you cannot get around measuring and comparing the actual performance.
(Supposedly, if your element type is huge, like sizeof(T) > 10000, the list will probably start being faster than the vector. Test and compare, and keep your code modular such that changing this later is easy.)
If you have a C++11 compiler, or at least an rvalue-reference-aware one, using swap will cost you very little if your data type isn't flat (i.e., it contains pointers to external resources, or in other words, is expensive to copy), since it will just move your structs around. So if you have such a compiler, create a move constructor (read up on that) for the data type, and you're set. Just use a std::vector from there on.
Now, if your structs are flat (no external resources) and large, you might really want to use a std::list, since the memory overhead would be reasonably small in comparison to your data type's size. Since you only seem to be interested in bidirectional/sequential access to the elements, this might be just the right place to use a list.
A last point, and an important factor, measure. The default container to reach for should always be std::vector. Measure how both perform before blindly deciding on one. Another important factor is if you actually need to do anything else with the containers, like random-access or such stuff.
Edit: Before I forget, you might also just want to create a view over the container holding your data, which might be very cheap.
We can only guess. I'd say that if the objects are easily copyable (e.g. basic types), then std::vector will be more efficient, as removing elements will not alloc/realloc/free any memory. But if the cost of copying elements is significant, then std::list will be better.
But note that with C++11, the copy will be converted into a move, so you should consider the moving cost, which will presumably be much lower than the cost of the copy.
In almost all practical cases, use std::vector initially. As always, write your code first then optimise later, if and when it is needed. If your profiler indicates that vector's inefficiencies are the cause, then try a list. I've almost never seen a performance benefit from it though.

Container access and allocation through the same operator?

I have created a container for generic, weak-type data which is accessible through the subscript operator.
The std::map container allows both data access and element insertion through the operator, whereas std::vector, I think, doesn't.
What is the best (C++ style) way to proceed? Should I allow allocation through the subscript operator or have a separate insert method?
EDIT
I should say, I'm not asking if I should use vector or map, I just wanted to know what people thought about accessing and inserting being combined in this way.
In the case of vectors: subscript notation does not insert; it overwrites.
The rest of this post distills the information from items 1-5 of Effective STL.
If you know the range of your data beforehand, the size is fixed, and you won't insert at locations that have data above them, then you can insert into vectors without unpleasant side effects.
However, in the general case, ad hoc vector insertions have implications such as shifting members upward and reallocating a larger block when the capacity is exhausted (which causes a flood of copies from the old vector's objects to locations in the new vector). Vectors are designed for when you know the locality characteristics of your data.
Vectors come with an insert member function, and in most implementations this function is very clever, in that it can infer optimizations from the iterators you supply. Can't you just use this?
If you want to do ad hoc insertions of data, you should use a list. Perhaps you can use a list to collect the data and then, once it's finalized, populate a vector using the range-based insert or the range-based constructor?
It depends on what you want. A map can be significantly slower than a vector if you wish to use the thing like an array. A map is very helpful if the index you want to use is non-sequential and you have LOADS of them. It's usually quicker to just use a vector, sort it and do a binary search to find what you are after. I've used this method to replace maps in tonnes of software, and I still haven't found a case where doing this with a vector was slower.
So, IMO, std::vector is the better way, though a map MIGHT be useful if you are using it properly.
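A sketch of the sorted-vector replacement for a map, assuming integer keys and string values (`Entry`, `finalize`, and `lookup` are illustrative names). The vector is sorted once after the data is collected, and each lookup is a binary search with `std::lower_bound`. This suits data that is built once and queried many times; inserting into the middle of a sorted vector afterwards is linear.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A sorted vector of (key, value) pairs used as a read-mostly replacement for std::map.
using Entry = std::pair<int, std::string>;

// Sort once, after all entries have been collected.
void finalize(std::vector<Entry>& entries)
{
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// Binary search by key (O(log n)); returns nullptr when the key is absent.
const std::string* lookup(const std::vector<Entry>& entries, int key)
{
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
                               [](const Entry& e, int k) { return e.first < k; });
    return (it != entries.end() && it->first == key) ? &it->second : nullptr;
}
```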
Separate insert method, definitely. The operator[] on std::map is just stupid and makes the code hard to read and debug.
Also, you can't access data from a const context if you're using operator[] to insert (which will lead to un-const-cancer, the even-more-evil cousin of const-cancer).
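To illustrate the const problem: `std::map::operator[]` inserts a default-constructed value when the key is missing, so it has no const overload, whereas `find()` and `at()` work on a const map. Here is a sketch with an explicit insertion function, assuming `std::string` keys and `int` values (`countOf` and `insertCount` are illustrative names):

```cpp
#include <cassert>
#include <map>
#include <string>

// Read-only access: counts[key] would not compile here, because
// std::map::operator[] has no const overload. find() works on a const map.
int countOf(const std::map<std::string, int>& counts, const std::string& key)
{
    auto it = counts.find(key);
    return it == counts.end() ? 0 : it->second;
}

// A separate, explicit insertion function makes the mutation visible at the call site.
bool insertCount(std::map<std::string, int>& counts, const std::string& key, int value)
{
    return counts.insert({key, value}).second; // false if the key already existed
}
```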