Container access and allocation through the same operator? - c++

I have created a container for generic, weak-type data which is accessible through the subscript operator.
The std::map container allows both data access and element insertion through the operator, whereas std::vector I think doesn't.
What is the best (C++ style) way to proceed? Should I allow allocation through the subscript operator or have a separate insert method?
EDIT
I should say, I'm not asking if I should use vector or map, I just wanted to know what people thought about accessing and inserting being combined in this way.

In the case of Vectors: Subscript notation does not insert -- it overwrites.
The rest of this post distils the information from Items 1-5 of Effective STL.
If you know the range of your data beforehand, the size is fixed, and you won't insert at locations that have data above them, then you can insert into vectors without unpleasant side effects.
However, in the general case ad hoc vector insertions have implications, such as shifting members upward and growing the allocation when capacity is exhausted (typically by a constant factor such as doubling, which causes a flood of copies from the old vector's objects to locations in the new storage). Vectors are designed for when you know the locality characteristics of your data.
Vectors come with an insert member function, and in most implementations this function is clever enough to infer optimizations from the iterators you supply. Can't you just use this?
If you want to do ad hoc insertions of data, you should use a list. Perhaps you can use a list to collect the data and then, once it's finalized, populate a vector using the range-based insert or range-based constructor?
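The collect-then-copy idea could look roughly like the sketch below. The function names collect and freeze are made up for illustration, and int stands in for whatever element type you actually store.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Ad hoc insertions go into the list, where they are cheap.
std::list<int> collect()
{
    std::list<int> items;
    items.push_back(3);
    items.push_front(1);
    // Insert in the middle without shifting any other element.
    auto pos = items.begin();
    ++pos;                 // points at 3
    items.insert(pos, 2);
    return items;          // 1, 2, 3
}

// Hypothetical helper: copy a finished list into a vector in one step.
std::vector<int> freeze(const std::list<int>& collected)
{
    // With forward (or better) iterators the range constructor can work out
    // the element count first, so no reallocation happens while copying.
    std::vector<int> result(collected.begin(), collected.end());
    assert(result.size() == collected.size());
    return result;
}
```

Calling freeze(collect()) yields a vector holding the same elements in the same order as the list.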

It depends on what you want. A map can be significantly slower than a vector if you wish to use the thing like an array. A map is very helpful if the index you want to use is non-sequential and you have LOADS of them. It's usually quicker to just use a vector, sort it and do a binary search to find what you are after. I've used this method to replace maps in tonnes of software and I still haven't found a case where it was slower to do this with a vector.
So, IMO, std::vector is the better way, though a map MIGHT be useful if you are using it properly.
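For reference, the "sort it and do a binary search" approach might be sketched as follows. Entry, key_less, sort_by_key and find_value are illustrative names, and the int key with a string value is just an assumption about the data.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;   // hypothetical key/value type

// Orders entries by key only.
bool key_less(const Entry& a, const Entry& b)
{
    return a.first < b.first;
}

// Sort once, after all entries have been added.
void sort_by_key(std::vector<Entry>& entries)
{
    std::sort(entries.begin(), entries.end(), key_less);
}

// Binary search; returns nullptr when the key is absent.
// Precondition: entries is sorted by key.
const std::string* find_value(const std::vector<Entry>& entries, int key)
{
    assert(std::is_sorted(entries.begin(), entries.end(), key_less));
    auto it = std::lower_bound(
        entries.begin(), entries.end(), key,
        [](const Entry& e, int k) { return e.first < k; });
    if (it != entries.end() && it->first == key)
        return &it->second;
    return nullptr;
}
```

This only pays off when lookups greatly outnumber insertions, since every insertion after the sort either breaks the ordering or has to shift elements.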

Separate insert method, definitely. The operator[] on std::map is just stupid and makes the code hard to read and debug.
Also you can't access data from a const context if you're using an operator[] that inserts (which will lead to un-const-cancer, the even more evil cousin of const-cancer).
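A rough sketch of the separate-insert design, to make the const point concrete. The Settings class, its string/int types and the throwing behaviour on a missing key are all assumptions chosen for illustration, not the asker's container.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical container: insertion and lookup are separate operations.
class Settings
{
public:
    // Explicit insertion; returns false if the key already existed.
    bool insert(const std::string& key, int value)
    {
        return data_.insert(std::make_pair(key, value)).second;
    }

    // Mutable access to an existing element; never inserts.
    int& operator[](const std::string& key)
    {
        auto it = data_.find(key);
        if (it == data_.end())
            throw std::out_of_range("no such key: " + key);
        return it->second;
    }

    // Read-only access works on const objects because nothing is inserted.
    const int& operator[](const std::string& key) const
    {
        auto it = data_.find(key);
        if (it == data_.end())
            throw std::out_of_range("no such key: " + key);
        return it->second;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::map<std::string, int> data_;
};

// Reads through a const reference, which std::map::operator[] would not allow.
int read_only(const Settings& s, const std::string& key)
{
    return s[key];
}

// Small usage example.
void settings_example()
{
    Settings s;
    s.insert("width", 80);
    s["width"] = 100;                       // overwrite through operator[]
    assert(read_only(s, "width") == 100);
    assert(s.size() == 1);                  // lookups never grew the container
}
```

Whether a missing key should throw, assert or return a default is a separate design decision; the point here is only that lookup has no side effects.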

Related

STL heap containing pointers to objects

I have an std::list<MyObject*> objectList container that I need to sort and maintain in the following scenario:
Each object has a certain field that supplies a cost (a float value for example). That cost value is used to compare two objects as if they were floating point numbers
The collection must be ordered (ascending) and must quickly find the correct position for a newly inserted element.
It is possible to delete the lowest element (in terms of cost) and it is also possible to update the cost of several arbitrarily positioned elements. The list must be then reordered as fast as possible, taking advantage of its already sorted nature.
Could I use any other stl container/mechanism to allow for the three behavioral properties? It pretty much resembles a heap and I thought using make_heap could be a good way to sort the list. I need to have a container of pointers, since there are several other data structures that rely on these pointers.
How then can I choose a better container that's also pointer friendly and allows sorting by looking at the comparison operators of the pointed types?
CLARIFICATION: I need an stl container that best fits the scenario and can successfully wrap pointers or references for that matter. (For example, I read briefly that the std::set container could be a good candidate, but I have no experience with it).
A current implementation, based on the below answers:
struct SHafleEdgeComparatorFunctor
{
    // Compares the pointed-to edges, not the pointer values.
    bool operator()(SHEEdge* lhs, SHEEdge* rhs) const
    {
        return (*lhs) < rhs;
    }
};
std::multiset<SHEEdge*, SHafleEdgeComparatorFunctor> m_edges;
Of course, the SHEEdge data structure has an overloaded operator:
bool operator<(SHEEdge* rhs)
{
    return this->GetCollapseError() < rhs->GetCollapseError();
}
I would indeed use std::set. The tricky bit in your requirements is to update existing elements.
A std::set is always sorted. You will have to either wrap your pointers in a class with a useful compare operator or you have to pass a comparison predicate to the set.
Then you get the sorted property automatically and you get constant time removal of the lowest element.
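You also get updating of the cost value in log complexity: simply remove the object from the set, change its cost, and re-add it. The removal has to happen while the old cost is still in place, otherwise the set can no longer find the element. This will be as fast as it can be for a sorted container.
Inserting and deleting are fast in a set.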
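The remove, update, re-add cycle could be sketched like this. Edge is a stand-in for the asker's SHEEdge with a plain cost field, and CostLess, EdgeQueue, update_cost and pop_lowest are names invented for the example. A multiset is used so that two edges with equal cost can coexist.

```cpp
#include <cassert>
#include <set>

struct Edge                      // stand-in for the asker's SHEEdge
{
    float cost;
};

// Orders pointers by the cost of the objects they point to.
struct CostLess
{
    bool operator()(const Edge* lhs, const Edge* rhs) const
    {
        return lhs->cost < rhs->cost;
    }
};

using EdgeQueue = std::multiset<Edge*, CostLess>;

// Change an element's cost without breaking the container's ordering:
// take it out while its old cost is still in place, then re-insert.
// If the edge was not in the queue, it is simply inserted.
void update_cost(EdgeQueue& queue, Edge* edge, float new_cost)
{
    // equal_range finds all elements with the same cost; pick ours by address.
    auto range = queue.equal_range(edge);
    for (auto it = range.first; it != range.second; ++it)
    {
        if (*it == edge)
        {
            queue.erase(it);
            break;
        }
    }
    edge->cost = new_cost;
    queue.insert(edge);
}

// Remove and return the cheapest element. Precondition: queue is not empty.
Edge* pop_lowest(EdgeQueue& queue)
{
    assert(!queue.empty());
    Edge* lowest = *queue.begin();
    queue.erase(queue.begin());
    return lowest;
}
```

The container stores non-owning pointers here, so the Edge objects must outlive the queue.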
I'd start using a smart pointer like shared_ptr instead of a raw pointer (raw pointers are good e.g. if they are observing pointers, like pointers passed as function parameters, but when you have ownership semantics, like in this case, it's better to use a smart pointer).
Then, I'd start with std::vector as a container.
So, try making it vector<shared_ptr<MyObject>>.
You can measure performance of it compared to list<shared_ptr<MyObject>>.
(Note also that std::list has kind of more overhead than std::vector, since it's a node-based container, and each node has some overhead; instead std::vector allocates a contiguous chunk of memory to store its data, in this case the shared_ptrs; so std::vector is also more "cache-friendly", etc.)
In general, std::vector offers very good performance, and it's a good option as a "first choice" container. In any case, your mileage may vary, and the best thing is to measure performance (speed) to get a better understanding in your particular case.
If I understand correctly what you are asking, you are looking for the correct container to use.
Indeed std::set seems to be the correct container for the kind of thing you want to do, but it will depend on all the use cases:
Do you need to have O(1) access to the elements?
What is the operation used that will have the most important cost?
std::set uses a key to sort the elements and doesn't allow duplicates (if you want duplicates, have a look at std::multiset). When you add an element, it will automatically be inserted in the correct position. Generally you don't want to use raw pointers as the key, as the pointers can be null.
Another alternative could be to use a std::vector<std::shared_ptr<MyObject>>. As #MikePro said, it is good practice to hold the pointers in smart pointers, so you don't have to delete them manually (and to avoid memory leaks, for example when an exception is thrown). If you use a vector, you will have to use functions such as std::sort and std::find from the <algorithm> header, or std::vector::insert.
Generally this image helps finding your container. It's not perfect (as you have to know a bit more than what is displayed) but it usually does its job well:

Dynamic size of array in c++?

I am confused. I don't know which container I should use. Let me first tell you what I need. Basically I need a container that can store X objects (the number of objects is unknown; it could be 1 - 50k).
I read a lot. Over here, array vs list, it says an array needs to be resized if the number of objects is unknown (I am not sure how to resize an array in C++). It also states that with a linked list, searching for a certain item means looping (iterating) from first to end (or vice versa), while with an array you can specify "array object at index".
Then I looked at other solutions: map, vector, etc. Like this one: array vs vector. Some responders say never use an array.
I am new to C++; I have only used array, vector, list and map before. Now, for my case, what kind of container would you recommend? Let me rephrase my requirements:
Need to be a container
The number of objects stored is unknown but is huge (1 - 40k maybe)
I need to loop through the containers to find specific object
std::vector is what you need.
You have to consider 2 things when selecting a stl container.
Data you want to store
Operations you want to perform on the stored data
There was a good diagram in a question here on SO which depicts this. I cannot find the link to it, but I saved it a long time ago; here it is:
You cannot resize an array in C++, not sure where you got that one from. The container you need is std::vector.
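As a rough illustration of the vector route for the requirements in the question (unknown count, loop to find a specific object): Item, load_items and find_by_id are hypothetical names, and the id/name fields are assumptions about what an object looks like.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Item                      // hypothetical object type
{
    int id;
    std::string name;
};

// Builds a container whose size is only known at run time.
std::vector<Item> load_items(int count)
{
    assert(count >= 0);
    std::vector<Item> items;
    for (int i = 0; i < count; ++i)
        items.push_back(Item{i, "item" + std::to_string(i)});  // grows as needed
    return items;
}

// Linear search for the first item with the given id; nullptr if none.
const Item* find_by_id(const std::vector<Item>& items, int id)
{
    auto it = std::find_if(items.begin(), items.end(),
                           [id](const Item& item) { return item.id == id; });
    return it == items.end() ? nullptr : &*it;
}
```

The returned pointer stays valid only until the vector is next modified, because growth may move the elements.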
The general rule is: use std::vector until it doesn't work, then shift to something that does. There are all sorts of theoretical rules about which one is better, depending on the operations, but I've regularly found that std::vector outperforms the others, even when the most frequent operations are things where std::vector is supposedly worse. Locality seems more important than most of the theoretical considerations on a modern machine.
The one reason you might shift from std::vector is iterator validity. Inserting into a std::vector may invalidate iterators; inserting into a std::list never does.
Do you need to loop through the container, or do you have a key or ID for your objects?
If you have a key or ID, you can use map to quickly access the object by it; if the ID is a simple index, then you can use vector.
Otherwise you can iterate through any container (they all have iterators) but list would be the best if you want to be memory efficient, and vector if you want to be performance oriented.
You can use vector. But if you need to find objects in the container, then consider using set, multiset or map.

Choosing a STL Container for a very large list

I have a very large list of items (~2 million) that I want to optimize for access speed. I iterate through the items using an iterator (++it).
Right now the code is implemented using std::map<std::wstring, STRUCT>.
I wonder if it's worth replacing std::map with a std::deque<std::pair<std::wstring, STRUCT>>. I think I would gain from pointer arithmetic and fewer cache misses. Is it worth it?
I know that profiling is the answer but I need an opinion before implementing this ...
If you know the size in advance, then std::vector is clearly the way to go if your objects aren't too big.
std::vector<Object> list;
list.reserve(2000000);
And then fill it as usual.
This is the fastest and least memory-consuming approach. However, you need to be able to allocate enough contiguous memory. Unless your objects are around 1 KB each, that shouldn't be a problem.
With deque, you would lose ( or would have to re-implement ) the advantage of Key-Value pairs. If it's not essential for your data, I would consider using deque.
Generally, if you're only doing searches in this set (no insertions/deletions), you're probably better off using a sorted sequential container, like deque or vector. You can then use a simple binary search to find the needed elements. The advantage of using a sequential container is that it is better in terms of memory usage, has a very simple implementation, and provides better locality of reference. I'd write one version of the code using vector and another using deque, then compare them in terms of performance to decide which one to use in the final version.
However, if your structure needs to be updated (new elements need to be inserted or old elements have to be deleted frequently), map is a better choice. Or maybe you just have to drop STL containers altogether and use an in-memory database (see SQLite), but it highly depends on what problem you're solving.
The fastest container to iterate through is usually a vector, so if you want to optimize for iteration at the expense of everything else, use that.
Overall app performance of course will depend how many times you iterate, and how you construct your data in the first place. For a simple test, once your map has been populated you can construct a vector from it as follows:
vector<pair<K,V> > myvec(mymap.begin(), mymap.end());
Where K and V are the key and value types of the map. Then just use the vector iterators in place of the map iterators and compare performance.
Of course, if you want to modify the map in future, then normally it would not be appropriate to optimize for iteration at the expense of everything else.
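The map-to-vector test suggested above might be set up like this. Record is a stand-in for the asker's STRUCT, and Table, Flat, flatten and sum_values are names invented for the sketch. Timing code is left out on purpose.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Record { int value; };    // stand-in for the asker's STRUCT

using Table = std::map<std::wstring, Record>;
using Flat  = std::vector<std::pair<std::wstring, Record>>;

// Copy the populated map into contiguous storage; key order is preserved.
Flat flatten(const Table& table)
{
    Flat flat(table.begin(), table.end());
    assert(flat.size() == table.size());
    return flat;
}

// The same loop works for both containers, which makes timing one
// against the other straightforward.
template <typename Container>
long long sum_values(const Container& c)
{
    long long total = 0;
    for (auto it = c.begin(); it != c.end(); ++it)
        total += it->second.value;
    return total;
}
```

Wrapping calls to sum_values on the map and on the flattened vector in whatever timer you already use gives a like-for-like comparison of iteration cost.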

is it possible for set to have std::vector as underlying storage for storage of its elements?

For small collections std::vector is almost certainly the best container, whatever operations are applied to it. Is it possible to have std::vector as the underlying storage for the elements of a set container, instead of a red-black tree involving a lot of heap allocations (maybe Boost has something?), or do I have to invent it myself?
Plain std::vector and std::sort is not an option due to performance reasons and std::inplace_merge is prone to coding errors (invalidation of iterators, etc..).
EDIT: clarified the question
There is no way to specify the underlying structure of an STL set. At best you can write an allocator that uses a vector to provide the memory used by set which may or may not be what you want.
For small sizes all containers are pretty efficient; just use set unless you know that you have a performance problem.
In your case:
using vector trades functionality (sorting, uniqueness) for storage size
using set does the opposite
If you need sorting and uniqueness, then choose the container with those features unless you are sure it's a bad trade.
If you mean can you have
std::set<std::vector<MyType> > myIdealContainer;
then the answer is yes, provided you are able to meaningfully wrap the vector in something that makes it sortable (so set can order its members). Watch out for copying inefficiency though.
If you mean can I instantiate set with vector as the storage for a custom allocator, then I don't know how you would do that (or why you would want to).
If you mean can you treat a vector the same way you would a set, then the answer is no. If your dataset is small and matching the container member is cheap, use vector, preserve ordering on inserts and scan linearly for matches using std::find. If the dataset is large and/or matching is expensive, use set.
I think you are looking for boost::container::flat_set
flat_set is similar to std::set but it's implemented like an ordered vector.
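To show what "implemented like an ordered vector" means, here is a deliberately tiny sketch of the idea. FlatIntSet is a made-up class limited to int; it is not Boost's implementation or interface.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal "flat set" of ints: a vector kept sorted and unique.
// This only illustrates the idea; it is not Boost's implementation.
class FlatIntSet
{
public:
    // Inserts value if absent; returns true when the set changed.
    bool insert(int value)
    {
        auto pos = std::lower_bound(data_.begin(), data_.end(), value);
        if (pos != data_.end() && *pos == value)
            return false;                 // already present
        data_.insert(pos, value);         // shifts later elements up
        assert(std::is_sorted(data_.begin(), data_.end()));
        return true;
    }

    // O(log n) lookup by binary search over contiguous memory.
    bool contains(int value) const
    {
        return std::binary_search(data_.begin(), data_.end(), value);
    }

    const std::vector<int>& items() const { return data_; }

private:
    std::vector<int> data_;
};
```

The trade-off is that insertion is O(n) because of the shifting, which is usually acceptable for the small collections the question is about.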
No, it is not possible to specify the container to use for std::set; you can do that only with container adapters like std::queue or std::stack. std::set is one of the basic containers, with its own performance requirements. std::vector may not be the best container for all cases. For example, if you want good lookup performance you would choose set, as find is O(log n), whereas for an unsorted vector it is O(n).
Maybe I've misunderstood you, but if you are trying to use a std::set which has a std::vector for data storage (so all data of the set is actually stored in the vector), then the answer should be "no".
The reason for this is simply that C++ std::set implementations are typically binary search trees, while a std::vector manages just a simple array/memory block.

STL-like vector with arbitrary index range

What I want is something similar to STL vector when it comes to access complexity, reallocation on resize, etc. I want it to support arbitrary index range, for example there could be elements indexed from -2 to +7 or from +5 to +10. I want to be able to push_front efficiently. Also I want two-way resize...
I know I could write something like this myself, but if there is an already written library that supports this please tell me.
Deque is very much like a vector in that it supports random access and efficient insertion at the end and it also supports efficient insertion at the beginning.
Map supports access based on arbitrary keys, you can have any range you want, or even a sparsely populated array. Iteration over the collection is slow.
Unordered map (tr1) is similar to map except it supports better iteration.
As a general rule of thumb, use a vector (in your case adapt it to the behaviour you want) and only change when you have evidence that the vector is causing slowness.
It seems the only difference between what you want and a vector is the offset you require for accessing elements, which you can take care of by overloading operator[] or something similar. Unless I didn't understand what you meant by two-way resize.
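A minimal sketch of the offset idea, assuming int elements and a fixed index range chosen at construction. OffsetVector and its member names are invented for illustration, and resizing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a vector whose first element has index `first`
// instead of 0. Only the index translation is shown.
class OffsetVector
{
public:
    // Creates elements with indices first .. last (inclusive), all zero.
    // Precondition: last >= first.
    OffsetVector(int first, int last)
        : first_(first), data_(static_cast<std::size_t>(last - first + 1), 0)
    {
    }

    int& operator[](int index) { return data_[to_position(index)]; }
    const int& operator[](int index) const { return data_[to_position(index)]; }

    int first_index() const { return first_; }
    int last_index() const { return first_ + static_cast<int>(data_.size()) - 1; }

private:
    // Maps a user-facing index to a position in the underlying vector.
    std::size_t to_position(int index) const
    {
        assert(index >= first_ && index <= last_index());
        return static_cast<std::size_t>(index - first_);
    }

    int first_;
    std::vector<int> data_;
};
```

Access stays O(1) because the translation is a single subtraction; growing at the front would still require shifting or a different layout.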
Here you go, double-ended vector
http://dl.dropbox.com/u/9496269/devector.h
usage:
to reserve memory before begin(), use reserve(new_back_capacity, new_front_capcity);
except when using push_front(), pop_front() and squeeze(), the front capacity is always preserved.
squeeze() flushes all unused memory
default namespace: stdext
concept:
in most cases equivalent to ::std::vector but with ability to push_front
no performance differences compared to ::std::vector (as different from ::std::deque)
four bytes of overhead compared to ::std::vector
If you want 2 way resize, etc... you could create your own vector class with 2 vectors inside
one for the 0 and positive values and another for the negative ones.
Then just implement the common functions, add new ones (e.g. push_begin to add to the negative-index vector), and update the corresponding vector inside.
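The two-vector layout could be sketched roughly as below, assuming int elements. TwoWayVector, first_index and end_index are illustrative names; push_begin follows the name used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-vector container: indices >= 0 live in `positive_`,
// indices < 0 live in `negative_` (index -1 at position 0, -2 at 1, ...).
class TwoWayVector
{
public:
    void push_back(int value)  { positive_.push_back(value); }   // next index >= 0
    void push_begin(int value) { negative_.push_back(value); }   // next index < 0

    int& operator[](int index)
    {
        assert(index >= first_index() && index < end_index());
        if (index >= 0)
            return positive_[static_cast<std::size_t>(index)];
        return negative_[static_cast<std::size_t>(-index - 1)];
    }

    // Lowest valid index (0 when nothing was pushed at the front).
    int first_index() const { return -static_cast<int>(negative_.size()); }
    // One past the highest valid index.
    int end_index() const { return static_cast<int>(positive_.size()); }

private:
    std::vector<int> positive_;
    std::vector<int> negative_;
};
```

Both ends grow with amortized constant-time push_back on the inner vectors, at the cost of the elements no longer being contiguous across index 0.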