set vs vector with custom iterator - C++

I understand this question may be quickly flagged as a duplicate of many other more popular questions, but I'll still ask it:
I need a container that provides duplicate checking on insert (like std::set) but allows me to modify elements already present (like std::vector). It should also be relatively fast to search for elements, which again favors std::set. Would it be better to use a vector, perhaps with a custom duplicate-checking insert_iterator, instead of modifying set elements by erasing and reinserting them?
Thanks

What is to stop you from using a std::set? If you need to modify an element, copy it, erase it, then re-insert.
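A minimal sketch of the copy / erase / re-insert approach, assuming C++11 or later. `modify_in_set` is a hypothetical helper name, not a standard function; it also puts the original element back if the modified value would collide with an existing one.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: changes the element equal to `key` by applying `modify`
// to a copy of it. Elements of a std::set are const, so the element is copied,
// erased, modified, and re-inserted. Returns false if `key` is absent or if the
// modified value duplicates another element (the set is then left unchanged).
template <typename T, typename Modify>
bool modify_in_set(std::set<T>& s, const T& key, Modify modify) {
    typename std::set<T>::iterator it = s.find(key);
    if (it == s.end()) return false;      // nothing to modify
    T original = *it;                     // 1. copy the element
    s.erase(it);                          // 2. erase it (key is not used after this)
    T updated = original;
    modify(updated);                      //    change the copy
    if (!s.insert(updated).second) {      // 3. re-insert the modified copy
        s.insert(original);               //    duplicate: put the original back
        return false;
    }
    return true;
}
```

Since C++17, std::set::extract returns a node handle whose value can be changed and re-inserted without copying the element; the copy-based version works with older compilers too.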

Have you looked into using a map?
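A std::map may be a good solution to your problem: its keys give you duplicate checking and fast lookup, while the mapped values can be modified in place.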
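A minimal sketch of the map-based alternative, assuming each element can be split into a unique key and some modifiable data. `Info`, `add_unique` and `bump` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical mapped data: everything except the unique key goes here.
struct Info {
    int count;
};

// Inserts `name` only if it is not present yet; returns true if it was new.
// emplace does not overwrite an existing key, so it doubles as the duplicate check.
bool add_unique(std::map<std::string, Info>& m, const std::string& name) {
    Info fresh = {0};
    return m.emplace(name, fresh).second;
}

// Modifies the data stored for an existing key in place. No erase/re-insert is
// needed: inside a map only the key is const, the mapped value is not.
bool bump(std::map<std::string, Info>& m, const std::string& name) {
    std::map<std::string, Info>::iterator it = m.find(name);
    if (it == m.end()) return false;
    ++it->second.count;
    return true;
}
```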

If your strings are long and performance is critical, then you may be stuck with a custom container that wraps something like a parallel vector<string> and set<string *>. Provide a custom comparator for the set that dereferences the pointers to make the comparisons. To modify an element, remove its pointer from the set, modify the string, then reinsert the pointer. Keep in mind that whenever the vector reallocates, every pointer held in the set is invalidated and must be rebuilt.
This gets a bit messy when you want to remove container elements, so you would want to use some form of lazy deletion. At that point you are very close to a full-blown free-object pool for your strings.
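A minimal sketch of such a wrapper, assuming C++11. `StringPool` and `DerefLess` are hypothetical names. One deliberate deviation from the answer: the strings live in a std::deque rather than a std::vector, because push_back on a deque never invalidates pointers to existing elements, whereas a vector reallocation would leave every pointer in the set dangling. Deletion is left out, in line with the remark about lazy deletion.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>

// Orders string pointers by the strings they point to.
struct DerefLess {
    bool operator()(const std::string* a, const std::string* b) const {
        return *a < *b;
    }
};

// Hypothetical wrapper: each string is stored once in storage_, and index_
// holds pointers into it for duplicate checking and O(log n) lookup.
// std::deque is used so that push_back never invalidates those pointers.
class StringPool {
public:
    StringPool() {}
    StringPool(const StringPool&) = delete;             // copies would share pointers
    StringPool& operator=(const StringPool&) = delete;

    // Adds s unless an equal string is already stored; returns true if added.
    bool insert(const std::string& s) {
        if (index_.count(&s) != 0) return false;         // duplicate check
        storage_.push_back(s);
        index_.insert(&storage_.back());
        return true;
    }

    bool contains(const std::string& s) const { return index_.count(&s) != 0; }

    // Remove the pointer, modify the string, re-insert the pointer.
    // If new_value duplicates another element, the change is undone.
    bool modify(const std::string& old_value, const std::string& new_value) {
        auto it = index_.find(&old_value);
        if (it == index_.end()) return false;
        // The pointee lives in the non-const deque, so casting away const is safe.
        std::string* p = const_cast<std::string*>(*it);
        index_.erase(it);                                // 1. remove the pointer
        std::string saved = *p;
        *p = new_value;                                  // 2. modify the string
        if (!index_.insert(p).second) {                  // 3. re-insert the pointer
            *p = saved;                                  //    duplicate: restore
            index_.insert(p);
            return false;
        }
        return true;
    }

    std::size_t size() const { return index_.size(); }

private:
    std::deque<std::string> storage_;
    std::set<const std::string*, DerefLess> index_;
};
```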
If you're using a vector of strings in performance-critical code, then watch out for vector reallocations, which copy each string into the new memory block. You can avoid that by watching for an upcoming reallocation, creating a new vector of empty strings (with capacity reserved for double the size), and then using string::swap on each element to move the old data into the new, larger vector.
Things will be much nicer when C++0x (C++11) move semantics are widely available: a reallocating vector can then move each string instead of copying it.
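A sketch of the swap-based growth step, written in a pre-C++11 style (in C++11 and later, std::vector already moves std::string elements when it reallocates, so this is unnecessary there). `append_without_copying` is a hypothetical helper name, and the code compiles as C++98.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends s, but performs any needed growth itself.
// When the vector is full, a bigger vector of empty strings is prepared and the
// old contents are swapped into it, so no string is copy-constructed during growth.
// (string::swap typically just exchanges buffers; short strings stored inline
// may still have their few characters copied.)
void append_without_copying(std::vector<std::string>& v, const std::string& s) {
    if (v.size() == v.capacity()) {                   // push_back would reallocate
        std::vector<std::string> bigger;
        bigger.reserve(v.capacity() == 0 ? 1 : 2 * v.capacity());
        bigger.resize(v.size());                      // empty strings, no reallocation
        for (std::size_t i = 0; i < v.size(); ++i)
            bigger[i].swap(v[i]);                     // exchange contents
        v.swap(bigger);                               // v now owns the larger buffer
    }
    v.push_back(s);                                   // fits in existing capacity
}
```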

Related

Can vector be resized bidirectionally?

Is there any function to resize a vector in both directions? Can we manipulate the pointer, or the element from which it starts adding new empty elements?
The entire interface of std::vector (see its reference documentation) makes it very obvious that there's no direct way to do what you ask, although you can reserve and then insert(begin, ...) if you really want to (it has linear complexity, so it is usually avoided).
The usual advice would be to use std::deque instead, since it's specifically designed for this operation.
Yes, you can insert elements to the front of the vector using insert:
vec.insert(vec.begin(), number_of_elements_to_insert, {});
Note, however, that front-inserting into a vector is very inefficient, because it will require moving all the current elements in the vector past the newly inserted ones. If you need a double-ended container, look into std::deque.
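A small sketch contrasting the two containers, using int elements for illustration. Both overloads of the hypothetical `grow_front` helper make the same insert(pos, count, value) call; the difference is the cost: the vector has to shift every existing element, while std::deque adds the new elements at its front without moving the existing ones.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: add n zero-initialized elements in front of the data.
// For a vector, every existing element is shifted back: linear in v.size().
void grow_front(std::vector<int>& v, std::size_t n) {
    v.insert(v.begin(), n, 0);
}

// Same call on a deque: linear in n only; existing elements are not moved.
void grow_front(std::deque<int>& d, std::size_t n) {
    d.insert(d.begin(), n, 0);
}
```

For single elements, std::deque also offers push_front, which runs in constant time.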

Keeping a vector of iterators into the data

I have a function:
void get_good_items(const std::vector<T>& data, std::vector<XXX>& good_items);
This function should check all data, find the items that satisfy a condition, and record where they are in good_items.
What is the best type to use instead of std::vector<XXX>?
std::vector<size_t> that contains all good indices.
std::vector<T*> that contain a pointers to the items.
std::vector<std::vector<T>::iterator> that contains iterators to the items.
Something else?
EDIT:
What will I do with the good_items?
Many things. One of them is to delete them from the vector and save them in another place; maybe something else later.
EDIT 2:
One of the most important things for me is how fast accessing the items in data will be, depending on the structure of good_items.
EDIT 3:
I have just realized that my thinking was wrong. Wouldn't it be better to store raw (or smart) pointers as the items of the vector? Then I could keep the real values of the vector (which are pointers) without worrying about heavy copies, because they are just pointers.
If you remove items from the original vector, every one of the methods you listed will be a problem.
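If you add items to the original vector, the second and third will be problematic, because push_back may reallocate the storage and invalidate every pointer and iterator into it. The first one won't be a problem if you use push_back to add items, since existing indices keep referring to the same elements.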
All of them will be fine if you don't modify the original vector.
Given that, I would recommend using std::vector<size_t>.
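A sketch of the index-based approach, assuming int data and a hypothetical "is even" predicate. `get_good_indices` is a hypothetical name; it returns the indices by value instead of through an output parameter, which is a stylistic choice rather than something the question requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predicate: "good" items are the even numbers.
// Positions are recorded as indices, so they stay meaningful even after
// push_back on `data` reallocates its storage.
std::vector<std::size_t> get_good_indices(const std::vector<int>& data) {
    std::vector<std::size_t> good;
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] % 2 == 0) good.push_back(i);
    return good;
}
```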
I would go with std::vector<size_t> or std::vector<T*> because they are easier to type. Otherwise, those three vectors are pretty much equivalent: they all identify positions of elements.
std::vector<size_t> can be made to use a smaller type for indexes if you know the limits.
If you expect that there are going to be many elements in this vector, you may like to consider using boost::dynamic_bitset instead to save memory and improve CPU cache utilization. It stores one bit per element, with the bit position being the index into the original vector.
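If you prefer to avoid the Boost dependency, std::vector<bool> can play a similar role; this is a swapped-in standard-library alternative, not boost::dynamic_bitset itself. The standard permits a packed representation for std::vector<bool>, and common implementations use one bit per element. `mark_good` and its "greater than 10" predicate are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One flag per element of `data`: flag i is true when data[i] is "good".
// Hypothetical predicate: values greater than 10.
std::vector<bool> mark_good(const std::vector<int>& data) {
    std::vector<bool> good(data.size(), false);
    for (std::size_t i = 0; i < data.size(); ++i)
        good[i] = data[i] > 10;
    return good;
}
```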
If you intend to remove the elements that satisfy the predicate, then the erase-remove idiom is the simplest solution.
If you intend to copy such elements, then std::copy_if is the simplest solution.
If you intend to end up with two partitions of the container i.e. one container has the good ones and another the bad ones, then std::partition_copy is a good choice.
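The three approaches can be sketched as follows, assuming int elements and a hypothetical `is_good` predicate (negative values); the function names are also hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical predicate: an item is "good" if it is negative.
bool is_good(int x) { return x < 0; }

// Erase-remove idiom: delete every good item from `data` in place.
void erase_good(std::vector<int>& data) {
    data.erase(std::remove_if(data.begin(), data.end(), is_good), data.end());
}

// std::copy_if: copy the good items out, leaving `data` untouched.
std::vector<int> copy_good(const std::vector<int>& data) {
    std::vector<int> out;
    std::copy_if(data.begin(), data.end(), std::back_inserter(out), is_good);
    return out;
}

// std::partition_copy: split into good and bad containers in one pass.
void split(const std::vector<int>& data,
           std::vector<int>& good, std::vector<int>& bad) {
    std::partition_copy(data.begin(), data.end(),
                        std::back_inserter(good), std::back_inserter(bad),
                        is_good);
}
```

Note that std::remove_if leaves the elements past the new end in a valid but unspecified state, so to both save the good items elsewhere and delete them from `data`, copy them out with std::copy_if first, or use std::partition_copy and then assign the "bad" container back to `data`.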
For generally allowing iteration over such elements, an efficient solution is to return a range of iterators that check the predicate while iterating. I don't think there are such iterators in the standard library, so you'll need to implement them yourself. Luckily, Boost has already done that for you: http://www.boost.org/doc/libs/release/libs/iterator/doc/filter_iterator.html
The problem you are solving, from my understanding, is the intersection of two sets, and I would go for the solution from standard library: std::set_intersection
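If the condition really is membership in a second collection, a sketch looks like this. `common_items` is a hypothetical name; the inputs are taken by value so they can be sorted, because std::set_intersection requires both ranges to be sorted with the same ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Returns the values present in both a and b, in sorted order.
std::vector<int> common_items(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());          // set_intersection needs sorted input
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```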

Dynamic size of array in c++?

I am confused: I don't know which container I should use. Let me first tell you what I need. Basically, I need a container that can store X objects, where the number of objects is unknown (it could be anywhere from 1 to 50k).
I've read a lot. The question "array vs list" says that an array needs to be resized if the number of objects is unknown (I am not sure how to resize an array in C++). It also says that with a linked list, searching for a certain item means looping (iterating) from first to last (or vice versa), whereas an array lets you access an object directly by index.
Then I looked at other solutions (map, vector, etc.), such as the question "array vs vector", where some responders say never to use arrays.
I am new to C++; I have only used array, vector, list and map before. For my case, what kind of container would you recommend? Let me rephrase my requirements:
It needs to be a container
The number of objects stored is unknown but is huge (maybe 1 to 40k)
I need to loop through the container to find a specific object
std::vector is what you need.
You have to consider two things when selecting an STL container:
Data you want to store
Operations you want to perform on the stored data
There was a good diagram in a question here on SO that depicts this. I cannot find the link to it, but I saved it a long time ago:
[Diagram: choosing an STL container (image not preserved)]
You cannot resize a built-in array in C++; I'm not sure where you got that idea. The container you need is std::vector.
The general rule is: use std::vector until it doesn't work, then shift to something that does. There are all sorts of theoretical rules about which one is better, depending on the operations, but I've regularly found that std::vector outperforms the others, even when the most frequent operations are things where std::vector is supposedly worse. Locality seems more important than most of the theoretical considerations on a modern machine.
The one reason you might shift from std::vector is iterator validity. Inserting into a std::vector may invalidate iterators; inserting into a std::list never does.
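A small sketch of the std::list guarantee, using a hypothetical `value_after_inserts` function: an iterator taken before thousands of insertions still refers to the same element afterwards. The equivalent experiment with std::vector would be undefined behavior once push_back reallocates, which is why it is not attempted here.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Takes an iterator to the middle element, performs many insertions around it,
// and returns the value the iterator still refers to.
int value_after_inserts() {
    std::list<int> l;
    l.push_back(10);
    l.push_back(20);
    l.push_back(30);
    std::list<int>::iterator it = std::next(l.begin());  // refers to 20
    for (int i = 0; i < 1000; ++i) {
        l.push_front(i);     // insert at the front
        l.insert(it, -i);    // insert directly before the tracked element
        l.push_back(i);      // insert at the back
    }
    return *it;              // still valid: list insertions never invalidate iterators
}
```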
Do you need to loop through the container, or you have a key or ID for your objects?
If you have a key or ID, you can use a map to access the object quickly by it; if the ID is simply an index, you can use a vector.
Otherwise, you can iterate through any container (they all have iterators), but list would be the best if you want to be memory efficient, and vector if you want to be performance oriented.
You can use vector. But if you need to find objects in the container, then consider using set, multiset or map.
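To make the trade-off concrete, here is a sketch with a hypothetical `Object` type carrying an integer ID: a vector has to be searched linearly with std::find_if, while a std::map keyed by the ID answers the same query with find in logarithmic time. The function names are hypothetical too.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical object type with an integer ID.
struct Object {
    int id;
    std::string name;
};

// Without a key: linear search through a vector, O(n).
const Object* find_in_vector(const std::vector<Object>& v, int id) {
    std::vector<Object>::const_iterator it = std::find_if(
        v.begin(), v.end(), [id](const Object& o) { return o.id == id; });
    return it == v.end() ? nullptr : &*it;
}

// With a key: std::map::find is a logarithmic-time lookup.
const Object* find_in_map(const std::map<int, Object>& m, int id) {
    std::map<int, Object>::const_iterator it = m.find(id);
    return it == m.end() ? nullptr : &it->second;
}
```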

Choosing a STL Container for a very large list

I have a very large list of items (~2 million) that I want to optimize for access speed. I iterate through the items using an iterator (++it).
Right now the code is implemented using std::map<std::wstring, STRUCT>.
I wonder if it's worth replacing std::map with a std::deque<std::pair<std::wstring, STRUCT>>. I think I would benefit from pointer arithmetic and fewer cache misses. Is it worth it?
I know that profiling is the answer, but I need an opinion before implementing this.
If you know the size in advance, then std::vector is clearly the way to go if your objects aren't too big.
std::vector<Object> list;
list.reserve(2000000);
And then fill it as usual.
This is the fastest and least memory-consuming approach. However, you need to be able to allocate enough contiguous memory, but unless your objects are around 1 KB each, it shouldn't be a problem.
With deque, you would lose (or would have to re-implement) the advantage of key-value pairs. If that's not essential for your data, I would consider using deque.
Generally, if you're only doing searches in this set (no insertions or deletions), you're probably better off using a sorted sequential container, like deque or vector. You can then use a simple binary search to find the needed elements. The advantage of a sequential container is that it uses less memory, has a very simple implementation, and provides better locality of reference. I'd write one version of the code using vector and another using deque, then compare their performance to decide which one to use in the final version.
However, if your structure needs to be updated (new elements need to be inserted or old elements have to be deleted frequently), map is better choice. Or maybe, you just have to drop STL containers altogether and just use an in-memory database (see SQLite), but it highly depends on what problem you're solving.
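A sketch of the sorted-vector approach, assuming the data can be loaded once and then only searched. `Payload` stands in for the question's STRUCT and, like `Entry`, `sort_entries` and `lookup`, is a hypothetical name; the key type is std::wstring as in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload standing in for the question's STRUCT.
struct Payload {
    int value;
};

typedef std::pair<std::wstring, Payload> Entry;

// Sort once after loading; afterwards the vector can be searched like a map.
void sort_entries(std::vector<Entry>& v) {
    std::sort(v.begin(), v.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// Binary search by key with std::lower_bound: O(log n) over contiguous memory.
const Payload* lookup(const std::vector<Entry>& v, const std::wstring& key) {
    std::vector<Entry>::const_iterator it = std::lower_bound(
        v.begin(), v.end(), key,
        [](const Entry& e, const std::wstring& k) { return e.first < k; });
    if (it == v.end() || it->first != key) return nullptr;
    return &it->second;
}
```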
The fastest container to iterate through is usually a vector, so if you want to optimize for iteration at the expense of everything else, use that.
Overall app performance of course will depend how many times you iterate, and how you construct your data in the first place. For a simple test, once your map has been populated you can construct a vector from it as follows:
vector<pair<K,V> > myvec(mymap.begin(), mymap.end());
Where K and V are the key and value types of the map. Then just use the vector iterators in place of the map iterators and compare performance.
Of course, if you want to modify the map in future, then normally it would not be appropriate to optimize for iteration at the expense of everything else.
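A self-contained version of that experiment, with hypothetical helper names (`to_vector`, `sum_values`, `time_iteration_ms`). The timing uses std::chrono::steady_clock; a single run like this gives only a rough indication, not a proper benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Copy a populated map into a vector. The map's value_type is
// std::pair<const K, V>; each element converts to std::pair<K, V>.
template <typename K, typename V>
std::vector<std::pair<K, V> > to_vector(const std::map<K, V>& m) {
    return std::vector<std::pair<K, V> >(m.begin(), m.end());
}

// Iterate any container of pairs with ++it and add up the mapped values.
template <typename Container>
long long sum_values(const Container& c) {
    long long total = 0;
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
        total += it->second;
    return total;
}

// Time one full iteration in milliseconds; the sum is returned through result.
template <typename Container>
double time_iteration_ms(const Container& c, long long& result) {
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    result = sum_values(c);
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Populate the map, call to_vector once, then compare time_iteration_ms on the map with time_iteration_ms on the vector; both sums should be equal, and only the timings differ.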

Container access and allocation through the same operator?

I have created a container for generic, weak-type data which is accessible through the subscript operator.
The std::map container allows both data access and element insertion through the operator, whereas std::vector, I think, doesn't.
What is the best (C++ style) way to proceed? Should I allow allocation through the subscript operator or have a separate insert method?
EDIT
I should say, I'm not asking if I should use vector or map, I just wanted to know what people thought about accessing and inserting being combined in this way.
In the case of vectors, subscript notation does not insert: it overwrites an existing element.
The rest of this post distills the information from Items 1-5 of Effective STL.
If you know the range of your data beforehand, the size is fixed, and you won't insert at locations that have data above them, then you can insert into vectors without unpleasant side effects.
However, in the general case, ad hoc insertions into a vector have implications such as shifting existing members upward and, when capacity is exhausted, reallocating (often doubling the capacity), which causes a flood of copies from the old vector's objects to locations in the new storage. Vectors are designed for cases where you know the locality characteristics of your data.
Vectors come with an insert member function, and with most implementations this function is very clever in that it can infer optimizations from the iterators you supply. Can't you just use this?
If you want to do ad hoc insertions of data, you should use a list. Perhaps you can use a list to collect the data and then, once it's finalized, populate a vector using the range-based insert or range-based constructor?
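A sketch of the collect-then-finalize idea, assuming int elements; `collect` and `finalize` are hypothetical names. Because std::list iterators are bidirectional, the vector's range constructor can determine the final size up front and does not need to reallocate while copying.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical collection phase: insertions anywhere in a list are O(1).
std::list<int> collect() {
    std::list<int> items;
    items.push_back(1);
    items.push_back(3);
    std::list<int>::iterator pos = std::next(items.begin());  // refers to 3
    items.insert(pos, 2);       // insert in the middle: no elements are shifted
    items.push_front(0);
    return items;
}

// Finalize: the range constructor copies the list into one contiguous block.
std::vector<int> finalize(const std::list<int>& collected) {
    return std::vector<int>(collected.begin(), collected.end());
}
```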
It depends on what you want. A map can be significantly slower than a vector if you want to use it like an array. A map is very helpful if the index you want to use is non-sequential and you have LOADS of them. It's usually quicker to just use a vector, sort it, and do a binary search to find what you are after. I've used this method to replace maps in tonnes of software, and I still haven't found a case where doing this with a vector was slower.
So, IMO, std::vector is the better way, though a map MIGHT be useful if you are using it properly.
Separate insert method, definitely. The operator[] on std::map is just stupid and makes the code hard to read and debug.
Also, you can't access data from a const context if you're using operator[] to insert (which will lead to un-const-cancer, the even more evil cousin of const-cancer).
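The std::map behavior discussed here can be seen in a short sketch (hypothetical helper names; std::string keys and int values chosen for illustration): a lookup through operator[] inserts a value-initialized element when the key is missing, and operator[] is not available on a const map, so const code has to use find() or at().

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Takes the map by value, "reads" through operator[], and reports the new size.
// operator[] inserts key -> 0 when the key is absent, so even a read can grow the map.
std::size_t size_after_lookup(std::map<std::string, int> m, const std::string& key) {
    int value = m[key];
    (void)value;
    return m.size();
}

// In a const context only find() or at() are available; neither inserts.
bool try_get(const std::map<std::string, int>& m, const std::string& key, int& out) {
    std::map<std::string, int>::const_iterator it = m.find(key);
    if (it == m.end()) return false;
    out = it->second;
    return true;
}
```

A container that keeps a separate insert method can offer the same split: a const lookup that never inserts, and an explicit insert for adding elements.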