STL-like vector with arbitrary index range - c++

What I want is something similar to STL vector when it comes to access complexity, reallocation on resize, etc. I want it to support arbitrary index range, for example there could be elements indexed from -2 to +7 or from +5 to +10. I want to be able to push_front efficiently. Also I want two-way resize...
I know I could write something like this myself, but if there is an already written library that supports this please tell me.

Deque is very much like a vector in that it supports random access and efficient insertion at the end and it also supports efficient insertion at the beginning.
Map supports access based on arbitrary keys; you can have any range you want, or even a sparsely populated array. Iteration over the collection is slow.
Unordered map (TR1, std::unordered_map in C++11) is similar to map, except that lookup is constant time on average and the iteration order is unspecified.
As a general rule of thumb, use a vector (in your case adapt it to the behaviour you want) and only change when you have evidence that the vector is causing slowness.

It seems the only difference between what you want and a vector is the offset you require for accessing elements, which you can take care of by overloading operator[] or something similar. Unless I didn't understand what you meant by two-way resize.
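For illustration, here is a minimal sketch of that idea: a thin wrapper around std::vector that shifts indices by a stored offset. The class and member names (offset_vector, low_) are invented for this example, not taken from any library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of an offset-indexed vector.
template <typename T>
class offset_vector {
public:
    explicit offset_vector(long low = 0) : low_(low) {}

    // Indices may be negative; they are shifted into the underlying vector.
    T&       operator[](long i)       { return data_[static_cast<std::size_t>(i - low_)]; }
    const T& operator[](long i) const { return data_[static_cast<std::size_t>(i - low_)]; }

    void push_back(const T& v) { data_.push_back(v); }  // grows the upper end

    long low()  const { return low_; }                                   // smallest valid index
    long high() const { return low_ + static_cast<long>(data_.size()); } // one past the last index

private:
    long low_;             // index that maps to element 0 of data_
    std::vector<T> data_;  // contiguous storage, exactly like std::vector
};

// usage: offset_vector<int> v(-2); v.push_back(7);  // v[-2] == 7
```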

Here you go, double-ended vector
http://dl.dropbox.com/u/9496269/devector.h
usage:
to reserve memory before begin(), use reserve(new_back_capacity, new_front_capacity);
except when using push_front(), pop_front() and squeeze(), the front capacity is always preserved.
squeeze() flushes all unused memory
default namespace: stdext
concept:
in most cases equivalent to ::std::vector but with ability to push_front
no performance difference compared to ::std::vector (unlike ::std::deque)
four bytes of overhead compared to ::std::vector

If you want two-way resize, etc., you could create your own vector class with two vectors inside:
one for index 0 and the positive indices and another for the negative ones.
Then just implement the common functions and add new ones (e.g. push_begin to add to the negative-index vector), updating the corresponding internal vector.
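A minimal sketch of that two-vector idea, with invented names (split_vector, push_begin), just to show how the index mapping could work:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of the two-vector approach described above.
template <typename T>
class split_vector {
public:
    // Valid indices run from low() to high() - 1.
    T& operator[](long i) {
        if (i >= 0) return pos_[static_cast<std::size_t>(i)];
        return neg_[static_cast<std::size_t>(-i - 1)];    // index -1 maps to neg_[0]
    }

    void push_back(const T& v)  { pos_.push_back(v); }    // grow toward larger indices
    void push_begin(const T& v) { neg_.push_back(v); }    // grow toward more negative indices

    long low()  const { return -static_cast<long>(neg_.size()); }
    long high() const { return  static_cast<long>(pos_.size()); }

private:
    std::vector<T> pos_;  // indices 0, 1, 2, ...
    std::vector<T> neg_;  // indices -1, -2, -3, ... stored in reverse
};
```

With this layout both push_back and push_begin are amortized O(1), and element access stays O(1) at the cost of one extra branch per access.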

Related

C++: What are the reasons for choosing a linked list / deque over a vector? [duplicate]

There's a well known image (cheat sheet) called "C++ Container choice". It's a flow chart to choose the best container for the wanted usage.
Does anybody know if there's already a C++11 version of it?
This is the previous (C++03) one: [flowchart image omitted]
Not that I know of, however it can be done textually I guess. Also, the chart is slightly off, because list is not such a good container in general, and neither is forward_list. Both lists are very specialized containers for niche applications.
To build such a chart, you just need two simple guidelines:
Choose for semantics first
When several choices are available, go for the simplest
Worrying about performance is usually useless at first. The big O considerations only really kick in when you start handling a few thousands (or more) of items.
There are two big categories of containers:
Associative containers: they have a find operation
Simple Sequence containers
and then you can build several adapters on top of them: stack, queue, priority_queue. I will leave the adapters out here, they are sufficiently specialized to be recognizable.
Question 1: Associative ?
If you need to easily search by one key, then you need an associative container
If you need to have the elements sorted, then you need an ordered associative container
Otherwise, jump to the question 2.
Question 1.1: Ordered ?
If you do not need a specific order, use an unordered_ container, otherwise use its traditional ordered counterpart.
Question 1.2: Separate Key ?
If the key is separate from the value, use a map, otherwise use a set
Question 1.3: Duplicates ?
If you want to keep duplicates, use a multi, otherwise do not.
Example:
Suppose that I have several persons with a unique ID associated to them, and I would like to retrieve a person data from its ID as simply as possible.
I want a find function, thus an associative container
1.1. I couldn't care less about order, thus an unordered_ container
1.2. My key (ID) is separate from the value it is associated with, thus a map
1.3. The ID is unique, thus no duplicate should creep in.
The final answer is: std::unordered_map<ID, PersonData>.
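Spelled out in code (PersonData and its fields are invented for the example):

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// Illustrative value type for the worked example above.
struct PersonData {
    std::string name;
    int age;
};

int main() {
    std::unordered_map<int, PersonData> people;   // key: ID, value: the person's data
    people.emplace(42, PersonData{"Ada", 36});

    auto it = people.find(42);                    // average O(1) lookup by ID
    if (it != people.end())
        std::cout << it->second.name << '\n';
}
```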
Question 2: Memory stable ?
If the elements should be stable in memory (ie, they should not move around when the container itself is modified), then use some list
Otherwise, jump to question 3.
Question 2.1: Which ?
Settle for a list; a forward_list is only useful when a smaller memory footprint matters.
Question 3: Dynamically sized ?
If the container has a known size (at compilation time), and this size will not be altered during the course of the program, and the elements are default constructible or you can provide a full initialization list (using the { ... } syntax), then use an array. It replaces the traditional C-array, but with convenient functions.
Otherwise, jump to question 4.
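A quick illustration of std::array with a full initialization list:

```cpp
#include <array>

// The size (3) is fixed at compile time, but std::array still provides the
// convenient member functions (size(), begin(), end(), ...).
std::array<int, 3> rgb = {255, 128, 0};

int sum() {
    int total = 0;
    for (int channel : rgb)   // iterates like any other standard container
        total += channel;
    return total;
}
```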
Question 4: Double-ended ?
If you wish to be able to remove items from both the front and back, then use a deque, otherwise use a vector.
You will note that, by default, unless you need an associative container, your choice will be a vector. It turns out it is also Sutter and Stroustrup's recommendation.
I like Matthieu's answer, but I'm going to restate the flowchart as this:
When to NOT use std::vector
By default, if you need a container of stuff, use std::vector. Thus, every other container is only justified by providing some functionality alternative to std::vector.
Constructors
std::vector requires that its contents are move-constructible, since it needs to be able to shuffle the items around. This is not a terrible burden to place on the contents (note that default constructors are not required, thanks to emplace and so forth). However, most of the other containers don't require any particular constructor (again, thanks to emplace). So if you have an object where you absolutely cannot implement a move constructor, then you will have to pick something else.
A std::deque would be the general replacement, having many of the properties of std::vector, but you can only insert at either ends of the deque. Inserts in the middle require moving. A std::list places no requirement on its contents.
Needs Bools
std::vector<bool> is... not. Well, it is standard. But it's not a vector in the usual sense, as operations that std::vector normally allows are forbidden. And it most certainly does not contain bools.
Therefore, if you need real vector behavior from a container of bools, you're not going to get it from std::vector<bool>. So you'll have to make do with a std::deque<bool>.
Searching
If you need to find elements in a container, and the search tag can't just be an index, then you may need to abandon std::vector in favor of set and map. Note the key word "may"; a sorted std::vector is sometimes a reasonable alternative (see the sketch after the list below). Or Boost.Container's flat_set/map, which implement exactly that on top of a sorted std::vector.
There are now four variations of these, each with their own needs.
Use a map when the search tag is not the same thing as the item you're looking for itself. Otherwise use a set.
Use unordered when you have a lot of items in the container and search performance absolutely needs to be O(1), rather than O(log n).
Use multi if you need multiple items to have the same search tag.
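And here is a minimal sketch of the sorted-std::vector alternative mentioned before the list, using binary search for lookups:

```cpp
#include <algorithm>
#include <vector>

// Keep the vector sorted, then use binary search for set-like membership tests.
bool contains(const std::vector<int>& sorted, int key) {
    return std::binary_search(sorted.begin(), sorted.end(), key);   // O(log n)
}

// For a set::find-style lookup that yields an iterator, use std::lower_bound
// and check that the result actually equals the key.
```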
Ordering
If you need a container of items to always be sorted based on a particular comparison operation, you can use a set. Or a multiset if you need multiple items to have the same value.
Or you can use a sorted std::vector, but you'll have to keep it sorted.
Stability
When iterators and references are invalidated is sometimes a concern. If you need a list of items, such that you have iterators/pointers to those items in various other places, then std::vector's approach to invalidation may not be appropriate. Any insertion operation may cause invalidation, depending on the current size and capacity.
std::list offers a firm guarantee: an iterator and its associated references/pointers are only invalidated when the item itself is removed from the container. std::forward_list is there if memory is a serious concern.
If that's too strong a guarantee, std::deque offers a weaker but useful one. Invalidation results from insertions in the middle, but insertions at the head or tail cause only invalidation of iterators, not of pointers/references to items in the container.
Insertion Performance
std::vector only provides cheap insertion at the end (and even then, it becomes expensive if you blow capacity).
std::list is expensive in terms of performance (each newly inserted item costs a memory allocation), but it is consistent. It also offers the occasionally indispensable ability to shuffle items around for virtually no performance cost, as well as to trade items with other std::list containers of the same type at no loss of performance. If you need to shuffle things around a lot, use std::list.
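For instance, std::list::splice relinks nodes instead of copying or moving elements, which is what makes that reshuffling essentially free. A small illustrative sketch:

```cpp
#include <iterator>
#include <list>

int main() {
    std::list<int> a = {1, 2, 3};
    std::list<int> b = {10, 20};

    // Move every node of b to the end of a: no element is copied or moved,
    // only the list nodes are relinked. b is left empty.
    a.splice(a.end(), b);

    // Move the third element of a to its front: reordering within the same
    // list, again without touching the elements themselves.
    a.splice(a.begin(), a, std::next(a.begin(), 2));
}
```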
std::deque provides constant-time insertion/removal at the head and tail, but insertion in the middle can be fairly expensive. So if you need to add/remove things from the front as well as the back, std::deque might be what you need.
It should be noted that, thanks to move semantics, std::vector insertion performance may not be as bad as it used to be. Some implementations implemented a form of move semantic-based item copying (the so-called "swaptimization"), but now that moving is part of the language, it's mandated by the standard.
No Dynamic Allocations
std::array is a fine container if you want the fewest possible dynamic allocations. It's just a wrapper around a C-array; this means that its size must be known at compile-time. If you can live with that, then use std::array.
That being said, using std::vector and reserveing a size would work just as well for a bounded std::vector. This way, the actual size can vary, and you only get one memory allocation (unless you blow the capacity).
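A small sketch of that bounded-vector pattern (the bound of 1000 is arbitrary):

```cpp
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1000);              // a single allocation up front

    for (int i = 0; i < 1000; ++i)
        v.push_back(i);           // never reallocates while size() stays within the reserve

    // capacity() is at least 1000, but size() can still vary at runtime.
}
```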
Here is the C++11 version of the above flowchart: [flowchart image omitted]. [Originally posted without attribution to its original author, Mikael Persson.]
Here's a quick spin, although it probably needs work
Should the container let you manage the order of the elements?
  Yes:
    Will the container always contain exactly the same number of elements?
      Yes:
        Does the container need a fast move operator?
          Yes: std::vector
          No: std::array
      No:
        Do you absolutely need stable iterators? (be certain!)
          Yes: boost::stable_vector (as a last-case fallback, std::list)
          No:
            Do inserts happen only at the ends?
              Yes: std::deque
              No: std::vector
  No:
    Are keys associated with values?
      Yes:
        Do the keys need to be sorted?
          Yes:
            Is there more than one value per key?
              Yes: boost::flat_multimap (as a last-case fallback, std::multimap)
              No: boost::flat_map (as a last-case fallback, std::map)
          No:
            Is there more than one value per key?
              Yes: std::unordered_multimap
              No: std::unordered_map
      No:
        Are elements read and then removed in a certain order?
          Yes:
            The order is:
              Ordered by element: std::priority_queue
              First in, first out: std::queue
              First in, last out: std::stack
              Other: custom, probably based on std::vector?
          No:
            Should the elements be sorted by value?
              Yes: boost::flat_set
              No: std::vector
You may notice that this differs wildly from the C++03 version, primarily because I really do not like linked nodes. The linked-node containers can usually be beaten in performance by a non-linked container, except in a few rare situations. If you don't know what those situations are, and you have access to Boost, don't use linked-node containers (std::list, std::forward_list, std::map, std::multimap, std::set, std::multiset). This list focuses mostly on small and medium-sized containers, because (A) that's 99.99% of what we deal with in code, and (B) large numbers of elements need custom algorithms, not different containers.

iterate ordered versus unordered containers

I want to know which data-structures are more efficient for iterating through their elements between std::set, std::map and std::unordered_set, std::unordered_map.
I searched through SO and I found this question. The answers either propose to copy the elements in a std::vector or to use Boost.Container, which IMHO don't answer my question.
My purpose is to keep a large number of unique elements in a container and, most of the time, iterate through them. Insertions and removals are rarer. I want to avoid std::vector in combination with std::unique.
Let's consider set vs unordered_set.
The main difference here is the 'nature' of the iteration, that is the traversal of the set will give you the elements in order while traversing a range in an unordered set will give you a bunch of values in no particular order.
Suppose you want to traverse a range [it1, it2]. If we exclude the lookup time needed to find elements it1 and it2, there can be no direct mapping from one case to the other, since the elements in between are not guaranteed to be the same even if you used the same elements to construct the container.
There are cases however where something like this has meaning when e.g. you want to traverse a fixed number of elements (regardless of what they are) or when you need to traverse the whole container. In such cases you need to consider implementation mechanics :
Sets are usually implemented as red-black trees (a form of binary search tree). Like all binary search trees, they allow efficient in-order traversal (left, root, right) of their elements. That is, to traverse you pay the cost of pointer chasing (just like traversing a list).
Unordered sets, on the other hand, are hash tables, and to my knowledge the STL implementations use hashing with chaining. That means (at a very high level) that the structure is built on a (contiguous) buffer where each slot is the head of a chain (list) containing the elements. The way the elements are laid out across those chains (buckets) and across the buffer will affect the traversal time; however, you'll be chasing pointers once again, this time jumping through different lists. I don't think it'll vary significantly from the tree case, but it won't be any better for sure.
In any case micro tuning and benchmarking will give you the answer for your particular application.
The difference does not lie in the ordering or lack of it but in the backing container. If it's contiguous memory it should be fast to iterate over, due to the simple implementation of the iterator and cache friendliness.
Unordered containers are usually stored as a vector of vectors (or something similar), while ordered containers are implemented using trees, but this is left to the implementation after all. That would suggest that iterating over the unordered version should be faster. However, since it is implementation-defined, I have seen implementations (which bent the rules a little, to be fair) with different behaviour.
Generally speaking, container performance is quite a complex topic and usually has to be tested in the actual application to get a reliable answer. There is plenty of implementation-defined stuff that might affect the performance. I'd go with hash_set if I had to go in blind. Copying into a vector might also turn out to be a good option.
EDIT: As #TonyD said in his comment, there is a rule that disallows invalidating iterators when adding elements as long as max_load_factor() is not exceeded; this practically rules out backing containers that are contiguous in memory.
Thus, copying everything into a vector seems like an even more reasonable option. If you need to remove duplicates, a feasible option might be to use http://en.cppreference.com/w/cpp/algorithm/sort and then easily ignore the dupes. I have heard that using a vector and sort to get a sorted array (or vector) is quite often the option chosen when a container needs to be sorted and is iterated over more often than it is modified.
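A small sketch of that vector + sort approach (the element type int is just for illustration):

```cpp
#include <algorithm>
#include <vector>

// Sort once, drop duplicates once, then iterate over contiguous memory.
void sort_and_dedupe(std::vector<int>& v) {
    std::sort(v.begin(), v.end());                        // O(n log n)
    v.erase(std::unique(v.begin(), v.end()), v.end());    // remove adjacent duplicates
}

// Afterwards a plain range-for over v visits the unique elements in order,
// with the cache friendliness that a node-based set cannot offer.
```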
Iteration from fastest to slowest should be: set > map > unordered_set > unordered_map.
set is a little lighter than map, and both are ordered binary trees, so they should be faster to iterate than the unordered_ containers.

C++: container replacement for vector/deque for huge sizes

So my application has containers with 100 million and more elements.
I'm on the hunt for a container which behaves - time-wise - better than std::deque (let alone std::vector) with respect to frequent insertions and deletions all over the container ... including near the middle. Access time to the n-th element does not need to be as fast as vector, but should definitely be better than full traversal as in std::list (which has a huge memory overhead per element anyway).
Elements should be treated ordered by index (like vector, deque, list), so std::set or std::unordered_set also do not work well.
Before I sit down and code such a container myself: has anyone seen such a beast already? I'm pretty sure the STL has nothing like this, and looking through Boost I did not find anything I could use, but I may be wrong.
Any hints?
There's a whole STL replacement for big data, in case your app is centered on such data:
STXXL - http://stxxl.sourceforge.net/
edit: I was actually a bit quick to answer. 100 million is not really a large number. E.g., if each element is one byte, you could store it in a 96 MiB array. So for STXXL to be of any use, the element size should be significantly bigger.
I think you can get the performance characteristics that you want with a skip list:
https://en.wikipedia.org/wiki/Skip_list#Indexable_skiplist
It's the "indexable" part that you're interested in, of course -- you don't actually want the items to be sorted. So some modification is needed that I leave as an exercise.
You might find that 100 million list nodes begins to strain a 32 bit address space, but probably not an issue in 64 bits.
1) If the data is highly sparse, i.e. has lots of zeroes or can be expressed as such, I would highly recommend a data structure that takes advantage of that:
sparselib++ for matrices
sparsehash for hash maps
2) Hash maps should do O(1) for all the operations you describe and the sparsehash implementation I mentioned earlier is particularly space-efficient; it also includes a sparsetable type which is a bit more low-level and can be used in place of an array.
3) If the strict ordering is not that important (it probably is, because you mentioned elements should be treated ordered by index), you can swap the elements you want to erase to the end of the vector and then resize to do removal in O(1). Insertion would just be push_back.
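A sketch of that swap-to-the-end removal from point 3 (the helper name unordered_erase is invented):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// O(1) removal when element order does not have to be preserved: move the
// victim to the end, then drop it without shifting anything.
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t i) {
    std::swap(v[i], v.back());
    v.pop_back();
}
```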
Try a hash map. The STL has several, all with the unordered naming prefix, such as unordered_map, etc. A hash map has constant-time insertion and lookup given a good hashing algorithm. With your 'huge' data set a hash map would most likely cover your needs. Making a slight change to the application to cover the differences in the interfaces is trivial.

Dynamic size of array in c++?

I am confused. I don't know which containers I should use. I'll tell you what I need first. Basically I need a container that can store X number of objects (and the number of objects is unknown; it could be 1 - 50k).
I read a lot; over here, array vs list, it says: an array needs to be resized if the number of objects is unknown (I am not sure how to resize an array in C++), and it also states that with a linked list, if you want to search for a certain item, it will loop through (iterate) from first to last (or vice versa), while an array lets you ask for the object at a given index.
Then I went looking for other solutions, map, vector, etc., like this one: array vs vector. Some responders say never to use arrays.
I am new to C++, I only used array, vector, list and map before. Now, for my case, what kind of container you will recommend me to use? Let me rephrase my requirements:
Need to be a container
The number of objects stored is unknown but is huge (1 - 40k maybe)
I need to loop through the containers to find specific object
std::vector is what you need.
You have to consider two things when selecting an STL container.
Data you want to store
Operations you want to perform on the stored data
There was a good diagram in a question here on SO which depicts this. I cannot find the link to it, but I had it saved a long time ago; here it is: [container-choice flowchart image omitted]
You cannot resize an array in C++, not sure where you got that one from. The container you need is std::vector.
The general rule is: use std::vector until it doesn't work, then shift to something that does. There are all sorts of theoretical rules about which one is better, depending on the operations, but I've regularly found that std::vector outperforms the others, even when the most frequent operations are things where std::vector is supposedly worse. Locality seems more important than most of the theoretical considerations on a modern machine.
The one reason you might shift from std::vector is iterator validity. Inserting into a std::vector may invalidate iterators; inserting into a std::list never does.
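Putting the vector-first advice into a minimal sketch for the asker's requirements (the Object type and its name field are placeholders):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Placeholder element type for the example.
struct Object {
    std::string name;
};

int main() {
    std::vector<Object> objects;
    objects.reserve(40000);                 // optional: we expect up to ~40k elements

    objects.push_back(Object{"foo"});
    objects.push_back(Object{"bar"});

    // Requirement 3: loop through the container to find a specific object.
    auto it = std::find_if(objects.begin(), objects.end(),
                           [](const Object& o) { return o.name == "bar"; });
    return it != objects.end() ? 0 : 1;
}
```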
Do you need to loop through the container, or you have a key or ID for your objects?
If you have a key or ID - you can use map to be able to quickly access the object by it, if the id is the simple index - then you can use vector.
Otherwise you can iterate through any container (they all have iterators) but list would be the best if you want to be memory efficient, and vector if you want to be performance oriented.
You can use vector. But if you need to find objects in the container, then consider using set, multiset or map.

Container access and allocation through the same operator?

I have created a container for generic, weak-type data which is accessible through the subscript operator.
The std::map container allows both data access and element insertion through the operator, whereas std::vector I think doesn't.
What is the best (C++ style) way to proceed? Should I allow allocation through the subscript operator or have a separate insert method?
EDIT
I should say, I'm not asking if I should use vector or map, I just wanted to know what people thought about accessing and inserting being combined in this way.
In the case of Vectors: Subscript notation does not insert -- it overwrites.
The rest of this post distils the information from Items 1-5 of Effective STL.
If you know the range of your data beforehand, the size is fixed, and you won't insert at locations which have data above them, then you can insert into vectors without unpleasant side effects.
However, in the general case ad hoc vector insertions have implications such as shifting members upward and doubling memory when capacity is exhausted (which causes a flood of copies from the old vector's objects to locations in the new vector). Vectors are designed for when you know the locality characteristics of your data.
Vectors come with an insert member function... and this function is very clever in most implementations, in that it can infer optimizations from the iterators you supply. Can't you just use this?
If you want to do ad-hoc insertions of data, you should use a list. Perhaps you can use a list to collect the data and then, once it's finalized, populate a vector using the range-based insert or the range constructor?
It depends on what you want. A map can be significantly slower than a vector if you wish to use the thing like an array. A map is very helpful if the index you want to use is non-sequential and you have LOADS of them. It's usually quicker to just use a vector, sort it and do a binary search to find what you are after (see the sketch below). I've used this method to replace maps in tonnes of software and I still haven't found a case where it was slower.
So, IMO, std::vector is the better way, though a map MIGHT be useful if you are using it properly.
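Here is a minimal sketch of that sorted-vector-plus-binary-search replacement for a map; the Entry alias and find_value helper are invented for the example:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// A sorted vector of key/value pairs looked up with binary search.
using Entry = std::pair<int, std::string>;

const std::string* find_value(const std::vector<Entry>& sorted, int key) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key,
                               [](const Entry& e, int k) { return e.first < k; });
    return (it != sorted.end() && it->first == key) ? &it->second : nullptr;
}

// Build once, sort once, look up many times:
//   std::vector<Entry> v = {{3, "three"}, {1, "one"}, {2, "two"}};
//   std::sort(v.begin(), v.end());             // pairs compare by key first
//   const std::string* s = find_value(v, 2);   // points at "two"
```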
Separate insert method, definitely. The operator[] on std::map is just stupid and makes the code hard to read and debug.
Also, you can't access data from a const context if you're using operator[] to insert (which will lead to un-const-cancer, the even-more-evil cousin of const-cancer).
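To illustrate the const-context point: std::map::operator[] may insert, so it cannot be used through a const reference, whereas find() and at() can. A small sketch:

```cpp
#include <map>
#include <string>

// operator[] may insert, so it cannot be called on a const map; find() and
// at() work fine in a const context.
int lookup(const std::map<std::string, int>& m, const std::string& key) {
    // return m[key];          // does not compile: operator[] is non-const
    auto it = m.find(key);
    return it != m.end() ? it->second : 0;
    // m.at(key) also works, but throws std::out_of_range for a missing key.
}
```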