Compare between stl containers and arrays? - c++

In which cases is using vectors or sets (STL containers) advantageous compared to normal arrays?

"Normal arrays" are static objects: Their size is fixed and determined at compile time. Dynamic containers can have an arbitrary amount of elements which can change at runtime.
Necessarily, dynamic containers have to use more expensive memory allocation operations than static arrays. If you need a dynamic container, there's no way around it, but if a static array suffices, you might prefer that (but use std::array!).
Note also that static arrays with automatic storage usually cannot be too large, since programs typically only have limited memory for automatic objects.
Another point is utility: Several advanced data structures like linked lists and binary search trees are only available in the standard library as dynamic containers. If you need a list, a queue, or a map, even if it's just small and of bounded size, the dynamic containers are readily available, while there is no static analogue as part of the standard library. (However, thanks to allocators used by the standard containers, you can always put a dynamic container inside a static array by using a pool-type allocator. C++ decouples object lifetime from memory lifetime.)
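As a rough illustration of the static/dynamic distinction described above, here is a minimal sketch. The function names sum_fixed and collect_squares are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// The size N is part of the type and fixed at compile time;
// std::array itself performs no heap allocation.
template <std::size_t N>
int sum_fixed(const std::array<int, N>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// The element count is only known at runtime; the vector
// allocates (and may reallocate) heap storage as it grows.
std::vector<int> collect_squares(int count) {
    std::vector<int> out;
    for (int i = 0; i < count; ++i) {
        out.push_back(i * i);
    }
    return out;
}
```

Calling sum_fixed on a std::array<int, 4> and collect_squares(5) shows the difference: the first size is baked into the type, the second is an ordinary runtime argument.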

I suggest that there is almost never a reason to use std::vector. std::deque has all the advantages (constant time access, etc) with none of the drawbacks (terrible resize performance). The only time you would ever choose a vector over a deque is if you need the fact that it's backed by a real, old-fashioned, C-style array. And the only reason for that is if you need to pass it into some legacy function (as an array).
The advantages of vector over a traditional array are limited. It will grow if you insert past its current size, but extremely inefficiently (see std::deque for a better option). It is just as easy to index past the end of a vector as it is with an array, so no benefit there. The memory management quality is only such that it will allocate/deallocate the items it contains. But these are typically pointers, so that doesn't help. If they're instances (not pointers), then an array will also allocate/deallocate them properly.
If I need an array, I would probably choose vector because it has some nice API things like size, begin, & end. But in general my suggestion is DON'T USE EITHER ONE! GO WITH std::deque INSTEAD!
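One nuance on the indexing point: operator[] is unchecked for both arrays and vectors, but std::vector also provides at(), which throws std::out_of_range on a bad index. A minimal sketch (the helper name at_rejects is made up for illustration):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading index i through at() was rejected
// with std::out_of_range, false if the access succeeded.
bool at_rejects(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);  // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a vector of three elements, index 2 is accepted and index 3 is rejected. std::deque and std::array offer the same at() member.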

One advantage is that STL containers take care of memory management for you and are less likely to result in buffer overflows or memory leaks than are C-style arrays. They're also prebuilt, so you don't have to spend time reinventing the wheel. So any time you're concerned about such things, STL containers are a better choice.

Advantageous in what way? set, multiset, vector, list, map, deque, stack, queue, priority_queue, multimap, bitset are all implemented differently. It depends on what you're doing. Some are implemented with a balanced tree, some with a contiguous array, some as linked lists, etc. Some are faster at inserting, some are faster at accessing, some work well with deleting, etc.
No container is always advantageous to another, or else the other wouldn't exist. Part of software development is being able to make decisions such as "which container should I use" so what's your real question, and how do you need your container to be advantageous?
Obviously, arrays will always be faster than vectors because the underlying component of a vector is an array, so the vector will just have overhead. But that overhead is doing a lot of wonderful things for you that means you don't have to worry about tons of things that you do have to worry about with arrays.

Most of the time the standard containers will be preferred over an old-fashioned array. They just have a lot more capabilities. The only time an array would be reasonable over std::vector would be when the size is known at compile time and is reasonably small (i.e. not megabytes) and you need to save the overhead of heap allocation. Sometimes an array is slightly more convenient as you can pass arr instead of &vec[0] to a function, but that's a very small price to pay.
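To make the "pass arr instead of &vec[0]" remark concrete, here is a small sketch. legacy_sum is a stand-in for whatever pointer-plus-length function you need to call; it is not a real library function.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a legacy C-style API that takes a pointer and a length.
double legacy_sum(const double* values, std::size_t count) {
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    return total;
}

double sum_array() {
    double arr[3] = {1.0, 2.0, 3.0};
    return legacy_sum(arr, 3);  // the array decays to a pointer
}

double sum_vector() {
    std::vector<double> vec = {1.0, 2.0, 3.0};
    // data() exists since C++11; before that, &vec[0] was the usual idiom.
    return legacy_sum(vec.data(), vec.size());
}
```

Since C++11, vec.data() is generally the safer spelling, because &vec[0] is undefined for an empty vector.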
If you're having trouble choosing between std::vector and std::set and the other standard containers, see here: In which scenario do I use a particular STL container?

Related

Fast data structure that supports finding the minimum element and accessing, inserting, removing and updating data at any index

I'm looking for ideas to implement a templatized sequence container data structure which can beat the performance of std::vector in as many features as possible and potentially perform much faster. It should support the following:
Finding the minimum element (and returning its index)
Insertion at any index
Removal at any index
Accessing and updating any element by index (via operator[])
What would be some good ways to implement such a structure in C++?
You can generally be pretty sure that the STL implementations of all containers are very good at the range of tasks they were designed for. That is to say, you're unlikely to be able to build a container that is as robust as std::vector and quicker for all applications. However, generally speaking, it is almost always possible to beat a generic tool when optimizing for a specific application.
First, let's think about what a vector actually is. You can think of it as a pointer to a C-style array, except that its elements are stored on the heap. Unlike a C array, it also provides a bunch of methods that make it a little bit more convenient to manipulate. But like a C array, all of its data is stored contiguously in memory, so lookups are extremely cheap, but changing its size may require the entire array to be moved elsewhere in memory to make room for the new elements.
Here are some ideas for how you could do each of the things you're asking for better than a vanilla std::vector:
1. Finding the minimum element: Search is typically O(N) for many containers, and certainly for a vector (because you need to iterate through all elements to find the lowest). You can make it O(1), or very close to free, by simply keeping track of the smallest element at all times, and only updating it when the container is changed.
2. Insertion at any index: If your elements are small and there are not many, I wouldn't bother tinkering here, just do what the vector does and keep elements contiguously next to each other to keep lookups quick. If you have large elements, store pointers to the elements instead of the elements themselves (boost's stable vector will do this for you). Keep in mind that this makes lookup more expensive, because you now need to dereference the pointer, so whether you want to do this will depend on your application. If you know the number of elements you are going to insert, std::vector provides the reserve method which preallocates some memory for you, but what it doesn't do is allow you to decide how the size of the allocated memory grows. So if your application warrants lots of push_back operations without enough information to intelligently call reserve, you might be able to beat the standard std::vector implementation by tailoring the growth function of your container to your particular needs. Another option is using a linked list (e.g. std::list), which will beat an std::vector in insertions for larger containers. However, the cost here is that lookup (see 4.) will now become vastly slower (O(N) instead of O(1) for vectors), so you're unlikely to want to go down this path unless you plan to do more insertions/erasures than lookups.
3. Removal at any index: Similar considerations as for 2.
4. Accessing and updating any element by index (via operator[]): The only way you can beat std::vector in this regard is by making sure your data is in the cache when you try to access it. This is because lookup for a vector is essentially an array lookup, which is really just some pointer arithmetic and a pointer dereference. If you don't access your vector often you might be able to squeeze out a few clock cycles by using a custom allocator (see boost pools) and placing your pool close to the stack pointer.
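The idea of remembering the minimum and only updating it when the container changes could be sketched roughly as follows. MinTrackingVector and its member names are hypothetical, the element type is assumed to support operator<, and updates go through a set() member rather than a mutable operator[] so the wrapper can see every change.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical wrapper: a vector that remembers where its minimum is.
template <typename T>
class MinTrackingVector {
public:
    void insert(std::size_t index, const T& value) {
        const bool was_empty = data_.empty();
        data_.insert(data_.begin() + static_cast<std::ptrdiff_t>(index), value);
        if (was_empty) { min_index_ = 0; return; }
        if (index <= min_index_) ++min_index_;  // old minimum shifted right
        if (data_[index] < data_[min_index_]) min_index_ = index;
    }

    void erase(std::size_t index) {
        data_.erase(data_.begin() + static_cast<std::ptrdiff_t>(index));
        if (index == min_index_) rescan();          // lost the minimum: O(N)
        else if (index < min_index_) --min_index_;  // minimum shifted left
    }

    void set(std::size_t index, const T& value) {
        data_[index] = value;
        if (index == min_index_) rescan();  // the old minimum may have grown
        else if (value < data_[min_index_]) min_index_ = index;
    }

    const T& operator[](std::size_t index) const { return data_[index]; }
    std::size_t size() const { return data_.size(); }
    // O(1); only meaningful while size() > 0.
    std::size_t min_index() const { return min_index_; }

private:
    void rescan() {
        min_index_ = static_cast<std::size_t>(std::distance(
            data_.begin(), std::min_element(data_.begin(), data_.end())));
    }

    std::vector<T> data_;
    std::size_t min_index_ = 0;
};
```

Querying the minimum is constant time, but removing or overwriting the current minimum still triggers an O(N) rescan, so this only pays off when minimum queries outnumber such changes.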
I stopped writing mainly because there are dozens of ways in which you could approach this problem.
At the end of the day, this is probably more of an exercise in teaching you that the implementation of std::vector is likely to be extremely efficient for most compilers. All of these suggestions are essentially micro-optimizations (and premature optimization is the root of all evil), so please don't blindly apply these in important code, as they're highly likely to end up costing you a lot of time and headache.
However, that's not to say you shouldn't tinker and learn for yourself, so by all means go ahead and try to beat it for your application and let us know how you go! Good luck :)

STL container for a list with random access?

Is there an STL container similar to a list in that its elements are not stored contiguously? The size of this container can be up to 1000x1000 elements with each element being a vector containing 36 doubles. This would be a large chunk to store together (like ~200 megabytes). Is there a variant that instead stores pointers to its contents as a separate vector, so that it would allow for random access? Is there an STL container class for this that already exists or should I just store the pointers manually?
The container I need is actually a constant size so I think implementing it myself wouldn't be too difficult, but I was wondering if an STL container already exists for this. I'd like to avoid a vector because the list is large and the contents will be of medium size. If the vectors in the container don't need to reside next to each other then wouldn't it be better to separate them in a list to prevent running out of memory from fragmentation?
Both deque<array<double, 36>> and vector<vector<double>> would avoid the need for any really huge contiguous allocations.
The vector<vector<double>> is worse in those terms. For the numbers you specify it needs a contiguous allocation of 1000*1000*sizeof(vector<double>), which is low 10s of MB (most likely a vector is the size of 3 pointers). That's rarely a problem on a "proper computer" (desktop or server). The places where it would be a concern for fragmentation reasons (small virtual address space or no virtual addressing at all), you might also have a more fundamental problem that you don't have 300MB-ish of RAM anyway. But you could play extra-safe by avoiding it, since clearly there can exist environments where you could allocate 300MB total but not 12MB contiguously.
There is no std::array in C++03, but there's boost::array or you could easily write a class to represent 36 doubles.
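A minimal sketch of the deque<array<double, 36>> option mentioned above, assuming C++11. The Grid class and the Cell typedef are hypothetical names; with 8-byte doubles the full 1000x1000 case holds 1000 * 1000 * 36 * 8 = 288,000,000 bytes of payload, which the deque spreads over many smaller blocks.

```cpp
#include <array>
#include <cstddef>
#include <deque>

typedef std::array<double, 36> Cell;  // 36 doubles stored inline

// Hypothetical fixed-size grid stored row-major in a deque, so that no
// single allocation has to hold all rows * cols cells.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : cols_(cols), cells_(rows * cols) {}  // cells start out zeroed

    Cell& at(std::size_t row, std::size_t col) {
        return cells_[row * cols_ + col];  // deque indexing is O(1)
    }

    std::size_t size() const { return cells_.size(); }

private:
    std::size_t cols_;
    std::deque<Cell> cells_;
};
```

A Grid(1000, 1000) would then be indexed as grid.at(row, col)[k] for k in 0..35.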
vector<array<double, 36>> suffers worst from fragmentation, it requires a contiguous 250-MB allocation. Personally I don't find it easy to simulate in testing "the worst possible memory fragmentation we will ever face", but I'm not the best tester. That size of block is about where I start feeling a bit uneasy in a 32 bit process, but it will work fine in good conditions.
I highly recommend that you use the std::array class. It has a constant size, supports random access to all elements, and provides iterator, const_iterator, reverse_iterator, and const_reverse_iterator. More about it: http://www.cplusplus.com/reference/stl/array/
It isn't clear what characteristic of std::list<T> you are after exactly. If you want a container whose elements stay put when adding or removing elements, you might want to have a look at std::deque<T>: when adding/removing elements at the front or the back, all other elements stay at the same location. That is, pointers and references to elements stay valid, unless elements are added or removed in the middle. Iterators are invalidated by any insertion or removal. std::deque<T> provides random access.
There is no container that directly gives random access and supports addition/removal at any position with the elements staying put. However, as others have pointed out, using a container of pointers provides such an interface. It may be necessary to wrap it to hide the use of pointers.
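The std::deque guarantee described above (references and pointers survive insertions at either end) can be checked with a small sketch. The function name is made up for illustration.

```cpp
#include <cstddef>
#include <deque>

// Returns true if a pointer taken before growing the deque at both ends
// still refers to the same element afterwards.
bool pointer_survives_end_insertions(std::size_t extra) {
    std::deque<int> d;
    d.push_back(10);
    d.push_back(20);
    d.push_back(30);
    const int* p = &d[1];  // points at the element holding 20

    for (std::size_t i = 0; i < extra; ++i) {
        d.push_back(0);   // grow at the back
        d.push_front(0);  // grow at the front
    }
    // Each push_front shifted the element's index up by one.
    return p == &d[1 + extra] && *p == 20;
}
```

The same experiment with std::vector would not be safe, since a reallocation during push_back leaves the old pointer dangling.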

What is the C++ equivalent of C# Collection<T> and how do you use it?

I have the need to store a list/collection/array of dynamically created objects of a certain base type in C++ (and I'm new to C++). In C# I'd use a generic collection, what do I use in C++?
I know I can use an array:
SomeBase* _anArrayOfBase = new SomeBase[max];
But I don't get anything 'for free' with this - in other words, I can't iterate over it, it doesn't expand automatically and so on.
So what other options are there?
Thanks
There is std::vector, which is a wrapper around an array, but it can expand and will do so automatically. However, expanding is an expensive operation because the elements have to be moved to new storage, so if you are going to do a lot of insertion or removal operations, don't use a vector. (You can use the reserve function to reserve a certain amount of space up front.)
std::list is a linked list, which has far faster insertion and removal times, but iteration is slower as the values are not stored in contiguous memory, which means that address calculation is far more complex and you can't take advantage of the processor's cache when iterating over the list.
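To see what reserve buys you, one rough approach is to count how often capacity() changes while appending. This is only a sketch; count_reallocations is a made-up helper, and the exact growth policy without reserve is implementation-defined.

```cpp
#include <cstddef>
#include <vector>

// Counts how often capacity() changes while appending n ints.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);  // one allocation up front
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {  // storage was reallocated
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set, the loop causes no reallocations at all. Without it, common implementations grow the capacity geometrically, so the count tends to be small relative to n.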
The major upside compared to the vector or deque is that elements can be added or removed from anywhere in the list fairly cheaply.
As a compromise, there is std::deque, which externally works in a similar way to a vector, but internally they are very different. The deque's storage doesn't have to be contiguous, so it can be divided up into blocks, meaning that when the deque grows, it doesn't have to reallocate the storage space for its entire contents. Access is slightly slower and you can't do pointer arithmetic to get an element.
You should use a vector.
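#include <cstddef>
#include <vector>

struct SomeBase { virtual ~SomeBase() {} };  // placeholder; use your real base class

int main()
{
    std::vector<SomeBase*> baseVector;
    baseVector.push_back(new SomeBase());
    // The vector does not own what the pointers refer to, so free it yourself.
    for (std::size_t i = 0; i < baseVector.size(); ++i) delete baseVector[i];
}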
C++ contains a collection of data containers within the STL.
Check it out here.
You should use one of the containers
std::vector<SomeBase>
std::list<SomeBase>
and if you really need dynamically allocated objects
std::vector<boost::shared_ptr<SomeBase>>
std::list<boost::shared_ptr<SomeBase>>
Everyone has mentioned the common SC++L containers, but there is another important caveat when doing this in C++ (that Chaoz has included in his example).
In C++, your collection will need to be templated on SomeBase*, not on SomeBase. If you try to assign an instance of the derived type to an instance of the base type, you will end up causing what is called object slicing. This is almost definitely not what you are trying to do.
Since you are coming from C#, just remember that "SomeBase MyInstance" means something very different in both languages. The C++ equivalent to this is usually "SomeBase* MyPointer" or "SomeBase& MyReference".
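A minimal sketch of the slicing problem, assuming C++11. SomeBase mirrors the name used in the question; the Derived type, the name() member, and the two helper functions are invented for this example.

```cpp
#include <memory>
#include <string>
#include <vector>

struct SomeBase {
    virtual ~SomeBase() {}
    virtual std::string name() const { return "base"; }
};

struct Derived : SomeBase {  // hypothetical derived type
    std::string name() const override { return "derived"; }
};

// Storing by value copies only the SomeBase part: the object is sliced.
std::string name_when_stored_by_value() {
    std::vector<SomeBase> items;
    items.push_back(Derived());
    return items[0].name();
}

// Storing (smart) pointers keeps the dynamic type intact.
std::string name_when_stored_by_pointer() {
    std::vector<std::unique_ptr<SomeBase>> items;
    items.push_back(std::unique_ptr<SomeBase>(new Derived()));
    return items[0]->name();
}
```

The by-value version reports "base" because only the base subobject was copied into the vector. The pointer version reports "derived", and std::unique_ptr frees the objects when the vector goes away.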
Use a vector. Have a look here.
I am a huge fan of std::deque. If you want things for free, the deque gives them to you. Fast access from the head and tail of the list. iterators, reverse_iterators, fast insertion at the head and the tail. It's not super specialized, but you wanted free stuff. ;-)
Also, I will link a great STL reference. The STL is where you get all the standard "free" stuff in C++. Standard Template Library. Enjoy!
Use STL. std::vector and std::set for instance. Plenty of examples out there.

selection of data structure

I use C++. Say I want to store 40 usernames: I will simply use an array. However, if I want to store 40000 usernames, is this still a good idea in terms of search speed? Which data structure should I use to improve this speed?
You need to specify what the insertion and removal requirements are. Do things need to be removed and inserted at random points in the sequence?
Also, why the requirement to search sequentially? Are you doing searches that aren't suitable for a hash table lookup?
At the moment I'd suggest a deque or a list. Often it's best to choose a container with the interface that makes for the simplest implementation for your algorithm and then only change the choice if the performance is inadequate and an alternative provides the necessary speedup.
A vector has two principal advantages: there is no per-object memory overhead (although vectors will over-allocate to prevent frequent copying), and objects are stored contiguously, so sequential access tends to be fast. These are also its disadvantages. Growing vectors require reallocation and copying, and insertion and removal from anywhere other than the end of the vector also require copying. Contiguous storage can produce problems for vectors with large numbers of objects or large objects, as the contiguous storage requirements can be hard to satisfy even with only mild memory fragmentation.
A list doesn't require contiguous storage, but list nodes usually have a per-object overhead of two pointers (in most implementations). This can be significant in a list of very small objects (e.g. in a list of pointers, each node is 3x the size of the data item). Insertion and removal from the middle of a list is very cheap though, and list nodes never need to be moved in memory once created.
A deque uses chunked storage, so it has a low per-object overhead similar to a vector, but doesn't require contiguous storage over the whole container so doesn't have the same problem with fragmented memory spaces. It is often a very good choice for collections and is often overlooked.
As a rule of thumb, prefer vector to list or, deity forbid, a C-style array.
After the vector is filled, make sure it is properly ordered using the sort algorithm. You can then search for a particular record using either find, binary_search or lower_bound. (You don't need to sort to use find.)
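The sort-then-search approach could look roughly like this. The helper names prepare, contains, and position_of are made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sort once after the vector has been filled...
void prepare(std::vector<std::string>& names) {
    std::sort(names.begin(), names.end());
}

// ...then each lookup is O(log N). Precondition: names is sorted.
bool contains(const std::vector<std::string>& names, const std::string& name) {
    return std::binary_search(names.begin(), names.end(), name);
}

// lower_bound also reports where the name is (or where it would go).
std::size_t position_of(const std::vector<std::string>& names,
                        const std::string& name) {
    return static_cast<std::size_t>(
        std::lower_bound(names.begin(), names.end(), name) - names.begin());
}
```

If names are added after the initial sort, the vector has to be re-sorted (or the new name inserted at its lower_bound position) before binary_search can be trusted again.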
Seriously, unless you are in a resource-constrained environment (embedded platform, phone, or other), use a std::map: save the effort of sorting or searching and let the container take care of everything. This will typically be a sorted tree structure, probably balanced (e.g. red-black), which means you will get good search performance. Unless the size of your data is close to the size of one or two pointers, the memory overhead of whatever data structure you pick is negligible. Your graphics card probably has more memory than you are going to use up for the data you are thinking about.
As others said, there is very little good reason to use a vanilla array. If you don't want to use a map, use std::vector or std::list depending on whether you need to insert/delete data (=> list) or not (=> vector).
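A small sketch of the std::map suggestion. UserRegistry is a hypothetical wrapper, and mapping each username to an integer id is an assumption made only so the map has something to store.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry mapping a username to a numeric id.
class UserRegistry {
public:
    // Returns false if the username was already present.
    bool add(const std::string& username, int id) {
        return users_.insert(std::make_pair(username, id)).second;
    }

    bool contains(const std::string& username) const {
        return users_.find(username) != users_.end();  // O(log N)
    }

    std::size_t size() const { return users_.size(); }

private:
    std::map<std::string, int> users_;  // kept ordered by key
};
```

If only membership matters, std::set<std::string> offers the same lookup cost without the mapped value.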
Also consider if you really need all that data in memory, how about putting it on disk via sqlite. Or even use sqlite for in memory access. It all depends on what you need to do with your data.
std::vector and std::list seem good for this task. You can use an array if you know the maximum number of records beforehand.
If you only need sequential search and storage, then list is the proper container.
Also, vector wouldn't be a bad choice.

Benefit of slist over vector?

What I need is just a dynamically growing array. I don't need random access, and I always insert to the end and read it from the beginning to the end.
slist seems to be the first choice, because it provides just enough of what I need. However, I can't tell what benefit I get by using slist instead of vector. Besides, several materials I read about STL say, "vectors are generally the most efficient in time for accessing elements and to add or remove elements from the end of the sequence". Therefore, my question is: for my needs, is slist really a better choice than vector? Thanks in advance.
For starters, slist is non-standard.
For your choice, a linked list will be slower than a vector, count on it. There are two reasons contributing to this:
First and foremost, cache locality; vectors store their elements linearly in the RAM which facilitates caching and pre-fetching.
Secondly, appending to a linked list involves dynamic allocations which add a large overhead. By contrast, vectors most of the time do not need to allocate memory.
However, a std::deque will probably be even faster. In-depth performance analysis has shown that, despite bias to the contrary, std::deque is almost always superior to std::vector in performance (if no random access is needed), due to its improved (chunked) memory allocation strategy.
Yes, if you are always reading beginning to end, slist (a linked list) sounds like the way to go. The possible exception is if you will be inserting a large quantity of elements at the end at the same time. Then, the vector could be better if you use reserve appropriately.
Of course, profile to be sure it's better for your application.
Matt Austern (author of "Generic Programming and the STL" and general C++ guru) is a strong advocate of singly-linked lists for inclusion in the forthcoming C++ standard; see his presentation at http://www.accu-usa.org/Slides/SinglyLinkedLists.ppt and his long article at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2543.htm for more details, including a discussion of the trade-offs involved that may guide you in possibly choosing this data structure. (Note that the currently proposed name is forward_list, though slist is how it was traditionally named in SGI's STL & other popular libraries).
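The container was eventually standardized in C++11 as std::forward_list. For the "always append, then read front to back" use case in the question, a rough comparison sketch follows; the function names are made up, and note that forward_list has no push_back, so appending means tracking the last node yourself.

```cpp
#include <forward_list>
#include <vector>

// forward_list has no push_back, so keep an iterator to the last node.
int sum_via_forward_list(int n) {
    std::forward_list<int> items;
    std::forward_list<int>::iterator tail = items.before_begin();
    for (int i = 1; i <= n; ++i) {
        tail = items.insert_after(tail, i);  // one node allocation per element
    }
    int total = 0;
    for (int value : items) total += value;  // front-to-back traversal
    return total;
}

int sum_via_vector(int n) {
    std::vector<int> items;
    for (int i = 1; i <= n; ++i) {
        items.push_back(i);  // amortized O(1), contiguous storage
    }
    int total = 0;
    for (int value : items) total += value;
    return total;
}
```

Both produce the same result; which one is faster for a given workload is something to measure rather than assume.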
I'll second (or maybe third...) the opinion that std::vector or std::deque will do the job. The only thing that I will add is a few additional factors that should guide the decision between std::vector<T> and std::list<T>. These have a lot to do with the characteristics of T and what algorithms you plan on using.
The first is memory overhead. std::list is a node-based container, so if T is a primitive type or a relatively small user-defined type, then the memory overhead of the node-based linkage might be non-negligible: consider that std::list<int> is likely to use at least 3 * sizeof(int) storage for each element, whereas std::vector will only use sizeof(int) storage with a small header overhead. std::deque is similar to std::vector but has a small overhead that is linear in N.
The next issue is the cost of copy construction. If T(T const&) is at all expensive, then steer clear of std::vector<T>, since it causes a bunch of copies to occur as the size of the vector grows. This is where std::deque<T> is a clear winner and std::list<T> is also a contender.
The final issue that usually guides the decision on container type is whether your algorithms can work with the iterator invalidation constraints of std::vector and std::deque. If you will be manipulating the container elements a lot (e.g., sorting, inserting in the middle, or shuffling), then you might want to lean towards std::list since manipulating the order requires little more than resetting a few linkage pointers.
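As a small illustration of the iterator point: std::list::sort only relinks nodes, so an iterator taken beforehand keeps referring to the same element. A minimal sketch (the function name is invented):

```cpp
#include <iterator>
#include <list>

// Returns true if an iterator taken before sorting still refers to the
// same element afterwards.
bool iterator_survives_sort() {
    std::list<int> values;
    values.push_back(3);
    values.push_back(1);
    values.push_back(2);
    std::list<int>::iterator it = values.begin();  // node holding 3

    values.sort();  // order is now 1, 2, 3; no element was moved in memory

    // The iterator still sees 3, which is now the last node.
    return *it == 3 && std::next(it) == values.end();
}
```

After sorting a std::vector, by contrast, an iterator to the first position would refer to whatever value ended up there.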
I'm guessing you mean std::list by "slist". Vectors are good when you need fast, random-access to a sequence of elements, with guaranteed contiguous memory, and fast sequential reading (IOW, from the beginning to the end). Lists are good when you need fast (constant-time) insertion or deletion of items at the beginning or end of the sequence, but don't care about the performance of random-access or sequential reading.
The reason for the difference is the way the 2 are implemented. Vectors are implemented internally as an array of items, which needs to be reallocated when its size/capacity is reached on adding an item. Lists are implemented as a doubly-linked list, which can cause cache-misses for sequential reading. Random-access for lists also requires scanning from the first (or last) item in the list, until it locates the item you're requesting.
Sounds like a good job for std::deque to me. It has the memory benefits of a vector like contiguous memory allocation for each "slab" (good for CPU caches), no overhead for each element like with std::list and it does not need to reallocate the whole set as std::vector does. Read more about std::deque here