What are the time complexities for size()? - C++

I am studying the complexity of various operations of the different STL containers. Through a different question on this site I have found this chart.
One operation I noticed was missing from this chart is the size operation.
I would suppose that if one knew the complexity of .begin and .end, one could also derive the complexity of size. But those are missing from the chart as well.
I have found an answer similar to what I am looking for in this question, but this one is for Java so it does not cover all the STL containers and it only defines the big O of size for a few of the given datatypes.
Does anyone know the complexity of the .size operation for the various containers, or could someone give me a pointer as to where I could find these complexities? Any help would be greatly appreciated.
Also, if my question is wrongly phrased and/or off-topic, do not hesitate to suggest an edit.

Since C++11, the complexity of the size member function is constant for all standard containers.
std::forward_list, which implements a singly linked list, does not provide a size member function. Its size can be computed in linear time from its iterators, as shown below.
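For example, a minimal sketch of computing its size with std::distance (linear in the number of elements):

    #include <forward_list>
    #include <iterator>
    #include <iostream>

    int main() {
        std::forward_list<int> fl{1, 2, 3, 4};
        // forward_list has no size(); walk the whole list instead -- O(n).
        auto n = std::distance(fl.begin(), fl.end());
        std::cout << n << '\n';  // prints 4
    }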
Aside from the standard C++ containers, any data structure can be augmented with a separately stored size variable to achieve constant complexity, at the cost of a small constant overhead on insert and delete operations (see the sketch below). Arrays are special in this regard: they require no additional overhead at all, assuming an iterator to the end is stored, since the size is simply the distance between the begin and end iterators.
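As a sketch of that augmentation idea (an illustration, not a production container; the CountedList name is hypothetical):

    #include <forward_list>
    #include <cstddef>

    // A singly linked list augmented with a stored element count:
    // a tiny constant cost on each modification buys O(1) size().
    template <typename T>
    class CountedList {
        std::forward_list<T> data_;
        std::size_t size_ = 0;
    public:
        void push_front(const T& v) { data_.push_front(v); ++size_; }
        void pop_front()            { data_.pop_front();  --size_; }
        std::size_t size() const    { return size_; }  // constant time
    };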

Related

Why prefer std::vector over std::deque? [duplicate]

They both have O(1) access complexity and O(n) random insertion/removal complexity. But vector costs more when expanding because of reallocation and copying, while deque does not have this issue.
It seems deque has better performance, so why do most people use vector instead of deque?
why do most people use vector instead of deque?
Because this is what they have been taught.
vector and deque serve slightly different purposes. They can both be used as a simple container of objects, if that's all you need. When learning to program in C++, that is all most people need -- a bucket to drop stuff into, get stuff out of, and walk over.
When StackOverflow is asked a question like "which container should I use by default," the answer is almost invariably vector. The question is generally asked from the context of learning to program in C++, and at the point where a programmer is asking such a question, they don't yet know what they don't know. And there's a lot they don't yet know. So, we (StackOverflow) need a container that fits almost every need for better or worse, can be used in almost any context, and doesn't require that the programmer has asked all the right questions before landing on something approximating the correct answer. Furthermore, the Standard specifically recommends using vector. vector isn't best for all uses, and in fact deque is better than vector for many common uses -- but it's not so much better for a learning programmer that we should vary from the Standard's advice to newbie C++ programmers, so StackOverflow lands on vector.
After having learned the basics of the syntax and, shall we say, the strategies behind programming in C++, programmers split into two branches: those who care to learn more and write better programs, and those who don't. Those who don't will stick with vector forever. I think many programmers fall into this camp.
The rarer programmers who try to move beyond this phase start asking other questions -- questions like you've asked here. They know there is lots they don't yet know, and they want to start discovering what those things are. They will quickly (or less quickly) discover that when choosing between vector and deque, some questions they didn't think to ask before are:
Do I need the memory to be contiguous?
Do I need to avoid lots of reallocations?
Do I need to keep valid iterators after insertions?
Do I need my collection to be compatible with some ancient C-like function?
Then they really start thinking about the code they are writing, discover yet more stuff they don't know, and the beat goes on...
From C++ standard section 23.1.1:
vector is the type of sequence that should be used by default... deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.
However there are some arguments in the opposite direction.
In theory vector is at least as efficient as deque, since it provides a subset of deque's functionality. If your task needs only what vector's interface provides, prefer vector; it cannot be worse than deque.
But vector costs more when expanding because of reallocation and copying
While it's true that vector sometimes has to reallocate its array as it grows, it will grow exponentially so that the amortised complexity is still O(1). Often, you can avoid reallocations by judicious use of reserve().
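For instance, a minimal sketch of avoiding intermediate reallocations when the final size is known up front:

    #include <vector>

    int main() {
        std::vector<int> v;
        v.reserve(1000000);      // one allocation up front
        for (int i = 0; i < 1000000; ++i)
            v.push_back(i);      // no reallocation, no copying of elements
    }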
It seems deque has better performance
There are many aspects to performance; the time taken by push_back is just one. In some applications, a container might be modified rarely, or populated on start-up and then never modified. In cases like that, iteration and access speed might be more important.
vector is the simplest possible container: a contiguous array. That means that iteration and random access can be achieved by simple pointer arithmetic, and accessing an element can be as fast as dereferencing a pointer.
deque has further requirements: it must not move the elements. This means that a typical implementation requires an extra level of indirection - it is generally implemented as something like an array of pointers to arrays. This means that element access requires dereferencing two pointers, which will be slower than one.
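To make the indirection concrete, here is a toy sketch (not any real library's layout) of deque-style access through an array of pointers to fixed-size blocks, costing two dereferences per element:

    #include <cstddef>

    constexpr std::size_t kBlockSize = 8;  // illustrative block size

    struct ToyDeque {
        int** blocks;  // array of pointers to blocks of kBlockSize ints

        int& operator[](std::size_t i) {
            // First dereference finds the block, second finds the element.
            return blocks[i / kBlockSize][i % kBlockSize];
        }
    };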
Of course, often speed is not a critical issue, and you choose containers according to their behavioural properties rather than their performance. You might choose vector if you need the elements to be contiguous, perhaps to work with an API based on pointers and arrays. You might choose deque or list if you want a guarantee that the elements won't move, so you can store pointers to them.
From cplusplus.com:
Therefore they provide a similar functionality as vectors, but with efficient insertion and deletion of elements also at the beginning of the sequence, and not only at its end. But, unlike vectors, deques are not guaranteed to store all its elements in contiguous storage locations, thus not allowing direct access by offsetting pointers to elements.
Personally, I prefer using deque (I always end up spoiling myself and having to use push_front for some reason or other), but vector does have its uses/differences, with the main one being:
vector has contiguous memory, while a deque allocates via pages/chunks.
Note, the pages/chunks are fairly useful: constant-time insert/erase on the front of the container. It is also typical that a large block of memory broken up into a series of smaller blocks is more efficient than a single block of memory.
You could also argue that, because deque is 'missing' size reservation methods (capacity/reserve), you have less to worry about.
I highly suggest you read Sutter's GotW on the topic.

How can I implement Python sets in another language (maybe C++)?

I want to translate some Python code that I have already written to C++ or another fast language, because Python isn't quite fast enough for what I want to do. However, the code in question abuses some of the impressive features of Python sets, specifically the average O(1) membership testing, which I spam within performance-critical loops, and I am unsure how to implement Python sets in another language.
Python's Time Complexity wiki page states that sets have O(1) membership testing on average and O(n) in the worst case. I tested this personally using timeit and was astonished by how blazingly fast Python sets do membership testing, even with large N. I also looked at this Stack Overflow answer to see how C++ sets compare when using find to test whether an element is a member of a given set, and it said find is O(log n).
I hypothesize that the time complexity of find is logarithmic because C++ standard library sets are implemented with some sort of binary tree. I think that because Python sets have average O(1) membership testing and O(n) worst case, they are probably implemented as some sort of associative array with buckets, which can look an element up directly and test it against some dummy value that indicates the element is not part of the set.
The thing is, I don't want to slow down any part of my code by switching to another language (since that is the problem I'm trying to fix in the first place), so how could I implement my own version of Python sets (specifically just the fast membership testing) in another language? Does anybody know anything about how Python sets are implemented, and if not, could anyone give me any general hints to point me in the right direction?
I'm not looking for source code, just general ideas and links that will help me get started.
I have done a bit of research on Associative Arrays and I think I understand the basic idea behind their implementation but I'm unsure of their memory usage. If Python sets are indeed just really associative arrays, how can I implement them with a minimal use of memory?
Additional note: The sets in question that I want to use will have up to 50,000 elements and each element of the set will be in a large range (say [-999999999, 999999999]).
The theoretical difference between O(1) and O(log n) means very little in practice, especially when comparing two different languages. log n is small for most practical values of n. The constant factors of each implementation are easily more significant.
C++11 has unordered_set and unordered_map now. Even if you cannot use C++11, there are always the Boost and TR1 versions (and older compiler extensions named hash_* instead of unordered_*).
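A minimal sketch of average O(1) membership testing with std::unordered_set, sized for the ~50,000-element case described above (the values are illustrative):

    #include <unordered_set>
    #include <iostream>

    int main() {
        std::unordered_set<int> s;
        s.reserve(50000);               // avoid rehashing while filling
        for (int i = 0; i < 50000; ++i)
            s.insert(i * 37 - 999999);  // arbitrary values in a wide range

        // Average O(1) membership test, like Python's `x in s`.
        std::cout << (s.count(42) ? "present" : "absent") << '\n';
    }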
Several points: you have, as has been pointed out, std::set and std::unordered_set (the latter only in C++11, but most compilers have offered something similar as an extension for many years now). The first is implemented as some sort of balanced tree (usually a red-black tree), the second as a hash table. Which one is faster depends on the data type: the first requires some sort of ordering relationship (e.g. <, if it is defined on the type, but you can define your own); the second an equivalence relationship (==, for example) and a hash function compatible with this equivalence relationship. The first is O(lg n), the second O(1), if you have a good hash function. Thus:
If comparison for order is significantly faster than hashing, std::set may actually be faster, at least for "smaller" data sets, where "smaller" depends on how large the difference is. For strings, for example, the comparison will often resolve after the first couple of characters, whereas the hash code will look at every character. In one experiment I did (many years back), with strings of 30-50 characters, I found the break-even point to be about 100,000 elements.
For some data types, simply finding a good hash function which is compatible with the type may be difficult. Python uses a hash table for its set, and if you define a type with a __hash__ function that always returns 1, it will be very, very slow. Writing a good hash function isn't always obvious.
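As a sketch of what a hash function compatible with the equivalence relationship looks like in C++ (the Point type and the combiner are illustrative, not a recommendation):

    #include <unordered_set>
    #include <functional>
    #include <cstddef>

    struct Point {
        int x, y;
        bool operator==(const Point& o) const { return x == o.x && y == o.y; }
    };

    // Must agree with ==: equal Points must produce equal hashes.
    struct PointHash {
        std::size_t operator()(const Point& p) const {
            std::size_t h1 = std::hash<int>{}(p.x);
            std::size_t h2 = std::hash<int>{}(p.y);
            return h1 ^ (h2 << 1);  // simple combiner, fine for a sketch
        }
    };

    std::unordered_set<Point, PointHash> points;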
Finally, both are node-based containers, which means they use a lot more memory than e.g. std::vector, with very poor locality. If lookup is the predominant operation, you might want to consider std::vector, keeping it sorted and using std::lower_bound for the lookup. Depending on the type, this can result in a significant speed-up and much less memory use.
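A minimal sketch of that last suggestion, a sorted std::vector searched with std::lower_bound:

    #include <vector>
    #include <algorithm>
    #include <iostream>

    int main() {
        std::vector<int> v{5, 1, 9, 3, 7};
        std::sort(v.begin(), v.end());  // keep the vector sorted

        // O(log n) lookup over contiguous storage with good locality.
        auto contains = [&](int x) {
            auto it = std::lower_bound(v.begin(), v.end(), x);
            return it != v.end() && *it == x;
        };
        std::cout << contains(7) << ' ' << contains(4) << '\n';  // 1 0
    }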

Boost flat_map container

Working on some legacy code, I am running into memory issues, due mainly (I believe) to the extensive use of STL maps (particularly “maps-of-maps”).
I am looking at Boost's flat_map as a possible solution. Does anyone have any firsthand experience with flat_map, in particular with regard to improvements in speed and/or memory usage? I realize of course this can be very dependent on the types of data stored and the manner in which they are stored, but I am still curious about folks' actual experience.
Can anyone point me to some solid examples?
As an example: there are several cases in this code of a map-of-a-map; that is, a map where the value is another map.
By replacing the “inner” map with a pair of vectors, I reduced the memory footprint 10:1 (3 GB to 300 MB). Of course this can slow down searches, but for this particular case it doesn't seem to matter much. And it involved about a day of refactoring and careful testing.
Boost’s flat_map sounds like it might be just what I need but I can’t seem to find out much about it other than the class description on the Boost web site. Looking for some firsthand feedback.
Boost's flat_map is an ordered, std::map-like container, except that instead of a binary tree it stores its elements as a (sorted) vector of key-value pairs and looks keys up with binary search.
You can basically figure out the answers regarding performance (relative to a std::map) yourself based on that fact; a short usage sketch follows the list:
Iterating the map or a large part of it should be super-fast, relatively
Lookup should typically be relatively fast
Adding or removing values is theoretically much slower, but in practice - assuming your key and value types are small and the number of map elements is not very high - probably comparable in speed (or even better on small maps; often no allocation is necessary on insert)
etc.
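A minimal usage sketch, assuming Boost is available (the keys and values are illustrative):

    #include <boost/container/flat_map.hpp>
    #include <iostream>

    int main() {
        boost::container::flat_map<int, int> m;
        m.reserve(3);        // flat_map exposes reserve(), unlike std::map
        m[1] = 10;           // each insert keeps the underlying vector sorted
        m[3] = 30;
        m[2] = 20;

        auto it = m.find(2); // binary search over contiguous storage
        if (it != m.end())
            std::cout << it->second << '\n';  // prints 20
    }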
In your case - maps-of-maps - you're going to lose some of the benefit of "flattening things out", since you'll have an outer map with a pointer to an inner map (if not more levels of indirection); but the flat map would at least help you reduce that. Also, supposing you have two levels of maps, you could arrange it so that you store all of the inner maps contiguously (either by constructing the inner maps appropriately or by instantiating them with your own allocator, a trickier affair); in that case, you could replace pointers to maps with map indices, reducing the amount of space they take up and making life easier for the compiler.
You might also want to read Boost's documentation of flat_map; and you could also just use the force and read the source (and the source of the underlying flat_tree), like I have; I don't actually have flat_map experience myself.
I know this is an old question, but this might be of use to someone finding this question.
I found that flat_map was a big improvement in searching, lookup, and iteration over large maps. The fact that the map uses contiguous memory also makes inserting faster than you might expect, thanks to the good data locality. If you're doing more inserts than lookups in your map, then it might not be for you.
Having said that, repeatedly inserting a random value into a sorted vector is faster than the same operation on a linked list, because of that data locality - despite what big-O might tell you (tested in VS2017 and G++ 4.8).

Should std::list be deprecated?

According to Bjarne Stroustrup's slides from his Going Native 2012 keynote, insertion and deletion in a std::list are terribly inefficient on modern hardware:
Vector beats list massively for insertion and deletion
If this is indeed true, what use cases are left for std::list? Shouldn't it be deprecated then?
Vector and list solve different problems. List provides the guarantee that iterators never become invalidated as you insert and remove other elements. Vector doesn't make that guarantee.
It's not all about performance. So the answer is no: list should not be deprecated.
Edit: Beyond this, C++ isn't designed to work solely on "modern hardware." It is intended to be useful across a much wider range of hardware than that. I'm a programmer in the financial industry and I use C++, but other domains, such as embedded devices, programmable controllers, and heart-lung machines, among myriad others, are just as important. The C++ language should not be designed solely with the needs of certain domains and the performance of certain classes of hardware in mind. Just because I might not use a list doesn't mean it should be deprecated from the language.
Whether a vector outperforms a list or not also depends on the type of the elements. For example, for int elements vector is indeed very fast as most of the data fits inside the CPU cache and SIMD instructions can be used for the data copying. So the O(n) complexity of vector doesn't have much impact.
But what about larger data types, where copying doesn't translate to a stream operation, and instead data must be fetched from all over the place? Also, what about hardware that doesn't have large CPU caches and possibly also lacks SIMD instructions? C++ is used on much more than just modern desktop and workstation machines. Deprecating std::list is out of the question.
What Stroustrup is saying in that presentation is that before you pick std::list for your data, you should make sure that it's the right choice for your particular situation. In other words, benchmark and profile. It definitely doesn't say you should always pick std::vector.
No, and especially not based on one particular graph. There are instances where list will perform better than vector. See: http://www.baptiste-wicht.com/2012/12/cpp-benchmark-vector-list-deque/
And that's ignoring the non-performance differences, as others have mentioned.
Bjarne's point in that talk wasn't that you shouldn't use list. It was that people make too many assumptions about list's performance that often turn out to be wrong. He was simply justifying the stance that vector should always be your default go-to container type unless you actually find a need for the performance or other semantic characteristics of lists.
std::list is a deque in the abstract sense: it has push_front() and pop_front(). It still has a niche role as such, though it may not be the best choice for a double-ended queue.
std::list does not reallocate memory, while std::vector may. Sometimes you don't want an item to move in memory (e.g. a stackful coroutine).
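A minimal sketch of that stability guarantee; a pointer into a std::list survives any number of further insertions, which is not true of std::vector:

    #include <list>
    #include <iostream>

    int main() {
        std::list<int> l{1, 2, 3};
        int* p = &l.front();     // pointer to an element in the list

        for (int i = 0; i < 1000; ++i)
            l.push_back(i);      // list never moves existing elements

        std::cout << *p << '\n'; // still valid, prints 1
    }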
Linked lists are related to tree data structures. Both contain links. If we deprecate std::list, then what about tree-based containers?
Of course not. std::list is a different data structure. Comparing different data structures gives a good indication of their properties, advantages, and disadvantages. Each data structure has its advantages.

Fastest way to speed up map<string,int>::find() in C++, where the keys are in alphabetical order

I have a map with about 100,000 pairs. Is there any way that I can speed up searching when using find(), given that the keys are in alphabetical order? Also, how should I go about doing it? I know that you can specify a new comparator when you create the map, but will that speed up the find() function at all?
Thanks in advance.
[solved] Thanks a bunch guys, I have decided to go with a vector and use lower_bound and upper_bound to "snip" some of the searching.
Also, I am new here; is there any way to mark this question as answered, or pick a best answer?
A different comparator will only speed up find if it manages to do the comparison faster (which, for strings, will usually be pretty difficult).
If you're basically inserting all the data in order, then doing the searching, it may be faster to use a std::vector with std::lower_bound or std::upper_bound.
If you don't really care about ordering, and just want to find the data as quickly as possible, you might find that std::unordered_map works better for you.
Edit: Just for the record: the way you "might find" or "may find" those things is normally by profiling. Depending on the situation, it might be enough faster that it's pretty obvious even in simple testing, so profiling isn't really necessary, but if there's (much) doubt, or you want to quantify the effect, a profiler is probably the right way to do it.
std::map is already taking advantage of the fact the keys are in alphabetical order - it guarantees that itself. You aren't going to be able to improve it by changing the comparator (one assumes it's already a reasonably efficient string comparison).
Have you considered using unordered_map (aka hash_map in various pre-C++11 implementations)? It should be able to search in O(1) instead of the O(log n) of std::map.
You could also look into something slightly more exotic, like a trie, but that's not part of the standard library so you'd either have to find one elsewhere or roll your own, so I'd suggest unordered_map is a good place to start.
If you're using std::find to find elements, you should switch to using map::find (you don't really say which in your question). map::find uses the fact that the map is ordered to search much faster.
If that's still not good enough, you might look into a hash container such as unordered_map rather than map.
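To make the difference concrete, a minimal sketch contrasting the two (same result, very different cost; requires C++14 for the generic lambda):

    #include <map>
    #include <algorithm>
    #include <string>
    #include <iostream>

    int main() {
        std::map<std::string, int> m{{"apple", 1}, {"banana", 2}};

        // O(n): linear scan over the pairs, ignoring the tree's ordering.
        auto slow = std::find_if(m.begin(), m.end(),
            [](const auto& kv) { return kv.first == "banana"; });

        // O(log n): descends the balanced tree using the key ordering.
        auto fast = m.find("banana");

        std::cout << slow->second << ' ' << fast->second << '\n';  // 2 2
    }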
I've put in a vote for unordered_map but I wanted to also make another point.
One of the things that can hurt performance on modern machines is poor use of the cache. A map is going to have nodes allocated all over the place, and there won't be much locality of reference. Also, since it has to store a bunch of pointers between nodes, it will use up more memory.
At the recent Going Native 2012 conference, Bjarne Stroustrup gave an interesting talk that touched on this topic. He compared vector and list performance on a task involving a lot of random insertions and deletions, where it might seem list ought to have dominated, but because of the memory size and layout issues, vector was in fact the fastest by far. Take a look at his slides, starting at slide 43.
unordered_map gives you direct access to the element, and so it probably means even less hopping around in memory than trying to stick your data in a vector (and thus better performance than vector), so my comment is simply an admonishment to always keep your memory access pattern in mind for performance.