Time Complexity of find operation [duplicate] - c++

Closed 10 years ago.
Possible Duplicate:
C++ string::find complexity
What is the time complexity of the find operation that comes built-in with the string library in STL?

The Standard, §21.4.7.2, doesn't give any guarantees as to the complexity.
You can reasonably assume std::basic_string::find takes linear time in the length of the string being searched in, though, as even the naïve algorithm (check each substring for equality) has that complexity, and it's unlikely that the std::string constructor will build a fancy index structure to enable anything faster than that.
The complexity in terms of the pattern being searched for may reasonably vary between linear and constant, depending on the implementation.
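To make the "naïve algorithm" mentioned above concrete, here is a minimal sketch. `naive_find` is a made-up helper for illustration, not what any particular standard library does. It tries every starting offset, so it does at most n - m + 1 substring comparisons of up to m characters each.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): a naive substring
// search that tries every starting offset in `text`.
// Worst case: O(n * m) character comparisons, with n = text.size() and
// m = pattern.size().
std::size_t naive_find(const std::string& text, const std::string& pattern) {
    if (pattern.size() > text.size()) return std::string::npos;
    for (std::size_t i = 0; i + pattern.size() <= text.size(); ++i) {
        // compare() returns 0 when the m characters starting at i equal pattern.
        if (text.compare(i, pattern.size(), pattern) == 0) return i;
    }
    return std::string::npos;
}
```

For a single-character pattern the inner comparison is constant time, which gives the O(n) behaviour discussed in the answers.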

As pointed out in the comments, the standard doesn't specify that.
However, since std::string is a generalized container and can't make any assumptions about the nature of the string it holds, you can reasonably assume that the complexity will be O(n) when you search for a single char.

For the generic std::find algorithm: at most, performs as many comparisons as the number of elements in the range [first,last).
http://cplusplus.com/reference/algorithm/find/

Related

How can I implement Python sets in another language (maybe C++)?

I want to translate some Python code that I have already written to C++ or another fast language because Python isn't quite fast enough to do what I want to do. However the code in question abuses some of the impressive features of Python sets, specifically the average O(1) membership testing which I spam within performance critical loops, and I am unsure of how to implement Python sets in another language.
In Python's Time Complexity Wiki Page, it states that sets have O(1) membership testing on average and in worst-case O(n). I tested this personally using timeit and was astonished by how blazingly fast Python sets do membership testing, even with large N. I looked at this Stack Overflow answer to see how C++ sets compare when using find operations to see if an element is a member of a given set and it said that it is O(log(n)).
I hypothesize the time complexity for find is logarithmic in that C++ std library sets are implemented with some sort of binary tree. I think that because Python sets have average O(1) membership testing and worst case O(n), they are probably implemented with some sort of associative array with buckets which can just look up an element with ease and test it for some dummy value which indicates that the element is not part of the set.
The thing is, I don't want to slow down any part of my code by switching to another language (since that is the problem I'm trying to fix in the first place), so how could I implement my own version of Python sets (specifically just the fast membership testing) in another language? Does anybody know anything about how Python sets are implemented, and if not, could anyone give me any general hints to point me in the right direction?
I'm not looking for source code, just general ideas and links that will help me get started.
I have done a bit of research on associative arrays and I think I understand the basic idea behind their implementation, but I'm unsure of their memory usage. If Python sets are indeed really just associative arrays, how can I implement them with minimal use of memory?
Additional note: The sets in question that I want to use will have up to 50,000 elements and each element of the set will be in a large range (say [-999999999, 999999999]).
The theoretical difference between O(1) and O(log n) means very little in practice, especially when comparing two different languages. log n is small for most practical values of n. The constant factors of each implementation are easily more significant.
C++11 has unordered_set and unordered_map now. Even if you cannot use C++11, there are always the Boost version and the TR1 version (std::tr1::unordered_set and std::tr1::unordered_map); older compilers also shipped similar containers named hash_* as extensions.
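As a rough sketch of what Python-style membership testing could look like with std::unordered_set: the helper names `make_set` and `count_members` are made up for this example, and `int` elements are assumed because the question mentions values in [-999999999, 999999999].

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: builds a hash set from a list of values.
std::unordered_set<int> make_set(const std::vector<int>& values) {
    std::unordered_set<int> s;
    s.reserve(values.size());  // pre-size the bucket array to limit rehashing
    s.insert(values.begin(), values.end());
    return s;
}

// Hypothetical helper: counts how many queries are members of the set.
std::size_t count_members(const std::unordered_set<int>& seen,
                          const std::vector<int>& queries) {
    std::size_t hits = 0;
    for (int q : queries) {
        // find() is O(1) on average and O(n) in the worst case.
        if (seen.find(q) != seen.end()) ++hits;
    }
    return hits;
}
```

Whether this beats std::set for 50,000 elements depends on the constant factors mentioned above, so it is worth measuring both.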
Several points: you have, as has been pointed out, std::set and
std::unordered_set (the latter only in C++11, but most compilers have
offered something similar as an extension for many years now). The
first is implemented by some sort of balanced tree (usually a red-black
tree), the second as a hash table. Which one is faster depends on the
data type: the first requires some sort of ordering relationship (e.g.
< if it is defined on the type, but you can define your own); the
second an equivalence relationship (==, for example) and a hash
function compatible with this equivalence relationship. The first is
O(lg n), the second O(1), if you have a good hash function. Thus:
If comparison for order is significantly faster than hashing,
std::set may actually be faster, at least for "smaller" data sets,
where "smaller" depends on how large the difference is—for
strings, for example, the comparison will often resolve after the first
couple of characters, whereas the hash code will look at every
character. In one experiment I did (many years back), with strings of
30-50 characters, I found the break even point to be about 100000
elements.
For some data types, simply finding a good hash function which is
compatible with the type may be difficult. Python uses a hash table for
its set, and if you define a type with a function __hash__ that always
returns 1, it will be very, very slow. Writing a good hash function
isn't always obvious.
Finally, both are node based containers, which means they use a lot
more memory than e.g. std::vector, with very poor locality. If lookup
is the predominant operation, you might want to consider std::vector,
keeping it sorted and using std::lower_bound for the lookup.
Depending on the type, this can result in a significant speed-up, and
much less memory use.
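The sorted std::vector idea from the last point could be sketched as follows. `SortedVectorSet` is a hypothetical wrapper written for this example. It assumes the set is built once and then only queried, since inserting into the middle of a vector is O(n).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper: a read-mostly set stored as a sorted,
// duplicate-free vector in contiguous memory.
class SortedVectorSet {
public:
    explicit SortedVectorSet(std::vector<int> values) : data_(std::move(values)) {
        std::sort(data_.begin(), data_.end());
        // std::unique only removes adjacent duplicates, hence the sort first.
        data_.erase(std::unique(data_.begin(), data_.end()), data_.end());
    }

    // Binary search: O(log n) comparisons.
    bool contains(int x) const {
        auto it = std::lower_bound(data_.begin(), data_.end(), x);
        return it != data_.end() && *it == x;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<int> data_;
};
```

For 50,000 ints this stores about 200 KB of contiguous data, with no per-node overhead.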

What is the fastest way to sort array containing numbers of double type in C++? [closed]

Closed 9 years ago.
What is the fastest way to sort an array containing numbers of type double in C++? I have two kinds of arrays, the first of length 20 and the second of length 5000. Does the length of the array make a difference as to which algorithm is fastest? The arrays of length 5000 contain on average 28 different values.
http://www.codeproject.com/Articles/38381/STL-Sort-Comparison-Function
For the first question: If your array has only a small set of unique values (like 28, as you say), you may want to consider some sort of counting sort (flavors: radix, pigeonhole, bucket). If you know hard limits and ranges of your array content, you may be able to do something good.
But as previously said, for such a small array you are probably good with std::sort, unless you have a lot of 5000-element arrays to sort.
For the second question: Length matters (see sky's answer). O(n log n) is the best any normal (comparison-based) sort can do in general. O(n^2) is typically the worst case. O(n^2) means that in the worst case your 20-element array would need time corresponding to 20^2 (= 400) operations, and your 5000-element array time corresponding to 5000^2 (= 25 million) operations. As you can see, a larger array means much more time in this case. With an O(n log n) algorithm, the 5000-element array would need time corresponding to 5000 log 5000 (about 18,500, using base-10 logarithms) operations.
What an operation is and how long it takes depends on the particular implementation, and is in general irrelevant for the comparison (and thus ignored in big-O notation). A slow implementation of an O(n log n) algorithm will still be faster than a fast implementation of an O(n^2) algorithm when the array is large enough. But for a small array of 20 elements, a good low-overhead implementation matters most: 400 fast operations can be faster than 26 slow operations (20 log 20 is about 26). The same comparison for the 5000-element array shows that 25 million fast operations would still not be faster than 18,500 slow operations.
Another factor is the content of the array. Some algorithms, like insertion sort, are particularly fast (approaching O(n)) on arrays that are almost in the correct order, but a poor O(n^2) on random input.
By exploiting predefined (known) limits or ranges on the array content (and thus no longer being classified as a normal sort), counting sort can approach O(n), that is, time directly proportional to the number of elements. See Wikipedia.
Happy research!
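One way to exploit the "few distinct values" property from the answer above is to count occurrences of each value and write them back in order. This is only a sketch: `counting_sort_few_distinct` is a made-up name, and it uses std::map as the counter, so the cost is O(n log k) for k distinct values rather than the O(n) of a true counting sort over a known integer range.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: sorts by counting occurrences of each distinct value.
// std::map keeps its keys ordered, so iterating it yields ascending values.
// Assumes the input contains no NaN (NaN breaks the ordering std::map needs).
// Values that compare equal, such as 0.0 and -0.0, are merged into one key.
void counting_sort_few_distinct(std::vector<double>& values) {
    std::map<double, std::size_t> counts;
    for (double v : values) ++counts[v];

    std::size_t out = 0;
    for (const auto& entry : counts) {
        // Write each distinct value back as many times as it occurred.
        for (std::size_t i = 0; i < entry.second; ++i) values[out++] = entry.first;
    }
}
```

With about 28 distinct values in 5000 elements the map stays tiny, but whether this actually beats std::sort on a given machine has to be measured.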
It makes a difference, but your best bet is to use std::sort. Internally it switches to the sort algorithm considered best for the input size.
See wikipedia references:
https://en.wikipedia.org/wiki/Sort_%28C++%29
https://en.wikipedia.org/wiki/Introsort
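For reference, a minimal sketch of calling std::sort on doubles, both for a std::vector and for a plain array. The `sort_doubles` wrappers are only for illustration; in real code you would normally call std::sort directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts a vector of doubles in ascending order.
void sort_doubles(std::vector<double>& values) {
    std::sort(values.begin(), values.end());
}

// The same call works on a plain array, using pointers as iterators.
void sort_doubles(double* data, std::size_t n) {
    std::sort(data, data + n);
}
```

The same call handles both the 20-element and the 5000-element arrays.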
You may want to search for sorting algorithms like Quicksort, merge sort, insert sort, or bubble sort.
Sorting time depends heavily on the number of items to be sorted, as can be seen from the 'big O notation' used for sorting algorithms. The average number of distinct values and the data type often don't change the runtime enough to matter. An O(n^2) algorithm (bubble sort) has complexity proportional to the square of the number of elements, which tells you that its running time grows roughly quadratically with the number of items to sort. Quicksort has O(n log n) complexity on average, making it one of the fastest sorting methods around.
Bubblesort is the easiest to implement and the slowest in runtime.
Edit: As the comments say, for short arrays of only 5000 values it doesn't make a big difference which algorithm you use, provided it's not something like Bogosort.

How does std::multimap store its elements? [duplicate]

Closed 10 years ago.
Possible Duplicate:
How is a C++ multimap implemented?
C++ reference mentions
Multimaps are typically implemented as binary search trees.
But what is their typical internal representation?
Is it like std::map<Key, std::list<Value> > or similar?
My concern is complexity of insertion and iteration over a set of elements with the same key.
If you want to know the complexity of specific operations, you need look no further than to the standard. The standard has guarantees on the complexity, but implementations are free to satisfy those guarantees any way they wish.
For insertion the complexity is O(lg n), unless you specify an optimal hint every time, in which case the complexity is O(1) amortized. (See details here: http://en.cppreference.com/w/cpp/container/multimap/insert)
For iteration over a set of elements with the same key, the complexity is the same as iteration from any iterator to another. Given that you have already found the iterators, the iteration is linear in the count of items you are iterating over. (Sorry, unable to find a reference for this right now)
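A small sketch of both operations discussed above. The helper names `values_for` and `build_sorted` are made up for illustration. `equal_range` locates all elements with a given key in O(lg n), after which the walk is linear in the number of matches. `emplace_hint` with `end()` as the hint is the "optimal hint" case when keys arrive already sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects all values stored under `key`.
std::vector<std::string> values_for(const std::multimap<int, std::string>& m, int key) {
    std::vector<std::string> out;
    auto range = m.equal_range(key);  // O(lg n) to locate [first, second)
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);    // linear in the number of matches
    }
    return out;
}

// Hypothetical helper: builds a multimap from pairs already sorted by key.
// Each element belongs at the end, so end() is the optimal hint and every
// insertion is amortized O(1).
std::multimap<int, std::string> build_sorted(
        const std::vector<std::pair<int, std::string>>& sorted_pairs) {
    std::multimap<int, std::string> m;
    for (const auto& p : sorted_pairs) m.emplace_hint(m.end(), p.first, p.second);
    return m;
}
```

Note that the interface does not expose whether equal keys are stored as separate tree nodes or as a per-key list; only the complexity guarantees are specified.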

Performance std::strstr vs. std::string::find [duplicate]

Closed 10 years ago.
Possible Duplicate:
C++ string::find complexity
Recently I noticed that the function std::string::find is an order of magnitude slower than the function std::strstr - in my environment with GCC 4.7 on Linux. The performance difference depends on the lengths of the strings and on the hardware architecture.
There seems to be a simple reason for the difference: std::string::find basically calls std::memcmp in a loop - with time complexity O(m * n). In contrast, std::strstr is highly optimized for the hardware architecture (e.g. with SSE instructions) and uses a more sophisticated string matching algorithm (apparently Knuth-Morris-Pratt).
I was also surprised not to find the time complexities of these two functions in the language documents (i.e. drafts N3290 and N1570). I only found time complexities for char_traits. But that doesn't help, because there is no function for substring search in char_traits.
I would expect, that std::strstr and memmem contain similar optimizations with almost identical performance. And until recently, I assumed that std::string::find uses memmem internally.
The questions are: Is there any good reason, why std::string::find does not use std::memmem? And does it differ in other implementations?
The question is not: What is the best implementation of this function? It is really difficult to argue for C++ if it is slower than C. It wouldn't matter if both implementations were slow. It is the performance difference that really hurts.
First, what's memmem? I can't find this in the C++ standard, nor the
Posix standard (which contains all of the standard C functions).
Second, any measurement values will depend on the actual data. Using
KMP, for example, will be a pessimisation in a lot of cases; probably
most of the cases where the member functions of std::string are used;
the time to set up the necessary tables will often be more than the
total time of the straightforward algorithm. Things like O(m*n)
don't mean much when the typical length of the string is short.
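Since the outcome depends on the actual data, one way to check it on your own inputs is a tiny timing harness like the sketch below. The names `time_us`, `pos_with_find` and `pos_with_strstr` are made up for this example. The harness is deliberately crude, with no warm-up and no statistics, and the strstr variant assumes the strings contain no embedded NUL characters.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: runs `search` `reps` times, returns elapsed microseconds.
template <typename Search>
long long time_us(Search search, int reps) {
    volatile std::size_t sink = 0;  // keeps the compiler from discarding the calls
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) sink = search();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Position of `pat` in `text` using the member function.
std::size_t pos_with_find(const std::string& text, const std::string& pat) {
    return text.find(pat);
}

// Position of `pat` in `text` using the C library function.
std::size_t pos_with_strstr(const std::string& text, const std::string& pat) {
    const char* hit = std::strstr(text.c_str(), pat.c_str());
    return hit ? static_cast<std::size_t>(hit - text.c_str()) : std::string::npos;
}
```

A call such as `time_us([&] { return pos_with_find(text, pat); }, 100000)` can then be compared against the strstr variant for short and long inputs.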

Does std::sort check if a vector is already sorted?

I believe that the C++ standard for std::sort does not guarantee O(n) performance on a list that's already sorted. But still, I'm wondering whether to your knowledge any implementations of the STL (GCC, MSVC, etc) make the std::is_sorted check before executing the sort algorithm?
Asked another way, what performance can one expect (without guarantees, of course) from running std::sort on a sorted container?
Side note: I posted some benchmarks for GCC 4.5 with C++0x enabled on my blog. Here's the results:
Implementations are free to use any efficient sorting algorithm they want, so this is highly implementation-dependent.
However, I have seen a performance comparison of libstdc++, as used on Linux, against libc++, the new C++ library developed by Apple/LLVM. Both libraries are very efficient on sorted or reverse-sorted data (much faster than on a random list), with the new library being considerably faster than the old one and recognizing many more patterns.
To be certain you should consider doing your own benchmarks.
No. Also, it would not make sense for an STL implementation to call is_sorted() inside std::sort, since is_sorted() is already available as a stand-alone function, and many users would not want to waste execution cycles on that call when they already know that their container is not sorted.
The STL should also follow the C++ philosophy: "pay per use".
Wow! Did you have optimizations all the way cranked up?
[Chart: the results of your code on my platform; note the values on the vertical axis.]
I suggest you read this comparison of sorting algorithms; it is very well done and informative, and it compares a number of sorting algorithms with each other and with GCC's implementation of std::sort. You will notice, in the charts on the given link, that the performance of std::sort for "almost sorted" and "almost reverse" input is linear in the number of elements to sort, that is, O(n). So, no guarantee, but you can reasonably expect that an almost sorted list will be sorted in almost linear time. But, of course, it does not do an is_sorted check, and even if it sorts a sorted array in linear time, it won't be as fast as doing an is_sorted check and skipping the sorting altogether. It is up to you to decide whether it is better to check before sorting or not.
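If you do decide to check before sorting, the idea can be sketched like this. `sort_if_needed` is a made-up helper, and it uses std::is_sorted, which requires C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: sorts `v` only if it is not already sorted.
// Returns true if a sort was actually performed.
bool sort_if_needed(std::vector<int>& v) {
    // One linear pass; it stops at the first out-of-order pair.
    if (std::is_sorted(v.begin(), v.end())) return false;
    std::sort(v.begin(), v.end());
    return true;
}
```

On random input the check usually fails within the first few elements, so its extra cost is small; on sorted input it replaces the sort with a single O(n) pass.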
The standard sanctions only std::sort implementations with complexity O(n log n):
Complexity: Approximately N log N (where N == last - first) comparisons on the average.
See section 25.3.1.1 Sorting [lib.sort] (ISO/IEC 14882:2003(E)).
Thus, the set of allowed sorting functions is limited, and you are right that it does not guarantee linear complexity.
Ideal behavior for a sort is O(n), but this is not possible in the average case.
Of course the average case is not necessarily the exact case you have right now, so for corner cases, there's not much of a guarantee.
And why would any implementation do that check? What would it gain? Nothing on average. A good design rule is not to clutter an implementation with optimizations for corner cases that make no difference on average. This is similar to checking for self-assignment. A simple answer: don't do it.
There's no guarantee that it'll check this. Some implementations will do it, others probably won't.
However, if you suspect that your input might already be sorted (or nearly sorted), std::stable_sort might be a better option.