I need a fast STL container for checking whether an element exists in it, so I tested arrays, vectors, sets, and unordered sets. I thought sets would be optimized for finding elements, because their values are unique and ordered, but the fastest for 10 million iterations are:
arrays (0.3 secs)
vectors (1.7 secs)
unordered sets (1.9 secs)
sets (3 secs)
Here is the code:
#include <algorithm>
#include <cstdlib>   // for rand()
#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>

int main() {
    using std::set, std::unordered_set, std::vector, std::find;
    const long ITERATIONS = 10000000;

    int a[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    for (int i = 0; i < ITERATIONS; i++) {
        if (find(a, a + 16, rand() % 64) == a + 16) {}
        else {}
    }

    vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    for (int i = 0; i < ITERATIONS; i++) {
        if (find(v.begin(), v.end(), rand() % 64) == v.end()) {}
        else {}
    }

    set<int> s({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
    for (int i = 0; i < ITERATIONS; i++) {
        if (find(s.begin(), s.end(), rand() % 64) == s.end()) {}
        else {}
    }

    unordered_set<int> us({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
    for (int i = 0; i < ITERATIONS; i++) {
        if (find(us.begin(), us.end(), rand() % 64) == us.end()) {}
        else {}
    }
}
Please remember that in C and C++ there is the as-if rule!
This means the compiler can transform the code by any means (even by dropping code entirely) as long as the observable result of running the code remains unchanged.
Here is a Godbolt link of your code.
Now note what the compiler did for if (find(a, a + 16, rand() % 64) == a + 16) {}:
.L206:
call rand
sub ebx, 1
jne .L206
Basically, the compiler noticed that the result is never used and removed everything except the call to rand(), which has side effects (visible changes in results).
The same happens for std::vector:
.L207:
call rand
sub ebx, 1
jne .L207
And even for std::set and std::unordered_set the compiler was able to perform the same optimization. The difference you are seeing (you didn't specify how you measured) is just the result of initializing all of these variables, which is more time-consuming for the more complex containers.
Writing a good performance test is hard and should be approached with caution.
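One way to defeat this, as a minimal sketch (one illustration, not the only fix): actually use the result of each lookup, e.g. accumulate a counter and print it at the end, so the as-if rule no longer allows the compiler to drop the searches.
#include <algorithm>
#include <cstdlib>
#include <iostream>

int main() {
    const long ITERATIONS = 10000000;
    int a[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

    long hits = 0;
    for (long i = 0; i < ITERATIONS; i++) {
        // the result now feeds into 'hits', which is printed below,
        // so the search can no longer be elided
        if (std::find(a, a + 16, rand() % 64) != a + 16)
            ++hits;
    }
    std::cout << hits << '\n';  // observable output keeps the work alive
}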
There is also a second problem with your question: the time complexity of the given code.
Searching an array, a std::set and a std::unordered_set scales differently with the size of the data. For a small data set a simple array will be fastest, thanks to its simple implementation and optimal memory access pattern. As the data size grows, the time taken by std::find on an array grows as O(n); on the other hand, the time for std::set to find an item grows only as O(log n), and for std::unordered_set it stays constant, O(1). So for a small amount of data the array will be fastest, for medium sizes std::set is the winner, and if the amount of data is large, std::unordered_set will be best.
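Note that those complexities only apply if you use each container's own lookup: calling std::find on a std::set iterates linearly and ignores the tree structure entirely. A small sketch of the container-appropriate lookups:
#include <algorithm>
#include <set>
#include <unordered_set>
#include <vector>

bool in_vector(const std::vector<int>& v, int x) {
    return std::find(v.begin(), v.end(), x) != v.end();  // O(n) linear scan
}

bool in_set(const std::set<int>& s, int x) {
    return s.find(x) != s.end();   // O(log n) tree search, not std::find
}

bool in_unordered_set(const std::unordered_set<int>& us, int x) {
    return us.find(x) != us.end(); // O(1) average hash lookup
}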
Take a look at this benchmark example, which uses Google Benchmark.
You are not measuring efficiency, you are measuring performance. And doing so badly.
The effect of address space randomization, or just a different username or other variables in the environment, can have up to about a 40% effect on speed. That's more than the difference between -O0 and -O2. You are measuring only one single system with one single address space layout, 10000000 times. That makes the value about meaningless.
And yet you still managed to figure out that, for 16 ints, any attempt to be cleverer than just looking at all of them will perform worse. A simple linear search within a single cache line (or two if the layout is bad, which is the more likely case) is simply the best way.
Now try again with 10000000 ints, and run the binary 1000 times. Even better, use a layout randomizer to truly exclude accidents of layout from the timing.
Note: IIRC the cutoff below which bubble sort on a plain array beats the clever sorting algorithms is somewhere between 32 and 64 ints, and find is even simpler.
Related
Suppose that we have a very long array of, say, int, to keep the problem simple.
What is the fastest way (or just a fast way, if it's not the fastest) in C++ to check whether an array has any repeated elements?
To clarify, this function should return this:
[2, 5, 4, 3] => false
[2, 8, 2, 5, 7, 3, 4] => true
[8, 8, 5] => true
[1, 2, 3, 4, 1, 7, 1, 1, 7, 1, 2, 2, 3, 4] => true
[9, 1, 12] => false
One strategy is to loop through the array and, for each element, loop through the array again to check for a match. However, this can be very costly (literally O(n^2)). Is there any better way?
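For concreteness, a sketch of that quadratic strategy (the function name is just for illustration):
#include <vector>

// the O(n^2) baseline described above: compare every pair once
bool has_duplicates_naive(const std::vector<int>& vec)
{
    for (std::size_t i = 0; i < vec.size(); ++i)
        for (std::size_t j = i + 1; j < vec.size(); ++j)
            if (vec[i] == vec[j])
                return true;
    return false;
}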
(✠Update below) Insert the array elements into a std::unordered_set; if an insertion fails, it means you have duplicates.
Something like the following:
#include <iostream>
#include <vector>
#include <unordered_set>

bool has_duplicates(const std::vector<int>& vec)
{
    std::unordered_set<int> set;
    for (int ele : vec)
        if (const auto [iter, inserted] = set.emplace(ele); !inserted)
            return true; // has duplicates!
    return false;
}

int main()
{
    std::vector<int> vec1{ 1, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec1) << '\n'; // false

    std::vector<int> vec2{ 12, 3, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec2) << '\n'; // true
}
✠Update: As discussed in the comments, this may or may not be the fastest solution. In the OP's case, as explained in Marcus Müller's answer, an O(N·log(N)) method would be better, which we can achieve by sorting the array and checking it for dupes.
Here is a quick benchmark I made for the two cases, "UnorderedSetInsertion" and "ArraySort", with GCC 10.3, C++20, -O3.
This is nearly just a sorting problem, except that you can abort the sort as soon as you've hit a single equality and return true.
So, if you're memory-limited (that's often the case: memory-limited rather than actually time-limited), an in-place sorting algorithm that aborts when it encounters two identical elements will do: std::sort with a comparator function that raises an exception when it encounters equality. The complexity would be O(N·log(N)), but let's be honest here: the fact that this is probably less indirect in memory addressing than the creation of a tree-like bucket structure might help. In that sense, I can only recommend you actually compare this to JeJo's solution, which looks pretty reasonable too!
The thing here is that there's very likely no one-size-fits-all solution: what is fastest will depend on the number of integers we're talking about. Even quadratic complexity might beat any of our "clever" answers if it keeps memory access nice and linear; I'm almost certain your speed here is not bounded by your CPU, but by the amount of data you need to shuffle to and from RAM.
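As a concrete sketch of the sorted-array check mentioned in the update above (the function name is illustrative; note that std::sort is not guaranteed to compare every pair of equal elements, so sorting first and then scanning with std::adjacent_find is the robust variant of the abort-on-equality idea):
#include <algorithm>
#include <vector>

// O(N·log N): sort a copy in place, then look for equal neighbours
bool has_duplicates_sorted(std::vector<int> vec)  // by value: we may reorder
{
    std::sort(vec.begin(), vec.end());
    return std::adjacent_find(vec.begin(), vec.end()) != vec.end();
}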
How about binning the data (i.e. creating a histogram) and checking the mode of the result? A mode greater than 1 indicates a repeated value.
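A minimal sketch of that idea (the function name is just for illustration); here the histogram is an unordered map from value to count, and we can stop as soon as any count exceeds 1:
#include <unordered_map>
#include <vector>

bool has_duplicates_histogram(const std::vector<int>& vec)
{
    std::unordered_map<int, int> hist;  // value -> occurrence count
    for (int x : vec)
        if (++hist[x] > 1)              // mode exceeded 1: repeat found
            return true;
    return false;
}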
Consider
a vector of the first n natural numbers, I, I=[0, 1, ...n-1], n<=32.
another vector of naturals, S, S[i]<=2000, for any i=0..n-1, not necessarily unique
a subset of I with m elements, J, 0 <= J[j] < n, for any j=0...m-1
Is there an efficient way (in terms of CPU cycles/cache friendliness/memory) to sort the elements of J according to S(J)?
C++ code which uses standard algorithms is preferred.
Example:
I = [0, 1, 2, 3, 4]
S = [10, 50, 40, 20, 30]
J = [1, 3, 4]
S(J) = [50, 20, 30]
J sorted according to S(J) = [3, 4, 1]
I've considered working with std::multimap to get the sorting for 'free', but the machinery behind std::multimap (allocations, etc.) seems expensive.
Using std::pair to bind J and S(J) would allow using std::sort; the downside is that extra memory and an extra loop are needed to get the final sorted J.
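For instance, a rough sketch of that pair-based version (sort_by_key is just an illustrative name):
#include <algorithm>
#include <utility>
#include <vector>

// extra memory for the pairs, plus an extra loop to extract the result
std::vector<int> sort_by_key(const std::vector<int>& J, const std::vector<int>& S)
{
    std::vector<std::pair<int, int>> tmp;  // (S[j], j) pairs
    tmp.reserve(J.size());
    for (int j : J)
        tmp.emplace_back(S[j], j);
    std::sort(tmp.begin(), tmp.end());     // sorts by S[j] first
    std::vector<int> sortedJ;              // the extra loop
    sortedJ.reserve(tmp.size());
    for (const auto& [key, j] : tmp)
        sortedJ.push_back(j);
    return sortedJ;
}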
My take is to sort both J and S(J) simultaneously using S(J) as a criteria in a hand written sort routine. However, writing a sort function in 2019 seems awkward.
Is there a clever way to do this? Is it possible to exploit the fact that n <= 32?
My take is to sort both J and S(J) simultaneously using S(J) as a criteria in a hand written sort routine. However, writing a sort function in 2019 seems awkward.
You are on the right track, but you don't need to write your own sort. You can leverage a lambda to get the custom sorting behavior you want while still letting std::sort do the sorting for you. Take the values supplied to the lambda, use them as indexes into S, and compare those results. That gives you code like
#include <algorithm>
#include <iostream>
#include <iterator>

int main()
{
    int S[] = {10, 50, 40, 20, 30};
    int J[] = {1, 3, 4};
    std::sort(std::begin(J), std::end(J),
              [&S](auto lhs, auto rhs) { return S[lhs] < S[rhs]; });
    for (auto e : J)
    {
        std::cout << e << " ";
    }
}
Which outputs
3 4 1
I need to create a vector/array in the following format:
arr[10] = [1, 2, 3, 4, 5, 6, 7, 8, 9 ,10]
I want to add a new element at the beginning after which all elements are shifted to the right and the last element is deleted. The result should be:
arr[10] = [new_int, 1, 2, 3, 4, 5, 6, 7, 8, 9]
How can I do this in C++? Do I have to write a function, or is there already an existing one like .append() or .push_back()?
If it's an std::vector, the trivial solution is:
vec.pop_back();
vec.insert(vec.begin(), new_int);
The asymptotic complexity is the same as any other method of accomplishing this (pop_back() is O(1), insertion at the head is O(n), and no reallocation is ever performed), but it temporarily touches the vector length for no good reason and does more bookkeeping overall.
A better solution (one that works for std::vector and std::array as well as for C-style arrays) is:
std::copy_backward(std::begin(vec), std::end(vec) - 1, std::end(vec));
vec[0] = new_int;
This, again, has O(n) complexity, but less overhead (it does exactly what it needs to do, nothing more).
Now, if we move to different data structures the situation is different; with an std::deque you can do as #JesperJuhl shows in his answer; pushing/popping at both ends of a deque costs amortized O(1), so it's reasonably fast for most uses.
Still, if your buffer is fixed in size, the natural data structure for the operation you describe is generally the fixed-size circular buffer; it is not provided by the standard library, but there is an implementation in Boost (besides, it's also a nice exercise to write one yourself, with iterators and everything).
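For illustration, a minimal sketch using Boost's boost::circular_buffer (assuming Boost is available); when the buffer is full, push_front drops the element at the back automatically:
#include <boost/circular_buffer.hpp>
#include <iostream>

int main()
{
    boost::circular_buffer<int> buf(10);  // fixed capacity of 10
    for (int i = 1; i <= 10; ++i)
        buf.push_back(i);                 // fill with 1..10

    buf.push_front(42);                   // O(1): the 10 at the back is dropped

    for (int e : buf)
        std::cout << e << ' ';            // 42 1 2 3 4 5 6 7 8 9
}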
#include <algorithm>
#include <iterator>
...
std::rotate(std::begin(arr), std::end(arr) - 1, std::end(arr)); // rotates the last element around to the front
arr[0] = new_int;                                               // then overwrite it with the new value
The easiest would be to use a std::deque. Then it becomes as simple as
my_deque.pop_back(); // get rid of last element.
my_deque.push_front(42); // put the new element at the front
You can use a vector with push and pop. If you want to do it using plain arrays, then you can:
int arr[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int new_value = 20;
// walk backwards so every element moves one slot to the right;
// the last element (arr[9]) is overwritten and thereby dropped
for (int i = 8; i >= 0; i--) {
    arr[i + 1] ^= arr[i];
    arr[i] ^= arr[i + 1];
    arr[i + 1] ^= arr[i];
}
Or you can use a temporary variable instead of the XOR operator:
/*
int tmp = arr[i + 1];
arr[i + 1] = arr[i];
arr[i] = tmp;
*/
arr[0] = new_value;
for (int i = 0; i < 10; i++)
    std::cout << arr[i] << ", ";
std::vector<int> arr;
arr.insert(arr.begin(), newvalue);
arr.pop_back();
I want to pick a random PWM pin each time a loop repeats. The PWM-capable pins on the Arduino UNO are pins 3, 5, 6, 9, 10 and 11. I tried rnd(), but it gives me every value in a contiguous range, including the non-PWM pins; same with TrueRandom.Random(1, 9).
Well, there are at least two ways to do it.
The first (and probably best) way is to load those values into an array of size six, generate a number in the range zero through five, and get the value from that position in the array.
In other words, pseudo-code such as:
values = [3, 5, 6, 9, 10, 11]
num = values[randomInclusive(0..5)]
In terms of actually implementing that pseudo-code, I'd look at something like:
int getRandomPwmPin() {
    static const int candidate[] = {3, 5, 6, 9, 10, 11};
    static const int count = sizeof(candidate) / sizeof(*candidate);
    return candidate[TrueRandom.random(0, count)];
}
There's also the naive way of doing it, which is to generate numbers in a wider range and simply throw away those that don't meet your specification (i.e., go back and get another one). This is actually an inferior method, as it may take longer to get a suitable number under some circumstances. Technically, it could even take an infinitely(a) long time if suitable values never appear.
This would be along the lines of (pseudo-code):
num = -1 // force entry into loop
while num is not one of 3, 5, 6, 9, 10, 11:
num = randomInclusive(3..11)
which becomes:
int getRandomPwmPin() {
    int value;
    do {
        value = TrueRandom.random(3, 12);
    } while ((value == 4) || (value == 7) || (value == 8));
    return value;
}
As stated, the former solution is probably the best one. I include the latter only for informational purposes.
(a) Yes, I know. Over a long enough time frame, statistics pretty much guarantees you'll get a useful value. Stop being a pedant about my hyperbole :-)
The trick is to make a list of pins and then pick an entry from the list at random:
int pins[] = {3, 5, 6, 11, 10, 9};
int choice = rnd();   // in range 0-5
int pin = pins[choice];
See Generating random integer from a range for how to get a number in a range.
I'm curious whether anyone here has knowledge about the efficiency of atomics, specifically std::atomic<int>. My problem goes as follows:
I have a data set, say data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, that is passed into an algorithm algo(begin(data), end(data)). algo partitions the data into chunks and executes each chunk asynchronously, so algo would perform its operation on, say, 4 different chunks:
{1, 2, 3}
{4, 5, 6}
{7, 8, 9}
{10, 11, 12}
In each separate partition I need to count the elements that satisfy a predicate op, and return that count at the end of the partition:
//partition lambda function
{
    //'it' corresponds to the position in its respective partition
    if (op(*it))
        count++;
    //return the count at the end of this partition
    return count;
}
The problem is that I'm going to run into a data race when incrementing one variable from 4 chunks executing asynchronously. I was thinking of two possible solutions:
use a std::atomic
The problem here is that I know very little about C++'s atomics, and from what I've heard they can be inefficient. Is this true? What results should I expect to see when using an atomic to keep track of the count?
use a shared array, where the size is the partition count
I know my shared arrays pretty well, so this idea doesn't seem too bad, but I'm unsure how it would hold up when a very small chunk size is given, which would make the shared array of per-partition counts quite large. It would be useful, however, because a partition doesn't have to wait for anything else to finish before incrementing; it simply places its respective count in the shared array.
So with both my ideas, I could possibly implement it as follows (fuller runnable sketches of both appear at the end of this question):
//partition lambda function, count is now atomic
{
    //'it' corresponds to the position in its respective partition
    if (op(*it))
        count++;
    //return the count at the end of this partition
    return count.load();
}
//partition lambda function, count is in a shared array that will be accessed later
//instead of returned
{
    int count = 0;
    //'it' corresponds to the position in its respective partition
    if (op(*it))
        count++;
    //total count at end of each partition; ignore the fact that partition_id = 0 wouldn't work
    shared_arr[partition_id] = shared_arr[partition_id - 1] + count;
}
Any ideas on atomic vs. shared array?
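For concreteness, here is a minimal self-contained sketch of the first idea (the names op, chunk and the std::async partitioning are illustrative, not from any particular library). A relaxed fetch-and-add is enough for a pure counter, since no other data is synchronized through it:
#include <atomic>
#include <future>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    auto op = [](int x) { return x % 2 == 0; };  // example predicate

    std::atomic<int> count{0};
    std::vector<std::future<void>> tasks;
    const std::size_t chunk = 3;                 // 4 partitions of 3 elements

    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        tasks.push_back(std::async(std::launch::async, [&, begin] {
            for (std::size_t i = begin; i < begin + chunk; ++i)
                if (op(data[i]))
                    count.fetch_add(1, std::memory_order_relaxed);
        }));
    }
    for (auto& t : tasks) t.get();               // wait for all partitions

    std::cout << count.load() << '\n';           // prints 6
}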
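And a comparable sketch of the second idea, again with illustrative names: each partition writes only its own slot, so there is no contention while counting, and the per-partition counts are combined once at the end with a plain reduction (rather than the running prefix sum sketched above):
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    auto op = [](int x) { return x % 2 == 0; };

    const std::size_t chunk = 3;
    const std::size_t partitions = data.size() / chunk;
    std::vector<int> shared_arr(partitions, 0);  // one slot per partition: no data race
    std::vector<std::future<void>> tasks;

    for (std::size_t p = 0; p < partitions; ++p) {
        tasks.push_back(std::async(std::launch::async, [&, p] {
            int count = 0;                       // purely local while counting
            for (std::size_t i = p * chunk; i < (p + 1) * chunk; ++i)
                if (op(data[i]))
                    ++count;
            shared_arr[p] = count;               // one write, to our own slot
        }));
    }
    for (auto& t : tasks) t.get();

    std::cout << std::accumulate(shared_arr.begin(), shared_arr.end(), 0) << '\n';  // 6
}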