Fastest "trivial" way of shuffling a vector - c++

I have been working on a chess engine for some time now. For improving the engine, I wrote some code which loads chess positions from memory into some tuner code. I have around 1.85B FENs on my machine, which adds up to about 40 GB (24 bytes per position).
After loading, I end up with a vector of positions:
#include <bitset>
#include <vector>

struct Position {
    std::bitset<8*24> bits{};
};

int main() {
    std::vector<Position> positions{};
    // mimic some data loading
    for (int i = 0; i < 1.85e9; i++) {
        positions.push_back(Position{});
    }
    // ...
}
The data is organised in the following way:
The positions are taken from games, where consecutive positions are separated by just a few moves. Usually about 40-50 consecutive positions come from the same game / line and are therefore somewhat similar.
Eventually I will read 16384 positions within a single batch, and ideally none of those positions should come from the same game. Therefore I do some initial shuffling before using the data.
My current shuffling method is this:
auto rng = std::default_random_engine {};
std::shuffle(std::begin(positions), std::end(positions), rng);
Unfortunately this takes quite some time (about 1-2 minutes). Since I don't require perfect shuffles, I assume that simpler, cheaper shuffles exist.
My second approach was:
for (std::size_t i = 0; i < positions.size(); i++) {
    std::swap(positions[i], positions[(i * 16384) % positions.size()]);
}
which will ensure that no positions from the same game end up within a single batch, as the entries are evenly spaced by 16384 positions.
I was wondering if there is an even simpler, faster solution, especially considering that the modulo operator requires quite a few clock cycles.
I am happy for any "trivial" solution.
Greetings
Finn

There is a tradeoff to be made: shuffling a std::vector<size_t> of indices can be expected to be cheaper than shuffling a std::vector<Position>, at the cost of an indirection when accessing the Positions via the shuffled indices. Actually, the example on cppreference for std::iota does something along those lines (it uses iterators):
#include <algorithm>
#include <iostream>
#include <list>
#include <numeric>
#include <random>
#include <vector>

int main()
{
    std::list<int> l(10);
    std::iota(l.begin(), l.end(), -4);

    std::vector<std::list<int>::iterator> v(l.size());
    std::iota(v.begin(), v.end(), l.begin());

    std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()});

    std::cout << "Contents of the list: ";
    for (auto n : l) std::cout << n << ' ';
    std::cout << '\n';

    std::cout << "Contents of the list, shuffled: ";
    for (auto i : v) std::cout << *i << ' ';
    std::cout << '\n';
}
Instead of shuffling the list directly, a vector of iterators (a std::vector of indices would work as well) is shuffled, and std::shuffle only needs to swap iterators (/indices) rather than the more costly actual elements (in the example the "costly to swap" elements are just ints).
For a std::list I don't expect a big difference between iterating in order and iterating via shuffled iterators. For a std::vector, on the other hand, I do expect the indirect, cache-unfriendly access to have a significant impact. Hence, I would shuffle indices, then rearrange the vector once, and profile to see which performs better.
PS: As noted in comments, std::shuffle is already the optimal algorithm to shuffle a range of elements. However, note that it swaps each element twice on average (possible implementation from cppreference):
for (diff_t i = n - 1; i > 0; --i)
{
    using std::swap;
    swap(first[i], first[D(g, param_t(0, i))]);
}
On the other hand, shuffling the indices and then rearranging the vector only requires copying/moving each element once (when additional memory is available).
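For illustration, a minimal sketch of that idea (the helper name is mine, and it assumes you can afford a second vector of the same element type, i.e. roughly another 40 GB in the question's scenario):
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

template <typename T>
std::vector<T> shuffled_copy(const std::vector<T>& src, std::mt19937_64& rng)
{
    // Shuffle cheap 8-byte indices instead of the elements themselves.
    std::vector<std::size_t> idx(src.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), rng);

    // Rearrange once: each element is copied exactly one time.
    std::vector<T> dst;
    dst.reserve(src.size());
    for (std::size_t i : idx)
        dst.push_back(src[i]);
    return dst;
}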

Randomness alone won't guarantee that a batch doesn't sample positions from the same game, which is what you wanted to avoid. I propose the following pseudo-shuffle, which does prevent samples from the same game (given a sufficiently large population):
let N be the length of the longest game + 1
let E be an iterator to the end
let i be a random index
while E != begin
    if i >= E - begin
        i %= E - begin
        --N
    swap elements at i and std::prev(E)
    decrement E
    i += N
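For reference, a C++ sketch of that pseudo-shuffle (Position is the struct from the question; the longest_game parameter, the RNG seeding, the empty-vector check and the n > 1 guard are my additions):
#include <iterator>
#include <random>
#include <utility>
#include <vector>

void pseudo_shuffle(std::vector<Position>& positions, std::size_t longest_game)
{
    if (positions.empty()) return;

    std::size_t n = longest_game + 1;                  // stride N
    auto e = positions.end();                          // E
    std::mt19937_64 rng{std::random_device{}()};
    std::size_t i = std::uniform_int_distribution<std::size_t>(
                        0, positions.size() - 1)(rng); // random start index

    while (e != positions.begin()) {
        const auto remaining = static_cast<std::size_t>(e - positions.begin());
        if (i >= remaining) {                          // wrap around the shrinking prefix
            i %= remaining;
            if (n > 1) --n;                            // guard (my addition): keep the stride positive
        }
        std::swap(positions[i], *std::prev(e));        // settle element i at the back
        --e;
        i += n;                                        // advance by the stride
    }
}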

Related

What is the difference between range-v3 views::drop and views::drop_exactly?

Can someone explain the difference between range-v3's view adaptors drop and drop_exactly?
One difference I've observed is that if the number of elements in the range that is piped to these views is less than the argument to the view adaptors, drop seems to do the right thing, while drop_exactly seems to invoke UB.
When the argument is less than the number of elements in the range that is piped to these views, they both seem to work the same:
#include <iostream>
#include <vector>
#include <range/v3/all.hpp>

namespace rv = ranges::views;

int main()
{
    std::vector<int> v { 1, 2, 3, 4, 5 };

    for (int i : v | rv::drop(3))
        std::cout << i;                 // prints 45

    for (int i : v | rv::drop(7))
        std::cout << i;                 // prints nothing

    for (int i : v | rv::drop_exactly(3))
        std::cout << i;                 // prints 45

    for (int i : v | rv::drop_exactly(7))
        std::cout << i;                 // prints garbage and crashes
}
Here's the code.
From the documentation for drop_exactly:
Given a source range and an integral count, return a range consisting
of all but the first count elements from the source range. The
source range must have at least that many elements.
While the documentation for drop states:
Given a source range and an integral count, return a range consisting
of all but the first count elements from the source range, or an
empty range if it has fewer elements.
emphasis added
I'm guessing that drop_exactly avoids bounds checks and therefore has the potential to be slightly more performant at the cost of maybe running past the end of the piped-in container, while drop apparently performs bounds checks to make sure you don't.
This is consistent with what you see. If you print stuff from begin()+7 up to begin()+5 (aka end()) of a std::vector, and the abort condition is implemented with != instead of <, then you will continue to print the junk data that sits in the space allocated by the vector until at some point you run over the allocated chunk and the operating system steps in and segfaults your binary.
So, if you know the container has at least as many entries as you wish to drop, use the faster drop_exactly; otherwise use drop.
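A small sketch of that rule of thumb (the size check is my own pattern, not part of range-v3): pick drop_exactly only when the size is known to be large enough, and fall back to drop otherwise.
#include <iostream>
#include <vector>
#include <range/v3/all.hpp>

namespace rv = ranges::views;

int main()
{
    std::vector<int> v { 1, 2, 3, 4, 5 };
    const std::size_t n = 7;

    if (v.size() >= n)
        for (int i : v | rv::drop_exactly(n))  // size known to be large enough
            std::cout << i;
    else
        for (int i : v | rv::drop(n))          // safe: yields an empty range here
            std::cout << i;
}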

Find uncommon elements using hashing

I think this is a fairly common question but I didn't find any answer for this using hashing in C++.
I have two arrays, both of the same lengths, which contain some elements, for example:
A={5,3,5,4,2}
B={3,4,1,2,1}
Here, the uncommon elements are: {5,5,1,1}
I have tried this approach - iterating over both arrays with a while loop after sorting them:
while (i < n && j < n) {
    if (a[i] < b[j])
        uncommon[k++] = a[i++];
    else if (a[i] > b[j])
        uncommon[k++] = b[j++];
    else {
        i++;
        j++;
    }
}
while (i < n && a[i] != b[j-1])
    uncommon[k++] = a[i++];
while (j < n && b[j] != a[i-1])
    uncommon[k++] = b[j++];
and I am getting the correct answer with this. However, I want a better approach in terms of time complexity since sorting both arrays every time might be computationally expensive.
I tried to do hashing but couldn't figure it out entirely.
To insert elements from arr1[]:
set<int> uncommon;
for (int i=0;i<n1;i++)
uncommon.insert(arr1[i]);
To compare arr2[] elements:
for (int i = 0; i < n2; i++)
if (uncommon.find(arr2[i]) != uncommon.end())
Now, what I am unable to do is to send only those elements to the uncommon array[] which are uncommon to both of them.
Thank you!
First of all, std::set does not have anything to do with hashing. Sets and maps are ordered containers; implementations may differ, but most likely they are binary search trees. Whatever you do, you won't get faster than O(n log n) with them - the same complexity as sorting.
If you're fine with O(n log n) and sorting, I'd strongly advise just using the set_symmetric_difference algorithm https://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference ; it requires two sorted containers.
But if you insist on an implementation relying on hashing, you should use std::unordered_set or std::unordered_map. This way you can be faster than O(n log n): you can get your answer in expected O(n + m) time, where n = a.size() and m = b.size(). You should create two unordered_sets, hashed_a and hashed_b, and in two loops check which elements from hashed_a are not in hashed_b, and which elements from hashed_b are not in hashed_a. Here is some pseudocode:
create hashed_a and hashed_b
create set_result               // for the result
for (a_v : hashed_a)
    if (a_v not in hashed_b)
        set_result.insert(a_v)
for (b_v : hashed_b)
    if (b_v not in hashed_a)
        set_result.insert(b_v)
return set_result               // it holds the symmetric difference, which you need
UPDATE: as noted in the comments, my answer doesn't count for duplicates. The easiest way to modify it for duplicates would be to use unordered_map<int, int> with the keys for elements in the set and values for number of encounters.
First, you need to find a way to distinguish between equal values contained in the same array (for example, 5 and 5 in the first array, and 1 and 1 in the second array). This is the key to reducing the overall complexity; otherwise you can't do better than O(n log n). A possible way to do this is to create a wrapper object holding the actual value and to store in your arrays pointers to those wrapper objects, so that the pointer addresses serve as unique identifiers for otherwise equal values. This wrapping costs just O(n1+n2) operations, but also an additional O(n1+n2) space.
Now your problem is that both arrays contain only elements that are unique within each array, and you want to find the uncommon ones, i.e. the (union of both arrays' elements) minus the (intersection of both arrays' elements). Therefore, all you need to do is push all the elements of the first array into a hash map (complexity O(n1)), and then push all the elements of the second array into the same hash map (complexity O(n2)) while detecting collisions (equality of an element from the first array with an element from the second array). This comparison step requires O(n2) comparisons in the worst case. For maximum performance you could check the sizes of the arrays beforehand and swap them so that the first push happens with the longer array. Your overall algorithm complexity would be O(n1+n2) pushes (hashings) and O(n2) comparisons.
The implementation is the most boring part, so I leave it to you ;)
A solution without sorting (and without hashing, but you seem to care more about complexity than about the hashing itself) is to notice the following: an uncommon element e is an element that is in exactly one of the two multisets.
This means that the multiset of all uncommon elements is the union of 2 multisets:
S1 = the elements in A that are not in B
S2 = the elements in B that are not in A
Using std::set_difference, you get:
#include <set>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>

int main() {
    std::multiset<int> ms1{5,3,5,4,2};
    std::multiset<int> ms2{3,4,1,2,1};
    std::vector<int> v;

    std::set_difference(ms1.begin(), ms1.end(), ms2.begin(), ms2.end(), std::back_inserter(v));
    std::set_difference(ms2.begin(), ms2.end(), ms1.begin(), ms1.end(), std::back_inserter(v));

    for (int e : v)
        std::cout << e << ' ';
    return 0;
}
Output:
5 5 1 1
The complexity of this code is 4*(N1+N2-1) comparisons, where N1 and N2 are the sizes of the multisets.
Links:
set_difference: https://en.cppreference.com/w/cpp/algorithm/set_difference
compiler explorer: https://godbolt.org/z/o3KGbf
The question can be solved in O(n log n) time complexity.
ALGORITHM
Sort both arrays with merge sort in O(n log n) complexity. You can also use the library sort function, for example sort(array1.begin(), array1.end()).
Now use the two-pointer method to skip all common elements of both arrays.
Program for the above method
int i = 0, j = 0;
while (i < array1.size() && j < array2.size()) {
    // If not common, print the smaller one
    if (array1[i] < array2[j]) {
        cout << array1[i] << " ";
        i++;
    }
    else if (array2[j] < array1[i]) {
        cout << array2[j] << " ";
        j++;
    }
    // Skip common element
    else {
        i++;
        j++;
    }
}
The complexity of the above program is O(array1.size() + array2.size()); in the worst case roughly 2n operations when both arrays hold n elements.
The above program prints the uncommon elements. If you want to store them, just create a vector and push them into it.
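A sketch of that variation (the function name and signature are mine): the same two-pointer pass over the sorted arrays, collecting into a vector instead of printing. The two trailing loops also drain whatever is left once one array runs out, which the printing loop above stops short of.
#include <vector>

std::vector<int> uncommon_sorted(const std::vector<int>& array1, const std::vector<int>& array2)
{
    std::vector<int> uncommon;
    std::size_t i = 0, j = 0;
    while (i < array1.size() && j < array2.size()) {
        if (array1[i] < array2[j])
            uncommon.push_back(array1[i++]);   // unmatched occurrence from array1
        else if (array2[j] < array1[i])
            uncommon.push_back(array2[j++]);   // unmatched occurrence from array2
        else {
            i++;                               // common occurrence, skip it in both
            j++;
        }
    }
    while (i < array1.size()) uncommon.push_back(array1[i++]);  // leftover tail of array1
    while (j < array2.size()) uncommon.push_back(array2[j++]);  // leftover tail of array2
    return uncommon;
}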

Set_Intersection with repeated values

I think the set_intersection STL function described here: http://www.cplusplus.com/reference/algorithm/set_intersection/
is not really a set intersection in the mathematical sense. Suppose that in the example given I change the lines to:
int first[] = {5,10,15,20,20,25};
int second[] = {50,40,30,20,10,20};
I would like to get 10 20 20 as a result. But I only get unique answers.
Is there a true set intersection in STL?
I know it's possible with a combination of merges and set_differences, btw. Just checking if I'm missing something obvious.
I would like to get 10 20 20 as a result. But I only get unique answers. Is there a true set intersection in STL?
std::set_intersection works how you want.
You probably get the wrong answer because you didn't update the code properly. If you change the sets to have 6 elements you need to update the lines that sort them:
std::sort (first,first+5); // should be first+6
std::sort (second,second+5); // should be second+6
And also change the call to set_intersection to use first+6 and second+6. Otherwise you only sort the first 5 elements of each set, and only get the intersection of the first 5 elements.
Obviously if you don't include the repeated value in the input, it won't be in the output. If you change the code correctly to include all the input values it will work as you want (live example).
cplusplus.com is not a good reference, if you look at http://en.cppreference.com/w/cpp/algorithm/set_intersection you will see it clearly states the behaviour for repeated elements:
If some element is found m times in [first1, last1) and n times in [first2, last2), the first std::min(m, n) elements will be copied from the first range to the destination range.
Even the example at cplusplus.com is bad, it would be simpler, and harder to introduce your bug, if it was written in idiomatic modern C++:
#include <iostream>   // std::cout
#include <algorithm>  // std::set_intersection, std::sort
#include <iterator>   // std::back_inserter
#include <vector>     // std::vector

int main () {
    int first[] = {5,10,15,20,20,25};
    int second[] = {50,40,30,20,10,20};

    std::sort(std::begin(first), std::end(first));
    std::sort(std::begin(second), std::end(second));

    std::vector<int> v;
    std::set_intersection(std::begin(first), std::end(first),
                          std::begin(second), std::end(second),
                          std::back_inserter(v));

    std::cout << "The intersection has " << v.size() << " elements:\n";
    for (auto i : v)
        std::cout << ' ' << i;
    std::cout << '\n';
}
This automatically handles the right number of elements, without ever having to explicitly say 5 or 6 or any other magic number, and without having to create initial elements in the output vector and then resize it to remove them again.
set_intersection requires both ranges to be sorted. In the data you've given, second is not sorted.
If you sort it first, you should get your expected answer.

Find elements in a vector which lie within specified ranges

I have a vector of integer elements in sorted order. An example is given below:
vector<int> A ={3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
Now given the following "start" and "end" ranges I want to find out which elements of vector A fall into the start and end ranges.
int startA=4; int endA=8;
int startB=20; int endB=99;
int startC=120; int endC=195;
For example,
elements lying in range startA and endA are: {4,5}
elements lying in range startB and endB are: {20,71,89,92}
elements lying in range startC and endC are: {121,172,189,194}
One way to do this is to iterate over all elements of "A" and check whether they lie between the specified ranges. Is there some other more efficient way to find out the elements in the vector satisfying a given range?
One way to do this is to iterate over all elements of "A" and check whether they lie between the specified ranges. Is there some other more efficient way to find out the elements in the vector satisfying a given range
If the vector is sorted, as you have shown it to be, you can use binary search to locate the index of the first element that is not below the lower end of the range, and the index of the first element that is above the upper end of the range.
That will make each search O(log(N)).
You can use std::lower_bound and std::upper_bound, which require the container to be partially ordered, which is true in your case.
If the vector is not sorted, linear iteration is the best you can do.
If the vector is sorted, all you need to do is use the dedicated functions to find your range's start and end iterators - std::lower_bound and std::upper_bound. E.g.:
#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>

int main() {
    std::vector<int> A = {3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
    auto start = std::lower_bound(A.begin(), A.end(), 4);
    auto end = std::upper_bound(A.begin(), A.end(), 8);
    for (auto it = start; it != end; it++) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
}

// or the C++1z version (works in VS2015u3)
int main() {
    std::vector<int> A = {3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
    std::copy(std::lower_bound(A.begin(), A.end(), 4),
              std::upper_bound(A.begin(), A.end(), 8),
              std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}
This however will work only if startX <= endX so you may want to test the appropriate condition before running it with arbitrary numbers...
Searching the bound iterators using std::lower_bound and std::upper_bound costs O(log(N)); however, it has to be stated that iterating through the resulting range of elements is O(N) in the average case, and the range may contain all the elements in your vector...
The best way I can think of is to apply a modified binary search twice to find the two bounding indices, and then print all items between them. The time complexity of the two searches is O(log n).
A modified form of binary search looks like this (PS: it is written for plain arrays, but it is applicable to vectors as well):
// Returns the index of the first element strictly greater than key
// (an "upper bound"); returns end + 1 if every element is <= key.
int binary_search(int *arr, int start, int end, int key)
{
    if (start == end)
    {
        if (arr[start] <= key)
            return start + 1;          // key sits at or after the last candidate
        return start;                  // first element greater than key
    }
    int mid = (start + end) / 2;
    if (arr[mid] > key && (mid == 0 || arr[mid - 1] <= key))
        return mid;                    // boundary found
    else if (arr[mid] > key)
        return binary_search(arr, start, mid - 1, key);
    else
        return binary_search(arr, mid + 1, end, key);
}
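A possible way to use it for the range query from the question (the driver below is my own illustration): since the function returns the index of the first element strictly greater than its key, for integer keys the first index with value >= startA is obtained by searching for startA - 1, and the one-past-the-end index by searching for endA.
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> A = {3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
    int startA = 4, endA = 8;

    int lo = binary_search(A.data(), 0, (int)A.size() - 1, startA - 1);
    int hi = binary_search(A.data(), 0, (int)A.size() - 1, endA);

    for (int k = lo; k < hi; ++k)
        std::cout << A[k] << " ";   // prints 4 5
    std::cout << "\n";
}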
If the range of integers in vector A is not wide, a bitmap is worth considering.
Let's assume all integers of A are non-negative and smaller than 1024; the bitmap can then be built with:
#include <bitset>
// ...
// If a fixed size is not an option,
// consider vector<bool> or boost::dynamic_bitset
std::bitset<1024> bitmap;
for (auto i : A)
    bitmap.set(i);
That takes N iterations to set the bits, and (size of the value range)/8 bytes for storing them. With the bitmap, one can match elements as follows:
std::vector<int> result;
for (auto i = startA; i < endA; ++i) {
    if (bitmap[i]) result.emplace_back(i);
}
Hence the speed of the matching depends on the size of the range rather than on N. This solution should be attractive when you have many narrow ranges to match.

Deletion from vector took less time than deletion from list. Why?

In a C++ manual I found the following:
Vectors are relatively efficient adding or removing elements from its
end. For operations that involve inserting or removing elements at
positions other than the end, they perform worse than the others, and
have less consistent iterators and references than lists and
forward_lists.
Also, under 'complexity' of the vector's 'erase' method I found the following:
Linear on the number of elements erased (destructions) plus the number
of elements after the last element deleted (moving).
And under 'complexity' of the list's 'erase' method:
Linear in the number of elements erased (destructions).
But when I tested it with 30 million elements in each container (I deleted from element 24357 to element 2746591), I found that deleting from the vector took 5 ms, but from the list 8857 ms. The difference is huge and confusing...
Here is my code:
#include "stdafx.h"
#include <vector>
#include <list>
#include <iostream>
#include <ctime>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
const long int x = 30000000;
vector<char> v;
vector<char>::iterator itv1, itv2;
list<char> l;
list<char>::iterator itl1, itl2;
unsigned start, end;
long int first, last;
cout << "Please enter first position: \n";
cin >> first;
cout << "Please enter last position: \n";
cin >> last;
for (long int i = 0; i < x; ++i) {
char c;
c = (rand() % 26) + 'a';
v.push_back(c);
l.push_back(c);
}
cout << "Starting deletion\n";
start = clock();
itv1 = v.begin() + first;
itv2 = v.begin() + last;
v.erase(itv1, itv2);
end = clock();
cout << "End " << end-start << "\n";
start = clock();
itl1 = itl2 = l.begin();
advance(itl1, first);
advance(itl2, last);
l.erase(itl1, itl2);
end = clock();
cout << "End " << end-start << "\n";
return 0;
}
Could you explain what causes such a difference? My guess is that advancing iterators in a list is much slower than in a vector, but I'm not sure.
Many thanks!
In your case, likely because you're not measuring just the erase time; you're measuring the time taken for the two advance calls plus the erase call.
But more generally: because O() complexity only tells you about the algorithmic complexity, not the actual time taken. O(1) can have a huge constant time value. Worse, the complexity is theoretical; it does not consider the realities of how hardware works.
In fact, because the vector delete accesses memory in a linear fashion, it can be efficiently cached and predicted, whilst the list delete operates in a random-access fashion. This can mean that vector delete is faster in practice than list delete when you have a small vector.
Erasing a range of elements from a vector requires merely the move of all trailing elements forward, to the start of the gap. This can be done with a memory move instruction, which is very efficient. It depends on the number of trailing elements and not on the number of deleted elements.
Deleting the same number of elements from a list requires iterating over the deleted range in the list and returning each element to the dynamic memory management, which is clearly dependent on the number of elements you delete.
Later
Compare a delete of a range of 1000 near the start of the vector with the same operation almost at the end, and then do the same with the list. I predict that the vector will be slower in the first case and (much) faster in the second case.
And here's the result:
Please enter first position:
1
Please enter last position:
1000
Starting deletion
End 10000
End 0
/tmp$ ./del
Please enter first position:
29999000
Please enter last position:
29999999
Starting deletion
End 0
End 360000
:-)
It depends on the task you want to do.
Your task takes advantage of random access and contains only a single erase(): this plays to the strengths of vector.
I think a more interesting task would be to iterate the list and the vector one element at a time deleting every other element.
This forces sequential access and multiple calls to erase(): this will play to the strength of list.
You are deleting all elements with a single call to erase; that means when you delete from the vector you get O(n), but only once. When you delete from the list, you have to iterate to position 1 (itl1) and to position 2 (itl2); also, if you are deleting a lot of elements, the erase method has a lot of elements to erase. In other words, unless you are erasing a few elements from the beginning of the list, you will also have O(n) for the list. Note that iterating through elements in a vector is much faster than in a list, which may be the cause of those results.
Try deleting only the first element, and you should see the list being much faster than vector.
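For instance, a quick sketch of that experiment, reusing the variables from the question's program (insert it after the containers have been filled):
start = clock();
v.erase(v.begin());   // vector: shifts ~30M trailing elements forward
end = clock();
cout << "vector erase front: " << end - start << "\n";

start = clock();
l.erase(l.begin());   // list: unlinks a single node, O(1)
end = clock();
cout << "list erase front: " << end - start << "\n";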
The textbook answer is that the list will be faster, but the textbook isn't always right! I can't prove it, but I suspect it is due to the fact that modern computers have special circuitry that allows them to shift blocks of memory around very fast. So while vector deletion is O(N) in some academic sense, in reality it boils down to (or can boil down to) a single hardware operation, which ends up being faster than all the traversal and pointer fiddling you have to do when you remove elements from a list.