VexCL: count the number of values in a vector above a minimum - c++

Using VexCL in C++, I am trying to count all values in a vector above a certain minimum, and I would like to perform this count on the device. The default Reductors only provide methods for MIN, MAX and SUM, and the examples do not show very clearly how to perform such an operation. This code is slow, as it is probably executed on the host instead of the device:
int amount = 0;
int minimum = 5;
for (vex::vector<int>::iterator i = vector.begin(); i != vector.end(); ++i)
{
    if (*i >= minimum)
    {
        amount++;
    }
}
The vector I am using will consist of a large number of values, say millions, mostly zeros. Besides the number of values above the minimum, I would also like to retrieve a list of the vector indices that contain these values. Is this possible?

If you only needed to count elements above the minimum, this would be as simple as
vex::Reductor<int, vex::SUM> sum(ctx);
int amount = sum( vec >= minimum );
The vec >= minimum expression results in a sequence of ones and zeros, and sum then counts the ones.
Now, since you also need to get the positions of the elements above the minimum, it gets a bit more complicated:
#include <iostream>
#include <vexcl/vexcl.hpp>
int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Input vector
    vex::vector<int> vec(ctx, {1, 3, 5, 2, 6, 8, 0, 2, 4, 7});
    int n = vec.size();
    int minimum = 5;

    // Put result of (vec >= minimum) into key, and element indices into pos:
    vex::vector<int> key(ctx, n);
    vex::vector<int> pos(ctx, n);
    key = (vec >= minimum);
    pos = vex::element_index();

    // Get number of interesting elements in vec.
    vex::Reductor<int, vex::SUM> sum(ctx);
    int amount = sum(key);

    // Sort pos by key in descending order.
    vex::sort_by_key(key, pos, vex::greater<int>());

    // The first 'amount' elements in pos now hold the indices of the
    // interesting elements. Let's use a slicer to extract them:
    vex::vector<int> indices(ctx, amount);
    vex::slicer<1> slice(vex::extents[n]);
    indices = slice[vex::range(0, amount)](pos);

    std::cout << "indices: " << indices << std::endl;
}
This gives the following output:
indices: {
0: 2 4 5 9
}

@ddemidov
Thanks for your help, it is working. However, it is much slower than my original code, which copies the device vector to the host and sorts using Boost. Below is sample code with some timings:
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
#include <boost/range/algorithm.hpp>
#include <vexcl/vexcl.hpp>

using namespace std;

int main()
{
    clock_t start, end;

    // initialize vector with random numbers
    std::vector<int> hostVector(1000000);
    for (size_t i = 0; i < hostVector.size(); ++i)
    {
        hostVector[i] = rand() % 20 + 1;
    }

    // copy to device
    vex::Context cpu(vex::Filter::Type(CL_DEVICE_TYPE_CPU) && vex::Filter::Any);
    vex::Context gpu(vex::Filter::Type(CL_DEVICE_TYPE_GPU) && vex::Filter::Any);
    vex::vector<int> vectorCPU(cpu, 1000000);
    vex::vector<int> vectorGPU(gpu, 1000000);
    vex::copy(hostVector, vectorCPU);
    vex::copy(hostVector, vectorGPU);

    // sort results on the host
    start = clock();
    boost::sort(hostVector);
    end = clock();
    cout << "C++: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << endl;

    // sort results on the OpenCL CPU device
    start = clock();
    vex::sort(vectorCPU, vex::greater<int>());
    end = clock();
    cout << "vexcl CPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << endl;

    // sort results on the OpenCL GPU device
    start = clock();
    vex::sort(vectorGPU, vex::greater<int>());
    end = clock();
    cout << "vexcl GPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << endl;

    return 0;
}
which results in:
C++: 17 ms
vexcl CPU: 737 ms
vexcl GPU: 1670 ms
using an i7 3770 CPU and a (slow) HD4650 graphics card. As I've read, OpenCL should be able to perform fast sorts on large vectors. Do you have any advice on how to perform a fast sort using OpenCL and VexCL?

Related

Divide elements of a sorted array into least number of groups such that difference between the elements of the new array is less than or equal to 1

How can I divide the elements of an array into a minimum number of arrays such that the values within each resulting array differ by no more than 1?
Let's say that we have an array: [4, 6, 8, 9, 10, 11, 14, 16, 17].
The array elements are sorted.
I want to divide the elements of the array into a minimum number of arrays such that the elements in each resulting array differ by no more than 1.
In this case, the groupings would be: [4], [6], [8, 9, 10, 11], [14], [16, 17]. So there would be a total of 5 groups.
How can I write a program for this? Suggestions for algorithms are welcome as well.
I tried the naive approach:
obtain the difference between consecutive elements of the array, and if the difference is less than or equal to 1, add those elements to a new vector. However, this method is unoptimized and fails to produce results in reasonable time for a large number of inputs.
Actual code implementation:
#include<cstdio>
#include<iostream>
#include<vector>
using namespace std;
int main() {
    int num = 0, buff = 0, min_groups = 1; // min_groups should start from 1 to take into account the grouping of the starting array element(s)

    cout << "Enter the number of elements in the array: " << endl;
    cin >> num;

    vector<int> ungrouped;
    cout << "Please enter the elements of the array: " << endl;
    for (int i = 0; i < num; i++)
    {
        cin >> buff;
        ungrouped.push_back(buff);
    }

    for (size_t i = 1; i < ungrouped.size(); i++)
    {
        if ((ungrouped[i] - ungrouped[i - 1]) > 1)
        {
            min_groups++;
        }
    }

    cout << "The elements of the entered vector can be split into " << min_groups << " groups." << endl;
    return 0;
}
Inspired by Faruk's answer, if the values are constrained to be distinct integers, there is a possibly sublinear method.
Indeed, if the difference between two values equals the difference between their indexes, they are guaranteed to belong to the same group and there is no need to look at the intermediate values.
You have to organize a recursive traversal of the array, in preorder. Before subdividing a subarray, you compare the difference of the indexes of the first and last elements to the difference of their values, and only subdivide in case of a mismatch. As you work in preorder, this allows you to emit pieces of the groups in consecutive order, as well as to detect the gaps. Some care has to be taken to merge the pieces of the groups.
The worst case will remain linear, because the recursive traversal can degenerate into a linear traversal (but not worse than that). The best case can be better. In particular, if the array holds a single group, it will be found in time O(1). If I am right, for every group of length between 2^n and 2^(n+1), you will spare at least 2^(n-1) tests. (In fact, it should be possible to estimate an output-sensitive complexity, equal to the array length minus a fraction of the lengths of all groups, or similar.)
Alternatively, you can work in a non-recursive way, by means of exponential search: from the beginning of a group, you start with a unit step and double the step every time, until you detect a gap (difference in values too large); then you restart with a unit step. Here again, for large groups you will skip a significant number of elements. Anyway, the best case can only be O(log N).
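Here is a minimal sketch of that exponential-search variant, assuming sorted distinct integers as above. countGroups and the binary-search refinement step are my own naming and fleshing-out of the idea, not code from the answer:
#include <algorithm>
#include <iostream>
#include <vector>

// Counts groups of consecutive integers in a sorted array of *distinct*
// values. Elements s..j belong to one group exactly when
// arr[j] - arr[s] == j - s (no gap in between).
int countGroups(const std::vector<int>& arr)
{
    const int n = static_cast<int>(arr.size());
    int groups = 0;
    int s = 0;                      // start of the current group
    while (s < n)
    {
        ++groups;
        // Exponential phase: double the step until we jump past a gap
        // or past the end of the array.
        int step = 1;
        int lo = s;                 // last index known to be in the group
        while (s + step < n && arr[s + step] - arr[s] == step)
        {
            lo = s + step;
            step *= 2;
        }
        int hi = std::min(s + step, n - 1);
        if (arr[hi] - arr[s] == hi - s)
            break;                  // no gap up to the end of the array
        // Binary phase: narrow hi down to the first index past the gap.
        while (lo + 1 < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (arr[mid] - arr[s] == mid - s)
                lo = mid;
            else
                hi = mid;
        }
        s = hi;                     // hi starts the next group
    }
    return groups;
}

int main()
{
    std::vector<int> arr{4, 6, 8, 9, 10, 11, 14, 16, 17};
    std::cout << countGroups(arr) << " groups\n"; // prints "5 groups"
}
For a single long group this inspects O(log N) elements instead of all of them, matching the best case described above.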
I would suggest encoding subsets into an offset array defined as follows:
Elements for set #i are defined for indices j such that offset[i] <= j < offset[i+1]
The number of subsets is offset.size() - 1
This only requires one memory allocation.
Here is a complete implementation:
#include <cassert>
#include <iostream>
#include <vector>
std::vector<std::size_t> split(const std::vector<int>& to_split, const int max_dist = 1)
{
    const std::size_t to_split_size = to_split.size();
    std::vector<std::size_t> offset(to_split_size + 1);
    offset[0] = 0;
    std::size_t offset_idx = 1;
    for (std::size_t i = 1; i < to_split_size; i++)
    {
        const int dist = to_split[i] - to_split[i - 1];
        assert(dist >= 0); // we assumed sorted input
        if (dist > max_dist)
        {
            offset[offset_idx] = i;
            ++offset_idx;
        }
    }
    offset[offset_idx] = to_split_size;
    offset.resize(offset_idx + 1);
    return offset;
}

void print_partition(const std::vector<int>& to_split, const std::vector<std::size_t>& offset)
{
    const std::size_t offset_size = offset.size();
    std::cout << "\nwe found " << offset_size - 1 << " sets";
    for (std::size_t i = 0; i + 1 < offset_size; i++)
    {
        std::cout << "\n";
        for (std::size_t j = offset[i]; j < offset[i + 1]; j++)
        {
            std::cout << to_split[j] << " ";
        }
    }
}

int main()
{
    std::vector<int> to_split{4, 6, 8, 9, 10, 11, 14, 16, 17};
    std::vector<std::size_t> offset = split(to_split);
    print_partition(to_split, offset);
}
which prints:
we found 5 sets
4
6
8 9 10 11
14
16 17
Iterate through the array. Whenever the difference between 2 consecutive elements is greater than 1, add 1 to your answer variable.
int getPartitionNumber(int arr[], int n) { // n = size of the array
    int result = 1;
    for (int i = 1; i < n; i++) {
        if (arr[i] - arr[i - 1] > 1) result++;
    }
    return result;
}
And because it is always nice to see more ideas and select the one that suits you best, here is a straightforward solution. Yes, it is also O(n). But I am not sure whether the overhead of the other methods makes them any faster.
Please see:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>

using Data = std::vector<int>;
using Partition = std::vector<Data>;

Data testData{ 4, 6, 8, 9, 10, 11, 14, 16, 17 };

int main(void)
{
    // This is the resulting vector of vectors with the partitions
    Partition partition{};

    // Iterating over source values
    for (Data::iterator i = testData.begin(); i != testData.end(); ++i) {
        // Check if we need to add a new partition:
        // either at the beginning, or if diff > 1.
        // No underflow, because of boolean short-circuit evaluation
        if ((i == testData.begin()) || ((*i) - (*(i - 1)) > 1)) {
            // Create a new partition
            partition.emplace_back(Data());
        }
        // And store the value in the current partition
        partition.back().push_back(*i);
    }

    // Debug output: copy all data to std::cout
    std::for_each(partition.begin(), partition.end(), [](const Data& d) {
        std::copy(d.begin(), d.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << '\n';
    });
    return 0;
}
Maybe this could be a solution . . .
How do you say your approach is not optimized? If yours is correct, then it takes O(n) time.
But you can use binary search here, which can optimize the average case; in the worst case the binary search approach can take more than O(n) time.
Here's a tip:
Since the array is sorted (and the values are distinct integers, as noted above), the elements from st up to some position pos form a single group exactly when arr[pos] - arr[st] == pos - st.
Binary search can find the farthest such pos in a simple way.
int arr[] = {4, 6, 8, 9, 10, 11, 14, 16, 17};
int n = 9; // n = size of the array
int st = 0, ed = n - 1;
int partitions = 0;
while (st <= ed) {
    int low = st, high = n - 1;
    int pos = low;
    while (low <= high) {
        int mid = (low + high) / 2;
        // With distinct sorted integers, arr[st..mid] is one group
        // exactly when the value span equals the index span.
        if ((arr[mid] - arr[st]) == (mid - st)) {
            pos = mid;
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    partitions++;
    st = pos + 1;
}
cout << partitions << endl;
In the average case it is better than O(n). But in the worst case (where the answer equals n) it takes O(n log n) time.

A vector as a patchwork of two other vectors

Subset a vector
Below is the benchmark of two different solutions to subset a vector
#include <vector>
#include <iostream>
#include <iomanip>
#include <sys/time.h>
using namespace std;
int main()
{
    struct timeval timeStart, timeEnd;

    // Build the vector 'whole' to subset
    vector<int> whole;
    for (int i = 0; i < 10000000; i++)
    {
        whole.push_back(i);
    }

    // Solution 1 - Use a for loop
    gettimeofday(&timeStart, NULL);
    vector<int> subset1;
    subset1.reserve(9123000 - 1200);
    for (int i = 1200; i < 9123000; i++)
    {
        subset1.push_back(i);
    }
    gettimeofday(&timeEnd, NULL);
    cout << "Solution 1 took " << ((timeEnd.tv_sec - timeStart.tv_sec) * 1000000 + timeEnd.tv_usec - timeStart.tv_usec) << " us" << endl;

    // Solution 2 - Use iterators and the range constructor
    gettimeofday(&timeStart, NULL);
    vector<int>::iterator first = whole.begin() + 1200;
    vector<int>::iterator last = whole.begin() + 9123000;
    vector<int> subset2(first, last);
    gettimeofday(&timeEnd, NULL);
    cout << "Solution 2 took " << ((timeEnd.tv_sec - timeStart.tv_sec) * 1000000 + timeEnd.tv_usec - timeStart.tv_usec) << " us" << endl;
}
On my old laptop, it outputs
Solution 1 took 243564 us
Solution 2 took 164220 us
Clearly solution 2 is faster.
Make a patchwork of two vectors
I would like to create a vector as a patchwork of two different vectors of the same size. The output starts by taking values from one vector, then switches to the other at each breakpoint, back and forth. I guess I don't fully understand how to copy values into a vector using iterators pointing to elements of another vector. The only implementation I can think of is analogous to solution 1 above. Something like...
#include <vector>
#include <iostream>
#include <cmath>
#include <iomanip>
#include <sys/time.h>
#include <limits.h>
using namespace std;
int main()
{
    // input
    vector<int> breakpoints = {2, 5, 7, INT_MAX};
    vector<int> v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    vector<int> v2 = { 10, 20, 30, 40, 50, 60, 70, 80, 90 };

    // Create output
    vector<int> ExpectedOutput;
    ExpectedOutput.reserve(v1.size());
    int origin = 0;
    int breakpoints_index = 0;
    for (int i = 0; i < (int)v1.size(); i++)
    {
        if (origin)
        {
            ExpectedOutput.push_back(v1[i]);
        }
        else
        {
            ExpectedOutput.push_back(v2[i]);
        }
        if (breakpoints[breakpoints_index] == i)
        {
            origin = !origin;
            breakpoints_index++;
        }
    }

    // print output
    cout << "output: ";
    for (int i = 0; i < (int)ExpectedOutput.size(); i++)
    {
        cout << ExpectedOutput[i] << " ";
    }
    cout << endl;
    return 0;
}
which outputs
output: 10 20 30 4 5 6 70 80 9
It feels like there must be a better solution such as something analogous to Solution 2 from above. Is there a faster solution?
Repeating push_back() means that every time around the loop, a check is being performed to ensure capacity() is large enough (if not, then more space must be reserved). When you copy a whole range, only one capacity() check needs to be done.
You can still be a bit smarter with your interleaving by copying chunks. Here's the very basic idea:
int from = 0;
for (int b : breakpoints)
{
    std::swap(v1, v2);
    int to = 1 + std::min(b, static_cast<int>(v1.size()) - 1);
    ExpectedOutput.insert(ExpectedOutput.end(), v1.begin() + from, v1.begin() + to);
    from = to;
}
For the sake of brevity, this code actually swaps v1 and v2 and so always operates on v1. I did the swap before the insert, to emulate the logic in your code (which is acting on v2 first). You can do this in a non-modifying way instead if you want.
Of course, you can see a bit more is going on in this code. It would only make sense if you have considerably fewer breakpoints than values. Note that it also assumes v1 and v2 are the same length.
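For reference, here is a minimal self-contained version of that chunked approach, using the same data as the question (out stands in for ExpectedOutput):
#include <algorithm>
#include <iostream>
#include <limits.h>
#include <utility>
#include <vector>

int main()
{
    std::vector<int> breakpoints = {2, 5, 7, INT_MAX}; // INT_MAX covers the tail
    std::vector<int> v1 = { 1,  2,  3,  4,  5,  6,  7,  8,  9};
    std::vector<int> v2 = {10, 20, 30, 40, 50, 60, 70, 80, 90};

    std::vector<int> out;
    out.reserve(v1.size());

    int from = 0;
    for (int b : breakpoints)
    {
        std::swap(v1, v2); // always read from v1; v2 is consumed first
        int to = 1 + std::min(b, static_cast<int>(v1.size()) - 1);
        out.insert(out.end(), v1.begin() + from, v1.begin() + to);
        from = to;
    }

    for (int x : out) std::cout << x << ' ';
    std::cout << '\n';     // prints: 10 20 30 4 5 6 70 80 9
}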

c++ sieve of Eratosthenes my code is slow

I'm trying to find the number of prime numbers below 400 million, but even with just 40 million my code takes 8 seconds to run. What am I doing wrong?
What can I do to make it faster?
#include<iostream>
#include<math.h>
#include<vector>
using namespace std;
int main()
{
    vector<bool> k;
    vector<long long int> c;
    for (int i = 2; i < 40000000; i++)
    {
        k.push_back(true);
        c.push_back(i);
    }
    for (int i = 0; i < sqrt(40000000) + 1; i++)
    {
        if (k[i] == true)
        {
            for (int j = i + c[i]; j < 40000000; j = j + c[i])
            {
                k[j] = false;
            }
        }
    }
    vector<long long int> arr;
    for (int i = 0; i < 40000000 - 2; i++)
    {
        if (k[i] == true)
        {
            arr.push_back(c[i]);
        }
    }
    cout << arr.size() << endl;
    return 0;
}
I profiled your code as well as a simple tweak, below. The tweak is more than twice as fast:
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    {
        auto start = std::chrono::high_resolution_clock::now();
        // original version
        vector<bool> k;
        vector<long long int> c;
        for (int i = 2; i < 40000000; i++)
        {
            k.push_back(true);
            c.push_back(i);
        }
        for (int i = 0; i < sqrt(40000000) + 1; i++)
        {
            if (k[i] == true)
            {
                for (int j = i + c[i]; j < 40000000; j = j + c[i])
                {
                    k[j] = false;
                }
            }
        }
        vector<long long int> arr;
        for (int i = 0; i < 40000000 - 2; i++)
        {
            if (k[i] == true)
            {
                arr.push_back(c[i]);
            }
        }
        cout << arr.size() << endl;
        auto end1 = std::chrono::high_resolution_clock::now();
        std::cout << "Elapsed = "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start).count()
                  << std::endl;
    }
    {
        auto begin = std::chrono::high_resolution_clock::now();
        // new version
        const long limit{40000000};
        // size 'limit' so that k[j] stays in range for every j < limit
        vector<bool> k(limit, true);
        // k[0] is the number 0
        k[0] = false; k[1] = false;
        auto sq = sqrt(limit) + 1;
        // start at the number 2
        for (int i = 2; i < sq; i++)
        {
            if (k[i] == true)
            {
                for (int j = i + i; j < limit; j += i)
                {
                    k[j] = false;
                }
            }
        }
        vector<long long int> arr;
        for (int i = 0; i < limit - 2; i++)
        {
            if (k[i] == true)
            {
                arr.push_back(i);
            }
        }
        cout << arr.size() << endl;
        auto stop = std::chrono::high_resolution_clock::now();
        std::cout << "Elapsed = "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - begin).count()
                  << std::endl;
    }
}
Here is the output (elapsed in milliseconds), in Debug mode:
2433654
Elapsed = 5787
2433654
Elapsed = 2432
Both produce the same result; the second is much faster.
Here is another version using some nice C++ features (requiring less code), and it is about 11% faster than the second version above:
// This block additionally needs <numeric> for std::iota and <algorithm> for std::remove.
{
    auto begin = std::chrono::high_resolution_clock::now();
    const long limit{40000000};
    // size 'limit' so that k[j] stays in range for every j < limit
    vector<int> k(limit, 0);
    // fill with a sequence of integers
    std::iota(k.begin(), k.end(), 0);
    // k[0] is the number 0
    // integers reset to 0 are not prime
    k[0] = 0; k[1] = 0;
    auto sq = sqrt(limit) + 1;
    // start at the number 2
    for (int i = 2; i < sq; i++)
    {
        if (k[i])
        {
            for (int j = i + i; j < limit; j += i)
            {
                k[j] = 0;
            }
        }
    }
    auto results = std::remove(k.begin(), k.end(), 0);
    cout << results - k.begin() << endl;
    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed = "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - begin).count()
              << std::endl;
}
Note that in your original version, you push_back in three different places, while this use of modern idioms never uses push_back at all when operating on the vectors.
In this example, the vector is of ints so that you have the actual list of prime numbers when you are finished.
Output:
2433654
Elapsed = 2160
The numbers above are all Debug mode numbers.
In Release mode, the best is a combination of the second and third techniques above (the remove-based counting applied to a vector of bools), if you don't care what the actual prime numbers are in the end:
2433654
Elapsed = 1098
2433654
Elapsed bool remove= 410
2433654
Elapsed = 779
Note that your original code only takes about 1 second on my 5 year-old laptop in Release mode, so you are probably running in Debug mode.
I got it down from 10 seconds to just half a second on my computer by changing two things. First, I'm guessing you didn't compile with optimization enabled; that alone brought it from 10 seconds down to 1 second for me. Second, the vector c is unnecessary: everywhere you have c[i] in your code you can replace it with i+2. This makes it run twice as fast.
Remove vector c, you don't need it.
Create vector k with known size at start. Repeatedly appending elements to a vector by invoking push_back() is a really bad idea from a performance point of view, as it can cause repeated memory reallocations and copies.
http://primesieve.org/segmented_sieve.html - segmented version for inspiration.
You can skip processing multiples of 2 and 3 (link from Code Review).
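For the curious, here is a minimal sketch of the simpler half of that idea, sieving odd numbers only; extending it to also skip multiples of 3 (a 2,3-wheel) follows the same pattern with alternating strides:
#include <iostream>
#include <vector>

int main()
{
    const long limit = 40000000;          // count primes below this bound
    // is_prime[i] represents the odd number 2*i + 1, halving space and work
    std::vector<bool> is_prime(limit / 2, true);
    is_prime[0] = false;                  // 1 is not prime

    for (long i = 1; (2 * i + 1) * (2 * i + 1) < limit; ++i)
    {
        if (!is_prime[i]) continue;
        const long p = 2 * i + 1;
        for (long j = p * p; j < limit; j += 2 * p) // odd multiples only
            is_prime[j / 2] = false;
    }

    long count = 1;                       // start at 1 to account for the prime 2
    for (std::size_t i = 0; i < is_prime.size(); ++i)
        count += is_prime[i];
    std::cout << count << std::endl;      // 2433654, matching the runs above
}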
It looks like you've got an issue with your compiler optimization flag settings. Maybe you didn't change the configuration from Debug to Release. What is your Release speedup vs. Debug?

Faster way then mine to populate a vector with unique integers except two values ? C++

I can't post my whole program here, just snippets. I will answer any questions.
What I have:
1) I have a vector with 20 IDs, like this: [0, 1, 2, 3, 4, 5, 6, ..., 19].
2) I pick two IDs, for example number 3 and number 6.
What I need:
1) Generate a vector of size N-1, where N = 5. This vector should not contain number 3 or number 6, only the remaining IDs, without repeats.
For example: new vector = [7, 2, 19, 4]. Yes, only 4 items, because the 5th is number 3 or number 6; they will play with these newly created groups, so 1 + 4 = 5 (N).
My problem:
1) I need to do this about 1 million times. It is very slow. I believe this part of the code is the heaviest, because when I deleted it the program ran really fast.
My question:
1) Below is my code (the do-while loop). Can I somehow optimize it? Maybe I need to use another structure or a smarter method to generate this?
Code:
for (int i = 0; i < _iterations; i++)
{
    players.clear();
    int y = 0;
    do {
        // _pop_size = 20
        int rand_i = static_cast<int>(rand_double(0, _pop_size));
        if (rand_i != 3 && rand_i != 6) {
            // verify that the ID does not already exist in the vector
            if (std::find(players.begin(), players.end(), rand_i) == players.end()) {
                players.push_back(rand_i);
                ++y;
            }
        }
    } while (y < _group_size - 1);
    // ...
    // ...
    // ...
    // ...
rand_double() function:
double rand_double(int min, int max) const
{
    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_real_distribution<double> dist(min, max);
    return dist(mt);
}
This answer is part gathering up the comments and part to prove a point.
The objective is to get as much as possible out of the processing loop. The first to die is the repeated re-initialization of the random number generator. A random number generator should be set up once and then used repeatedly, so re-init is a bad idea. Good riddance.
The next is to find a faster way to reject already known elements. The current approach uses a linear search through an unsorted vector. Insertion is quick because push_back only really slows down when resizing, but the more items in the vector, the longer the worst-case search time. A std::set is an ordered container with very fast look-up and somewhat slower insertion. If the lists are short, stick with vector. If the lists are long (_group_size > 100), go with the set.
Here is an example with long lists:
#include <iostream>
#include <set>
#include <vector>
#include <random>
#include <functional>
#include <chrono>
#include <algorithm>

using namespace std::chrono; // I know, but the lines were ridiculously long

// remove random number generator init from processing loop.
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist(0, 1000000);
// replace function with a bind.
auto rand_int = std::bind(dist, mt);

// test function
int main()
{
    int y = 0; // note: this was uninitialized in the first version of this test
    int _group_size = 10000; // loop 10000 times
    std::set<int> tempplayers;
    std::vector<int> players;

    auto start = high_resolution_clock::now(); // get start time
    // with vector
    do
    {
        // _pop_size = 20
        int rand_i = rand_int();
        if (rand_i != 3 && rand_i != 6)
        { // using vector: linear search
            if (std::find(players.begin(), players.end(), rand_i) == players.end())
            {
                players.push_back(rand_i);
                ++y;
            } // verify if the ID already exists in vector
        }
    } while (y < _group_size - 1);
    auto t1 = duration_cast<nanoseconds>(high_resolution_clock::now() - start).count();
    // Calculate elapsed time
    std::cout << "Time (ns) with vector: " << t1 << std::endl;

    // reset
    players.clear();
    y = 0;

    // run again with a set instead of a vector
    start = high_resolution_clock::now();
    do
    {
        // _pop_size = 20
        int rand_i = rand_int();
        if (rand_i != 3 && rand_i != 6)
        { // using set. Not sure exactly what search it is. Probably a tree.
            if (tempplayers.find(rand_i) == tempplayers.end())
            {
                tempplayers.insert(rand_i);
                //players.push_back(rand_i);
                ++y;
            }
        }
    } while (y < _group_size - 1);
    // copy set into vector for comfortable use.
    std::copy(tempplayers.begin(),
              tempplayers.end(),
              std::back_inserter(players));

    auto t2 = duration_cast<nanoseconds>(high_resolution_clock::now() - start).count();
    std::cout << "Time (ns) with set: " << t2 << std::endl;
    if (t2 > 0)
    {
        std::cout << "Set is " << t1 / t2 << " times faster" << std::endl;
    }
}
A typical output is:
Time (ns) with vector: 373014100
Time (ns) with set: 9000800
Set is 41 times faster
NB: I'm running on Windows and my default tick resolution is horrible.
A better way is to use a simple array instead of a vector.
Since I know the size of the group, I just create an array of size x and add the values to it. To check whether a value is already in the array I use a simple for loop.
A vector takes time to allocate memory as it grows, while an array has its memory already allocated when I write:
int array[4];
One test took me 96 seconds; after I changed to an array, the same test took only 26 seconds.
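For illustration, a minimal sketch of that array idea with this question's numbers (_pop_size = 20, a group of 4 picks, excluding IDs 3 and 6). rand() is used only to keep the sketch short; per the advice above, a single reusable <random> engine would be the better choice:
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    std::srand(static_cast<unsigned>(std::time(nullptr)));

    int picked[4];                        // _group_size - 1 slots, fixed size
    int count = 0;
    while (count < 4)
    {
        int candidate = std::rand() % 20; // _pop_size = 20
        if (candidate == 3 || candidate == 6) continue;
        bool seen = false;
        for (int i = 0; i < count; ++i)   // simple linear duplicate check
            if (picked[i] == candidate) { seen = true; break; }
        if (!seen) picked[count++] = candidate;
    }

    for (int i = 0; i < 4; ++i) std::cout << picked[i] << ' ';
    std::cout << '\n';
}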

Weighted random numbers

I'm trying to implement weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.
In my project (Hold'em hand-ranges, subjective all-in equity analysis), I'm using Boost's random functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's Mersenne twister generator works like a charm for this. However, I want the pick to be weighted, for example like this:
1 (weight: 90)
2 (weight: 56)
3 (weight: 4)
Does Boost have some sort of functionality for this?
There is a straightforward algorithm for picking an item at random, where items have individual weights:
1) calculate the sum of all the weights
2) pick a random number that is 0 or greater and is less than the sum of the weights
3) go through the items one at a time, subtracting their weight from your random number, until you get the item where the random number is less than that item's weight
Pseudo-code illustrating this:
int sum_of_weight = 0;
for (int i = 0; i < num_choices; i++) {
    sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for (int i = 0; i < num_choices; i++) {
    if (rnd < choice_weight[i])
        return i;
    rnd -= choice_weight[i];
}
assert(!"should never get here");
This should be straightforward to adapt to your boost containers and such.
If your weights are rarely changed but you often pick one at random, and as long as your container is storing pointers to the objects or is more than a few dozen items long (basically, you have to profile to know if this helps or hinders), then there is an optimisation:
By storing the cumulative weight sum in each item you can use a binary search to pick the item corresponding to the pick weight.
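A minimal sketch of that optimisation, using std::partial_sum for the cumulative weights and std::upper_bound for the binary search, with the weights from the question (pickWeighted is a made-up name):
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Each pick is O(log n) once the running totals are built.
int pickWeighted(const std::vector<int>& cumulative, std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(0, cumulative.back() - 1);
    int r = dist(gen);
    // The first cumulative value strictly greater than r marks the item.
    return static_cast<int>(std::upper_bound(cumulative.begin(),
                                             cumulative.end(), r)
                            - cumulative.begin());
}

int main()
{
    std::vector<int> weights{90, 56, 4};
    std::vector<int> cumulative(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cumulative.begin());

    std::mt19937 gen(std::random_device{}());
    int counts[3] = {0, 0, 0};
    for (int i = 0; i < 150000; ++i)
        ++counts[pickWeighted(cumulative, gen)];
    // Expect roughly 90000 / 56000 / 4000.
    std::cout << counts[0] << ' ' << counts[1] << ' ' << counts[2] << '\n';
}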
If you do not know the number of items in the list, then there's a very neat algorithm called reservoir sampling that can be adapted to be weighted.
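For that unknown-length case, here is a minimal sketch of one weighted adaptation (the Efraimidis-Spirakis keying trick, my choice of variant, not code from this answer): give each item the key u^(1/w) with u uniform in (0,1), and keep the item with the largest key. Requires C++17 for the structured bindings:
#include <cmath>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

int main()
{
    // (value, weight) pairs; this could be any one-pass stream of items.
    std::vector<std::pair<int, double>> items{{1, 90.0}, {2, 56.0}, {3, 4.0}};
    std::mt19937 gen(std::random_device{}());
    std::uniform_real_distribution<double> u(0.0, 1.0);

    int best_item = -1;
    double best_key = -1.0;
    for (const auto& [value, weight] : items)
    {
        // Larger weights tend to produce larger keys u^(1/w).
        double key = std::pow(u(gen), 1.0 / weight);
        if (key > best_key)
        {
            best_key = key;
            best_item = value;
        }
    }
    std::cout << "picked " << best_item << '\n';
}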
Updated answer to an old question. You can easily do this in C++11 with just the std::lib:
#include <iostream>
#include <random>
#include <iterator>
#include <ctime>
#include <type_traits>
#include <cassert>
int main()
{
    // Set up distribution
    double interval[] = {1, 2, 3, 4};
    double weights[] = {.90, .56, .04};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));

    // Choose generator
    std::mt19937 gen(std::time(0)); // seed as wanted

    // Demonstrate with N randomly generated numbers
    const unsigned N = 1000000;
    // Collect number of times each random number is generated
    double avg[std::extent<decltype(weights)>::value] = {0};
    for (unsigned i = 0; i < N; ++i)
    {
        // Generate random number using gen, distributed according to dist
        unsigned r = static_cast<unsigned>(dist(gen));
        // Sanity check
        assert(interval[0] <= r && r <= *(std::end(interval) - 2));
        // Save r for statistical test of distribution
        avg[r - 1]++;
    }

    // Compute averages for distribution
    for (double* i = std::begin(avg); i < std::end(avg); ++i)
        *i /= N;

    // Display distribution
    for (unsigned i = 1; i <= std::extent<decltype(avg)>::value; ++i)
        std::cout << "avg[" << i << "] = " << avg[i - 1] << '\n';
}
Output on my system:
avg[1] = 0.600115
avg[2] = 0.373341
avg[3] = 0.026544
Note that most of the code above is devoted to just displaying and analyzing the output; the actual generation is just a few lines of code. The output demonstrates that the requested "probabilities" have been obtained. You have to divide each requested weight by 1.5, since that is what the requests add up to: 0.90 / 1.5 = 0.600, 0.56 / 1.5 ≈ 0.373 and 0.04 / 1.5 ≈ 0.027, matching the averages above.
If your weights change more slowly than they are drawn, C++11 discrete_distribution is going to be the easiest:
#include <ctime>
#include <iterator>
#include <random>
#include <vector>

std::vector<double> weights{90, 56, 4};
std::discrete_distribution<int> dist(std::begin(weights), std::end(weights));
std::mt19937 gen;
gen.seed(time(0)); // if you want different results from different runs

int N = 100000;
std::vector<int> samples(N);
for (auto& i : samples)
    i = dist(gen);
// do something with your samples...
Note, however, that the c++11 discrete_distribution computes all of the cumulative sums on initialization. Usually, you want that because it speeds up the sampling time for a one time O(N) cost. But for a rapidly changing distribution it will incur a heavy calculation (and memory) cost. For instance if the weights represented how many items there are and every time you draw one, you remove it, you will probably want a custom algorithm.
Will's answer https://stackoverflow.com/a/1761646/837451 avoids this overhead but will be slower to draw from than the C++11 version because it can't use binary search.
To see that it does this, you can see the relevant lines (/usr/include/c++/5/bits/random.tcc on my Ubuntu 16.04 + GCC 5.3 install):
template<typename _IntType>
  void
  discrete_distribution<_IntType>::param_type::
  _M_initialize()
  {
    if (_M_prob.size() < 2)
      {
        _M_prob.clear();
        return;
      }

    const double __sum = std::accumulate(_M_prob.begin(),
                                         _M_prob.end(), 0.0);
    // Now normalize the probabilites.
    __detail::__normalize(_M_prob.begin(), _M_prob.end(), _M_prob.begin(),
                          __sum);
    // Accumulate partial sums.
    _M_cp.reserve(_M_prob.size());
    std::partial_sum(_M_prob.begin(), _M_prob.end(),
                     std::back_inserter(_M_cp));
    // Make sure the last cumulative probability is one.
    _M_cp[_M_cp.size() - 1] = 1.0;
  }
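Sticking with the rapidly changing case mentioned above (the weights are item counts and each draw removes one item), here is a minimal sketch of such a custom algorithm; drawAndRemove is a made-up name, and the linear scan is simply the cheapest choice that avoids rebuilding any cumulative table per draw:
#include <iostream>
#include <random>
#include <vector>

int drawAndRemove(std::vector<int>& counts, int& total, std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(0, total - 1);
    int r = dist(gen);
    for (std::size_t i = 0; i < counts.size(); ++i)
    {
        if (r < counts[i])
        {
            --counts[i];        // remove the drawn item
            --total;
            return static_cast<int>(i);
        }
        r -= counts[i];
    }
    return -1;                  // unreachable while total > 0
}

int main()
{
    std::vector<int> counts{3, 1, 2}; // 3 copies of item 0, 1 of item 1, 2 of item 2
    int total = 3 + 1 + 2;
    std::mt19937 gen(std::random_device{}());
    while (total > 0)
        std::cout << drawAndRemove(counts, total, gen) << ' ';
    std::cout << '\n';
}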
What I do when I need weighted numbers is to use a random number for the weight.
For example: I need to generate random numbers from 1 to 3 with the following weights:
10% of the time the random number should be 1
30% of the time the random number should be 2
60% of the time the random number should be 3
Then I use:
weight = rand() % 10;

switch (weight) {
case 0:
    randomNumber = 1;
    break;
case 1:
case 2:
case 3:
    randomNumber = 2;
    break;
case 4:
case 5:
case 6:
case 7:
case 8:
case 9:
    randomNumber = 3;
    break;
}
With this, the result randomly has a 10% probability of being 1, 30% of being 2 and 60% of being 3.
You can adapt it to your needs.
Hope I could help you. Good luck!
Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of each items is proportional to your weighting.
Example:
1 60%
2 35%
3 5%
So have a bag with 100 items with 60 1's, 35 2's and 5 3's.
Now randomly sort the bag (std::random_shuffle)
Pick elements from the bag sequentially until it is empty.
Once empty re-randomize the bag and start again.
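A minimal sketch of that bag idea with the percentages above; std::shuffle stands in for std::random_shuffle, which has since been removed from the standard (C++17 for the structured bindings):
#include <algorithm>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

int main()
{
    // (value, count) pairs: multiplicities proportional to the weights.
    std::vector<std::pair<int, int>> weights{{1, 60}, {2, 35}, {3, 5}};
    std::vector<int> bag;
    for (const auto& [value, count] : weights)
        bag.insert(bag.end(), count, value); // 60 ones, 35 twos, 5 threes

    std::mt19937 gen(std::random_device{}());
    std::shuffle(bag.begin(), bag.end(), gen);

    std::size_t next = 0;
    for (int i = 0; i < 10; ++i)             // draw a few samples
    {
        if (next == bag.size())              // bag empty: reshuffle, restart
        {
            std::shuffle(bag.begin(), bag.end(), gen);
            next = 0;
        }
        std::cout << bag[next++] << ' ';
    }
    std::cout << '\n';
}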
Choose a random number on [0,1), which should be the default operator() for a boost RNG. Choose the item with cumulative probability density function >= that number:
template <class It, class P>
It choose_p(It begin, It end, P const& p)
{
    if (begin == end) return end;
    double sum = 0.;
    for (It i = begin; i != end; ++i)
        sum += p(*i);
    double choice = sum * random01();
    for (It i = begin;;) {
        choice -= p(*i);
        It r = i;
        ++i;
        if (choice < 0 || i == end) return r;
    }
    return begin; // unreachable
}
Where random01() returns a double >=0 and <1. Note that the above doesn't require the probabilities to sum to 1; it normalizes them for you.
p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.
This is my understanding of a "weighted random", I've been using this recently. (Code is in Python but can be implemented in other langs)
Let's say you want to pick a random person and they don't have equal chances of being selected
You can give each person a "weight" or "chance" value:
choices = [("Ade", 60), ("Tope", 50), ("Maryamu", 30)]
You use their weights to calculate a score for each then find the choice with the highest score
highest = [None, 0]
for p in choices:
    score = math.floor(random.random() * p[1])
    if score > highest[1]:
        highest[0] = p
        highest[1] = score
print(highest)
For Ade the highest score they can get is 60, Tope 50 and so on, meaning that Ade has a higher chance of generating the largest score than the rest.
You can use any range of weights, the greater the difference the more skewed the distribution.
E.g if Ade had a weight of 1000 they will almost always be chosen.
Test
votes = [{"name": "Ade", "votes": 0}, {"name": "Tope", "votes": 0}, {"name": "Maryamu", "votes": 0}]
for v in range(100):
    highest = [None, 0]
    for p in choices:
        score = math.floor(random.random() * p[1])
        if score > highest[1]:
            highest[0] = p
            highest[1] = score
    candidate = choices.index(highest[0])  # get index of person
    votes[candidate]["votes"] += 1         # increase vote count
print(votes)
# votes printed at the end. your results might be different
# [{"name": "Ade", "votes": 45}, {"name": "Tope", "votes": 30}, {"name": "Maryamu", "votes": 25}]
Issues
It looks like the more the voters, the more predictable the results. Welp
Hope this gives someone an idea...
I have just implemented the solution given by "will":
#include <iostream>
#include <map>
#include <cstdlib>
#include <stdexcept>
using namespace std;

template <class T>
class WeightedRandomSample
{
public:
    void SetWeigthMap(map<T, unsigned int>& WeightMap)
    {
        m_pMap = &WeightMap;
    }

    T GetRandomSample()
    {
        unsigned int sum_of_weight = GetSumOfWeights();
        unsigned int rnd = (rand() % sum_of_weight);
        map<T, unsigned int>& w_map = *m_pMap;
        typename map<T, unsigned int>::iterator it;
        for (it = w_map.begin(); it != w_map.end(); ++it)
        {
            unsigned int w = it->second;
            if (rnd < w)
                return (it->first);
            rnd -= w;
        }
        // should never get here (the original returned a dereferenced
        // null pointer at this point, which is undefined behavior)
        throw runtime_error("GetRandomSample: inconsistent weights");
    }

    unsigned int GetSumOfWeights()
    {
        if (m_pMap == NULL)
            return 0;
        unsigned int sum = 0;
        map<T, unsigned int>& w_map = *m_pMap;
        typename map<T, unsigned int>::iterator it;
        for (it = w_map.begin(); it != w_map.end(); ++it)
        {
            sum += it->second;
        }
        return sum;
    }

protected:
    map<T, unsigned int>* m_pMap = NULL;
};

typedef pair<int, int> PAIR_INT_INT;
typedef map<PAIR_INT_INT, unsigned int> mul_table_weighted_map;

int main()
{
    mul_table_weighted_map m;
    m[PAIR_INT_INT(2, 3)] = 10;
    m[PAIR_INT_INT(4, 5)] = 20;
    m[PAIR_INT_INT(2, 5)] = 10;

    WeightedRandomSample<PAIR_INT_INT> WRS;
    WRS.SetWeigthMap(m);
    unsigned int sum_of_weight = WRS.GetSumOfWeights();
    cout << "Sum of weights : " << sum_of_weight << endl;

    unsigned int number_of_test = 10000;
    cout << "testing " << number_of_test << " ..." << endl;
    map<PAIR_INT_INT, unsigned int> check_map;
    for (unsigned int i = 0; i < number_of_test; i++)
    {
        PAIR_INT_INT res = WRS.GetRandomSample();
        check_map[res]++;
        //cout << i+1 << ": random = " << res.first << " * " << res.second << endl;
    }

    cout << "results: " << endl;
    for (auto t : check_map)
    {
        PAIR_INT_INT p = t.first;
        unsigned int expected = (number_of_test * m[p]) / sum_of_weight;
        cout << " pair " << p.first << " * " << p.second
             << ", counted = " << t.second
             << ", expected = " << expected
             << endl;
    }
    return 0;
}
For example, generating a random index in a vector of weights for that index can be done this way:
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>
using namespace std;

int getWeightedRandomNumber(const vector<int>& weights)
{
    vector<int> vec;
    for (int i = 0; i < (int)weights.size(); i++) {
        for (int j = 0; j < weights[i]; j++) {
            vec.push_back(i);
        }
    }
    // std::random_shuffle was deprecated and later removed; use std::shuffle.
    static mt19937 gen(random_device{}());
    shuffle(vec.begin(), vec.end(), gen);
    return vec.front();
}

int main()
{
    vector<int> v{2, 4, 5, 100, 1, 2, 4, 4};
    for (int i = 0; i < 100; i++) {
        cout << getWeightedRandomNumber(v) << endl;
    }
}
Since we are constructing another vector with (number of elements) roughly equal to (current number of elements) * (mean weight), this approach might not work well when dealing with large data.