I'm using boost sparse matrices holding bools and trying to write a comparison function for storing them in a map. It is a very simple comparison function. Basically, the idea is to look at the matrix as a binary number (after being flattened into a vector) and to sort based on the value of that number. This can be accomplished in this way:
for(unsigned int j = 0; j < maxJ; j++)
{
    for(unsigned int i = 0; i < maxI; i++)
    {
        if(matrix1(i,j) < matrix2(i,j)) return true;
        else if(matrix1(i,j) > matrix2(i,j)) return false;
    }
}
return false;
However, this is inefficient because of the sparseness of the matrices and I'd like to use iterators for the same result. The algorithm using iterators seems straightforward, i.e.
1) grab the first nonzero cell in each matrix, 2) compare j*maxJ+i for both, 3) if equal, grab the next nonzero cells in each matrix and repeat. Unfortunately, in code this is extremely tedious and I'm worried about errors.
What I'm wondering is (a) is there a better way to do this and (b) is there a simple way to get the "next nonzero cell" for both matrices? Obviously, I can't use nested for loops like one would to iterate through one sparse matrix.
Thanks for your help.
--
Since it seems that the algorithm I proposed above may be the best solution in my particular application, I figured I should post the code I developed for the tricky part, getting the next nonzero cells in the two sparse matrices. This code is not ideal and not very clear, but I'm not sure how to improve it. If anyone spots a bug or knows how to improve it, I would appreciate some comments. Otherwise, I hope this is useful to someone else.
typedef boost::numeric::ublas::mapped_matrix<bool>::const_iterator1 iter1;
typedef boost::numeric::ublas::mapped_matrix<bool>::const_iterator2 iter2;
// Grabs the next nonzero cell in a sparse matrix after the cell pointed to by i1, i2.
std::pair<iter1, iter2> next_cell(iter1 i1, iter2 i2, iter1 end) const
{
    if(i2 == i1.end())
    {
        if (i1 == end)
            return std::pair<iter1, iter2>(i1, i2);
        ++i1;
        i2 = i1.begin();
    }
    else
    {
        ++i2;
    }
    for(; i1 != end;)
    {
        for(; i2 != i1.end(); ++i2)
        {
            return std::pair<iter1, iter2>(i1, i2);
        }
        ++i1;
        if(i1 != end) i2 = i1.begin();
    }
    return std::pair<iter1, iter2>(i1, i2);
}
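For completeness, here is a sketch (untested) of how a comparator might walk two matrices in lock-step using next_cell. The helper first_cell() is an assumed convenience that returns the first stored cell of a matrix, and the row-major key i2.index1()*maxJ + i2.index2() assumes earlier cells are the more significant "bits"; adjust both to your actual layout.
typedef boost::numeric::ublas::mapped_matrix<bool> matrix_t;

bool less(const matrix_t& m1, const matrix_t& m2, std::size_t maxJ) const
{
    std::pair<iter1, iter2> a = first_cell(m1);   // assumed helper, analogous to next_cell
    std::pair<iter1, iter2> b = first_cell(m2);
    while (a.first != m1.end1() && b.first != m2.end1())
    {
        std::size_t ka = a.second.index1() * maxJ + a.second.index2();
        std::size_t kb = b.second.index1() * maxJ + b.second.index2();
        if (ka != kb)
            return ka > kb;   // the matrix whose next 1 appears later is "smaller"
        a = next_cell(a.first, a.second, m1.end1());
        b = next_cell(b.first, b.second, m2.end1());
    }
    // m1 is smaller only if it ran out of nonzero cells while m2 still has one left.
    return a.first == m1.end1() && b.first != m2.end1();
}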
I like this question, by the way.
Let me pseudocode out what I think you're asking
declare list of sparse matrices ListA
declare map MatMap with a sparse matrix type mapping to a double, along with a `StrictWeakMatrixOrderer` function which takes two sparse matrices.
Insert ListA into MatMap.
The Question: How do I write a StrictWeakMatrixOrderer efficiently?
This is an approach. I'm inventing this on the fly....
Define a function flatten() and precompute the flattened matrices, storing the flattened vectors in a vector (or another container with constant-time random indexing). flatten() could be as simple as concatenating each row (or column) with the previous one, which can be done in linear time if you have a constant-time function to grab a row/column.
This yields a set of vectors with size on the order of 10^6. This is a tradeoff: saving this information instead of computing it on the fly. This is useful if you're going to be doing a lot of compares as you go along.
Remember, zeros contain information: dropping them could make two vectors equal to each other even though their generating matrices are not.
Then, we have transformed the algorithm question from "order matrices" into "order vectors".
I've never heard of a distance metric for matrices, but I've heard of distance metrics for vectors.
You could use a "sum of differences" ordering, aka the Hamming distance (for each element that differs, add 1). That is an O(n) algorithm:
for i = 0 to max:
    if (a[i] != b[i])
        distance++
return distance
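In C++ on the flattened vectors, that pseudocode is only a few lines (a sketch, assuming two equal-length std::vector<bool>):
#include <cstddef>
#include <vector>

// Hamming distance between two equal-length flattened matrices.
std::size_t hamming_distance(const std::vector<bool>& a, const std::vector<bool>& b)
{
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i])
            ++distance;
    return distance;
}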
The Hamming distance satisfies these conditions
d(a,b) = d(b,a)
d(a,a) = 0
d(x, z) <= d(x, y) + d(y, z)
Now to do some off-the-cuff analysis....
10^6 elements in a matrix (or its corresponding vector).
O(n) distance metric.
But that's O(n) compares. If each array access itself took O(n) time, you would end up with an O(n*n) = O(n^2) metric, so you need better-than-O(n) access. It turns out that the std::vector [] operator provides "amortized constant time access to arbitrary elements", according to SGI's STL site.
Provided you have sufficient memory to store k*2*10^6 elements, where k is the number of matrices you are managing, this is a working solution that uses lots of memory in exchange for being linear.
(a) I don't fully understand what you're trying to accomplish, but if you want to compare if both matrices have the same value at the same index it's sufficient to use elementwise matrix multiplication (which should be implemented for sparse as well):
matrix3 = element_prod (matrix1, matrix2);
That way you'll get for each index:
0 (false) * 1 (true) = 0 (false)
0*0 = 0
1*1 = 1
So the resulting matrix3 will have your solution in one line :)
It seems to me we're talking about implementing bitwise, elementwise operators on boost::sparse_matrix, since deciding whether one vector (or matrix) is smaller than another without using any standard vector norm demands special operators (or special mappings/norms).
To my knowledge boost does not provide special operators for binary matrices (not to speak of sparse binary matrices). There are unlikely to be any straightforward solutions to this using BLAS-level matrix/vector algebra. Binary matrices have a place of their own in linear algebra, so there are tricks and theorems, but I doubt those are easier than your solution.
Your question could be reformulated as: how do I efficiently sort astronomically large numbers represented by 2D bitmaps (n=100, so 100x100 elements would give you a number like 2^10000)?
Good question!
Given the set N = {1,...,n}, consider P different pre-existing subsets of N. A subset S_p is characterized by the 0-1 n-vector x_p, whose ith element is 0 or 1 depending on whether the ith of the n items is part of the subset or not. Let us call such x_p's indicator vectors.
For example, if N = {1,2,3,4,5}, the subset {1,2,5} is represented by the vector (1,1,0,0,1).
Now, suppose we are given the P pre-existing subsets and their associated vectors x_p, and a candidate subset denoted by the vector y is computed.
What is the most efficient way of checking whether y is already part of the set of P pre-existing subsets or whether y is indeed a new subset not part of the P subsets?
The following are the methods I can think of:
(Method 1) Basically, we have to do an element-by-element check against all pre-existing sets. Pseudocode follows:
for(int p = 0; p < P; p++){
    // check if x_p == y by an element-by-element comparison
    bool equal = true;
    for(int i = 0; i < n; i++){
        if(x_p[i] != y[i]){
            equal = false;
            break;
        }
    }
    if(equal)
        return that y is pre-existing
}
return that y is new
(Method 2) Another thought that comes to mind is to store the decimal equivalent of the indicator vectors x_p (where the indicator vectors are taken to be binary representations) and compare it with the decimal equivalent of y. That is, if the set of P pre-existing subsets is { (0,1,0,0,1), (1,0,1,1,0) }, the stored decimals for this set would be {9, 22}. If y is (0,1,1,0,0), we compute 12 and check it against the set {9, 22}. The benefit of this method is that for each new y, we don't have to check against the n elements of every pre-existing set; we can just compare the decimal numbers.
Question 1. It appears to me that (Method 2) should be more efficient than (Method 1). For (Method 2), is there an efficient way (a built-in library function in C/C++) that converts the x_p's and y from binary to decimal? What should the data type of these indicator variables be? For example, bool y[5]; or char y[5];?
Question 2. Is there any method more efficient than (Method 2)?
As you've noticed, there's a trivial isomorphism between your indicator vectors and N-bit integers. That means the answer to your question 2 is "no": the tools available for maintaining a set and testing membership in it are the same as for integers (hash tables being the normal approach). A commenter mentioned Bloom filters, which can efficiently test membership at the risk of some false positives, but Bloom filters are generally for much larger data sizes than you're looking at.
As for your question 1: Method 2 is reasonable, and it's even easier than you think. While vector<bool> doesn't give you an easy way to turn it into integer blocks, on implementations I'm aware of it's already implemented this way (the C++ standard allows special treatment of that particular vector type, something that is generally considered nowadays to have been a poor decision, but which occasionally yields some benefit). And those vectors are hashable. So just keep an unordered_set<vector<bool>> around, and you'll get performance which is reasonably close to the optimum. (If you know N at compile time you may want to prefer bitset to vector<bool>.)
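As a minimal sketch of what that looks like (the container name and the insert-as-you-check policy are my own choices; std::hash is specialized for std::vector<bool>, so no custom hasher is needed):
#include <unordered_set>
#include <vector>

std::unordered_set<std::vector<bool>> seen;

// Returns true if y is a new subset, false if it was already present;
// inserts y either way.
bool is_new_subset(const std::vector<bool>& y)
{
    return seen.insert(y).second;   // insert() reports whether insertion took place
}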
Method 2 can be optimized by calculating the decimal equivalent of the given subset and hashing it modulo 1e9+7. Since N <= 1000, this gives distinct values in practice (collisions are unlikely, though not impossible in principle).
#define M 1000000007 // big prime number

unordered_set<long long> subset; // decimal representations of all the
                                 // previously found subsets

/* fast computation of powers of 2 (mod M) */
long long Pow(long long num, long long pow){
    long long result = 1;
    while(pow)
    {
        if(pow & 1)
        {
            result *= num;
            result %= M;
        }
        num *= num;
        num %= M;
        pow >>= 1;
    }
    return result;
}

/* returns true if the subset has NOT been seen before, i.e. it is new */
bool check(vector<bool> booleanVector){
    long long result = 0;
    for(int i = 0; i < booleanVector.size(); i++)
        if(booleanVector[i])
            result += Pow(2, i);
    return (subset.find(result) == subset.end());
}
I want to optimize the following code:
During a monte carlo simulation I accumulate some quantities f(x) (f(x) is expensive to compute) and save them in the array bins after every sampling step.
EDIT: f(x) is not a deterministic function of x (by that I mean it generates pseudo-random numbers and uses them to modify the result) and it also depends on previously calculated values f(y).
for(int n=0;n<N;n++)
{
    // compute some values f(x) at points "p"
    for(auto k: p) bins[k] += f(k);
}
p.size() is much smaller than the size of bins, but eventually most elements will be set.
After the simulation I accumulate my final values by doing a weighted sum over bins (g is a lookup in another array):
for(int l=0;l<M;l++)
    for(int k=0;k<bins.size();k++)
        finalResult[l] += g(k,l)*bins[k];
I could of course compute my updated finalResult after every sampling step, this does however slow the program down a lot, due to the loop over M.
I already tried a very basic boost::accumulate, but this did not improve performance (if I stay with this design I will have to use it eventually due to stability, though).
All arrays are of type Eigen::MatrixXd since I need them for BLAS operations.
p.size() < 10^2
N ~ 10^7
M ~ 10^4
bins.size() ~ 10^5
Do you have any suggestions on which techniques could be useful for optimization here?
Try computing f(x) just once for each distinct point (i.e. memoization). So for instance, if N is large (like it is in this situation), try changing your loop to something like the following:
static std::unordered_map<unsigned int, double> memoizedFunction;
for(int n=0;n<N;n++)
{
    // compute some values f(x) at points "p"
    for(auto k: p)
    {
        auto it = memoizedFunction.find( k );
        if (it == memoizedFunction.end())
        {
            it = memoizedFunction.emplace( k, f(k) ).first;
        }
        bins[k] += it->second;
    }
}
Alternatively, you could just store the number of times the kth bin has been hit in bins[k] and then at the end go through and compute bins[k] * f(k) for each k.
Just a thought here, but if you could verify that f(x) is a linear transformation, then you could create the matrix A such that [f(x)] = A[x], where [x] is the coordinate vector of x with respect to some basis B. That could make f(x) easier and faster to compute, especially if x lives in a vector space with a small basis.
However, if converting between coordinates and the answer is expensive, that could kill any overall benefit (just keep that in mind).
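For illustration only, a minimal sketch with Eigen (which the question already uses), assuming f really were linear and its matrix A in basis B had been precomputed:
#include <Eigen/Dense>

// Every evaluation of f reduces to a matrix-vector product.
Eigen::VectorXd apply_f(const Eigen::MatrixXd& A, const Eigen::VectorXd& x)
{
    return A * x;   // [f(x)] = A [x]
}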
Here are some links that could help explain matrix representation of linear transformations.
https://math.colorado.edu/~nita/MatrixRepresentations.pdf
https://math.dartmouth.edu/archive/m24w07/public_html/Lecture12.pdf
https://en.wikipedia.org/wiki/Transformation_matrix
I have the loop below. The goal is to perform an operation between all elements of an array tmp and store it in a scalar b. The operation is equivalent to an addition, so there is no specific execution order. For example if we have a + b + c + d, we can compute this in any order, which means (a+b) + (c+d) is possible as well. The same is applicable to this operation. However, there are some special conditions which lead to the result by different ways.
tmp.e and b.e are longs, while tmp.x and b.x are doubles.
Is there any way to compare all the tmp.e, for example in pairs of two for SSE, and perform the correct computation of b.x accordingly? In all cases it can be viewed as an addMul (multiply-add): in the first case it just multiplies by 1, in the others by 0 or BOUND. Is it possible to vectorize this? If so, how?
Thanks.
void op(vec& tmp, scalar& b)
{
    for (i = 1; i < n; ++i)
    {
        if (b.e == tmp.e[i])
        {
            b.x += tmp.x[i];
            b.normalize();
            continue;
        }
        else if (b.e > tmp.e[i])
        {
            if (b.e > tmp.e[i]+1)
            {
                continue;
            }
            b.x += tmp.x[i] * BOUND;
            b.normalize();
        }
        else
        {
            if (tmp.e[i] > b.e+1)
            {
                b.x = tmp.x[i];
                b.e = tmp.e[i];
                b.normalize();
                continue;
            }
            b.x = b.x * BOUND + tmp.x[i];
            b.e = tmp.e[i];
            b.normalize();
        }
    }
}
Per-element conditionals in SIMD code are usually handled by using a packed-compare instruction to generate a mask of all-zero and all-one elements. You can use this to AND or OR vectors. So e.g. you can increment only the elements that pass a test by using AND to produce a vector with 1 in elements that should be incremented, and 0 in elements that shouldn't, because 0 is the identity value for addition. (x+0 = x).
You can also compute two results and then blend them together, according to a mask. (using AND and OR, or using vector blend instructions.)
This method of doing SIMD conditionals is like a cmov: you have to compute both sides of the branch, even if all the elements you're processing in a vector take the same side of the branch.
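To make the mask-and-AND idea concrete, here is a minimal SSE2 sketch that sums x[i] only where e[i] equals a target value. It is deliberately not the loop from the question: it stores e as doubles purely to keep the compare simple, and it has no loop-carried dependence.
#include <emmintrin.h>  // SSE2

// Sum x[i] over all i where e[i] == target; n is assumed to be a multiple of 2.
double masked_sum(const double* e, const double* x, int n, double target)
{
    __m128d acc = _mm_setzero_pd();
    __m128d t   = _mm_set1_pd(target);
    for (int i = 0; i < n; i += 2)
    {
        __m128d ev   = _mm_loadu_pd(e + i);
        __m128d xv   = _mm_loadu_pd(x + i);
        __m128d mask = _mm_cmpeq_pd(ev, t);          // all-ones where e[i] == target
        acc = _mm_add_pd(acc, _mm_and_pd(mask, xv)); // adds x[i] or 0.0
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);      // horizontal add of the two lanes
    return lanes[0] + lanes[1];
}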
It looks like your data is in struct-of-arrays format already. So you could generate masks from operations on vectors of e values, for use with vectors of x values. If long is 32 bits, you could compare 4 elements at a time, then unpack-low and unpack-high to get two masks with 64-bit elements to match your doubles. If the arrays are small (so they'd fit in cache even with .e[] taking as much space as .x[]), making the longs the same size as the doubles means less unpacking.
Anyway, it doesn't look promising. Too many conditions, and I have no idea what the whole thing is really trying to accomplish, and what restrictions there might be on the input data. If I knew more about the problem, maybe I could think of a vectorized way to do some of it.
Oh, I think another fatal flaw is that each iteration depends on the previous iteration, because it might modify b. So you can't vectorize to do multiple iterations in parallel, unless you can work out a rule for updating b based on the last vector element.
Create a function that checks whether an array has two opposite elements or not, in less than O(n^2) complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
    for(int i = 0; i < n; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            if(arr[i] == -arr[j])
                return true;
        }
    }
    return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort the array (algorithm with worst-case complexity: n*log(n))
2) create two new arrays, filled with the negative and positive numbers from the original array (so far we've got n*log(n) + n + n = n*log(n))
3) ... compare the two new arrays somehow to determine whether they contain opposite numbers
I'm not quite sure my ideas are correct, but I'm open to suggestions.
An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
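A rough sketch of that two-pointer approach (it sorts a copy so the input is untouched; unlike the brute force above, a single 0 is not treated as its own opposite):
#include <algorithm>
#include <vector>

bool has_opposite(std::vector<int> arr)       // by value: we sort a copy
{
    std::sort(arr.begin(), arr.end());        // O(n log n)
    std::size_t front = 0, back = arr.size();
    while (back - front > 1)
    {
        long long sum = static_cast<long long>(arr[front]) + arr[back - 1];
        if (sum == 0) return true;            // found x and -x
        if (sum > 0) --back;                  // largest remaining element is too big
        else ++front;                         // smallest remaining element is too small
    }
    return false;
}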
I would use a std::unordered_set and check whether the opposite of the number already exists in the set; if not, insert it into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
    if(res.count(-e) > 0)
        std::cout << "opposite of " << e << " already exists\n";
    else
        res.insert(e);
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists
Note that you can simply add all of the elements to an unordered_set and, when adding x, check whether -x is already in the set. The complexity of this solution is O(n) (as @Hurkyl said, thanks).
UPDATE: A second idea: sort the elements and then, for each element, check (using binary search) whether its opposite exists.
You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]:
    if (-e) is in t:
        return true
    insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
    if (s.count(-e) > 0) {
        return true;
    }
    s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).
What/where are the practical uses of the partial_sum algorithm in STL?
What are some other interesting/non-trivial examples or use-cases?
I used it to reduce memory usage of a simple mark-sweep garbage collector in my toy lambda calculus interpreter.
The GC pool is an array of objects of identical size. The goal is to eliminate objects that aren't linked to other objects, and condense the remaining objects into the beginning of the array. Since the objects are moved in memory, each link needs to be updated. This necessitates an object remapping table.
partial_sum allows the table to be stored in compressed format (as little as one bit per object) until the sweep is complete and memory has been freed. Since the objects are small, this significantly reduces memory use.
Recursively mark used objects and populate the Boolean array.
Use remove_if to condense the marked objects to the beginning of the pool.
Use partial_sum over the Boolean values to generate a table of pointers/indexes into the new pool.
This works because the Nth marked object has N preceding 1's in the array and acquires pool index N.
Sweep over the pool again and replace each link using the remap table.
It's especially friendly to the data cache to put the remap table in the just-freed, thus still hot, memory.
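A minimal sketch of the remap-table step (names are illustrative; marked[i] is 1 for objects that survive the sweep):
#include <numeric>
#include <vector>

std::vector<int> build_remap(const std::vector<int>& marked)
{
    std::vector<int> remap(marked.size());
    std::partial_sum(marked.begin(), marked.end(), remap.begin());
    // A marked object i has remap[i] - 1 marked objects before it,
    // so its index in the condensed pool is remap[i] - 1.
    return remap;
}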
One thing to note about partial sum is that it is the operation that undoes adjacent difference, much like - undoes +. Or better yet, if you remember calculus, the way integration undoes differentiation. Better because adjacent difference is essentially differentiation and partial sum is integration.
Let's say you have a simulation of a car, and at each time step you need to know the position, velocity, and acceleration. You only need to store one of those values, as you can compute the other two. Say you store the position at each time step: you can take the adjacent difference of the position to get the velocity, and the adjacent difference of the velocity to get the acceleration. Alternatively, if you store the acceleration, you can take the partial sum to get the velocity, and the partial sum of the velocity gives the position.
Partial sum is one of those functions that doesn't come up too often for most people but is enormously useful when you find the right situation. A lot like calculus.
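A tiny sketch of that inverse relationship (unit time step assumed):
#include <numeric>
#include <vector>

void demo()
{
    std::vector<double> position = {0, 1, 4, 9, 16};

    // "Differentiate": velocity[i] = position[i] - position[i-1] (the first element is copied).
    std::vector<double> velocity(position.size());
    std::adjacent_difference(position.begin(), position.end(), velocity.begin());

    // "Integrate" it back: partial_sum undoes adjacent_difference, recovering position.
    std::vector<double> recovered(velocity.size());
    std::partial_sum(velocity.begin(), velocity.end(), recovered.begin());
}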
Last time I (would have) used it is when converting a discrete probability distribution (an array of p(X = k)) into a cumulative distribution (an array of p(X <= k)). To select once from the distribution, you can pick a number from [0-1) randomly, then binary search into the cumulative distribution.
That code wasn't in C++, though, so I did the partial sum myself.
You can use it to generate a monotonically increasing sequence of numbers. For example, the following generates a vector containing the numbers 1 through 42:
std::vector<int> v(42, 1);
std::partial_sum(v.begin(), v.end(), v.begin());
Is this an everyday use case? Probably not, though I've found it useful on several occasions.
You can also use std::partial_sum to generate a list of factorials. (This is even less useful, though, since the number of factorials that can be represented by a typical integer data type is quite limited. It is fun, though :-D)
std::vector<int> v(10, 1);
std::partial_sum(v.begin(), v.end(), v.begin());
std::partial_sum(v.begin(), v.end(), v.begin(), std::multiplies<int>());
Personal Use Case: Roulette-Wheel-Selection
I'm using partial_sum in a roulette-wheel-selection algorithm. This algorithm randomly chooses elements from a container with a probability proportional to some value given beforehand.
Because the elements I choose from carry values that are not necessarily normalized, I use the partial_sum algorithm to construct something like a "roulette wheel" by summing up all the elements. Then I pick a random number in this range (the last partial sum is the sum of them all) and use std::lower_bound to search "the wheel" for where my random number landed. The element returned by lower_bound is the chosen one.
Besides the advantage of clear and expressive code from using partial_sum, I could also gain some speed when experimenting with the GCC parallel mode, which provides parallelized versions of some algorithms, partial_sum among them.
Another use I know of: one of the most important algorithmic primitives in parallel processing (though maybe a little bit away from the STL).
If you're interested in heavily optimized algorithms which use partial_sum (in this context you'll find more results under the synonyms "scan" or "prefix sum"), then look to the parallel algorithms community. They need it all the time. You won't find a parallel sorting algorithm based on quicksort or mergesort that doesn't use it. This operation is one of the most important parallel primitives used. I think it is most commonly used for calculating offsets in dynamic algorithms. Think of a partition step in quicksort, which is split and fed to the parallel threads. You don't know the number of elements in each slot of the partition before calculating it, so you need offsets for all the threads for later access.
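A small sketch of that offset computation (the counts are made up; a real implementation would compute the scan in parallel as well):
#include <cstddef>
#include <numeric>
#include <vector>

void compute_offsets()
{
    // counts[t] = number of elements thread t will write into the shared output.
    std::vector<std::size_t> counts = {4, 2, 7, 3};

    std::vector<std::size_t> ends(counts.size());
    std::partial_sum(counts.begin(), counts.end(), ends.begin());
    // ends = {4, 6, 13, 16}: thread t writes to the range [ends[t] - counts[t], ends[t]).
}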
You may find more information in the now-hot topic of GPU processing. One short article about Nvidia's CUDA and the scan primitive, with a few application examples, can be found in Chapter 39, Parallel Prefix Sum (Scan) with CUDA.
Personal Use Case: intermediate step in counting sort from CLRS:
COUNTING_SORT (A, B, k)
    for i ← 1 to k do
        c[i] ← 0
    for j ← 1 to n do
        c[A[j]] ← c[A[j]] + 1
    // c[i] now contains the number of elements equal to i
    // std::partial_sum here
    for i ← 2 to k do
        c[i] ← c[i] + c[i-1]
    // c[i] now contains the number of elements ≤ i
    for j ← n downto 1 do
        B[c[A[j]]] ← A[j]
        c[A[j]] ← c[A[j]] - 1
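For comparison, a sketch of the same sort in C++ with the marked step done by std::partial_sum (keys are assumed to lie in [0, k]):
#include <numeric>
#include <vector>

std::vector<int> counting_sort(const std::vector<int>& a, int k)
{
    std::vector<int> c(k + 1, 0);
    for (int x : a) ++c[x];                          // histogram
    std::partial_sum(c.begin(), c.end(), c.begin()); // c[i] = number of elements <= i
    std::vector<int> b(a.size());
    for (int i = static_cast<int>(a.size()) - 1; i >= 0; --i)
        b[--c[a[i]]] = a[i];                         // place from the back for stability
    return b;
}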
You could build a "moving sum" (precursor to a moving average):
template <class T>
void moving_sum (const vector<T>& in, int num, vector<T>& out)
{
    // cumulative sum
    partial_sum (in.begin(), in.end(), out.begin());
    // shift and subtract
    int j;
    for (int i = out.size() - 1; i >= 0; i--) {
        j = i - num;
        if (j >= 0)
            out[i] -= out[j];
    }
}
And then call it with:
vector<double> v(10);
// fill in v
vector<double> v2 (v.size());
moving_sum (v, 3, v2);
You know, I actually did use partial_sum() once... It was this interesting little problem that I was asked on a job interview. I enjoyed it so much, I went home and coded it up.
The problem was: given a sequence of integers, find the shortest contiguous sub-sequence with the highest sum. E.g. Given:
Value: -1 2 3 -1 4 -2 -4 5
Index: 0 1 2 3 4 5 6 7
We would find the subsequence [1,4]
Now the obvious solution is to run with 3 for loops, iterating over all possible starts & ends, and adding up the value of each possible subsequence in turn. Inefficient, but quick to code up and hard to make mistakes. (Especially when the third for loop is just accumulate(start,end,0).)
The correct solution involves a divide-and-conquer / bottom up approach. E.g. Divide the problem space in half, and for each half compute the largest subsequence contained within that section, the largest subsequence including the starting number, the largest subsequence including the ending number, and the entire section's subsequence. Armed with this data we can then combine the two halves together without any further evaluation of either one. Obviously the data for each half can be computed by further breaking each half into halves (quarters), each quarter into halves (eighths), and so on until we have trivial singleton cases. It's all quite efficient.
But all that aside, there's a third (somewhat less efficient) option that I wanted to explore. It's similar to the 3-for-loop case, only we add the adjacent numbers to avoid so much work. The idea is that there's no need to add a+b, a+b+c, and a+b+c+d when we can add t1=a+b, t2=t1+c, and t3=t2+d. It's a space/computation tradeoff thing. It works by transforming the sequence:
Index: 0 1 2 3 4
FROM: 1 2 3 4 5
TO: 1 3 6 10 15
Thereby giving us all possible substrings starting at index=0 and ending at indexes=0,1,2,3,4.
Then we iterate over this set subtracting the successive possible "start" points...
FROM: 1 3 6 10 15
TO: - 2 5 9 14
TO: - - 3 7 12
TO: - - - 4 9
TO: - - - - 5
Thereby giving us the values (sums) of all possible subsequences.
We can find the maximum value of each iteration via max_element().
The first step is most easily accomplished via partial_sum().
The remaining steps via a for loop and transform(data+i,data+size,data+i,bind2nd(minus<TYPE>(),data[i-1])).
Clearly O(N^2). But still interesting and fun...
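A rough sketch of that third option, returning only the best sum rather than the [start, end] indexes, and using a lambda where the text above uses bind2nd:
#include <algorithm>
#include <numeric>
#include <vector>

int best_subsequence_sum(std::vector<int> data)   // by value: used as scratch space
{
    if (data.empty()) return 0;
    std::partial_sum(data.begin(), data.end(), data.begin());
    // Sums of all subsequences starting at index 0:
    int best = *std::max_element(data.begin(), data.end());
    for (std::size_t i = 1; i < data.size(); ++i)
    {
        // Subtract the running total of the first i elements, turning the tail
        // into the sums of all subsequences starting at index i.
        const int prev = data[i - 1];
        std::transform(data.begin() + i, data.end(), data.begin() + i,
                       [prev](int x) { return x - prev; });
        best = std::max(best, *std::max_element(data.begin() + i, data.end()));
    }
    return best;
}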
Partial sums are often useful in parallel algorithms. Consider the code
for (int i=0; N>i; ++i) {
    sum += x[i];
    do_something(sum);
}
If you want to parallelise this code, you need to know the partial sums. I am using GNU's parallel version of partial_sum for something very similar.
I often use partial sum not to sum but to calculate the current value in the sequence based on the previous one.
For example, if you integrate a function, each new step builds on the previous one: vt += dvdt, or vt = integrate_step(dvdt, t_prev, t_prev+dt);.
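A minimal sketch of that pattern, using partial_sum with a custom binary operation so each value is built from the previous one (the time-step handling is purely illustrative):
#include <numeric>
#include <vector>

// Crude fixed-step "integration": v[i] = v[i-1] + dvdt[i] * dt.
// Note that partial_sum copies the first element unchanged, so dvdt[0]
// doubles as the initial condition here.
std::vector<double> integrate(const std::vector<double>& dvdt, double dt)
{
    std::vector<double> v(dvdt.size());
    std::partial_sum(dvdt.begin(), dvdt.end(), v.begin(),
                     [dt](double prev, double dv) { return prev + dv * dt; });
    return v;
}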
In nonparametric Bayesian methods there is a Metropolis-Hastings step (per observation) that decides whether to sample a new or an existing cluster. If an existing cluster has to be sampled, this needs to be done with different weights. These weighted likelihoods are simulated in the following example code.
#include <random>
#include <iostream>
#include <algorithm>
#include <numeric>   // std::partial_sum
#include <vector>

int main() {
    std::default_random_engine generator(std::random_device{}());
    std::uniform_real_distribution<double> distribution(0.0, 1.0);

    int K = 8;
    std::vector<double> weighted_likelihood(K);
    for (int i = 0; i < K; ++i) {
        weighted_likelihood[i] = i * 10;
    }
    std::cout << "Weighted likelihood: ";
    for (auto i : weighted_likelihood) std::cout << i << ' ';
    std::cout << std::endl;

    std::vector<double> cumsum_likelihood(K);
    std::partial_sum(weighted_likelihood.begin(), weighted_likelihood.end(), cumsum_likelihood.begin());
    std::cout << "Cumulative sum of weighted likelihood: ";
    for (auto i : cumsum_likelihood) std::cout << i << ' ';
    std::cout << std::endl;

    std::vector<int> frequency(K);
    int N = 280000;
    for (int i = 0; i < N; ++i) {
        double pick = distribution(generator) * cumsum_likelihood.back();
        auto lower = std::lower_bound(cumsum_likelihood.begin(), cumsum_likelihood.end(), pick);
        int index = std::distance(cumsum_likelihood.begin(), lower);
        frequency[index]++;
    }
    std::cout << "Frequencies: ";
    for (auto i : frequency) std::cout << i << ' ';
    std::cout << std::endl;
}
Note that this is not different from the answer by https://stackoverflow.com/users/13005/steve-jessop. It's added to give a bit more context about a particular situation (nonparametric Bayesian methods, e.g. Neal's algorithms using the Dirichlet process as prior) and the actual code which uses partial_sum in combination with lower_bound.