Weighted probability with long doubles - c++

I am working with an array of roughly 2000 elements in C++.
Each element represents the probability of that element being selected randomly.
I then convert this array into a cumulative array, with the intention of using it to work out which element to choose when a die is rolled.
Example array:
{1,2,3,4,5}
Example cumulative array:
{1,3,6,10,15}
I want to be able to select 3 in the cumulative array when numbers 3, 4 or 5 are rolled.
The added complexity is that my array is made up of long doubles. Here's an example of a few consecutive elements:
0.96930161525189592646367317541056252139242133125662803649902343750
0.96941377254127855667142910078837303444743156433105468750000000000
0.96944321382974149711383993199831365927821025252342224121093750000
0.96946143938926617454089618153290075497352518141269683837890625000
0.96950069444055009509463721739663810694764833897352218627929687500
0.96951751803395748961766908990966840065084397792816162109375000000
This could be a terrible way of doing weighted probabilities with this data set, so I'm open to any suggestions of better ways of working this out.

You can use partial_sum:
const unsigned int SIZE = 5; // must be a constant to be usable as an array bound
int array[SIZE] = {1,2,3,4,5};
int partials[SIZE] = {0};
std::partial_sum(array, array+SIZE, partials); // declared in <numeric>
// partials is now {1,3,6,10,15}
The value you want from the array is available from the partial sums:
12 == array[2] + array[3] + array[4];
12 == partials[4] - partials[1];
The total is obviously the last value in the partial sums:
15 == partials[4];

Consider storing the information as an integer numerator and denominator so that there is no loss of precision until the final step.

You can actually do this using stream selection without having to compute an array of partial sums. Here's code I have for this in Java:
public static int selectRandomWeighted(double[] wts, Random rnd) {
int selected = 0;
double total = wts[0];
for( int i = 1; i < wts.length; i++ ) {
total += wts[i];
if( rnd.nextDouble() <= (wts[i] / total)) {
selected = i;
}
}
return selected;
}
The above could potentially be further improved using Kahan summation if you want to preserve as many digits of accuracy in the sum as possible.
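Since the question is about C++ and long doubles, here is a rough C++ sketch of the same streaming selection with the running total kept by Kahan (compensated) summation. The function name is kept from the Java version; the template parameter Rng, the strict comparison (so that zero-weight elements are never picked), and the assumption that weights are non-negative with at least one positive are my own choices, not part of the answer above.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Streaming weighted selection: returns an index chosen with probability
// proportional to its weight, in one pass and without a prefix-sum array.
// Assumes weights are non-negative and at least one of them is positive.
template <class Rng>
std::size_t selectRandomWeighted(const std::vector<long double>& wts, Rng& rng) {
    std::uniform_real_distribution<long double> unit(0.0L, 1.0L);
    std::size_t selected = 0;
    long double total = 0.0L;         // running sum of the weights seen so far
    long double compensation = 0.0L;  // Kahan correction term
    for (std::size_t i = 0; i < wts.size(); ++i) {
        // Kahan (compensated) addition of wts[i] into total.
        const long double y = wts[i] - compensation;
        const long double t = total + y;
        compensation = (t - total) - y;
        total = t;
        // Replace the current choice with probability wts[i] / total.
        if (total > 0.0L && unit(rng) * total < wts[i]) {
            selected = i;
        }
    }
    return selected;
}
```

It can be called with any standard engine, for example a std::mt19937 seeded from std::random_device. Note that aggressive floating-point optimisation flags may optimise the compensation term away.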
However, if you want to draw from this array repeatedly, then pre-computing an array of partial sums and using binary search to find the right index will be faster.
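For the repeated-draw case, a minimal sketch of the partial sums plus binary search approach might look like this. The names cumulativeWeights and indexForRoll are made up for illustration, and the convention is an assumption: element i owns the half-open interval from the previous running total up to (but not including) its own, and the cumulative array is non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Builds the running totals of the weights: {1,2,3,4,5} -> {1,3,6,10,15}.
std::vector<long double> cumulativeWeights(const std::vector<long double>& weights) {
    std::vector<long double> cumulative(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cumulative.begin());
    return cumulative;
}

// Maps a roll r in [0, cumulative.back()) to the index whose interval
// contains it: the first index whose running total is strictly greater than r.
// Assumes cumulative is non-empty and non-decreasing.
std::size_t indexForRoll(const std::vector<long double>& cumulative, long double r) {
    auto it = std::upper_bound(cumulative.begin(), cumulative.end(), r);
    if (it == cumulative.end()) {
        --it;  // guard against r landing exactly on the total through rounding
    }
    return static_cast<std::size_t>(it - cumulative.begin());
}
```

To draw, generate r with std::uniform_real_distribution<long double>(0, cumulative.back()) and pass it to indexForRoll. Each lookup is O(log N) once the cumulative array has been built.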

Ok I think I've solved this one.
I just did a binary split search, but instead of just having
if (arr[middle] == value)
I added in an OR
if (arr[middle] == value || (arr[middle] < value && arr[middle+1] > value))
This seems to handle it in the way I was hoping for.


minimum total move to balance array if we can increase/decrease a specific array element by 1

It is leetcode 462.
I have an algorithm, but it fails some tests while passing others.
I tried to think it through, but I'm not sure which corner case I overlooked.
We have one array of N elements. One move is defined as increasing OR decreasing one single element of the array by 1. We are trying to find the minimum number of moves to make all elements equal.
My idea is:
1. find the average
2. find the element closest to the average
3. sum together the difference between each element and the element closest to the average.
What am I missing? Please provide a counterexample.
class Solution {
public:
int minMoves2(vector<int>& nums) {
int sum=0;
for(int i=0;i<nums.size();i++){
sum += nums[i];
}
double avg = (double) sum / nums.size();
int min = nums[0];
int index =0 ;
for(int i=0;i<nums.size();i++){
if(abs(nums[i]-avg) <= abs(min - avg)){
min = nums[i];
index = i;
}
}
sum=0;
for(int i=0;i<nums.size();i++){
sum += abs(min - nums[i]);
}
return sum;
}
};
Suppose the array is [1, 1, 10, 20, 100]. The average is 26.4, so the element closest to it is 20, and your solution would need 19 + 19 + 10 + 0 + 80 = 128 moves. What if we target 10 instead? Then we need 9 + 9 + 0 + 10 + 90 = 118 moves. So this is a counterexample.
Suppose you decide to target changing all array elements to some value T. The question is, what's the right value for T? Given some value of T, we could ask if increasing or decreasing T by 1 will improve or worsen our outcome. If we decrease T by 1, then all values greater than T need an extra move, and all those below need one move less. That means that if T is above the median, there are more values below it than above, and so we benefit from decreasing T. We can make the opposite argument if T is less than the median. From this we can conclude that the correct value of T is actually the median itself, which my example demonstrates (strictly speaking, when you have an even-sized array, T can be anywhere between the two middle elements).
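A minimal sketch of the median approach, in case it helps. The function name minMovesToEqual is made up here; it takes the vector by value so that std::nth_element can reorder a copy, and it accumulates in long long on the assumption that the total could exceed the range of int.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Minimum number of +1/-1 moves needed to make all elements equal:
// move every element to a median of the array.
long long minMovesToEqual(std::vector<int> nums) {
    if (nums.empty()) {
        return 0;
    }
    // nth_element places a median at position size/2 in O(N) on average.
    auto mid = nums.begin() + nums.size() / 2;
    std::nth_element(nums.begin(), mid, nums.end());
    const long long median = *mid;
    long long moves = 0;
    for (int x : nums) {
        moves += std::llabs(x - median);
    }
    return moves;
}
```

For [1, 1, 10, 20, 100] the median is 10, so this returns 118, matching the counterexample.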

How to trace error with counter in do while loop in C++?

I am trying to use i to read through an array of numbers, take the smaller of two numbers, store it in a variable, and then compare it with another variable that again comes from two other numbers (like 2, -3).
There is something wrong in the way I implement the do while loop. I need the counter 'i' to be updated twice per pass so that I end up with 2 new variables from 4 compared numbers. When I hard code it as n-1, n-2 it works, but with the loop it gets stuck at one value.
int i=0;
int closestDistance=0;
int distance=0;
int nextDistance=0;
do
{
distance = std::min(values[n],values[n-i]); //returns the largest
distance=abs(distance);
i++;
nextDistance=std::min(values[n],values[n-i]);
nextDistance=abs(closestDistance); //make it positive then comp
if(distance<nextDistance)
closestDistance=distance;//+temp;
else
closestDistance=nextDistance;
i++;
}
while(i<n);
return closestDistance;
Maybe this:
int i = 0;
int m = 0;
do{
int lMin = std::min(values[i],values[i + 1]);
i += 2;
int rMin = std::min(values[i], values[i + 1]);
m = std::min(lMin,rMin);
i += 2;
}while(i < n);
return m;
I didn't fully understand what you meant, but this compares the elements of values 4 at a time to find the minimum. Is that all you needed?
Note that if n is the size of values, this would go out of bounds. n would have to be the size minus 4, leading to odd exceptional cases.
The issue with your code may be in the call to abs. Are all the values positive? Are you trying to find the smallest absolute value?
Also, note that using i += 2 twice ensures that you do not repeat any values. This means that you will go over 4 unique values. Your code goes through 3 in each iteration of the loop.
I hope this clarified.
What are you trying to do in the following lines?
nextDistance=std::min(values[n],values[n-i]);
nextDistance=abs(closestDistance); //make it positive , then computed

Trying to multiply the kiddy way

I'm supposed to multiply two 3-digit numbers the way we used to do in childhood.
I need to multiply each digit of a number with each of the other number's digit, calculate the carry, add the individual products and store the result.
I was able to store the 3 products obtained (for I/P 234 and 456):
1404
1170
0936
..in a 2D array.
Now when I try to arrange them in the following manner:
001404
011700
093600
to ease addition to get the result; by:
for(j=5;j>1;j--)
{
xx[0][j]=xx[0][j-2];
}
for(j=4;j>0;j--)
{
xx[1][j]=xx[1][j-1];
}
xx is the 2D array I've stored the 3 products in.
everything seems to be going fine till I do this:
xx[0][0]=0;
xx[0][1]=0;
xx[1][0]=0;
Here's when things go awry. The values get all mashed up. On printing, I get 001400 041700 093604.
What am I doing wrong?
Assuming the first index of xx is the partial sum, that the second index is the digit in that sum, and that the partial sums are stored with the highest digit at the lowest index,
for (int i = 0; i < NUM_DIGITS; i++) // NUM_DIGITS = number of digits in multiplicands
{
for (int j = 5; j >= 0; j--) // Assuming 5 is big enough
{
int index = (j - 1) - (NUM_DIGITS - 1) - i;
xx[i][j] = index >= 0 ? xx[i][index] : 0;
}
}
There are definitely more efficient/logical ways of doing this, of course, such as avoiding storing the digits individually, but within the constraints of the problem, this should give you the right answer.
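One way to sidestep the shifting altogether is to write each partial product directly into its final column while it is being computed. This is only a sketch under my own assumptions: rows are 6 digits wide with the most significant digit at index 0, both inputs have at most 3 digits, and the names Row, partialProducts and addRows are invented for the example.

```cpp
#include <array>

constexpr int kDigits = 3;           // digits per multiplicand (assumption)
constexpr int kWidth = 2 * kDigits;  // a 3-digit by 3-digit product has at most 6 digits

using Row = std::array<int, kWidth>;  // most significant digit at index 0

// Row i is a times the i-th digit of b (counted from the right), already
// shifted left by i places, so the rows line up column by column.
std::array<Row, kDigits> partialProducts(int a, int b) {
    std::array<Row, kDigits> rows{};
    for (int i = 0; i < kDigits; ++i) {
        const int digit = b % 10;
        b /= 10;
        int carry = 0;
        int rest = a;
        // Start i columns in from the right; those skipped columns stay 0.
        for (int col = kWidth - 1 - i; col >= 0; --col) {
            const int prod = (rest % 10) * digit + carry;
            rest /= 10;
            rows[i][col] = prod % 10;
            carry = prod / 10;
        }
    }
    return rows;
}

// Adds the aligned rows column by column, right to left, carrying as on paper.
Row addRows(const std::array<Row, kDigits>& rows) {
    Row sum{};
    int carry = 0;
    for (int col = kWidth - 1; col >= 0; --col) {
        int total = carry;
        for (const Row& row : rows) {
            total += row[col];
        }
        sum[col] = total % 10;
        carry = total / 10;
    }
    return sum;
}
```

For 234 and 456 the rows come out as 001404, 011700 and 093600, and adding them gives 106704. Making every row a full 6 digits wide also rules out reading or writing past the end of a row.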

n-th or Arbitrary Combination of a Large Set

Say I have a set of numbers from [0, ....., 499]. Combinations are currently being generated sequentially using the C++ std::next_permutation. For reference, the size of each tuple I am pulling out is 3, so I am returning sequential results such as [0,1,2], [0,1,3], [0,1,4], ... [497,498,499].
Now, I want to parallelize the code that this is sitting in, so a sequential generation of these combinations will no longer work. Are there any existing algorithms for computing the ith combination of 3 from 500 numbers?
I want to make sure that each thread, regardless of the iterations of the loop it gets, can compute a standalone combination based on the i it is iterating with. So if I want the combination for i=38 in thread 1, I can compute [1,2,5] while simultaneously computing i=0 in thread 2 as [0,1,2].
EDIT Below statement is irrelevant, I mixed myself up
I've looked at algorithms that utilize factorials to narrow down each individual element from left to right, but I can't use these as 500! sure won't fit into memory. Any suggestions?
Here is my shot:
int k = 527; //The kth combination is calculated
int N=500; //Number of Elements you have
int a=0,b=1,c=2; //a,b,c are the numbers you get out
while(k >= (N-a-1)*(N-a-2)/2){
k -= (N-a-1)*(N-a-2)/2;
a++;
}
b= a+1;
while(k >= N-1-b){
k -= N-1-b;
b++;
}
c = b+1+k;
cout << "["<<a<<","<<b<<","<<c<<"]"<<endl; //The result
Got this thinking about how many combinations there are until the next number is increased. However it only works for three elements. I can't guarantee that it is correct. Would be cool if you compare it to your results and give some feedback.
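The same counting idea extends to tuples of any size by using binomial coefficients to count how many combinations start with each candidate element. The sketch below is a generalisation I would try, not the code above: choose and unrankCombination are made-up names, the rank is assumed to be 0-based and smaller than C(n, k), and the intermediate products in choose are assumed to fit in 64 bits.

```cpp
#include <cstdint>
#include <vector>

// Binomial coefficient C(n, k); exact as long as intermediates fit in 64 bits.
std::uint64_t choose(std::uint64_t n, std::uint64_t k) {
    if (k > n) {
        return 0;
    }
    if (k > n - k) {
        k = n - k;
    }
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i;  // the division is exact at every step
    }
    return result;
}

// Returns the rank-th (0-based) k-element subset of {0, ..., n-1} in
// lexicographic order. Assumes rank < choose(n, k).
std::vector<int> unrankCombination(std::uint64_t rank, int n, int k) {
    std::vector<int> combo;
    combo.reserve(k);
    int candidate = 0;
    for (int remaining = k; remaining > 0; --remaining) {
        // Skip whole blocks of combinations that start with a smaller element.
        for (;;) {
            const std::uint64_t block = choose(n - candidate - 1, remaining - 1);
            if (rank < block) {
                break;
            }
            rank -= block;
            ++candidate;
        }
        combo.push_back(candidate);
        ++candidate;
    }
    return combo;
}
```

With n = 500 and k = 3, rank 527 gives [0,2,32], which agrees with the three-element code above, and each thread can call unrankCombination independently with its own rank.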
If you are looking for a way to obtain the lexicographic index or rank of a unique combination instead of a permutation, then your problem falls under the binomial coefficient. The binomial coefficient handles problems of choosing unique combinations in groups of K with a total of N items.
I have written a class in C# to handle common functions for working with the binomial coefficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the set.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it is also faster than older iterative solutions.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
The following tested code will iterate through each unique combination:
public void Test10Choose5()
{
String S;
int Loop;
int N = 500; // Total number of elements in the set.
int K = 3; // Total number of elements in each group.
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
// The KIndexes array specifies the indexes for a lexicographic element.
int[] KIndexes = new int[K];
StringBuilder SB = new StringBuilder();
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
// Get the k-indexes for this combination.
BC.GetKIndexes(Combo, KIndexes);
// Verify that the KIndexes returned can be used to retrieve the
// rank or lexicographic order of the KIndexes in the table.
int Val = BC.GetIndex(true, KIndexes);
if (Val != Combo)
{
S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
Console.WriteLine(S);
}
SB.Remove(0, SB.Length);
for (Loop = 0; Loop < K; Loop++)
{
SB.Append(KIndexes[Loop].ToString());
if (Loop < K - 1)
SB.Append(" ");
}
S = "KIndexes = " + SB.ToString();
Console.WriteLine(S);
}
}
You should be able to port this class over fairly easily to C++. You probably will not have to port over the generic part of the class to accomplish your goals. Your test case of 500 choose 3 yields 20,708,500 unique combinations, which will fit in a 4 byte int. If 500 choose 3 is simply an example case and you need to choose combinations greater than 3, then you will have to use longs or perhaps fixed point int.
You can describe a particular selection of 3 out of 500 objects as a triple (i, j, k), where i is a number from 0 to 499 (the index of the first number), j ranges from 0 to 498 (the index of the second, skipping over whichever number was first), and k ranges from 0 to 497 (index of the last, skipping both previously-selected numbers). Given that, it's actually pretty easy to enumerate all the possible selections: starting with (0,0,0), increment k until it reaches its maximum value, then increment j and reset k to 0, and so on until j reaches its own maximum value; then increment i, reset both j and k, and continue.
If this description sounds familiar, it's because it's exactly the same way that incrementing a base-10 number works, except that the base is much funkier, and in fact the base varies from digit to digit. You can use this insight to implement a very compact version of the idea: for any integer n from 0 up to (but not including) 500*499*498, you can get:
struct triple { // named type, so it can be used as a return and parameter type
int i, j, k;
};
triple AsTriple(int n) {
triple result;
result.k = n % 498;
n = n / 498;
result.j = n % 499;
n = n / 499;
result.i = n % 500; // unnecessary, any legal n will already be between 0 and 499
return result;
}
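void PrintSelections(triple t) { // needs <iostream>
int i = t.i;
// t.j indexes the values that remain after removing i
int j = t.j + (t.j >= i ? 1 : 0);
// t.k indexes the values that remain after removing both i and j
int lo = i < j ? i : j;
int hi = i < j ? j : i;
int k = t.k;
if (k >= lo) ++k;
if (k >= hi) ++k;
std::cout << "[" << i << "," << j << "," << k << "]" << std::endl;
}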
void PrintRange(int start, int end) {
for (int i = start; i < end; ++i) {
PrintSelections(AsTriple(i));
}
}
Now to shard, you can just take the numbers from 0 to 500*499*498, divide them into subranges in any way you'd like, and have each shard compute the permutation for each value in its subrange.
This trick is very handy for any problem in which you need to enumerate subsets.

C++: function creation using array

Write a function which has:
input: array of pairs (unique id and weight) of length N, and K <= N
output: K random unique ids (from input array)
Note: when called many times, the frequency with which an id appears in the output should be greater the more weight it has.
Example: id with weight of 5 should appear in the output 5 times more often than id with weight of 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
thanks for responses everybody!
currently I can't understand how weight of pair affects frequency of appearance of pair in the output, can you give me more clear, "for dummy" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially, I created a temporary array but this can be done in memory as well, you can calculate the size of the list by summing all the weights up = X, in this example = 17
Pick a random number in the range [1, X], and work out which id should be returned by looping through the list and accumulating the weights. Say the random number is 8:
(3, 7) total = 7 which is < 8
(1, 2) total = 9 which is >= 8 **boom** 1 is your id!
Now, since you need K random unique ids, you can create a hashtable from the initial array passed to you and work with that. Once you find an id, remove it from the hashtable and proceed with the algorithm. Edit: note that you create the hashtable only once, at the start! Your algorithm will work on this instead of looking through the array. I did not put it at the top, to keep the answer clear.
As long as your random calculation is not using any extra memory secretly, you will need to store K random pickings, which are <= N and a copy of the original array so max space requirements at runtime are O(2*N)
Asymptotic runtime is :
O(n) : create copy of original array into hashtable +
(
O(n) : calculate sum of weights +
O(1) : calculate random between range +
O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
This is a good question :)
This solution works with non-integer weights and uses constant space (ie: space complexity = O(1)). It does, however modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r, i.
Add input[i].id to output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from the following (> i) input weights and also from sum_weights.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
subtract 1 from n
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r, i.
Add input[i].id to output.
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*log(N)).
My short answer: in no way.
Just because the problem definition is incorrect. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is pretty small relative to N, calculated frequencies will be pretty close to theoretical values.
The task may be split into two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate sum of weights (sumOfWeights)
Generate random number from the range [1; sumOfWeights]
Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <utility> // std::swap
// 0 - id, 1 - weight
typedef unsigned Pair[2];
unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
unsigned sumOfWeights = 0;
for (unsigned i = 0; i < i_size; ++i)
{
const unsigned index = i_indexes[i];
sumOfWeights += i_set[index][1]; // element 1 of a Pair is the weight
}
const unsigned random = rand() % sumOfWeights + 1;
sumOfWeights = 0;
unsigned i = 0;
for (; i < i_size; ++i)
{
const unsigned index = i_indexes[i];
sumOfWeights += i_set[index][1]; // element 1 of a Pair is the weight
if (sumOfWeights >= random)
{
break;
}
}
return i;
}
Generate unique random numbers
The well-known Durstenfeld-Fisher-Yates algorithm may be used for generating unique random numbers. See this great explanation.
It requires space for N indexes, so if the value of N is defined at compile time, we are able to allocate the necessary space at compile time.
Now, we have to combine these two algorithms. We just need to use our own Random() function instead of standard rand() in unique numbers generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
unsigned deck[N];
for (unsigned i = 0; i < N; ++i)
{
deck[i] = i;
}
unsigned max = N - 1;
for (unsigned i = 0; i < K; ++i)
{
const unsigned index = Random(i_set, deck, max + 1);
std::swap(deck[max], deck[index]);
o_res[i] = i_set[deck[max]][0];
--max;
}
}
Usage
int main()
{
srand((unsigned)time(0));
const unsigned c_N = 5; // N
const unsigned c_K = 2; // K
Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
unsigned result[c_K] = {};
const unsigned c_total = 1000000; // number of iterations
unsigned counts[c_N] = {0}; // frequency counters
for (unsigned i = 0; i < c_total; ++i)
{
Generate<c_N, c_K>(input, result);
for (unsigned j = 0; j < c_K; ++j)
{
++counts[result[j]];
}
}
unsigned sumOfWeights = 0;
for (unsigned i = 0; i < c_N; ++i)
{
sumOfWeights += input[i][1];
}
for (unsigned i = 0; i < c_N; ++i)
{
std::cout << (double)counts[i]/c_K/c_total // empirical frequency
<< " | "
<< (double)input[i][1]/sumOfWeights // expected frequency
<< std::endl;
}
return 0;
}
Output
N = 5, K = 2
Frequencies
Empirical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empirical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I do assume that the ids in the output must be unique. This makes this problem a specific instance of random sampling problems.
The first approach that I can think of solves this in O(N^2) time, using O(N) memory (The input array itself plus constant memory).
I assume that the weights are positive.
Let A be the array of pairs.
1) Set N to be A.length
2) calculate the sum of all weights W.
3) Loop K times
3.1) r = rand(0,W)
3.2) loop on A and find the first index i such that A[1].w + ... + A[i-1].w <= r < A[1].w + ... + A[i].w
3.3) add A[i].id to output
3.4) W = W - A[i].w (do this before A[i] is overwritten)
3.5) A[i] = A[N] (or swap if the array contents should be preserved; indices are 1-based here, as in step 3.2)
3.6) N = N - 1