Using empty Boost accumulators - c++

I am curious, what average is obtained from this code snippet? The accumulator is intended to be empty.
boost::accumulators::accumulator_set<
    int,
    boost::accumulators::features<boost::accumulators::tag::mean>
> Accumulator;
int Mean = boost::accumulators::mean(Accumulator);
The average is non-zero when I test it. Is there some way I can tell that the average was taken for an empty data set? Why is the resulting value for "Mean" non-zero?
I was looking around in the documentation for the accumulator library, but was unable to find an answer to this question.

Any value would be a valid mean for an empty set of values; that is, x * 0 = 0 holds for any x.
You could add a count feature to your accumulator_set and query it to see if it's 0.

You don't need to add the count feature, since the mean accumulator is based on the count and sum accumulators.
From the Boost User's Guide:
mean depends on the sum and count accumulators ... The result of the mean accumulator is merely the result of the sum accumulator divided by the result of the count accumulator.
So you just need to check that count is greater than 0:
bool isEmpty = boost::accumulators::count(Accumulator) == 0;
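Putting the question's snippet and the two answers together, a minimal sketch (the headers and the main() wrapper are added here for the sake of a complete example):
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/count.hpp>
#include <boost/accumulators/statistics/mean.hpp>

namespace acc = boost::accumulators;

int main()
{
    // mean already pulls in the count and sum accumulators, so no extra feature is needed
    acc::accumulator_set<int, acc::features<acc::tag::mean>> Accumulator;

    bool isEmpty = acc::count(Accumulator) == 0;  // true here: no samples pushed yet

    Accumulator(10);
    Accumulator(20);

    if (acc::count(Accumulator) > 0) {
        int Mean = acc::mean(Accumulator);        // well-defined now: 15
    }
}
Once samples have been pushed with Accumulator(value), count becomes non-zero and querying mean is meaningful.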

Related

What would be the fastest algorithm to randomly select N items from a list based on weights distribution?

I have a large list of items, each item has a weight.
I'd like to select N items randomly without replacement, while the items with more weight are more probable to be selected.
I'm looking for the best-performing approach; performance is paramount. Any ideas?
If you want to sample items without replacement, you have lots of options.
Use a weighted-choice-with-replacement algorithm to choose random indices. There are many algorithms like this. One of them is WeightedChoice, described later in this answer, and another is rejection sampling, described as follows. Assume that the highest weight is max, there are n weights, and each weight is 0 or greater. To choose an index in [0, n) using rejection sampling:
Choose a uniform random integer i in [0, n).
With probability weights[i]/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is weights[i] or less, return i, or go to step 1 otherwise.)
Each time the weighted choice algorithm chooses an index, set the weight for the chosen index to 0 to keep it from being chosen again. Or...
Assign each index an exponentially distributed random number (with a rate equal to that index's weight), make a list of pairs assigning each number to an index, then sort that list by those numbers. Then take each item from first to last, in ascending order. This sorting can be done on-line using a priority queue data structure (a technique that leads to weighted reservoir sampling). Notice that the naïve way to generate the random number, -ln(1-RNDU01())/weight, where RNDU01() is a uniform random number in [0, 1], is not robust, however ("Index of Non-Uniform Distributions", under "Exponential distribution").
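A minimal C++ sketch of the exponential-key option just described, assuming all weights are strictly positive and k does not exceed the number of items (the function name and the use of std::mt19937 are illustrative, not from the original answer):
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Weighted sampling without replacement: give every index an exponentially
// distributed key with rate equal to its weight, then take the k smallest keys.
// std::exponential_distribution sidesteps the -ln(1-RNDU01())/weight pitfall
// mentioned above.
std::vector<std::size_t> sample_without_replacement(
    const std::vector<double>& weights, std::size_t k, std::mt19937& rng)
{
    std::vector<double> keys(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i)
        keys[i] = std::exponential_distribution<double>(weights[i])(rng);

    std::vector<std::size_t> order(weights.size());
    std::iota(order.begin(), order.end(), 0);
    // A priority queue would make this on-line; partial_sort keeps the sketch short.
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
                      [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    order.resize(k);
    return order;
}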
Tim Vieira gives additional options in his blog.
A paper by Bram van de Klundert compares various algorithms.
EDIT (Aug. 19): Note that for these solutions, the weight expresses how likely a given item will appear first in the sample. This weight is not necessarily the chance that a given sample of n items will include that item (that is, an inclusion probability). The methods given above will not necessarily ensure that a given item will appear in a random sample with probability proportional to its weight; for that, see "Algorithms of sampling with equal or unequal probabilities".
Assuming you want to choose items at random with replacement, here is pseudocode implementing this kind of choice. Given a list of weights, it returns a random index (starting at 0), chosen with a probability proportional to its weight. This algorithm is a straightforward way to implement weighted choice. But if it's too slow for you, see my section "Weighted Choice With Replacement" for a survey of other algorithms.
METHOD WChoose(weights, value)
    // Choose the index according to the given value
    lastItem = size(weights) - 1
    runningValue = 0
    for i in 0...size(weights) - 1
        if weights[i] > 0
            newValue = runningValue + weights[i]
            lastItem = i
            // NOTE: Includes start, excludes end
            if value < newValue: break
            runningValue = newValue
        end
    end
    // If we didn't break above, this is a last
    // resort (might happen because rounding
    // error happened somehow)
    return lastItem
END METHOD

METHOD WeightedChoice(weights)
    return WChoose(weights, RNDINTEXC(Sum(weights)))
END METHOD
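As an aside, for weighted choice with replacement in C++, the standard library's std::discrete_distribution already does what WeightedChoice describes; a brief sketch (the helper name and std::mt19937 are illustrative):
#include <random>
#include <vector>

// Returns an index in [0, weights.size()) with probability proportional to its weight.
int weighted_choice(const std::vector<double>& weights, std::mt19937& rng)
{
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}
In practice the distribution would be constructed once and reused rather than rebuilt on every call as in this sketch.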
Let A be the item array with x items. The complexity of each method is defined as
< preprocessing_time, querying_time >
If sorting is possible: < O(x lg x), O(n) >
sort A by the weight of the items.
create an array B, for example:
B = [ 0, 0, 0, x/2, x/2, x/2, x/2, x/2 ].
it is easy to see that picking from B gives a higher probability of choosing x/2.
if you haven't picked n elements yet, choose a random element e from B.
pick a random element from A within the interval e : x-1.
If iterating through the items is possible: < O(x), O(tn) >
iterate through A and find the average weight w of the elements.
define the maximum number of tries t.
try (at most t times) to pick a random element from A whose weight is greater than w.
test for some t that gives you good/satisfactory results.
If nothing above is possible: < O(1), O(tn) >
define the maximum number of tries t.
if you haven't picked n elements yet, take t random elements in A.
pick the element with the biggest value.
test for some t that gives you good/satisfactory results.

outputting serializable sorted integers

I want to generate a serialized list of randomly selected positive integers in sorted order, but the number of integers desired and the range of numbers they may be selected from in a given use case could easily each be in the many millions (or sometimes even each in the range of billions, if 64-bit integers are being used), so it isn't really feasible to store the numbers in an array that the software can then access randomly.
Therefore, I wanted to generate the numbers via a simple loop that looked something like this:
unsigned current = 0;
while (remaining > 0) {
    if (find_next_to_output(current, max, remaining)) {
        // do stuff having output a value
    }
}
Where remaining is initialized to however many random numbers I intend to output, and max is the upper bound (plus one) on the numbers that may be generated. It can be assumed that remaining will always be initialized to a number less than or equal to max.
The find_next_to_output function would look similar to this:
/**
 * Advance through the range of accepted values until all values have been output.
 * @param current [in/out] integer to examine. Advances to the next integer
 *                to consider for output
 * @param max one more than the largest integer to ever output
 * @param remaining [in/out] number of integers left to output
 * @return true if the function output an integer, false otherwise
 */
bool find_next_to_output(unsigned &current, unsigned max, unsigned &remaining)
{
    bool result = false;
    if (remaining == 0) {
        return false;
    }
    if (rnd() * (max - current) < remaining) {
        // code to output 'current' goes here.
        remaining--;
        result = true;
    }
    int x = ?; // WHAT GOES HERE?
    current += x;
    return result;
}
Note, the function rnd() used above would return a uniform randomly generated floating point number on the range [0..1).
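For reference, one way such an rnd() could be written with the standard C++11 <random> header (a sketch; the choice of std::mt19937 is an assumption):
#include <random>

// Uniform random double in [0, 1), as described above.
double rnd()
{
    static std::mt19937 gen{std::random_device{}()};
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}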
As the // WHAT GOES HERE? comment in find_next_to_output highlights, I am unsure how to calculate a reasonable value for x, such that the number of values of current that get skipped over reflects the probability that none of the skipped values would be picked (while still leaving enough values that all remaining numbers can still be picked). I know that it needs to be a random number (probably not from a uniform distribution), but I don't know how to calculate a good value for it. At worst, it would simply increment current by one each time, but this should be statistically unlikely when there is a sufficient difference between the number of integers remaining to output and the number of integers remaining in the range.
I do not want to use any third party libraries such as boost, although I am fine with using any random number generators that may be packed in the standard library for C++11.
If any part of my question is unclear, please comment below and I will endeavor to clarify.
If I understand you correctly, you want to generate random, ascending numbers. You are trying to do this by creating a step of a random size to add to the previous number.
Your worry is that if the step is too large, then you will overflow and wrap back around, breaking the ascending requirement.
x needs to be constrained in a manner that prevents the overflow, while still satisfying the random requirement.
You want the modulo operator (modulus). %
const unsigned step = (max - current) / remaining;
x = unsigned(rnd() * max) % step; // will never be larger than step

How to calc percentage of coverage in an array of 1-100 using C++?

This is for an assignment so I would appreciate no direct answers; rather, any logic help with my algorithms (or pointing out any logic flaws) would be incredibly helpful and appreciated!
I have a program that receives "n" number of elements from the user to put into a single-dimensional array.
The array uses random generated numbers.
I.e., if the user inputs 88, a list of 88 random numbers (each between 1 and 100) is generated.
"n" has a max of 100.
I must write 2 functions.
Function #1:
Determine the percentage of numbers that appear in the array of "n" elements.
So any duplicates would decrease the percentage.
And any missing numbers would decrease the percentage.
Thus if n = 75, then you have a maximum possible %age of 0.75
(this max %age decreases if there are duplicates)
This function basically calls upon function #2.
FUNCTION HEADER(GIVEN) = "double coverage (int array[], int n)"
Function #2:
Using a linear search, search for the key (key being the current # in the list of 1 to 100, which should be from the loop in function #1), in the array.
Return the position if that key is found in the array
(i.e., if this is the loop's 40th run, the key will be at the value 39,
and it will go through every element in the array,
and if any element is equal to 39, all of those positions will be returned?
I believe that is what our prof is asking)
Return -1 if the key is not found.
Given notes = "Only function #1 calls function #2,
and does so to find out if a certain value (key) is found among the first n elements of the array."
FUNCTION HEADER(GIVEN) = "int search (int array[], int n, int key)"
What I really need help with is the logic for the algorithm.
I would appreciate any help with this, as I would approach this problem completely differently than our professor wants us to.
My first thoughts would be to loop through function #1 for all variable keys of 1 through 100.
And in that loop, go to the search function (function #2), in which a loop would go through every number in the array and add to a counter if a number was (1) a duplicate or (2) non-existent in the array. Then I would subtract that counter from 100. Thus if all numbers were included in the array except for #40 and #41, and #77 was a duplicate, the total percentage of coverage would be 100 - 3 = 97%.
Although as I type this I think that may in and of itself be flawed? ^ Because with a max of 100 elements in the array, if the only number missing was 99, then you would subtract 1 for having that number missing, and then if there was a duplicate you would subtract another 1, and thus your percentage of coverage would be (100 - 2) = 98, when clearly it ought to be 99.
And this ^ is exactly why I would REALLY appreciate any logic help. :)
I know I am having some problems approaching this logically.
I think I can figure out the coding with a relative amount of ease; what I am struggling with the most is the steps to take. So any pseudocode ideas would be amazing!
(I can post my entire program code so far if necessary for anyone, just ask, but it is rather long as of now as I have many other functions performing other tasks in the program)
I may be mistaken, but as I read it all you need to do is:
write a function that loops through the array of n elements to find a given number in it. It would return the index of the first occurrence, or a negative value in case the number cannot be found in the array.
write a loop to call the function for all numbers 1 to 100 and count the finds. Then divide the result by 100.
I'm not sure I understand the whole thing correctly, but for function #1, if you don't care about speed, it's easiest to put the array into a vector, loop through 1..100, and use Boost's find_nth function: http://www.boost.org/doc/libs/1_41_0/doc/html/boost/algorithm/find_nth.html. There you can compare the current value against a second occurrence in the vector: if it contains one, you decrease; if not, you don't decrease. If you only want to find whether a number is in the array at all, use http://www.cplusplus.com/reference/algorithm/find/. I don't understand how the percentage decreases, so that part is on your own, and I don't really understand the second function either, but if it's a linear search, again use find.
P.S. Vector description http://www.cplusplus.com/reference/vector/vector/begin/.
You want to know how many numbers in the range [1, 100] appear in your given array. You can search for each number in turn:
size_t count_unique(int array[], size_t n)
{
    size_t result = 0;
    for (int i = 1; i <= 100; ++i)
    {
        if (contains(array, n, i))
        {
            ++result;
        }
    }
    return result;
}
All you still need is an implementation of the containment check contains(array, n, i), and to transform the unique count into a percentage (by using division).
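For completeness, a minimal containment check matching the call above could be a plain linear scan (this is essentially the linear search that function #2 of the assignment describes):
// Returns true if key appears among the first n elements of array.
bool contains(int array[], size_t n, int key)
{
    for (size_t i = 0; i < n; ++i)
    {
        if (array[i] == key)
        {
            return true;
        }
    }
    return false;
}
The assignment's search function would return the index (or -1) instead of a bool, and the coverage percentage is then count_unique(array, n) / 100.0.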

Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized

Let's say we have an int array with 5 elements: 1, 2, 3, 4, 5.
What I need to do is to find minimum abs value of array's elements' subtraction:
We need to check like this:
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find the minimum abs value of these subtractions. We can find it with two for loops. The question is, is there an algorithm that finds the value with one and only one for loop?
Sort the list and subtract the nearest two elements.
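A short sketch of that approach with the standard library (O(n log n) overall; the function name is illustrative):
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Sort, then the minimum |v[i] - v[j]| must occur between adjacent elements.
int min_abs_diff(std::vector<int> v)   // taken by value so the caller's data stays unsorted
{
    std::sort(v.begin(), v.end());
    int best = INT_MAX;                // returned unchanged if v has fewer than 2 elements
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        best = std::min(best, v[i + 1] - v[i]);
    return best;
}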
The provably best-performing solution is asymptotically linear, O(n), up to constant factors.
This means that the time taken is proportional to the number of the elements in the array (which of course is the best we can do as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
int mindiff(vector<int>& v)  // non-const: the vector is sorted in place
{
    IntRadixSort(v.begin(), v.end());
    int best = INT_MAX;      // from <climits>
    for (size_t i = 0; i + 1 < v.size(); i++)
    {
        int diff = abs(v[i+1] - v[i]);
        if (diff < best)
            best = diff;
    }
    return best;
}
IntRadixSort is a linear time fixed-width integer sorting algorithm defined here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed bit-width nature of ints by partitioning them in a series of fixed passes over the bit positions, i.e., partition them on the high bit (32nd), then on the next highest (31st), then on the next (30th), and so on - which only takes linear time.
The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If the issue is the n(n-1)/2 subtractions required by the simple two-loop case, you can use an O(n log n) algorithm.
No; unless you know the list is sorted, you need two loops.
It's simple: iterate in a for loop.
Keep the variables "minpos" and "maxpos" and "minneg" and "maxneg".
Check the sign of each value you encounter: store the maximum positive number in maxpos and the minimum positive number in minpos, and do the same in an if branch for numbers less than zero. Then take the difference maxpos - minpos in one variable and maxneg - minneg in another, and print the larger of the two. You will get the desired result.
I believe you definitely know how to find the max and min in one for loop.
Correction: the above finds the maximum difference; in the case of the minimum you need to take the max and the second max instead of the max and min :)
This might help you:
int a[5] = {1, 2, 3, 4, 5}; // example array from the question
int end = 4;                // last index
int subtractmin = INT_MAX;  // must start at a large value
int m = 0;
for (int i = 1; i <= end; i++) {
    if (abs(a[m] - a[i + m]) < subtractmin)
        subtractmin = abs(a[m] - a[i + m]);
    if (i == end && m < 4) { // move on to the next starting element
        m = m + 1;
        end = end - 1;
        i = 0;               // becomes 1 after the i++
    }
}

Random algorithm with priority to one end

I am writing a little program with a GUI in C++ and Qt.
It is supposed to be similar to a vocabulary trainer. I will use it for my own studying.
I have a QList of objects (name and description as string for example).
Then I have a second QList with ints in it. For every object in my other list, there is an int in this list. The start value is 50 for every object; if the user clicks correct, it gets decremented, and vice versa.
So an object with value 70 should be shown more often to the user than an object with value 30. So in the correct-answer method I increase/decrease the value, sort the QList and use my random algorithm:
if (packList.count() == 0) // the QList with objects
    return;

int Min = 0;
int Max = packList.count() - 1; // -1 because i need the index

qsrand(QTime::currentTime().msec());

if (Min > Max)
{
    int Temp = Min;
    Min = Max;
    Max = Temp;
}

int randNum = ((rand() % (Max - Min + 1)) + Min);
setPage(randNum); // randNum will be used as index in this method
Now what I need is a way to implement my priority in this random algorithm. I don't want the ones with a higher value to appear 90% of the time, but just more often, just like a vocabulary trainer.
First a remark: You should use qsrand only once at the beginning of the program.
Now to your algorithm: first get the sum of all your values; let us call it sumValues. Then compute your random number between 0 and sumValues-1. Go through your list and sum the values into a variable currentSum until it is greater than your random number, and use the index of that entry. This will be more efficient if you sort your list by decreasing values.
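A minimal sketch of that idea, assuming the per-object values live in a QList<int> parallel to packList (here called valueList; the names are illustrative):
#include <QList>
#include <QtGlobal>

// Picks an index with probability proportional to its value, assuming all
// values are positive and qsrand() was called once at program startup.
int pickWeightedIndex(const QList<int>& valueList)
{
    int sumValues = 0;
    for (int v : valueList)
        sumValues += v;

    int r = qrand() % sumValues;          // random number in [0, sumValues - 1]

    int currentSum = 0;
    for (int i = 0; i < valueList.size(); ++i) {
        currentSum += valueList[i];
        if (currentSum > r)               // strictly greater, matching the range of r
            return i;
    }
    return valueList.size() - 1;          // fallback, not reached for positive values
}
The result could then be passed to setPage() in place of the uniform randNum above.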