ND4J: Generate random number within range - dl4j

If we want to create an INDArray with a given number of rows and columns, filled with random numbers between 0 and 1, we can use:
Nd4j.rand(rows, cols);
But what if we want an INDArray of given rows and columns filled with random numbers between -5000 and 5000?
Moreover, what if each column / vector should have its random numbers drawn from a specific range?
For example:
row = 2; columns = 3
pool = { {-500, 500}, {100, 100}, {1, 10} }
INDArray = [ [randRange(-500,500), randRange(100,100), randRange(1,10)],
[randRange(-500,500), randRange(100,100), randRange(1,10)] ]
How can this be achieved by harnessing the efficiency of Nd4j?

Use one of the other rand signatures, e.g. this one:
Nd4j.rand(long[] shape, double min, double max, org.nd4j.linalg.api.rng.Random rng);
or this one:
Nd4j.rand(INDArray target, double min, double max, org.nd4j.linalg.api.rng.Random rng);
For the per-column ranges, one option (not covered by these signatures) is to generate a uniform [0, 1] matrix with Nd4j.rand(rows, cols) and then scale and shift each column by its own (max - min) and min, for example with row-vector broadcast operations (e.g. mulRowVector / addRowVector), if your ND4J version provides them.

Related

Can I exclude a number or subrange of numbers inside a range of random numbers in modern C++?

I have:
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> probability(0, 100);
I want to exclude some numbers in this range of probabilities.
Example 1: Let's say, I want to generate a random number between 0 and 100, but this number can never be 4.
Example 2: Let's say, I want to generate a random number between 0 and 100, but this number can never be any number between 4 and 7.
I wonder if it is possible to achieve this in modern C++ without using std::rand?
If you want to stay with a uniform_int_distribution you can do it manually like this:
Example1: Let's say, I want to generate a random number in between 0 and 100, but this number can never be 4.
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> distribution(0,99);
auto temp = distribution(mt);
auto random_number = (temp < 4) ? temp : temp + 1;
Example2: Let's say, I want to generate a random number in between 0 and 100, but this number can never be any number between 4 and 7.
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> distribution(0,96);
auto temp = distribution(mt);
auto random_number = (temp < 4) ? temp : temp + 4;
This could be generalized to write a function random_int_between_excluding(int first, int last, std::vector<int> exclude), though at some point it will be simpler to follow NathanOliver's suggestion and use a std::discrete_distribution instead.
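A minimal sketch of such a generalization, under the assumption that the excluded values are distinct and lie within [first, last] (the engine parameter is an addition of this sketch):

#include <algorithm>
#include <random>
#include <vector>

int random_int_between_excluding(int first, int last, std::vector<int> exclude,
                                 std::mt19937 &mt)
{
    std::sort(exclude.begin(), exclude.end());
    // Draw from a range shrunk by the number of excluded values...
    std::uniform_int_distribution<int> dist(first, last - static_cast<int>(exclude.size()));
    int value = dist(mt);
    // ...then shift past every excluded value the draw has reached, in
    // ascending order, so each allowed value is produced exactly once.
    for (int e : exclude)
        if (value >= e)
            ++value;
    return value;
}

For the second example you would call random_int_between_excluding(0, 100, {4, 5, 6, 7}, mt).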
Example2: Let's say, I want to generate a random number in between 0 and 100, but this number can never be any number between 4 and 7.
This is what std::piecewise_constant_distribution is for.
std::vector<int> i{0, 4, 8, 101};
std::vector<int> w{ 4, 0, 93};
std::piecewise_constant_distribution<> d(i.begin(), i.end(), w.begin());
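For completeness, a minimal sketch of drawing integers from this distribution; the truncation to int is an assumption about how you would consume the double values it produces:

#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::random_device rd;
    std::mt19937 mt(rd());

    // Intervals [0,4), [4,8), [8,101) weighted by how many integers we
    // want from each interval (zero weight removes 4..7 entirely).
    std::vector<double> i{0, 4, 8, 101};
    std::vector<double> w{4, 0, 93};
    std::piecewise_constant_distribution<> d(i.begin(), i.end(), w.begin());

    // The distribution yields doubles with uniform density over the kept
    // intervals; truncating gives a uniform pick from {0..3} and {8..100}.
    int n = static_cast<int>(d(mt));
    std::cout << n << '\n';
}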
If you want to miss out 4, say, then a very good way (which doesn't compromise any statistical properties of the generator) is to draw one of the 100 values 0 to 99 inclusive and then add 1 if the number is 4 or greater.
You do something similar to omit numbers in a range.
This method is a surprisingly good way of modelling the quantile function associated with the desired probability distribution.
You can use a filter of arbitrary complexity on a uniform distribution:
template<typename D, typename G, typename F>
auto sample(D &distribution, G &generator, F const &filter)
{
    while (true)
    {
        auto const value = distribution(generator);
        if (filter(value))
            return value;
    }
}
Your example case transforms into the following:
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> probability(0, 100);
auto const filter = +[](int n) { return n < 4 || n > 7; };
int const i = sample(probability, mt, filter);
You have to keep in mind that this kind of filtering comes at a cost.
Let N be the number of distinct values the distribution returns and F the number of these values filtered out; then, if you need to sample S values, you have to sample and filter S * N / (N - F) values on average. That is fine if F is small compared to N, but horribly inefficient when F approaches N. In your case, N = 100, F = 4, and N / (N - F) = 1.04166...
If you prefer readability and simplicity, that's your choice. Otherwise, if you need performance, you'd better try out piecewise distributions or mess with the value range manually.
There is also the option of doing it manually within a reasonable range of numbers: create a lookup table and exclude the numbers that are invalid:
static int rand_pool[]{1,2,3,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23}; //no number 4
srand((int)time(0));
int random_number = rand_pool[rand() % 22];
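If you prefer to avoid std::rand as the question asks, the same lookup-table idea works with the <random> facilities; a minimal sketch (the helper name and the 1..23 bounds mirror the pool above and are only illustrative):

#include <random>
#include <vector>

// Build a pool of allowed values once, then index into it uniformly.
int pick_from_pool(std::mt19937 &mt)
{
    static const std::vector<int> pool = [] {
        std::vector<int> p;
        for (int v = 1; v <= 23; ++v)
            if (v != 4)           // exclude 4, as in the table above
                p.push_back(v);
        return p;
    }();
    std::uniform_int_distribution<int> dist(0, static_cast<int>(pool.size()) - 1);
    return pool[dist(mt)];
}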

How to uniformly spread k values over a collection of n values with k <= n?

I have a collection of k elements. I need to spread them uniformly at random into a collection of n elements, where k <= n.
So for example, with this k-collection (with k = 3):
{ 3, 5, 6 }
and given n = 7, a valid result (with n = 7 elements) could be:
{ 6, 5, 6, 3, 3, 6, 5 }
Notice that every item from the k-collection must appear in the result.
So this is not a valid result:
{ 6, 3, 6, 3, 3, 6, 6} // it lacks "5"
What's the fast way to accomplish this?
The simplest way I can think of:
Add one of each item to the array. So with your example, your initial array is [3,5,6]. This guarantees that every element is represented at least once.
Then, successively pick an element at random and add it to the array. Do this n - k times (n - 3 with your example), i.e. fill the rest of the array with randomly selected items from the list of elements.
Shuffle the array.
This takes O(n) to fill the array, and O(n) to shuffle it.
Let's assume you have a
std::vector<int> input;
that contains the k elements you need to spread and
std::vector<int> output;
that will be filled with n elements.
I used the following approach for a similar problem. (Edit: thinking about it, here is a simpler and probably faster version than the original.)
First we satisfy the condition that every item from input must occur at least once in output. Therefore we put every element from input into output once.
output.resize(n); // fill with n 0's
std::copy(input.begin(), input.end(), output.begin()); // fill k first items
Now we can fill up the remaining n - k slots with random elements from input:
std::random_device rd;
std::mt19937 rand(rd()); // get seed from random device
std::uniform_int_distribution<> dist(0, k - 1); // for random numbers in [0, k-1]
for(size_t i = k; i < n; i++) {
output[i] = input[dist(rand)];
}
At the end, shuffle the whole thing to randomize the positions of the first k elements:
std::shuffle(output.begin(), output.end(), rand);
(std::shuffle, from <algorithm>, takes the engine directly; std::random_shuffle would need a callable of the form r(n) instead and is deprecated/removed in newer standards.)
I hope this is what you wanted.
You can try just randomly putting values into your n-collection, then verify whether it contains all of the k-collection values; if not, try again. However, this is not always fast. You can also put the missing values into random places of the n-collection, but remember to verify again.
Simply make an array of the k elements, say {3,5,6} in the given example. Make a counter variable, initially zero. To spread them over n elements, simply iterate over the n output elements, assigning the element at the counter's position and incrementing the counter as
counter = (counter + 1) % k;
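A minimal sketch of that counter idea (spread_cyclic is only an illustrative name; note it places the elements cyclically rather than randomly, so you could shuffle the result afterwards if random positions are required):

#include <cstddef>
#include <vector>

// Fill n output slots by cycling through the k input elements.
std::vector<int> spread_cyclic(const std::vector<int> &input, std::size_t n)
{
    std::vector<int> output(n);
    std::size_t counter = 0;
    for (std::size_t i = 0; i < n; ++i) {
        output[i] = input[counter];
        counter = (counter + 1) % input.size();
    }
    return output;
}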

std::discrete_distribution of a specified range of random numbers

I know I can use std::discrete_distribution like this:
std::default_random_engine generator;
std::discrete_distribution<int> distribution {2,2,1,1,2,2,1,1,2,2};
int p[10]={};
for (int i=0; i<100; ++i) {
int number = distribution(generator);
++p[number];
}
However, this is going to generate the numbers in the range of 0-9 as per the weight specified.
What can I do to generate the numbers within a user-specified range, say 24-33 or 95-104, but still using the weights specified in the discrete_distribution?
You can just add 24 or 95 to the number that is generated. At the beginning you have numbers from 0 to 9, when you add 24 to them you have numbers from 24 to 33.
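For example, a minimal sketch building on the code from the question:

#include <random>

int main()
{
    std::default_random_engine generator;
    std::discrete_distribution<int> distribution{2, 2, 1, 1, 2, 2, 1, 1, 2, 2};

    // distribution(generator) yields 0..9 with the given weights; adding
    // the lower bound of the desired range shifts it to 24..33.
    int number = distribution(generator) + 24;
    (void)number; // use the value as needed
}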
When you have a function that returns a random number in the range [r1, r2) and you want [min, max), mathematics can help.
First calculate d = (max - min) / (r2 - r1), then multiply the range [r1, r2) by d to get [d*r1, d*r2), which we call [r1', r2'). Now calculate diff = abs(min - r1') and add it to [r1', r2'); we use abs because we want the difference without sign. That gives you the range you want. Assuming the function that returns a value in [r1, r2) is somerand(), then:
const int r1 = 0, r2 = 9;
int myrand(const int min, const int max) {
    const int d = (max - min) / (r2 - r1);
    const int localrand = d * somerand();
    return localrand + abs(min - d * r1); // abs from <cstdlib>
}
You can check that the function returns min when somerand() returns r1 and max when somerand() returns r2.
The solution above won't work with, e.g., r1 = 0, r2 = 2, min = 0, max = 3; in short, max - min should be divisible by r2 - r1.

algorithm: find count of numbers within a given range

Given an unsorted number array where there can be duplicates, pre-process the array so that the count of numbers within a given range can be found in O(1) time.
For example, given 7, 2, 3, 2, 4, 1, 4, 6, the count of numbers both >= 2 and <= 5 is 5 (namely 2, 2, 3, 4, 4).
Sort the array. For each element in the sorted array, insert that element into a hash table, with the value of the element as the key, and its position in the array as the associated value. Any values that are skipped, you'll need to insert as well.
To find the number of items in a range, look up the position of the value at each end of the range in the hash table, and subtract the lower from the upper to find the size of the range.
This sounds suspiciously like one of those clever interview questions some interviewers like to ask, which is usually associated with hints along the way to see how you think.
Regardless... one possible way of implementing this is to make a list of the counts of numbers equal to or less than the list index.
For example, from your list above, generate the list: 0, 1, 3, 4, 6, 6, 7, 8. Then you can count the numbers between 2 and 5 by subtracting list[1] from list[5].
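A minimal sketch of that prefix-count idea (the helper names are only illustrative; it assumes non-negative values bounded by max_value):

#include <vector>

// Preprocess: counts[v] = how many array elements are <= v.
std::vector<int> build_prefix_counts(const std::vector<int> &a, int max_value)
{
    std::vector<int> counts(max_value + 1, 0);
    for (int v : a)
        ++counts[v];
    for (int v = 1; v <= max_value; ++v)
        counts[v] += counts[v - 1];
    return counts;
}

// Query: number of elements x with lo <= x <= hi, in O(1).
int count_in_range(const std::vector<int> &counts, int lo, int hi)
{
    return counts[hi] - (lo > 0 ? counts[lo - 1] : 0);
}

With the example array, build_prefix_counts({7,2,3,2,4,1,4,6}, 7) yields 0, 1, 3, 4, 6, 6, 7, 8, and count_in_range(counts, 2, 5) returns 6 - 1 = 5.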
Since we need O(1) access, the data structure will be memory-intensive.
With a hash table, access would take O(n) in the worst case.
My Solution:
Build a 2D matrix.
array = {2,3,2,4,1,4,6}; the range of numbers is 0 to 6, so n = 7.
So we have to create an n x n matrix.
a[i][i] represents the total count of elements equal to i,
so a[4][4] = 2 (since 4 appears 2 times in the array),
a[5][5] = 0,
a[5][2] = count of numbers both >= 2 and <= 5 = 5.
// preprocessing stage 1: populate a[i][i] with the total count of elements equal to i
a[n][n] = {0};
for (j = 0; j < 7; j++) {   // 7 = length of the input array
    a[array[j]][array[j]]++;
}
// stage 2
for (i = 1; i < n; i++)
    for (j = 0; j < i; j++)
        a[i][j] = a[i-1][j] + a[i][i];
// we are just adding the count of elements equal to i to each value in row i-1, which gives row i.
Now the query (5,2) would look up a[5][2] and give the answer in O(1).
#include <cstdio>
#include <cstring>

int main()
{
    int arr[8] = {7, 2, 3, 2, 4, 1, 4, 6};
    int count[9];
    int total = 0;
    memset(count, 0, sizeof(count));
    for (int i = 0; i < 8; i++)
        count[arr[i]]++;
    for (int k = 0; k < 9; k++)
    {
        if (k >= 2 && k <= 5 && count[k] > 0)
        {
            total = total + count[k];
        }
    }
    printf("%d:", total);
    return 0;
}

C++: function creation using array

Write a function which has:
input: an array of pairs (unique id and weight) of length N, and K <= N
output: K random unique ids (from the input array)
Note: when called many times, the frequency with which an id appears in the output should be greater the more weight it has.
Example: an id with weight 5 should appear in the output 5 times more often than an id with weight 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
Thanks for the responses, everybody!
Currently I can't understand how the weight of a pair affects the frequency of its appearance in the output; can you give me a clearer, "for dummies" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
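A minimal sketch of that procedure (the Pair struct, the function name and the use of std::mt19937 are assumptions for illustration; weights are assumed positive):

#include <random>
#include <vector>

struct Pair { int id; int weight; };

// Pick one id with probability proportional to its weight.
int weighted_pick(const std::vector<Pair> &input, std::mt19937 &rng)
{
    long long total_weight = 0;
    for (const Pair &p : input)
        total_weight += p.weight;

    std::uniform_int_distribution<long long> selection_dist(1, total_weight);
    long long selection = selection_dist(rng);

    long long running = 0;
    for (const Pair &p : input) {
        running += p.weight;
        if (running >= selection)   // first prefix sum that reaches the selection
            return p.id;
    }
    return input.back().id;         // unreachable with positive weights
}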
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially I created a temporary array, but this can be done without one as well; you can calculate the size of the list by summing up all the weights, call it X, which in this example is 17.
Pick a random number between 0 and X-1, and calculate which id should be returned by looping through the list, doing a cumulative addition on the weights. Say I have the random number 8:
(3, 7) total = 7 which is < 8
(1, 2) total = 9 which is >= 8 **boom** 1 is your id!
Now since you need K random unique ids, you can create a hashtable from the initial array passed to you and work with that. Once you find an id, remove it from the hash and proceed with the algorithm. Edit: note that you create the hashmap only once, initially! Your algorithm will work on it instead of looking through the array; I did not put this at the top to keep the answer clear.
As long as your random calculation is not secretly using any extra memory, you will need to store the K random pickings (which are <= N) and a copy of the original array, so the maximum space requirement at runtime is O(2*N).
Asymptotic runtime is :
O(n) : create copy of original array into hashtable +
(
O(n) : calculate sum of weights +
O(1) : calculate random between range +
O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
This is a good question :)
This solution works with non-integer weights and uses constant space (i.e. space complexity = O(1)). It does, however, modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call that index i.
Add input[i].id to output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from the following (> i) input weights and also from sum_weights.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
subtract 1 from n
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call that index i.
Add input[i].id to output.
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*log(N)).
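A minimal sketch of this non-unique variant, using std::upper_bound on a prefix-sum array (the Pair layout and the function name are assumptions for illustration):

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Pair { int id; double weight; };

// Draw K ids with replacement, probability proportional to weight: O(N + K*log(N)).
std::vector<int> sample_with_replacement(const std::vector<Pair> &input,
                                         std::size_t K, std::mt19937 &rng)
{
    // Prefix sums of the weights; non-decreasing, so binary search applies.
    std::vector<double> prefix;
    prefix.reserve(input.size());
    double running = 0;
    for (const Pair &p : input) {
        running += p.weight;
        prefix.push_back(running);
    }

    std::uniform_real_distribution<double> dist(0.0, running);
    std::vector<int> output;
    output.reserve(K);
    for (std::size_t k = 0; k < K; ++k) {
        double r = dist(rng);
        // First prefix sum strictly greater than r; zero-weight entries are never chosen.
        std::size_t i = std::upper_bound(prefix.begin(), prefix.end(), r) - prefix.begin();
        if (i >= prefix.size())
            i = prefix.size() - 1;  // guard against floating-point edge cases
        output.push_back(input[i].id);
    }
    return output;
}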
My short answer: there is no way, simply because the problem definition is contradictory. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is small relative to N, the observed frequencies will be pretty close to the theoretical values.
The task may be split into two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate sum of weights (sumOfWeights)
Generate random number from the range [1; sumOfWeights]
Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <utility> // for std::swap used below

// 0 - id, 1 - weight
typedef unsigned Pair[2];

unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1]; // [1] holds the weight
    }
    const unsigned random = rand() % sumOfWeights + 1;
    sumOfWeights = 0;
    unsigned i = 0;
    for (; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
        if (sumOfWeights >= random)
        {
            break;
        }
    }
    return i;
}
Generate unique random numbers
The well-known Durstenfeld variant of the Fisher-Yates shuffle may be used for generating unique random numbers. See this great explanation.
It requires space for N indices, so if N is defined at compile time, we are able to allocate the necessary space at compile time.
Now we have to combine these two algorithms. We just need to use our own Random() function instead of the standard rand() in the unique-number generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
    unsigned deck[N];
    for (unsigned i = 0; i < N; ++i)
    {
        deck[i] = i;
    }
    unsigned max = N - 1;
    for (unsigned i = 0; i < K; ++i)
    {
        const unsigned index = Random(i_set, deck, max + 1);
        std::swap(deck[max], deck[index]);
        o_res[i] = i_set[deck[max]][0];
        --max;
    }
}
Usage
int main()
{
    srand((unsigned)time(0));
    const unsigned c_N = 5; // N
    const unsigned c_K = 2; // K
    Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
    unsigned result[c_K] = {};
    const unsigned c_total = 1000000; // number of iterations
    unsigned counts[c_N] = {0}; // frequency counters
    for (unsigned i = 0; i < c_total; ++i)
    {
        Generate<c_N, c_K>(input, result);
        for (unsigned j = 0; j < c_K; ++j)
        {
            ++counts[result[j]];
        }
    }
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < c_N; ++i)
    {
        sumOfWeights += input[i][1];
    }
    for (unsigned i = 0; i < c_N; ++i)
    {
        std::cout << (double)counts[i]/c_K/c_total // empirical frequency
                  << " | "
                  << (double)input[i][1]/sumOfWeights // expected frequency
                  << std::endl;
    }
    return 0;
}
Output
N = 5, K = 2
Frequencies
Empirical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empirical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I do assume that the ids in the output must be unique. This makes this problem a specific instance of random sampling problems.
The first approach that I can think of solves this in O(N^2) time, using O(N) memory (The input array itself plus constant memory).
I assume that the weights are positive.
Let A be the array of pairs.
1) Set N to be A.length
2) calculate the sum of all weights W.
3) Loop K times
3.1) r = rand(0, W)
3.2) loop over A and find the first index i such that A[1].w + ... + A[i-1].w <= r < A[1].w + ... + A[i].w
3.3) add A[i].id to the output
3.4) W = W - A[i].w (subtract before the element is overwritten in the next step)
3.5) A[i] = A[N-1] (or swap them if the array contents should be preserved)
3.6) N = N - 1
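A minimal sketch of those steps (the Pair layout is an assumption; weights are assumed positive, and the array is taken by value so the caller's contents are preserved):

#include <cstddef>
#include <random>
#include <utility>
#include <vector>

struct Pair { int id; double weight; };

// K distinct ids, each round picking proportionally to weight: O(K*N) time.
std::vector<int> sample_without_replacement(std::vector<Pair> A, std::size_t K,
                                            std::mt19937 &rng)
{
    std::size_t N = A.size();
    double W = 0;
    for (const Pair &p : A)
        W += p.weight;

    std::vector<int> output;
    output.reserve(K);
    for (std::size_t k = 0; k < K && N > 0; ++k) {
        std::uniform_real_distribution<double> dist(0.0, W);
        double r = dist(rng);

        // Walk the prefix sums until they exceed r.
        std::size_t i = 0;
        double running = A[0].weight;
        while (running <= r && i + 1 < N) {
            ++i;
            running += A[i].weight;
        }

        output.push_back(A[i].id);
        W -= A[i].weight;           // subtract before the element moves
        std::swap(A[i], A[N - 1]);  // "remove" by swapping to the end
        --N;
    }
    return output;
}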