How to uniformly spread k values over a collection of n values, with k <= n? - C++

I have a collection of k elements. I need to spread them uniformly at random into a collection of n elements, where k <= n.
So for example, with this k-collection (with k = 3):
{ 3, 5, 6 }
and given n = 7, a valid result (with n = 7 elements) could be:
{ 6, 5, 6, 3, 3, 6, 5}
Notice that every item from the k-collection must appear in the result.
So this is not a valid result:
{ 6, 3, 6, 3, 3, 6, 6} // it lacks "5"
What's a fast way to accomplish this?

The simplest way I can think of.
Add one of each item to the array. So with your example, your initial array is [3,5,6]. This guarantees that every element is represented at least once.
Then, successively pick an element at random, and add it to the array. Do this n-3 times. (i.e. fill the array with randomly selected items from the list of elements)
Shuffle the array.
This takes O(n) to fill the array, and O(n) to shuffle it.

Let's assume you have a
std::vector<int> input;
that contains the k elements you need to spread and
std::vector<int> output;
that will be filled with n elements.
I used the following approach for a similar problem. (Edit: Thinking about it, here is a simpler and probably faster version than the original.)
First we satisfy the condition that every item from input must occur at least once in output. Therefore we put every element from input into output once.
output.resize(n); // fill with n 0's
std::copy(input.begin(), input.end(), output.begin()); // fill k first items
Now we can fill up the remaining n - k slots with random elements from input:
std::random_device rd;
std::mt19937 rand(rd()); // get seed from random device
std::uniform_int_distribution<> dist(0, k - 1); // for random numbers in [0, k-1]
for (size_t i = k; i < n; i++) {
    output[i] = input[dist(rand)];
}
At the end, shuffle the whole thing to randomize the positions of the first k elements:
std::shuffle(output.begin(), output.end(), rand); // note: std::random_shuffle does not accept a std::mt19937 engine, std::shuffle does
I hope this is what you wanted.

You can also just put random values into your n-collection and then verify that it contains every value from the k-collection; if not, try again. However, that is not always fast. Alternatively, put the missing values into random places of the n-collection, but remember to verify again afterwards.

Simply make an array of the k elements, say {3,5,6} in the given example. Make a counter variable, initially zero. To spread the elements over n slots, simply iterate over the n elements of the output array, assigning the element at index counter each time and incrementing the counter as
counter = (counter + 1) % k;
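A minimal sketch of that counter-based fill, assuming the k elements sit in a std::vector (the function and variable names are illustrative):
#include <cstddef>
#include <vector>

// Fills n output slots by cycling through the k elements (assumes elems is non-empty).
std::vector<int> spreadRoundRobin(const std::vector<int>& elems, std::size_t n) {
    std::vector<int> out(n);
    std::size_t counter = 0;                       // cycles over the k elements
    for (std::size_t i = 0; i < n; i++) {
        out[i] = elems[counter];
        counter = (counter + 1) % elems.size();
    }
    return out;                                    // every element appears at least once when n >= k
}
This fill is deterministic; shuffling the result afterwards (as the other answers do) would randomize the positions.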


Random generation algorithm in C++

Suppose you need to generate a random permutation of the first N integers. For example, {4, 3, 1, 5, 2} and {3, 1, 4, 2, 5} are legal permutations, but {5, 4, 1, 2, 1} is not, because one number (1) is duplicated and another (3) is missing. This routine is often used in the simulation of algorithms. We assume the existence of a random number generator, RandInt(i,j), that generates integers between i and j with equal probability. Here is the algorithm:
Fill the array A from A[0] to A[N-1] as follows: To fill A[i], generate random numbers until you get one that is not already in A[0], A[1],…, A[i-1].
Implement this algorithm in C++ and find the complexity. This is my code:
int a;
bool b = false;
A[0] = RandInt(1, n);
for (int i = 1; i < n; i++) {
    do {
        b = false;
        a = RandInt(1, n);
        for (int j = 0; j < i; j++)
            if (A[j] == a)
                b = true;
    } while (b);
    A[i] = a;
}
Is this code correct? And how can I find the complexity of the algorithm? Since RandInt(i,j) generates random numbers, I don't know how many times the do-while loop will be repeated.
This algorithm will produce correct results, selecting a permutation uniformly at random from all possible permutations.
The running time is not bounded above by any deterministic function since, as you point out, it could in principle run forever. In the best case, where no selection ever has to be repeated, this algorithm runs in O(n^2) because of the inner duplicate-checking loop. On average, you'd expect to have to try n/n = 1 time to get the first unique random value, n/(n-1) times to get the second, and so on down to an expected n/1 = n times to get the last one. Adding those together gives n*H(n) expected attempts, where H(n) is the nth harmonic number, and H(n) is Theta(log n). Since each attempt also scans up to O(n) already-filled entries to check for duplicates, the algorithm is O(n^2 log n) in the average case.
There is a better way to do what you're trying to do: you can start with any permutation and shuffle it into another one using an algorithm that is O(n) in the worst case. The algorithm is the Fisher-Yates algorithm and works as follows:
FisherYates(array[1...n])
1. if n == 1 then return
2. r = random(1, n)   // the range must include 1, so array[1] can also stay in place
3. temp = array[1]
4. array[1] = array[r]
5. array[r] = temp
6. FisherYates(array[2...n])
This is a recursive formulation, but an iterative one is straightforward. It calls random exactly n - 1 times, where n is the size of the array at the topmost invocation.
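An iterative version in C++ might look like the following sketch, using <random> for the generator (the function name and details are illustrative, not part of the original answer):
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Builds 1..n and shuffles it with an iterative Fisher-Yates pass.
std::vector<int> randomPermutation(int n) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 1);                       // 1, 2, ..., n
    std::mt19937 gen(std::random_device{}());
    for (int i = n - 1; i > 0; i--) {
        std::uniform_int_distribution<int> dist(0, i);      // pick j uniformly in [0, i]
        int j = dist(gen);
        std::swap(a[i], a[j]);
    }
    return a;
}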

Algorithm that can create all combinations and all groups of those combinations

Let's say I have a set of elements S = { 1, 2, 3, 4, 5, 6, 7, 8, 9 }
I would like to create combinations of 3 and group them in a way such that no number appears in more than one combination.
Here is an example:
{ {3, 7, 9}, {1, 2, 4}, {5, 6, 8} }
The order of the numbers in the groups does not matter, nor does the order of the groups in the entire example.
In short, I want every possible group combination from every possible combination in the original set, excluding the ones that have a number appearing in multiple groups.
My question: is this actually feasible in terms of run time and memory? My sample sizes could be somewhere around 30-50 numbers.
If so, what is the best way to create this algorithm? Would it be best to create all possible combinations, and choose the groups only if the number hasn't already appeared?
I'm writing this in Qt 5.6, which is a C++ based framework.
You can do this recursively and avoid duplicates if you keep the first element fixed in each recursion, and only make groups of 3 with the values in order, e.g.:
{1,2,3,4,5,6,7,8,9}
Put the lowest element in the first spot (a), and keep it there:
{a,b,c} = {1, *, *}
For the second spot (b), iterate over every value from the second-lowest to the second-highest:
{a,b,c} = {1, 2~8, *}
For the third spot (c), iterate over every value higher than the second value:
{1, 2~8, b+1~9}
Then recurse with the rest of the values.
{1,2,3} {4,5,6} {7,8,9}
{1,2,3} {4,5,7} {6,8,9}
{1,2,3} {4,5,8} {6,7,9}
{1,2,3} {4,5,9} {6,7,8}
{1,2,3} {4,6,7} {5,8,9}
{1,2,3} {4,6,8} {5,7,9}
{1,2,3} {4,6,9} {5,7,8}
{1,2,3} {4,7,8} {5,6,9}
{1,2,3} {4,7,9} {5,6,8}
{1,2,3} {4,8,9} {5,6,7}
{1,2,4} {3,5,6} {7,8,9}
...
{1,8,9} {2,6,7} {3,4,5}
When I say "in order", that doesn't have to be any specific order (numerical, alphabetical...); it can just be the original order of the input. You can avoid having to re-sort the input of each recursion if you make sure to pass the rest of the values on to the next recursion in the order you received them.
A run-through of the recursion:
Let's say you get the input {1,2,3,4,5,6,7,8,9}. As the first element in the group, you take the first element from the input, and for the other two elements, you iterate over the other values:
{1,2,3}
{1,2,4}
{1,2,5}
{1,2,6}
{1,2,7}
{1,2,8}
{1,2,9}
{1,3,4}
{1,3,5}
{1,3,6}
...
{1,8,9}
making sure the third element always comes after the second element, to avoid duplicates like:
{1,3,5} ↔ {1,5,3}
Now, let's say that at a certain point, you've selected this as the first group:
{1,3,7}
You then pass the rest of the values onto the next recursion:
{2,4,5,6,8,9}
In this recursion, you apply the same rules as for the first group: take the first element as the first element in the group and keep it there, and iterate over the other values for the second and third element:
{2,4,5}
{2,4,6}
{2,4,8}
{2,4,9}
{2,5,6}
{2,5,8}
{2,5,9}
{2,6,8}
...
{2,8,9}
Now, let's say that at a certain point, you've selected this as the second group:
{2,5,6}
You then pass the rest of the values onto the next recursion:
{4,8,9}
And since this is the last group, there is only one possibility, and so this particular recursion would end in the combination:
{1,3,7} {2,5,6} {4,8,9}
As you see, you don't have to sort the values at any point, as long as you pass them on to the next recursion in the order you received them. So if you receive e.g.:
{q,w,e,r,t,y,u,i,o}
and you select from this the group:
{q,r,u}
then you should pass on:
{w,e,t,y,i,o}
Here's a JavaScript snippet which demonstrates the method; it returns a 3D array with combinations of groups of elements.
(The filter function creates a copy of the input array, with elements 0, i and j removed.)
function clone2D(array) {
    var clone = [];
    for (var i = 0; i < array.length; i++) clone.push(array[i].slice());
    return clone;
}
function groupThree(input) {
    var result = [], combination = [];
    group(input, 0);
    return result;
    function group(input, step) {
        combination[step] = [input[0]];
        for (var i = 1; i < input.length - 1; i++) {
            combination[step][1] = input[i];
            for (var j = i + 1; j < input.length; j++) {
                combination[step][2] = input[j];
                if (input.length > 3) {
                    var rest = input.filter(function(elem, index) {
                        return index && index != i && index != j;
                    });
                    group(rest, step + 1);
                }
                else result.push(clone2D(combination));
            }
        }
    }
}
var result = groupThree([1,2,3,4,5,6,7,8,9]);
for (var r in result) document.write(JSON.stringify(result[r]) + "<br>");
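Since the question targets C++ (Qt 5.6), here is a rough C++ sketch of the same recursive scheme; the type aliases and function names are illustrative, not taken from the answer above:
#include <cstddef>
#include <vector>

using Group = std::vector<int>;
using Grouping = std::vector<Group>;

// Recursively builds every grouping of 'input' into groups of 3, keeping the first
// remaining element fixed in each group to avoid duplicates.
// Assumes the number of elements is a multiple of 3.
void makeGroups(const std::vector<int>& input, Grouping& current, std::vector<Grouping>& result) {
    if (input.empty()) {
        result.push_back(current);
        return;
    }
    for (std::size_t i = 1; i + 1 < input.size(); i++) {
        for (std::size_t j = i + 1; j < input.size(); j++) {
            current.push_back({input[0], input[i], input[j]});
            std::vector<int> rest;                            // values not used in this group
            for (std::size_t m = 1; m < input.size(); m++)
                if (m != i && m != j) rest.push_back(input[m]);
            makeGroups(rest, current, result);
            current.pop_back();
        }
    }
}
For the 9-element example, calling makeGroups({1,2,3,4,5,6,7,8,9}, current, result) with an empty current and result yields 280 groupings, which matches 9!/(3!^3 * 3!).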
For n things taken 3 at a time, you could use 3 nested loops:
for (k = 0; k < n - 2; k++) {
    for (j = k + 1; j < n - 1; j++) {
        for (i = j + 1; i < n; i++) {
            ... S[k] ... S[j] ... S[i]
        }
    }
}
For a generic solution of n things taken k at a time, you could use an array of k counters.
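A rough sketch of that counter-array idea for n things taken k at a time (illustrative code, not the answer's own; it prints index combinations, and you would use S[c[0]], ..., S[c[k-1]] in place of the print):
#include <cstdio>
#include <vector>

// Enumerates all combinations of k indices out of {0, ..., n-1} using an
// array of k counters kept in strictly increasing order.
void combinations(int n, int k) {
    std::vector<int> c(k);
    for (int i = 0; i < k; i++) c[i] = i;                     // first combination: 0, 1, ..., k-1
    while (true) {
        for (int i = 0; i < k; i++) std::printf("%d ", c[i]);
        std::printf("\n");
        int i = k - 1;
        while (i >= 0 && c[i] == n - k + i) i--;              // rightmost counter that can still advance
        if (i < 0) break;                                     // every counter is at its maximum: done
        c[i]++;
        for (int j = i + 1; j < k; j++) c[j] = c[j - 1] + 1;  // reset the counters to its right
    }
}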
I think you can solve it using the coin change problem with dynamic programming: just assume you are looking for change of 3 and that every index in the array is a coin of value 1, then output the coins (values in your array) that have been found.
Link: https://www.youtube.com/watch?v=18NVyOI_690

How to increase the values of k consecutive elements in a vector in C++?

Suppose we have a vector in C++ of size 8 with elements {0, 1, 1, 0, 0, 0, 1, 1} and I want to increase the values in a specific portion of the vector by one. For example, let's say the portion which needs to be increased by 1 is indices 0 to 4; then our final result is {1, 2, 2, 1, 1, 0, 1, 1}.
Is it possible to do this in constant time using standard vector methods (like we have memset in C), without running any loop?
No... and by the way, with memset you don't have a guaranteed constant-time operation either (in most implementations it is just very fast, but still linear in the number of elements).
If you need to do this kind of operation (addition/subtraction of a constant over a range) on a very large vector a lot of times, and you only need the final result, then you can get O(1) per update using a different algorithm:
Step 1: convert the data to its "derivative"
This means replacing each element with its difference from the previous one.
// O(n) in the size of the vector, but done only once
for (int i = v.size() - 1; i > 0; i--) {
    v[i] -= v[i-1];
}
Step 2: do all the interval operations (each in constant time)
With this representation adding a constant to a range simply means adding it to the first element and subtracting it from the element past the ending one. In code:
// intervals contains structures with start/stop/value fields.
// This is O(n) in the **number of intervals**, and does not
// depend on the size of the intervals.
for (auto r : intervals) {
    v[r.start] += r.value;
    if (r.stop + 1 < (int)v.size())   // nothing to subtract when the range ends at the last element
        v[r.stop + 1] -= r.value;
}
Step 3: Collect the results
Finally you just need to un-do the initial processing, getting back to the normal values on each cell by integrating. In code:
// O(n) in the size of the vector, but done only once
for (int i = 1, n = v.size(); i < n; i++) {
    v[i] += v[i-1];
}
Note that both step 1 and step 3 (derivation and integration) can be done in parallel on N cores with perfect efficiency if the size is large enough, even if how this is possible may not be obvious at first sight (it wasn't for me, at least).
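Putting the three steps together on the vector from the question, here is a minimal self-contained sketch (it assumes the single update "+1 over indices 0..4"; the Interval struct is illustrative):
#include <cstdio>
#include <vector>

struct Interval { int start, stop, value; };

int main() {
    std::vector<int> v = {0, 1, 1, 0, 0, 0, 1, 1};
    std::vector<Interval> intervals = {{0, 4, 1}};               // add 1 to indices 0..4

    // Step 1: derivative
    for (int i = (int)v.size() - 1; i > 0; i--) v[i] -= v[i-1];

    // Step 2: apply each interval in O(1)
    for (const auto& r : intervals) {
        v[r.start] += r.value;
        if (r.stop + 1 < (int)v.size()) v[r.stop + 1] -= r.value;
    }

    // Step 3: integrate back
    for (int i = 1; i < (int)v.size(); i++) v[i] += v[i-1];

    for (int x : v) std::printf("%d ", x);                       // prints: 1 2 2 1 1 0 1 1
}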

Longest Increasing Subsequence using std::set in C++

I have found code for LIS (longest increasing subsequence) in a book, and I am not quite able to work out the proof of correctness; can someone help me out with that? All the code does is delete the element next to the newly inserted element in the set if the new element is not the maximum; otherwise it just inserts the new element.
set<int> s;
set<int>::iterator it;
for (int i = 0; i < n; i++)
{
    s.insert(arr[i]);
    it = s.find(arr[i]);
    it++;
    if (it != s.end())
        s.erase(it);
}
cout << s.size() << endl;
n is the size of the sequence and arr is the sequence. I don't think the above code will work if we are not restricted to "strictly" increasing sequences. Can we modify the code to find increasing sequences in which equality is allowed?
EDIT: the algorithm works only when the input elements are distinct.
There are several solutions to LIS.
The most typical is O(N^2) algorithm using dynamic programming, where for every index i you calculate "longest increasing sequence ending at index i".
You can speed this up to O(N log N) using clever data structures or binary search.
Your code bypasses this and only calculates the length of the LIS.
Consider input "1 3 4 5 6 7 2", the contents of the set at the end will be "1 2 4 5 6 7", which is not the LIS, but the length is correct.
The proof goes by induction as follows:
After the i-th iteration, the j-th smallest element of the set is the smallest possible end of an increasing sequence of length j within the first i elements of the array.
Consider input "1 3 2". After the second iteration we have the set "1 3", so 1 is the smallest possible end of an increasing sequence of length 1 and 3 is the smallest possible end of an increasing sequence of length 2.
After the third iteration we have the set "1 2", where now 2 is the smallest possible end of an increasing sequence of length 2.
I hope you can do induction step by yourself :)
The proof is relatively straightforward: consider set s as a sorted list. We can prove it with a loop invariant. After each iteration of the algorithm, s[k] contains the smallest element of arr that ends an ascending subsequence of length k in the sub-array from zero to the last element of arr that we have considered so far. We can prove this by induction:
After the first iteration, this statement is true, because s will contain exactly one element, which is a trivial ascending sequence of one element.
Each iteration can change the set in one of two ways: it could expand it by one in cases when arr[i] is the largest element found so far, or replace an existing element with arr[i], which is smaller than the element that has been there before.
When an extension of the set occurs, it happens because the current element arr[i] can be appended to the current LIS. When a replacement happens at position k, the index of arr[i], it happens because arr[i] ends an ascending subsequence of length k, and is smaller than or equal to the old s[k] that used to end the previous "best" ascending subsequence of length k.
With this invariant in hand, it's easy to see that s contains as many elements as the longest ascending subsequence of arr after the entire arr has been exhausted.
The code is an O(n log n) solution for LIS, but since you want to find a non-strictly increasing sequence, the implementation has a problem: std::set doesn't allow duplicate elements. Here is code that works, based on std::multiset.
#include <iostream>
#include <set>
using namespace std;

int main()
{
    int arr[] = {4, 4, 5, 7, 6};
    int n = 5;
    multiset<int> s;
    multiset<int>::iterator it;
    for (int i = 0; i < n; i++)
    {
        s.insert(arr[i]);
        it = s.upper_bound(arr[i]);  // member upper_bound is O(log n); std::upper_bound on multiset iterators would be linear
        if (it != s.end())
            s.erase(it);
    }
    cout << s.size() << endl;
    return 0;
}
Problem Statement:
For A(n): a0, a1, ..., a(n-1), we need to find the LIS:
the longest subsequence of A(n) such that ai < aj whenever i < j within the subsequence.
For example: 10, 11, 12, 9, 8, 7, 5, 6
LIS will be 10,11,12
This is O(N^2) solution based on DP.
1. Finding subproblems
Consider D(i): the length of the LIS of (a0 to ai) that includes ai as its last element.
2. Recurrence relation
D(i) = 1 + max(D(j)) over all j < i with aj < ai (or D(i) = 1 if no such j exists)
3. Base case
D(0) = 1;
Check out link for the code:
https://innosamcodes.wordpress.com/2013/07/06/longest-increasing-subsequence/
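For reference, a minimal sketch of this O(N^2) DP (illustrative code, not the code from the linked post):
#include <algorithm>
#include <vector>

// Returns the length of the longest strictly increasing subsequence of a.
int lisLength(const std::vector<int>& a) {
    const int n = (int)a.size();
    if (n == 0) return 0;
    std::vector<int> D(n, 1);                // D[i] = LIS length ending exactly at a[i]
    int best = 1;
    for (int i = 1; i < n; i++) {
        for (int j = 0; j < i; j++)
            if (a[j] < a[i])
                D[i] = std::max(D[i], D[j] + 1);
        best = std::max(best, D[i]);
    }
    return best;
}
// lisLength({10, 11, 12, 9, 8, 7, 5, 6}) == 3   (the subsequence 10, 11, 12)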

Pick a unique random subset from a set of unique values

C++. Visual Studio 2010.
I have a std::vector V of N unique elements (heavy structs). How can I efficiently pick M random, unique elements from it?
E.g. V contains 10 elements: { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } and I pick three...
4, 0, 9
0, 7, 8
But NOT this: 0, 5, 5 <--- not unique!
STL is preferred. So, something like this?
std::minstd_rand gen; // linear congruential engine??
std::uniform_int<int> unif(0, v.size() - 1);
gen.seed((unsigned int)time(NULL));
// ...?
// Or is there a good solution using std::random_shuffle for heavy objects?
Create a random permutation of the range 0, 1, ..., N - 1 and pick the first M of them; use those as indices into your original vector.
A random permutation is easily made with the standard library by using std::iota together with std::random_shuffle:
std::vector<Heavy> v; // given
std::vector<unsigned int> indices(v.size());
std::iota(indices.begin(), indices.end(), 0);
std::random_shuffle(indices.begin(), indices.end());
// use v[indices[0]], v[indices[1]], ..., v[indices[M-1]]
You can supply random_shuffle with a random number generator of your choice; check the documentation for details.
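If your standard library supports it (std::random_shuffle was deprecated in later standards, and VS2010's C++0x support may vary), the same idea works with std::shuffle and an engine; a small sketch with an illustrative helper name:
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns a randomly shuffled vector of the indices 0 .. n-1.
std::vector<unsigned int> shuffledIndices(std::size_t n) {
    std::vector<unsigned int> indices(n);
    std::iota(indices.begin(), indices.end(), 0);
    std::mt19937 gen(std::random_device{}());
    std::shuffle(indices.begin(), indices.end(), gen);
    return indices;
}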
Most of the time, the method provided by Kerrek is sufficient. But if N is very large, and M is orders of magnitude smaller, the following method may be preferred.
Create a set of unsigned integers, and add random numbers to it in the range [0,N-1] until the size of the set is M. Then use the elements at those indexes.
std::set<unsigned int> indices;
while (indices.size() < M)
indices.insert(RandInt(0,N-1));
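A minimal self-contained version of this approach using <random> in place of the RandInt placeholder (the function name is illustrative):
#include <random>
#include <set>

// Picks M distinct indices uniformly from [0, N-1]; assumes M <= N.
std::set<unsigned int> pickIndices(unsigned int N, unsigned int M) {
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<unsigned int> dist(0, N - 1);
    std::set<unsigned int> indices;
    while (indices.size() < M)
        indices.insert(dist(gen));      // duplicates are silently rejected by the set
    return indices;
}
You would then copy or reference V[i] for each index i in the returned set.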
Since you wanted it to be efficient, I think you can get an amortised O(M), assuming you have to perform that operation a lot of times. However, this approach is not reentrant.
First of all create a local (i.e. static) vector of std::vector<...>::size_type (i.e. unsigned will do) values.
When you enter your function, grow the vector to match N, filling it with the values from its old size up to N-1:
static std::vector<unsigned> indices;
if (indices.size() < N) {
    indices.reserve(N);
    for (unsigned i = indices.size(); i < N; i++) {
        indices.push_back(i);
    }
}
Then, randomly pick M unique numbers from that vector:
std::vector<unsigned> result;
result.reserve(M);
for (unsigned i = 0; i < M; i++) {
    unsigned const r = getRandomNumber(0, N-i); // random index < N-i (the upper bound is exclusive)
    result.push_back(indices[r]);
    indices[r] = indices[N-i-1];
    indices[N-i-1] = r;
}
Now, your result is sitting in the result vector.
However, you still have to repair your changes to indices for the next run, so that indices is monotonic again:
for (unsigned i = N-M; i < N; i++) {
    // restore previously changed values
    indices[indices[i]] = indices[i];
    indices[i] = i;
}
But this approach is only useful if you have to run that algorithm a lot and N doesn't grow so big that you cannot live with indices eating up RAM all the time.
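Putting the pieces together, a rough self-contained sketch of this approach, with getRandomNumber replaced by <random> (assumes M <= N; the function name is illustrative):
#include <random>
#include <vector>

// Picks M distinct indices from [0, N-1] in amortised O(M) time,
// reusing a static index pool between calls (hence not reentrant).
std::vector<unsigned> pickUnique(unsigned N, unsigned M) {
    static std::vector<unsigned> indices;
    static std::mt19937 gen(std::random_device{}());
    for (unsigned i = (unsigned)indices.size(); i < N; i++)
        indices.push_back(i);                              // grow the pool up to N on demand

    std::vector<unsigned> result;
    result.reserve(M);
    for (unsigned i = 0; i < M; i++) {
        std::uniform_int_distribution<unsigned> dist(0, N - i - 1);  // index in [0, N-i)
        unsigned const r = dist(gen);
        result.push_back(indices[r]);
        indices[r] = indices[N - i - 1];                   // move an unpicked value into slot r
        indices[N - i - 1] = r;                            // remember which slot was picked from
    }
    for (unsigned i = N - M; i < N; i++) {                 // repair: make the pool monotonic again
        indices[indices[i]] = indices[i];
        indices[i] = i;
    }
    return result;
}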