Sampling Data into two Groups


I am seeking help to make the code below more efficient. I am not satisfied with it, though it works. There is a bug to be fixed (currently irrelevant). I am using the <random> header and stable_partition for the first time.
The Problem definition/specification:
I have a population (vector) of numerical data (float values). I want to create two RANDOM samples (2 vectors) based on a user-specified percentage, i.e. popu_data = 30% Sample1 + 70% Sample2, where the 30% is given by the user. I haven't implemented the percentage yet, but that part is trivial.
The Problem in Programming: I am able to create the 30% sample from the population. Creating the second vector (sample2, 70%) is my problem: because the 30% has to be selected randomly, I have to keep track of the chosen indexes in order to remove them, and somehow I can't come up with more efficient logic than the one I implemented.
My Logic (I am NOT happy with it): In the population data, the values at the random indexes are replaced with a unique value (here 0.5555). Later I learnt about the stable_partition function, which I use to compare each value of the population against 0.5555; the values that are not 0.5555 are copied into the new Sample2, which complements Sample1.
Further to this: how can I make this generic, i.e. split a population into N sub-samples, each a user-defined percentage of the population?
Thank you for any help. I tried vector erase, remove, copy etc., but I couldn't get them to work the way the current code does. I am looking for better, more efficient logic and STL usage.
#include <random>
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
bool Is05555 (float i){
if ( i > 0.5560 ) return true;
return false;
}
int main()
{
random_device rd;
mt19937 gen(rd());
uniform_real_distribution<> dis(1, 2);
vector<float>randVals;
cout<<"All the Random Values between 1 and 2"<<endl;
for (int n = 0; n < 20; ++n) {
float rnv = dis(gen);
cout<<rnv<<endl;
randVals.push_back(rnv);
}
cout << '\n';
random_device rd2;
mt19937 gen2(rd2());
uniform_int_distribution<int> dist(0,19);
vector<float>sample;
vector<float>sample2;
for (int n = 0; n < 6; ++n) {
int idx = dist(gen2); // an index, so int rather than float
sample.push_back(randVals.at(idx));
randVals.at(idx) = 0.5555;
}
cout<<"Random Values between 1 and 2 with 0.5555 a Unique VAlue"<<endl;
for (int n = 0; n < 20; ++n) {
cout<<randVals.at(n)<<" ";
}
cout << '\n';
std::vector<float>::iterator bound;
bound = std::stable_partition (randVals.begin(), randVals.end(), Is05555);
for (std::vector<float>::iterator it=randVals.begin(); it!=bound; ++it)
sample2.push_back(*it);
cout<<sample.size()<<","<<sample2.size()<<endl;
cout<<"Random Values between 1 and 2 Subset of 6 only: "<<endl;
for (int n = 0; n < sample.size(); ++n) {
cout<<sample.at(n)<<" ";
}
cout << '\n';
cout<<"Random Values between 1 and 2 - Remaining: "<<endl;
for (int n = 0; n < sample2.size(); ++n) {
cout<<sample2.at(n)<<" ";
}
cout << '\n';
return 0;
}
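The sentinel approach above has two weak spots: 0.5555 must never occur in real data, and uniform_int_distribution can return the same index twice, in which case Sample1 receives a 0.5555 and Sample2 ends up one element too large. A minimal sketch that avoids both, assuming C++17 for std::sample: draw distinct indices, mark them in a std::vector<bool>, then copy each element into one group or the other. split_by_mask is a hypothetical name, and both groups keep the population's original order.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Split `population` into (sample, rest): `sample` holds `count` elements
// drawn without replacement, `rest` holds everything else. Both groups keep
// the original relative order. If count > size, every element is sampled.
std::pair<std::vector<float>, std::vector<float>>
split_by_mask(const std::vector<float>& population, std::size_t count,
              std::mt19937& gen)
{
    // Indices 0..size-1; std::sample picks `count` distinct ones,
    // so no index can be chosen twice.
    std::vector<std::size_t> all(population.size());
    std::iota(all.begin(), all.end(), std::size_t{0});
    std::vector<std::size_t> chosen;
    chosen.reserve(count);
    std::sample(all.begin(), all.end(), std::back_inserter(chosen), count, gen);

    // Mark the chosen positions instead of overwriting values with a sentinel.
    std::vector<bool> picked(population.size(), false);
    for (std::size_t i : chosen) picked[i] = true;

    std::vector<float> sample, rest;
    for (std::size_t i = 0; i < population.size(); ++i)
        (picked[i] ? sample : rest).push_back(population[i]);
    return {std::move(sample), std::move(rest)};
}
```

For the question's data this would be called as split_by_mask(randVals, 6, gen2), replacing both the sentinel loop and the stable_partition step.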

Given a requirement for an N% sample, with order irrelevant, it's probably easiest to just do something like:
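// std::random_shuffle was removed in C++17; std::shuffle takes an explicit engine (#include <random>)
std::mt19937 gen{std::random_device{}()};
std::shuffle(randVals.begin(), randVals.end(), gen);
std::size_t num = static_cast<std::size_t>(randVals.size() * percent / 100.0);
auto pos = randVals.begin() + (randVals.size() - num);
// get our sample (the last num elements of the shuffled data)
std::vector<float> sample1(pos, randVals.end());
// remove sample from original collection
randVals.erase(pos, randVals.end());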
For some types of items in the array, you could improve this by moving items from the original array to the sample array, but for simple types like float or double, that won't accomplish anything.
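The question also asks how to generalise this to N groups. A minimal sketch of the same shuffle-then-slice idea, assuming non-negative percentages that add up to at most 100 (split_into_groups is a hypothetical name): every group is a consecutive slice of one shuffled copy, so no element can land in two groups, and any rounding leftover goes to the last group.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Split `population` into percents.size() random groups.
// Group g gets floor(size * percents[g] / 100) elements; the last group takes
// whatever is left, so every element ends up in exactly one group.
std::vector<std::vector<float>>
split_into_groups(std::vector<float> population,   // by value: we shuffle a copy
                  const std::vector<double>& percents,
                  std::mt19937& gen)
{
    std::shuffle(population.begin(), population.end(), gen);

    std::vector<std::vector<float>> groups;
    auto first = population.begin();
    for (std::size_t g = 0; g < percents.size(); ++g) {
        const auto remaining = static_cast<std::size_t>(population.end() - first);
        const auto wanted = static_cast<std::size_t>(population.size() * percents[g] / 100.0);
        const std::size_t n = (g + 1 == percents.size()) ? remaining
                                                         : std::min(remaining, wanted);
        const auto last = first + static_cast<std::ptrdiff_t>(n);
        groups.emplace_back(first, last); // consecutive slice of the shuffled data
        first = last;
    }
    return groups;
}
```

Giving the remainder to the last group is a design choice: with 20 values and {30, 20, 50} the groups have 6, 4 and 10 elements, and percentages that do not divide the size evenly never lose or duplicate an element.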

Related

Algorithm for creating an array of 5 unique integers between 1 and 20 [duplicate]

This question already has answers here:
Unique (non-repeating) random numbers in O(1)?
(22 answers)
Closed 1 year ago.
My goal is creating an array of 5 unique integers between 1 and 20. Is there a better algorithm than what I use below?
It works and I think it has a constant time complexity due to the loops not being dependent on variable inputs, but I want to find out if there is a more efficient, cleaner, or simpler way to write this.
int * getRandom( ) {
static int choices[5] = {};
srand((unsigned)time(NULL));
for (int i = 0; i < 5; i++) {
int generated = 1 + rand() % 20;
for (int j = 0; j < 5; j++){
if(choices[j] == generated){
i--;
}
}
choices[i] = generated;
cout << choices[i] << endl;
}
return choices;
}
Thank you so much for any feedback. I am new to algorithms.

The simplest approach I can think of is to create an array of all 20 numbers with choices[i] = i+1, shuffle it with std::random_shuffle and take the first 5 elements. It might be slower, but it is hard to introduce bugs, and given the small fixed size it might be fine.
BTW, your version has a bug. You execute the line choices[i] = generated; even when generated was already found in the array, which can create a duplicate of that value. Say i = 3 and generated equals the element at j = 0: you decrement i and assign choices[2], which then becomes equal to choices[0].
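A sketch of the shuffle approach from this answer. std::random_shuffle was deprecated in C++14 and removed in C++17, so this version uses std::shuffle with a <random> engine instead; five_unique_of_twenty is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <random>

// Five distinct integers from [1, 20]: fill 1..20, shuffle, keep the first 5.
std::array<int, 5> five_unique_of_twenty(std::mt19937& gen)
{
    std::array<int, 20> pool{};
    std::iota(pool.begin(), pool.end(), 1);      // 1, 2, ..., 20
    std::shuffle(pool.begin(), pool.end(), gen); // uniform random order
    std::array<int, 5> result{};
    std::copy(pool.begin(), pool.begin() + 5, result.begin());
    return result;
}
```

Seed the engine once, e.g. std::mt19937 gen{std::random_device{}()};, rather than reseeding on every call as the srand(time(NULL)) inside getRandom() does.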

C++17 code with explanation of why and what.
If you have any questions left don't hesitate to ask, I'm happy to help
#include <algorithm> // std::find
#include <iostream>
#include <random>
#include <vector>    // std::vector
// container for random numbers.
// by putting the random numbers + generator inside a class
// we get better control over the lifecycle.
// e.g. what gets called when.
// Now we know the generation gets called at constructor time.
class integer_random_numbers
{
public:
// use std::size_t for things used in loops and must be >= 0
integer_random_numbers(std::size_t number, int minimum, int maximum)
{
// seed the Mersenne Twister from std::random_device so every run differs
// look at documentation for <random>, it is the C++ way for random numbers
std::mt19937 generator(std::random_device{}());
// make sure all numbers have an equal chance. range is inclusive
std::uniform_int_distribution<int> distribution(minimum, maximum);
// m_values is a std::vector, an array whose
// length can be changed at runtime.
for (std::size_t n = 0; n < number; ++n)
{
int new_random_value{};
// generate unique number
do
{
new_random_value = distribution(generator);
} while (std::find(m_values.begin(), m_values.end(), new_random_value) != m_values.end());
m_values.push_back(new_random_value);
}
}
// give the class an array index operator
// so we can use it as an array later
int& operator[](const std::size_t index)
{
// use bounds checking from std::vector
return m_values.at(index);
}
// return the number of numbers we generated
std::size_t size() const noexcept
{
return m_values.size();
}
private:
// use a vector, since we specify the size at runtime.
std::vector<int> m_values;
};
// Create a static instance of the class, this will
// run the constructor only once (at start of program)
static integer_random_numbers my_random_numbers{ 5, 1, 20 };
int main()
{
// And now we can use my_random_numbers as an array
for (std::size_t n = 0; n < my_random_numbers.size(); ++n)
{
std::cout << my_random_numbers[n] << std::endl;
}
}

1. Generate 5 random numbers from 1 to 16, allowing duplicates.
2. Sort them.
3. Add 1 to the 2nd number, 2 to the 3rd, 3 to the 4th, and 4 to the 5th.
The last step transforms the range from [1,16] to [1,20] by remapping the possible sequences with duplicates into sequences with unique integers. [1,2,10,10,16], for example, becomes [1,3,12,13,20]. The transformation is completely bijective, so you never need to discard and resample.
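The three steps above can be sketched as follows; spread_sorted and five_unique_by_offset are hypothetical names. One caveat worth checking before relying on this: the remapping is a bijection between sorted 5-tuples from [1,16] and 5-element subsets of [1,20] (there are C(20,5) = 15504 of each), but five independent draws followed by a sort do not produce every sorted tuple with equal probability, because a tuple with repeated values arises from fewer orderings of the draws. The resulting numbers are always distinct, but the subsets are not uniformly distributed.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>

// Steps 2 and 3: sort, then add 0, 1, 2, 3, 4 so equal neighbours become distinct.
std::array<int, 5> spread_sorted(std::array<int, 5> v)
{
    std::sort(v.begin(), v.end());
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] += static_cast<int>(i);
    return v;
}

// Step 1 plus the remapping: 5 values from [1, 16] become 5 distinct values in [1, 20].
std::array<int, 5> five_unique_by_offset(std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(1, 16); // duplicates allowed
    std::array<int, 5> v{};
    for (int& x : v) x = dist(gen);
    return spread_sorted(v);
}
```

spread_sorted({1, 2, 10, 10, 16}) reproduces the example from the answer and returns {1, 3, 12, 13, 20}.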

Efficiently use Eigen for repeated sparse matrix assembly in nonlinear finite element code

I am trying to use Eigen to efficiently assemble a stiffness matrix for nonlinear finite element computations.
From my finite element discretization I can extract my sparsity pattern exactly. Therefore I can just use:
mat.reserve(nnz);
mat.setFromTriplets(TripletList.begin(), TripletList.end());
as proposed in http://eigen.tuxfamily.org/dox/group__SparseQuickRefPage.html.
The questions that arise are:
1. Due to the nonlinear nature I have to refill my matrix very often. Should I therefore store all contributions in triplets again and call mat.setFromTriplets(...) again and again?
2. If I reuse mat.setFromTriplets(...), can I somehow exploit the fact that I always evaluate my element matrices for the assembly in the same order, so the indices in the triplets never change, only the values? Then the "search in memory" could be circumvented, since I could store the position where each value goes in a new array.
3. If mat.coeffRef(i,j) is faster, can I exploit the aforementioned fact there?
One extra question (lower priority): is it possible to store and assemble 3 matrices with the same sparsity pattern efficiently, e.g. if I have to do it in a loop? For example, a matrix wrapper holding one SparseMatrix, from which I get the matrices as M1=mat[0], M2=mat[1], M3=mat[2], where mat[i] returns the i-th matrix and M1, M2 and M3 are e.g. SparseMatrix<double> M1(1000,1000).
The general setup is the following (for questions 1 to 3 only M1 appears):
std::vector< Eigen::Triplet<double> > tripletListA; // triplets differ only in the values and not in the indices
std::vector< Eigen::Triplet<double> > tripletListB;
std::vector< Eigen::Triplet<double> > tripletListC;
SparseMatrix<double> M1(1000,1000);
SparseMatrix<double> M2(1000,1000);
SparseMatrix<double> M3(1000,1000);
//Reserve space in triplets
tripletListA.reserve(nnz);
tripletListB.reserve(nnz);
tripletListC.reserve(nnz);
//Reserve space in matrices
M1.reserve(nnz);
M2.reserve(nnz);
M3.reserve(nnz);
//fill triplet list with zeros
M1.setFromTriplets(tripletListA.begin(), tripletListA.end());
M2.setFromTriplets(tripletListB.begin(), tripletListB.end());
M3.setFromTriplets(tripletListC.begin(), tripletListC.end());
for (int i=0; i<1000; i++) {
//Fill triplets
M1.setFromTriplets(tripletListA.begin(), tripletListA.end()); //or use coeffRef?
M2.setFromTriplets(tripletListB.begin(), tripletListB.end());
M3.setFromTriplets(tripletListC.begin(), tripletListC.end());
//solve
//update
}
Thank you and regards,
Alex
UPDATE:
Thank you for your answers. Initially the order in which I access the nonzeros is quite arbitrary. But since I'm interested in an iterative scheme, I am thinking about recording this arbitrary ordering once and constructing an operator which takes care of it. This operator can be constructed (at least in my mind) from the initially constructed triplet list.
SparseMatrix<double> mat(rows,cols);
std::vector<double> valuevector(nnz);
//Initially construction
std::vector< Eigen::Triplet<double> > tripletList;
//naive fill of tripletList
//Sorting of entries and identifying double entries in tripletList from col and row values
//generating from this information operator P
for (int i=0; i<1000; i++)
{
//naive refill of tripletList
valuevector = P*tripletList.value(); //build the vector in efficient ordering from the triplet values (tripletList.value() does not exist for std::vector, but I hope it is clear what I have in mind)
for (int k=0; k<mat.outerSize(); ++k)
for (SparseMatrix<double>::InnerIterator it(mat,k); it; ++it)
it.valueRef() =valuevector(it);
}
I think of the operator P simply as a matrix with ones and zeros at the appropriate places.
The question remains whether this is even a more efficient procedure.
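The operator P described above does not have to be stored as a 0/1 matrix: since it only reorders and sums values, an integer array holding, for every triplet k, the position of its (row, col) entry in the value array carries the same information, and applying it is a single scatter-add pass. The sketch below shows that idea in plain C++ without Eigen; RefillPlan, build_plan and refill are hypothetical names. It assumes column-major compressed storage with row indices sorted inside each column, the same layout the benchmark below assumes when it sorts by (col, row) and writes into mat1.coeffs(); with Eigen, the refilled array would be copied into mat.coeffs() after an initial setFromTriplets call has fixed the pattern.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// For every triplet k, slot[k] is the position of its (row, col) entry in a
// column-major compressed value array. Duplicate (row, col) pairs share a slot.
struct RefillPlan {
    std::vector<std::pair<int, int>> entries; // unique (col, row), column-major order
    std::vector<std::size_t> slot;            // slot[k] = index into the value array
};

// Preprocessing, done once: sort the unique positions, then locate each triplet.
RefillPlan build_plan(const std::vector<int>& rows, const std::vector<int>& cols)
{
    RefillPlan plan;
    for (std::size_t k = 0; k < rows.size(); ++k)
        plan.entries.emplace_back(cols[k], rows[k]);
    std::sort(plan.entries.begin(), plan.entries.end());
    plan.entries.erase(std::unique(plan.entries.begin(), plan.entries.end()),
                       plan.entries.end());

    plan.slot.resize(rows.size());
    for (std::size_t k = 0; k < rows.size(); ++k) {
        auto it = std::lower_bound(plan.entries.begin(), plan.entries.end(),
                                   std::make_pair(cols[k], rows[k]));
        plan.slot[k] = static_cast<std::size_t>(it - plan.entries.begin());
    }
    return plan;
}

// Every nonlinear iteration: no searching, just a scatter-add of the new values.
void refill(const RefillPlan& plan, const std::vector<double>& tripletValues,
            std::vector<double>& values)
{
    values.assign(plan.entries.size(), 0.0);
    for (std::size_t k = 0; k < tripletValues.size(); ++k)
        values[plan.slot[k]] += tripletValues[k];
}
```

The sort and one binary search per triplet are paid once; each refill is then linear in the number of triplets.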
UPDATE-2: Benchmark:
I tried to implement my ideas in a code snippet. I first generate a random triplet list. It is constructed to have a sparsity of about 95%, and additionally some values in the list are duplicated to mimic duplicate triplets which write to the same position in the sparse matrix. These values are then inserted using different approaches. The first is the setFromTriplets approach; the second and third try to exploit the known structure.
The second and third approaches record the ordering of the triplet list. This information is then used to write the values directly into the raw mat1.coeffs() vector.
#include <iostream>
#include <Eigen/Sparse>
#include <random>
#include <fstream>
#include <chrono>
#include <cmath>     // pow
#include <vector>
#include <algorithm> // sort
#include <utility>   // pair
using namespace std::chrono;
using namespace Eigen;
using namespace std;
typedef Eigen::Triplet<double> T;
void findDuplicates(vector<pair<int, int> > &dummypair, Ref<VectorXi> multiplicity) {
// dummypair is sorted: count how many consecutive entries share each (col, row) pair
int pairCount = 0;
pair<int, int> currentPair;
const int total = static_cast<int>(dummypair.size());
for (int i = 0; i < multiplicity.size(); ++i) {
currentPair = dummypair[pairCount];
// bounds check first, so the last group does not read past the end
while (pairCount + multiplicity[i] < total &&
currentPair == dummypair[pairCount + multiplicity[i]]) {
multiplicity[i]++;
}
pairCount += multiplicity[i];
}
}
typedef Matrix<duration<double, std::milli>, Dynamic, Dynamic> MatrixXtime;
int main() {
//init random generators
std::default_random_engine gen;
std::uniform_real_distribution<double> dist(0.0, 1.0);
int sizesForTest = 5;
int measures = 6;
MatrixXtime timeArray(sizesForTest, measures);
cout << "TripletTime NestetTime LNestedTime " << endl;
for (int m = 0; m < sizesForTest; ++m) {
int rows = pow(10, m + 1);
int cols = rows;
std::uniform_int_distribution<int> distentryrow(0, rows - 1);
std::uniform_int_distribution<int> distentrycol(0, cols - 1);
std::vector<T> tripletList;
SparseMatrix<double> mat1(rows, cols);
// SparseMatrix<double> mat2(rows,cols);
// SparseMatrix<double> mat3(rows,cols);
//generate sparsity pattern of matrix with roughly 5% fill-in (value < 0.05)
tripletList.emplace_back(3, 0, 15);
for (int i = 0; i < rows; ++i)
for (int j = 0; j < cols; ++j) {
auto value = dist(gen); //generate random number
auto value2 = dist(gen); //generate random number
auto value3 = dist(gen); //generate random number
if (value < 0.05) {
auto rowindex = distentryrow(gen);
auto colindex = distentrycol(gen);
tripletList.emplace_back(rowindex, colindex, value); //below the threshold: insert it
//duplicate roughly every third entry to mimic entries which appear more than once
if (value2 < 0.3333333333333333333333)
tripletList.emplace_back(rowindex, colindex, value);
//add a further copy to roughly every fourth entry
if (value3 < 0.25)
tripletList.emplace_back(rowindex, colindex, value);
}
}
tripletList.emplace_back(3, 0, 9);
int numberOfValues = tripletList.size();
//initially set all matrices from triplet to allocate space and sparsity pattern
mat1.setFromTriplets(tripletList.begin(), tripletList.end());
// mat2.setFromTriplets(tripletList.begin(), tripletList.end());
// mat3.setFromTriplets(tripletList.begin(), tripletList.end());
int nnz = mat1.nonZeros();
//reset all entries back to zero to fill in later
mat1.coeffs().setZero();
// mat2.coeffs().setZero();
// mat3.coeffs().setZero();
//record the sorting of entries for repetitive insertion
VectorXi internalIndex(numberOfValues);
vector<pair<int, int> > dummypair(numberOfValues);
VectorXd valuelist(numberOfValues);
for (int l = 0; l < numberOfValues; ++l) {
valuelist(l) = tripletList[l].value();
}
//init internalindex and dummy pair
internalIndex = Eigen::VectorXi::LinSpaced(numberOfValues, 0.0, numberOfValues - 1);
for (int i = 0; i < numberOfValues; ++i) {
dummypair[i].first = tripletList[i].col();
dummypair[i].second = tripletList[i].row();
}
auto start = high_resolution_clock::now();
// sort the vector internalIndex based on the dummypair
sort(internalIndex.begin(), internalIndex.end(), [&](int i, int j) {
return dummypair[i].first < dummypair[j].first ||
(dummypair[i].first == dummypair[j].first && dummypair[i].second < dummypair[j].second);
});
auto stop = high_resolution_clock::now();
timeArray(m, 3) = (stop - start) / 1000;
start = high_resolution_clock::now();
sort(dummypair.begin(), dummypair.end());
stop = high_resolution_clock::now();
timeArray(m, 4) = (stop - start) / 1000;
start = high_resolution_clock::now();
VectorXi dublicatecount(nnz);
dublicatecount.setOnes();
findDuplicates(dummypair, dublicatecount);
stop = high_resolution_clock::now();
timeArray(m, 5) = (stop - start) / 1000;
dummypair.clear();
//calculate vector containing all indices of triplet
//therefore vector[k] is the vectorXi containing the entries of triples which should be written at dof k
int indextriplet = 0;
int multiplicity = 0;
vector<VectorXi> listofentires(mat1.nonZeros());
for (int k = 0; k < mat1.nonZeros(); ++k) {
multiplicity = dublicatecount[k];
listofentires[k] = internalIndex.segment(indextriplet, multiplicity);
indextriplet += multiplicity;
}
//========================================
//Here the nonlinear analysis should start; everything before this is preprocessing
//Test1 from triplets
start = high_resolution_clock::now();
mat1.setFromTriplets(tripletList.begin(), tripletList.end());
stop = high_resolution_clock::now();
timeArray(m, 0) = (stop - start) / 1000;
mat1.coeffs().setZero();
//Test2 use internalIndex but calculate listofentires on the fly
indextriplet = 0;
start = high_resolution_clock::now();
for (int k = 0; k < mat1.nonZeros(); ++k) {
multiplicity = dublicatecount[k];
mat1.coeffs()[k] += valuelist(internalIndex.segment(indextriplet, multiplicity)).sum();
indextriplet += multiplicity;
}
stop = high_resolution_clock::now();
timeArray(m, 1) = (stop - start) / 1000;
mat1.coeffs().setZero();
//Test3 directly use listofentires
start = high_resolution_clock::now();
for (int k = 0; k < mat1.nonZeros(); ++k)
mat1.coeffs()[k] += valuelist(listofentires[k]).sum();
stop = high_resolution_clock::now();
timeArray(m, 2) = (stop - start) / 1000;
std::ofstream file("test.txt");
if (file.is_open()) {
file << mat1 << '\n';
}
cout << "Size: " << rows << ": ";
for (int n = 0; n < measures; ++n)
cout << timeArray(m, n).count() << " ";
cout << endl;
}
return 0;
}
If I run this example on my i5-6600K (3.5 GHz, 16 GB RAM), I get the following results (times in seconds):
Size     Triplet   Nested     LessNested  Sort_intIndex  Sort_dum_pair  findDuplicates
10       1e-06     1e-06      2e-06       1e-06          1e-06          1e-06
100      2.8e-05   4e-06      1.4e-05     5e-05          4.2e-05        1e-05
1000     0.003     0.000416   0.001489    0.01012        0.00627        0.000635
10000    0.426     0.093911   0.48912     1.5389         0.780676       0.061881
100000   337.799   99.0801    37.3656     292.397        87.4488        0.79996
The first three columns give the computation times of the different approaches, and columns 4 to 6 give the times of the different preprocessing steps.
For the size of 100000 rows and columns my RAM fills up quickly, so the last table row should be taken with care. Here the fastest method changes from 2 to 3.
My questions are: is this approach going in the right direction to improve efficiency? Or is it the completely wrong direction, since, for example, an assembly time of 0.48 s for a size of 10000 seems a bit high?
Additionally, the preprocessing steps get expensive very fast; is there a better way to construct the ordering of the matrix? And as a last question: is the benchmarking done correctly?
Thanks for your time,
Alex

Picking 6 random unique numbers

I have a problem getting this to work. I am meant to pick 6 unique numbers between 1 and 49. I have a function doing this correctly, but I am struggling to check the array for duplicates and replace them.
srand(static_cast<unsigned int>(time(NULL))); // Seeds a random number
int picked[6];
int number,i,j;
const int MAX_NUMBERS = 6;
for (i = 0; i < MAX_NUMBERS; i++)
{
number = numberGen();
for (int j = 0; j < MAX_NUMBERS; j++)
{
if (picked[i] == picked[j])
{
picked[j] = numberGen();
}
}
}
My number generator just creates a random number between 1 and 49, which I think works OK. I have just started with C++ and any help would be great.
int numberGen()
{
int number = rand();
int target = (number % 49) + 1;
return target;
}

C++17 sample
C++17 provides an algorithm for exactly this (go figure):
std::sample
template< class PopulationIterator, class SampleIterator,
class Distance, class UniformRandomBitGenerator >
SampleIterator sample( PopulationIterator first, PopulationIterator last,
SampleIterator out, Distance n,
UniformRandomBitGenerator&& g);
(since C++17)
Selects n elements from the sequence [first; last) such that each
possible sample has equal probability of appearance, and writes those
selected elements into the output iterator out. Random numbers are
generated using the random number generator g. [...]
constexpr int min_value = 1;
constexpr int max_value = 49;
constexpr int picked_size = 6;
constexpr int size = max_value - min_value + 1;
// fill array with [min_value, max_value] sequence
std::array<int, size> numbers{};
std::iota(numbers.begin(), numbers.end(), min_value);
// select 6 random numbers
std::array<int, picked_size> picked{};
std::sample(numbers.begin(), numbers.end(), picked.begin(), picked_size,
std::mt19937{std::random_device{}()});
C++11 shuffle
If you can't use C++17 yet then the way to do this is to generate all the numbers in an array, shuffle the array and then pick the first 6 numbers in the array:
// fill array with [min_value, max_value] sequence
std::array<int, size> numbers{};
std::iota(numbers.begin(), numbers.end(), min_value);
// shuffle the array
std::random_device rd;
std::mt19937 e{rd()};
std::shuffle(numbers.begin(), numbers.end(), e);
// (optional) copy the picked ones:
std::array<int, picked_size> picked{};
std::copy(numbers.begin(), numbers.begin() + picked_size, picked.begin());
A side note: please use the new C++11 random library. And prefer std::array to bare C arrays. They don't decay to pointers and provide begin, end, size etc. methods.

Let's break this code down.
for (i = 0; i < MAX_NUMBERS; i++)
We're doing a for-loop with 6 iterations.
number = numberGen();
We're generating a new number, and storing it into the variable number. This variable isn't used anywhere else.
for (int j = 0; j < MAX_NUMBERS; j++)
We're looping through the array again...
if (picked[i] == picked[j])
Checking to see if the two values match (fyi, picked[n] == picked[n] will always match)
picked[j] = numberGen();
And assigning a new random number to the existing value if they do match.
A better approach is to keep generating until the number is not a duplicate of one already picked, and only then assign it to your array. For example:
for (i = 0; i < MAX_NUMBERS; i++)
{
bool isDuplicate;
do
{
isDuplicate = false; // reset on every attempt, otherwise the loop never ends
number = numberGen(); // Generate the number
// Check only the i numbers picked so far (the rest are still uninitialized)
for (int j = 0; j < i; j++)
{
if (number == picked[j])
{
isDuplicate = true;
break; // Duplicate detected
}
}
}
while (isDuplicate); // equivalent to while(isDuplicate == true)
picked[i] = number;
}
Here, we run a do-while loop. The first iteration of the loop will generate a random number, and checks to see if it's a duplicate already in the array. If it is, it re-runs the loop until a non-duplicate is found. Once the loop breaks, we have a valid, non-duplicate number available, and then we assign it to the array.
There are going to be better solutions available as you progress through your course.

Efficient approach: Limited Fisher–Yates shuffle
For drawing n numbers from a pool of m, this approach needs n calls to the random generator (6 in your case) instead of the m-1 (48 in your case) needed when simply shuffling the whole array or vector. So the approach shown below is much more efficient than shuffling the whole array, and it does not require any duplicate checking.
Random numbers can get really expensive, so I thought it a good idea never to generate more of them than necessary. Simply calling rand() repeatedly until a fitting number comes out doesn't seem a good idea.
Repeated draw-and-check gets especially expensive when nearly all of the available numbers need to be drawn.
I wanted it to be stateful, so it doesn't matter how many of the 49 numbers you actually request.
The solution below does no duplicate checking and calls rand() exactly n times for n random numbers. A slight modification of your numberGen was necessary for that. That said, you really should use the <random> library functions instead of rand().
The code below draws all the numbers, just to verify that everything works, but it's easy to see how you would draw only 6 :-)
If you need repeated draws, you can simply add a reset() member function that sets drawn = 0 again. The vector is then in a shuffled state, but that doesn't do any harm.
If you can't afford the range checking in std::vector::at(), you can of course replace it with the index operator[]. But for experimenting with the code I thought at() is the better choice, since an indexing mistake then throws instead of silently reading out of bounds.
Usage:
Create an instance of n_out_of_m using the constructor, which takes the number of available numbers as its argument.
Call draw() repeatedly to draw numbers.
If you call draw() more often than numbers are available, numberGen() is called with a limit of 0, and rand() % 0 is undefined behavior (typically a crash) before at() gets a chance to throw std::out_of_range; if that case can happen, add a check for it in draw().
I hope someone likes this approach.
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
size_t numberGen(size_t limit)
{
size_t number = rand();
size_t target = (number % limit) + 1;
return target;
}
class n_out_of_m {
public:
n_out_of_m(int m) {numbers.reserve(m); for(int i=1; i<=m; ++i) numbers.push_back(i);}
int draw();
private:
std::vector<int> numbers;
size_t drawn = 0;
};
int n_out_of_m::draw()
{
size_t index = numberGen(numbers.size()-drawn) - 1;
std::swap(numbers.at(index), numbers.at(numbers.size()-drawn-1));
drawn++;
return numbers.at(numbers.size()-drawn);
};
int main(int argc, const char * argv[]) {
n_out_of_m my_gen(49);
for(int n=0; n<49; ++n)
std::cout << n << "\t" << my_gen.draw() << "\n";
return 0;
}
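The answer itself recommends the <random> library over rand(). A minimal sketch of the same limited Fisher-Yates idea using std::mt19937 and std::uniform_int_distribution (which also avoids the slight modulo bias of rand() % limit) might look like this; draw_k_of_m is a hypothetical, stateless helper, not a drop-in replacement for the n_out_of_m class. It moves each drawn value to the front of the pool instead of the back, which is equivalent.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Draw k distinct values from 1..m with a partial Fisher-Yates shuffle:
// exactly k random numbers are generated and no duplicate checks are needed.
std::vector<int> draw_k_of_m(int k, int m, std::mt19937& gen)
{
    if (k < 0 || k > m) throw std::invalid_argument("need 0 <= k <= m");
    std::vector<int> pool(static_cast<std::size_t>(m));
    std::iota(pool.begin(), pool.end(), 1); // 1, 2, ..., m
    for (int i = 0; i < k; ++i) {
        // pick one of the not-yet-drawn positions i..m-1 and move it to slot i
        std::uniform_int_distribution<int> pick(i, m - 1);
        std::swap(pool[i], pool[pick(gen)]);
    }
    pool.resize(static_cast<std::size_t>(k)); // the first k slots are the draw
    return pool;
}
```

For the lottery case this is draw_k_of_m(6, 49, gen) with one engine seeded once, e.g. std::mt19937 gen{std::random_device{}()};.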

C++ algorithm optimization: find K combination from N elements

I am pretty new to C++ and am trying to do some HackerRank challenges as a way to work on that.
Right now I am trying to solve Angry Children problem: https://www.hackerrank.com/challenges/angry-children
Basically, it asks for a program that, given a set of N integers, finds the smallest possible "unfairness" of a K-length subset of that set. Unfairness is defined as the difference between the max and min of a K-length subset.
The way I'm going about it now is to find all K-length subsets and calculate their unfairness, keeping track of the smallest unfairness.
I wrote the following C++ program, which seems to solve the problem correctly:
#include <cmath>
#include <cstdio>
#include <iostream>
using namespace std;
int unfairness = -1;
int N, K, minc, maxc, ufair;
int *candies, *subset;
void check() {
ufair = 0;
minc = subset[0];
maxc = subset[0];
for (int i = 0; i < K; i++) {
minc = min(minc,subset[i]);
maxc = max(maxc, subset[i]);
}
ufair = maxc - minc;
if (ufair < unfairness || unfairness == -1) {
unfairness = ufair;
}
}
void process(int subsetSize, int nextIndex) {
if (subsetSize == K) {
check();
} else {
for (int j = nextIndex; j < N; j++) {
subset[subsetSize] = candies[j];
process(subsetSize + 1, j + 1);
}
}
}
int main() {
cin >> N >> K;
candies = new int[N];
subset = new int[K];
for (int i = 0; i < N; i++)
cin >> candies[i];
process(0, 0);
cout << unfairness << endl;
return 0;
}
The problem is that HackerRank requires the program to come up with a solution within 3 seconds and that my program takes longer than that to find the solution for 12/16 of the test cases. For example, one of the test cases has N = 50 and K = 8; the program takes 8 seconds to find the solution on my machine. What can I do to optimize my algorithm? I am not very experienced with C++.

All you have to do is sort all the numbers in ascending order and then take the minimum of a[i + K - 1] - a[i] over all i from 0 to N - K inclusive.
That works because in an optimal subset all numbers are consecutive in the sorted array.
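A sketch of the sort-and-scan idea, assuming the values fit in long long; min_unfairness is a hypothetical name, and the HackerRank input/output handling is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Smallest max-min over all K-element subsets. After sorting, an optimal
// subset is a run of K consecutive elements, so only the windows
// a[i] .. a[i + K - 1] need checking: O(N log N) sort plus an O(N) scan.
// Requires 1 <= k <= a.size().
long long min_unfairness(std::vector<long long> a, std::size_t k)
{
    std::sort(a.begin(), a.end());
    long long best = std::numeric_limits<long long>::max();
    for (std::size_t i = 0; i + k <= a.size(); ++i)
        best = std::min(best, a[i + k - 1] - a[i]);
    return best;
}
```

For N = 50 and K = 8 this inspects 43 windows instead of the roughly 5.4 × 10^8 subsets enumerated by the recursive version.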

One suggestion I'd give is to sort the integer list before selecting subsets. This dramatically reduces the number of subsets you need to examine. In fact, you don't even need to create subsets: simply look at the elements at index i (starting at 0) and i+k-1; the lowest difference over all such pairs within bounds is your answer. So instead of n-choose-k subsets (combinatorial growth) you only have to look at about n windows (linear time), and sorting (n log n) becomes your performance bottleneck.

Generate unique 'random' int values for a given interval [duplicate]

This question already has answers here:
Generating m distinct random numbers in the range [0..n-1]
(11 answers)
Closed 8 years ago.
I need to generate 'random' int values that will be used as array indexes, so they need to be unique within the given interval.
An LFSR seems perfect for this task, but there's a catch: either the array must have size 2^n (in some cases this forces allocating much more memory than required, e.g. data size 2100 needs array size 4096), or generated numbers have to be skipped until a proper value is found (a waste of the LFSR's capabilities; in some cases the index generation time can be noticeable).
I have tried to create a formula to compute the array indexes, but I've failed, especially for small (<120) array sizes.
Is there any optimal (in terms of resources and computing time) solution to this problem?
Thanks in advance for answers!

Maybe this helps.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <cstring>
#include <ctime>
std::vector<int> random_interval_values(int b, int e)
{
int n = e - b;
std::vector<int>result(n);
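// inside-out Fisher-Yates shuffle of 0..n-1 (result[0] starts as 0)
for(int i = 1; i < n ; ++i)
{
int t = rand() % (i + 1); // t is a random value in [0..i]; rand() % i would only ever produce cyclic permutations
result[i] = result[t]; // i-th element takes the value at position t (a no-op when t == i)
result[t] = i; // and position t gets the value i
}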
// increment all values to b.
for(int i = 0; i < n; ++i) result[i] += b;
return result;
}
int main()
{
srand( time (NULL )) ;
int interval_begin = 7;
int interval_end = 15;
// [ interval_begin ... interval_end )
std::vector<int> v = random_interval_values( interval_begin, interval_end);
for(std::size_t i = 0; i < v.size(); ++i)
std::cout << v[i] << ' ';
std::cout << std::endl;
}
