How to get random and unique values from a vector? [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Unique random numbers in O(1)?
Unique random numbers in an integer array in the C programming language
I have a std::vector of unique elements of some undetermined size. I want to fetch 20 unique and random elements from this vector. By 'unique' I mean that I do not want to fetch the same index more than once. Currently the way I do this is to call std::random_shuffle. But this requires me to shuffle the entire vector (which may contain over 1000 elements). I don't mind mutating the vector (I prefer not to though, as I won't need to use thread locks), but most important is that I want this to be efficient. I shouldn't be shuffling more than I need to.
Note that I've looked into passing in a partial range to std::random_shuffle but it will only ever shuffle that subset of elements, which would mean that the elements outside of that range never get used!
Help is appreciated. Thank you!
Note: I'm using Visual Studio 2005, so I do not have access to C++11 features and libraries.

You can use the Fisher–Yates shuffle: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
The Fisher–Yates shuffle (named after Ronald Fisher and Frank Yates), also known as the Knuth shuffle (after Donald Knuth), is an algorithm for generating a random permutation of a finite set—in plain terms, for randomly shuffling the set. A variant of the Fisher–Yates shuffle, known as Sattolo's algorithm, may be used to generate random cycles of length n instead. Properly implemented, the Fisher–Yates shuffle is unbiased, so that every permutation is equally likely. The modern version of the algorithm is also rather efficient, requiring only time proportional to the number of items being shuffled and no additional storage space.
The basic process of Fisher–Yates shuffling is similar to randomly picking numbered tickets out of a hat, or cards from a deck, one after another until there are no more left. What the specific algorithm provides is a way of doing this numerically in an efficient and rigorous manner that, properly done, guarantees an unbiased result.
I think this pseudocode should work (there is a chance of an off-by-one mistake or something so double check it!):
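std::list<T> chosen; // T is vec's element type; optional, since the chosen ones also end up at the back of the vector
for (int i = 0; i < num; ++i) {
    // rand_between(a, b) is assumed to return a uniform integer in [a, b], both ends included
    int index = rand_between(0, vec.size() - i - 1);
    chosen.push_back(vec[index]);
    // move the chosen element behind the shrinking "not yet chosen" range
    swap(vec[index], vec[vec.size() - i - 1]);
}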

You want a random sample of size m from an n-vector:
Let rand(a) return 0..a-1 uniform
for (int i = 0; i < m; i++)
swap(X[i],X[i+rand(n-i)]);
X[0..m-1] is now a random sample.

Use a loop to put random index numbers into a std::set and stop when the size() reaches 20.
std::set<int> indexes;
std::vector<int> choices; // use my_vector's element type here
int max_index = static_cast<int>(my_vector.size());
int wanted = std::min(20, max_index); // never ask for more than exist
while (static_cast<int>(indexes.size()) < wanted)
{
    int random_index = rand() % max_index;
    // insert() reports whether the index was new, so no separate find() is needed
    if (indexes.insert(random_index).second)
    {
        choices.push_back(my_vector[random_index]);
    }
}
The random number generation is the first thing that popped into my head, feel free to use something better.

#include <iostream>
#include <vector>
#include <algorithm>
template<int N>
struct NIntegers {
int values[N];
};
template<int N, int Max, typename RandomGenerator>
NIntegers<N> MakeNRandomIntegers( RandomGenerator func ) {
NIntegers<N> result;
for(int i = 0; i < N; ++i)
{
result.values[i] = func( Max-i );
}
std::sort(&result.values[0], &result.values[0]+N);
for(int i = 0; i < N; ++i)
{
result.values[i] += i;
}
return result;
}
Use example:
// use a better one:
int BadRandomNumberGenerator(int Max) {
return Max>4?4:Max/2;
}
int main() {
NIntegers<100> result = MakeNRandomIntegers<100, 500>( BadRandomNumberGenerator );
for (int i = 0; i < 100; ++i) {
std::cout << i << ":" << result.values[i] << "\n";
}
}
Make the maximum for each number 1 smaller than for the last. Sort them, then bump up each value by the number of integers before it.
The template stuff is just trade dress.

Related

How to choose a random number excluding those which were previously chosen? [duplicate]

I'd like to make a number generator that does not repeat the number it has given out
already (C++).
All I know is:
int randomgenerator(){
int random;
srand(time(0));
random = rand()%11;
return(random);
} // Added this on edition
That function gives me redundant numbers.
I'm trying to create a questionnaire program that gives out 10 questions in a random order and I don't want any of the questions to reappear.
Does anyone know the syntax?
What I would do:
Generate a vector of length N and fill it with values 1,2,...N.
Use std::random_shuffle.
If you have say 30 elements and only want 10, use the first 10 out the vector.
EDIT: I have no idea how the questions are being stored, so.. :)
I am assuming the questions are being stored in a vector or somesuch with random access. Now I have generated 10 random numbers which don't repeat: 7, 4, 12, 17, 1, 13, 9, 2, 3, 10.
I would use those as indices for the vector of questions:
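std::vector<std::string> questions;
// fill with questions
// indices (hypothetical name) holds the shuffled numbers generated above;
// they run from 1 to N, so subtract 1 to get a valid vector index
for(int i = 0; i < number_of_questions; i++)
{
    send_question_and_get_answer(questions[indices[i] - 1]);
}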
You are trying to solve the problem "the wrong way".
Try this instead (supposing you have a vector<int> with question ids, but the same idea will work with whatever you have):
1. Get a random R from 0 to N-1, where N is the number of questions in the container.
2. Add question R to another collection of "selected" questions.
3. If the "selected questions" collection has enough items, you're done.
4. Remove question R from your original container (now N has decreased by 1).
5. Go to 1.
Sounds like you essentially want to shuffle a deck of cards (in this case, the "cards" being the questions, or question numbers).
In C++, I would do:
#include <vector>
#include <algorithm>
std::vector<int> question_numbers;
for (unsigned int i = 0; i < 10; ++i)
question_numbers.push_back(i+1);
std::random_shuffle(question_numbers.begin(), question_numbers.end());
// now dole out the questions based on the shuffled numbers
You do not have to hand out all of the questions, any more than you have to deal out a whole deck of cards every time you play a game. You can, of course, but there's no such requirement.
Create a vector of 10 elements (numbers 1-10), then shuffle it, with std::random_shuffle. Then just iterate through it.
Should look more like this: (Note: does not solve your original problem).
int randomgenerator(){
int random;
// I know this looks redundant compared to % 11,
// but the bottom bits of rand() are less random than the top
// bits, so you get a better distribution like this.
// The + 1 keeps the result in 0..10 even when rand() == RAND_MAX.
random = rand() / (RAND_MAX / 11 + 1);
return random;
}
int main()
{
// srand() goes here.
srand(time(0));
while(true)
{
std::cout << randomgenerator() << "\n";
}
}
A better way to solve the original problem is to pre-generate the numbers so you know that each number will appear only once. Then shuffle the order randomly.
int main()
{
int data[] = { 0,1,2,3,4,5,6,7,8,9,10,11};
int size = sizeof(data)/sizeof(data[0]);
std::random_shuffle(data, data + size);
for(int loop = 0; loop < size; ++loop)
{
std::cout << data[loop] << "\n";
}
}
Why not use some STL to perform the checks for you? The idea:
Create an (initially empty) set of 10 integers that will be the indices of the random questions (they will be distinct as a set forbids duplicate items). Keep pushing random numbers in [0, num_of_questions-1] in there until it grows to a size of 10 (duplicates will get rejected automatically). When you have that set ready, iterate over it and output the questions of the corresponding indexes:
std::vector<std::string> questions = /* I am assuming questions are stored in here */
std::set<int> random_indexes;
/* loop here until you get 10 distinct integers */
while (random_indexes.size() < 10) random_indexes.insert(rand() % questions.size());
for (auto index: random_indexes){
std::cout << questions[index] <<std::endl;
}
I may be missing something, but it seems to me the answers that use shuffling of either questions or indexes perform more computations or use an unnecessary memory overhead.
//non repeating random number generator
for (int derepeater = 0; derepeater < arraySize; derepeater++)
{
for (int j = 0; j < arraySize; j++)
{
for (int i = arraySize - 1; i >= 0; i--) // stay inside [0, arraySize - 1]
{
if (Compare[j] == Compare[i] && j != i)
{
Compare[j] = rand() % upperlimit + 1;
}
}
}
}

How do you: fill & sort the newly-filled vector?

I want to fill a vector with random elements that appear 2 or more times besides one, then sort the said vector.
To try and explain what I meant by this question, I am going to leave you with an example of this type of vector:
vector<int> myVec = {1, 1, 4, 4, 8, 8, 11, 13, 13};
1. Fill it with random elements (1, 4, 8, 11, 13, for example, seem pretty random).
2. Make every element besides one appear two times (see how there's only a single occurrence of 11).
3. Sort it from the smallest number to the biggest.
I've already managed to do step 3 in this way:
sort(myVec.begin(), myVec.end());
for(int i = 0; i < 9; ++i) {
printf("%d", myVec[i]);
}
How would you do step 1 & 2? Some sort of myVec.insert or myVec.push_back trickery that I can't think of or is there a completely different way?
I was originally thinking about myVec.push_back & two for loops (int i = 0; i < nr of elements; ++i) and another loop inside of that (int k = 0; k <= i; ++k) but I must've messed something up (I think that way I would've been able to have the duplicate part done, not sure).
Take an empty vector.
Fill it (push_back) with random numbers (look up a random function online).
Then loop over the existing elements, except the last one, and push_back a copy of each.
Now you can sort it.
Since you want to generate the values first, we can be a bit more efficient and use insertion-sort instead of sorting at the end.
#include <algorithm>
#include <random>
#include <vector>
// Constant to make the code flexible. Doesn't need to be constexpr.
constexpr int num_values = 10;
// First, create the source of randomness.
std::random_device rand_device;
// Then, build an engine for generating the random values.
std::mt19937 mersenne_engine{rand_device()};
// Finally, specify the distribution of values to generate.
std::uniform_int_distribution<int> value_dist{1, 50};
// Now we're finally ready to fill the vector!
std::vector<int> myVec;
// Reserve the space required for all of the values.
const int capacity = (num_values * 2) - 1;
// NOTE: Actual capacity not guaranteed to be equal, might be greater.
myVec.reserve(capacity);
// Pick the random unique value to place into the vector.
myVec.push_back(value_dist(mersenne_engine));
// Loop until enough values are generated.
while (myVec.size() < capacity) {
// Choose a random value.
const int value = value_dist(mersenne_engine);
// Find the insertion position of the new value.
const auto it = std::lower_bound(myVec.begin(), myVec.end(), value);
// Make sure the value doesn't exist yet.
if (it == myVec.end() || *it != value) {
// Then insert it twice with a single call; a second insert(it, value)
// would reuse an iterator that the first insert invalidated.
myVec.insert(it, 2, value);
}
}
Note that this strategy will loop forever if the distribution's range holds fewer distinct values than the number of distinct values you're looking to insert. Hopefully, the code is clear enough for you to make changes to handle that situation.
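One way to handle that situation is to check the range before looping. The sketch below wraps the same insertion strategy in a function; the name `make_paired_vector` and its parameters `lo` and `hi` are assumptions made for this example, not part of the answer:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Builds a sorted vector of num_values distinct values from [lo, hi]:
// one value appears once, every other value appears twice.
std::vector<int> make_paired_vector(int num_values, int lo, int hi,
                                    std::mt19937& engine) {
    // Refuse requests that could never finish: the range must hold at
    // least num_values distinct integers.
    if (num_values < 1 || hi < lo ||
        static_cast<long long>(hi) - lo + 1 < num_values) {
        throw std::invalid_argument("range holds too few distinct values");
    }
    std::uniform_int_distribution<int> value_dist(lo, hi);
    const std::size_t target = static_cast<std::size_t>(num_values) * 2 - 1;
    std::vector<int> result;
    result.reserve(target);
    result.push_back(value_dist(engine));  // the value that stays single
    while (result.size() < target) {
        const int value = value_dist(engine);
        const auto it = std::lower_bound(result.begin(), result.end(), value);
        if (it == result.end() || *it != value) {
            result.insert(it, 2, value);  // two copies, kept in sorted order
        }
    }
    return result;
}
```

When the range is only slightly larger than `num_values`, the loop still finishes but may reject many draws; shuffling the whole range and taking the first `num_values` entries would be a more predictable alternative in that case.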

STL algorithms for pairwise comparison and tracking max/longest sequence

Consider this fairly easy algorithmic problem:
Given an array of (unsorted) numbers, find the length of the longest sequence of adjacent numbers that are increasing. For example, if we have {1,4,2,3,5}, we expect the result to be 3 since {2,3,5} gives the longest increasing sequence of adjacent/contiguous elements. Note that for non-empty arrays, such as {4,3,2,1}, the minimum result will be 1.
This works:
#include <algorithm>
#include <iostream>
#include <vector>
template <typename T, typename S>
T max_adjacent_length(const std::vector<S> &nums) {
if (nums.size() == 0) {
return 0;
}
T maxLength = 1;
T currLength = 1;
for (size_t i = 0; i < nums.size() - 1; i++) {
if (nums[i + 1] > nums[i]) {
currLength++;
} else {
currLength = 1;
}
maxLength = std::max(maxLength, currLength);
}
return maxLength;
}
int main() {
std::vector<double> nums = {1.2, 4.5, 3.1, 2.7, 5.3};
std::vector<int> ints = {4, 3, 2, 1};
std::cout << max_adjacent_length<int, double>(nums) << "\n"; // 2
std::cout << max_adjacent_length<int, int>(ints) << "\n"; // 1
return 0;
}
As an exercise for myself, I was wondering if there is/are STL algorithm(s) that achieve the same effect, thereby (ideally) avoiding the raw for-loop I have. The motivation behind doing this is to learn more about STL algorithms, and practice using abstracted algorithms to make my code more general and reusable.
Here are my ideas, but they don't quite achieve what I'd like.
std::adjacent_find achieves the pairwise comparisons and can be used to find the index of a non-increasing pair, but doesn't easily facilitate the ability to keep a current and maximum length and compare the two. It could be possible to have those state variables as part of my predicate function, but that seems a bit wrong since ideally you'd like your predicate function to not have any side effects, right?
std::adjacent_difference is interesting. One could use it to construct a vector of the differences between adjacent numbers. Then, starting from the second element, depending on if the difference is positive or negative, we could again track the maximum number of consecutive positive differences seen. This is actually quite close to achieving what we'd like. See the example code below:
#include <numeric>
#include <vector>
template <typename T, typename S> T max_adjacent_length(std::vector<S> &nums) {
if (nums.size() == 0) {
return 0;
}
std::adjacent_difference(nums.begin(), nums.end(), nums.begin());
nums.erase(std::begin(nums)); // keep only differences
T maxLength = 1, currLength = 1;
for (auto n : nums) {
currLength = n > 0 ? (currLength + 1) : 1;
maxLength = std::max(maxLength, currLength);
}
return maxLength;
}
The problem here is that we lose out the const-ness of nums if we want to compute the difference, or we have to sacrifice space and create a copy of nums, which is a no-no given the original solution is O(1) space complexity already.
Is there an idea/solution that I have overlooked that achieves what I want in a succinct and readable manner?
In both your code snippets, you are iterating through a range (in the first version, with an index-based-loop, and in the second with a range-for loop). This is not really the kind of code you should be writing if you want to use the standard algorithms, which work with iterators into the range. Instead of thinking of a range as a collection of elements, if you start thinking in terms of pairs of iterators, choosing the right algorithms becomes easier.
For this problem, here's a reasonable way to write this code:
auto max_adjacent_length = [](auto const & v)
{
long max = 0;
auto begin = v.begin();
while (begin != v.end()) {
auto next = std::is_sorted_until(begin, v.end());
max = std::max(std::distance(begin, next), max);
begin = next;
}
return max;
};
Note that you were already on the right track in terms of picking a reasonable algorithm. This could be solved with adjacent_find as well, with just a little more work.
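As a rough illustration of the adjacent_find route, the sketch below looks for the first pair where the left element is greater than or equal to the right one, which marks the end of a strictly increasing run. The function name `longest_increasing_run` is made up for this example:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Length of the longest run of adjacent, strictly increasing elements.
template <typename Container>
std::ptrdiff_t longest_increasing_run(const Container& v) {
    typedef typename Container::value_type Value;
    std::ptrdiff_t best = 0;
    auto run_start = v.begin();
    while (run_start != v.end()) {
        // First position whose successor is not larger, or end() if none.
        auto brk = std::adjacent_find(run_start, v.end(),
                                      std::greater_equal<Value>());
        // The run includes the element at brk itself.
        auto run_end = (brk == v.end()) ? v.end() : std::next(brk);
        best = std::max(best, static_cast<std::ptrdiff_t>(
                                  std::distance(run_start, run_end)));
        run_start = run_end;
    }
    return best;
}
```

The predicate has no side effects; all state lives in the surrounding loop, which addresses the concern raised in the question about stateful predicates. Using `std::greater_equal` means equal neighbours end a run, matching the strict comparison in the original index-based loop.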

Optimize C++ function to generate combinations

I'm trying to write a function that generates all possible combinations of numbers, but it takes too long to run, so I think I have to optimize it.
Problem: generate every set of size "r" from the elements 1 to n, treating reversed orders as the same set (1,2 is equal to 2,1).
Example:
n = 3 //elements: 1,2,3
r = 2 //size of set
Output:
2 3
1 3
1 2
The code I'm using is the following:
void func(int n, int r){
vector <vector <int>> reas;
vector<bool> v(n);
fill(v.end() - r, v.end(), true);
int a = 0;
do {
reas.emplace_back();
for (int i = 0; i < n; ++i) {
if (v[i]) {
reas[a].push_back(i+1);
}
}
a++;
} while (next_permutation(v.begin(), v.end()));
}
If n = 3 and r = 2, the output will be the same as in the example above.
My problem is that with n = 50 and r = 5 the running time is too long, and I need to work with a range of n = 50..100 and r = 1..5.
Is there a way to optimize this function?
Thanks a lot.
Yes, there are several things you can improve significantly. However, you should keep in mind that the number of combinations you are calculating is so large, that it has to be slow if it is to enumerate all subsets. On my machine and with my personal patience budget (100,5) is out of reach.
Given that, here are the things you can improve without completely rewriting your entire algorithm.
First: Cache locality
A vector<vector<T>> will not be contiguous. The nested vector is rather small, so even with preallocation this will always be bad, and iterating over it will be slow because each new sub-vector (and there are a lot) will likely cause a cache miss.
Hence, use a single vector<T>. Your kth subset will then not sit at location k but at k*r. But this is a significant speedup on my machine.
Second: Use a cpu-friendly permutation vector
Your idea to use next_permutation is not bad. But the fact that you use a vector<bool> makes this extremely slow. Paradoxically, using a vector<size_t> is much faster, because it is easier to load a size_t and check it than it is to do the same with a bool.
So, if you take these together the code looks something like this:
auto func2(std::size_t n, std::size_t r){
std::vector<std::size_t> reas;
reas.reserve((1<<r)*n);
std::vector<std::size_t> v(n);
std::fill(v.end() - r, v.end(), 1);
do {
for (std::size_t i = 0; i < n; ++i) {
if (v[i]) {
reas.push_back(i+1);
}
}
} while (std::next_permutation(v.begin(), v.end()));
return reas;
}
Third: Don't press the entire result into one huge buffer
Use a callback to process each sub-set. Thereby you avoid having to return one huge vector. Instead you call a function for each individual sub-set that you found. If you really really need to have one huge set, this callback can still push the sub-sets into a vector, but it can also operate on them in-place.
std::size_t func3(std::size_t n, std::size_t r,
std::function<void(std::vector<std::size_t> const&)> fun){
std::vector<std::size_t> reas;
reas.reserve(r);
std::vector<std::size_t> v(n);
std::fill(v.end() - r, v.end(), 1);
std::size_t num = 0;
do {
reas.clear(); // does not shrink capacity to 0
for (std::size_t i = 0; i < n; ++i) {
if (v[i]) {
reas.push_back(i+1);
}
}
++num;
fun(reas);
} while (std::next_permutation(v.begin(), v.end()));
return num;
}
This yields a speedup of well over 2x in my experiments. But the speedup goes up the more you crank up n and r.
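A further option, which is a different technique from the next_permutation approach used in this answer, is to advance an array of r indices directly, so that each step costs time proportional to r rather than n. The sketch below is only an illustration under that assumption; `for_each_combination` is a made-up name, and the callback is a template parameter rather than std::function:

```cpp
#include <cstddef>
#include <vector>

// Calls fun(subset) for every r-element subset of {1, ..., n}, in
// lexicographic order, and returns the number of subsets visited.
template <typename Callback>
std::size_t for_each_combination(std::size_t n, std::size_t r, Callback fun) {
    if (r > n) return 0;
    std::vector<std::size_t> idx(r);
    for (std::size_t i = 0; i < r; ++i) idx[i] = i + 1;  // first subset: 1..r
    std::size_t count = 0;
    while (true) {
        fun(idx);
        ++count;
        // Find the rightmost position that is not yet at its maximum;
        // the maximum for 0-based position p is n - r + p + 1.
        std::size_t i = r;
        while (i > 0 && idx[i - 1] == n - r + i) --i;
        if (i == 0) break;  // every position is at its maximum: done
        ++idx[i - 1];
        // Reset everything to the right to the smallest valid values.
        for (std::size_t j = i; j < r; ++j) idx[j] = idx[j - 1] + 1;
    }
    return count;
}
```

For n = 3 and r = 2 this visits {1,2}, {1,3}, {2,3}, the same sets as the question's example in a different order. The total number of subsets is unchanged, so large inputs such as (100, 5) remain expensive to enumerate.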
Also: Use compiler optimisation
Use your compiler options to speed up the compiled code as much as possible. On my system the jump from -O0 to -O1 is a speedup of well more than 10x. The jump to -O3 from -O1 is much smaller but still there (about x1.1).
Unrelated to performance, but still relevant: Why is "using namespace std;" considered bad practice?

Writing two versions of a function, one for "clarity" and one for "speed"

My professor assigned homework to write a function that takes in an array of integers and sorts all zeros to the end of the array while maintaining the current order of non-zero ints. The constraints are:
Cannot use the STL or other templated containers.
Must have two solutions: one that emphasizes speed and another that emphasizes clarity.
I wrote up this function attempting for speed:
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;
void sortArray(int array[], int size)
{
int i = 0;
int j = 1;
int n = 0;
for (i = j; i < size;)
{
if (array[i] == 0)
{
n++;
i++;
}
else if (array[i] != 0 && j != i)
{
array[j++] = array[i++];
}
else
{
i++;
n++;
}
}
while (j < size)
{
array[j++] = 0;
}
}
int main()
{
//Example 1
int array[]{20, 0, 0, 3, 14, 0, 5, 11, 0, 0};
int size = sizeof(array) / sizeof(array[0]);
sortArray(array, size);
cout << "Result :\n";
for (int i = 0; i < size; i++)
{
cout << array[i] << " ";
}
cout << endl << "Press any key to exit...";
cin.get();
return 0;
}
It outputs correctly, but;
I don't know what the speed of it actually is, can anyone help me figure out how to calculate that?
I have no idea how to go about writing a function for "clarity"; any ideas?
In my experience, unless you have a very complicated algorithm, speed and clarity come together:
void sortArray(int array[], int size)
{
int item;
int dst = 0;
int src = 0;
// collect all non-zero elements
while (src < size) {
if ((item = array[src++]) != 0) { // assignment is intended: copy, then test
array[dst++] = item;
}
}
// fill the rest with zeroes
while (dst < size) {
array[dst++] = 0;
}
}
Speed comes from a good algorithm. Clarity comes from formatting, naming variables and commenting.
Speed as in complexity?
Since you need to look at every element in the array, and you do so with a single loop over the indexes in the range [0, N), where N denotes the size of the input, your solution is O(N).
Further reading:
Plain English explanation of big O
Determining big O Notation
Regarding clarity
In my honest opinion there shouldn't need to be two alternatives when implementing such functionality as you are presenting. If you rename your variables to more suitable (descriptive) names your current solution should be clear enough to count as both performant and clear.
Your current approach can be written in plain English in a very clear fashion:
pseudo-explanation
set write_index to 0
set number_of_zeroes to 0
for each element in array
    if element is 0
        increase number_of_zeroes by one
    otherwise
        write element value to position denoted by write_index
        increase write_index by one
write number_of_zeroes 0s at the end of array
Having stated the explanation above we can quickly see that sortArray is not a descriptive name for your function, a more suitable name would probably be partition_zeroes or similar.
Adding comments could improve readability, but your current focus should lie in renaming your variables to better express the intent of the code.
(I feel your question is almost off-topic; I am answering it from a Linux perspective; I recommend using Linux to learn C++ programming; you'll adapt my advice to your operating system if you are using something else.)
speed
Regarding speed, you should have two complementary approaches.
The first (somewhat "theoretical") is to analyze (i.e. think about) your algorithm and give (with some proof) its asymptotic time complexity.
The second approach (purely "practical", and often pragmatic) is to benchmark and profile your program. Don't forget to compile with optimizations enabled (e.g. using g++ -Wall -O2 with GCC). Have a benchmark which runs for more than half a second (so it processes a large amount of data, e.g. several million numbers) and repeat it several times (e.g. using the time(1) command on Linux). You could also measure time inside your program using e.g. <chrono> in C++11, or just clock(3). If you read a large array from a file, or build a large array of pseudo-random numbers with <random> or with random(3), you certainly want to measure the time to read or fill the array separately from the time to move zeros out of it. See also time(7).
(You need to process a large amount of data, more than a million items and perhaps many millions, because computers are very fast: a typical "elementary" operation, i.e. a machine instruction, takes less than a nanosecond, and a single run carries a lot of uncertainty.)
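A minimal sketch of the in-program measurement with <chrono>, assuming a C++11 compiler. The function names `move_zeros_to_end` and `time_move_zeros_ms` are invented for the example, and the caller is expected to fill the input beforehand so that generating it is not part of the measured interval:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Compacts the non-zero values to the front, then zero-fills the tail.
void move_zeros_to_end(int* data, std::size_t size) {
    std::size_t write = 0;
    for (std::size_t read = 0; read < size; ++read) {
        if (data[read] != 0) data[write++] = data[read];
    }
    while (write < size) data[write++] = 0;
}

// Returns the wall-clock time, in milliseconds, of one call on `data`.
// The vector is modified in place, so the caller can inspect the result.
double time_move_zeros_ms(std::vector<int>& data) {
    const auto start = std::chrono::steady_clock::now();
    move_zeros_to_end(data.data(), data.size());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In practice one would fill a vector with several million values, call `time_move_zeros_ms` on fresh copies several times, and compare the readings rather than trust a single run. `steady_clock` is used because it is monotonic and unaffected by system clock adjustments.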
clarity
Regarding clarity, it is a bit subjective, but you might try to make your code readable and concise. Adding a few good comments could also help.
Be careful about naming: sorting is not exactly what your program is doing (it is more moving zeros than sorting the array)...
I think this is the best - Of course you may wish to use doxygen or some other
// Shift the non-zeros to the front and put zero in the rest of the array
void moveNonZerosTofront(int *list, unsigned int length)
{
unsigned int from = 0, to = 0;
// This will move the non-zeros
for (; from < length; ++from) {
if (list[from] != 0) {
list[to] = list[from];
to++;
}
}
// So the rest of the array needs to be assigned zero (as we found those on the way)
for (; to < length; ++to) {
list[to] = 0;
}
}