Create Array as Vector with 10 Million elements and assign random numbers without duplicates - c++

I am trying to build a table of randomly generated numbers. Generating them is simple, but making sure the vector contains no duplicates isn't as easy as I thought. So far my code looks like this:
QStringList generatedTable;
srand (QTime::currentTime().msec());
std::vector<int> array(10000000);
for(std::size_t i = 0; i < array.size(); i++){
array[i] = (rand() % 10000000000)+1;
}
It generates numbers just fine, but because I'm generating a large number of array elements (10 million), it will create duplicates even though I'm drawing from 10 billion possible numbers. I already browsed a bit and found something that seems handy, but it doesn't work properly in my program. The code is from another Stack Overflow user:
#include<iostream>
#include<algorithm>
#include<functional>
#include<set>
int main()
{
int arr[] = {0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1};
std::set<int> duplicates;
auto it = std::remove_if(std::begin(arr), std::end(arr), [&duplicates](int i) {
return !duplicates.insert(i).second;
});
size_t n = std::distance(std::begin(arr), it);
for (size_t i = 0; i < n; i++)
std::cout << arr[i] << " ";
return 0;
}
It basically moves all the duplicates to the end of the array, but for some reason it no longer works when the array gets bigger. The code always leaves n at 32,768 as long as the array has more than a million elements. Below a million it drops slightly to ~31,000. So while the code is nice, it doesn't really help me a lot. Does someone have a better option I could use? Since I'm still a Qt and C++ beginner, I don't know how to solve this problem properly.

If you want to sample N integers without replacement from the range [low, high) you can write this:
std::vector<int> array(N); // or reserve space for N elements up front
auto gen = std::mt19937{std::random_device{}()};
std::ranges::sample(std::views::iota(low, high),
array.begin(),
N,
gen);
std::ranges::shuffle(array, gen); // only if you want the samples in random order
Note that this requires C++20, otherwise the range to be sampled from can't be generated lazily, which would require it to be stored in memory. If you want to write something similar before C++20, you can use the range-v3 library.

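A simple and reasonably efficient approach is to keep the numbers generated so far in a binary search tree (std::set is typically implemented as one). Generate a random number in your range and insert it only if it is not already there. Note that each lookup or insertion takes O(log n) time in a balanced tree, so generating n numbers costs O(n log n) overall.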
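A minimal sketch of this idea, using std::set as the search tree. The helper name unique_random, the explicit seed parameter, and the choice of std::mt19937_64 with std::int64_t values (the question's range of 10 billion does not fit in a 32-bit int) are assumptions made for illustration, not part of the original answer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: draws `count` distinct integers from [low, high].
// Assumes high - low does not overflow std::int64_t.
std::vector<std::int64_t> unique_random(std::size_t count, std::int64_t low,
                                        std::int64_t high, std::uint64_t seed)
{
    assert(low <= high);
    assert(count <= static_cast<std::uint64_t>(high - low) + 1); // enough candidates
    std::mt19937_64 gen(seed);
    std::uniform_int_distribution<std::int64_t> dist(low, high);
    std::set<std::int64_t> seen;            // the search tree of values used so far
    std::vector<std::int64_t> result;
    result.reserve(count);
    while (result.size() < count) {
        const std::int64_t candidate = dist(gen);
        if (seen.insert(candidate).second)  // true only if the value was new
            result.push_back(candidate);
    }
    return result;
}
```

With 10 million values drawn from 10 billion candidates, repeated draws should be rare, so the loop is expected to run only slightly more than `count` times. The result keeps the order in which the values were generated.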


How to uniform spread every k values over a collection of n values with k <= n?

I have a collection of k elements. I need to spread them uniformly at random over a collection of n elements, where k <= n.
So for example, with this k-collection (with k = 3):
{ 3, 5, 6 }
and given n = 7, a valid permutation result (with n = 7 elements) could be:
{ 6, 5, 6, 3, 3, 6, 5}
Notice that every item within the k-collection must be used in the permutation.
So this is not a valid result:
{ 6, 3, 6, 3, 3, 6, 6} // it lacks "5"
What's a fast way to accomplish this?
The simplest way I can think of.
Add one of each item to the array. So with your example, your initial array is [3,5,6]. This guarantees that every element is represented at least once.
Then, successively pick an element at random, and add it to the array. Do this n-3 times. (i.e. fill the array with randomly selected items from the list of elements)
Shuffle the array.
This takes O(n) to fill the array, and O(n) to shuffle it.
Let's assume you have a
std::vector<int> input;
that contains the k elements you need to spread and
std::vector<int> output;
that will be filled with n elements.
I used the following approach for a similar problem. (Edit: Thinking about it, here is a simpler and probably faster version than the original)
First we satisfy the condition that every item from input must occur at least once in output. Therefore we put every element from input once into output.
output.resize(n); // fill with n 0's
std::copy(input.begin(), input.end(), output.begin()); // fill k first items
Now we can fill up the remaining n - k slots with random elements from input:
std::random_device rd;
std::mt19937 rand(rd()); // get seed from random device
std::uniform_int_distribution<> dist(0, k - 1); // for random numbers in [0, k-1]
for(size_t i = k; i < n; i++) {
output[i] = input[dist(rand)];
}
At the end shuffle the whole thing, to randomize the position of the first k elements:
std::shuffle(output.begin(), output.end(), rand); // std::random_shuffle cannot take a std::mt19937
I hope this is what you wanted.
You can try just randomly putting values into your n-collection, then verify that it contains all the k-collection values, and try again if it doesn't. However, this is not always fast. You can also put the missing values in random places of the n-collection, but remember to verify again.
Simply make an array of the k elements, say {3,5,6} in the given example. Make a variable counter, which is zero initially. If you want to spread it over n elements, simply iterate over n elements of array with the counter incrementing as
counter=(counter+1)%k;
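A short sketch of that counter-based fill. The helper name spread_round_robin is hypothetical. Note that the result is a fixed repeating pattern, not a random one, so a call to std::shuffle afterwards would be needed if a random arrangement is required:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fills n slots by cycling through the k input values.
std::vector<int> spread_round_robin(const std::vector<int>& input, std::size_t n)
{
    const std::size_t k = input.size();
    assert(k > 0 && k <= n);          // every input value must fit at least once
    std::vector<int> output(n);
    std::size_t counter = 0;
    for (std::size_t i = 0; i < n; ++i) {
        output[i] = input[counter];
        counter = (counter + 1) % k;  // wrap around after the last input value
    }
    return output;
}
```

Because k <= n, the first pass through the input already places every value once, which satisfies the requirement that no value is missing.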

Performance optimization nested loops

I am implementing a rather complicated code and in one of the critical sections I need to basically consider all the possible strings of numbers following a certain rule. The naive implementation to explain what I do would be such a nested loop implementation:
std::array<int,3> max = { 3, 4, 6};
for(int i = 0; i <= max.at(0); ++i){
for(int j = 0; j <= max.at(1); ++j){
for(int k = 0; k <= max.at(2); ++k){
DoSomething(i, j, k);
}
}
}
Obviously I actually need more nested for and the "max" rule is more complicated but the idea is clear I think.
I implemented this idea using a recursive function approach:
std::array<int,3> max = { 3, 4, 6};
std::array<int,3> index = {0, 0, 0};
int total_depth = 3;
recursive_nested_for(0, index, max, total_depth);
where
void recursive_nested_for(int depth, std::array<int,3>& index,
std::array<int,3>& max, int total_depth)
{
if(depth != total_depth){
for(int i = 0; i <= max.at(depth); ++i){
index.at(depth) = i;
recursive_nested_for(depth+1, index, max, total_depth);
}
}
else
DoSomething(index);
}
In order to save as much time as possible, I declare all the variables I use as globals in the actual code.
Since this part of the code takes really long, is it possible to do anything to speed it up?
I would also be open to writing 24 nested for loops if necessary, to avoid the overhead at least!
I thought that maybe an approach like expression templates, to actually generate these nested for loops at compile time, could be more elegant. But is it possible?
Any suggestion would be greatly appreciated.
Thanks to all.
The recursive_nested_for() is a nice idea. It's a bit inflexible as it is currently written. However, you could use std::vector<int> for the array dimensions and indices, or make it a template to handle any size std::array<>. The compiler might be able to inline all recursive calls if it knows how deep the recursion is, and then it will probably be just as efficient as the three nested for-loops.
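One way the template idea could look, as a sketch only. It assumes C++17 (for if constexpr), and the name static_nested_for and the callable parameter body (standing in for DoSomething) are hypothetical. The bounds are inclusive, as in the question:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical compile-time variant of recursive_nested_for.
// Depth is a template parameter, so every recursion level is a separate
// instantiation that the compiler is free to inline.
template <std::size_t Depth = 0, std::size_t N, typename F>
void static_nested_for(std::array<int, N>& index,
                       const std::array<int, N>& max, F&& body)
{
    if constexpr (Depth == N) {
        body(index);                  // innermost level: all indices are set
    } else {
        assert(max[Depth] >= 0);      // inclusive bound, so it must not be negative
        for (int i = 0; i <= max[Depth]; ++i) {
            index[Depth] = i;
            static_nested_for<Depth + 1>(index, max, body);
        }
    }
}
```

A call such as static_nested_for(index, max, [](const std::array<int,3>& idx) { /* ... */ }); deduces the number of dimensions from the arrays. Whether the compiler actually inlines all levels depends on the compiler and optimization settings, so it is worth measuring.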
Another option is to use a single for loop for incrementing the indices that need incrementing:
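void nested_for(std::array<int,3>& index, const std::array<int,3>& max)
{
    // max holds the (exclusive) size of each dimension; index starts at all zeros
    bool done = false;
    while (!done) {
        DoSomething(index);
        // Increment indices, carrying into the next one on wrap-around
        done = true; // stays true only if every index wrapped back to 0
        for (int i = 0; i < 3; ++i) {
            if (++index.at(i) >= max.at(i)) {
                index.at(i) = 0;
            } else {
                done = false;
                break;
            }
        }
    }
}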
However, you can also consider creating a linear sequence that visits all possible combinations of the iterators i, j, k and so on. For example, with array dimensions {3, 4, 6}, there are 3 * 4 * 6 = 72 possible combinations. So you can have a single counter going from 0 to 71, and then "split" that counter into the three iterator values you need, like so:
for (int c = 0; c < 72; c++) {
int k = c % 6;
int j = (c / 6) % 4;
int i = c / 6 / 4;
DoSomething(i, j, k);
}
You can generalize this to as many dimensions as you want. Of course, the more dimensions you have, the higher the cost of splitting the linear iterator. But if your array dimensions are powers of two, it might be very cheap to do so. Also, it might be that you don't need to split it at all; for example if you are calculating the sum of all elements of a multidimensional array, you don't care about the actual indices i, j, k and so on, you just want to visit all elements once. If the array is laid out linearly in memory, then you just need a linear iterator.
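A sketch of that generalization for any number of dimensions. The helper name split_counter and the use of a 64-bit counter are assumptions; the last dimension varies fastest, matching the three-dimensional loop above:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes a linear counter into N indices,
// where dims[d] is the (exclusive) size of dimension d.
template <std::size_t N>
std::array<int, N> split_counter(std::uint64_t c, const std::array<int, N>& dims)
{
    std::array<int, N> index{};
    for (std::size_t d = N; d-- > 0;) {        // walk from the last dimension to the first
        assert(dims[d] > 0);
        const auto size = static_cast<std::uint64_t>(dims[d]);
        index[d] = static_cast<int>(c % size); // position within this dimension
        c /= size;                             // carry the rest to the next dimension
    }
    return index;
}
```

The caller would loop c from 0 up to (but not including) the product of all dimension sizes and pass the resulting index array to DoSomething.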
Of course, if you have 24 nested for loops, you'll notice that the product of all the dimension's sizes will become a very large number. If it doesn't fit in a 32 bit integer, your code is going to be very slow. If it doesn't fit into a 64 bit integer anymore, it will never finish.

Improving the time complexity in priority queue in c++

In the code below, I am getting a timeout for larger vector lengths, though it works for smaller vectors.
long priceCalculate(vector < int > a, long k) {
long price = 0;
priority_queue<int>pq(a.begin(),a.end());
while(--k>=0){
int x = pq.top();
price = price + x;
pq.pop();
pq.push(x-1);
}
return price;
}
I have an array of numbers. I have to add the maximum number to price and then decrement that number by 1. Then I find the maximum number again, and so on. I have to repeat this process k times.
Is there any better data structure than priority queue which has less time complexity?
Below is the code using vector sort:
struct mclass {
public: bool operator()(int x, int y) {
return (x > y);
}
}
compare;
long priceCalculate(vector < int > a, long k) {
long price = 0;
sort(a.begin(), a.end(), compare);
while (--k >= 0) {
if (a[0] > 0) {
price = price + a[0];
a[0] = a[0] - 1;
sort(a.begin(), a.end(), compare);
}
}
return price;
}
But this is also giving timeout on large input length.
The sorting code has two performance problems:
You are re-sorting the vector<> in every iteration. Even if your sorting algorithm is insertion sort (which would be best in this case), it still needs to touch every position in the vector before it can declare the vector<> sorted.
To make matters worse, you are sorting the values you want to work with to the front of the vector, requiring the subsequent sort() call to shift almost all elements.
Consequently, you can achieve huge speedups by
Reversing the sort order, so that you are only interacting with the end of the vector<>.
Sorting only once, then updating the vector<> by scanning to the right position from the end and inserting the new value there.
You can also take a closer look at what your algorithm is doing: It only ever operates on the tail of the vector<> which has constant value, removing entries from it, and reinserting them, decremented by one, in front of it. I think you should be able to significantly simplify your algorithm with that knowledge, leading to even more significant speedups. In the end, you can remove that tail from the vector<> entirely: It's completely described by its length and its value, and all its elements can be manipulated in a single operation. Your algorithm should take no time at all once you are through optimizing it...
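One possible reading of that last idea, sketched under assumptions: the group of elements that currently share the maximum is tracked only by its size and value, and a whole group is processed in one step. The name price_by_levels is hypothetical, the vector is sorted largest-first here (the group is never modified, so the direction does not matter), and the function follows the priority_queue version from the question, which keeps decrementing below zero:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical level-by-level variant of priceCalculate.
long long price_by_levels(std::vector<int> a, long long k)
{
    assert(!a.empty());
    std::sort(a.begin(), a.end(), std::greater<int>()); // sort once, largest first
    long long price = 0;
    long long value = a[0];  // value shared by the current group of maxima
    std::size_t group = 0;   // number of elements in that group
    while (k > 0) {
        // Elements equal to the current value join the group.
        while (group < a.size() && a[group] == value)
            ++group;
        // Each group member is taken once at this value, then drops by one.
        const long long taken = std::min<long long>(k, static_cast<long long>(group));
        price += taken * value;
        k -= taken;
        --value;
    }
    return price;
}
```

After the single sort, the loop runs once per distinct level visited rather than once per unit of k. Consecutive levels with an unchanged group size could be summed in closed form as a further step, which this sketch leaves out.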
For the vector solution you should be able to gain performance by avoiding sort inside the loop.
After
a[0] = a[0] - 1;
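you can do something like the code below instead of calling sort:
std::size_t tmp = 0;
// Count the elements that are still greater than the decremented a[0]
for (std::size_t j = 1; j < a.size(); ++j) {
    if (a[0] < a[j])
        ++tmp;
    else
        break;
}
std::swap(a[0], a[tmp]); // move the decremented value behind them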
to place the decremented value correctly in the sorted vector, i.e. since the vector is sorted from the start, you only need to find the first element which is less than or equal to the decremented value and swap the element just before it with [0]. This should be faster than sort, which has to go through the whole vector.
Examples of algorithm
// Vector after decrement
9, 10, 9, 5, 3, 2
^
tmp = 1
// Vector after swap
10, 9, 9, 5, 3, 2
// Vector after decrement
9, 10, 10, 5, 3, 2
^
tmp = 2
// Vector after swap
10, 10, 9, 5, 3, 2
Performance
I compared my approach with the vector example from OP:
k = 1000
vector.size = 10000000
vector filled with random numbers in range 0..9999
compiled with g++ -O3
My approach:
real 0.83
user 0.78
sys 0.05
OPs vector approach
real 119.42
user 119.42
sys 0.04

How to increase the value of k consecutive elements in a vector in C++?

Suppose we have a vector in C++ of size 8 with elements {0, 1, 1, 0, 0, 0, 1, 1} and I want to increase the value of every element in a specific portion of the vector by one. For example, if the portion to be increased by 1 is positions 0 up to (but not including) 5, then the final result is {1, 2, 2, 1, 1, 0, 1, 1}.
Is it possible to do this in constant time using a standard method of vectors (like we have memset in C), without running any loop?
No... and by the way, with memset you don't have a guaranteed constant-time operation either (in most implementations it is just very fast, but still linear in the number of elements).
If you need to do this kind of operation (addition/subtraction of a constant over a range) on a very huge vector a lot of times and you need to get the final result then you can get O(1) per update using a different algorithm:
Step 1: convert the data to its "derivative"
This means replacing each element with the difference from the previous one.
// O(n) on the size of the vector, but done only once
for (int i=v.size()-1; i>0; i--) {
v[i] -= v[i-1];
}
Step 2: do all the interval operations (each in constant time)
With this representation adding a constant to a range simply means adding it to the first element and subtracting it from the element past the ending one. In code:
// intervals contains structures with start/stop/value fields
// Operation is O(n) on the **number of intervals**, and does
// not depend on the size of them
for (auto r : intervals) {
    v[r.start] += r.value;
    if (r.stop+1 < v.size()) // nothing to subtract past the last element
        v[r.stop+1] -= r.value;
}
Step 3: Collect the results
Finally you just need to un-do the initial processing, getting back to the normal values on each cell by integrating. In code:
// O(n) on the size of vector, but done only once
for (int i=1,n=v.size(); i<n; i++) {
v[i] += v[i-1];
}
Note that both steps 1 and 3 (derivation and integration) can be done in parallel on N cores with perfect efficiency if the size is large enough, even if how this is possible may not be obvious at first sight (it wasn't for me, at least).
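The three steps can be put together in one function, roughly as follows. This is a sketch: the Interval struct with an inclusive stop field and the name apply_intervals are assumptions based on the comments in the snippets above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout of an interval update: add `value` to v[start..stop], inclusive.
struct Interval {
    std::size_t start;
    std::size_t stop;
    int value;
};

void apply_intervals(std::vector<int>& v, const std::vector<Interval>& intervals)
{
    // Step 1: replace each element with the difference from the previous one.
    for (std::size_t i = v.size(); i-- > 1;)
        v[i] -= v[i - 1];
    // Step 2: each interval update touches at most two cells.
    for (const Interval& r : intervals) {
        assert(r.start <= r.stop && r.stop < v.size());
        v[r.start] += r.value;
        if (r.stop + 1 < v.size())
            v[r.stop + 1] -= r.value;
    }
    // Step 3: integrate (prefix sum) to get back the actual values.
    for (std::size_t i = 1; i < v.size(); ++i)
        v[i] += v[i - 1];
}
```

The benefit only shows when many intervals are applied between step 1 and step 3; for a single update, a plain loop over the range does less work.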

Pick a unique random subset from a set of unique values

C++. Visual Studio 2010.
I have a std::vector V of N unique elements (heavy structs). How can I efficiently pick M random, unique elements from it?
E.g. V contains 10 elements: { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } and I pick three...
4, 0, 9
0, 7, 8
But NOT this: 0, 5, 5 <--- not unique!
STL is preferred. So, something like this?
std::minstd_rand gen; // linear congruential engine??
std::uniform_int<int> unif(0, v.size() - 1);
gen.seed((unsigned int)time(NULL));
// ...?
// Or is there a good solution using std::random_shuffle for heavy objects?
Create a random permutation of the range 0, 1, ..., N - 1 and pick the first M of them; use those as indices into your original vector.
A random permutation is easily made with the standard library by using std::iota together with std::random_shuffle:
std::vector<Heavy> V; // given
std::vector<unsigned int> indices(V.size());
std::iota(indices.begin(), indices.end(), 0);
std::random_shuffle(indices.begin(), indices.end());
// use V[indices[0]], V[indices[1]], ..., V[indices[M-1]]
You can supply random_shuffle with a random number generator of your choice; check the documentation for details.
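For newer compilers, the same index-permutation idea can be sketched with std::shuffle and an explicit engine, since std::random_shuffle was deprecated in C++14 and removed in C++17. The helper name random_indices and the choice of std::mt19937 are assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns m distinct indices into a container of size n.
std::vector<std::size_t> random_indices(std::size_t n, std::size_t m, std::mt19937& gen)
{
    assert(m <= n);
    std::vector<std::size_t> indices(n);
    std::iota(indices.begin(), indices.end(), std::size_t{0}); // 0, 1, ..., n-1
    std::shuffle(indices.begin(), indices.end(), gen);         // random permutation
    indices.resize(m);                                         // keep the first m
    return indices;
}
```

Only the indices are shuffled, so the heavy structs themselves are never copied or moved; the caller reads V[idx] for each returned idx.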
Most of the time, the method provided by Kerrek is sufficient. But if N is very large, and M is orders of magnitude smaller, the following method may be preferred.
Create a set of unsigned integers, and add random numbers to it in the range [0,N-1] until the size of the set is M. Then use the elements at those indexes.
std::set<unsigned int> indices;
while (indices.size() < M)
indices.insert(RandInt(0,N-1));
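RandInt above is not a standard function. A self-contained sketch of the same loop, with std::uniform_int_distribution standing in for it, might look like this (the helper name pick_indices is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>

// Hypothetical helper: collects m distinct indices in [0, n-1] by retrying on duplicates.
std::set<std::size_t> pick_indices(std::size_t n, std::size_t m, std::mt19937& gen)
{
    assert(m <= n);
    std::set<std::size_t> indices;
    if (m == 0)
        return indices;                    // also avoids building a range with n == 0
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);
    while (indices.size() < m)
        indices.insert(dist(gen));         // inserting a duplicate leaves the set unchanged
    return indices;
}
```

Note that std::set iterates in ascending order, so the picked indices come out sorted rather than in the order they were drawn. As M approaches N the loop retries more and more often, which is why this method suits the case where M is much smaller than N.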
Since you wanted it to be efficient, I think you can get an amortised O(M), assuming you have to perform that operation a lot of times. However, this approach is not reentrant.
First of all create a local (i.e. static) vector of std::vector<...>::size_type (i.e. unsigned will do) values.
If you enter your function, resize the vector to match N and fill it with values from the old size to N-1:
static std::vector<unsigned> indices;
if (indices.size() < N) {
indices.reserve(N);
for (unsigned i = indices.size(); i < N; i++) {
indices.push_back(i);
}
}
Then, randomly pick M unique numbers from that vector:
std::vector<unsigned> result;
result.reserve(M);
for (unsigned i = 0; i < M; i++) {
unsigned const r = getRandomNumber(0,N-i); // random number < N-i
result.push_back(indices[r]);
indices[r] = indices[N-i-1];
indices[N-i-1] = r;
}
Now, your result is sitting in the result vector.
However, you still have to repair your changes to indices for the next run, so that indices is monotonic again:
for (unsigned i = N-M; i < N; i++) {
// restore previously changed values
indices[indices[i]] = indices[i];
indices[i] = i;
}
But this approach is only useful if you have to run that algorithm a lot and N doesn't grow so big that you cannot live with indices eating up RAM all the time.