Randomly select index from a STL vector from truth value - c++

I have a vector that looks like:
vector<int> A = {0, 1, 1, 0, 0, 1, 0, 1};
I'd like to select a random index from the non-zero values of A. Using this example A, I want to randomly select an element from the array {1,2,5,7}.
Currently I do this by creating another array
vector<int> b;
for(int i=0;i<A.size();i++)
if(A[i])
b.push_back(i);
Once b is created, I find the index by using this answer:
get random element from container
Is there a more STL-like (or C++11) way of doing this, perhaps one that does not create an intermediate array? In this example A is small, but in my production code this selection process is in an inner-loop and A is non-static and thousands of elements long.

A great way to do this is Reservoir Sampling.
In short, you walk your array until you find the first non-zero value, and record that index as the first possible answer you might return.
Then, you continue to walk the array. Every time you find a non-zero value, you randomly might change which new index is your possible answer, with decreasing probability.
This algorithm also works great if you need M random index values from your array.
What's great about this, is that you walk each element only one time, and you don't need a separate memory structure to record the non-zero elements. It's O(N) in speed, and O(M) in memory, in your case it's O(1) in memory, since you only want 1 random value.
On the flip side, random number generators are traditionally quite slow. So, you might want to performance test this against any other ideas people come up with here, to see if the trade-off of speed-vs-memory is worth it for you.

With a single pass through the array, you can determine how many false (or true) values there are. If you are doing this kind of thing often, you can even write a class to keep track of this for you.
Regardless, you can then pick a random number i between 0 and num_false (or num_true). Then with another pass through the array, you can return the ith false (or true) index.

We can loop through each non-zero value and assign it a random number. The index with the largest random number is the one we select.
int value = 0;
int index = 0;
while(int i = 0; i < A.size(); i++) {
if(!A[i]) continue;
auto j = rand();
if(j > value) {
index = i;
value = j;
}
}

vector<int> A = {0,1,1,0,0,1,0,1};
random_shuffle(A.begin(),A.end());
auto it = find_if(A.begin(),A.end(),[](const int elem){return elem;});

Related

efficiently mask-out exactly 30% of array with 1M entries

My question's header is similar to this link, however that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over elements and roll a dice for each one. Doing it in a non-interrupted manner is good for cache coherency.
As soon as I notice that exactly 300 000 of elements were indeed masked, I need to stop. However, I might reach the end of an array and have only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, and not being biased towards picking some elements?
Edit:
//I need to preserve the order of elements.
//For instance, I might have:
[12, 14, 1, 24, 5, 8]
//Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero
Just do a fisher-yates shuffle but stop at only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
std::size_t r = std::rand() % size;
std::swap(array[r], array[size-1]);
--size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
std::size_t r = rand() % 1000000;
if(array[r] != 0)
{
array[r] = 0;
++i;
}
}
Which has no bias and does not reorder elements, but is inferior to fisher yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );
void mask_random_sublist( int *list, int list_count, int sub_count )
{
if (list_count > SOME_SMALL_THRESHOLD)
{
int list_count_lo = list_count / 2; // arbitrary
int list_count_hi = list_count - list_count_lo;
int sub_count_lo = random_split_sub_count_lo( list_count, mask_count, list_count_lo );
int sub_count_hi = list_count - sub_count_lo;
mask( list, list_count_lo, sub_count_lo );
mask( list + sub_count_lo, list_count_hi, sub_count_hi );
}
else
{
// insert here some simple/obvious/naive implementation that
// would be ludicrous to use on a massive list due to complexity,
// but which works great on very small lists. I'm assuming you
// can do this part yourself.
}
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
Here's one suggestion. One million bits is only 128K which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct an collection of all the indices and, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones) you want selected:
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]

Random Permutations

I am having trouble figuring out a decent way of randomly shuffling the elements in an std::vector and, after some operations, restoring the original order. I know that this should be a rather trivial algorithm, but I guess I'm too tired...
Since I am constrained to use a custom random number generator class, I guess I can't use std::random_shuffle, which doesn't help anyway, because I also need to preserve the original order. So, my approach was to create an std::map which serves as a mapping between the original positions and the random ones, like this:
std::map<unsigned int, unsigned int> getRandomPermutation (const unsigned int &numberOfElements)
{
std::map<unsigned int, unsigned int> permutation;
//populate the map
for (unsigned int i = 0; i < numberOfElements; i++)
{
permutation[i] = i;
}
//randomize it
for (unsigned int i = 0; i < numberOfElements; i++)
{
//generate a random number in the interval [0, numberOfElements)
unsigned long randomValue = GetRandomInteger(numberOfElements - 1U);
//broken swap implementation
//permutation[i] = randomValue;
//permutation[randomValue] = i;
//use this instead:
std::swap(permutation[i], permutation[randomValue]);
}
return permutation;
}
I am not sure that the above algorithm is a proper implementation for a random permutation, so any improvements are welcome.
Now, here is how I've managed to make use of this permutation map:
std::vector<BigInteger> doStuff (const std::vector<BigInteger> &input)
{
/// Permute the values in a random order
std::map<unsigned int, unsigned int> permutation = getRandomPermutation(static_cast<unsigned int>(input.size()));
std::vector<BigInteger> temp;
//permute values
for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
{
temp.push_back(input[permutation[i]]);
}
//do all sorts of stuff with temp
/// Reverse the permutation
std::vector<BigInteger> output;
for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
{
output.push_back(temp[permutation[i]]);
}
return output;
}
Something tells me that I should be able to use only one std::vector<BigInteger> for this algorithm, but, right now, I just can't figure out the optimal solution. Honestly, I don't really care about the data in input, so I could even make it non-const, overwrite it, and skip creating a copy of it, but the question is how to implement the algorithm?
If I do something like this, I end up shooting myself in the foot, right? :)
for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
{
BigInteger aux = input[i];
input[i] = input[permutation[i]];
input[permutation[i]] = aux;
}
EDIT: Following Steve's remark about using "Fisher-Yates" shuffle, I changed my getRandomPermutation function accordingly:
std::map<unsigned int, unsigned int> getRandomPermutation (const unsigned int &numberOfElements)
{
std::map<unsigned int, unsigned int> permutation;
//populate the map
for (unsigned int i = 0; i < numberOfElements; i++)
{
permutation[i] = i;
}
//randomize it
for (unsigned int i = numberOfElements - 1; i > 0; --i)
{
//generate a random number in the interval [0, numberOfElements)
unsigned long randomValue = GetRandomInteger(i);
std::swap(permutation[i], permutation[randomValue]);
}
return permutation;
}
If you're "randomising" a vector of n elements, you can create another std::vector<size_t> index(n), set index[x] = x for 0 <= x < n, then shuffle index. Then your lookups take the form: original_vector[index[i]]. The order of the original vector's never changed so no need to restore ordering.
...constrained to use a custom random number generator class, I guess I can't use std::random_shuffle...
Have you noticed this overload?
template <class RandomAccessIterator, class RandomNumberGenerator>
void random_shuffle ( RandomAccessIterator first, RandomAccessIterator last,
RandomNumberGenerator& rand );
For details of how to wrap your random number generator with a compatible object, see http://www.sgi.com/tech/stl/RandomNumberGenerator.html
If you're looking for specific errors in your code:
permutation[i] = randomValue;
permutation[randomValue] = i;
is wrong. Observe that once you're finished, each value does not necessarily appear exactly once among the values of the map. So it's not a permutation, let alone a uniformly-distributed random one.
The proper means to generate a random permutation is what Tony says, use std::random_shuffle on a vector that initially represents the identity permutation. Or if you want to know how a shuffle is properly performed, look up "Fisher-Yates". In general, any approach that makes N random selections uniformly from 0 .. N-1 is doomed to failure, because that means it has N^N possible ways it can run. But there are N! possible permutations of N items, and N^N is generally not divisible by N!. Hence it's impossible for each permutation to be the result of an equal number of random selections, i.e. the distribution is not uniform.
the question is how to implement the algorithm?
So, you have your permutation, and you want to re-order the elements of input in-place, according to that permutation.
The key thing to know is that every permutation is a composition of "cycles". That is to say, if you repeatedly follow the permutation from a given starting point, you come back to where you started (and this path is the cycle to which that starting point belongs). There may be more than one such cycle in a given permutation, and if permutation[i] == i for some i, then the cycle of i has length 1.
The cycles are all disjoint, that is to say each element appears in exactly one cycle. Because cycles don't "interfere" with each other, we can apply a permutation by applying each cycle, and we can do the cycles in any order. So, for each index i we need to:
check whether we've already done i. If so, move on to the next index.
set current = i
swap index[current] with index[permutation[current]]. So index[current] is set to its correct value (the next element in the cycle), and its old value is "pushed" forward along the cycle.
mark current as "done"
if permutuation[current] is i, we've finished the cycle. So the first value of the cycle ends up in the spot formerly occupied by the last element of the cycle, which is right. Move on to the next index.
set current = permutation[current] and go back to the swap step.
Depending on the types involved, you can optimize around the swaps - it may be better to copy/move to a temporary variable and the start of each cycle, then do a copy/move instead of a swap at each step of the cycle, and finally copy/move the temporary to the end of the cycle.
Reversing the process is the same, but using the "inverse" of the permutation. The inverse inv of a permutation perm, is the permutation such that inv[perm[i]] == i for each i. You can either compute the inverse and use the exact code above, or you can use code similar to the above, except move the elements in the opposite direction along each cycle.
An alternative to all that, since you implemented Fisher-Yates yourself -- as you're running Fisher-Yates, for each swap you perform record the two indices swapped in a vector<pair<size_t,size_t>>. Then you don't have to worry about cycles. You can apply the permutation to the vector by applying the same sequence of swaps. You can reverse the permutation by applying the reversed sequence of swaps.
Note that, depending on your application, if it is important that you have a truly uniformly distributed permutation, you cannot use any algorithm that calls a typical pseudo-random number generator more that once.
The reason is that most pseudo-random number generators, such as the one in clib, are Linear congruential. Those have a weakness where they'll generate numbers that cluster in planes - so your permutations will not be perfectly uniformly distributed. Using a higher-quality generator should get around that.
See http://en.wikipedia.org/wiki/Linear_congruential_generator
Alternatively, you could just generate a single random number in the range 0..(n!-1) and pass it to the unrank function for permutations. For small enough n, you can store those and get a constant time algorithm, but if n is too large for that, the best unrank function is O(n). Applying the resulting permutation is going to be O(n) anyway.
Given a ordered sequence of elements a,b,c,d,e you first create a new indexed sequence: X=(0,a),(1,b),(2,c),(3,d),(4,e). Then, you randomly shuffle that sequence and obtain the second element of each pair to get the random sequence. To restore the original sequence you sort the X set incrementally using the first element of each pair.

Given an array of integers, find the first integer that is unique

Given an array of integers, find the first integer that is unique.
my solution: use std::map
put integer (number as key, its index as value) to it one by one (O(n^2 lgn)), if have duplicate, remove the entry from the map (O(lg n)), after putting all numbers into the map, iterate the map and find the key with smallest index O(n).
O(n^2 lgn) because map needs to do sorting.
It is not efficient.
other better solutions?
I believe that the following would be the optimal solution, at least based on time / space complexity:
Step 1:
Store the integers in a hash map, which holds the integer as a key and the count of the number of times it appears as the value. This is generally an O(n) operation and the insertion / updating of elements in the hash table should be constant time, on the average. If an integer is found to appear more than twice, you really don't have to increment the usage count further (if you don't want to).
Step 2:
Perform a second pass over the integers. Look each up in the hash map and the first one with an appearance count of one is the one you were looking for (i.e., the first single appearing integer). This is also O(n), making the entire process O(n).
Some possible optimizations for special cases:
Optimization A: It may be possible to use a simple array instead of a hash table. This guarantees O(1) even in the worst case for counting the number of occurrences of a particular integer as well as the lookup of its appearance count. Also, this enhances real time performance, since the hash algorithm does not need to be executed. There may be a hit due to potentially poorer locality of reference (i.e., a larger sparse table vs. the hash table implementation with a reasonable load factor). However, this would be for very special cases of integer orderings and may be mitigated by the hash table's hash function producing pseudorandom bucket placements based on the incoming integers (i.e., poor locality of reference to begin with).
Each byte in the array would represent the count (up to 255) for the integer represented by the index of that byte. This would only be possible if the difference between the lowest integer and the highest (i.e., the cardinality of the domain of valid integers) was small enough such that this array would fit into memory. The index in the array of a particular integer would be its value minus the smallest integer present in the data set.
For example on modern hardware with a 64-bit OS, it is quite conceivable that a 4GB array can be allocated which can handle the entire domain of 32-bit integers. Even larger arrays are conceivable with sufficient memory.
The smallest and largest integers would have to be known before processing, or another linear pass through the data using the minmax algorithm to find out this information would be required.
Optimization B: You could optimize Optimization A further, by using at most 2 bits per integer (One bit indicates presence and the other indicates multiplicity). This would allow for the representation of four integers per byte, extending the array implementation to handle a larger domain of integers for a given amount of available memory. More bit games could be played here to compress the representation further, but they would only support special cases of data coming in and therefore cannot be recommended for the still mostly general case.
All this for no reason. Just using 2 for-loops & a variable would give you a simple O(n^2) algo.
If you are taking all the trouble of using a hash map, then it might as well be what #Micheal Goldshteyn suggests
UPDATE: I know this question is 1 year old. But was looking through the questions I answered and came across this. Thought there is a better solution than using a hashtable.
When we say unique, we will have a pattern. Eg: [5, 5, 66, 66, 7, 1, 1, 77]. In this lets have moving window of 3. first consider (5,5,66). we can easily estab. that there is duplicate here. So move the window by 1 element so we get (5,66,66). Same here. move to next (66,66,7). Again dups here. next (66,7,1). No dups here! take the middle element as this has to be the first unique in the set. The left element belongs to the dup so could 1. Hence 7 is the first unique element.
space: O(1)
time: O(n) * O(m^2) = O(n) * 9 ≈ O(n)
Inserting to a map is O(log n) not O(n log n) so inserting n keys will be n log n. also its better to use set.
Although it's O(n^2), the following has small coefficients, isn't too bad on the cache, and uses memmem() which is fast.
for(int x=0;x<len-1;x++)
if(memmem(&array[x+1], sizeof(int)*(len-(x+1)), array[x], sizeof(int))==NULL &&
memmem(&array[x+1], sizeof(int)*(x-1), array[x], sizeof(int))==NULL)
return array[x];
public static string firstUnique(int[] input)
{
int size = input.Length;
bool[] dupIndex = new bool[size];
for (int i = 0; i < size; ++i)
{
if (dupIndex[i])
{
continue;
}
else if (i == size - 1)
{
return input[i].ToString();
}
for (int j = i + 1; j < size; ++j)
{
if (input[i]==input[j])
{
dupIndex[j] = true;
break;
}
else if (j == size - 1)
{
return input[i].ToString();
}
}
}
return "No unique element";
}
#user3612419
Solution given you is good with some what close to O(N*N2) but further optimization in same code is possible I just added two-3 lines that you missed.
public static string firstUnique(int[] input)
{
int size = input.Length;
bool[] dupIndex = new bool[size];
for (int i = 0; i < size; ++i)
{
if (dupIndex[i])
{
continue;
}
else if (i == size - 1)
{
return input[i].ToString();
}
for (int j = i + 1; j < size; ++j)
{
if(dupIndex[j]==true)
{
continue;
}
if (input[i]==input[j])
{
dupIndex[j] = true;
dupIndex[i] = true;
break;
}
else if (j == size - 1)
{
return input[i].ToString();
}
}
}
return "No unique element";
}

Generate a new element different from 1000 elements of an array

I was asked this questions in an interview. Consider the scenario of punched cards, where each punched card has 64 bit pattern. I was suggested each card as an int since each int is a collection of bits.
Also, to be considered that I have an array which already contains 1000 such cards. I have to generate a new element everytime which is different from the previous 1000 cards. The integers(aka cards) in the array are not necessarily sorted.
Even more, how would that be possible the question was for C++, where does the 64 bit int comes from and how can I generate this new card from the array where the element to be generated is different from all the elements already present in the array?
There are 264 64 bit integers, a number that is so much
larger than 1000 that the simplest solution would be to just generate a
random 64 bit number, and then verify that it isn't in the table of
already generated numbers. (The probability that it is is
infinitesimal, but you might as well be sure.)
Since most random number generators do not generate 64 bit values, you
are left with either writing your own, or (much simpler), combining the
values, say by generating 8 random bytes, and memcpying them into a
uint64_t.
As for verifying that the number isn't already present, std::find is
just fine for one or two new numbers; if you have to do a lot of
lookups, sorting the table and using a binary search would be
worthwhile. Or some sort of a hash table.
I may be missing something, but most of the other answers appear to me as overly complicated.
Just sort the original array and then start counting from zero: if the current count is in the array skip it, otherwise you have your next number. This algorithm is O(n), where n is the number of newly generated numbers: both sorting the array and skipping existing numbers are constants. Here's an example:
#include <algorithm>
#include <iostream>
unsigned array[] = { 98, 1, 24, 66, 20, 70, 6, 33, 5, 41 };
unsigned count = 0;
unsigned index = 0;
int main() {
std::sort(array, array + 10);
while ( count < 100 ) {
if ( count > array[index] )
++index;
else {
if ( count < array[index] )
std::cout << count << std::endl;
++count;
}
}
}
Here's an O(n) algorithm:
int64 generateNewValue(list_of_cards)
{
return find_max(list_of_cards)+1;
}
Note: As #amit points out below, this will fail if INT64_MAX is already in the list.
As far as I'm aware, this is the only way you're going to get O(n). If you want to deal with that (fairly important) edge case, then you're going to have to do some kind of proper sort or search, which will take you to O(n log n).
#arne is almost there. What you need is a self-balancing interval tree, which can be built in O(n lg n) time.
Then take the top node, which will store some interval [i, j]. By the properties of an interval tree, both i-1 and j+1 are valid candidates for a new key, unless i = UINT64_MIN or j = UINT64_MAX. If both are true, then you've stored 2^64 elements and you can't possibly generate a new element. Store the new element, which takes O(lg n) worst-case time.
I.e.: init takes O(n lg n), generate takes O(lg n). Both are worst-case figures. The greatest thing about this approach is that the top node will keep "growing" (storing larger intervals) and merging with its successor or predecessor, so the tree will actually shrink in terms of memory use and eventually the time per operation decays to O(1). You also won't waste any numbers, so you can keep generating until you've got 2^64 of them.
This algorithm has O(N lg N) initialisation, O(1) query and O(N) memory usage. I assume you have some integer type which I will refer to as int64 and that it can represent the integers [0, int64_max].
Sort the numbers
Create a linked list containing intervals [u, v]
Insert [1, first number - 1]
For each of the remaining numbers, insert [prev number + 1, current number - 1]
Insert [last number + 1, int64_max]
You now have a list representing the numbers which are not used. You can simply iterate over them to generate new numbers.
I think the way to go is to use some kind of hashing. So you store your cards in some buckets based on lets say on MOD operation. Until you create some sort of indexing you are stucked with looping over the whole array.
IF you have a look on HashSet implementation in java you might get a clue.
Edit: I assume you wanted them to be random numbers, if you don't mind sequence MAX+1 below is good solution :)
You could build a binary tree of the already existing elements and traverse it until you find a node whose depth is not 64 and which has less than two child nodes. You can then construct a "missing" child node and have a new element. The should be fairly quick, in the order of about O(n) if I'm not mistaken.
bool seen[1001] = { false };
for each element of the original array
if the element is in the range 0..1000
seen[element] = true
find the index for the first false value in seen
Initialization:
Don't sort the list.
Create a new array 1000 long containing 0..999.
Iterate the list and, if any number is in the range 0..999, invalidate it in the new array by replacing the value in the new array with the value of the first item in the list.
Insertion:
Use an incrementing index to the new array. If the value in the new array at this index is not the value of the first element in the list, add it to the list, else check the value from the next position in the new array.
When the new array is used up, refill it using 1000..1999 and invalidating existing values as above. Yes, this is looping over the list, but it doesn't have to be done for each insertion.
Near O(1) until the list gets so large that occasionally iterating it for invalidation of the 'new' new array becomes significant. Maybe you could mitigate this by using a new array that grows, maybee always the size of the list?
Rgds,
Martin
Put them all into a hash table of size > 1000, and find the empty cell (this is the parking problem). Generate a key for that. This will of course work better for bigger table size. The table needs only 1-bit entries.
EDIT: this is the pigeonhole principle.
This needs "modulo tablesize" (or some other "semi-invertible" function) for a hash function.
unsigned hashtab[1001] = {0,};
unsigned long long long long numbers[1000] = { ... };
void init (void)
{
unsigned idx;
for (idx=0; idx < 1000; idx++) {
hashtab [ numbers[idx] % 1001 ] += 1; }
}
unsigned long long long long generate(void)
{
unsigned idx;
for (idx = 0; idx < 1001; idx++) {
if ( !hashtab [ idx] ) break; }
return idx + rand() * 1001;
}
Based on the solution here: question on array and number
Since there are 1000 numbers, if we consider their remainders with 1001, at least one remainder will be missing. We can pick that as our missing number.
So we maintain an array of counts: C[1001], which will maintain the number of integers with remainder r (upon dividing by 1001) in C[r].
We also maintain a set of numbers for which C[j] is 0 (say using a linked list).
When we move the window over, we decrement the count of the first element (say remainder i), i.e. decrement C[i]. If C[i] becomes zero we add i to the set of numbers. We update the C array with the new number we add.
If we need one number, we just pick a random element from the set of j for which C[j] is 0.
This is O(1) for new numbers and O(n) initially.
This is similar to other solutions but not quite.
How about something simple like this:
1) Partition the array into numbers equal and below 1000 and above
2) If all the numbers fit within the lower partition then choose 1001 (or any number greater than 1000) and we're done.
3) Otherwise we know that there must exist a number between 1 and 1000 that doesn't exist within the lower partition.
4) Create a 1000 element array of bools, or a 1000-element long bitfield, or whatnot and initialize the array to all 0's
5) For each integer in the lower partition, use its value as an index into the array/bitfield and set the corresponding bool to true (ie: do a radix sort)
6) Go over the array/bitfield and pick any unset value's index as the solution
This works in O(n) time, or since we've bounded everything by 1000, technically it's O(1), but O(n) time and space in general. There are three passes over the data, which isn't necessarily the most elegant approach, but the complexity remains O(n).
you can create a new array with the numbers that are not in the original array, then just pick one from this new array.
¿O(1)?

Fast way to pick randomly from a set, with each entry picked only once?

I'm working on a program to solve the n queens problem (the problem of putting n chess queens on an n x n chessboard such that none of them is able to capture any other using the standard chess queen's moves). I am using a heuristic algorithm, and it starts by placing one queen in each row and picking a column randomly out of the columns that are not already occupied. I feel that this step is an opportunity for optimization. Here is the code (in C++):
vector<int> colsleft;
//fills the vector sequentially with integer values
for (int c=0; c < size; c++)
colsleft.push_back(c);
for (int i=0; i < size; i++)
{
vector<int>::iterator randplace = colsleft.begin() + rand()%colsleft.size();
/* chboard is an integer array, with each entry representing a row
and holding the column position of the queen in that row */
chboard[i] = *randplace;
colsleft.erase(randplace);
}
If it is not clear from the code: I start by building a vector containing an integer for each column. Then, for each row, I pick a random entry in the vector, assign its value to that row's entry in chboard[]. I then remove that entry from the vector so it is not available for any other queens.
I'm curious about methods that could use arrays and pointers instead of a vector. Or <list>s? Is there a better way of filling the vector sequentially, other than the for loop? I would love to hear some suggestions!
The following should fulfill your needs:
#include <algorithm>
...
int randplace[size];
for (int i = 0; i < size; i ++)
randplace[i] = i;
random_shuffle(randplace, randplace + size);
You can do the same stuff with vectors, too, if you wish.
Source: http://gethelp.devx.com/techtips/cpp_pro/10min/10min1299.asp
Couple of random answers to some of your questions :):
As far as I know, there's no way to fill an array with consecutive values without iterating over it first. HOWEVER, if you really just need consecutive values, you do not need to fill the array - just use the cell indices as the values: a[0] is 0 and a[100] is 100 - when you get a random number, treat the number as the value.
You can implement the same with a list<> and remove cells you already hit, or...
For better performance, rather than removing cells, why not put an "already used" value in them (like -1) and check for that. Say you get a random number like 73, and a[73] contains -1, you just get a new random number.
Finally, describing item 3 reminded me of a re-hashing function. Perhaps you can implement your algorithm as a hash-table?
Your colsleft.erase(randplace); line is really inefficient, because erasing an element in the middle of the vector requires shifting all the ones after it. A more efficient approach that will satisfy your needs in this case is to simply swap the element with the one at index (size - i - 1) (the element whose index will be outside the range in the next iteration, so we "bring" that element into the middle, and swap the used one out).
And then we don't even need to bother deleting that element -- the end of the array will accumulate the "chosen" elements. And now we've basically implemented an in-place Knuth shuffle.