Fast generation of random derangements

Fast generation of random derangements - c++

I am looking to generate derangements uniformly at random. In other words: shuffle a vector so that no element stays in its original place.
Requirements:
uniform sampling (each derangement is generated with equal probability)
a practical implementation is faster than the rejection method (i.e. keep generating random permutations until we find a derangement)
None of the answers I found so far are satisfactory in that they either don't sample uniformly (or fail to prove uniformity) or do not make a practical comparison with the rejection method. About 1/e = 37% of permutations are derangements, which gives a clue about what performance one might expect at best relative to the rejection method.
The only reference I found which makes a practical comparison is in this thesis which benchmarks 7.76 s for their proposed algorithm vs 8.25 s for the rejection method (see page 73). That's a speedup by a factor of only 1.06. I am wondering if something significantly better (> 1.5) is possible.
I could implement and verify various algorithms proposed in papers, and benchmark them. Doing this correctly would take quite a bit of time. I am hoping that someone has done it, and can give me a reference.

Here is an idea for an algorithm that may work for you. Generate the derangement in cycle notation. So (1 2) (3 4 5) represents the derangement 2 1 4 5 3. (That is (1 2) is a cycle and so is (3 4 5).)
Put the first element in the first place (in cycle notation you can always do this) and take a random permutation of the rest. Now we just need to find out where the parentheses go for the cycle lengths.
As https://mathoverflow.net/questions/130457/the-distribution-of-cycle-length-in-random-derangement notes, in a permutation, a random cycle is uniformly distributed in length. They are not randomly distributed in derangements. But the number of derangements of length m is m!/e rounded up for even m and down for odd m. So what we can do is pick a length uniformly distributed in the range 2..n and accept it with the probability that the remaining elements would, proceeding randomly, be a derangement. This cycle length will be correctly distributed. And then once we have the first cycle length, we repeat for the next until we are done.
The procedure done the way I described is simpler to implement but mathematically equivalent to taking a random derangement (by rejection), and writing down the first cycle only. Then repeating. It is therefore possible to prove that this produces all derangements with equal probability.
With this approach done naively, we will be taking an average of 3 rolls before accepting a length. However we then cut the problem in half on average. So the number of random numbers we need to generate for placing the parentheses is O(log(n)). Compared with the O(n) random numbers for constructing the permutation, this is a rounding error. However it can be optimized by noting that the highest probability for accepting is 0.5. So if we accept with twice the probability of randomly getting a derangement if we proceeded, our ratios will still be correct and we get rid of most of our rejections of cycle lengths.
If most of the time is spent in the random number generator, for large n this should run at approximately 3x the rate of the rejection method. In practice it won't be as good because switching from one representation to another is not actually free. But you should get speedups of the order of magnitude that you wanted.

this is just an idea but i think it can produce a uniformly distributed derangements.
but you need a helper buffer with max of around N/2 elements where N is the size of the items to be arranged.
first is to choose a random(1,N) position for value 1.
note: 1 to N instead of 0 to N-1 for simplicity.
then for value 2, position will be random(1,N-1) if 1 fall on position 2 and random(1,N-2) otherwise.
the algo will walk the list and count only the not-yet-used position until it reach the chosen random position for value 2, of course the position 2 will be skipped.
for value 3 the algo will check if position 3 is already used. if used, pos3 = random(1,N-2), if not, pos3 = random(1,N-3)
again, the algo will walk the list and count only the not-yet-used position until reach the count=pos3. and then position the value 3 there.
this will goes for the next values until totally placed all the values in positions.
and that will generate a uniform probability derangements.
the optimization will be focused on how the algo will reach pos# fast.
instead of walking the list to count the not-yet-used positions, the algo can used a somewhat heap like searching for the positions not yet used instead of counting and checking positions 1 by 1. or any other methods aside from heap-like searching. this is a separate problem to be solved: how to reached an unused item given it's position-count in a list of unused-items.

I'm curious ... and mathematically uninformed. So I ask innocently, why wouldn't a "simple shuffle" be sufficient?
for i from array_size downto 1: # assume zero-based arrays
j = random(0,i-1)
swap_elements(i,j)
Since the random function will never produce a value equal to i it will never leave an element where it started. Every element will be moved "somewhere else."

Let d(n) be the number of derangements of an array A of length n.
d(n) = (n-1) * (d(n-1) + d(n-2))
The d(n) arrangements are achieved by:
1. First, swapping A[0] with one of the remaining n-1 elements
2. Next, either deranging all n-1 remaning elements, or deranging
the n-2 remaining that excludes the index
that received A[0] from the initial matrix.
How can we generate a derangement uniformly at random?
1. Perform the swap of step 1 above.
2. Randomly decide which path we're taking in step 2,
with probability d(n-1)/(d(n-1)+d(n-2)) of deranging all remaining elements.
3. Recurse down to derangements of size 2-3 which are both precomputed.
Wikipedia has d(n) = floor(n!/e + 0.5) (exactly). You can use this to calculate the probability of step 2 exactly in constant time for small n. For larger n the factorial can be slow, but all you need is the ratio. It's approximately (n-1)/n. You can live with the approximation, or precompute and store the ratios up to the max n you're considering.
Note that (n-1)/n converges very quickly.

Related

C++ iterate vector randomly

I'm working on a multithreaded program where all threads share some vector (read-only). The goal of each thread is to walk the entire vector. Nonetheless, all threads must visit this vector in a different way.
Since the vector is const and shared among all threads, i cannot use random_shuffle and just iterate over it. For now my solution is to build
a crossref vector that will contain indices over the shared vector and then
shuffle this vector, i.e.
std::vector<int> crossref(SIZE) ; // SIZE is the size of the shared vector
std::iota (std::begin(crossref), std::end(crossref), 0); // Fill with indices ref
std::mt19937 g(SEED); // each thread has it own seed.
std::shuffle (crossref_.begin(), crossref_.end(), g); // Shuffle it
Nonetheless, doing this reveal some problems (1) it is not very efficient since every thread needs to access its crossref vector before accessing the shared one, (2) i have some performances issue because of the amount of memory required : the shared vector is very big and i have a lot of thread and processors.
Does anyone has some improvement ideas that will avoid the need of extra memory?

You can use the algebraic notion of primitive root modulo n.
Basically
If n is a positive integer, the integers between 1 and n − 1 that are
coprime to n form the group of primitive classes modulo n. This group
is cyclic if and only if n is equal to 2, 4, p^k, or 2p^k where p^k is
a power of an odd prime number
Wikipedia displays how you can generate numbers below 7 using 3 as generator.
From this statement you derive an algorithm.
Take your number n
Find the next prime number m which is bigger than n
For each of your thread pick a unique random number F(0) between 2 and m
Compute the next index using F(i+1) = (F(i) * F(0)) mod m. If that index is within [0, n] range, access the element. If not go towards the next index.
Stop after m - 1 iterations (or when you obtain 1, it is the same thing).
Because m is prime, every number between 2 and m-1 is coprime to m so is a generator of the sequence {1 ... m}. You are guaranteed that no number will repeat in the first m - 1 steps, and that all m - 1 numbers will appear.
Complexity :
Step 2 : Done once, complexity equivalent to finding primes up to n, ie sieve of Eratosthenes
Step 3 : Done once, you can choose 2, 3 ,4, 5, etc... Which is as low as O(thread count)
Step 4 : O(m) time, O(1) in space per thread. You dont need to store the F(i). You only need to know first value and last value. This is the same properties as incrementation

If I understand well you want to generate a random permutation in a incremental way, i.e. you want to call n times a function f so that it generates all permuted numbers from 1 to n, so that function has constant memory.
I doubt it exists if you want to obtain an uniform distribution among the permutations, but you may be satisfied with a subset of the set of permutations.
If this is the case you can generate a permutation by taking a number p prime with n and calculate for each i in [1,n] : i.p (mod n).
For example, if you have n=5 and p=7, then 7%5=2, 14%5=4, 21%5=1, 28%5=3, 35%5=0. You may combine several such functions to obtain something satisfying for you...

If memory is your biggest problem then you'll have to swap CPU cycles for memory space.
E.g. c++'s std::vector<bool> (http://en.cppreference.com/w/cpp/container/vector_bool) is a bit-array so quite memory efficient.
Each thread could have its own vector<bool> indicating wether or not it has visited a particular index. Then you'd have to use CPU cycles to randomly choose an index that it hasn't visited yet and terminate when all bools are true.

It seems this guy solved your problem in a very nice way.
This is what he says in the first line of the post: In this post I’m going to show a way to make an iterator that will visit items in a list in a random order, only visit each item once, and tell you when it’s visited all items and is finished. It does this without storing a shuffled list, and it also doesn’t have to keep track of which items it has already visited.
He leverages the power of a variable bit-lenght block cipher algorithm to generate each and every index in the array.

This is not a complete answer but it should lead us to a correct solution.
You have written some things which we could take as assumptions:
(1) it is not very efficient since every thread needs to access its
crossref vector before accessing the shared one,
This is unlikely to be true. We're talking about one indirect lookup. Unless your reference data is really a vector of ints, this will represent an infinitesimal part of your execution time. If your reference data is a vector of ints, then just make N copies of it and shuffle them...
(2) i have some performances issue because of the amount of memory
required : the shared vector is very big and i have a lot of thread
and processors.
How big? Did you measure it? How many discrete objects are there in the vector? How big is each one?
How many threads?
How many processors?
How much memory do you have?
Have you profiled the code? Are you sure where the performance bottleneck is? Have you considered a more elegant algorithm?

Fast adding random variables in C++

Short version: how to most efficiently represent and add two random variables given by lists of their realizations?
Mildly longer version:
for a workproject, I need to add several random variables each of which is given by a list of values. For example, the realizations of rand. var. A are {1,2,3} and the realizations of B are {5,6,7}. Hence, what I need is the distribution of A+B, i.e. {1+5,1+6,1+7,2+5,2+6,2+7,3+5,3+6,3+7}. And I need to do this kind of adding several times (let's denote this number of additions as COUNT, where COUNT might reach 720) for different random variables (C, D, ...).
The problem: if I use this stupid algorithm of summing each realization of A with each realization of B, the complexity is exponential in COUNT. Hence, for the case where each r.v. is given by three values, the amount of calculations for COUNT=720 is 3^720 ~ 3.36xe^343 which will last till the end of our days to calculate:) Not to mention that in real life, the lenght of each r.v. is gonna be 5000+.
Solutions:
1/ The first solution is to use the fact that I am OK with rounding, i.e. having integer values of realizations. Like this, I can represent each r.v. as a vector and for at the index corresponding to a realization I have a value of 1 (when the r.v. has this realization once). So for a r.v. A and a vector of realizations indexed from 0 to 10, the vector representing A would be [0,1,1,1,0,0,0...] and the representation for B would be [0,0,0,0,0,1,1,1,0,0,10]. Now I create A+B by going through these vectors and do the same thing as above (sum each realization of A with each realization of B and codify it into the same vector structure, quadratic complexity in vector length). The upside of this approach is that the complexity is bound. The problem of this approach is that in real applications, the realizations of A will be in the interval [-50000,50000] with a granularity of 1. Hence, after adding two random variables, the span of A+B gets to -100K, 100K.. and after 720 additions, the span of SUM(A, B, ...) gets to [-36M, 36M] and even quadratic complexity (compared to exponential complexity) on arrays this large will take forever.
2/ To have shorter arrays, one could possibly use a hashmap, which would most likely reduce the number of operations (array accesses) involved in A+B as the assumption is that some non-trivial portion of the theoreical span [-50K, 50K] will never be a realization. However, with continuing summing of more and more random variables, the number of realizations increases exponentially while the span increases only linearly, hence the density of numbers in the span increases over time. And this would kill the hashmap's benefits.
So the question is: how can I do this problem efficiently? The solution is needed for calculating a VaR in electricity trading where all distributions are given empirically and are like no ordinary distributions, hence formulas are of no use, we can only simulate.
Using math was considered as the first option as half of our dept. are mathematicians. However, the distributions that we're going to add are badly behaved and the COUNT=720 is an extreme. More likely, we are going to use COUNT=24 for a daily VaR. Taking into account the bad behaviour of distributions to add, for COUNT=24 the central limit theorem would not hold too closely (the distro of SUM(A1, A2, ..., A24) would not be close to normal). As we're calculating possible risks, we'd like to get a number as precise as possible.
The intended use is this: you have hourly casflows from some operation. The distribution of cashflows for one hour is the r.v. A. For the next hour, it's r.v. B, etc. And your question is: what is the largest loss in 99 percent of cases? So you model the cashflows for each of those 24 hours and add these cashflows as random variables so as to get a distribution of the total casfhlow over the whole day. Then you take the 0.01 quantile.

Try to reduce the number of passes required to make the whole addition, possibly reducing it to a single pass for every list, including the final one.
I don't think you can cut down on the total number of additions.
In addition, you should look into parallel algorithms and multithreading, if applicable.
At this point, most processors are able to perform additions in parallel, given proper instrucions (SSE), which will make the additions many times faster(still not a cure for the complexity problem).

As you said in your question, you're going to need an awful lot of computation to get the exact answer. So it's not going to happen.
However, as you're dealing with random values, it would be possible to apply some mathmatics to the problem. Wouldn't the result of all these additions result in something that approaches the normal distribution? For example, consider rolling a single dice. Each number has equal probability so the realisations don't follow a normal distribution (actually, they probably do, there was a program on BBC4 last week about it and it showed that lottery balls had a normal distribution to their appearance). However, if you roll two dice and sum them, then the realisations do follow a normal distribution. So I think the result of your computation is going to approximate a normal distribution so it becomes a problem of finding the average value and the sigma value for a given set of inputs. You can workout the upper and lower bounds for each input as well as their averages and I'm sure a bit of Googling will provide methods for applying functions to normal distributions.
I guess there is a corollary question and that is what the results are used for? Knowing how the results are used will inform the decision on how the results are created.

Ignoring the programmatic solutions, you can cut down the total number of additions quite significantly as your data set grows.
If we define four groups W, X, Y and Z, each with three elements, by your own maths this leads to a large number of operations:
W + X => 9 operations
(W + X) + Y => 27 operations
(W + X + Y) + Z => 81 operations
TOTAL: 117 operations
However, if we assume a strictly-ordered definition of your "add" operation so that two sets {a,b} and {c,d} always result in {a+c,a+d,b+c,b+d} then your operation is associative. That means that you can do this:
W + X => 9 operations
Y + Z => 9 operations
(W + X) + (Y + Z) => 81 operations
TOTAL: 99 operations
This is a saving of 18 operations, for a simple case. If you extend the above to 6 groups of 3 members, the total number of operations can be dropped from 1089 to 837 - almost 20% saving. This improvement is more pronounced the more data you have (more sets or more elements will give more savings).
Further, this opens the problem to better parallelisation: if you have 200 groups to process, you can start by combining the 100 pairs in parallel, then the 50 pairs or results, then 25, etc. This will allow a large degree of parallelism that should give you much better performance. (For example, 720 sets would be added in ~10 parallel operations as each parallel add will allow increasing COUNT by a factor of 2.)
I'm absolutely no expert on this, but it would seem an ideal problem for using the parallel procesing capability of a typical GPU - my understanding is that something like CUDA would make short work of processing all these calculations in parallel.
EDIT: If your real question is "what's your largest loss" then this is a much easier problem. Given that every value in the ultimate set is the sum of one value from each "component" set, your biggest loss will generally be found by combining the lowest value from each component set. Finding these lower values (one value per set) is a much simpler job, and you then only need sum together that limited set of values.

There are basically two methods. An approximative one and an exact one...
Approximative method models the sum of random variables by a lot of samplings. Basically, having random variables A, B we randomly sample from each r.v. 50K times, add the sampled values (here SSE can help a lot) and we have a distribution of A+B. This is how mathematicians would do this in Mathematica.
Exact method utilizes something Dan Puzey proposed, namely summing only some small portion of each r.v.'s density. Let's say we have random variables with the following "densities" (where each value is of the same likelihood for simplicity sake)
A = {-5,-3,-2}
B = {+0,+1,+2}
C = {+7,+8,+9}
The sum of A+B+C is going to be
{2,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,8,8,8,9}
and if I want to know the whole distribution precisely, I have no other choice than summing each elem of A with each elem of B and then each elem of this sum with each elem of C. However, if I only want the 99% VaR of this sum, i.e. 1% percentile of this sum, I only have to sum the smallest elements of A,B,C.
More precisely, I will take nA,nB,nC smallest elements from each distribution. To determine nA,nB,nC let's set these to 1 first. Then, increase nA by one if A[nA] = min( A[nA], B[nB], C[nC]) (counting on that A,B,C are sorted). This way, I can get the nA, nB, nC smallest elements of A,B,C which I will have to sum together (each with each other) and take the X-th smallest sum (where X is 1% multiplied by total combination count of sums, i.e. 3*3*3 for A,B,C). This also tells when to stop increasing nA,nB,nC - stop when nA*nB*nC > X.
However, like this I am doing the same redundancy again, i.e. I am calculating the whole distribution of A+B+C left of the 1% percentile. Even this will be MUCH shorter than calculating the whole distro of A+B+C, however. But I believe there should be a simple iterative algo to tell exaclty the the given VaR number in O(a*b) where a is the number of added r.v.s and b is the max number of elements in the density of each r.v.
I will be glad for any comments on whether I am correct.

Fastest way for a random unique subset of C++ tr1 unordered_set

This question is related to
this one, and more precisely to this answer to it.
Here goes: I have a C++/TR1 unordered_set U of unsigned ints (rough cardinality 100-50000, rough value range 0 to 10^6).
Given a cardinality N, I want to as quickly as possible iterate over N random but
unique members of U. There is no typical value for N, but it should
work fast for small N.
In more detail, the notion of "randomness" here is
that two calls should produce somewhat different subsets -- the more different,
the better, but this is not too crucial. I would e.g. be happy with a continuous
(or wrapped-around continuous)
block of N members of U, as long as the start index of the block is random.
Non-continuous at the same cost is better, but the main concern is speed. U changes
mildly, but constantly between calls (ca. 0-10 elements inserted/erased between calls).
How far I've come:
Trivial approach: Pick random index i such that (i+N-1) < |U|.
Get an iterator it to U.begin(), advance it i times using it++, and then start
the actual loop over the subset. Advantage: easy. Disadvantage: waste of ++'es.
The bucket approach (and this I've "newly" derived from above link):
Pick i as above, find the bucket b in which the i-th element is in, get a local_iterator lit
to U.begin(b), advance lit via lit++ until we hit the i-th element of U, and from then on keep incrementing lit for N times. If we hit the end of the bucket,
we continue with lit from the beginning of the next bucket. If I want to make it
more random I can pick i completely random and wrap around the buckets.
My open questions:
For point 2 above, is it really the case that I cannot somehow get an
iterator into U once I've found the i-th element? This would spare me
the bucket boundary control, etc. For me as quite a
beginner, it seems unperceivable that the standard forward iterator should know how to
continue traversing U when at the i-th item, but when I found the i-th item myself,
it should not be possible to traverse U other than through point 2 above.
What else can I do? Do you know anything even much smarter/more random? If possible, I don't want to get involved in manual
control of bucket sizes, hash functions, and the like, as this is a bit over my head.

Depending on what runtime guarantees you want, there's a famous O(n) algorithm for picking k random elements out of a stream of numbers in one pass. To understand the algorithm, let's see it first for the case where we want to pick just one element out of the set, then we'll generalize it to work for picking k elements. The advantage of this approach is that it doesn't require any advance knowledge of the size of the input set and guarantees provably uniform sampling of elements, which is always pretty nice.
Suppose that we want to pick one element out of the set. To do this, we'll make a pass over all of the elements in the set and at each point will maintain a candidate element that we're planning on returning. As we iterate across the list of elements, we'll update our guess with some probability until at the very end we've chosen a single element with uniform probability. At each point, we will maintain the following invariant:
After seeing k elements, the probability that any of the first k elements is currently chosen as the candidate element is 1 / k.
If we maintain this invariant across the entire array, then after seeing all n elements, each of them has a 1 / n chance of being the candidate element. Thus the candidate element has been sampled with uniformly random probability.
To see how the algorithm works, let's think about what it has to do to maintain the invariant. Suppose that we've just seen the very first element. To maintain the above invariant, we have to choose it with probability 1, so we'll set our initial guess of the candidate element to be the first element.
Now, when we come to the second element, we need to hold the invariant that each element is chosen with probability 1/2, since we've seen two elements. So let's suppose that with probability 1/2 we choose the second element. Then we know the following:
The probability that we've picked the second element is 1/2.
The probability that we've picked the first element is the probability that we chose it the first time around (1) times the probability that we didn't just pick the second element (1/2). This comes out to 1/2 as well.
So at this point the invariant is still maintained! Let's see what happens when we come to the third element. At this point, we need to ensure that each element is picked with probability 1/3. Well, suppose that with probability 1/3 we choose the last element. Then we know that
The probability that we've picked the third element is 1/3.
The probability that we've picked either of the first two elements is the probability that it was chosen after the first two steps (1/2) times the probability that we didn't choose the third element (2/3). This works out to 1/3.
So again, the invariant holds!
The general pattern here looks like this: After we've seen k elements, each of the elements has a 1/k chance of being picked. When we see the (k + 1)st element, we choose it with probability 1 / (k + 1). This means that it's chosen with probability 1 / (k + 1), and all of the elements before it are chosen with probability equal to the odds that we picked it before (1 / k) and didn't pick the (k + 1)st element this time (k / (k + 1)), which gives those elements each a probability of 1 / (k + 1) of being chosen. Since this maintains the invariant at each step, we've got ourselves a great algorithm:
Choose the first element as the candidate when you see it.
For each successive element, replace the candidate element with that element with probability 1 / k, where k is the number of elements seen so far.
This runs in O(n) time, requires O(1) space, and gives back a uniformly-random element out of the data stream.
Now, let's see how to scale this up to work if we want to pick k elements out of the set, not just one. The idea is extremely similar to the previous algorithm (which actually ends up being a special case of the more general one). Instead of maintaining one candidate, we maintain k different candidates, stored in an array that we number 1, 2, ..., k. At each point, we maintain this invariant:
After seeing m > k elements, the probability that any of the first m elements is chosen is
k / m.
If we scan across the entire array, this means that when we're done, each element has probability k / n of being chosen. Since we're picking k different elements, this means that we sample the elements out of the array uniformly at random.
The algorithm is similar to before. First, choose the first k elements out of the set with probability 1. This means that when we've seen k elements, the probability that any of them have been picked is 1 = k / k and the invariant holds. Now, assume inductively that the invariant holds after m iterations and consider the (m + 1)st iteration. Choose a random number between 1 and (m + 1), inclusive. If we choose a number between 1 and k (inclusive), replace that candidate element with the next element. Otherwise, do not choose the next element. This means that we pick the next element with probability k / (m + 1) as required. The probability that any of the first m elements are chosen is then the probability that they were chosen before (k / m) times the probability that we didn't choose the slot containing that element (m / (m + 1)), which gives a total probability of being chosen of k / (m + 1) as required. By induction, this proves that the algorithm perfectly uniformly and randomly samples k elements out of the set!
Moreover, the runtime is O(n), which is proportional to the size of the set, which is completely independent of the number of elements you want to choose. It also uses only O(k) memory and makes no assumptions whatsoever about the type of the elements being stored.
Since you're trying to do this for C++, as a shameless self-promotion, I have an implementation of this algorithm (written as an STL algorithm) available here on my personal website. Feel free to use it!
Hope this helps!

Missing number(s) Interview Question Redux

The common interview problem of determining the missing value in a range from 1 to N has been done a thousand times over. Variations include 2 missing values up to K missing values.
Example problem: Range [1,10] (1 2 4 5 7 8 9 10) = {3,6}
Here is an example of the various solutions:
Easy interview question got harder: given numbers 1..100, find the missing number(s)
My question is that seeing as the simple case of one missing value is of O(n) complexity and that the complexity of the larger cases converge at roughly something larger than O(nlogn):
Couldn't it just be easier to answer the question by saying sort (mergesort) the range and iterate over it observing the missing elements?
This solution should take no more than O(nlogn) and is capable of solving the problem for ranges other than 1-to-N such as 10-to-1000 or -100 to +100 etc...
Is there any reason to believe that the given solutions in the above SO link will be better than the sorting based solution for larger number of missing values?
Note: It seems a lot of the common solutions to this problem, assume an only number theoretic approach. If one is being asked such a question in an S/E interview wouldn't it be prudent to use a more computer science/algorithmic approach, assuming the approach is on par with the number theoretic solution's complexity...
More related links:
https://mathoverflow.net/questions/25374/duplicate-detection-problem
How to tell if an array is a permutation in O(n)?

You are only specifying the time complexity, but the space complexity is also important to consider.
The problem complexity can be specified in term of N (the length of the range) and K (the number of missing elements).
In the question you link, the solution of using equations is O(K) in space (or perhaps a bit more ?), as you need one equation per unknown value.
There is also the preservation point: may you alter the list of known elements ? In a number of cases this is undesirable, in which case any solution involving reordering the elements, or consuming them, must first make a copy, O(N-K) in space.
I cannot see faster than a linear solution: you need to read all known elements (N-K) and output all unknown elements (K). Therefore you cannot get better than O(N) in time.
Let us break down the solutions
Destroying, O(N) space, O(N log N) time: in-place sort
Preserving, O(K) space ?, O(N log N) time: equation system
Preserving, O(N) space, O(N) time: counting sort
Personally, though I find the equation system solution clever, I would probably use either of the sorting solutions. Let's face it: they are much simpler to code, especially the counting sort one!
And as far as time goes, in a real execution, I think the "counting sort" would beat all other solutions hands down.
Note: the counting sort does not require the range to be [0, X), any range will do, as any finite range can be transposed to the [0, X) form by a simple translation.
EDIT:
Changed the sort to O(N), one needs to have all the elements available to sort them.
Having had some time to think about the problem, I also have another solution to propose. As noted, when N grows (dramatically) the space required might explode. However, if K is small, then we could change our representation of the list, using intervals:
{4, 5, 3, 1, 7}
can be represented as
[1,1] U [3,5] U [7,7]
In the average case, maintaining a sorted list of intervals is much less costly than maintaining a sorted list of elements, and it's as easy to deduce the missing numbers too.
The time complexity is easy: O(N log N), after all it's basically an insertion sort.
Of course what's really interesting is that there is no need to actually store the list, thus you can feed it with a stream to the algorithm.
On the other hand, I have quite a hard time figuring out the average space complexity. The "final" space occupied is O(K) (at most K+1 intervals), but during the construction there will be much more missing intervals as we introduce the elements in no particular order.
The worst case is easy enough: N/2 intervals (think odd vs even numbers). I cannot however figure out the average case though. My gut feeling is telling me it should be better than O(N), but I am not that trusting.

Whether the given solution is theoretically better than the sorting one depends on N and K. While your solution has complexity of O(N*log(N)), the given solution is O(N*K). I think that the given solution is (same as the sorting solution) able to solve any range [A, B] just by transforming the range [A, B] to [1, N].

What about this?
create your own set containing all the numbers
remove the given set of numbers from your set (no need to sort)
What's left in your set are the missing numbers.

My question is that seeing as the [...] cases converge at roughly
something larger than O(nlogn) [...]
In 2011 (after you posted this question) Caf posted a simple answer that solves the problem in O(n) time and O(k) space [where the array size is n - k].
Importantly, unlike in other solutions, Caf's answer has no hidden memory requirements (using bit array's, adding numbers to elements, multiplying elements by -1 - these would all require O(log(n)) space).
Note: The question here (and the original question) didn't ask about the streaming version of the problem, and the answer here doesn't handle that case.
Regarding the other answers: I agree that many of the proposed "solutions" to this problem have dubious complexity claims, and if their time complexities aren't better in some way than either:
count sort (O(n) time and space)
compare (heap) sort (O(n*log(n)) time, O(1) space)
...then you may as well just solve the problem by sorting.
However, we can get better complexities (and more importantly, genuinely faster solutions):

Because the numbers are taken from a small, finite range, they can be 'sorted' in linear time.
All we do is initialize an array of 100 booleans, and for each input, set the boolean corresponding to each number in the input, and then step through reporting the unset booleans.

If there are total N elements where each number x is such that 1 <= x <= N then we can solve this in O(nlogn) time complexity and O(1) space complexity.
First sort the array using quicksort or mergesort.
Scan through the sorted array and if the difference between previously scanned number, a and current number, b is equal to 2 (b - a = 2), then the missing number is a+1. This can be extended to condition where (b - a > 2).
Time complexity is O(nlogn)+O(n) almost equal to O(nlogn) when N > 100.

I already answered it HERE
You can also create an array of boolean of the size last_element_in_the_existing_array + 1.
In a for loop mark all the element true that are present in the existing array.
In another for loop print the index of the elements which contains false AKA The missing ones.
Time Complexity: O(last_element_in_the_existing_array)
Space Complexity: O(array.length)

If the range is given to you well ahead, in this case range is [1,10] you can perform XOR operation with your range and the numbers given to you. Since XOR is commutative operation. You will be left with {3,6}
(1 2 3 4 5 6 7 8 9 10) XOR (1 2 4 5 7 8 9 10) ={3,6}

How to ensure that randomly generated numbers are not being repeated? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
Unique (non-repeating) random numbers in O(1)?
How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N
I want to generate random number in a certain diapason, and I must be sure, that each new number is not a duplicate of formers. One solution is to store formerly generated numbers in a container and each new number checks aginst the container. If there is such number in the container, then we generate agin, else we use and add it to the container. But with each new number this operation is becoming slower and slower. Is there any better approach, or any rand function that can work faster and ensure uniqueness of the generation?
EDIT: Yes, there is a limit (for example from 0 to 1.000.000.000). But I want to generate 100.000 unique numbers! (Would be great if the solution will be by using Qt features.)

Is there a range for the random numbers? If you have a limit for random numbers and you keep generating unique random numbers, then you'll end up with a list of all numbers from x..y in random order, where x-y is the valid range of your random numbers. If this is the case, you might improve speed greatly by simply generating the list of all numbers x..y and shuffling it, instead of generating the numbers.

I think there are 3 possible approaches, depending on range-size, and performance pattern needed you can use another algorithm.
Create a random number, see if it is in (a sorted) list. If not add and return, else try another.
Your list will grow and consume memory with every number you need. If every number is 32 bit, it will grow with at least 32 bits every time.
Every new random number increases the hit-ratio and this will make it slower.
O(n^2) - I think
Create an bit-array for every number in the range. Mark with 1/True if already returned.
Every number now only takes 1 bit, this can still be a problem if the range is big, but every number now only allocates 1 bit.
Every new random number increases the hit-ratio and this will make it slower.
O(n*2)
Pre-populate a list with all the numbers, shuffle it, and return the Nth number.
The list will not grow, returning numbers will not get slower,
but generating the list might take a long time, and a lot of memory.
O(1)
Depending on needed speed, you could store all lists in a database. There's no need for them to be in memory except speed.

Fill out a list with the numbers you need, then shuffle the list and pick your numbers from one end.

If you use a simple 32-bit linear congruential RNG (such as the so-called "Minimal Standard"), all you have to do is store the seed value you use and compare each generated number to it. If you ever reach that value again, your sequence is starting to repeat itself and you're out of values. This is O(1), but of course limited to 2^32-1 values (though I suppose you could use a 64-bit version as well).

There is a class of pseudo-random number generators that, I believe, has the properties you want: the Linear congruential generator. If defined properly, it will produce a list of integers from 0 to N-1, with no two numbers repeating until you've used all of the numbers in the list once.
#include <stdint.h>
/*
* Choose these values as follows:
*
* The MODULUS and INCREMENT must be relatively prime.
* The MULTIPLIER-1 must be divisible by all prime factors of the MODULUS.
* The MULTIPLIER-1 must be divisible by 4, if the MODULUS is divisible by 4.
*
* In addition, modulus must be <= 2**32 (0x0000000100000000ULL).
*
* A small example would be 8, 5, 3.
* A larger example would be 256, 129, 251.
* A useful example would be 0x0000000100000000ULL, 1664525, 1013904223.
*/
#define MODULUS (0x0000000100000000ULL)
#define MULTIPLIER (1664525)
#define INCREMENT (1013904223)
static uint64_t seed;
uint32_t lcg( void ) {
uint64_t temp;
temp = seed * MULTIPLIER + INCREMENT; // 64-bit intermediate product
seed = temp % MODULUS; // 32-bit end-result
return (uint32_t) seed;
}
All you have to do is choose a MODULUS such that it is larger than the number of numbers you'll need in a given run.

It wouldn't be random if there is such a pattern?
As far as I know you would have to store and filter all unwanted numbers...

unsigned int N = 1000;
vector <unsigned int> vals(N);
for(unsigned int i = 0; i < vals.size(); ++i)
vals[i] = i;
std::random_shuffle(vals.begin(), vals.end());
unsigned int random_number_1 = vals[0];
unsigned int random_number_2 = vals[1];
unsigned int random_number_3 = vals[2];
//etc

You could store the numbers in a vector, and get them by index (1..n-1). After each random generation, remove the indexed number from the vector, then generate the next number in the interval 1..n-2. etc.

If they can't be repeated, they aren't random.
EDIT:
Furthermore..
if they can't be repeated, they don't fit in a finite computer

How many random numbers do you need? Maybe you can apply a shuffle algorithm to a precalculated array of random numbers?

There is no way a random generator will output values depending on previously outputted values, because they wouldn't be random. However, you can improve performance by using different pools of random values each with values combined by a different salt value, which will divide the quantity of numbers to check by the quantity of pools you have.

If the range of the random number doesn't matter you could use a really large range of random numbers and hope you don't get any collisions. If your range is billions of times larger than the number of elements you expect to create your chances of a collision are small but still there. If the numbers don't to have an actual random distribution you could have a two part number {counter}{random x digits} that would ensure a unique number but it wouldn't be randomly distributed.

There's not going to be a pure functional approach that isn't O(n^2) on the number of results returned so far - every time a number is generated you will need to check against every result so far. Additionally, think about what happens when you're returning e.g. the 1000th number out of 1000 - you will require on average 1000 tries until the random algorithm comes up with the last unused number, with each attempt requiring an average of 499.5 comparisons with the already-generated numbers.
It should be clear from this that your description as posted is not quite exactly what you want. The better approach, as others have said, is to take a list of e.g. 1000 numbers upfront, shuffle it, and then return numbers from that list incrementally. This will guarantee you're not returning any duplicates, and return the numbers in O(1) time after the initial setup.

You can allocate enough memory for array of bits with 1 bit for each possible number. and check/set bits for every generated number. for example for numbers from 0 to 65535 you will need only 8192 (8kb) of memory.

Here's an interesting solution I came up with:
Assume you have numbers 1 to 1000 - and you don't have enough memory.
You could put all 1000 numbers into an array, and remove them one by one, but you'll get memory overflow error.
You could split the array in two, so you have an array of 1-500 and one empty array
You could then check if the number exists in array 1, or doesn't exist in the second array.
So assuming you have 1000 numbers, you can get a random number from 1-1000. If its less than 500, check array 1 and remove it if present. If it's NOT in array 2, you can add it.
This halves your memory usage.
If you propogate this using recursion, you can split your 500 array into a 250 and empty array.
Assuming empty arrays use no space, you can decrease your memory usage quite a bit.
Searching will be massively faster too, because if you break it down a lot, you generate a number such as 29. It's less than 500, less than 250, less than 125, less than 62, less than 31, greater than 15, so you do those 6 calculations, then check the array containing an average of 16/2 items - 8 in total.
I should patent this search, although I bet it already exists!

Especially given the desired number of values, you want a Linear Feedback Shift Register.
Why?
No shuffle step, nor a need to keep track of values you've already hit. As long as you go less than the full period, you should be fine.
It turns out that the Wikipedia article has some C++ code examples which are more tested than anything I would give you off the top of my head. Note that you'll want to be pulling values from inside the loops -- the loops just iterate the shift register through. You can see this in the snippet here.
(Yes, I know this was mentioned, briefly in the dupe -- saw it as I was revising. Given it hasn't been brought up here and is the best way to solve the poster's question, I think it should be brought up again.)

Let's say size=100.000 then create an array with this size. Create random numbers then put them into array.Problem is which index that number will be ? randomNumber%size will give you index.
When u put next number, use that function for index and check this value is exist or not. If not exist put it if exist then create new number and try that. U can create in fastest way with this way. Disadvange of this way is you will never find numbers which last section is same.
For example for last sections is
1231232444556
3458923444556
you will never have such numbers in your list even if they are totally different but last sections are same.

First off, there's a huge difference between random and pseudorandom. There's no way to generate perfectly random numbers from a deterministic process (such as a computer) without bringing in some physical process like latency between keystrokes or another entropy source.
The approach of saving all the numbers generated will slow down the computation rather quickly; the more numbers you have, the larger your storage needs, until you've filled up all available memory. A better method would be (as someone's already suggested) using a well known pseudorandom number generator such as the Linear Congruential Generator; it's super fast, requiring only modular multiplication and addition, and the theory behind it gets a lot of mention in Vol. 2 of Knuth's TAOCP. That way, the theory involved guarantees a rather large period before repetition, and the only storage needed are the parameters and seed used.

If you have no problem when a value can be calculated by the previous one, LFSR and LCG are fine. When you don't want that one output value can be calculated by another, you can use a block cipher in counter mode to generate the output sequence, given that the cipher block length is equal to the output length.

Use Hashset generic class . This class does not contain same values. You can put in all of your generated numbers then u can use them in Hashset.You can also check it if it is exist or not .Hashset can determine existence of items in fastest way.Hashset does not slow when list become bigger and this is biggest feature of it.
For example :
HashSet<int> array = new HashSet<int>();
array.Add(1);
array.Add(2);
array.Add(1);
foreach (var item in array)
{
Console.WriteLine(item);
}
Console.ReadKey();

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js