The amortized complexity of std::next_permutation? - c++

I just read this other question about the complexity of next_permutation and while I'm satisfied with the response (O(n)), it seems like the algorithm might have a nice amortized analysis that shows a lower complexity. Does anyone know of such an analysis?

So looks like I'm going to be answering my own question in the affirmative - yes, next_permutation runs in O(1) amortized time.
Before I go into a formal proof of this, here's a quick refresher on how the algorithm works. First, it scans backwards from the end of the range toward the beginning, identifying the longest contiguous decreasing subsequence in the range that ends at the last element. For example, in 0 3 4 2 1, the algorithm would identify 4 2 1 as this subsequence. Next, it looks at the element right before this subsequence (in the above example, 3), then finds the smallest element in the subsequence larger than it (in the above example, 4). Then, it exchanges the positions of those two elements and then reverses the identified sequence. So, if we started with 0 3 4 2 1, we'd swap the 3 and 4 to yield 0 4 3 2 1, and would then reverse the last three elements to yield 0 4 1 2 3.
To show that this algorithm runs in amortized O(1), we'll use the potential method. Define Φ to be three times the length of the longest contiguous decreasing subsequence at the end of the sequence. In this analysis, we will assume that all the elements are distinct.

Given this, let's think about the runtime of this algorithm. Suppose that we scan backwards from the end of the sequence and find that the last m elements are part of the decreasing sequence. This requires m + 1 comparisons. Next, we find, of the elements of that sequence, the smallest one larger than the element preceding the sequence. Using a linear scan, this takes time proportional to the length of the decreasing sequence in the worst case, for another m comparisons. Swapping the elements takes, say, 1 credit's worth of time, and reversing the sequence then requires at most m more operations. Thus the real cost of this step is roughly 3m + 2.

However, we have to factor in the change in potential. After we reverse this sequence of length m, the longest decreasing sequence at the end of the range shrinks to length 1, because the reversal leaves the last elements of the range sorted in ascending order. This means that our potential changed from Φ = 3m to Φ' = 3 · 1 = 3. Consequently, the net drop in potential is 3 - 3m, so our amortized time is roughly (3m + 2) + (3 - 3m) = 5 = O(1).
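As a quick empirical sanity check of this bound (not part of the proof), one can charge each call roughly 3m + 2 units, where m is the length of the decreasing suffix, and confirm that the per-call average stays constant. A sketch (the helper name is my own):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Charges ~3m + 2 units per call (suffix scan + search + swap +
// reversal, as in the analysis above) and returns the average cost
// per call over all n! permutations.
double average_work_per_call(int n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);        // start at the first permutation
    long long total = 0, calls = 0;
    do {
        int m = 1;                           // length of the decreasing suffix
        while (m < n && v[n - m - 1] > v[n - m]) ++m;
        total += 3LL * m + 2;
        ++calls;
    } while (std::next_permutation(v.begin(), v.end()));
    return static_cast<double>(total) / calls;
}
```

Since the expected suffix length over a random permutation is the partial sum of 1/k!, which converges to e - 1, the average charge tends to 3(e - 1) + 2 ≈ 7.15 regardless of n.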
In the preceding analysis I made the simplifying assumption that all the values are unique. To the best of my knowledge, this assumption is necessary in order for this proof to work. I'm going to think this over and see if the proof can be modified to work in the case where the elements can contain duplicates, and I'll post an edit to this answer once I've worked through the details.

I am not really sure of the exact implementation of std::next_permutation, but if it is the same as Narayana Pandita's algorithm as described on the wiki here: http://en.wikipedia.org/wiki/Permutation#Systematic_generation_of_all_permutations,
assuming the elements are distinct, it looks like it is O(1) amortized! (Of course, there might be errors below.)
Let us count the total number of swaps done.
We get the recurrence relation
T(n+1) = (n+1)T(n) + Θ(n^2)
(n+1)T(n) comes from fixing the first element and doing the swaps for the remaining n.
Θ(n^2) comes from changing the first element. At the point we change the first element, we do Θ(n) swaps. Do that n times, and you get Θ(n^2).
Now let X(n) = T(n)/n!
Then we get
X(n+1) = X(n) + Θ(n^2)/(n+1)!
i.e. there is some constant C such that
X(n+1) <= X(n) + Cn^2/(n+1)!
Writing down n such inequalities gives us
X(n+1) - X(n) <= Cn^2/(n+1)!
X(n) - X(n-1) <= C(n-1)^2/n!
X(n-1) - X(n-2) <= C(n-2)^2/(n-1)!
...
X(2) - X(1) <= C(1^2)/(1+1)!
Adding these up gives us X(n+1) - X(1) <= C(sum from j = 1 to n of j^2/(j+1)!).
Since the infinite series sum from j = 1 to infinity of j^2/(j+1)! converges to C', say, we get X(n+1) - X(1) <= CC'.
Remember that X(n) counts the average number of swaps needed (T(n)/n!)
Thus the average number of swaps is O(1).
Since finding the elements to swap is linear with the number of swaps, it is O(1) amortized even if you take other operations into consideration.

Here n stands for the count of elements in the container, not the total count of possible permutations. The algorithm must iterate over all elements at each call; it takes a pair of bidirectional iterators, which implies that to get to one element the algorithm must first visit the one before it (unless it's the first or last element). A bidirectional iterator allows iterating backwards, so the algorithm can (must, in fact) perform half as many swaps as there are elements. I believe the standard could offer an overload for a forward iterator, which would support dumber iterators at the cost of n swaps rather than n/2 swaps. But alas, it didn't.
Of course, for n possible permutations the algorithm operates in O(1).

Related

Fast generation of random derangements

I am looking to generate derangements uniformly at random. In other words: shuffle a vector so that no element stays in its original place.
Requirements:
uniform sampling (each derangement is generated with equal probability)
a practical implementation is faster than the rejection method (i.e. keep generating random permutations until we find a derangement)
None of the answers I found so far are satisfactory in that they either don't sample uniformly (or fail to prove uniformity) or do not make a practical comparison with the rejection method. About 1/e ≈ 37% of permutations are derangements, which gives a clue about what performance one might expect at best relative to the rejection method.
The only reference I found which makes a practical comparison is in this thesis which benchmarks 7.76 s for their proposed algorithm vs 8.25 s for the rejection method (see page 73). That's a speedup by a factor of only 1.06. I am wondering if something significantly better (> 1.5) is possible.
I could implement and verify various algorithms proposed in papers, and benchmark them. Doing this correctly would take quite a bit of time. I am hoping that someone has done it, and can give me a reference.
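For reference, here is the rejection-method baseline I am comparing against, as a minimal C++ sketch (the function name is mine):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Rejection method: shuffle until no element remains at its original
// index. Expected number of attempts is about e ~ 2.72.
// Requires n != 1 (a single element has no derangement).
std::vector<int> derangement_by_rejection(int n, std::mt19937& gen) {
    std::vector<int> v(n);
    for (;;) {
        std::iota(v.begin(), v.end(), 0);
        std::shuffle(v.begin(), v.end(), gen);
        bool ok = true;
        for (int i = 0; i < n; ++i)
            if (v[i] == i) { ok = false; break; }
        if (ok) return v;
    }
}
```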
Here is an idea for an algorithm that may work for you. Generate the derangement in cycle notation. So (1 2) (3 4 5) represents the derangement 2 1 4 5 3. (That is (1 2) is a cycle and so is (3 4 5).)
Put the first element in the first place (in cycle notation you can always do this) and take a random permutation of the rest. Now we just need to find out where the parentheses go for the cycle lengths.
As https://mathoverflow.net/questions/130457/the-distribution-of-cycle-length-in-random-derangement notes, in a permutation, a random cycle is uniformly distributed in length. They are not uniformly distributed in derangements. But the number of derangements of length m is m!/e rounded up for even m and down for odd m. So what we can do is pick a length uniformly distributed in the range 2..n and accept it with the probability that the remaining elements would, proceeding randomly, be a derangement. This cycle length will be correctly distributed. And then once we have the first cycle length, we repeat for the next until we are done.
The procedure done the way I described is simpler to implement but mathematically equivalent to taking a random derangement (by rejection), and writing down the first cycle only. Then repeating. It is therefore possible to prove that this produces all derangements with equal probability.
With this approach done naively, we will be taking an average of 3 rolls before accepting a length. However we then cut the problem in half on average. So the number of random numbers we need to generate for placing the parentheses is O(log(n)). Compared with the O(n) random numbers for constructing the permutation, this is a rounding error. However it can be optimized by noting that the highest probability for accepting is 0.5. So if we accept with twice the probability of randomly getting a derangement if we proceeded, our ratios will still be correct and we get rid of most of our rejections of cycle lengths.
If most of the time is spent in the random number generator, for large n this should run at approximately 3x the rate of the rejection method. In practice it won't be as good because switching from one representation to another is not actually free. But you should get speedups of the order of magnitude that you wanted.
This is just an idea, but I think it can produce uniformly distributed derangements.
You need a helper buffer with a maximum of around N/2 elements, where N is the number of items to be arranged.
First, choose a random(1, N) position for value 1.
(Note: 1 to N instead of 0 to N-1 for simplicity.)
Then, for value 2, the position will be random(1, N-1) if value 1 fell on position 2, and random(1, N-2) otherwise.
The algorithm walks the list, counting only the not-yet-used positions, until it reaches the chosen random position for value 2; position 2 itself is, of course, skipped.
For value 3, the algorithm checks whether position 3 is already used: if used, pos3 = random(1, N-2); if not, pos3 = random(1, N-3).
Again, the algorithm walks the list, counting only the not-yet-used positions, until the count reaches pos3, and then places value 3 there.
This goes on for the next values until all the values are placed in positions.
And that should generate derangements with uniform probability.
Any optimization would focus on how the algorithm reaches pos# quickly.
Instead of walking the list to count the not-yet-used positions, the algorithm could use a somewhat heap-like search for the positions not yet used, instead of counting and checking positions one by one (or any other method besides a heap-like search). This is a separate problem to be solved: how to reach an unused item given its position-count in a list of unused items.
I'm curious ... and mathematically uninformed. So I ask innocently, why wouldn't a "simple shuffle" be sufficient?
for i from array_size - 1 downto 1: # assume zero-based arrays
j = random(0, i-1)
swap_elements(i, j)
Since the random function will never produce a value equal to i it will never leave an element where it started. Every element will be moved "somewhere else."
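A direct C++ translation of the loop above (the function name is mine). Because j is always drawn strictly below i, this is Sattolo's variant of the Fisher-Yates shuffle, which always produces a single n-cycle, and that is precisely why no element stays put:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Sattolo-style shuffle: j < i on every step, so no element can
// remain where it started (the result is one big cycle).
void restricted_shuffle(std::vector<int>& a, std::mt19937& gen) {
    if (a.size() < 2) return;
    for (std::size_t i = a.size() - 1; i >= 1; --i) {
        std::uniform_int_distribution<std::size_t> d(0, i - 1);
        std::swap(a[i], a[d(gen)]);
    }
}
```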
Let d(n) be the number of derangements of an array A of length n.
d(n) = (n-1) * (d(n-1) + d(n-2))
The d(n) derangements are achieved by:
1. First, swapping A[0] with one of the remaining n-1 elements.
2. Next, either deranging all n-1 remaining elements, or deranging the n-2 remaining elements that exclude the index that received A[0] from the initial array.
How can we generate a derangement uniformly at random?
1. Perform the swap of step 1 above.
2. Randomly decide which path we're taking in step 2,
with probability d(n-1)/(d(n-1)+d(n-2)) of deranging all remaining elements.
3. Recurse down to derangements of size 2-3 which are both precomputed.
Wikipedia has d(n) = floor(n!/e + 0.5) (exactly). You can use this to calculate the probability of step 2 exactly in constant time for small n. For larger n the factorial can be slow, but all you need is the ratio. It's approximately (n-1)/n. You can live with the approximation, or precompute and store the ratios up to the max n you're considering.
Note that (n-1)/n converges very quickly.
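An iterative realization of this scheme (essentially the algorithm of Martínez, Panholzer and Prodinger, which uses exactly the branch probability above; names and details are my own sketch, with the d(n) values held in doubles since only their ratios matter):

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Uniform random derangement of {0, ..., n-1} (n >= 2): swap the
// current position with a random earlier open one, then decide with
// probability d(u-2)/(d(u-1)+d(u-2)) whether those two positions
// close a 2-cycle.
std::vector<int> random_derangement(int n, std::mt19937& gen) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);
    if (n < 2) return a;                       // no derangement exists for n == 1

    std::vector<double> d(n + 1);
    d[0] = 1.0; d[1] = 0.0;
    for (int m = 2; m <= n; ++m)               // d(m) = (m-1)(d(m-1) + d(m-2))
        d[m] = (m - 1) * (d[m - 1] + d[m - 2]);

    std::vector<bool> closed(n, false);        // positions locked into a 2-cycle
    int u = n;                                 // positions still undecided
    for (int i = n - 1; u >= 2; --i) {
        if (closed[i]) continue;
        std::uniform_int_distribution<int> pick(0, i - 1);
        int j = pick(gen);
        while (closed[j]) j = pick(gen);       // random open position below i
        std::swap(a[i], a[j]);
        // (u-1) * d[u-2] / d[u] == d[u-2] / (d[u-1] + d[u-2])
        std::bernoulli_distribution two_cycle((u - 1) * d[u - 2] / d[u]);
        if (two_cycle(gen)) { closed[j] = true; --u; }
        --u;
    }
    return a;
}

// Helper for checking the result: a permutation with no fixed point.
bool is_derangement(const std::vector<int>& v) {
    std::vector<bool> seen(v.size(), false);
    for (std::size_t i = 0; i < v.size(); ++i) {
        int x = v[i];
        if (x == static_cast<int>(i)) return false;
        if (x < 0 || x >= static_cast<int>(v.size()) || seen[x]) return false;
        seen[x] = true;
    }
    return true;
}
```

Note that doubles overflow past roughly n = 170; beyond that, the branch probability can be replaced by the 1/u approximation implied by the (n-1)/n ratio above.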

Algorithm on List and Maximum Product

a) Given a sequence X = (x1, x2, ..., xn) of positive real numbers, we can find a sub-sequence whose elements have the maximum product in O(n).
b) With an algorithm of order O(n), we can merge m = sqrt(n) sorted sequences that have n elements in total.
Why does my professor say these two statements are false?
I read about an O(n) algorithm for (a):
http://www.geeksforgeeks.org/maximum-product-subarray/
Can anyone help me?
I don't know about the first statement, but the second statement can be said false by the following argument:
Since there are sqrt(n) sequences, with n elements each, the total number of elements is n*sqrt(n). In the worst case, you would need to check every element at least once to merge them all into a single list, and this would put the time complexity at at least n*sqrt(n). If there are sqrt(n) elements in each sequence, please read the edit.
I'm not really sure about the first one because the algorithm provided by you is for integers, while we are dealing with reals in your case.
EDIT: A merging algorithm for k sorted arrays and n total elements puts the time complexity at O(n*log(k)). Even if every sequence has sqrt(n) elements (as opposed to n each, as assumed in the previous paragraph) the time complexity would still be O(n*(log(sqrt(n)))).
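The O(n*log(k)) merge mentioned in the edit can be sketched with a min-heap of per-sequence cursors (a minimal version; names are mine):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// k-way merge: each of the n total elements passes through a heap of
// size <= k, for O(n log k) overall.
std::vector<int> merge_k(const std::vector<std::vector<int>>& seqs) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;  // value, sequence, offset
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t s = 0; s < seqs.size(); ++s)
        if (!seqs[s].empty()) heap.emplace(seqs[s][0], s, 0);
    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, s, i] = heap.top();
        heap.pop();
        out.push_back(v);
        if (i + 1 < seqs[s].size()) heap.emplace(seqs[s][i + 1], s, i + 1);
    }
    return out;
}
```

With k = sqrt(n) this gives O(n*log(sqrt(n))) = O((n/2)*log(n)), matching the edit.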

Minimum comparisons needed to search an element in sorted array with more than half occurrences

Recently, I was given a question to find the minimum comparisons needed to search for an element among n given elements, provided they are sorted and the element has more than half (n/2) occurrences.
For example, given the sorted array 1, 1, 2, 2, 2, 2, 2, 7, 11 of size 9, we need to find the minimum comparisons required to find 2 (since it has more than n/2 occurrences, namely 5).
What would be the best algorithm to do so and what would be the worst case Complexity?
Options provided were:
i) O(1)
ii) O(n)
iii) O(log(n))
iv) O(nlog(n))
provided they are sorted
In this case you have to check only one middle element, if the fact that
with more than half(n/2) occurrences
is guaranteed
There can be two possible interpretations of the question. I'll explain both.
Firstly, if the question assumes that there is definitely a number which occurs n/2 or more times, then MBo's answer suffices.
However, if there is a chance that there is no element with n/2 occurrences, then the complexity is O(log(n)). We cannot merely check for the n/2th element. For example, in array 2, 4, 6, 6, 6, 8, 10, the middle element is 6, but it does not occur n/2 or more times. The algorithm for this case goes as follows:
Select the middle element (say x).
Find the index of x in the left sub-array using binary search (say lIndex).
Find the index of x in the right sub-array using binary search (say rIndex).
If rIndex - lIndex >= n/2, then the number occurs n/2 or more times. Otherwise, no such number is present.
Since we use binary search to find the number in left and right sub-arrays, the complexity of the above algorithm is O(log(n)).
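A sketch of this check in C++, using lower_bound/upper_bound for the two binary searches (the function name is mine):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true iff the sorted array has an element occurring more
// than n/2 times. Any such element must cover the middle position.
bool has_majority(const std::vector<int>& a) {
    if (a.empty()) return false;
    int x = a[a.size() / 2];
    auto lo = std::lower_bound(a.begin(), a.end(), x);  // first occurrence
    auto hi = std::upper_bound(a.begin(), a.end(), x);  // one past the last
    return static_cast<std::size_t>(hi - lo) > a.size() / 2;
}
```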

Fastest way for a random unique subset of C++ tr1 unordered_set

This question is related to
this one, and more precisely to this answer to it.
Here goes: I have a C++/TR1 unordered_set U of unsigned ints (rough cardinality 100-50000, rough value range 0 to 10^6).
Given a cardinality N, I want to as quickly as possible iterate over N random but
unique members of U. There is no typical value for N, but it should
work fast for small N.
In more detail, the notion of "randomness" here is
that two calls should produce somewhat different subsets -- the more different,
the better, but this is not too crucial. I would e.g. be happy with a continuous
(or wrapped-around continuous)
block of N members of U, as long as the start index of the block is random.
Non-continuous at the same cost is better, but the main concern is speed. U changes
mildly, but constantly between calls (ca. 0-10 elements inserted/erased between calls).
How far I've come:
Trivial approach: Pick random index i such that (i+N-1) < |U|.
Get an iterator it to U.begin(), advance it i times using it++, and then start
the actual loop over the subset. Advantage: easy. Disadvantage: waste of ++'es.
The bucket approach (and this I've "newly" derived from above link):
Pick i as above, find the bucket b in which the i-th element is in, get a local_iterator lit
to U.begin(b), advance lit via lit++ until we hit the i-th element of U, and from then on keep incrementing lit for N times. If we hit the end of the bucket,
we continue with lit from the beginning of the next bucket. If I want to make it
more random I can pick i completely random and wrap around the buckets.
My open questions:
For point 2 above, is it really the case that I cannot somehow get an
iterator into U once I've found the i-th element? This would spare me
the bucket boundary control, etc. For me, as quite a beginner, it seems inconceivable that the standard forward iterator should know how to continue traversing U when at the i-th item, but that when I have found the i-th item myself, it should not be possible to traverse U other than through point 2 above.
What else can I do? Do you know anything even much smarter/more random? If possible, I don't want to get involved in manual
control of bucket sizes, hash functions, and the like, as this is a bit over my head.
Depending on what runtime guarantees you want, there's a famous O(n) algorithm for picking k random elements out of a stream of numbers in one pass. To understand the algorithm, let's see it first for the case where we want to pick just one element out of the set, then we'll generalize it to work for picking k elements. The advantage of this approach is that it doesn't require any advance knowledge of the size of the input set and guarantees provably uniform sampling of elements, which is always pretty nice.
Suppose that we want to pick one element out of the set. To do this, we'll make a pass over all of the elements in the set and at each point will maintain a candidate element that we're planning on returning. As we iterate across the list of elements, we'll update our guess with some probability until at the very end we've chosen a single element with uniform probability. At each point, we will maintain the following invariant:
After seeing k elements, the probability that any of the first k elements is currently chosen as the candidate element is 1 / k.
If we maintain this invariant across the entire array, then after seeing all n elements, each of them has a 1 / n chance of being the candidate element. Thus the candidate element has been sampled with uniformly random probability.
To see how the algorithm works, let's think about what it has to do to maintain the invariant. Suppose that we've just seen the very first element. To maintain the above invariant, we have to choose it with probability 1, so we'll set our initial guess of the candidate element to be the first element.
Now, when we come to the second element, we need to hold the invariant that each element is chosen with probability 1/2, since we've seen two elements. So let's suppose that with probability 1/2 we choose the second element. Then we know the following:
The probability that we've picked the second element is 1/2.
The probability that we've picked the first element is the probability that we chose it the first time around (1) times the probability that we didn't just pick the second element (1/2). This comes out to 1/2 as well.
So at this point the invariant is still maintained! Let's see what happens when we come to the third element. At this point, we need to ensure that each element is picked with probability 1/3. Well, suppose that with probability 1/3 we choose the last element. Then we know that
The probability that we've picked the third element is 1/3.
The probability that we've picked either of the first two elements is the probability that it was chosen after the first two steps (1/2) times the probability that we didn't choose the third element (2/3). This works out to 1/3.
So again, the invariant holds!
The general pattern here looks like this: After we've seen k elements, each of the elements has a 1/k chance of being picked. When we see the (k + 1)st element, we choose it with probability 1 / (k + 1). This means that it's chosen with probability 1 / (k + 1), and all of the elements before it are chosen with probability equal to the odds that we picked it before (1 / k) and didn't pick the (k + 1)st element this time (k / (k + 1)), which gives those elements each a probability of 1 / (k + 1) of being chosen. Since this maintains the invariant at each step, we've got ourselves a great algorithm:
Choose the first element as the candidate when you see it.
For each successive element, replace the candidate element with that element with probability 1 / k, where k is the number of elements seen so far.
This runs in O(n) time, requires O(1) space, and gives back a uniformly-random element out of the data stream.
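A sketch of the single-candidate version in C++ (names are mine; it works on any forward range, including an unordered_set):

```cpp
#include <cassert>
#include <random>
#include <vector>

// Reservoir sampling with one slot: the k-th element seen replaces
// the candidate with probability 1/k, so every element ends up
// chosen with probability 1/n.
template <typename It, typename Rng>
It sample_one(It first, It last, Rng& gen) {
    It candidate = last;                        // stays last if the range is empty
    long long seen = 0;
    for (It it = first; it != last; ++it) {
        ++seen;
        std::uniform_int_distribution<long long> roll(1, seen);
        if (roll(gen) == 1) candidate = it;     // probability 1/seen
    }
    return candidate;
}
```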
Now, let's see how to scale this up to work if we want to pick k elements out of the set, not just one. The idea is extremely similar to the previous algorithm (which actually ends up being a special case of the more general one). Instead of maintaining one candidate, we maintain k different candidates, stored in an array that we number 1, 2, ..., k. At each point, we maintain this invariant:
After seeing m >= k elements, the probability that any of the first m elements is chosen is k / m.
If we scan across the entire array, this means that when we're done, each element has probability k / n of being chosen. Since we're picking k different elements, this means that we sample the elements out of the array uniformly at random.
The algorithm is similar to before. First, choose the first k elements out of the set with probability 1. This means that when we've seen k elements, the probability that any of them have been picked is 1 = k / k and the invariant holds. Now, assume inductively that the invariant holds after m iterations and consider the (m + 1)st iteration. Choose a random number between 1 and (m + 1), inclusive. If we choose a number between 1 and k (inclusive), replace that candidate element with the next element. Otherwise, do not choose the next element. This means that we pick the next element with probability k / (m + 1) as required. The probability that any of the first m elements are chosen is then the probability that they were chosen before (k / m) times the probability that we didn't choose the slot containing that element (m / (m + 1)), which gives a total probability of being chosen of k / (m + 1) as required. By induction, this proves that the algorithm perfectly uniformly and randomly samples k elements out of the set!
Moreover, the runtime is O(n), which is proportional to the size of the set, which is completely independent of the number of elements you want to choose. It also uses only O(k) memory and makes no assumptions whatsoever about the type of the elements being stored.
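And the k-candidate generalization, sketched along the same lines (names mine again):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Reservoir sampling with k slots: keep the first k elements, then
// replace a uniformly chosen slot with element number m with
// probability k/m.
template <typename It, typename Rng>
std::vector<typename std::iterator_traits<It>::value_type>
sample_k(It first, It last, std::size_t k, Rng& gen) {
    std::vector<typename std::iterator_traits<It>::value_type> res;
    std::size_t seen = 0;
    for (It it = first; it != last; ++it) {
        ++seen;
        if (res.size() < k) { res.push_back(*it); continue; }
        std::uniform_int_distribution<std::size_t> roll(1, seen);
        std::size_t r = roll(gen);
        if (r <= k) res[r - 1] = *it;           // hit: replace slot r
    }
    return res;
}
```

(Since C++17 the standard library ships std::sample, which solves the same problem over a range; it wasn't available in the TR1 era, of course.)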
Since you're trying to do this for C++, as a shameless self-promotion, I have an implementation of this algorithm (written as an STL algorithm) available here on my personal website. Feel free to use it!
Hope this helps!

Missing number(s) Interview Question Redux

The common interview problem of determining the missing value in a range from 1 to N has been done a thousand times over. Variations include 2 missing values up to K missing values.
Example problem: Range [1,10] (1 2 4 5 7 8 9 10) = {3,6}
Here is an example of the various solutions:
Easy interview question got harder: given numbers 1..100, find the missing number(s)
My question is this: seeing as the simple case of one missing value is of O(n) complexity, and the complexity of the larger cases converges at roughly something larger than O(nlogn):
Couldn't it just be easier to answer the question by saying sort (mergesort) the range and iterate over it observing the missing elements?
This solution should take no more than O(nlogn) and is capable of solving the problem for ranges other than 1-to-N such as 10-to-1000 or -100 to +100 etc...
Is there any reason to believe that the given solutions in the above SO link will be better than the sorting based solution for larger number of missing values?
Note: It seems a lot of the common solutions to this problem, assume an only number theoretic approach. If one is being asked such a question in an S/E interview wouldn't it be prudent to use a more computer science/algorithmic approach, assuming the approach is on par with the number theoretic solution's complexity...
More related links:
https://mathoverflow.net/questions/25374/duplicate-detection-problem
How to tell if an array is a permutation in O(n)?
You are only specifying the time complexity, but the space complexity is also important to consider.
The problem complexity can be specified in term of N (the length of the range) and K (the number of missing elements).
In the question you link, the solution of using equations is O(K) in space (or perhaps a bit more ?), as you need one equation per unknown value.
There is also the preservation point: may you alter the list of known elements ? In a number of cases this is undesirable, in which case any solution involving reordering the elements, or consuming them, must first make a copy, O(N-K) in space.
I cannot see faster than a linear solution: you need to read all known elements (N-K) and output all unknown elements (K). Therefore you cannot get better than O(N) in time.
Let us break down the solutions
Destroying, O(N) space, O(N log N) time: in-place sort
Preserving, O(K) space ?, O(N log N) time: equation system
Preserving, O(N) space, O(N) time: counting sort
Personally, though I find the equation system solution clever, I would probably use either of the sorting solutions. Let's face it: they are much simpler to code, especially the counting sort one!
And as far as time goes, in a real execution, I think the "counting sort" would beat all other solutions hands down.
Note: the counting sort does not require the range to be [0, X); any range will do, as any finite range can be brought to the [0, X) form by a simple translation.
EDIT:
Changed the sort to O(N), one needs to have all the elements available to sort them.
Having had some time to think about the problem, I also have another solution to propose. As noted, when N grows (dramatically) the space required might explode. However, if K is small, then we could change our representation of the list, using intervals:
{4, 5, 3, 1, 7}
can be represented as
[1,1] U [3,5] U [7,7]
In the average case, maintaining a sorted list of intervals is much less costly than maintaining a sorted list of elements, and it's as easy to deduce the missing numbers too.
The time complexity is easy: O(N log N), after all it's basically an insertion sort.
Of course what's really interesting is that there is no need to actually store the list, thus you can feed it with a stream to the algorithm.
On the other hand, I have quite a hard time figuring out the average space complexity. The "final" space occupied is O(K) (at most K+1 intervals), but during the construction there will be much more missing intervals as we introduce the elements in no particular order.
The worst case is easy enough: N/2 intervals (think odd vs even numbers). I cannot however figure out the average case though. My gut feeling is telling me it should be better than O(N), but I am not that trusting.
Whether the given solution is theoretically better than the sorting one depends on N and K. While your solution has complexity of O(N*log(N)), the given solution is O(N*K). I think that the given solution is (same as the sorting solution) able to solve any range [A, B] just by transforming the range [A, B] to [1, N].
What about this?
create your own set containing all the numbers
remove the given set of numbers from your set (no need to sort)
What's left in your set are the missing numbers.
My question is that seeing as the [...] cases converge at roughly
something larger than O(nlogn) [...]
In 2011 (after you posted this question) Caf posted a simple answer that solves the problem in O(n) time and O(k) space [where the array size is n - k].
Importantly, unlike in other solutions, Caf's answer has no hidden memory requirements (using bit arrays, adding numbers to elements, multiplying elements by -1: these would all require O(log(n)) space).
Note: The question here (and the original question) didn't ask about the streaming version of the problem, and the answer here doesn't handle that case.
Regarding the other answers: I agree that many of the proposed "solutions" to this problem have dubious complexity claims, and if their time complexities aren't better in some way than either:
count sort (O(n) time and space)
compare (heap) sort (O(n*log(n)) time, O(1) space)
...then you may as well just solve the problem by sorting.
However, we can get better complexities (and more importantly, genuinely faster solutions):
Because the numbers are taken from a small, finite range, they can be 'sorted' in linear time.
All we do is initialize an array of 100 booleans, and for each input, set the boolean corresponding to each number in the input, and then step through reporting the unset booleans.
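As a concrete sketch of that (the range bound n and the function name are assumptions of mine):

```cpp
#include <cassert>
#include <vector>

// Mark each input value in a boolean array, then report the unset
// entries. O(n) time and space; assumes every x is in [1, n].
std::vector<int> missing_numbers(const std::vector<int>& input, int n) {
    std::vector<bool> seen(n + 1, false);
    for (int x : input) seen[x] = true;
    std::vector<int> missing;
    for (int v = 1; v <= n; ++v)
        if (!seen[v]) missing.push_back(v);
    return missing;
}
```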
If there are N total elements, where each number x is such that 1 <= x <= N, then we can solve this in O(nlogn) time complexity and O(1) space complexity.
First sort the array using quicksort or mergesort.
Scan through the sorted array; if the difference between the previously scanned number a and the current number b is equal to 2 (b - a = 2), then the missing number is a + 1. This can be extended to the condition where b - a > 2.
Time complexity is O(nlogn) + O(n), which is almost equal to O(nlogn) when N is large.
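The sort-and-scan step, sketched and generalized to gaps larger than 2 (the function name is mine):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sort, then every gap b - a > 1 between neighbours contributes the
// missing values a+1 .. b-1. O(n log n) time, dominated by the sort.
std::vector<int> missing_after_sort(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    std::vector<int> missing;
    for (std::size_t i = 1; i < a.size(); ++i)
        for (int v = a[i - 1] + 1; v < a[i]; ++v)
            missing.push_back(v);
    return missing;
}
```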
I already answered it HERE
You can also create a boolean array of size last_element_in_the_existing_array + 1.
In one for loop, mark as true every element that is present in the existing array.
In another for loop, print the indices of the elements that contain false, i.e. the missing ones.
Time Complexity: O(last_element_in_the_existing_array)
Space Complexity: O(array.length)
If the range is given to you well ahead (in this case the range is [1, 10]), you can XOR your full range with the numbers given to you. Since XOR is commutative and self-inverse, every number that appears in both lists cancels out:
(1 2 3 4 5 6 7 8 9 10) XOR (1 2 4 5 7 8 9 10) leaves exactly the contributions of the missing numbers {3, 6}.
With a single missing number, the combined XOR is the answer directly; with two missing numbers, it equals 3 XOR 6, and the two values can be separated by partitioning on any bit set in that result.
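For exactly two missing values, the XOR of everything collapses to a single word (here 3 XOR 6 = 5); splitting all the numbers by one bit set in that word separates the two, as in this sketch (names mine):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Two missing values in [1, n]: XOR everything to get m1 ^ m2, then
// partition all numbers by one bit where m1 and m2 differ and XOR
// each group separately.
std::pair<int, int> two_missing(const std::vector<int>& input, int n) {
    int x = 0;
    for (int v = 1; v <= n; ++v) x ^= v;
    for (int v : input) x ^= v;               // x == m1 ^ m2 (nonzero)
    int bit = x & -x;                         // lowest bit where they differ
    int a = 0, b = 0;
    for (int v = 1; v <= n; ++v) (v & bit ? a : b) ^= v;
    for (int v : input) (v & bit ? a : b) ^= v;
    return {std::min(a, b), std::max(a, b)};
}
```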