Number of swaps in a permutation [duplicate] - c++

This question already has answers here:
Counting the adjacent swaps required to convert one permutation into another
(6 answers)
Closed 8 years ago.
Is there an efficient algorithm (efficient in terms of big O notation) to find the minimum number of swaps needed to convert a permutation P into the identity permutation I? The swaps do not need to be on adjacent elements; any two elements may be swapped.
So for example:
I = {0, 1, 2, 3, 4, 5}, number of swaps is 0
P = {0, 1, 5, 3, 4, 2}, number of swaps is 1 (2 and 5)
P = {4, 1, 3, 5, 0, 2}, number of swaps is 3 (2 with 5, 3 with 5, 4 with 0)
One idea is to write an algorithm like this:
int count = 0;
for(int i = 0; i < n; ++ i) {
for(; P[i] != i; ++ count) { // could be permuted multiple times
std::swap(P[P[i]], P[i]);
// look where the number at hand should be
}
}
But it is not very clear to me whether that is actually guaranteed to terminate or whether it finds a correct number of swaps. It works on the examples above. I tried generating all permutations on 5 and on 12 numbers and it always terminates on those.
This problem arises in numerical linear algebra. Some matrix decompositions use pivoting, which effectively swaps the row with the greatest value with the next row to be manipulated, in order to avoid division by small numbers and improve numerical stability. Some decompositions, such as the LU decomposition, can later be used to calculate the matrix determinant, but the sign of the determinant of the decomposition is opposite to that of the original matrix if the number of swaps is odd.
EDIT: I agree that this question is similar to Counting the adjacent swaps required to convert one permutation into another. But I would argue that this question is more fundamental. Converting permutation from one to another can be converted to this problem by inverting the target permutation in O(n), composing the permutations in O(n) and then finding the number of swaps from there to identity. Solving this question by explicitly representing identity as another permutation seems suboptimal. Also, the other question had, until yesterday, four answers where only a single one (by |\/|ad) was seemingly useful, but the description of the method seemed vague. Now user lizusek provided answer to my question there. I don't agree with closing this question as duplicate.
EDIT2: The proposed algorithm actually seems to be rather optimal, as pointed out in a comment by user rcgldr, see my answer to Counting the adjacent swaps required to convert one permutation into another.

I believe the key is to think of the permutation in terms of the cycle decomposition.
This expresses any permutation as a product of disjoint cycles.
Key facts are:
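Swapping elements in two disjoint cycles merges them into one longer cycle (one fewer cycle)
Swapping elements in the same cycle splits it into two cycles (one more cycle)
The identity has n cycles, so the number of swaps needed is n-c, where c is the number of cycles in the decomposition (counting fixed points as cycles of length 1)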
Your algorithm always swaps elements in the same cycle so will correctly count the number of swaps needed.
If desired, you can also do this in O(n) by computing the cycle decomposition and returning n minus the number of cycles found.
Computing the cycle decomposition can be done in O(n) by starting at the first node and following the permutation until you reach the start again. Mark all visited nodes, then start again at the next unvisited node.
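The O(n) cycle-counting approach described above can be sketched as follows. This is a minimal sketch, assuming the permutation is stored as a vector of indices 0..n-1; the function name min_swaps_to_identity is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Minimum number of (arbitrary, not necessarily adjacent) swaps needed to
// turn the permutation p of 0..n-1 into the identity: n minus the cycle count.
std::size_t min_swaps_to_identity(const std::vector<std::size_t>& p) {
    const std::size_t n = p.size();
    std::vector<bool> visited(n, false);
    std::size_t cycles = 0;
    for (std::size_t start = 0; start < n; ++start) {
        if (visited[start]) continue;  // already part of an earlier cycle
        ++cycles;
        // Follow the permutation until we return to the start of the cycle.
        for (std::size_t j = start; !visited[j]; j = p[j]) {
            visited[j] = true;
        }
    }
    return n - cycles;
}
```

Unlike the swapping loop in the question, this version leaves the permutation untouched, at the cost of O(n) extra space for the visited flags.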

I believe the following are true:
If S(x[0], ..., x[n-1]) is the minimum number of swaps needed to convert x to {0, 1, ..., n - 1}, then:
If x[n - 1] == n - 1, then S(x) == S(x[0],...,x[n-2]) (ie, cut off the last element)
If x[n - 1] != n - 1, then S(x) == S(x[0], ..., x[i-1], x[n-1], x[i+1], ..., x[n-2]) + 1, where x[i] == n - 1 (ie, swap n - 1 into the last slot, then cut off the last element)
S({}) = 0.
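This suggests a straightforward algorithm for computing S(x). As written below it runs in O(n^2) time, because std::find is a linear scan at every level of the recursion; keeping an index of where each value currently sits would bring it down to O(n):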
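#include <algorithm> // std::find
#include <utility>   // std::swap

// x[0..n-1] must be a permutation of 0..n-1; it is modified in place.
int num_swaps(int* x, int n) {
    if (n == 0) {
        return 0;
    } else if (x[n - 1] == n - 1) {
        // Last element is already in place: cut it off.
        return num_swaps(x, n - 1);
    } else {
        // Move n - 1 into the last slot with one swap, then cut it off.
        int* i = std::find(x, x + n, n - 1);
        std::swap(*i, x[n - 1]);
        return num_swaps(x, n - 1) + 1;
    }
}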

Related

Does this recursive algorithm for finding the largest sum in a continuous sub array have any advantages?

Objective: Evaluating the algorithm for finding the largest sum in a continuous subarray below.
Note: written in C++
As I was looking into the problem that Kadane successfully solved using dynamic programming, I thought I would find my own way of solving it. I did so by using a series of recursive calls depending on whether the sum can be made larger by shortening the ends of the array. See below.
int corbins_largest_sum_continuous_subarray(int n, int* array){
int sum = 0; // calculate the sum of the current array given
for(int i=0; i<n; i++){sum += array[i];}
if(sum-array[0]>sum && sum-array[n-1]>sum){
return corbins_largest_sum_continuous_subarray(n-2, array+1);
}else if(sum-array[0]<sum && sum-array[n-1]>sum){
return corbins_largest_sum_continuous_subarray(n-1, array);
}else if(sum-array[0]>sum && sum-array[n-1]<sum){
return corbins_largest_sum_continuous_subarray(n-1, array+1);
}else{
return sum; // this is the largest subarray sum, can not increase any further
}
}
I understand that Kadane's algorithm takes O(n) time. I am having trouble calculating the Big O of my algorithm. Would it also be O(n)? Since it calculates the sum using O(n) and all calls after that use the same time. Does my algorithm provide any advantage over Kadane's? In what ways is Kadane's algorithm better?
First of all, the expression sum-array[0]>sum is equivalent to array[0]<0. A similar observation applies to those other conditions you have in your code.
Your algorithm is incorrect. The comment you have here is not true:
}else{
return sum; // this is the largest subarray sum, can not increase any further
}
When you get at that point you know that the outer two values are both positive, but there might be a negative-sum subarray somewhere else in the array, which -- when removed -- would give two remaining subarrays, of which one (or both) could have a sum that is greater than the total sum.
For instance, the following input would be such a case:
[1, -4, 1]
Your algorithm will conclude that the maximum sum is achieved by taking the complete array (sum is -2), yet the subarray [1] represents a greater sum.
Other counter examples:
[1, 2, -2, 1]
[1, -3, -3, 1, 1]
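For comparison, here is a minimal sketch of Kadane's algorithm, which the question mentions, run against the counterexamples above. The function name max_subarray_sum is made up for illustration, and the sketch assumes a non-empty input.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Kadane's algorithm: largest sum of a non-empty contiguous subarray, in O(n).
// Assumes a is non-empty.
int max_subarray_sum(const std::vector<int>& a) {
    int best = a[0];     // best sum seen so far
    int current = a[0];  // best sum of a subarray ending at the current index
    for (std::size_t i = 1; i < a.size(); ++i) {
        // Either extend the previous subarray or start fresh at a[i].
        current = std::max(a[i], current + a[i]);
        best = std::max(best, current);
    }
    return best;
}
```

For [1, -4, 1] this returns 1, for [1, 2, -2, 1] it returns 3, and for [1, -3, -3, 1, 1] it returns 2, whereas trimming only the ends of the array returns the full-array sum in all three cases.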

Random generation algorithm in C++

Suppose you need to generate a random permutation of the first N integers. For example, {4, 3, 1, 5, 2} and {3, 1, 4, 2, 5} are legal permutations, but {5, 4, 1, 2, 1} is not, because one number (1) is duplicated and another (3) is missing. This routine is often used in simulation of algorithms. We assume the existence of a random number generator, RandInt(i,j), that generates integers between i and j with equal probability. Here is the algorithm:
Fill the array A from A[0] to A[N-1] as follows: To fill A[i], generate random numbers until you get one that is not already in A[0], A[1],…, A[i-1].
Implement this algorithm in C++ and find the complexity. This is my code:
int a;
bool b = false;
A[0] = RandInt(1,n);
for (int i=1;i<n;i++) {
do {
b = false;
a = RandInt(1,n);
for (int j=0;j<i;j++)
if(A[j] == a)
b = true;
} while(b);
A[i] = a;
}
Is this code correct? And how can I find the complexity of the algorithm? Since, RandInt(i,j) generates random numbers, I don't know how many times the do while loop will be repeated.
This algorithm will produce correct results, selecting a permutation uniformly at random from all possible permutations.
The running time is not bounded above by any deterministic function since, as you point out, it could run literally forever. In the best case, this algorithm runs in O(n^2) and selects a random permutation without having to repeat any selection. On average, you'd expect to need n/n = 1 try to get the first unique number, n/(n-1) tries to get the second, and so on up to an expected n/1 = n tries to get the last one. Adding those together gives n*H(n) tries, where H(n) is the nth harmonic number. H(n) is Theta(log n), and each try costs up to O(n) for the scan over the elements already placed, so this algorithm is O(n^2 log n) in the average case.
There is a better way to do what you're trying to do: you can start with any permutation and shuffle it into another one using an algorithm that is O(n) in the worst case. The algorithm is the Fisher-Yates algorithm and works as follows:
FisherYates(array[1...n])
1. if n == 1 then return
2. r = random(1, n) // inclusive; r == 1 leaves array[1] where it is
3. temp = array[1]
4. array[1] = array[r]
5. array[r] = temp
6. FisherYates(array[2...n])
This is a recursive formulation but an iterative one is straightforward. It calls random exactly n - 1 times, where n is the size of the array at the topmost invocation (the final call, on a single element, returns without drawing a number).
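An iterative version of the shuffle might look like the sketch below. It assumes the standard <random> facilities stand in for the exercise's RandInt; the function name fisher_yates_shuffle is made up for illustration.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Iterative Fisher-Yates shuffle: position i receives a uniformly chosen
// element from the not-yet-fixed suffix a[i..n-1] (which includes a[i] itself).
template <typename T, typename Rng>
void fisher_yates_shuffle(std::vector<T>& a, Rng& rng) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::uniform_int_distribution<std::size_t> pick(i, a.size() - 1);
        std::swap(a[i], a[pick(rng)]);
    }
}
```

To get a random permutation of 1..N, fill a vector with 1, 2, ..., N and pass it in together with a generator such as std::mt19937. In production code std::shuffle from <algorithm> does the same job.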

How to erase elements more efficiently from a vector or set?

Problem statement:
Input:
First two inputs are integers n and m. n is the number of knights fighting in the tournament (2 <= n <= 100000, 1 <= m <= n-1). m is the number of battles that will take place.
The next line contains n power levels.
The next m lines contain two integers l and r, indicating the range of knight positions to compete in the ith battle.
After each battle, all knights apart from the one with the highest power level will be eliminated.
The range for each battle is given in terms of the new positions of the knights, not the original positions.
Output:
Output m lines, the ith line containing the original positions (indices) of the knights eliminated in the ith battle. Each line is in ascending order.
Sample Input:
8 4
1 0 5 6 2 3 7 4
1 3
2 4
1 3
0 1
Sample Output:
1 2
4 5
3 7
0
Here is a visualisation of this process.
1 2
[(1,0),(0,1),(5,2),(6,3),(2,4),(3,5),(7,6),(4,7)]
-----------------
4 5
[(1,0),(6,3),(2,4),(3,5),(7,6),(4,7)]
-----------------
3 7
[(1,0),(6,3),(7,6),(4,7)]
-----------------
0
[(1,0),(7,6)]
-----------
[(7,6)]
I have solved this problem. My program produces the correct output; however, it is O(n*m) = O(n^2). I believe that if I erase knights more efficiently from the vector, efficiency can be increased. Would it be more efficient to erase elements using a set, i.e. erase contiguous segments rather than individual knights? Is there an alternative way to do this that is more efficient?
#include <cstdio>
#include <utility>
#include <vector>
using namespace std;
#define INPUT1(x) scanf("%d", &x)
#define INPUT2(x, y) scanf("%d%d", &x, &y)
#define OUTPUT1(x) printf("%d ", x) // space-separated; the newline is printed once per battle
int main(int argc, char const *argv[]) {
    int n, m;
    INPUT2(n, m);
    vector< pair<int,int> > knights(n); // (power, original position)
    for (int i = 0; i < n; i++) {
        int power;
        INPUT1(power);
        knights[i] = make_pair(power, i);
    }
    while(m--) {
        int l, r;
        INPUT2(l, r);
        // Find the highest power level in the current range [l, r].
        int max_in_range = knights[l].first;
        for (int i = l+1; i <= r; i++) if (knights[i].first > max_in_range) {
            max_in_range = knights[i].first;
        }
        // Print and erase every knight in the range except the strongest.
        int offset = l;
        int range = r-l+1;
        while (range--) {
            if (knights[offset].first != max_in_range) {
                OUTPUT1(knights[offset].second);
                knights.erase(knights.begin()+offset); // O(n) shift per erase
            }
            else offset++;
        }
        printf("\n");
    }
}
Well, removing from vector wouldn't be efficient for sure. Removing from set, or unordered set would be more effective (use iterators instead of indexes).
Yet the problem will still remain O(n^2), because you have two nested whiles running n*m times.
--EDIT--
I believe I understand the question now :)
First let's calculate the complexity of your code above. Your worst case would be the case where the range of every battle is 1 (two knights per battle) and the battles are not ordered with respect to position. That means you have m battles (in this case m = n-1, which is O(n)).
The first while loop runs n times
The for loop runs once each time, which makes it n*1 = n in total
The second while loop runs once every time which makes it n again.
Deleting from vector means n-1 shifts that makes it O(n).
Thus with the complexity of the vector total complexity is O(n^2)
First of all, you don't really need the inner for loop. Take the first knight as the max in range, compare the rest in the range one-by-one and remove the defeated ones.
Now, I believe it can be done in O(n log n) using std::map. The key of the map is the position and the value is the power level of the knight.
Before proceeding, note that finding and removing an element in a map is logarithmic, and advancing an iterator is amortized constant.
Finally, your code should look like:
while(m--) // n times
strongest = map.find(first_position); // find is log(n) --> n*log(n)
for (opponent = next of strongest; // this will run 1 times, since every range is 1
opponent in range;
opponent = next opponent) // iterating is constant
// removing from map is log(n) --> n * 1 * log(n)
if strongest < opponent
remove strongest, opponent is the new strongest
else
remove opponent, (be careful to remove it after iterating to next)
Ok, now the upper bound would be O(2*nlogn) = O(nlogn). If the ranges increases, that makes the run time of upper loop decrease but increases the number of remove operations. I'm sure the upper bound won't change, let's make it a homework for you to calculate :)
A solution with a treap is pretty straightforward.
For each query, you need to split the treap by implicit key to obtain the subtree that corresponds to the [l, r] range (it takes O(log n) time).
After that, you can iterate over the subtree and find the knight with the maximum strength. After that, you just need to merge the [0, l) and [r + 1, end) parts of the treap with the node that corresponds to this knight.
It's clear that all parts of the solution except for the subtree traversal and printing work in O(log n) time per query. However, each operation reinserts only one knight and erases the rest of the range, so the size of the output (and the sum of the sizes of the subtrees) is linear in n. So the total time complexity is O(n log n).
I don't think you can solve this with standard STL containers, because there's no standard container that supports both getting an iterator by index quickly and removing arbitrary elements.
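One possible way around that limitation, offered as a sketch of a different technique rather than of the treap solution: pair a small hand-written Fenwick tree (to find the knight currently at index l) with a std::set of surviving original positions (to walk the range and erase). The names AliveIndex and run_battles are made up for illustration, and the sketch assumes distinct power levels and valid ranges.

```cpp
#include <set>
#include <utility>
#include <vector>

// Fenwick tree holding a 1 for every knight that is still in the tournament.
class AliveIndex {
public:
    explicit AliveIndex(int n) : n_(n), tree_(n + 1) {
        for (int i = 1; i <= n; ++i) tree_[i] = i & -i;  // every slot starts alive
    }
    void remove(int pos) {  // pos is a 0-based original position
        for (int i = pos + 1; i <= n_; i += i & -i) --tree_[i];
    }
    // Original position of the knight currently at 0-based index k.
    int kth(int k) const {
        int idx = 0, step = 1;
        while (step * 2 <= n_) step *= 2;
        for (; step > 0; step /= 2) {
            if (idx + step <= n_ && tree_[idx + step] <= k) {
                idx += step;
                k -= tree_[idx];
            }
        }
        return idx;
    }
private:
    int n_;
    std::vector<int> tree_;
};

// Returns, for each battle (l, r), the original positions eliminated in it.
std::vector<std::vector<int>> run_battles(
        const std::vector<int>& power,
        const std::vector<std::pair<int, int>>& battles) {
    const int n = static_cast<int>(power.size());
    AliveIndex alive(n);
    std::set<int> positions;
    for (int i = 0; i < n; ++i) positions.insert(positions.end(), i);
    std::vector<std::vector<int>> eliminated;
    for (const auto& battle : battles) {
        auto it = positions.find(alive.kth(battle.first));
        const int count = battle.second - battle.first + 1;
        int best = *it;
        std::vector<int> group;  // original positions taking part, ascending
        for (int c = 0; c < count; ++c, ++it) {
            group.push_back(*it);
            if (power[*it] > power[best]) best = *it;
        }
        std::vector<int> losers;
        for (int p : group) {
            if (p == best) continue;
            losers.push_back(p);
            positions.erase(p);
            alive.remove(p);
        }
        eliminated.push_back(losers);
    }
    return eliminated;
}
```

Each knight is erased at most once and each battle does one O(log n) index lookup, so the total work is O((n + m) log n) under the stated assumptions.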

There is an array of n numbers. One number is repeated n/2 times and other n/2 numbers are distinct

There is an array of n numbers. One number is repeated n/2 times and the other n/2 numbers are distinct. Find the repeated number. (The best solution is O(n), with exactly n/2+1 comparisons.)
The main problem here is the n/2+1 comparisons.
I have two O(n) solutions, but they take more than n/2+1 comparisons.
1> Divide the array into groups of three and compare the elements within each of those n/3 groups.
e.g. array is (1 10 3) (4 8 1) (1 1)... so the number of comparisons required is 7, which is > n/2+1
i.e. 8/2+1 = 5
2> Compare a[i] with a[i+1] and a[i+2]
e.g. array is 8 10 3 4 1 1 1 1
total 9 comparisons
I'd appreciate even a little help.
Thank you.
Space complexity should be O(1).
Of course, if all the others are distinct, you only have to compare pairs. If you find a pair with two equal numbers, you have this number.
Let's say you have numbers like this (these are just the indexes):
[1,2,3,4,5,6,7,8,9,10]
You then make n/2 + 1 comparisons like this:
(1,2),(3,4),(5,6),(7,8),(9,7),(9,8)
If no compared pair is equal, you return the number at index 10.
The point is that when you reach the last 4 remaining numbers (7,8,9,10), you know that at least two of them are equal, and you have 3 comparisons left.
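The pairing scheme above can be sketched in C++ as follows. This is a minimal sketch, assuming n is even and at least 4 and that the input really has the promised shape; the function name find_repeated is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Finds the value that occurs n/2 times using at most n/2 + 1 equality
// comparisons and O(1) extra space. Assumes a.size() is even and >= 4, and
// that the remaining n/2 values are distinct.
int find_repeated(const std::vector<int>& a) {
    const std::size_t n = a.size();
    // Compare disjoint adjacent pairs, leaving the last four elements alone:
    // (n - 4) / 2 comparisons.
    for (std::size_t i = 0; i + 4 < n; i += 2) {
        if (a[i] == a[i + 1]) return a[i];
    }
    // No pair matched, so each pair held at most one copy of the repeated
    // value, and at least two copies sit among the last four elements.
    const int w = a[n - 4], x = a[n - 3], y = a[n - 2], z = a[n - 1];
    if (w == x) return w;
    if (y == w) return y;
    if (y == x) return y;
    // w, x, y are pairwise different, so at most one of them is the repeated
    // value; the other copy must be z.
    return z;
}
```

For the array 8 1 10 1 3 1 4 1 none of the five comparisons succeeds and the function falls through to return the last element, 1.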
You just need to find the number that exists twice in the array.
You just start from the beginning, keep a hash or something of numbers you've already seen, when you get to a number that appears twice just stop.
Worst-case scenario: you see all the n/2 distinct numbers first, and then the next number is a repeat... n/2+2 (because the number you're looking for isn't part of the n/2 unique numbers)
Read the part about O(1) space complexity too late, but anyway, here is my solution:
#include <iterator>
#include <unordered_set>
template <typename ForwardIterator>
ForwardIterator find_repeated_element(ForwardIterator begin, ForwardIterator end)
{
typedef typename std::iterator_traits<ForwardIterator>::value_type value_type;
std::unordered_set<value_type> visited_elements;
for (; begin != end; ++begin)
{
bool could_insert = visited_elements.insert(*begin).second;
if (!could_insert) return begin;
}
return end;
}
#include <iostream>
int main()
{
int test[] = {8, 10, 3, 4, 1, 1, 1, 1};
int* end = test + sizeof test / sizeof *test;
int* p = find_repeated_element(test, end);
if (p == end)
{
std::cout << "there was no repeated element\n";
}
else
{
std::cout << "repeated element: " << *p << "\n";
}
}
Due to the pigeonhole principle, you only need to test the first n/2+2 members of the array: at most n/2 of them can be distinct non-repeated numbers, so the repeated number is certain to appear at least twice among them. Loop through each member, using a hash table to keep track, and stop when there is a member that is repeated twice.
Another solution for O(n) (but not exactly n/2+1), but with O(1) space:
Because n/2 of the elements are that number, if you look at the array as if it were sorted, there are two scenarios for its position:
Either it's the lowest number, so it occupies positions 1 through n/2, or it's not, and then it is certainly in position n/2+1.
So you can use a selection algorithm to retrieve 4 elements: those of rank n/2-1 through n/2+2.
Each retrieval asks for the element of a given rank k, which is exactly what a selection algorithm provides.
The repeated number then has to appear at least twice among those 4 numbers (simple check).
So total complexity: 4*O(n) + O(1) = O(n)
Regarding complexity O(n/2+1) and space complexity O(1) you can (almost) meet the requirements with this approach:
Compare tuples:
a[x] == a[x+1], a[x+2] == a[x+3] ... a[n-1] == a[n]
If no match is found increase step:
a[x] == a[x+2], a[x+1] == a[x+3]
This will in worst case run in O(n/2+2) (but always in O(1) space) when you have an array like this: [8 1 10 1 3 1 4 1]
qsort( ) the array then scan for first repeat.

Dynamic programming algorithm N, K problem

An algorithm which will take two positive numbers N and K and calculate the biggest possible number we can get by transforming N into another number via removing K digits from N.
For ex, let say we have N=12345 and K=3 so the biggest possible number we can get by removing 3 digits from N is 45 (other transformations would be 12, 15, 35 but 45 is the biggest). Also you cannot change the order of the digits in N (so 54 is NOT a solution). Another example would be N=66621542 and K=3 so the solution will be 66654.
I know this is a dynamic programming related problem and I can't get any idea about solving it. I need to solve this for 2 days, so any help is appreciated. If you don't want to solve this for me you don't have to but please point me to the trick or at least some materials where i can read up more about some similar issues.
Thank you in advance.
This can be solved in O(L) where L = number of digits. Why use complicated DP formulas when we can use a stack to do this:
For: 66621542
Push digits onto the stack while it holds fewer than L - K digits:
66621. Now, remove digits from the stack while they are less than the currently read digit and put the current digit on the stack:
read 5: 5 > 1, pop 1 off the stack. 5 > 2, pop 2 also. put 5: 6665
read 4: stack isn't full, put 4: 66654
read 2: 2 < 4, do nothing.
You need one more condition: be sure not to pop off more items from the stack than there are digits left in your number, otherwise your solution will be incomplete!
Another example: 12345
L = 5, K = 3
put L - K = 2 digits on the stack: 12
read 3, 3 > 2, pop 2, 3 > 1, pop 1, put 3. stack: 3
read 4, 4 > 3, pop 3, put 4: 4
read 5: 5 > 4, but we can't pop 4, otherwise we won't have enough digits left. so push 5: 45.
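A common way to code this stack idea is sketched below. It tracks the number of removals still allowed instead of the stack's fill level, which is a slightly different bookkeeping from the walkthrough above but yields the same results on both examples. The name largest_by_stack is made up for illustration, and the sketch assumes the number is passed as a string of digits with k no larger than its length.

```cpp
#include <cstddef>
#include <string>

// Largest number obtainable by deleting exactly k digits, keeping the order.
// Assumes digits contains only '0'..'9' and k <= digits.size().
std::string largest_by_stack(const std::string& digits, std::size_t k) {
    std::string stack;          // used as a stack of kept digits
    std::size_t to_remove = k;  // removals still available
    for (char d : digits) {
        // Pop smaller digits while we are still allowed to remove some.
        while (to_remove > 0 && !stack.empty() && stack.back() < d) {
            stack.pop_back();
            --to_remove;
        }
        stack.push_back(d);
    }
    // Any removals left over are taken from the tail, which is non-increasing.
    stack.resize(digits.size() - k);
    return stack;
}
```

Each digit is pushed once and popped at most once, so the running time is O(L).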
Well, to solve any dynamic programming problem, you need to break it down into recurring subsolutions.
Say we define your problem as A(n, k), which returns the largest number possible by removing k digits from n.
We can define a simple recursive algorithm from this.
Using your example, A(12345, 3) = max { A(2345, 2), A(1345, 2), A(1245, 2), A(1235, 2), A(1234, 2) }
More generally, A(n, k) = max { A(n with 1 digit removed, k - 1) }
And your base case is A(n, 0) = n.
Using this approach, you can create a table that caches the values of n and k.
int A(int n, int k)
{
typedef std::pair<int, int> input;
static std::map<input, int> cache;
if (k == 0) return n;
input i(n, k);
if (cache.find(i) != cache.end())
return cache[i];
cache[i] = /* ... as above ... */
return cache[i];
}
Now, that's the straight forward solution, but there is a better solution that works with a very small one-dimensional cache. Consider rephrasing the question like this: "Given a string n and integer k, find the lexicographically greatest subsequence in n of length k". This is essentially what your problem is, and the solution is much more simple.
We can now define a different function B(i, j), which gives the largest lexicographical sequence of length (i - j), using only the first i digits of n (in other words, having removed j digits from the first i digits of n).
Using your example again, we would have:
B(1, 0) = 1
B(2, 0) = 12
B(3, 0) = 123
B(3, 1) = 23
B(3, 2) = 3
etc.
With a little bit of thinking, we can find the recurrence relation:
B(i, j) = max( 10 * B(i-1, j) + n_i , B(i-1, j-1) ), where n_i is the i-th digit of n
or, if j = i then B(i, j) = B(i-1, j-1)
or, if j = 0 then B(i, j) = 10 * B(i-1, j) + n_i
and B(0, 0) = 0
And you can code that up in a very similar way to the above.
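A bottom-up sketch of the B(i, j) recurrence with a one-dimensional cache is shown below. The function name largest_by_dp is made up for illustration; the sketch assumes the number is passed as a digit string, that 0 <= k <= its length, and that the result fits in a long long (roughly 18 digits).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// b[j] holds B(i, j): the largest number formed from the first i digits
// after removing exactly j of them. Assumes 0 <= k <= digits.size().
long long largest_by_dp(const std::string& digits, int k) {
    const int len = static_cast<int>(digits.size());
    std::vector<long long> b(k + 1, 0);  // B(0, 0) = 0
    for (int i = 1; i <= len; ++i) {
        const int d = digits[i - 1] - '0';
        // Go through j downwards so b[j - 1] still holds B(i-1, j-1).
        for (int j = std::min(i, k); j >= 0; --j) {
            // Keep digit i: only possible if j < i digits were removed so far.
            const long long keep = (j < i) ? 10 * b[j] + d : -1;
            // Remove digit i: needs at least one removal.
            const long long drop = (j > 0) ? b[j - 1] : -1;
            b[j] = std::max(keep, drop);
        }
    }
    return b[k];
}
```

This runs in O(L * K) time for L digits, so it is slower than a linear scan but follows the recurrence term by term.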
The trick to solving a dynamic programming problem is usually figuring out what the structure of a solution looks like, and more specifically whether it exhibits optimal substructure.
In this case, it seems to me that the optimal solution with N=12345 and K=3 would have an optimal solution to N=12345 and K=2 as part of the solution. If you can convince yourself that this holds, then you should be able to express a solution to the problem recursively. Then either implement this with memoisation or bottom-up.
The most important elements of any dynamic programming solution are:
Defining the right subproblems
Defining a recurrence relation between the answer to a sub-problem and the answer to smaller sub-problems
Finding base cases, the smallest sub-problems whose answer does not depend on any other answers
Figuring out the scan order in which you must solve the sub-problems (so that you never use the recurrence relation based on uninitialized data)
You'll know that you have the right subproblems defined when
The problem you need the answer to is one of them
The base cases really are trivial
The recurrence is easy to evaluate
The scan order is straightforward
In your case, it is straightforward to specify the subproblems. Since this is probably homework, I will just give you the hint that you might wish that N had fewer digits to start off with.
Here's what I think:
Consider the first k + 1 digits from the left. Find the biggest one and remove the digits to its left. If the biggest digit occurs more than once, take the leftmost occurrence and remove the digits to the left of that. Store the number of removed digits (call it j).
Do the same thing with the new number as N and k+1-j as K. Do this until k+1-j equals 1 (hopefully it will, if I'm not mistaken).
The number you end up with will be the number you're looking for.