Next lexical "permutation" algorithm


I wrote a program that solves a generalized version of the 24 game. That is, given a set of n numbers, is there a way to combine them with binary operations so that the result equals a target number?
To do this, I viewed possible expressions as a char array consisting of either 'v' or 'o', where 'v' is a placeholder for a value and 'o' is a placeholder for an operation. Note that if there are n values, there must be n-1 operations.
The program currently checks every permutation of {'o','o',...,'v','v',...} in lexicographical order and tests whether the resulting prefix expression is valid. For example, when n = 4, the following expressions are considered valid:
{'o','o','o','v','v','v','v'}
{'o','v','o','v','o','v','v'}
The following expressions are not valid:
{'v','o','v','o','o','v','v'}
{'o','o','v','v','v','o','v'}
My question is: does there exist an efficient algorithm to get the next valid permutation in some ordering? The goal is to avoid having to check validity for every permutation.
Moreover, if such an algorithm exists, is there an O(1)-time algorithm to compute the kth valid permutation?
What I have so far
I hypothesize that a prefix expression A of length 2n-1 is valid if and only if
number of operations < number of values for each A[i:2n-1)
where 0<=i<2n-1 (that is, for each subarray starting at i and ending, non-inclusive, at 2n-1).
Moreover, that implies there are exactly (1/n)C(2n-2,n-1) valid permutations, where C(n,k) is n choose k. This is the Catalan number with index n-1.
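The hypothesis above can be checked by brute force for small n. The following is a minimal sketch, not the asker's program: isValidPrefix and countValid are names made up for illustration. It scans each pattern from the right, tracks (values minus operations) for every suffix, and counts how many permutations pass.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true if every non-empty suffix of expr contains
// more 'v' (values) than 'o' (operations) and the whole string has
// exactly one more 'v' than 'o'.
bool isValidPrefix(const std::string& expr) {
    int balance = 0;  // values minus operations in the suffix seen so far
    for (auto it = expr.rbegin(); it != expr.rend(); ++it) {
        balance += (*it == 'v') ? 1 : -1;
        if (balance < 1) return false;
    }
    return balance == 1;
}

// Hypothetical helper: brute-force count of valid patterns with n values
// and n - 1 operations, using std::next_permutation.
int countValid(int n) {
    assert(n >= 1);
    // 'o' < 'v', so this is the lexicographically smallest arrangement.
    std::string expr = std::string(n - 1, 'o') + std::string(n, 'v');
    int count = 0;
    do {
        if (isValidPrefix(expr)) ++count;
    } while (std::next_permutation(expr.begin(), expr.end()));
    return count;
}
```

For n = 4 this counts 5 valid patterns, which agrees with (1/4)C(6,3) = 5.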

Here's how to generate the ov-patterns. The details behind the code below are in Knuth Volume 4A (or at least alluded to; I might have worked one of the exercises). You can use the existing permutation machinery to permute the values every which way before changing patterns.
The code
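#include <cstdio>

namespace {

// Sets f[i] = 2 * i + 1 for i = 0..n. f[0..n-1] are the positions of the
// first n 'v's; called with the full n, f[n] = 2n + 1 is a sentinel that
// is never printed. The resulting pattern is ovov...ovv.
void FirstTree(int f[], int n) {
  for (int i = n; i >= 0; i--) f[i] = 2 * i + 1;
}

// Advances f to the next pattern. Returns false once the last pattern
// (all 'o's first) has been passed.
bool NextTree(int f[], int n) {
  int i = 0;
  // Skip over the leading 'v's that sit in adjacent positions.
  while (f[i] + 1 == f[i + 1]) i++;
  f[i]++;               // move the last 'v' of that run one step right
  FirstTree(f, i - 1);  // reset f[0..i-1] to 1, 3, 5, ...
  return i + 1 < n;
}

// Prints 'v' at the positions stored in f, 'o' elsewhere, then the
// final 'v' that every pattern ends with.
void PrintTree(int f[], int n) {
  int i = 0;
  for (int j = 0; j < 2 * n; j++) {
    if (j == f[i]) {
      std::putchar('v');
      i++;
    } else {
      std::putchar('o');
    }
  }
  std::putchar('v');
  std::putchar('\n');
}

}  // namespace

int main() {
  constexpr int kN = 4;  // number of operations
  int f[1 + kN];
  FirstTree(f, kN);
  do {
    PrintTree(f, kN);
  } while (NextTree(f, kN));
}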
generates the output below (kN counts the operations, so each pattern has kN + 1 = 5 values):
ovovovovv
oovvovovv
ovoovvovv
oovovvovv
ooovvvovv
ovovoovvv
oovvoovvv
ovoovovvv
oovovovvv
ooovvovvv
ovooovvvv
oovoovvvv
ooovovvvv
oooovvvvv
There's a way to get the kth tree, but in time O(n) rather than O(1). The magic words are unranking binary trees.

Related

Given two string S and T. Determine a substring of S that has minimum difference with T?

I have two strings S and T, where length of S >= length of T. I have to find a substring of S that has the same length as T and the minimum difference from T. Here the difference between two strings of the same length means the number of indexes at which they differ. For example, "ABCD" and "ABCE" differ only at index 3 (0-based), so their difference is 1.
I know I can use the KMP (Knuth-Morris-Pratt) pattern searching algorithm to search for T within S. But what if S doesn't contain T as a substring? So I coded a brute-force approach:
#include <climits>
#include <iostream>
#include <string>
using namespace std;
int main() {
string S, T;
cin >> S >> T;
int SZ_S = S.size(), SZ_T = T.size(), MinDifference = INT_MAX;
string ans;
for (int i = 0; i + SZ_T <= SZ_S; i++) { // I generate all the substring of S
int CurrentDifference = 0; // and check their difference with T
for (int j = 0; j < SZ_T; j++) { // and store the substring with minimum difference
if (S[i + j] != T[j])
CurrentDifference++;
}
if (CurrentDifference < MinDifference) {
ans = S.substr (i, SZ_T);
MinDifference = CurrentDifference;
}
}
cout << ans << endl;
}
But my approach only works when S and T are short, and S and T can be as long as 2 * 10^5. How can I approach this?

Let's maximize the number of characters that match. We can solve the problem for each character of the alphabet separately and then sum the per-character results for each substring. To solve the problem for a particular character c, represent S and T as sequences of 0s and 1s (1 where the character equals c, 0 elsewhere) and multiply them as polynomials using the FFT (https://en.wikipedia.org/wiki/Fast_Fourier_transform); reversing T first turns the product into the sliding match count.
The complexity is O(|A| * N log N), where |A| is the size of the alphabet (26 for uppercase letters).
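A rough sketch of that idea follows, assuming a plain recursive FFT over std::complex<double> is accurate enough for these lengths. The names fft, matchCounts and bestWindow are hypothetical, and only characters that actually occur in T are processed, since other characters cannot contribute matches.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <string>
#include <vector>

using cd = std::complex<double>;

// Recursive Cooley-Tukey FFT; a.size() must be a power of two.
// With invert == true it computes the inverse transform (scaled by 1/n).
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    if (n == 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double ang = 2.0 * std::acos(-1.0) / static_cast<double>(n) * (invert ? -1.0 : 1.0);
    cd w(1.0, 0.0);
    const cd wn(std::cos(ang), std::sin(ang));
    for (std::size_t i = 0; i < n / 2; ++i) {
        a[i] = even[i] + w * odd[i];
        a[i + n / 2] = even[i] - w * odd[i];
        if (invert) {  // halving at every level divides by n overall
            a[i] /= 2.0;
            a[i + n / 2] /= 2.0;
        }
        w *= wn;
    }
}

// matches[i] = number of positions j with s[i + j] == t[j].
std::vector<int> matchCounts(const std::string& s, const std::string& t) {
    assert(!t.empty() && t.size() <= s.size());
    const std::size_t n = s.size(), m = t.size();
    std::size_t size = 1;
    while (size < n + m) size <<= 1;  // room for the full linear convolution
    std::vector<int> matches(n - m + 1, 0);
    std::string alphabet = t;  // distinct characters of t
    std::sort(alphabet.begin(), alphabet.end());
    alphabet.erase(std::unique(alphabet.begin(), alphabet.end()), alphabet.end());
    for (char c : alphabet) {
        std::vector<cd> a(size), b(size);
        for (std::size_t i = 0; i < n; ++i) a[i] = (s[i] == c) ? 1.0 : 0.0;
        // t is stored reversed so the product at index i + m - 1 aligns t with s[i..].
        for (std::size_t j = 0; j < m; ++j) b[m - 1 - j] = (t[j] == c) ? 1.0 : 0.0;
        fft(a, false);
        fft(b, false);
        for (std::size_t k = 0; k < size; ++k) a[k] *= b[k];
        fft(a, true);
        for (std::size_t i = 0; i + m <= n; ++i)
            matches[i] += static_cast<int>(std::lround(a[i + m - 1].real()));
    }
    return matches;
}

// Start index of the window of s with the fewest differences from t
// (the first such window on ties).
std::size_t bestWindow(const std::string& s, const std::string& t) {
    const std::vector<int> matches = matchCounts(s, t);
    return static_cast<std::size_t>(
        std::max_element(matches.begin(), matches.end()) - matches.begin());
}
```

The minimum difference is then T.size() minus the largest entry of matchCounts(S, T), and the answer substring is S.substr(bestWindow(S, T), T.size()).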

Runtime of KMP algorithm and LPS table construction

I recently came across the KMP algorithm, and I have spent a lot of time trying to understand why it works. While I do understand the basic functionality now, I simply fail to understand the runtime computations.
I took the code below from the GeeksforGeeks site: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
The site claims that for a text of length n and a pattern of length m, KMP finds the matches in at most O(n) time. It also states that the LPS array can be computed in O(m) time.
// C++ program for implementation of KMP pattern searching
// algorithm
#include <bits/stdc++.h>
void computeLPSArray(char* pat, int M, int* lps);
// Prints occurrences of pat[] in txt[]
void KMPSearch(char* pat, char* txt)
{
int M = strlen(pat);
int N = strlen(txt);
// create lps[] that will hold the longest prefix suffix
// values for pattern
int lps[M];
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt[]
int j = 0; // index for pat[]
while (i < N) {
if (pat[j] == txt[i]) {
j++;
i++;
}
if (j == M) {
printf("Found pattern at index %d ", i - j);
j = lps[j - 1];
}
// mismatch after j matches
else if (i < N && pat[j] != txt[i]) {
// Do not match lps[0..lps[j-1]] characters,
// they will match anyway
if (j != 0)
j = lps[j - 1];
else
i = i + 1;
}
}
}
// Fills lps[] for given pattern pat[0..M-1]
void computeLPSArray(char* pat, int M, int* lps)
{
// length of the previous longest prefix suffix
int len = 0;
lps[0] = 0; // lps[0] is always 0
// the loop calculates lps[i] for i = 1 to M-1
int i = 1;
while (i < M) {
if (pat[i] == pat[len]) {
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
// This is tricky. Consider the example.
// AAACAAAA and i = 7. The idea is similar
// to search step.
if (len != 0) {
len = lps[len - 1];
// Also, note that we do not increment
// i here
}
else // if (len == 0)
{
lps[i] = 0;
i++;
}
}
}
}
// Driver program to test above function
int main()
{
char txt[] = "ABABDABACDABABCABAB";
char pat[] = "ABABCABAB";
KMPSearch(pat, txt);
return 0;
}
I am really confused why that is the case.
For LPS computation, consider: aaaaacaaac
In this case, when we try to compute the LPS value for the first c, we keep going back until we hit LPS[0], which is 0, and stop. So, essentially, we travel back at least the length of the pattern up to that point. If this happens multiple times, how can the time complexity be O(m)?
I am similarly confused about why the runtime of KMP is O(n).
I have read other threads on Stack Overflow before posting, and also various other sites on the topic. I am still very confused. I would really appreciate it if someone could help me understand the best and worst case scenarios for these algorithms and how their runtime is computed, using some examples. Again, please don't suggest I google this; I have done it, spent a whole week trying to gain any insight, and failed.

One way to establish an upper bound on the runtime of constructing the LPS array is to consider a pathological case: how can we maximize the number of times we execute len = lps[len - 1]? Consider the following string, ignoring spaces: x1 x2 x1x3 x1x2x1x4 x1x2x1x3x1x2x1x5 ...
The second term needs to be compared to the first term because, if it ended in 1 instead of 2, it would match the first term. Similarly, the third term needs to be compared to the first two terms because, if it ended in 1 or 2 instead of 3, it would match those partial terms. And so forth.
In the example string, only one character in every 2^n can fall back n times, so the total runtime is m + m/2 + m/4 + ... = 2m = O(m), where m is the length of the pattern string. I suspect it's impossible to construct a string with a worse runtime than the example string, and this can probably be proven formally.
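The formal argument is a short amortized count: len grows by at most 1 each time i advances, i advances at most m - 1 times, and every execution of len = lps[len - 1] strictly decreases len, which never goes below 0. So the fallback step cannot run more than m - 1 times in total. The sketch below instruments the LPS loop to count those fallbacks; buildLps and LpsResult are hypothetical names, and the logic otherwise mirrors computeLPSArray from the question.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct LpsResult {
    std::vector<int> lps;
    int fallbacks;  // number of times len = lps[len - 1] was executed
};

// Same loop as computeLPSArray, plus a counter for the fallback step.
LpsResult buildLps(const std::string& pat) {
    LpsResult r{std::vector<int>(pat.size(), 0), 0};
    int len = 0;  // length of the previous longest prefix that is also a suffix
    std::size_t i = 1;
    while (i < pat.size()) {
        if (pat[i] == pat[len]) {
            ++len;  // len can only grow here, by 1 per step of i
            r.lps[i] = len;
            ++i;
        } else if (len != 0) {
            len = r.lps[len - 1];  // strictly decreases len; i stays put
            ++r.fallbacks;
        } else {
            r.lps[i] = 0;
            ++i;
        }
    }
    // Amortized bound: total decreases cannot exceed total increases.
    assert(pat.empty() || r.fallbacks < static_cast<int>(pat.size()));
    return r;
}
```

For the pattern aaaaacaaac from the question (m = 10), the first c causes 4 fallbacks and the second causes 3, for 7 in total, which is below m.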

Find an element in an array with limited comparisons?

Given X, M and N, where X is the element to search for, M is the array size, and only the first N elements of the array may be accessed, how do we find the element with at most N+1 comparisons?
For example,
A = [3,5,2,9,8,4,1,6,7], so M = 9
Let N = 6 and X = 5. We may access only the first 6 elements of the array and must decide whether X is among them. Here the answer is true, but for X = 6 the answer is false.
This problem is not about time complexity; it's about the number of comparisons you make. For example, the brute-force method looks like this:
bool search(const vector<int>& A, int N, int X) {
    for (int i = 0; i < N; i++) {   // i < N is also a comparison, made N times
        if (A[i] != X) continue;    // N comparisons
        else return true;
    }
    return false;
}
The time complexity is O(N), but the number of comparisons is 2*N. The task is to reduce this to N+1 comparisons. I tried to solve it but did not find a solution. Is there actually one?

Idea
Set the (N+1)-th element to X and drop the range check. The scan is then guaranteed to find an element equal to X (this requires N < M, so that the (N+1)-th element exists). Once it stops, check its index (this is the last check you can perform): if it equals N+1, X was not among the first N elements. Note that this overwrites that array element.
Analysis
Although this approach eliminates the duplicated comparisons, it still has one "extra" comparison:
bool search(int* a, int n, int x)
{
a[n] = x;
int idx = 0;
while (a[idx] != x) // n + 1 comparisons in case if value hasn't been found
++idx;
return idx < n; // (n + 2)-th comparison in case if value hasn't been found
}
Solution (not perfect, though)
I can see only one way to cut that extra comparison with this approach: use the fact that a zero integer value converts to false and any nonzero integer value converts to true. With that, the code looks like this:
bool search(int* a, int n, int x)
{
a[n] = x;
int idx = 0;
while (a[idx] != x) // n + 1 comparisons in case if value hasn't been found
++idx;
return idx - n; // returns 0 only when idx == n, which means that value is not found
}

Hash functions and random permutation

After reading this question, I was wondering: is it possible, using O(1) space, to generate a random permutation of the sequence [1...n] with a uniform distribution using something like double hashing?
I tried this with a small example for the sequence [1,2,3,4,5] and it works, but it fails to scale to larger sets.
#include <iostream>
const int size = 5; // length of the sequence being permuted
int h1(int k) {
return 5 - (k % 7);
}
int h2(int k) {
return (k % 3) + 1;
}
int hash(int k, int i) {
return (h1(k) + i*h2(k)) % size;
}
int main() {
for(int k = 0; k < 10; k++) {
std::cout << "k=" << k << std::endl;
for(int i = 0; i < 5; i++) {
int q = hash(k, i);
if(q < 0) q += 5;
std::cout << q;
}
std::cout << std::endl;
}
}

You can try another approach.
Take an arbitrary integer P such that GCD(P, N) == 1, where GCD(P, N) is the greatest common divisor of P and N (e.g. GCD(70, 42) == 14, GCD(24, 35) == 1).
Form the sequence K[i] ::= (P * i) mod N + 1, for i from 1 to N.
It can be proven that the sequence K[i] enumerates all numbers between 1 and N with no repeats (actually K[N + 1] == K[1], but that is not a problem because we need only the first N numbers).
If you can efficiently generate such numbers P with a uniform distribution (e.g. with a good random function), using the Euclidean algorithm to calculate the GCD in O(log(N)) time, you'll get what you want.
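As a minimal sketch of the sequence K[i] described above (stridePermutation and gcdOf are hypothetical names, and the result is collected into a vector only so it can be inspected; a caller who needs O(1) space would compute each K[i] on demand):

```cpp
#include <cassert>
#include <vector>

// Euclidean algorithm.
int gcdOf(int a, int b) {
    while (b != 0) {
        const int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Returns K[1..n] with K[i] = (p * i) mod n + 1; requires gcd(p, n) == 1.
std::vector<int> stridePermutation(int n, int p) {
    assert(n > 0 && p > 0 && gcdOf(p, n) == 1);
    std::vector<int> k;
    k.reserve(static_cast<std::size_t>(n));
    for (int i = 1; i <= n; ++i) {
        // 64-bit product avoids overflow for large p and i.
        k.push_back(static_cast<int>((static_cast<long long>(p) * i) % n) + 1);
    }
    return k;
}
```

For n = 5 and p = 3 this yields 4 2 5 3 1. One caveat worth keeping in mind: the sequence depends only on P mod N, so this scheme can produce at most as many distinct orderings as there are residues coprime to N, which is far fewer than the N! permutations a uniform distribution would have to cover.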

It is not possible to generate a "random" permutation without some randomness. It doesn't even make sense. Your code will generate the same permutation every time.
I suspect you intend that you pick a different two random hash functions every time. But even that won't work using hash functions like you have (a +/- k%b for a,b chosen at random), as you need O(n log n) bits of randomness to specify a permutation.

I'm not sure what the question is. If you want a random permutation, you want a random number generator, not a hash function. A hash function is (and must be) deterministic, so it cannot be used for a "random" permutation. And a hash is not a permutation of anything.
I don't think that a random permutation can be O(1) space. You've got to keep track somehow of the elements which have already been used.

What is the fastest way to find longest 'consecutive numbers' streak in vector ?

I have a sorted std::vector<int> and I would like to find the longest 'streak of consecutive numbers' in this vector and then return both the length of it and the smallest number in the streak.
To visualize it for you :
suppose we have :
1 3 4 5 6 8 9
I would like it to return: maxStreakLength = 4 and streakBase = 3
There might be occasion where there will be 2 streaks and we have to choose which one is longer.
What is the best (fastest) way to do this ? I have tried to implement this but I have problems with coping with more than one streak in the vector. Should I use temporary vectors and then compare their lengths?

No, you can do this in one pass through the vector, storing only the start point and length of the longest streak found so far. You also need far fewer than N comparisons. [*]
Hint: if you already have, say, a 4-long match ending at the 5th position (value 6), which position do you have to check next?
[*] Left as an exercise to the reader to work out the likely O( ) complexity ;-)
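One possible reading of the hint, as a sketch rather than the answerer's own code: a streak starting at index i can only beat the current best if it reaches index i + best, so probe there and walk backwards. If a gap turns up, every start before the gap can be skipped. The names Streak and longestStreak are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Streak {
    std::size_t length;  // number of elements in the streak (0 for empty input)
    int base;            // smallest value in the streak
};

// Longest run of values increasing by exactly 1 in a sorted vector.
Streak longestStreak(const std::vector<int>& v) {
    Streak best{0, 0};
    std::size_t i = 0;  // candidate start of a streak
    while (i + best.length < v.size()) {
        // A streak from i must extend to this index to beat the best so far.
        const std::size_t probe = i + best.length;
        // Walk back from the probe to the start of the run containing it.
        std::size_t j = probe;
        while (j > i && v[j] == v[j - 1] + 1) --j;
        if (j > i) {
            // Gap just before j: no start in [i, j) can beat best.length.
            i = j;
            continue;
        }
        // v[i..probe] is one streak; extend it forward as far as it goes.
        std::size_t end = probe;
        while (end + 1 < v.size() && v[end + 1] == v[end] + 1) ++end;
        best = Streak{end - i + 1, v[i]};
        i = end + 1;
    }
    return best;
}
```

On 1 3 4 5 6 8 9 this returns length 4 and base 3, and the trailing 8 9 is never inspected because no streak starting there could reach length 5. The worst case still looks at most of the elements.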

It would be interesting to see if the fact that the array is sorted can be exploited somehow to improve the algorithm. The first thing that comes to mind is this: if you know that all numbers in the input array are unique, then for a range of elements [i, j] in the array, you can immediately tell whether elements in that range are consecutive or not, without actually looking through the range. If this relation holds
array[j] - array[i] == j - i
then you can immediately say that elements in that range are consecutive. This criterion, obviously, uses the fact that the array is sorted and that the numbers don't repeat.
Now, we just need to develop an algorithm which will take advantage of that criterion. Here's one possible recursive approach:
1. The input of the recursive step is the range of elements [i, j]. Initially it is [0, n-1], the whole array.
2. Apply the above criterion to range [i, j]. If the range turns out to be consecutive, there's no need to subdivide it further. Send the range to output (see below for further details).
3. Otherwise (if the range is not consecutive), divide it into two equal parts [i, m] and [m+1, j].
4. Recursively invoke the algorithm on the lower part ([i, m]) and then on the upper part ([m+1, j]).
The above algorithm performs a binary partition of the array and a recursive descent of the partition tree using the left-first approach. This means that it finds adjacent subranges with consecutive elements in left-to-right order. All you need to do is join the adjacent subranges together. When you receive a subrange [i, j] that was "sent to output" in the second step, you concatenate it with the previously received subranges if they are indeed consecutive, or you start a new range if they are not. All the while you have to keep track of the "longest consecutive range" found so far.
That's it.
The benefit of this algorithm is that it detects subranges of consecutive elements "early", without looking inside these subranges. Obviously, its worst-case performance (if there are no consecutive subranges at all) is still O(n). In the best case, when the entire input array is consecutive, this algorithm will detect it instantly. (I'm still working on a meaningful O estimation for this algorithm.)
The usability of this algorithm is, again, undermined by the uniqueness requirement. I don't know whether it is something that is "given" in your case.
Anyway, here's a possible C++ implementation
#include <cassert>
#include <utility>
#include <vector>
typedef std::vector<int> vint;
typedef std::pair<vint::size_type, vint::size_type> range;
class longest_sequence
{
public:
const range& operator ()(const vint &v)
{
current = max = range(0, 0);
process_subrange(v, 0, v.size() - 1);
check_record();
return max;
}
private:
range current, max;
void process_subrange(const vint &v, vint::size_type i, vint::size_type j);
void check_record();
};
void longest_sequence::process_subrange(const vint &v,
vint::size_type i, vint::size_type j)
{
assert(i <= j && v[i] <= v[j]);
assert(i == 0 || i == current.second + 1);
if (v[j] - v[i] == j - i)
{ // Consecutive subrange found
assert(v[current.second] <= v[i]);
if (i == 0 || v[i] == v[current.second] + 1)
// Append to the current range
current.second = j;
else
{ // Range finished
// Check against the record
check_record();
// Start a new range
current = range(i, j);
}
}
else
{ // Subdivision and recursive calls
assert(i < j);
vint::size_type m = (i + j) / 2;
process_subrange(v, i, m);
process_subrange(v, m + 1, j);
}
}
void longest_sequence::check_record()
{
assert(current.second >= current.first);
if (current.second - current.first > max.second - max.first)
// We have a new record
max = current;
}
int main()
{
int a[] = { 1, 3, 4, 5, 6, 8, 9 };
std::vector<int> v(a, a + sizeof a / sizeof *a);
range r = longest_sequence()(v);
return 0;
}

I believe that this should do it?
size_t beginStreak = 0;
size_t streakLen = 1;
size_t longest = 0;
size_t longestStart = 0;
for (size_t i=1; i < vec.size(); i++) {
if (vec[i] == vec[i-1] + 1) {
streakLen++;
}
else {
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
beginStreak = i;
streakLen = 1;
}
}
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}

You can't solve this problem in less than O(N) time. Imagine your list is the first N-1 even numbers, plus a single odd number (chosen from among the first N-1 odd numbers). Then there is a single streak of length 3 somewhere in the list, but worst case you need to scan the entire list to find it. Even on average you'll need to examine at least half of the list to find it.

Similar to Rodrigo's solutions but solving your example as well:
#include <vector>
#include <cstdio>
#define len(x) (sizeof(x) / sizeof((x)[0]))
using namespace std;
int nums[] = {1,3,4,5,6,8,9};
int streakBase = nums[0];
int maxStreakLength = 1;
void updateStreak(int currentStreakLength, int currentStreakBase) {
if (currentStreakLength > maxStreakLength) {
maxStreakLength = currentStreakLength;
streakBase = currentStreakBase;
}
}
int main(void) {
vector<int> v;
for(size_t i=0; i < len(nums); ++i)
v.push_back(nums[i]);
int lastBase = v[0], currentStreakBase = v[0], currentStreakLength = 1;
for(size_t i=1; i < v.size(); ++i) {
if (v[i] == lastBase + 1) {
currentStreakLength++;
lastBase = v[i];
} else {
updateStreak(currentStreakLength, currentStreakBase);
currentStreakBase = v[i];
lastBase = v[i];
currentStreakLength = 1;
}
}
updateStreak(currentStreakLength, currentStreakBase);
printf("maxStreakLength = %d and streakBase = %d\n", maxStreakLength, streakBase);
return 0;
}
