Strategy to modify permutation algorithm to prevent duplicate printouts - c++

I've been reviewing algorithms for practice, and I'm currently looking at a permutation algorithm that I quite like:
void permute(char* set, int begin, int end) {
    int range = end - begin;
    if (range == 1)
        cout << set << endl;
    else {
        for(int i = 0; i < range; ++i) {
            swap(&set[begin], &set[begin+i]);
            permute(set, begin+1, end);
            swap(&set[begin], &set[begin+i]);
        }
    }
}
I actually wanted to apply this to a situation where there will be many repeated characters though, so I need to be able to modify it to prevent the printing of duplicate permutations.
How would I go about detecting that I was generating a duplicate? I know I could store this in a hash or something similar, but that's not an optimal solution - I'd prefer one that didn't require extra storage. Can someone give me a suggestion?
PS: I don't want to use the STL permutation mechanisms, and I don't want a reference to another "unique permutation algorithm" somewhere. I'd like to understand the mechanism used to prevent duplication so I can build it into this one and learn, if possible.

There is no general way to prevent arbitrary functions from generating duplicates. You can always filter out the duplicates, of course, but you don't want that, and for very good reasons. So you need a special way to generate only non-duplicates.
One way would be to generate the permutations in increasing lexicographical order. Then you can just check whether a "new" permutation is the same as the last one, and skip it if so. It gets even better: the algorithm for generating permutations in increasing lexicographical order given at http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order doesn't even generate the duplicates at all!
However, that is not an answer to your question, as it is a different algorithm (although based on swapping, too).
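For reference, the lexicographic approach mentioned above can be sketched as follows. This is a minimal Python sketch of the standard "next permutation" step, not the questioner's algorithm; the names next_permutation and unique_permutations are chosen here for illustration. Because each step moves to the next strictly larger arrangement, repeated characters never produce the same string twice.

```python
def next_permutation(a):
    """Rearrange list a into its next lexicographic permutation in place.

    Returns False (leaving a unchanged) when a is already the last permutation.
    """
    # Find the rightmost position i with a[i] < a[i + 1].
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return False
    # Find the rightmost element greater than a[i] and swap it in.
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # Reverse the suffix so it becomes the smallest possible arrangement.
    a[i + 1:] = reversed(a[i + 1:])
    return True


def unique_permutations(s):
    """Yield each distinct permutation of s exactly once, in sorted order."""
    a = sorted(s)
    yield "".join(a)
    while next_permutation(a):
        yield "".join(a)
```

Calling list(unique_permutations("aab")) walks from the sorted string to the last arrangement without needing any extra storage beyond the list itself.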
So, let's look at your algorithm a little closer. One key observation is:
Once a character is swapped to position begin, it will stay there for all nested calls of permute.
We'll combine this with the following general observation about permutations:
If you permute a string s, but only at positions where there's the same character, s will remain the same. In fact, all duplicate permutations have a different order for the occurrences of some character c, where c occurs at the same positions.
OK, so all we have to do is to make sure that the occurrences of each character are always in the same order as in the beginning. Code follows, but... I don't really speak C++, so I'll use Python and hope to get away with claiming it's pseudo code.
We start by your original algorithm, rewritten in 'pseudo code':
def permute(s, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]
and a helper function that makes calling it easier:
def permutations_w_duplicates(s):
    permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable
Now we extend the permute function with some bookkeeping about how many times a certain character has been swapped to the begin position (i.e. has been fixed), and we also remember the original order of the occurrences of each character (char_number). Each character that we try to swap to the begin position then has to be the next higher in the original order, i.e. the number of fixes for a character defines which original occurrence of this character may be fixed next - I call this next_fixable.
def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                permute2(s, next_fixable, char_number, begin + 1, end)
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1
Again, we use a helper function:
def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1
    permute2(list(s), next_fixable, char_number, 0, len(s))
That's it!
Almost. You can stop here and rewrite in C++ if you like, but if you are interested in some test data, read on.
I used slightly different code for testing, because I didn't want to print all permutations. In Python, you would replace the print with a yield, which turns the function into a generator function, the result of which can be iterated over with a for loop, and the permutations will be computed only when needed. This is the real code and test I used:
def permute(s, begin, end):
    # assumed: generator version of the original algorithm, needed for the comparison below
    if end == begin + 1:
        yield "".join(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            for p in permute(s, begin + 1, end):
                yield p
            s[begin], s[i] = s[i], s[begin]
def permutations_w_duplicates(s):
    for p in permute(list(s), 0, len(s)):
        yield p
def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        yield "".join(s) # join the characters to form a string
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                for p in permute2(s, next_fixable, char_number, begin + 1, end):
                    yield p
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1
def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1
    for p in permute2(list(s), next_fixable, char_number, 0, len(s)):
        yield p
s = "FOOQUUXFOO"
A = list(permutations_w_duplicates(s))
print("%s has %s permutations (counting duplicates)" % (s, len(A)))
print("permutations of these that are unique: %s" % len(set(A)))
B = list(permutations_wo_duplicates(s))
print("%s has %s unique permutations (directly computed)" % (s, len(B)))
print("The first 10 permutations :", A[:10])
print("The first 10 unique permutations:", B[:10])
And the result:
FOOQUUXFOO has 3628800 permutations (counting duplicates)
permutations of these that are unique: 37800
FOOQUUXFOO has 37800 unique permutations (directly computed)
The first 10 permutations : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX']
The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF']
Note that the permutations are computed in the same order as in the original algorithm, just without the duplicates. Note that 37800 * 2! * 2! * 4! = 3628800, just like you would expect.
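The count in the last note can be checked without generating anything. The sketch below assumes the usual multinomial formula n! / (c1! * c2! * ...), where the c values are the multiplicities of each distinct character; the function name count_unique_permutations is made up for this example.

```python
from collections import Counter
from math import factorial


def count_unique_permutations(s):
    """Return n! / (c1! * c2! * ...), where the c's are the character counts."""
    total = factorial(len(s))
    for c in Counter(s).values():
        # Dividing by c! removes the orderings that only shuffle equal characters.
        total //= factorial(c)
    return total


print(count_unique_permutations("FOOQUUXFOO"))
```

For "FOOQUUXFOO" the multiplicities are F: 2, O: 4, U: 2, Q: 1, X: 1, which is where the factors 2!, 2! and 4! come from.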

You could add an if statement to prevent the swap code from executing if it would swap two identical characters. The for loop is then
for(int i = 0; i < range; ++i) {
    if(i==0 || set[begin] != set[begin+i]) {
        swap(&set[begin], &set[begin+i]);
        permute(set, begin+1, end);
        swap(&set[begin], &set[begin+i]);
    }
}
The reason for allowing the case i==0 is to make sure the recursive call happens exactly once even if all the characters of the set are the same.
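It may be worth checking this condition on a small input before relying on it. The sketch below is a Python port of the loop above (the port and the name permute_skip_equal are assumptions made for this example) that collects the emitted strings instead of printing them. On "abb" it still appears to emit repeats, because the test only compares against the character currently at begin, not against characters that were already tried at that position.

```python
def permute_skip_equal(s, begin, end, out):
    """Python port of the loop above; appends each emitted string to out."""
    if end - begin == 1:
        out.append("".join(s))
        return
    for i in range(end - begin):
        # Same condition as the C++ version: skip swaps of identical characters.
        if i == 0 or s[begin] != s[begin + i]:
            s[begin], s[begin + i] = s[begin + i], s[begin]
            permute_skip_equal(s, begin + 1, end, out)
            s[begin], s[begin + i] = s[begin + i], s[begin]


results = []
permute_skip_equal(list("abb"), 0, 3, results)
print(results)
print(len(results), "emitted,", len(set(results)), "distinct")
```

Comparing the number of emitted strings with the number of distinct ones is a quick way to see whether a given skip rule is sufficient.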

A simple solution is to change the duplicate characters randomly to characters that aren't already present. Then after permutation, change the characters back. Only accept a permutation if its characters are in order.
e.g. if you have "a,b,b"
you would have had the following:
a b b
a b b
b a b
b a b
b b a
b b a
But, if we start with a,b,b and note the duplicate b's, then we can change the second b to a c
now we have a b c
a b c - accept because b is before c. change c back to b to get a b b
a c b - reject because c is before b
b a c - accept as b a b
b c a - accept as b b a
c b a - reject as c comes before b.
c a b - reject as c comes before b.
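The relabelling idea can be sketched in Python as follows. Instead of inventing new letters, this version tags each character with its occurrence number, which has the same effect; it uses itertools.permutations purely for brevity, and the name unique_by_relabelling is hypothetical. Note that it still enumerates all n! arrangements and only filters them, so it saves storage but not time.

```python
from itertools import permutations


def unique_by_relabelling(s):
    """Yield distinct permutations of s by making every item unique first."""
    # Tag each character with its occurrence number: "abb" -> (a,0) (b,0) (b,1).
    seen = {}
    tagged = []
    for c in s:
        tagged.append((c, seen.get(c, 0)))
        seen[c] = seen.get(c, 0) + 1
    for perm in permutations(tagged):
        # Accept only if the copies of each character appear in original order.
        next_expected = {}
        ok = True
        for c, k in perm:
            if k != next_expected.get(c, 0):
                ok = False
                break
            next_expected[c] = k + 1
        if ok:
            yield "".join(c for c, _ in perm)


print(list(unique_by_relabelling("abb")))
```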

OPTION 1
One option would be to use 256 bits of storage on the stack to store a bitmask of which characters you had tried in the for loop, and only to recurse for new characters.
OPTION 2
A second option is to use the approach suggested in the comments ( http://n1b-algo.blogspot.com/2009/01/string-permutations.html) and change the for loop to:
else {
    char last=0;
    for(int i = 0; i < range; ++i) {
        if (last==set[begin+i])
            continue;
        last = set[begin+i];
        swap(&set[begin], &set[begin+i]);
        permute(set, begin+1, end);
        swap(&set[begin], &set[begin+i]);
    }
}
However, to use this approach you will also have to sort the characters set[begin],set[begin+1],...set[end-1] at the entry to the function.
Note that you have to sort every time the function is called. (The blog post does not seem to mention this, but otherwise you will generate too many results for an input string "aabbc". The problem is that the string does not stay sorted after swap is used.)
This is still not very efficient. For example, for a string containing 1 'a' and N 'b's this approach will end up calling the sort N times for an overall complexity of N^2logN
OPTION 3
A more efficient approach for long strings containing lots of repeats would be to maintain both the string "set" and a dictionary of how many of each type of character you have left to use. The for loop would change to a loop over the keys of the dictionary, as these would be the unique characters that are allowed at that position.
This would have complexity equal to the number of output strings, and only a very small extra amount of storage to hold the dictionary.
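Option 3 might look like the following in Python. This is a sketch under the assumption that a Counter is an acceptable "dictionary of remaining characters"; the name permute_counts is made up here. Each distinct character is tried at most once per position, so no duplicate is ever produced.

```python
from collections import Counter


def permute_counts(s):
    """Yield each distinct permutation of s once, using remaining-count bookkeeping."""
    remaining = Counter(s)
    current = []

    def rec():
        if len(current) == len(s):
            yield "".join(current)
            return
        # Each distinct character still available is tried once per position.
        for c in sorted(remaining):
            if remaining[c] == 0:
                continue
            remaining[c] -= 1
            current.append(c)
            yield from rec()
            current.pop()
            remaining[c] += 1

    yield from rec()


print(list(permute_counts("abb")))
```

The extra storage is one counter entry per distinct character plus the partial string being built.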

Simply insert each element to a set. It automatically removes duplicates. Declare set s as global variable.
set <string>s;
void permute(string a, int l, int r) {
    int i;
    if (l == r)
        s.insert(a);
    else
    {
        for (i = l; i <= r; i++)
        {
            swap((a[l]), (a[i]));
            permute(a, l+1, r);
            swap((a[l]), (a[i])); //backtrack
        }
    }
}
Finally print using the function
void printr()
{
    set <string> ::iterator itr;
    for (itr = s.begin(); itr != s.end(); ++itr)
    {
        cout << '\t' << *itr;
    }
    cout << '\n'; // itr == s.end() here, so it must not be dereferenced
}

The key is not to swap the same character twice. So, you could use an unordered_set to memorize which characters have been swapped.
void permute(string& input, int begin, vector<string>& output) {
    if (begin == input.size()){
        output.push_back(input);
    }
    else {
        unordered_set<char> swapped;
        for(int i = begin; i < input.size(); i++) {
            // Do not swap a character that has been swapped
            if(swapped.find(input[i]) == swapped.end()){
                swapped.insert(input[i]);
                swap(input[begin], input[i]);
                permute(input, begin+1, output);
                swap(input[begin], input[i]);
            }
        }
    }
}
You can go through your original code by hand, and you will find those cases where duplicates occur are "swapping with the character which has been swapped."
Ex: input = "BAA"
index = 0, i = 0, input = "BAA"
----> index = 1, i = 1, input = "BAA"
----> index = 1, i = 2, input = "BAA" (duplicate)
index = 0, i = 1, input = "ABA"
----> index = 1, i = 1, input = "ABA"
----> index = 1, i = 2, input = "AAB"
index = 0, i = 2, input = "AAB"
----> index = 1, i = 1, input = "AAB" (duplicate)
----> index = 1, i = 2, input = "ABA" (duplicate)

Related

Given an integer K and a matrix of size t x t. construct a string s consisting of first t lowercase english letters such that the total cost of s is K

I'm solving this problem and stuck halfway through, looking for help and a better method to tackle such a problem:
problem:
Given an integer K and a matrix of size t x t, we have to construct a string s consisting of the first t lowercase English letters such that the total cost of s is exactly K. It is guaranteed that there exists at least one string that satisfies the given conditions. Among all such strings, we want the lexicographically smallest one.
Specifically, the cost of having the ith character followed by the jth character of the English alphabet is equal to cost[i][j].
For example, the cost of having 'a' followed by 'a' is denoted by cost[0][0] and the cost of having 'b' followed by 'c' is denoted by cost[1][2].
The total cost of a string is the sum of the costs of every pair of consecutive characters in s. For example, if the cost matrix is
[1 2]
[3 4],
and the string is "abba", then we have
the cost of having 'a' followed by 'b' is cost[0][1]=2.
the cost of having 'b' followed by 'b' is cost[1][1]=4.
the cost of having 'b' followed by 'a' is cost[1][0]=3.
In total, the cost of the string "abba" is 2+4+3=9.
Example:
consider, for example, K is 3,t is 2, the cost matrix is
[2 1]
[3 4]
There are two strings whose total cost is 3. Those strings are:
"aab"
"ba"
our answer will be "aab" as it is lexicographically smallest.
my approach
I tried to find and store all those combinations of i, j whose costs either sum to the desired value k or individually equal k.
for above example
v={
{2,1},
{3,4}
}
k = 3
and v[0][0] + v[0][1] = 3 and v[1][0] = 3. I tried to store the pairs in an array like this: std::vector<std::vector<std::pair<int, int>>>. Based on it, I would create all possible strings and store them in a set, which would give me the strings in lexicographical order.
I got stuck after writing this much code:
#include<iostream>
#include<vector>
int main(){
    using namespace std;
    vector<vector<int>>v={{2,1},{3,4}};
    vector<pair<int,int>>k;
    int size=v.size();
    for(size_t i=0;i<size;i++){
        for(size_t j=0;j<size;j++){
            if(v[i][j]==3){
                k.push_back(make_pair(i,j));
            }
        }
    }
}
Please help me understand how such a problem can be tackled. My code can only find the individual [i,j] pairs whose cost equals the desired K. I have no idea how to collect multiple [i,j] pairs whose costs sum to the desired value, and my approach also appears to be totally naive and based on brute force. I'm looking for a better way to think about such problems and implement the solution in code. Thank you.
This is a backtracking problem. General approach is :
a) Start with the "smallest" letter for e.g. 'a' and then recurse on all the available letters. If you find a string that sums to K then you have the answer because that will be the lexicographically smallest as we are finding it from smallest to largest letter.
b) If not found in 'a' move to the next letter.
Recurse/backtrack can be done as:
Start with a letter and the original value of K
explore every j = 0 to t-1, reducing K by cost[i][j]
if K == 0 you found your string.
if K < 0 then that path is not possible, so remove the last letter in the string, try other paths.
Pseudocode :
string find_smallest() {
    for (int i = 0; i < t; i++) {
        string s(1, (char)(i+97));
        bool value = recurse(i, t, K, s);
        if ( value ) return s;
    }
    return "";
}
bool recurse(int i, int t, int K, string& s) {
    if ( K < 0 ) {
        return false;
    }
    if ( K == 0 ) {
        return true;
    }
    for ( int j = 0; j < t; j++ ) {
        s += (char)(j+97);
        bool v = recurse(j, t, K-cost[i][j], s);
        if ( v ) return true;
        s.pop_back(); // remove the last letter and try the next one
    }
    return false;
}
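The backtracking outline above can be turned into a small runnable sketch. The Python version below is one possible reading of it, assuming every entry of the cost matrix is a positive integer (otherwise the recursion might not terminate); the name smallest_string is chosen for this example. Letters are tried in alphabetical order, so the first string whose cost reaches exactly K is the lexicographically smallest one.

```python
def smallest_string(cost, K):
    """Return the lexicographically smallest string with total cost K, or None.

    Assumes every entry of cost is a positive integer, so the search terminates.
    """
    t = len(cost)

    def extend(last, remaining, chars):
        if remaining == 0:
            return True
        for j in range(t):
            # Only follow a letter whose cost does not overshoot the target.
            if cost[last][j] <= remaining:
                chars.append(chr(ord("a") + j))
                if extend(j, remaining - cost[last][j], chars):
                    return True
                chars.pop()  # undo the choice and try the next letter
        return False

    for first in range(t):
        chars = [chr(ord("a") + first)]
        if extend(first, K, chars):
            return "".join(chars)
    return None


print(smallest_string([[2, 1], [3, 4]], 3))
```

The search is exponential in the worst case; memoising on the pair (last letter, remaining cost) would be a natural refinement, but is left out to keep the sketch close to the outline.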
In your implementation, you would probably need another vector of vectors of pairs to explore all your candidates. Also another vector for updating the current cost of each candidate as it builds up. Following this approach, things start to get a bit messy (IMO).
A cleaner and more understandable option (IMO again) could be to approach the problem with recursion:
#include <iostream>
#include <string>
#include <vector>
#define K 3
using namespace std;
string exploreCandidate(int currentCost, string currentString, vector<vector<int>> &v)
{
    if (currentCost == K)
        return currentString;
    int size = v.size();
    int lastChar = (int)currentString.back() - 97; // get ASCII code
    for (size_t j = 0; j < size; j++)
    {
        int nextTotalCost = currentCost + v[lastChar][j];
        if (nextTotalCost > K)
            continue;
        string nextString = currentString + (char)(97 + j); // get ASCII char
        string exploredString = exploreCandidate(nextTotalCost, nextString, v);
        if (exploredString != "00") // It is a valid path
            return exploredString;
    }
    return "00";
}
int main()
{
    vector<vector<int>> v = {{2, 1}, {3, 4}};
    int size = v.size();
    string initialString = "00"; // reserve first two positions
    for (size_t i = 0; i < size; i++)
    {
        for (size_t j = 0; j < size; j++)
        {
            initialString[0] = (char)(97 + i);
            initialString[1] = (char)(97 + j);
            string exploredString = exploreCandidate(v[i][j], initialString, v);
            if (exploredString != "00") { // It is a valid path
                cout << exploredString << endl;
                return 0;
            }
        }
    }
}
Let us begin from the main function:
We define our matrix and iterate over it. For each position, we define the corresponding sequence. Notice that we can use indices to get the respective character of the English alphabet, knowing that in ASCII code a=97, b=98...
Having this initial sequence, we can explore candidates recursively, which lead us to the exploreCandidate recursive function.
First, we check whether the current cost already equals the value we are looking for. If it does, we return immediately without evaluating any further candidates. We can do this because we are looking for the lexicographically smallest string, and we are not asked to provide information about all the candidates.
If the cost condition is not satisfied (cost < K), we need to continue exploring our candidate, but not for the whole matrix but only for the row corresponding to the last character. Then we can encounter two scenarios:
The cost condition is met (cost = K): if at some point of the recursion the cost is equal to our value K, then the string is a valid one, and since it will be the first one we encounter, we want to return it and finish the execution.
The cost is not valid (cost > K): If the current cost is greater than K, then we need to abort this branch and see if other branches are luckier. Returning a boolean would be nice, but since we want to output a string (or maybe not, depending on the statement), an option could be to return a string and use "00" as our "false" value, allowing us to know whether the cost condition has been met. Other options could be returning a boolean and using an output parameter (passed by reference) to contain the output string.
EDIT:
The provided code assumes positive non-zero costs. If some costs were zero you could encounter infinite recursion, so you would need to add more constraints in your recursive function.

How to optimize my Langford Sequence function?

This is my code for making a Langford Sequence out of an array of pairs of numbers (112233 -> 312132). I wanted to write a recursive function, because I wasn't able to find one online anywhere as a self-improvement exercise with algorithms. My question is, how do I optimize it? Is there a way to apply dynamic programming to this and have a better time/space complexity with emphasis on time complexity? My current Runtime complexity is O(n^2) and Space complexity of O(n). Any sort of help in writing cleaner code is also appreciated. Thanks. Also, is this a P or an NP problem?
#include <iostream>
using namespace std;
const int arrLen = 8;
const int seqLen = 8;
bool langfordSequence(int * arr, int indx, int *seq, int pos);
int main() {
    int arr[] = {1,1,2,2,3,3,4,4};
    int seq[] = {0,0,0,0,0,0,0,0};
    bool test = langfordSequence(arr, 0, seq, 0);
    if (test)
        cout << "Langford Sequence Successful: " << endl;
    else
        cout << "Langford Sequence Failed: " << endl;
    for (int i = 0; i < seqLen; i++)
    {
        cout << seq[i] << " ";
    }
    return 0;
}
bool langfordSequence(int * arr, int indx, int *seq, int pos)
{
    if (indx >= arrLen - 1) //this means we've reached the end of the array
        return true;
    if (pos + arr[indx] + 1 >= seqLen) //if the second part of the number is off the array
        return false;
    if (seq[pos] == 0 && seq[pos + arr[indx] + 1] == 0)
    {
        seq[pos] = arr[indx];
        seq[pos + arr[indx] + 1] = arr[indx];
        if (langfordSequence(arr, indx + 2, seq, 0)) //the current pair is good, go to the next one, start from the beginning
            return true;
        else
        {
            seq[pos] = 0;
            seq[pos + arr[indx] + 1] = 0;
            if (langfordSequence(arr, indx, seq, pos + 1))
                return true;
        }
    }
    else
    {
        if (langfordSequence(arr, indx, seq, pos + 1)) //current position is no good, try next position
            return true;
    }
    return false; //no placement worked from this position (the original fell off the end here)
}
Here’s pseudocode for the idea I was referring to in my comments. I haven’t searched to see who else has done something like this yet (because I like to solve things myself first) but someone else probably has priority.
Algorithm LANGFORD
Parameters N (largest element in the top-level, final sequence), M (largest element of the intermediate, hooked sequence). At the top level, M = N.
Returns: A list of all sequences of length 2N such that each element j in 1..M appears exactly twice separated by exactly j elements and the position of the second M is less than N + M/2 + 1. All other elements of the sequence are set to 0.
If M == 1 (base case)
Let S' := []
For i := 0 to N-2
Let s' be the length 2N sequence containing the subsequence "101" starting at position i (counting from 0), and zero everywhere else.
Insert s' into S'
Return S'
Otherwise: (inductive case)
Let S' := []
Let S := LANGFORD(N,M-1)
For each s in S
Let r := reverse(s)
For i := 0 to floor(N - M/2 + 1)
If s[i] == s[i+M+1] == 0
Let s' be s with s'[i] and s'[i+M+1] replaced by M
Insert s' into S'
If r != s and r[i] == r[i+M+1] == 0
Let r' be r with r'[i] and r'[i+M+1] replaced by M
Insert r' into S'
Return S'
Running this algorithm for N = 4, we have initially M = 4 and recurse until N = 4, M = 1. This step gives us the list [[10100000],[01010000],[00101000]]. We pass this back up to the M=2 step, which finds the hooked sequences [[12102000],[10120020],[20020101],[02002101],[00201210],[01210200],[20021010],[00201210],[20121000],[02012100]]. Passing these up to the M=3 step, we get [[30023121],[13120320],[13102302],[31213200],[23021310],[23121300],[03121320]]. Finally, we return to the top-level function and find the sequence [[41312432]], which also represents its symmetric dual 23421314.
Essentially, we're trying to fit each puzzle piece like "30003" into each potential solution, keeping in mind that the mirror image of any solution is a solution. The time and space complexity are dominated by the combinatorial explosion of potential solutions for values of M around N/2. It might be fast to store the sequences as byte arrays aligned to use vector instructions, and the lists as array lists (vector in C++, [sequence] in Haskell, etc.).
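The "fit each puzzle piece" idea can also be tried with plain backtracking, which is simpler than the list-building algorithm above and only finds one solution. The Python sketch below is that simpler technique, not the LANGFORD procedure itself; it places the largest piece first and does not exploit the mirror symmetry. The name langford is chosen for this example.

```python
def langford(n):
    """Return one Langford sequence for 1..n as a list, or None if none is found."""
    seq = [0] * (2 * n)

    def place(m):
        if m == 0:
            return True
        # The two copies of m sit at i and i + m + 1, with m cells between them.
        for i in range(2 * n - m - 1):
            j = i + m + 1
            if seq[i] == 0 and seq[j] == 0:
                seq[i] = seq[j] = m
                if place(m - 1):
                    return True
                seq[i] = seq[j] = 0  # remove the piece and try the next slot
        return False

    return seq if place(n) else None


print(langford(3))
print(langford(4))
```

Placing the largest numbers first tends to prune early, since they have the fewest legal positions.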

Number of Rs in a string

I have an assignment where I'm given a string S containing the letters 'R' and 'K', for example "RRRRKKKKRK".
I need to obtain the maximum number of 'R's that string could possibly hold by flipping characters i through j to their opposite. So:
for(int x = i; x < j; x++)
{
    if (S[x] == 'R')
    {
        S[x] = 'K';
    }
    else
    {
        S[x] = 'R';
    }
}
However, I can only make the above call once.
So for the above example: "RRRRKKKKRK".
You would have i = 4 and j = 8 which would result in: "RRRRRRRRRK" and you would then output the number of R's in the resulting string: 9.
My code partially works, but there are some cases that it doesn't. Can anyone figure out what is missing?
Sample Input
2
RKKRK
RKKR
Sample Output
4
4
My Solution
My solution which works only for the first case, I don't know what I'm missing to complete the algorithm:
int max_R = INT_MIN;
for (int i = 0; i < s.size(); i++)
{
    for (int j = i + 1; j < s.size(); j++)
    {
        int cnt = 0;
        string t = s;
        if (t[j] == 'R')
        {
            t[j] = 'K';
        }
        else
        {
            t[j] = 'R';
        }
        for (int b = 0; b < s.size(); b++)
        {
            if (t[b] == 'R')
            {
                cnt++;
                if (cnt > max_R)
                {
                    max_R = cnt;
                }
            }
        }
    }
}
cout << max_R << endl;
How about turning this into the Maximum subarray problem which has O(n) solution?
Run through the string once, giving 'K' a value of 1, and 'R' a value of -1.
E.g For 'RKRRKKKKRKK' you produce an array -> [-1, 1, -1, -1, 1, 1, 1, 1, -1, 1, 1] -> [-1, 1, -2, 4, -1, 2] (I grouped consecutive -1s and 1s to be more clear)
Apply Kadane's algorithm on the generated array. What you get from doing this is the maximum net gain in 'R's that a single flip can produce (each flipped 'K' adds one 'R', each flipped 'R' removes one).
Continuing with the example, you find that the maximum subarray is [4, -1, 2] with a sum of 5.
Now add this sum to the number of 'R's in the original string (the number of -1 entries, both inside and outside the subarray) to obtain your answer.
In our case, the original string contains 4 'R's, so we get 4 + 5 = 9.
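A minimal Python sketch of this reduction follows. It assumes exactly one non-empty range must be flipped, and it computes the result as the original number of 'R's plus the best net gain found by Kadane's algorithm; the name max_r_after_one_flip is made up for this example.

```python
def max_r_after_one_flip(s):
    """Return the largest number of 'R's obtainable by flipping one non-empty range."""
    best = current = None
    for ch in s:
        gain = 1 if ch == "K" else -1  # flipping K gains an R, flipping R loses one
        # Kadane's step: extend the running range or start a new one here.
        current = gain if current is None or current < 0 else current + gain
        best = current if best is None or current > best else best
    return s.count("R") + (best if best is not None else 0)


for sample in ("RKKRK", "RKKR"):
    print(sample, max_r_after_one_flip(sample))
```

The whole computation is a single pass over the string plus one count, so it runs in O(n) time with O(1) extra space.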
Try to carefully think about your solution. Do you understand, what it does?
First, let’s forget that the input file may contain multiple tests, so let’s get rid of the while loop. Now, we have just two for loops. The second one obviously just counts R’s in the processed string. But what does the first one do?
The answer is that the first loop flips all the letters from the second one (i.e. which has index 1) till the end of the string. We can see that in the first testcase:
RKKRK
it is indeed the optimal solution. The string turns into RRRKR and we get four R’s. But in the second case:
RKKR
the string turns into RRRK and we get three R’s. While if we flipped just the letters from 2 to 3 (i.e. indices 1 to 2) we could get RRRR which has four R’s.
So your algorithm always flips letters from index 1 to the end, but this is not always optimal. What can we do? How do we know which letters to flip? Well, there are some smart solutions, but the easiest is to just try all possible combinations!
You can flip all the letters from 0 to 1, count the number of R’s, remember it. Get back to the original string, flip letters from 0 to 2, count R’s, remember it and so on till you flip from 0 to n-1. Then you flip letters from 1 to 2, from 1 to 3, etc. And the answer is the largest value you remembered.
This is horribly inefficient, but this works. After you get more practice in solving algorithmic problems, get back to this task and try to figure out more efficient solutions. (Hint: if you consider building the optimal answer incrementally, that is by going through the string char by char and transforming the optimal solution for the substring s[0..i] into the optimal solution for s[0..i+1] you can arrive to a pretty straightforward O(n^2) algorithm. This can be enhanced to O(n), but this step is slightly more involved.)
Here is the sketch of this solution:
def solve(s):
    n = len(s)
    answer = 0
    for i in range(n):
        for j in range(i, n):
            t = list(s)  # work on a copy; we will need the original string later
            for x in range(i, j + 1):  # flip letters from i to j in t
                t[x] = 'K' if t[x] == 'R' else 'R'
            c = t.count('R')  # count R's in t
            answer = max(answer, c)
    return answer

How to find all substrings that start and end with 1?

You are given a string of 0’s and 1’s. You have to find all substrings of the string which start and end with a 1.
For example, given 0010110010, output should be the six strings:
101
1011
1011001
11
11001
1001
Obviously there is an O(N^2) solution, but I'm looking for a solution with complexity on the order of O(N). Is it possible?
Let k be the number of 1s in our input string. Then there are O(k^2) such substrings. Enumerating them must take at least O(k^2) time. If k ~ N, then enumerating them must take O(N^2) time.
The only way to get an O(N) solution is if we add the requirement that k is o(sqrt(N)). There cannot be an O(N) solution in the general case with no restriction on k.
An actual O(k^2) solution is straightforward:
std::string input = ...;
std::vector<size_t> ones;
ones.reserve(input.size());
// O(N) find the 1s
for (size_t idx = 0; idx < input.size(); ++idx) {
    if (input[idx] == '1') {
        ones.push_back(idx);
    }
}
// O(k^2) walk the indices
for (size_t i = 0; i < ones.size(); ++i) {
    for (size_t j = i+1; j < ones.size(); ++j) {
        // print the substring from the i-th 1 to the j-th 1, inclusive
        std::cout << input.substr(ones[i], ones[j] - ones[i] + 1) << '\n';
    }
}
Update We have to account for the lengths of the substrings as well as the number of them. The total length of all the strings is O(k * N), which is strictly greater than the previously claimed bound of O(k^2). Thus, the o(sqrt(N)) bound on k is insufficient - we actually need k to be O(1) in order to yield an O(N) solution.
You can find the same in O(n) with the following steps :
1. Count the number of 1's.
2. Let # of 1's be x, we return x(x-1)/2.
This quite trivially counts the number of possible pairs of 1's.
The code itself is probably worth trying yourself!
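For readers who want to compare against their own attempt, the two steps can be sketched in a few lines of Python (the function name count_one_substrings is hypothetical):

```python
def count_one_substrings(s):
    """Count substrings of s that start and end with '1' (length at least 2)."""
    x = s.count("1")  # step 1: count the 1's
    return x * (x - 1) // 2  # step 2: number of ways to pick a start and an end


print(count_one_substrings("0010110010"))
```

Every pair of distinct 1's determines exactly one such substring, which is why only the count of 1's matters.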
EDIT:
If you want to return the substrings themselves, you must restrict the number of 1's in your substring in order to get some sort of O(N) solution (or really O(x) where x is your # of 1's) , as enumerating them in itself cannot be reduced in a general case from O(N^2) time complexity.
If you just need the number of substrings, and not the substrings themselves, you could probably pull it off by counting the number of pairs after doing an initial O(n) sum of the number of 1's you encounter
Assuming N is supposed to be the number of 1s in your string (or at least proportional to it, which is reasonable assuming a constant probability of 1 for each character):
If you need the substrings themselves, there's going to be N(N-1)/2, which is quadratic, so it's completely impossible to be less complex than quadratic.
import java.util.*;
public class DistinctSubstrings {
    public static void main(String args[]) {
        // a hash set
        Scanner in = new Scanner(System.in);
        System.out.print("Enter The string");
        String s = in.nextLine();
        int L = s.length();
        Set<String> hs = new HashSet<String>();
        // add elements to the hash set
        // i is the distance between the two ends; start at 1 so a lone "1" is not counted
        for (int i = 1; i < L; ++i) {
            for (int j = 0; j < L-i ; ++j) {
                if(s.charAt(j)=='1'&&s.charAt(j+i)=='1')
                {
                    hs.add(s.substring(j, j+i + 1));
                }
            }
        }
        Iterator<String> it=hs.iterator();
        System.out.println("the string starts and endswith 1");
        System.out.println(hs.size());
        while(it.hasNext())
        {
            System.out.println(it.next()+" ");
        }
    }
}
String s="1001010001";
for(int i=0;i<=s.length()-1;i++)
{
    for(int j=0;j<=s.length()-1;j++)
    {
        if(s.charAt(j)=='1' && s.charAt(i)=='1' && i<j)
        {
            System.out.println(s.substring(i,j+1));
        }
    }
}
The following Python code counts all substrings that start and end with 1.
# -*- coding: utf-8 -*-
"""
Created on Tue Sep 26 14:25:14 2017
@author: Veeramani Natarajan
"""
# Python Implementation to find all substrings that start and end with 1
# Function to calculate the count of sub-strings
def calcCount(mystring):
    cnt=-1
    index=0
    while(index<len(mystring)):
        if(mystring[index]=='1'):
            cnt += 1
        index += 1
    return cnt
mystring="0010110010";
index=0;
overall_cnt=0
while(index<len(mystring)):
    if(mystring[index]=='1'):
        partcount=calcCount(mystring[index:len(mystring)])
        overall_cnt=overall_cnt+partcount
        # print("index is",index)
        # print("passed string",mystring[index:len(mystring)])
        # print("Count",partcount,"overall",overall_cnt)
    index=index+1
# print the overall sub strings count
print (overall_cnt)
Note:
Though this is not an O(N) solution, I believe it will help someone to understand the Python implementation of the above problem statement.
O(n) solution is definitely possible using DP.
We take an array of pairs where the first element in each pair denotes the number of qualifying substrings that end at or before that index, and the second element denotes the number of 1s strictly before that index, each of which can start a substring ending at a later 1. (So, if the char at that index is 1, the second element won't count the substring [1, 1].)
We simply iterate through the array and build the solution incrementally as we do in DP and after the end of the loop, we have the final value in the pair's first element in the last index of our array. Here's the code:
int getoneonestrings(const string &str)
{
    int n = str.length();
    if (n <= 1) // also covers the empty string, where dp[n - 1] would be out of range
    {
        return 0;
    }
    vector< pair<int, int> > dp(n, make_pair(0, 0));
    for (int i = 1; i < n; i++)
    {
        if (str[i] == '0')
        {
            dp[i].first = dp[i - 1].first;
        }
        else
        {
            dp[i].first = dp[i - 1].first + dp[i - 1].second +
                (str[i - 1] == '1' ? 1 : 0);
        }
        dp[i].second = dp[i - 1].second + (str[i - 1] == '1' ? 1 : 0);
    }
    return dp[n - 1].first;
}

Generating all permutations with repetition

How could we generate all possible permutations of n (given) distinct items taken r at a time where any item can be repeated any number of times?
Combinatorics tell me that there will be n^r of them, just wondering how to generate them with C++/python?
Here is a possible implementation in C++, along the lines of the standard library function std::next_permutation
//---------------------------------------------------------------------------
// Variations with repetition in lexicographic order
// k: length of alphabet (available symbols)
// n: number of places
// The number of possible variations (cardinality) is k^n (it's like counting)
// Sequence elements must be comparable and increaseable (operator<, operator++)
// The elements are associated to values 0..(k-1), max=k-1
// The iterators are at least bidirectional and point to the type of 'max'
template <class Iter>
bool next_variation(Iter first, Iter last, const typename std::iterator_traits<Iter>::value_type max)
{
    if(first == last) return false; // empty sequence (n==0)
    Iter i(last); --i; // Point to the rightmost element
    // Check if I can just increase it
    if(*i < max) { ++(*i); return true; } // Increase this element and return
    // Find the rightmost element to increase
    while( i != first )
    {
        *i = 0; // reset the right-hand element
        --i; // point to the left adjacent
        if(*i < max) { ++(*i); return true; } // Increase this element and return
    }
    // If here all elements are the maximum symbol (max=k-1), so there are no more variations
    //for(i=first; i!=last; ++i) *i = 0; // Should reset to the lowest sequence (0)?
    return false;
} // 'next_variation'
And that's the usage:
std::vector<int> b(4,0); // four places initialized to symbol 0
do{
    for(std::vector<int>::const_iterator ib=b.begin(); ib!=b.end(); ++ib)
    {
        std::cout << std::to_string(*ib);
    }
    std::cout << '\n';
}
while( next_variation(b.begin(), b.end(), 2) ); // use just 0-1-2 symbols
Treat your permutation as an r-digit number in a base-n numerical system. Start with 000...0 and increase the 'number' by one: 0000, 0001, 0002, ..., 000(n-1), 0010, 0011, ...
The code is quite simple.
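As an illustration of the counting idea, here is a short Python sketch that converts each integer from 0 to n^r - 1 into its r base-n digits and maps the digits to symbols. The name variations is chosen for this example, and the alphabet is assumed to be a sequence of one-character strings.

```python
def variations(alphabet, r):
    """Yield every length-r string over alphabet, in counting order."""
    n = len(alphabet)
    for number in range(n ** r):
        digits = []
        rest = number
        for _ in range(r):
            rest, d = divmod(rest, n)  # peel off the least significant digit
            digits.append(alphabet[d])
        # Digits were collected least significant first, so reverse them.
        yield "".join(reversed(digits))


print(list(variations("ab", 2)))
```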
Here's an example of @Inspired's method with n as the first three letters of the alphabet and r = 3:
alphabet = [ 'a', 'b', 'c' ]
def symbolic_increment( symbol, alphabet ):
    ## increment our "symbolic" number by 1
    symbol = list(symbol)
    ## we reverse the symbol to maintain the convention of having the LSD on the "right"
    symbol.reverse()
    place = 0;
    while place < len(symbol):
        if (alphabet.index(symbol[place])+1) < len(alphabet):
            symbol[place] = alphabet[alphabet.index(symbol[place])+1]
            break
        else:
            symbol[place] = alphabet[0];
            place+=1
    symbol.reverse()
    return ''.join(symbol)
permutations=[]
r=3
start_symbol = alphabet[0] * (r)
temp_symbol = alphabet[0] * (r)
while 1:
    ## keep incrementing the "symbolic number" until we get back to where we started
    permutations.append(temp_symbol)
    temp_symbol = symbolic_increment( temp_symbol, alphabet)
    if( temp_symbol == start_symbol ): break
You can also probably do it with itertools:
from itertools import product
r=3
# product(..., repeat=r) yields flat r-tuples, so each one can be joined directly
permutations = [ ''.join(item) for item in product(alphabet, repeat=r) ]