Generating all permutations with repetition - c++

How could we generate all possible permutations of n (given) distinct items taken r at a time where any item can be repeated any number of times?
Combinatorics tell me that there will be n^r of them, just wondering how to generate them with C++/python?

Here is a possible implementation in C++, along the lines of the standard library function std::next_permutation
//---------------------------------------------------------------------------
// Variations with repetition in lexicographic order
// k: length of alphabet (available symbols)
// n: number of places
// The number of possible variations (cardinality) is k^n (it's like counting)
// Sequence elements must be comparable and increaseable (operator<, operator++)
// The elements are associated to values 0..(k-1), max=k-1
// The iterators are at least bidirectional and point to the type of 'max'
template <class Iter>
bool next_variation(Iter first, Iter last, const typename std::iterator_traits<Iter>::value_type max)
{
if(first == last) return false; // empty sequence (n==0)
Iter i(last); --i; // Point to the rightmost element
// Check if I can just increase it
if(*i < max) { ++(*i); return true; } // Increase this element and return
// Find the rightmost element to increase
while( i != first )
{
*i = 0; // reset the right-hand element
--i; // point to the left adjacent
if(*i < max) { ++(*i); return true; } // Increase this element and return
}
// If here all elements are the maximum symbol (max=k-1), so there are no more variations
//for(i=first; i!=last; ++i) *i = 0; // Should reset to the lowest sequence (0)?
return false;
} // 'next_variation'
And that's the usage:
std::vector<int> b(4,0); // four places initialized to symbol 0
do{
for(std::vector<int>::const_iterator ib=b.begin(); ib!=b.end(); ++ib)
{
std::cout << std::to_string(*ib);
}
std::cout << '\n';
}
while( next_variation(b.begin(), b.end(), 2) ); // use just 0-1-2 symbols

Treat your permutation as an r-digit number in a base-n numeral system. Start with 000...0 and increase the 'number' by one: 0000, 0001, 0002, ..., 000(n-1), 0010, 0011, ...
The code is quite simple.
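A minimal Python sketch of that counting idea, assuming the items are given as an indexable sequence. The function name variations_with_repetition is made up for illustration; the loop is just "add one with carry" on a list of digit indices.

```python
def variations_with_repetition(items, r):
    """Yield every length-r sequence over items, in counting order."""
    n = len(items)
    if n == 0 and r > 0:
        return  # no symbols available, so there is nothing to generate
    digits = [0] * r  # the "number" 00...0, most significant digit first
    while True:
        yield tuple(items[d] for d in digits)
        # Add one to the number, carrying from the rightmost digit.
        pos = r - 1
        while pos >= 0 and digits[pos] == n - 1:
            digits[pos] = 0
            pos -= 1
        if pos < 0:
            return  # every digit was n-1, so the counter has overflowed
        digits[pos] += 1
```

For example, list(variations_with_repetition("ab", 2)) gives the four tuples aa, ab, ba, bb in that order, and in general n^r tuples are produced.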

Here's an example of @Inspired's method with n as the first three letters of the alphabet and r = 3:
alphabet = [ 'a', 'b', 'c' ]
def symbolic_increment( symbol, alphabet ):
    ## increment our "symbolic" number by 1
    symbol = list(symbol)
    ## we reverse the symbol to maintain the convention of having the LSD on the "right"
    symbol.reverse()
    place = 0
    while place < len(symbol):
        if (alphabet.index(symbol[place])+1) < len(alphabet):
            symbol[place] = alphabet[alphabet.index(symbol[place])+1]
            break
        else:
            symbol[place] = alphabet[0]
            place+=1
    symbol.reverse()
    return ''.join(symbol)
permutations=[]
r=3
start_symbol = alphabet[0] * (r)
temp_symbol = alphabet[0] * (r)
while 1:
    ## keep incrementing the "symbolic number" until we get back to where we started
    permutations.append(temp_symbol)
    temp_symbol = symbolic_increment( temp_symbol, alphabet)
    if( temp_symbol == start_symbol ): break
You can also do it with itertools, since product accepts a repeat argument:
from itertools import product
r=3
permutations = [ ''.join(item) for item in product(alphabet, repeat=r) ]

Related

How to get all dictionary words from a list of letters?

I have an input string, like "fairy", and I need to get the English words that can be formed from it. Here's an example:
5: Fairy
4: Fray, Airy, Fair, Fiar
3: Fay, fry, arf, ary, far, etc.
I have an std::unordered_set<std::string> of dictionary words so I can easily iterate over it. I've created permutations before as shown below:
std::unordered_set<std::string> permutations;
// Finds every permutation (non-duplicate arrangement of letters)
std::sort(letters.begin(), letters.end());
do {
// Check if the word is a valid dictionary word first
permutations.insert(letters);
} while (std::next_permutation(letters.begin(), letters.end()));
That's perfect for length 5. I can check each permutation of letters to see if it matches a dictionary word, and I end up with "fairy", which is the only 5-letter word that can be formed from those letters.
How could I find words of a smaller length? I'm guessing it has to do with permutations as well, but I wasn't sure how to implement it.
You can keep an auxiliary data structure and add a special symbol to mark an end-of-line:
#include <algorithm>
#include <string>
#include <set>
#include <list>
#include <iostream>
int main()
{
std::list<int> l = {-1, 0 ,1, 2, 3, 4};
std::string s = "fairy";
std::set<std::string> words;
do {
std::string temp = "";
for (auto e : l)
if (e != -1) temp += s[e];
else break;
words.insert(temp);
} while(std::next_permutation(l.begin(), l.end()));
}
Here the special symbol is -1
Okay, you have to ask yourself a question. Can you reuse letters? For instance, if you're given the word friend, is fee legal? Friend has 1 e and fee has 2. That's an important but minor detail.
Algorithm 1: Brute Force
You can iterate over your entire list of possible words and write a method "does this word consist only of letters in this other word"? If so, add it to your final list.
That algorithm changes very slightly based on your answer to the first question, but it's not hard to write.
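A rough Python sketch of the brute-force check, assuming the dictionary is any iterable of lowercase words. The names can_build and words_from_letters, and the reuse flag, are invented here to show how the answer to the "can you reuse letters?" question changes the test.

```python
from collections import Counter

def can_build(word, letters, reuse=False):
    """Return True if word can be spelled using only the given letters."""
    if reuse:
        # Letters may be used any number of times: only membership matters.
        return set(word) <= set(letters)
    need = Counter(word)
    have = Counter(letters)
    # Each letter may be used at most as often as it occurs in letters.
    return all(have[ch] >= count for ch, count in need.items())

def words_from_letters(dictionary, letters, reuse=False):
    """Scan the whole dictionary and keep the words that can be built."""
    return sorted(w for w in dictionary if can_build(w, letters, reuse))
```

With letters "friend", can_build("fee", "friend") is False, while can_build("fee", "friend", reuse=True) is True.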
Algorithm 2: Recursive Approach
Create a method addWords().
/**
* letters is the list of letters you're allowed to use
* word is the prefix built so far (empty on the first call)
*/
void addWords(string letters, string word) {
size_t length = letters.length();
for (size_t index = 0; index < length; ++index) {
string newWord = word + letters[index];
string remainingLetters = letters.substr(0, index) + letters.substr(index + 1);
// if newword is in your dictionary, add it to the output
...
addWords(remainingLetters, newWord);
}
}
Let's look how this works with addWords("fairy", "") --
First loop: add f to the empty string and check if f is a word.
Then recurse with addWords("airy", "f"). We'll look at recursion shortly.
Second loop: add a to the empty string and check if a is a word. It is, so we'll add it to the output and recurse with addWords("firy", "a").
Repeat, checking each one-letter word (5 times total).
Now, let's look at one level of recursion -- addWords("airy", "f"). Now, we're going to try in order fa, fi, etc. Then we'll recurse again with something like addWords("iry", "fa") (etc).
From recursing the second loop, we would try words beginning with a instead of f. So we would end up testing af, ai, etc.
This works if the answer to your first question is "no, we don't reuse letters". This method does NOT work if the answer is yes.
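Here is the same recursion as a small Python sketch, under the assumption that the dictionary is a set of strings and that results are collected in a set passed in by the caller. Both parameters are additions for illustration and are not part of the addWords signature above.

```python
def add_words(letters, word, dictionary, found):
    """Extend word with each unused letter and record dictionary hits."""
    for index in range(len(letters)):
        new_word = word + letters[index]
        # Remove the letter just used so it cannot be picked again.
        remaining = letters[:index] + letters[index + 1:]
        if new_word in dictionary:
            found.add(new_word)
        add_words(remaining, new_word, dictionary, found)
```

Calling add_words("fairy", "", dictionary, found) tries every ordering of every subset of the letters, so it is only practical for short inputs.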
You can check every prefix (or suffix, you need only be consistent) of each permutation. This will consider some substrings multiple times, but it's a simple change.
std::unordered_set<std::string> permutations;
// Finds every permutation (non-duplicate arrangement of letters)
std::sort(letters.begin(), letters.end());
do {
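std::string_view view = letters;
for (std::size_t i = 1; i <= view.size(); ++i) { // <= so the full-length word is included
auto prefix = view.substr(0, i);
// check if prefix is dictionary word
permutations.emplace(prefix); // emplace, because std::string's constructor from string_view is explicit
}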
} while (std::next_permutation(letters.begin(), letters.end()));
Algorithmically, you can use something like a counter to generate all subsets of your word, and then find all the permutations:
For example:
00000 ---> null
00001 ---> y
00010 ---> r
00011 ---> ry
00100 ---> i
00101 ---> iy
...
11110 ---> Fair
11111 ---> Fairy
Note: then run your permutation function on each subset to generate the other orderings of its characters.
To implement the counter, use something like a boolean array: flip the lowest bit and carry into the higher bits as needed. At each step, select the characters whose indices are true in the boolean array.
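A short Python sketch of this subset-then-permute idea. It uses an integer as the counter instead of a boolean array, and itertools.permutations for the reordering step; the function name arrangements_of_subsets is made up for this example.

```python
from itertools import permutations

def arrangements_of_subsets(letters):
    """Return every ordering of every non-empty subset of letters."""
    n = len(letters)
    results = set()
    for mask in range(1, 1 << n):  # skip 0, which is the empty subset
        # Bit (n - 1 - i) of the counter decides whether letters[i] is chosen,
        # so for "fairy" the counter value 0b00011 selects "ry".
        chosen = [letters[i] for i in range(n) if (mask >> (n - 1 - i)) & 1]
        for p in permutations(chosen):
            results.add("".join(p))
    return results
```

Each result would then be looked up in the dictionary. For five distinct letters this produces 325 candidate strings.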
Trie might be an appropriate structure to store the word.
I suggest also to use sorted anagram as "key" instead of the word directly:
class Node
{
public:
std::map<char, Node> children; // Might be an array<std::unique_ptr<Node>, 26>
std::set<std::string> data; // List of word valid with anagram
// Traditionally, Trie would use instead ` bool endOfWord = false;`
Node() = default;
const Node* get(char c) const
{
auto it = children.find(c);
if (it == children.end()) {
return nullptr;
}
return &it->second;
}
};
class Trie
{
Node root;
public:
void add(const std::string& word)
{
std::string sorted = word;
std::sort(sorted.begin(), sorted.end());
Node* node = &root;
for (const char c : sorted) {
node = &node->children[c];
}
node->data.insert(word);
}
// ...
};
Then to print all anagrams you might do:
void print_valid_words(std::string letters) const
{
std::sort(letters.begin(), letters.end());
print_valid_words(&root, letters);
}
private:
void print_valid_words(const Node* current, std::string_view letters) const
{
if (current == nullptr) return;
for (auto word : current->data) {
std::cout << word << std::endl;
}
for (std::size_t i = 0; i < letters.size(); ++i)
{
if (i == 0 || letters[i] != letters[i - 1]) {
print_valid_words(current->get(letters[i]), letters.substr(i + 1));
}
}
}

Given an integer K and a matrix of size t x t. construct a string s consisting of first t lowercase english letters such that the total cost of s is K

I'm solving this problem and stuck halfway through, looking for help and a better method to tackle such a problem:
problem:
Given an integer K and a matrix of size t x t, we have to construct a string s consisting of the first t lowercase English letters such that the total cost of s is exactly K. It is guaranteed that at least one string satisfies the given conditions. Among all such strings, find the one that is lexicographically smallest.
Specifically, the cost of having the ith character followed by the jth character of the English alphabet is equal to cost[i][j].
For example, the cost of having 'a' followed by 'a' is denoted by cost[0][0] and the cost of having 'b' followed by 'c' is denoted by cost[1][2].
The total cost of a string is the sum of the costs of each pair of consecutive characters in s. For example, if the cost matrix is
[1 2]
[3 4],
and the string is "abba", then we have
the cost of having 'a' followed by 'b' is cost[0][1]=2.
the cost of having 'b' followed by 'b' is cost[1][1]=4.
the cost of having 'b' followed by 'a' is cost[1][0]=3.
In total, the cost of the string "abba" is 2+4+3=9.
Example:
consider, for example, K is 3,t is 2, the cost matrix is
[2 1]
[3 4]
There are two strings whose total cost is 3. Those strings are:
"aab"
"ba"
our answer will be "aab" as it is lexicographically smallest.
my approach
I tried to find and store all those combinations of i, j whose costs either sum to the desired value k or individually equal k.
for above example
v={
{2,1},
{3,4}
}
k = 3
and v[0][0] + v[0][1] = 3 & v[1][0] = 3. I tried to store the pairs in an array like this: std::vector<std::vector<std::pair<int, int>>>. Based on it, I would create all possible strings and store them in a set, which would give me the strings in lexicographical order.
I got stuck after writing this much code:
#include<iostream>
#include<vector>
int main(){
using namespace std;
vector<vector<int>>v={{2,1},{3,4}};
vector<pair<int,int>>k;
int size=v.size();
for(size_t i=0;i<size;i++){
for(size_t j=0;j<size;j++){
if(v[i][j]==3){
k.push_back(make_pair(i,j));
}
}
}
}
Please help me understand how such a problem can be tackled. My code can only find the individual [i,j] pairs that are equal to the desired K. I have no idea how to collect multiple [i,j] pairs which sum to the desired value, and it also appears my approach is totally naive and based on brute force. I'm looking for a better way to think about such problems and to implement the solution in code. Thank you.
This is a backtracking problem. The general approach is:
a) Start with the "smallest" letter, i.e. 'a', and then recurse on all the available letters. If you find a string that sums to K then you have the answer, because the search goes from the smallest to the largest letter and so reaches the lexicographically smallest string first.
b) If nothing is found starting from 'a', move on to the next letter.
The recursion/backtracking can be done as follows:
Start with a letter and the original value of K.
Explore every j = 0 to t-1, reducing K by cost[i][j].
If K == 0 you have found your string.
If K < 0 then that path is not possible, so remove the last letter in the string and try other paths.
Pseudocode :
string find_smallest() {
for (int i = 0; i < t; i++) {
string s(1, (char)(i+97));
bool value = recurse(i,t,K,s);
if ( value ) return s;
}
return "";
}
bool recurse(int i, int t, int K, string& s) { // s is passed by reference so the caller sees the result
if ( K < 0 ) {
return false;
}
if ( K == 0 ) {
return true;
}
for ( int j = 0; j < t; j++ ) {
s += (char)(j+97);
bool v = recurse(j, t, K-cost[i][j], s);
if ( v ) return true;
s.pop_back(); // backtrack: remove the letter just tried
}
return false;
}
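For comparison, a runnable Python sketch of the same backtracking search. It assumes every cost is a positive integer (otherwise the recursion may not terminate) and returns None when no string exists; the name smallest_string is made up for this example.

```python
def smallest_string(cost, k):
    """Lexicographically smallest string whose pair costs sum to k, or None.

    Assumes every entry of cost is a positive integer, so the search ends.
    """
    t = len(cost)

    def extend(chars, remaining):
        if remaining == 0:
            return True
        last = chars[-1]
        for j in range(t):  # try letters from smallest to largest
            if cost[last][j] <= remaining:
                chars.append(j)
                if extend(chars, remaining - cost[last][j]):
                    return True
                chars.pop()  # dead end: undo the choice, try the next letter
        return False

    for first in range(t):
        chars = [first]
        if extend(chars, k):
            return "".join(chr(ord("a") + c) for c in chars)
    return None
```

For the example from the question, smallest_string([[2, 1], [3, 4]], 3) returns "aab". The search is exponential in the worst case, so this is only a sketch for small inputs.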
In your implementation, you would probably need another vector of vectors of pairs to explore all your candidates. Also another vector for updating the current cost of each candidate as it builds up. Following this approach, things start to get a bit messy (IMO).
A cleaner and more understandable option (IMO again) could be to approach the problem with recursion:
#include <iostream>
#include <string>
#include <vector>
#define K 3
using namespace std;
string exploreCandidate(int currentCost, string currentString, vector<vector<int>> &v)
{
if (currentCost == K)
return currentString;
int size = v.size();
int lastChar = (int)currentString.back() - 97; // get ASCII code
for (size_t j = 0; j < size; j++)
{
int nextTotalCost = currentCost + v[lastChar][j];
if (nextTotalCost > K)
continue;
string nextString = currentString + (char)(97 + j); // get ASCII char
string exploredString = exploreCandidate(nextTotalCost, nextString, v);
if (exploredString != "00") // It is a valid path
return exploredString;
}
return "00";
}
int main()
{
vector<vector<int>> v = {{2, 1}, {3, 4}};
int size = v.size();
string initialString = "00"; // reserve first two positions
for (size_t i = 0; i < size; i++)
{
for (size_t j = 0; j < size; j++)
{
initialString[0] = (char)(97 + i);
initialString[1] = (char)(97 + j);
string exploredString = exploreCandidate(v[i][j], initialString, v);
if (exploredString != "00") { // It is a valid path
cout << exploredString << endl;
return 0;
}
}
}
}
Let us begin from the main function:
We define our matrix and iterate over it. For each position, we define the corresponding sequence. Notice that we can use indices to get the respective character of the English alphabet, knowing that in ASCII code a=97, b=98...
Having this initial sequence, we can explore candidates recursively, which leads us to the exploreCandidate recursive function.
First, we check whether the current cost is already the value we are looking for. If it is, we return immediately without evaluating any further candidates. We can do this because we are looking for the lexicographically smallest string, and we are not asked to provide information about all the candidates.
If the cost condition is not yet satisfied (cost < K), we need to continue exploring our candidate, not over the whole matrix but only over the row corresponding to the last character. Then we can encounter two scenarios:
The cost condition is met (cost = K): if at some point of the recursion the cost is equal to our value K, then the string is a valid one, and since it will be the first one we encounter, we want to return it and finish the execution.
The cost is not valid (cost > K): if the current cost is greater than K, then we need to abort this branch and see if other branches are luckier. Returning a boolean would be nice, but since we want to output a string (or maybe not, depending on the statement), an option could be to return a string and use "00" as our "false" value, allowing us to know whether the cost condition has been met. Another option could be returning a boolean and using an output parameter (passed by reference) to hold the output string.
EDIT:
The provided code assumes positive non-zero costs. If some costs were zero you could run into infinite recursion, so you would need to add more constraints in your recursive function.

Choose some numbers coding in Gray Code

I have to write a program that displays some numbers encoded in Gray code. I already found an algorithm written in C++ on this page ( https://www.geeksforgeeks.org/given-a-number-n-generate-bit-patterns-from-0-to-2n-1-so-that-successive-patterns-differ-by-one-bit/ ).
But I want to create a new method to delete the numbers that have two consecutive "1"s or have a "1" at both extremities (left and right).
Example : for n = 3 we get these numbers :
000
001
011
010
110
111
101
100
Now I want to delete these numbers : 011 , 110 , 111 , 101 and show the other numbers remaining in the list.
My idea is to create a vector of vectors. Something like that for example when n = 3 : {{000},{001},{011},{010},{110},{111},{101},{100}}.
For the size it will be like this :
int m = pow(2,n);
int vector[m][n];
For example : vector[0][1] = {0} and vector[1][2] = {1} if I'm correct with the sizes.
Now to delete the numbers that have two "1" consecutively and have "1" in their extremity I can use this code :
while (i < m){
for (j=0; j<n-1; j++){
if (vector[i][j]==vector[i][j+1]==1 && vector[i][0]==vector[i][n-1]==1 )
i=i+1; //Don't show this number
else { cout <<vector[i][j] << endl; i=i+1; }
}
}
Now the problem is that I don't know how to store the result in the Gray Code written in C++ in my vectors, or maybe there is a way to compare between two numbers from this code without using vectors.
This is going to be extra work when you get to larger strings, and the code is not trivial to read. How about creating a simple mask? Shift a pair of contiguous 1 bits the length of the number (num).
mask = 0b11000 // fill in the correct quantity of 0s
end_mask = 0b10001
while mask > 1
if (num & mask) == mask // bitwise AND, not logical &&
remove num from array
mask = mask >> 1
if (num & end_mask) == end_mask
remove num from array
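The mask idea can be sketched in Python as follows. Note that gray_codes uses the standard i ^ (i >> 1) formula as an assumption, which is not necessarily the algorithm from the linked page, and the names is_allowed and filtered_gray_codes are invented for this example.

```python
def gray_codes(n):
    """Return the n-bit Gray code sequence as integers."""
    return [i ^ (i >> 1) for i in range(1 << n)]

def is_allowed(num, n):
    """False if num has two adjacent 1 bits, or a 1 at both ends."""
    if n < 2:
        return True  # too short to have a pair of bits or two ends
    mask = 0b11 << (n - 2)  # pair of 1 bits at the leftmost position
    while mask > 1:
        if (num & mask) == mask:
            return False
        mask >>= 1  # slide the pair one position to the right
    end_mask = (1 << (n - 1)) | 1  # leftmost and rightmost bits
    return (num & end_mask) != end_mask

def filtered_gray_codes(n):
    """Gray codes that pass the filter, as zero-padded bit strings."""
    return [f"{g:0{n}b}" for g in gray_codes(n) if is_allowed(g, n)]
```

For n = 3 this keeps 000, 001, 010 and 100, matching the example in the question.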
Without using bit manipulation, which admittedly would be faster, since you have a vector of vectors, one way to perform the removal is to use std::adjacent_find using a predicate to find the adjacent 1's, and to use std::remove_if to remove those vectors matching the criteria of having adjacent 1's.
Here is an example:
#include <algorithm>
#include <vector>
#include <iostream>
#include <iterator>
bool findOnes(const std::vector<int>& v)
{
// less than 2 digits, so can't do anything
if ( v.size() < 2 )
return false;
// test extremes
if ( v.front() == 1 && v.back() == 1 )
return true;
// check if there are adjacent 1's
return std::adjacent_find(v.begin(), v.end(), [&](int n1, int n2)
{ return n1 == 1 && n2 == 1; }) != v.end();
}
int main()
{
//test
std::vector<std::vector<int>> vect = {{0,0,0},{0,0,1},{0,1,1},{0,1,0},{1,1,0},{1,1,1},{1,0,1},{1,0,0}};
// erase the vectors that match the criteria
vect.erase(std::remove_if(vect.begin(), vect.end(), findOnes), vect.end());
// show the final results
for ( auto& i : vect )
{
std::copy(i.begin(), i.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << "\n";
}
}
Basically, if adjacent_find does not find adjacent 1's, the iterator returned will be end(). Thus in the findOnes predicate function, after doing the easy tests for size and the extreme values, adjacent_find takes over and does the rest.

Retrieving two highest values from vector

I have declared a vector that stores the areas of all bounding boxes in a given frame. I then iterated over this vector from beginning to end to retrieve each value. I then sorted these values in ascending order (lowest -> highest).
Here is the code I use:
double area = contourArea(boundingBoxArea);
list_area.push_back(area);
sort(list_area.begin(), list_area.end());
vector<double>::const_iterator area_iter;
int i = 0;
for(area_iter = list_area.begin(); area_iter != list_area.end(); area_iter++)
{
i++;
cout<<"Iterator " <<i<< " size is : " <<*area_iter<<endl;
}
My issue is that I am only interested in the last two values of the set of numbers (the 2 highest values), but I can't really get my head around how I should go about achieving that.
Does anyone have a suggestion or a solution to my problem?
Regards
Iterate over the list, comparing each new value against the highest value seen so far. If the new value is higher, store it as the highest and keep the previous highest value in a second variable.
Something like this (may not be syntactically correct, but here it is):
double iCurrentHighest = 0;
double previousHighest = 0;
for(area_iter = list_area.begin(); area_iter != list_area.end(); area_iter++)
{
if(*area_iter>iCurrentHighest)
{
previousHighest = iCurrentHighest;
iCurrentHighest = *area_iter;
}
else
{
if ((*area_iter<iCurrentHighest) && (*area_iter>previousHighest))
{
previousHighest = *area_iter;
}
}
}
What you need is std::nth_element
double area = contourArea(boundingBoxArea);
list_area.push_back(area);
std::nth_element(list_area.begin(), list_area.begin() + 2, list_area.end(), std::greater<double>());
// first 2 elements in list_area are now the 2 greatest (in unspecified order); requires at least 2 elements
vector<double>::const_iterator area_iter;
int i = 0;
for(area_iter = list_area.begin(); area_iter != list_area.end(); area_iter++)
{
i++;
cout<<"Iterator " <<i<< " size is : " <<*area_iter<<endl;
}
This is a different version of the algorithm provided by @Rich above with slightly fewer comparisons (instead of testing each value against the largest and then the second largest, by reversing the order the expected number of comparisons can be reduced: if a number is smaller than the second largest it gets compared only once).
double first = 0;
double second = 0;
for (auto const & v : list_area) {
if (v > second) {
second = v;
if (second > first)
std::swap(first,second);
}
}
If there is no known value that is less than all others, the first and second values can be initialized with the first two values in the vector (in order).
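A small Python sketch of that last variant, seeding first and second from the first two elements so that no sentinel value is needed. The function name two_largest and the choice to raise ValueError on short input are assumptions made for this example.

```python
def two_largest(values):
    """Return (largest, second largest) using a single pass."""
    it = iter(values)
    try:
        first = next(it)
        second = next(it)
    except StopIteration:
        raise ValueError("need at least two values") from None
    if second > first:
        first, second = second, first
    for v in it:
        # Compare against the second largest first: most values fail this
        # test and are therefore compared only once.
        if v > second:
            second = v
            if second > first:
                first, second = second, first
    return first, second
```

For example, two_largest([3.5, 1.0, 9.25, 4.0]) returns (9.25, 4.0).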

Strategy to modify permutation algorithm to prevent duplicate printouts

I've been reviewing algorithms for practice, and I'm currently looking at a permutation algorithm that I quite like:
void permute(char* set, int begin, int end) {
int range = end - begin;
if (range == 1)
cout << set << endl;
else {
for(int i = 0; i < range; ++i) {
swap(&set[begin], &set[begin+i]);
permute(set, begin+1, end);
swap(&set[begin], &set[begin+i]);
}
}
}
I actually wanted to apply this to a situation where there will be many repeated characters though, so I need to be able to modify it to prevent the printing of duplicate permutations.
How would I go about detecting that I was generating a duplicate? I know I could store this in a hash or something similar, but that's not an optimal solution - I'd prefer one that didn't require extra storage. Can someone give me a suggestion?
PS: I don't want to use the STL permutation mechanisms, and I don't want a reference to another "unique permutation algorithm" somewhere. I'd like to understand the mechanism used to prevent duplication so I can build it into this and learn, if possible.
There is no general way to prevent arbitrary functions from generating duplicates. You can always filter out the duplicates, of course, but you don't want that, and for very good reasons. So you need a special way to generate only non-duplicates.
One way would be to generate the permutations in increasing lexicographical order. Then you can just check whether a "new" permutation is the same as the last one, and skip it if so. It gets even better: the algorithm for generating permutations in increasing lexicographical order given at http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order doesn't even generate the duplicates at all!
However, that is not an answer to your question, as it is a different algorithm (although based on swapping, too).
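For reference, a Python sketch of that lexicographic successor step. This is the usual "find pivot, swap, reverse suffix" formulation written from memory rather than copied from the linked page, and the function names are chosen for this example.

```python
def next_permutation(a):
    """Rearrange list a in place into its next lexicographic permutation.

    Returns False, leaving a unchanged, if a is already the last one.
    """
    # Find the rightmost position whose element is smaller than its successor.
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return False
    # Find the rightmost element strictly greater than a[i] and swap them.
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # The suffix is in descending order; reverse it to make it the smallest.
    a[i + 1:] = reversed(a[i + 1:])
    return True

def sorted_unique_permutations(s):
    """All distinct permutations of s, in increasing lexicographic order."""
    a = sorted(s)
    result = ["".join(a)]
    while next_permutation(a):
        result.append("".join(a))
    return result
```

Because each step moves to a strictly greater arrangement, repeated characters never produce the same string twice.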
So, let's look at your algorithm a little closer. One key observation is:
Once a character is swapped to position begin, it will stay there for all nested calls of permute.
We'll combine this with the following general observation about permutations:
If you permute a string s, but only at positions where there's the same character, s will remain the same. In fact, all duplicate permutations have a different order for the occurrences of some character c, where c occurs at the same positions.
OK, so all we have to do is to make sure that the occurrences of each character are always in the same order as in the beginning. Code follows, but... I don't really speak C++, so I'll use Python and hope to get away with claiming it's pseudo code.
We start by your original algorithm, rewritten in 'pseudo code':
def permute(s, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]
and a helper function that makes calling it easier:
def permutations_w_duplicates(s):
    permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable
Now we extend the permute function with some bookkeeping about how many times a certain character has been swapped to the begin position (i.e. has been fixed), and we also remember the original order of the occurrences of each character (char_number). Each character that we try to swap to the begin position then has to be the next higher in the original order, i.e. the number of fixes for a character defines which original occurrence of this character may be fixed next - I call this next_fixable.
def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                permute2(s, next_fixable, char_number, begin + 1, end)
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1
Again, we use a helper function:
def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1
    permute2(list(s), next_fixable, char_number, 0, len(s))
That's it!
Almost. You can stop here and rewrite in C++ if you like, but if you are interested in some test data, read on.
I used slightly different code for testing, because I didn't want to print all permutations. In Python, you would replace the print with a yield, which turns the function into a generator function, the result of which can be iterated over with a for loop, and the permutations will be computed only when needed. This is the real code and test I used:
# Generator versions of permute and permutations_w_duplicates. These two were not
# shown in the original test code; they are assumed to mirror the functions above.
def permute(s, begin, end):
    if end == begin + 1:
        yield "".join(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            yield from permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]
def permutations_w_duplicates(s):
    yield from permute(list(s), 0, len(s))
def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        yield "".join(s) # join the characters to form a string
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                for p in permute2(s, next_fixable, char_number, begin + 1, end):
                    yield p
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1
def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1
    for p in permute2(list(s), next_fixable, char_number, 0, len(s)):
        yield p
s = "FOOQUUXFOO"
A = list(permutations_w_duplicates(s))
print("%s has %s permutations (counting duplicates)" % (s, len(A)))
print("permutations of these that are unique: %s" % len(set(A)))
B = list(permutations_wo_duplicates(s))
print("%s has %s unique permutations (directly computed)" % (s, len(B)))
print("The first 10 permutations :", A[:10])
print("The first 10 unique permutations:", B[:10])
And the result:
FOOQUUXFOO has 3628800 permutations (counting duplicates)
permutations of these that are unique: 37800
FOOQUUXFOO has 37800 unique permutations (directly computed)
The first 10 permutations : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX']
The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF']
Note that the permutations are computed in the same order as in the original algorithm, just without the duplicates. Note that 37800 * 2! * 2! * 4! = 3628800, just like you would expect.
You could add an if statement to prevent the swap code from executing if it would swap two identical characters. The for loop is then
for(int i = 0; i < range; ++i) {
if(i==0 || set[begin] != set[begin+i]) {
swap(&set[begin], &set[begin+i]);
permute(set, begin+1, end);
swap(&set[begin], &set[begin+i]);
}
}
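The reason for allowing the case i==0 is to make sure the recursive call happens exactly once even if all the characters of the set are the same. Note that this check only avoids some duplicates: for "abb", both i=1 and i=2 bring a 'b' to the front, so "bab" and "bba" are each still printed twice.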
A simple solution is to change the duplicate characters randomly to characters that aren't already present. Then after permutation, change the characters back. Only accept a permutation if its characters are in order.
e.g. if you have "a,b,b"
you would have had the following:
a b b
a b b
b a b
b a b
b b a
b b a
But, if we start with a,b,b and note the duplicate b's, then we can change the second b to a c
now we have a b c
a b c - accept because b is before c. change c back to b to get a b b
a c b - reject because c is before b
b a c - accept as b a b
b c a - accept as b b a
c b a - reject as c comes before b.
c a b - reject as c comes before b.
OPTION 1
One option would be to use 256 bits of storage on the stack to store a bitmask of which characters you had tried in the for loop, and only to recurse for new characters.
OPTION 2
A second option is to use the approach suggested in the comments ( http://n1b-algo.blogspot.com/2009/01/string-permutations.html) and change the for loop to:
else {
char last=0;
for(int i = 0; i < range; ++i) {
if (last==set[begin+i])
continue;
last = set[begin+i];
swap(&set[begin], &set[begin+i]);
permute(set, begin+1, end);
swap(&set[begin], &set[begin+i]);
}
}
However, to use this approach you will also have to sort the characters set[begin],set[begin+1],...set[end-1] at the entry to the function.
Note that you have to sort every time the function is called. (The blog post does not seem to mention this, but otherwise you will generate too many results for an input string "aabbc". The problem is that the string does not stay sorted after swap is used.)
This is still not very efficient. For example, for a string containing 1 'a' and N 'b's this approach will end up calling the sort N times for an overall complexity of N^2logN
OPTION 3
A more efficient approach for long strings containing lots of repeats would be to maintain both the string "set" and a dictionary of how many of each type of character you have left to use. The for loop would change to a loop over the keys of the dictionary, as these would be the unique characters that are allowed at that position.
This would have complexity equal to the number of output strings, and only a very small extra amount of storage to hold the dictionary.
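Option 3 could look roughly like this in Python, using collections.Counter as the dictionary of remaining characters. The function name unique_permutations and the generator structure are choices made for this sketch, not something from the linked post.

```python
from collections import Counter

def unique_permutations(s):
    """Yield each distinct permutation of s exactly once, in sorted order."""
    remaining = Counter(s)      # how many of each character are still unused
    keys = sorted(remaining)    # the distinct characters, in a fixed order
    prefix = []

    def build():
        if len(prefix) == len(s):
            yield "".join(prefix)
            return
        for ch in keys:
            # Each distinct character is tried at most once per position,
            # which is what prevents duplicate results.
            if remaining[ch] > 0:
                remaining[ch] -= 1
                prefix.append(ch)
                yield from build()
                prefix.pop()
                remaining[ch] += 1

    return build()
```

For example, list(unique_permutations("abb")) gives ['abb', 'bab', 'bba'], and "FOOQUUXFOO" yields 37800 strings.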
Simply insert each element to a set. It automatically removes duplicates. Declare set s as global variable.
set <string>s;
void permute(string a, int l, int r) {
int i;
if (l == r)
s.insert(a);
else
{
for (i = l; i <= r; i++)
{
swap((a[l]), (a[i]));
permute(a, l+1, r);
swap((a[l]), (a[i])); //backtrack
}
}
}
Finally print using the function
void printr()
{
set <string> ::iterator itr;
for (itr = s.begin(); itr != s.end(); ++itr)
{
cout << '\t' << *itr;
}
cout << '\n'; // do not dereference itr here: it equals s.end() after the loop
}
The key is not to swap the same character twice. So, you could use an unordered_set to memorize which characters have been swapped.
void permute(string& input, int begin, vector<string>& output) {
if (begin == input.size()){
output.push_back(input);
}
else {
unordered_set<char> swapped;
for(int i = begin; i < input.size(); i++) {
// Do not swap a character that has been swapped
if(swapped.find(input[i]) == swapped.end()){
swapped.insert(input[i]);
swap(input[begin], input[i]);
permute(input, begin+1, output);
swap(input[begin], input[i]);
}
}
}
}
You can go through your original code by hand, and you will find those cases where duplicates occur are "swapping with the character which has been swapped."
Ex: input = "BAA"
index = 0, i = 0, input = "BAA"
----> index = 1, i = 1, input = "BAA"
----> index = 1, i = 2, input = "BAA" (duplicate)
index = 0, i = 1, input = "ABA"
----> index = 1, i = 1, input = "ABA"
----> index = 1, i = 2, input = "AAB"
index = 0, i = 2, input = "AAB"
----> index = 1, i = 1, input = "AAB" (duplicate)
----> index = 1, i = 2, input = "ABA" (duplicate)