Replacing N-level for loops with recursion in C++

I have been trying for some time now to come up with a way to compute all the various combinations of strings of words. Unlike most combination methods on the web, though, the algorithm must produce every combination, including those in which not all of the elements appear. I.e., if I am combining 'Hello', 'New' and 'World', the combinations I am looking for are:
HelloNewWorld
HelloNew
HelloWorld
Hello
NewWorld
New
World
A professor from my college did come up with a quick and dirty solution for doing just that, but it uses nested for loops.
#include <iostream>
#include <vector>
#include <array>
#include <string>

int main()
{
    std::vector<std::array<std::string, 2>> vec(3);
    vec[0] = {"Hello", ""};
    vec[1] = {"New", ""};
    vec[2] = {"World", ""};
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                std::cout << vec[0][i] + vec[1][j] + vec[2][k] << std::endl;
}
As you might imagine, I would like a way to make this actually usable and portable. I know that this is possible with recursion; I just don't know how to implement it. Ideally, I would like to make it tail-recursive if at all possible, as the plan is to compute very large combinations. What would be the best way to do this recursively, and would it be easy to make it tail-recursive?

At each level it recurses both with and without the current word, printing the result when it gets to the end of all the words:
#include <iostream>
#include <string>
#include <vector>

void recurse(std::vector<std::string> &values, size_t level, std::string str) {
    if (level < values.size()) {
        recurse(values, level + 1, str + values[level]);
        recurse(values, level + 1, str);
    } else {
        std::cout << str << "\n";
    }
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cout << argv[0] << " <word> [<word> [...]]\n";
    } else {
        std::vector<std::string> values;
        for (int i = 1; i < argc; ++i) {
            values.push_back(argv[i]);
        }
        recurse(values, 0, "");
    }
    return 0;
}
Which, when run with ./a.out Hello New World produces:
HelloNewWorld
HelloNew
HelloWorld
Hello
NewWorld
New
World

You can do this pretty efficiently by generating all combinations from k=1 to k=N for a vector of N elements, using Howard Hinnant's combinations library, available here. In my case, I've named the header sampling.h; it is the only external dependency and can be viewed in its entirety here.
#include "sampling.h"
#include <iostream>
#include <vector>
/**
* This function can take any container that has a bidirectional
* iterator (std::list, std::deque, std::vector) that contains elements
* of type std::string or similar, that must implement an `operator+`
* and `operator<<` for printing.
*/
template <typename BiDirStringContainer>
void print_combinations(BiDirStringContainer& container)
{
auto first = container.begin();
auto last = container.end();
for (size_t i = 1; i <= container.size(); ++i) {
auto mid = first + i;
for_each_combination(first, mid, last, [](auto f, auto l) {
std::string w;
for (; f != l; ++f) {
w += *f;
}
std::cout << w << std::endl;
return false;
});
}
}
int main(void)
{
std::vector<std::string> words = {
"Hello",
"New",
"World",
};
print_combinations(words);
return 0;
}
Compiling this with the C++14 standard and running it outputs:
Hello
New
World
HelloNew
HelloWorld
NewWorld
HelloNewWorld
This is exactly what your post described. Since the lambda is a custom functor and can store state, you can do whatever you would like with the combinations: store a copy, print them, etc.
This is dramatically faster than anything you can get from the standard library without major work, or from suggestions made for the standard library, for example std::next_combination and std::next_permutation (the former was not included, but was suggested here). I highly suggest reading Howard Hinnant's entire blog post: it is enlightening. The time complexity of his implementations, and their raw speed, beats most other suggestions. If you need high-performance combinations or permutations, he's already done the work for you.

If I understand this correctly, you want to generate all combinations of a string. In that case you can use a BFS along with a set and a queue to generate the combinations; I will try to explain.
Say your string is ABCD. You have a queue to which you add ABCD and a set to which you add ABCD. Now,
while the queue is not empty:
1) pop the front element
2) generate the substrings of that popped element (each with one character removed)
a) if a substring is not in the set, add it to the set and the queue
To generate the substrings in step 2 you do the following:
for (int i = 0; i < str.length(); i++)
{
    std::string substr1 = str.substr(0, i);        // characters before position i
    std::string substr2 = str.substr(i + 1);       // characters after position i
    std::string substring = substr1 + substr2;     // str with character i removed
}
Doing this on ABCD (the input string) will generate BCD, ACD, ABD and ABC. Now add those four to the set and the queue.
Let's say BCD is queue.front(). You pop that and generate CD, BD and BC and add them to the set and queue. When you pop ACD next, you generate CD, AD and AC, but now you won't add CD to the queue because it is already in the set.
EDIT:
I see your issue; my answer works for a string, but you can use the same principle on a vector<string> to generate all combinations. ABCD would just be Hello(A)World(B)...
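For what it's worth, here is a minimal sketch of my own of that BFS idea applied to a vector of words (names and structure are mine, not from the answer above): a queue of combinations still to expand, and a set to avoid revisiting a combination that has already been printed.
#include <iostream>
#include <queue>
#include <set>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> words = {"Hello", "New", "World"};

    std::queue<std::vector<std::string>> q;
    std::set<std::vector<std::string>> seen;
    q.push(words);
    seen.insert(words);

    while (!q.empty()) {
        std::vector<std::string> cur = q.front();
        q.pop();

        // print the current combination as one concatenated string
        std::string joined;
        for (const auto& w : cur) joined += w;
        std::cout << joined << "\n";

        // generate every combination with one word removed
        for (std::size_t i = 0; i < cur.size(); ++i) {
            std::vector<std::string> next = cur;
            next.erase(next.begin() + i);
            if (!next.empty() && seen.insert(next).second) {
                q.push(next);
            }
        }
    }
}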

If the only possibilities for a word are to appear or not to appear, that makes two possibilities per word. So for n words you have 2^n combinations. You just count through the 2^n numbers from 0 (inclusive) to 2^n-1 (inclusive) and map each bit to one word.
No recursion needed, just one counting for loop.
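A brief sketch of that counting idea (my own illustration, assuming the word list is small enough for the mask to fit in an unsigned int):
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> words = {"Hello", "New", "World"};
    const std::size_t n = words.size();

    // Count from 0 to 2^n - 1; bit i decides whether words[i] is included.
    // Mask 0 yields the empty combination; start at 1 if you want to skip it.
    for (unsigned mask = 0; mask < (1u << n); ++mask) {
        std::string combo;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (1u << i)) combo += words[i];
        }
        std::cout << combo << "\n";
    }
}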

Related

Vector of set insert elements

I'm trying to write a function which will return a vector of sets of strings representing the members of teams.
A group of n names should be divided into k teams for a game. Teams should be the same size, but this is not always possible unless n is exactly divisible by k. Therefore, they decided that the first n mod k teams have n / k + 1 members, and the remaining teams have n / k members.
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <list>

typedef std::vector<std::set<std::string>> vek;

vek Distribution(std::vector<std::string> names, int k) {
    int n = names.size();
    vek teams(k);
    int number_of_first = n % k;
    int number_of_members_first = n / k + 1;
    int number_of_members_remaining = n / k;
    int l = 0;
    int j = 0;
    for (int i = 1; i <= k; i++) {
        if (i <= number_of_first) {
            int number_of_members_in_team = 0;
            while (number_of_members_in_team < number_of_members_first) {
                teams[l].insert(names[j]);
                number_of_members_in_team++;
                j++;
            }
        }
        else {
            int number_of_members_in_team = 0;
            while (number_of_members_in_team < number_of_members_remaining) {
                teams[l].insert(names[j]);
                number_of_members_in_team++;
                j++;
            }
        }
        l++;
    }
    return teams;
}

int main()
{
    for (auto i : Distribution({"Damir", "Ana", "Muhamed", "Marko", "Ivan",
                                "Mirsad", "Nikolina", "Alen", "Jasmina", "Merima"}, 3)) {
        for (auto j : i)
            std::cout << j << " ";
        std::cout << std::endl;
    }
    return 0;
}
OUTPUT should be:
Damir Ana Muhamed Marko
Ivan Mirsad Nikolina
Alen Jasmina Merima
MY OUTPUT:
Ana Damir Marko Muhamed
Ivan Mirsad Nikolina
Alen Jasmina Merima
Could you explain to me why the names are not printed in the right order?
teams, being a std::vector<...>, supports random access via an index.
auto & team_i = teams[i]; (with 0 <= i < teams.size()) will give you an element of the vector. team_i is a reference to type std::set<std::list<std::string>>.
As a std::set<...> does not support random access via an index, you will need to access the elements via iterators (begin(), end() etc.), e.g.: auto set_it = team_i.begin();. *set_it will be of type std::list<std::string>.
Since std::list<...> also does not support random access via an index, again you will need to access it via iterators, e.g.: auto list_it = set_it->begin();. *list_it will be of type std::string.
This way it is possible to access every set in the vector, every list in each set, and every string in each list (after you have added them to the data structure).
However - using iterators with std::set and std::list is not as convenient as using indexed random access with std::vector. std::vector has additional benefits (simple and efficient implementation, contiguous memory block).
If you use std::vectors instead of std::set and std::list, vek will be defined as:
typedef std::vector<std::vector<std::vector<std::string>>> vek;
std::list, being a linked list, offers some benefits (like being able to add an element in O(1)). std::set guarantees that each value is present only once.
But if you don't really need these features, you could make your code simpler (and often more efficient) if you use only std::vectors as your containers.
Note: if every set will only ever contain one list (of strings), you can consider getting rid of one level of the hierarchy, i.e. store the lists (or vectors, as I suggested) directly as elements of the top-level vector.
UPDATE:
Since the question was changed, here's a short update:
In my answer above, ignore all the mentions of std::list. So when you iterate over the std::set, the elements are already std::strings.
The reason the names are not in the order you expect:
std::set keeps its elements sorted, and when you iterate it you will get the elements in that sorted order. See the answer here: Is the std::set iteration order always ascending according to the C++ specification?. Your set contains std::strings, and their default sort order is alphabetical.
Using std::vector instead of std::set, as I proposed above, will get you the result you wanted (std::vector is not sorted automatically).
If you want to try using only std::vector:
Change vek to:
typedef std::vector<std::vector<std::string>> vek;
And replace the usage of insert (to add an element to the set) with push_back to do the same for a vector.
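As a rough sketch, the vector-only version of your Distribution could look something like this (same logic and includes as your original, only the container type and the insert call change):
typedef std::vector<std::vector<std::string>> vek;

vek Distribution(std::vector<std::string> names, int k) {
    int n = names.size();
    vek teams(k);
    int number_of_first = n % k;   // teams that get one extra member
    int j = 0;
    for (int i = 0; i < k; i++) {
        int members = n / k + (i < number_of_first ? 1 : 0);
        for (int m = 0; m < members; m++) {
            teams[i].push_back(names[j++]);   // push_back keeps insertion order
        }
    }
    return teams;
}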

STL algorithms for pairwise comparison and tracking max/longest sequence

Consider this fairly easy algorithmic problem:
Given an array of (unsorted) numbers, find the length of the longest sequence of adjacent numbers that are increasing. For example, if we have {1,4,2,3,5}, we expect the result to be 3 since {2,3,5} gives the longest increasing sequence of adjacent/contiguous elements. Note that for non-empty arrays, such as {4,3,2,1}, the minimum result will be 1.
This works:
#include <algorithm>
#include <iostream>
#include <vector>

template <typename T, typename S>
T max_adjacent_length(const std::vector<S> &nums) {
    if (nums.size() == 0) {
        return 0;
    }
    T maxLength = 1;
    T currLength = 1;
    for (size_t i = 0; i < nums.size() - 1; i++) {
        if (nums[i + 1] > nums[i]) {
            currLength++;
        } else {
            currLength = 1;
        }
        maxLength = std::max(maxLength, currLength);
    }
    return maxLength;
}

int main() {
    std::vector<double> nums = {1.2, 4.5, 3.1, 2.7, 5.3};
    std::vector<int> ints = {4, 3, 2, 1};
    std::cout << max_adjacent_length<int, double>(nums) << "\n"; // 2
    std::cout << max_adjacent_length<int, int>(ints) << "\n";    // 1
    return 0;
}
As an exercise for myself, I was wondering if there is/are STL algorithm(s) that achieve the same effect, thereby (ideally) avoiding the raw for-loop I have. The motivation behind doing this is to learn more about STL algorithms, and practice using abstracted algorithms to make my code more general and reusable.
Here are my ideas, but they don't quite achieve what I'd like.
std::adjacent_find achieves the pairwise comparisons and can be used to find the index of a non-increasing pair, but doesn't easily facilitate the ability to keep a current and maximum length and compare the two. It could be possible to have those state variables as part of my predicate function, but that seems a bit wrong since ideally you'd like your predicate function to not have any side effects, right?
std::adjacent_difference is interesting. One could use it to construct a vector of the differences between adjacent numbers. Then, starting from the second element, depending on if the difference is positive or negative, we could again track the maximum number of consecutive positive differences seen. This is actually quite close to achieving what we'd like. See the example code below:
#include <algorithm>
#include <numeric>
#include <vector>

template <typename T, typename S>
T max_adjacent_length(std::vector<S> &nums) {
    if (nums.size() == 0) {
        return 0;
    }
    std::adjacent_difference(nums.begin(), nums.end(), nums.begin());
    nums.erase(std::begin(nums)); // keep only the differences
    T maxLength = 1, currLength = 1;
    for (auto n : nums) {
        currLength = n > 0 ? (currLength + 1) : 1;
        maxLength = std::max(maxLength, currLength);
    }
    return maxLength;
}
The problem here is that we lose the const-ness of nums if we compute the differences in place, or we have to sacrifice space and create a copy of nums, which is a no-no given the original solution is already O(1) in space.
Is there an idea/solution that I have overlooked that achieves what I want in a succinct and readable manner?
In both your code snippets, you are iterating through a range (in the first version, with an index-based-loop, and in the second with a range-for loop). This is not really the kind of code you should be writing if you want to use the standard algorithms, which work with iterators into the range. Instead of thinking of a range as a collection of elements, if you start thinking in terms of pairs of iterators, choosing the right algorithms becomes easier.
For this problem, here's a reasonable way to write this code:
auto max_adjacent_length = [](auto const & v)
{
    long max = 0;
    auto begin = v.begin();
    while (begin != v.end()) {
        auto next = std::is_sorted_until(begin, v.end());
        max = std::max(std::distance(begin, next), max);
        begin = next;
    }
    return max;
};
Here's a demo.
Note that you were already on the right track in terms of picking a reasonable algorithm. This could be solved with adjacent_find as well, with just a little more work.
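For completeness, here is one way that adjacent_find variant might look (a sketch of my own, not from the linked demo, assuming <algorithm> and <iterator> are included): adjacent_find locates the first pair that breaks the strictly increasing run, and the run length is the distance from the start of the run to that break, inclusive.
auto max_adjacent_length = [](auto const & v)
{
    long max = 0;
    auto begin = v.begin();
    while (begin != v.end()) {
        // first element of the first pair that is not strictly increasing
        auto brk = std::adjacent_find(begin, v.end(),
                                      [](auto const & a, auto const & b) { return !(a < b); });
        auto next = (brk == v.end()) ? v.end() : std::next(brk);
        max = std::max<long>(std::distance(begin, next), max);
        begin = next;
    }
    return max;
};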

Manipulating array's values in a certain way

So I was asked to write a function that changes an array's values in such a way that:
All of the values that are the smallest aren't changed.
If, let's assume, the smallest number is 2 and there are no 3's and 4's, then all 5's are changed to 3's, etc.
For example, for the array [2, 5, 7, 5] we would get [2, 3, 4, 3]. This generalizes to: the minimal value of the array remains unchanged, and every other distinct value is changed depending on which smallest value it is. In our example, 5 is the first smallest value besides 2, so it becomes 2 (the minimum) + 1 = 3; 7 is the second smallest after 2, so it becomes 2 + 2 = 4.
I've come up with something like this:
int fillGaps(int arr[], size_t sz) {
    int min = *min_element(arr, arr + sz);
    int w = 1;
    for (int i = 0; i < sz; i++) {
        if (arr[i] == min) { continue; }
        else {
            int mini = *min_element(arr + i, arr + sz);
            for (int j = 0; j < sz; j++) {
                if (arr[j] == mini) { arr[j] = min + w; }
            }
            w++;
        }
    }
    return arr[sz - 1];
}
However, it works fine only for the 0th and 1st values; it doesn't affect any further items. Could anyone please help me with that?
I don't quite follow the logic of your function, so can't quite comment on that.
Here's how I interpret what needs to be done. Note that my example implementation is written to be as understandable as possible. There might be ways to make it faster.
Note that I'm also using an std::vector, to make things more readable and C++-like. You really shouldn't be passing raw pointers and sizes; that's super error prone. At the very least, bundle them in a struct.
#include <algorithm>
#include <set>
#include <unordered_map>
#include <vector>

int fillGaps(std::vector<int> & data) {
    // Make sure we don't have to worry about edge cases in the code below.
    if (data.empty()) { return 0; }
    /* The minimum number of times we need to loop over the data is two.
     * First to check which values are in there, which lets us decide
     * what each original value should be replaced with. Second to do the
     * actual replacing.
     *
     * So let's trade some memory for speed and start by creating a lookup table.
     * Each entry will map an existing value to its new value. Let's use the
     * "define lambda and immediately invoke it" idiom to make the scope of variables
     * used to calculate all this as small as possible.
     */
    auto const valueMapping = [&data] {
        // Use an std::set so we get all unique values in sorted order.
        std::set<int> values;
        for (int e : data) { values.insert(e); }
        std::unordered_map<int, int> result;
        result.reserve(values.size());
        // Map the minimum value to itself, and increase the replacement value by
        // one for each subsequent value present in the data vector.
        int replacement = *values.begin();
        for (auto e : values) { result.emplace(e, replacement++); }
        return result;
    }();
    // Now the actual algorithm is trivial: loop over the data and replace each
    // element with its replacement value.
    for (auto & e : data) { e = valueMapping.at(e); }
    return data.back();
}
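A small usage sketch (the test values are taken from the example in the question):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> data = {2, 5, 7, 5};
    fillGaps(data);
    for (int e : data) { std::cout << e << ' '; }   // prints: 2 3 4 3
    std::cout << '\n';
}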

How to cut off parts of a string, which every string in a collection has

My current problem is the following:
I have a std::vector of full path names to files.
Now I want to cut off the common prefix of all strings.
Example
If I have these 3 strings in the vector:
/home/user/foo.txt
/home/user/bar.txt
/home/baz.txt
I would like to cut off /home/ from every string in the vector.
Question
Is there any method to achieve this in general?
I want an algorithm that drops the common prefix of all strings.
I currently only have an idea which solves this problem in O(n*m), with n strings and m being the longest string length, by just comparing every string with every other string character by character.
Is there a faster or more elegant way of solving this?
This can be done entirely with std:: algorithms.
Synopsis:
Sort the input range if not already sorted. The first and last paths in the sorted range will be the most dissimilar. Best case is O(N), worst case O(N + N.logN).
Use std::mismatch to determine the largest common sequence between the two most dissimilar paths [insignificant].
Run through each path erasing the first COUNT characters, where COUNT is the number of characters in the longest common sequence. O(N).
Best case time complexity: O(2N), worst case O(2N + N.logN) (can someone check that?)
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>

std::string common_substring(const std::string& l, const std::string& r)
{
    return std::string(l.begin(),
                       std::mismatch(l.begin(), l.end(),
                                     r.begin(), r.end()).first);
}

std::string mutating_common_substring(std::vector<std::string>& range)
{
    if (range.empty())
        return std::string();
    else
    {
        if (not std::is_sorted(range.begin(), range.end()))
            std::sort(range.begin(), range.end());
        return common_substring(range.front(), range.back());
    }
}

std::vector<std::string> chop(std::vector<std::string> samples)
{
    auto str = mutating_common_substring(samples);
    for (auto& s : samples)
    {
        s.erase(s.begin(), std::next(s.begin(), str.size()));
    }
    return samples;
}

int main()
{
    std::vector<std::string> samples = {
        "/home/user/foo.txt",
        "/home/user/bar.txt",
        "/home/baz.txt"
    };
    samples = chop(std::move(samples));
    for (auto& s : samples)
    {
        std::cout << s << std::endl;
    }
}
expected:
baz.txt
user/bar.txt
user/foo.txt
Here's an alternate common_substring which does not require a sort. Its time complexity is in theory O(N), but whether it's faster in practice you'd have to check:
#include <numeric>   // for std::accumulate

std::string common_substring(const std::vector<std::string>& range)
{
    if (range.empty())
    {
        return {};
    }
    return std::accumulate(std::next(range.begin(), 1), range.end(), range.front(),
                           [](auto const& best, const auto& sample)
                           {
                               return common_substring(best, sample);
                           });
}
Update:
Elegance aside, this is probably the fastest way since it avoids any memory allocations, performing all transformations in-place. For most architectures and sample sizes, this will matter more than any other performance consideration.
#include <algorithm>
#include <iostream>
#include <vector>
#include <string>

void reduce_to_common(std::string& best, const std::string& sample)
{
    best.erase(std::mismatch(best.begin(), best.end(),
                             sample.begin(), sample.end()).first,
               best.end());
}

void remove_common_prefix(std::vector<std::string>& range)
{
    if (range.size())
    {
        auto iter = range.begin();
        auto best = *iter;
        for ( ; ++iter != range.end() ; )
        {
            reduce_to_common(best, *iter);
        }
        auto prefix_length = best.size();
        for (auto& s : range)
        {
            s.erase(s.begin(), std::next(s.begin(), prefix_length));
        }
    }
}

int main()
{
    std::vector<std::string> samples = {
        "/home/user/foo.txt",
        "/home/user/bar.txt",
        "/home/baz.txt"
    };
    remove_common_prefix(samples);
    for (auto& s : samples)
    {
        std::cout << s << std::endl;
    }
}
You have to search every string in the list. However you don't need to compare all the characters in every string. The common prefix can only get shorter, so you only need to compare with "the common prefix so far". I don't think this changes the big-O complexity - but it will make quite a difference to the actual speed.
Also, these look like file names. Are they sorted (bearing in mind that many filesystems tend to return things in sorted order)? If so, you only need to consider the first and last elements. If they are probably or mostly ordered, then consider the common prefix of the first and last, and then iterate through all the other strings, shortening the prefix further as necessary.
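A minimal sketch of that first-vs-last shortcut (my own helper, assuming the vector is already sorted, and using the four-iterator std::mismatch from C++14):
#include <algorithm>
#include <string>
#include <vector>

std::size_t sorted_prefix_length(const std::vector<std::string>& sorted) {
    if (sorted.empty()) return 0;
    const std::string& a = sorted.front();
    const std::string& b = sorted.back();
    // the common prefix of the whole sorted range equals that of its extremes
    auto m = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
    return static_cast<std::size_t>(m.first - a.begin());
}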
You just have to iterate over every string. You can only avoid needlessly iterating over the full length of the strings by exploiting the fact that the prefix can only get shorter:
#include <iostream>
#include <string>
#include <vector>

std::string common_prefix(const std::vector<std::string> &ss) {
    if (ss.empty())
        // no prefix
        return "";
    std::string prefix = ss[0];
    for (size_t i = 1; i < ss.size(); i++) {
        size_t c = 0; // index after which the strings differ
        for (; c < prefix.length(); c++) {
            if (prefix[c] != ss[i][c]) {
                // strings differ from character c on
                break;
            }
        }
        if (c == 0)
            // no common prefix
            return "";
        // the prefix is only up to character c-1, so resize prefix
        prefix.resize(c);
    }
    return prefix;
}

void strip_common_prefix(std::vector<std::string> &ss) {
    std::string prefix = common_prefix(ss);
    if (prefix.empty())
        // no common prefix, nothing to do
        return;
    // drop the common part, which is always the first prefix.length() characters
    for (std::string &s : ss) {
        s = s.substr(prefix.length());
    }
}

int main()
{
    std::vector<std::string> ss { "/home/user/foo.txt", "/home/user/bar.txt", "/home/baz.txt" };
    strip_common_prefix(ss);
    for (std::string &s : ss)
        std::cout << s << "\n";
}
Drawing from the hints of Martin Bonner's answer, you may implement a more efficient algorithm if you have more prior knowledge on your input.
In particular, if you know your input is sorted, it suffices to compare the first and last strings (see Richard's answer).
i - Find the file which has the least folder depth (i.e. baz.txt) - its root path is home
ii - Then go through the other strings to see if they start with that root.
iii - If so then remove root from all the strings.
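A literal sketch of those three steps (the helper name and details are mine; note this only strips anything when every path actually starts with the shallowest file's directory, so it is a heuristic rather than a general longest-common-prefix search):
#include <algorithm>
#include <string>
#include <vector>

void strip_shallowest_root(std::vector<std::string>& paths) {
    if (paths.empty()) return;
    // i - find the path with the fewest directory separators
    auto shallowest = std::min_element(paths.begin(), paths.end(),
        [](const std::string& a, const std::string& b) {
            return std::count(a.begin(), a.end(), '/') < std::count(b.begin(), b.end(), '/');
        });
    std::string root = shallowest->substr(0, shallowest->find_last_of('/') + 1);
    // ii - check whether every path starts with that root
    bool all_match = std::all_of(paths.begin(), paths.end(),
        [&root](const std::string& p) { return p.compare(0, root.size(), root) == 0; });
    // iii - if so, remove the root from all the strings
    if (all_match)
        for (auto& p : paths) p.erase(0, root.size());
}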
Start with std::size_t index=0;. Scan the list to see if the characters at that index all match (note: past the end does not match). If they do, advance index and repeat.
When done, index will have the value of the length of the prefix.
At this point, I'd advise you to write or find a string_view type. If you do, simply create a string_view for each of your strings str with start/end of index, str.size().
Overall cost: O(|prefix|*N+N), which is also the cost to confirm that your answer is correct.
If you don't want to write a string_view, simply call str.erase(str.begin(), str.begin()+index) on each str in your vector.
Overall cost is O(|total string length|+N). The prefix has to be visited in order to confirm it, then the tail of the string has to be rewritten.
Now the cost of this breadth-first scan is locality, as you are touching memory all over the place. It will probably be more efficient in practice to do it in chunks, where you scan the first K strings up to length Q and find the common prefix, then chain that common prefix plus the next block. This won't change the O-notation, but will improve locality of memory reference.
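A compact sketch of that column-by-column scan (my own code, using the erase variant rather than a string_view):
#include <string>
#include <vector>

void erase_common_prefix(std::vector<std::string>& v) {
    if (v.empty()) return;

    std::size_t index = 0;
    while (true) {
        // check whether every string has the same character at position index
        bool all_match = true;
        for (const auto& s : v) {
            if (index >= s.size() || s[index] != v.front()[index]) {
                all_match = false;
                break;
            }
        }
        if (!all_match) break;   // index is now the length of the common prefix
        ++index;
    }

    for (auto& s : v) {
        s.erase(s.begin(), s.begin() + index);   // drop the common prefix
    }
}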
for (vector<string>::iterator itr = V.begin(); itr != V.end(); ++itr)
    itr->erase(0, 6);

How to think in Python after working in C++?

I'm brand new to Python and trying to learn it by replicating the following C++ function in Python:
// determines which words in a vector consist of the same letters
// outputs the words with the same letters on the same line
void equivalentWords(vector<string> words, ofstream & outFile) {
    outFile << "Equivalent words\n";
    // checkedWord is parallel to the words vector. It is
    // used to make sure each word is only displayed once.
    vector<bool> checkedWord(words.size(), false);
    for (int i = 0; i < words.size(); i++) {
        if (!checkedWord[i]) {
            outFile << " ";
            for (int j = i; j < words.size(); j++) {
                if (equivalentWords(words[i], words[j], outFile)) {
                    outFile << words[j] << " ";
                    checkedWord[j] = true;
                }
            }
            outFile << "\n";
        }
    }
}
In my Python code (below), rather than having a second vector, I have a list ("words") of lists, each containing a string, a sorted list of the chars in that string (because strings are immutable), and a bool (that tells whether the word has been checked yet). However, I can't figure out how to change a value as I iterate through the list.
for word, s_word, checked in words:
    if not checked:
        for word1, s_word1, checked1 in words:
            if s_word1 == s_word:
                checked1 = True # this doesn't work
                print word1,
        print ""
Any help on doing this or thinking more "Pythony" is appreciated.
Keeping things simple, this is O(N) complexity and should be sufficient if you don't have GBs of word data. Note that set() and dict() are basically hashed indexes (free and built in!).
index = {}
for word, s_word in words:
    index[s_word] = index.get(s_word, []) + [word]
for similar_words in index.values():
    print ' '.join(similar_words)
Don't know what you are using it for, but it might be of interest to you that in Python 2.7 a Counter class was introduced in the collections module.
If you really want to keep your algorithm and update a boolean list (which you don't because that algorithm would do inefficient double loops), you would do it like this:
checked = [False] * len(words)
for i, (word, word_s) in enumerate(words):
    if checked[i]:
        continue
    for j, (other, other_s) in enumerate(words[i:]):
        if word_s == other_s:
            print other,
            checked[i+j] = True
    print
I think the word you're looking for is Pythonic. Here's a Pythonic code sample for what you're trying to do: determine words that are equivalent, where equivalence is determined by having the same set of letters.
from collections import defaultdict

def print_equivalent_words(words):
    eq_words = defaultdict(list)
    for word in words:
        eq_words["".join(sorted(set(word)))].append(word)
    for k, v in eq_words.items():
        print(v)
I generally like catchmeifyoutry's answer, but I would personally tighten it up a bit further as
for word in set(words):
    print word
Edit: My answer is a shorter but functionally equivalent form of catchmeifyoutry's original, pre-edited answer.
This is not the best algorithm to solve this problem (it's O(N^2) instead of O(N)), but here's a pythonic version of it. The method I've used is to replace your array of bits with a set that contains words you've already seen.
checked = set()
for i, word in enumerate(words):
    if word in checked:
        continue
    to_output = [word]
    for word2 in words[i + 1:]:
        if equivalentWords(word, word2):
            to_output.append(word2)
            checked.add(word2)
    print ' '.join(to_output)
Make words a list of objects:
class Word(object):
    def __init__(self, word, s_word, checked=False):
        self.word = word
        self.s_word = s_word
        self.checked = checked
....
for word1 in words:
    if word1.s_word == word.s_word:
        word1.checked = True
        print word1.word
print
Based on the comment:
// determines which words in a vector consist of the same letters
// outputs the words with the same letters on the same line
I'm not quite sure that the original code works, and even if it does, I can't say I like it much. First of all, based on the loop nesting, it looks like the complexity is O(N^2). Second, I can't figure out what it's doing well enough to be sure it really does what's stated above (it uses a three-parameter overload of equivalentWords, which seems to be missing, so it's hard to say).
Some of the Python solutions are a lot shorter and simpler -- to the point that I feel reasonably certain they simply don't work. A couple seem to simply print out unique words, which (at least as I interpret it) is not the intent at all.
Here's a version in C++ that does what I interpret the requirements to be:
#include <string>
#include <set>
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <map>

std::string
sort_word(std::string word) {
    std::sort(word.begin(), word.end());
    return word;
}

namespace std {
    std::ostream &
    operator<<(std::ostream &os,
               std::pair<std::string, std::set<std::string> > const &words)
    {
        std::copy(words.second.begin(), words.second.end(),
                  std::ostream_iterator<std::string>(os, "\t"));
        return os;
    }
}

void
equivalentWords(std::vector<std::string> const &words, std::ostream &os) {
    typedef std::map<std::string, std::set<std::string> > word_list_t;
    word_list_t word_list;
    for (int i = 0; i < words.size(); i++)
        word_list[sort_word(words[i])].insert(words[i]);
    std::copy(word_list.begin(), word_list.end(),
              std::ostream_iterator<word_list_t::value_type>(os, "\n"));
}

int
main() {
    std::vector<std::string> input;
    std::copy(std::istream_iterator<std::string>(std::cin),
              std::istream_iterator<std::string>(),
              std::back_inserter(input));
    equivalentWords(input, std::cout);
    return 0;
}
I think using that as a starting point for a Python version is more likely to produce a clean, working result.
I wouldn't say this is pythonic, but I'm quite proud of it.
import itertools
for _, to_output in itertools.groupby(sorted(words, key=sorted), sorted):
    print ' '.join(to_output)