Remove anagrams from a string vector


I am trying to write a program that removes all anagrams from a vector. I sorted the letters of each word alphabetically and stored both versions in a struct, hoping that would make the anagrams easier to delete, but it didn't actually help.
INPUT:{"have","ahve","frame","avhe","farme","code","abc","ehav"}
OUTPUT:{"code","abc"}
Note: all words are lowercase.
#include <iostream>
#include<vector>
#include<algorithm>
using namespace std;
struct Fraza{
string word;
string word_alph;
Fraza(){};
Fraza(string words,string word_alph){
this->word=move(word);
this->word_alph=move(word_alph);
}
void printverify() const{
cout<<word<<" is sorted "<<word_alph<<endl;
}
void print() const{
cout<<word<<endl;
}
};
int main() {
Fraza fraza [100];
vector<Fraza>text;
string tmp;
string tmpa;
// i am taking the input, store the words temporary in tmpa and sort tmpa, then i put them in struct
while(cin>>tmp){
tmpa=tmp;
sort(tmpa.begin(),tmpa.end());
text.emplace_back(Fraza(tmp,tmpa));
}
// trying to delete the words
for(int i=0;i<text.size();i++){
for(int j=i+1;j<text.size();j++) {
if (fraza[i].word_alph==fraza[j].word_alph) {
text.erase(text.begin()+i);
text.erase(text.begin()+j);
}
}
}
for(const auto& it:text)
it.printverify();
for(const auto& it:text)
it.print();
return 0;
}
So far I know that comparing the sorted words should work: if two sorted words are the same, I should delete the words at those positions.

There are several bugs:
The Fraza constructor has a parameter called "words" that is never used, so this->word = std::move(word) moves the member into itself. (Get familiar with the member initializer list.)
You have both an array fraza and a vector text. All the words are in text, but you compare the (empty) strings in fraza.
After you erase the i-th string, every later element shifts down by one, so the string that used to be at index j is now at index j-1.
Fixing these bugs makes your program almost work, except that it leaves behind one copy of any anagram group that occurs an odd number of times: since you always remove words in pairs, one is left over.
I would approach this with a table.
First, encapsulate the sorting and give Fraza an ordering that makes anagrams equivalent:
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
class Fraza
{
public:
Fraza(const std::string& w)
: word(w), sorted(w)
{
std::sort(sorted.begin(), sorted.end());
}
bool operator< (const Fraza& other) const
{
return sorted < other.sorted;
}
const std::string& str() const { return word; }
private:
std::string word;
std::string sorted;
};
Then use std::map as a counting table:
int main()
{
// Count the words.
std::map<Fraza, int> table;
std::string word;
while (std::cin >> word)
{
table[word] += 1;
}
// Print the unique ones.
for (const auto& it: table)
{
if (it.second == 1)
{
std::cout << it.first.str() << std::endl;
}
}
}
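Note that std::map iterates in key order, so for the sample input the table prints "abc" before "code". If the original order of the words matters, the same counting idea can be applied directly to the vector. The following is only a minimal sketch: the helper names anagram_key and remove_anagrams are made up for illustration, and identical duplicate words are treated as anagrams of each other, just as in the table above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns the letters of `word` in sorted order,
// so that all anagrams of a word share the same key.
std::string anagram_key(std::string word)
{
    std::sort(word.begin(), word.end());
    return word;
}

// Hypothetical helper: erases every word that has at least one anagram
// elsewhere in `words`; the survivors keep their original order.
void remove_anagrams(std::vector<std::string>& words)
{
    // First pass: count how many words share each sorted-letter key.
    std::unordered_map<std::string, int> count;
    for (const auto& w : words)
        ++count[anagram_key(w)];

    // Second pass: erase-remove idiom, dropping every word whose key
    // was seen more than once.
    words.erase(std::remove_if(words.begin(), words.end(),
                               [&count](const std::string& w) {
                                   return count[anagram_key(w)] > 1;
                               }),
                words.end());
}
```

Because the erasing happens in a single std::remove_if pass, there is no index bookkeeping of the kind that caused the j / j-1 problem in the original loops.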

Related

remove element by position in a vector<string> in c++

I have been trying to remove the value False and 0;0 from a vector<string> plan; containing the following
1003;2021-03-09;False;0;0;1678721F
1005;2021-03-05;False;0;0;1592221D
1005;2021-03-06;False;0;0;1592221D
1003;2021-03-07;False;0;0;1592221D
1003;2021-03-08;False;0;0;1592221D
1004;2021-03-09;False;0;0;1592221D
1004;2021-03-10;False;0;0;1592221D
1001;2021-03-11;False;0;0;1592221D
but the solutions I have found only work with int, and I tried the following
remove(plan.begin(), plan.end(), "False");
also with erase, but it didn't work
what is the mistake that I am making, or how should I do to eliminate the values that I want, which are in the position [2] [3] and [4], thanks for any help.

[Note: With the assumption 1003;2021-03-09;False;0;0;1678721F corresponding to a row inside std::vector<string>]
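std::remove does not erase anything from the vector: it moves the elements that are not equal to the given value to the front and returns an iterator to the new logical end. To actually shrink the vector, pass that iterator to std::vector::erase (the erase-remove idiom).
If std::vector<string> plan contains an element equal to False, that element is removed:
std::vector < std::string > plan =
{
"1003","2021-03-09","False","0;0","1678721F"
};
plan.erase(std::remove(plan.begin(), plan.end(), "False"), plan.end());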
In your case you need to remove given sub-string from each row of the plan. You need to iterate through all the rows to remove given value using std::string::erase.
std::vector < std::string > plan =
{
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"};
for (auto & e:plan)
{
//As position of False;0;0; is at a fixed index, i.e: from index:16, 10 characters are removed
e.erase (16, 10);
}
To generalize, you can use std::string::find to locate a sub-string and then erase it.
void removeSubstrs(string& s, string p) {
string::size_type n = p.length();
for (string::size_type i = s.find(p);
i != string::npos;
i = s.find(p))
s.erase(i, n);
}
int
main ()
{
std::vector < std::string > plan =
{
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"};
for (auto & e:plan)
{
removeSubstrs (e, ";False;0;0");
}
for (auto e:plan)
std::cout << e << std::endl;
return 0;
}

[Note: This answer assumes that each line corresponds to an element in the vector]
With the statement
remove(plan.begin(), plan.end(), "False");
you try to remove all elements from the vector that are equal to "False".
You need to iterate over the vector and erase the sub-string from each and every string in the vector.
For example you can use a range for loop to iterate over all the strings (or rather references to them), and then use the std::string functions find to find the sub-strings you want to remove and replace to replace the sub-strings with empty strings (i.e. nothing).
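The find-and-replace loop described above could look roughly like this. It is a sketch rather than a definitive implementation; the function names erase_all and erase_from_each are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: replaces every occurrence of `sub` in `s`
// with an empty string.
void erase_all(std::string& s, const std::string& sub)
{
    if (sub.empty())
        return; // an empty pattern would match at every position forever

    for (std::string::size_type pos = s.find(sub);
         pos != std::string::npos;
         pos = s.find(sub, pos)) // continue searching where we just erased
    {
        s.replace(pos, sub.length(), "");
    }
}

// Applies erase_all to every string in the vector, using a range-for
// loop over references so the elements are modified in place.
void erase_from_each(std::vector<std::string>& plan, const std::string& sub)
{
    for (auto& line : plan)
        erase_all(line, sub);
}
```

Calling erase_from_each(plan, ";False;0;0") removes the three unwanted fields together with their separators, so the remaining fields stay delimited by single semicolons.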

If you are sure that there is only one occurrence of "False" and "0;0" in each string of your vector, you can use something like this:
std::string EraseFirstSubString(
const std::string & main_str,
const std::string & sub_str)
{
std::string new_main_str = main_str;
size_t pos = new_main_str.find(sub_str);
if (pos != std::string::npos)
{
new_main_str.erase(pos, sub_str.length());
}
return new_main_str;
}
int main()
{
std::vector<std::string> plan = {
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"
};
for (std::string & str : plan)
{
str = EraseFirstSubString(str, "False");
str = EraseFirstSubString(str, "0;0");
}
};
But if those sub-strings may occur several times, you should extend the removal mechanism to loop until no occurrence is left, like this:
std::string EraseSubStrings(
const std::string & main_str,
const std::string & sub_str)
{
std::string new_main_str = main_str;
size_t pos = new_main_str.find(sub_str);
while (pos != std::string::npos)
{
new_main_str.erase(pos, sub_str.length());
pos = new_main_str.find(sub_str);
}
return new_main_str;
}

If you already have a vector of individual std::string objects, you can easily use the operations that the strings library offers.
#include <algorithm>
#include <vector>
#include <string>
// before C++20 change constexpr to inline
constexpr void change(std::vector<std::string>& sv, std::string const& rem) {
std::for_each(std::begin(sv), std::end(sv), [&rem](std::string& s) {
// erase the first occurrence of rem; if there is none, find returns npos and nothing is erased
s.erase(std::min(s.size(), s.find(rem)), rem.size());
});
}
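Since the question asks about the values at positions [2], [3] and [4], another option is to treat each line as a list of ';'-separated fields and drop fields by index instead of searching for their text. The sketch below assumes exactly that layout; drop_fields is a hypothetical helper, not a standard function, and it does not preserve a trailing empty field.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: splits `line` on `sep`, skips the fields whose
// zero-based index is contained in `drop`, and joins the rest with `sep`.
std::string drop_fields(const std::string& line,
                        const std::set<std::size_t>& drop,
                        char sep = ';')
{
    std::istringstream in(line);
    std::string field;
    std::string out;
    bool first = true;

    for (std::size_t index = 0; std::getline(in, field, sep); ++index)
    {
        if (drop.count(index) != 0)
            continue; // this position should be removed

        if (!first)
            out += sep; // separator between kept fields only
        out += field;
        first = false;
    }
    return out;
}
```

For a whole vector, something like `for (auto& line : plan) line = drop_fields(line, {2, 3, 4});` applies it to every row.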

Creating a custom comparator in C++

Background:
I was asked this question today in an online practice interview, and I had a hard time figuring out a custom comparator for the sort. Here is the question.
Question:
Implement a document scanning function wordCountEngine, which receives a string document and returns a list of all unique words in it and their number of occurrences, sorted by the number of occurrences in descending order. If two or more words have the same count, they should be sorted according to their order in the original sentence. Assume that all letters are in the English alphabet. Your function should be case-insensitive, so for instance, the words “Perfect” and “perfect” should be considered the same word.
The engine should strip out punctuation (even in the middle of a word) and use whitespaces to separate words.
Analyze the time and space complexities of your solution. Try to optimize for time while keeping a polynomial space complexity.
Examples:
input: document = "Practice makes perfect. you'll only
get Perfect by practice. just practice!"
output: [ ["practice", "3"], ["perfect", "2"],
["makes", "1"], ["youll", "1"], ["only", "1"],
["get", "1"], ["by", "1"], ["just", "1"] ]
My idea:
The first thing I wanted to do was get the string without punctuation and in all lower case into a vector of strings. Then I used an unordered_map container to store each string and a count of its occurrences. Where I got stuck was creating a custom comparator to make sure that strings with the same count are sorted by their order of appearance in the given string.
Code:
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <sstream>
#include <iterator>
#include <numeric>
#include <algorithm>
using namespace std;
struct cmp
{
bool operator()(std::string& word1, std::string& word2)
{
}
};
vector<vector<string>> wordCountEngine( const string& document )
{
// your code goes here
// Step 1
auto doc = document;
std::string str;
remove_copy_if(doc.begin(), doc.end(), std::back_inserter(str),
std::ptr_fun<int, int>(&std::ispunct));
for(int i = 0; i < str.size(); ++i)
str[i] = tolower(str[i]);
std::stringstream ss(str);
istream_iterator<std::string> begin(ss);
istream_iterator<std::string> end;
std::vector<std::string> vec(begin, end);
// Step 2
std::unordered_map<std::string, int> m;
for(auto word : vec)
m[word]++;
// Step 3
std::vector<std::vector<std::string>> result;
for(auto it : m)
{
result.push_back({it.first, std::to_string(it.second)});
}
return result;
}
int main() {
std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
auto result = wordCountEngine(document);
for(int i = 0; i < result.size(); ++i)
{
for(int j = 0; j < result[0].size(); ++j)
{
std::cout << result[i][j] << " ";
}
std::cout << "\n";
}
return 0;
}
If anyone can help me with learning how to build a custom comparator for this code I would really appreciate it.

You could use a std::vector<std::pair<std::string, int>>, with each pair representing one word and the number of occurrences of that word in the sequence. Using a vector will help to maintain the order of the original sequence when two or more words have the same count. Finally sort by occurrences.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
std::vector<std::vector<std::string>> wordCountEngine(const std::string& document)
{
std::vector<std::pair<std::string, int>> words;
std::istringstream ss(document);
std::string word;
//Loop through words in sequence
while (getline(ss, word, ' '))
{
//Convert to lowercase
std::transform(word.begin(), word.end(), word.begin(), tolower);
//Remove punctuation characters
auto it = std::remove_if(word.begin(), word.end(), [](char c) { return !isalpha(c); });
word.erase(it, word.end());
//Find this word in the result vector
auto pos = std::find_if(words.begin(), words.end(),
[&word](const std::pair<std::string, int>& p) { return p.first == word; });
if (pos == words.end()) {
words.push_back({ word, 1 }); //Doesn't occur -> add it
}
else {
pos->second++; //Increment count
}
}
//Sort vector by word occurrences; stable_sort keeps words with equal counts in their original order
std::stable_sort(words.begin(), words.end(),
[](const std::pair<std::string, int>& p1, const std::pair<std::string, int>& p2) { return p1.second > p2.second; });
//Convert to vector<vector<string>>
std::vector<std::vector<std::string>> result;
result.reserve(words.size());
for (auto& p : words)
{
std::vector<std::string> v = { p.first, std::to_string(p.second) };
result.push_back(v);
}
return result;
}
int main()
{
std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
auto result = wordCountEngine(document);
for (auto& word : result)
{
std::cout << word[0] << ", " << word[1] << std::endl;
}
return 0;
}
Output:
practice, 3
perfect, 2
makes, 1
youll, 1
only, 1
get, 1
by, 1
just, 1

In step2, try this:
std::vector<std::pair<std::pair<std::string, int>, int>> m;
Here, the inner pair stores the string and the index of its first occurrence, and the outer pair adds the count of its occurrences. Write the logic to sort by count first and then, if the counts are the same, by the position of the first occurrence.
bool sort_vector(const std::pair<const std::pair<std::string,int>,int> &a, const std::pair<const std::pair<std::string,int>,int> &b)
{
if(a.second==b.second)
{
// If the number of occurrences is the same, sort by the position of the string.
return a.first.second<b.first.second;
}
// Otherwise return the string with the higher number of occurrences first.
return a.second>b.second;
}
You have to write a logic to count the number of occurrences and the index of occurrence of each word in the string.
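One way to write that counting logic is sketched below. It replaces the nested pairs with a small struct for readability; WordInfo, first_seen and count_words are names invented for this example, and the sketch assumes words are separated by whitespace and that every non-letter character is stripped.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record: one unique word, its count, and the order in
// which it first appeared among the unique words.
struct WordInfo
{
    std::string word;
    int count;
    std::size_t first_seen;
};

std::vector<WordInfo> count_words(const std::string& document)
{
    std::unordered_map<std::string, std::size_t> slot; // word -> index in result
    std::vector<WordInfo> result;

    std::istringstream in(document);
    std::string token;
    while (in >> token)
    {
        // Keep letters only, lower-cased.
        std::string word;
        for (unsigned char c : token)
            if (std::isalpha(c))
                word += static_cast<char>(std::tolower(c));
        if (word.empty())
            continue;

        auto it = slot.find(word);
        if (it == slot.end())
        {
            slot.emplace(word, result.size());
            result.push_back({word, 1, result.size()});
        }
        else
        {
            ++result[it->second].count;
        }
    }

    // Custom comparator: higher count first, earlier appearance breaks ties.
    std::sort(result.begin(), result.end(),
              [](const WordInfo& a, const WordInfo& b) {
                  if (a.count != b.count)
                      return a.count > b.count;
                  return a.first_seen < b.first_seen;
              });
    return result;
}
```

Because the tie-break makes the ordering total, plain std::sort is enough here; no stable sort is needed.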

Split a string inside a class constructor

I have a class with 2 data members: size and an array of ints (dynamically allocated). The purpose of the class is to create an array of a size and fill it with values. The task is to create a constructor that takes a string as its parameter, but the string looks like this: "12|13|14|15" etc. I have searched this but all the solutions are a little too complicated, as they involve vectors and we haven't started with vectors yet. I basically want to put these numbers into the array of ints, 1 by 1 and also find out the size of the array. How can I do that? I tried messing with getline and stringstream but that gave me a load of errors. My code looks like this.
#include <string>
#include <iostream>
#include <sstream>
using namespace std;
class IntArrays {
private:
static int objcount;
int size;
public:
int *arrayints;
const static int objcountf();
IntArrays(int);
IntArrays(const IntArrays &p){
size = p.size;
for (int i = 0;i <size;i++){
arrayints[i] = p.arrayints[i];
}
}
IntArrays(std::string f){
// ignore the other constructors, this is the constructor that is giving me trouble
int counter =0;
istringstream inStream(f);
string newstring;
while (getline(iss,newstring, '|')){
arrayints[counter] = stoi(newstring);
counter++;}
void enternums();
(Note that this is only the header file, and that the current string constructor I have there does not work.)

This code is my version. I prefer to use a vector rather than a raw array.
Class definition:
class IntArrays {
public:
IntArrays(const string&, const char&);
const vector<int>& data() { return _data; }
const int size() { return _data.size(); }
private:
vector<int> _data;
};
The following is the constructor implementation:
IntArrays::IntArrays(const string& str, const char& delimiter) {
string buff;
for(auto& n:str) {
if(n != delimiter) buff+=n; else
if(n == delimiter && buff != "") {
_data.push_back(stoi(buff));
buff = "";
}
}
if(buff != "") _data.push_back(stoi(buff));
}
And then we just use the class:
IntArrays a("1|4|9|6|69", '|');
vector<int> da = a.data();
IntArrays b("1,4,9,6,69", ',');
vector<int> db = b.data();

I will try to have a recursion for that =p
Sorry that I cannot provide a c++ version =p..
This is a java version i guess.
list parse(string data, list base) {
if (data.length > 0) {
string s = data.subStr(0,1);
if (s == "|") {
base.push(0); //set the initial value for a new value
} else {
int i = parseInt(s);
int result = base.pop()*10 + i; //recalculate the result
base.push(result);
}
return parse(data.subStr(1),base); //recursion
} else {
return base; //return the result
}
}

As Joachim pointed out, you do not initialize the pointer. Unfortunately, you do not have the size of the array before the allocation, so you are left with a few solutions:
process the input twice (really bad if you have a large number of entries); On the first pass, count the inputs. Then allocate the array, then read them again, into the allocated array.
read the inputs into a linked list; Each element in the list would hold a value, and the address of the next element.
Preallocate a block of memory and hope it is large enough to read the entire array. If it is not, reallocate a larger block and copy the already read values into it, then discard the initial block (this is what std::vector does).
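The first option (process the input twice) can be sketched without std::vector as follows. This is an illustration only: IntArray is a reduced, hypothetical stand-in for the class in the question, '|' is assumed to be the only separator, and std::stoi throws if a piece is not a number.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for the question's class, reduced to the
// string constructor and the minimum needed to read the result.
class IntArray
{
public:
    explicit IntArray(const std::string& text)
        : size_(0), data_(nullptr)
    {
        if (text.empty())
            return;

        // First pass: n separators mean at most n + 1 values.
        size_ = 1;
        for (char c : text)
            if (c == '|')
                ++size_;

        data_ = new int[size_];
        try
        {
            // Second pass: convert each piece and store it.
            std::istringstream in(text);
            std::string piece;
            std::size_t i = 0;
            while (i < size_ && std::getline(in, piece, '|'))
                data_[i++] = std::stoi(piece);
            size_ = i; // number of values actually read
        }
        catch (...)
        {
            delete[] data_; // the destructor does not run if the constructor throws
            throw;
        }
    }

    ~IntArray() { delete[] data_; }

    // Copying is disabled to keep the sketch short and safe.
    IntArray(const IntArray&) = delete;
    IntArray& operator=(const IntArray&) = delete;

    std::size_t size() const { return size_; }
    int operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};
```

With this, `IntArray a("12|13|14|15");` yields a.size() == 4 and a[0] == 12.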

C++ Finding Anagrams in words

I'm working on a program that checks whether or not a particular word is an anagram using std::count. However, I don't think my function logic is correct, and I cannot seem to figure it out.
Assume there are the following words in the file:
Evil
Vile
Veil
Live
My code is as follows:
#include <iostream>
#include <vector>
#include <fstream>
#include <map>
using namespace std;
struct Compare {
std::string str;
Compare(const std::string& str) : str(str) {}
};
bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
return c.str == p.second;
}
bool operator==(const Compare& c, const std::pair<int, std::string>&p) {
return c.str == p.second;
}
std::vector<std::string> readInput(ifstream& file)
{
std::vector<std::string> temp;
string word;
while (file >> word)
{
temp.push_back(word);
}
std::sort(temp.begin(), temp.end());
return temp;
}
int main(int argc, char *argv[]) {
string file = "testing.txt";
ifstream ss(file.c_str());
if(!ss.is_open())
{
cerr << "Cannot open the text file";
}
std::vector<std::string> words = readInput(ss);
std::map<int, std::string> wordsMap;
//std::map<std::string value, int key> values;
for(unsigned i=0; (i < words.size()); i++)
{
wordsMap[i] = words[i];
}
int count = std::count(wordsMap.begin(), wordsMap.end(), Compare("Evil"));
cout << count << endl;
}
I'm pretty sure it's just a case of my logic is wrong in the functions. I hope someone can help :)

The simplest approach would be to check as follows (pseudo code):
bool isAnagram(string s, string t) {return sort(s) == sort(t); }
So use something like the following; there is no need for std::map:
struct Compare {
std::string str;
Compare(const std::string& x) : str(x) {
// upper-case first, then sort, so the key matches the one built in operator()
std::transform(str.begin(), str.end(), str.begin(), ::toupper);
std::sort(str.begin(),str.end());}
bool operator ()(const std::string& t) const
{
std::string s= t;
std::transform(s.begin(), s.end(),s.begin(), ::toupper);
std::sort(s.begin(),s.end());
return s == str;
}
};
And then
int count = std::count_if(words.begin(), words.end(), Compare("Evil"));
See HERE
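For reference, the pseudo code at the top of this answer can be turned into compilable C++ along these lines. The names normalized, is_anagram and count_anagrams are chosen for the example, and the comparison is assumed to be case-insensitive, as in the functor above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lower-cases a copy of `s` and then sorts its
// letters, so all anagrams map to the same string.
std::string normalized(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::sort(s.begin(), s.end());
    return s;
}

bool is_anagram(const std::string& s, const std::string& t)
{
    // Different lengths can never be anagrams, so skip the sorting.
    return s.size() == t.size() && normalized(s) == normalized(t);
}

// Counts the words that are anagrams of `target` (including `target`
// itself if it is in the list).
std::ptrdiff_t count_anagrams(const std::vector<std::string>& words,
                              const std::string& target)
{
    const std::string key = normalized(target);
    return std::count_if(words.begin(), words.end(),
                         [&key](const std::string& w) { return normalized(w) == key; });
}
```

The case conversion happens before the sort on purpose: sorting a mixed-case string first and converting afterwards can leave the letters out of order.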

This is not the most efficient algorithm, but a quick change to your program that would work could be:
bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
std::string a = c.str;
std::transform(a.begin(), a.end(), a.begin(), ::tolower);
std::sort(a.begin(), a.end());
std::string b = p.second;
std::transform(b.begin(), b.end(), b.begin(), ::tolower);
std::sort(b.begin(), b.end());
return a == b;
}

EDIT: It seems in your present code, you are checking whether the strings are exactly equal to each other (not anagrams).
INSTEAD:
For each word, make an array of 26 elements, each element corresponding to a letter of the alphabet. Parse each word character by character, and increase the count of the particular character in the respective array.
For example for evil, the array would be:
0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0. // It has 1's for letters e,v,i and l
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
You make this array for each word that you have. In your case, all the words will have the same array. You then compare these arrays element-wise and proceed accordingly.
Now you just need to see which words have the same corresponding array.
If you want to compare all the N words pair-wise, you can do so using two nested loops in O(N^2) complexity.
The complexity for comparing one pair is O(1).
Complexity of creating the arrays = O(L) where L is the length of the string.
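A minimal sketch of the 26-element array idea, assuming only the letters a to z matter and that upper and lower case count as the same letter (LetterCounts, letter_counts and same_letters are illustrative names):

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

using LetterCounts = std::array<int, 26>;

// Builds the per-letter histogram for one word in O(L).
LetterCounts letter_counts(const std::string& word)
{
    LetterCounts counts{}; // value-initialized: all zeros
    for (unsigned char c : word)
    {
        const int lower = std::tolower(c);
        if (lower >= 'a' && lower <= 'z') // ignore anything that is not a letter
            ++counts[static_cast<std::size_t>(lower - 'a')];
    }
    return counts;
}

// Two words are anagrams when their histograms are equal; the
// comparison touches a fixed 26 entries, hence O(1).
bool same_letters(const std::string& a, const std::string& b)
{
    return letter_counts(a) == letter_counts(b);
}
```

std::array provides an element-wise operator==, so no hand-written comparison loop is needed.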

Consider the following:
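// Returns a copy of the word with its letters sorted (std::sort itself returns nothing).
string sorted_letters(string word)
{
sort(word.begin(), word.end());
return word;
}
map<string, set<string>> anagrams;
for (const auto& word : words)
anagrams[sorted_letters(word)].insert(word);
const set<string>& find_anagrams(const string& word)
{
return anagrams[sorted_letters(word)];
}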

When you have a lot of words that are relatively short (or if you can work with a big-number library), you can use a solution similar to what I wrote in my answer to "Generate same unique hash code for all anagrams".
Essentially, map each character to a unique prime number (they don't have to be big; the 26 letters fit into the primes up to 101), and for each word multiply the primes of its characters. Since multiplication is commutative, anagrams give the same result, so you can compare that result, hash it, or do whatever you want with it.
Keep in mind that for long words the values grow pretty fast, so you might need a big-number library.
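As a rough illustration of the prime-product idea, here is a sketch that uses a plain 64-bit unsigned integer instead of a big-number library; prime_product is a made-up name. With these primes, words of up to nine letters cannot overflow (101^9 is below 2^64). For longer words the product wraps around: anagrams still produce equal values, but different words may then collide.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: multiplies one prime per letter (a -> 2, b -> 3,
// ..., z -> 101). Non-letters are ignored; case is ignored.
std::uint64_t prime_product(const std::string& word)
{
    static const std::uint64_t primes[26] = {
        2,  3,  5,  7,  11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

    std::uint64_t product = 1;
    for (unsigned char c : word)
    {
        const int lower = std::tolower(c);
        if (lower >= 'a' && lower <= 'z')
            product *= primes[lower - 'a']; // wraps modulo 2^64 on overflow
    }
    return product;
}
```

Two words can then be compared with `prime_product(a) == prime_product(b)`, or the value can serve as a key in a hash table.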

Better solution of finding a pattern of a string?

I am trying to find an optimal way to find a pattern of a string and compare. For example, I have s1 = "red blue blue red red yellow", and s2 = "abbaac". This would match because they have the same pattern.
My thinking of doing this would be iterate through s1 and s2, use a vector container to record the corresponding place's count (for s1 would be corresponding word's count, and for s2 would be corresponding letter's count) and then compare.
This is really inefficient because I iterate through the whole of s1 and s2. If s1 = "red blue red red red yellow" and s2 = "abbaac", then after the third red there is essentially no point in iterating any further.
So, any better idea on how to do this?
Code:
#include "stdafx.h"
#include <iostream>
#include <string>
#include <array>
#include <sstream>
#include <vector>
#include <algorithm>
using namespace std;
vector<int> findPattern(string pattern){
vector<int> counts;
for (int i = 0; i < pattern.size(); ++i){
counts.push_back(0);
int counter = 0;
for (int j = i + 1; j < pattern.size(); ++j){
if (pattern[i] == pattern[j]){
++counter;
}
counts[i] = counter;
}
}
return counts;
}
vector<int> findPatternLong(string pattern){
istringstream iss (pattern);
string word;
vector<string> v;
while (iss >> word){
v.push_back(word);
}
vector<int> counts2;
for (int i = 0; i < v.size(); ++i){
counts2.push_back(0);
int counter = 0;
for (int j = i + 1; j < v.size(); ++j){
if (v[i] == v[j]){
++counter;
}
counts2[i] = counter;
}
}
return counts2;
}
int main(int argc, char * argv[]){
vector<int> v1 = findPattern("abbaac");
vector<int> v2 = findPatternLong("red blue blue red red yellow");
if (v1.size() == v2.size()){
for (int i = 0; i < v1.size(); ++i){
if (v1[i] != v2[i]){
cout << "Unmatch" << endl;
return false;
}
}
cout << "match" << endl;
return true;
} else
cout << "Unmatch" << endl;
return 0;
}

@Tony beat me with the same idea, but since I already typed this, here it goes :-)
First of all, don't worry so much about efficiency and focus on correctness: indeed, premature optimization is the root of all evil. Write test cases and make sure your code passes each one.
Second, I think I would start with a map/dictionary D and a loop that parses one element of each string per iteration (a word in s1, let's call it "w", and a character in s2, say "c"), chooses one element as the key (say the "c" characters) and checks whether "c" already has an entry in the dictionary:
If we ran out of elements at the same time, the strings match
If we ran out of elements on one side, we know there's no match
If "c" doesn't have an entry in D, store the current values: D[c] = w;
else if "c" already has an entry, check if the entry matches the value found on the string: is D[c] == w? If it doesn't we know there's no match
If that code works, then optimization could start. In your example, maybe we could use a simple array instead of a dictionary because ASCII characters are a small finite set.

It's not the most efficient code, but close to simplest:
std::map<char, std::string> letter_to_word;
std::set<std::string> words_seen;
std::istringstream iss(s1);
std::string word;
for (std::string::size_type i = 0; i < s2.size(); ++i)
{
if (!(iss >> word))
return false; // more letters than words
std::string& expected_word = letter_to_word[s2[i]];
if (expected_word == "")
{
// if different letters require different words...
if (words_seen.find(word) != words_seen.end())
return false; // multiple letters for same word
words_seen.insert(word);
expected_word = word; // first time we've seen letter, remember associated word
}
else if (expected_word != word)
return false; // different word for same letter
}
return !(iss >> word); // check no surplus words
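The fragment above is only a function body. Wrapped into a complete function, it might look like the sketch below; follows_pattern is a hypothetical name, s1 is assumed to hold whitespace-separated words, and every character of s2 is treated as one pattern symbol. It stops at the first mismatch, which is the early exit the question asks for.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Returns true when the words of s1 follow the letter pattern of s2,
// with a one-to-one mapping between letters and words.
bool follows_pattern(const std::string& s1, const std::string& s2)
{
    std::map<char, std::string> letter_to_word;
    std::set<std::string> words_seen;
    std::istringstream iss(s1);
    std::string word;

    for (char letter : s2)
    {
        if (!(iss >> word))
            return false; // more letters than words

        auto it = letter_to_word.find(letter);
        if (it == letter_to_word.end())
        {
            // First time we see this letter: the word must be new as well,
            // otherwise two letters would share one word.
            if (!words_seen.insert(word).second)
                return false;
            letter_to_word.emplace(letter, word);
        }
        else if (it->second != word)
        {
            return false; // different word for the same letter
        }
    }
    return !(iss >> word); // no surplus words allowed
}
```

Using find instead of operator[] avoids relying on the empty string as a marker for "letter not seen yet".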

You don't need two vectors.
When processing the second string, compare the count of the first pattern, to the first entry. If it matches, keep going otherwise stop. Repeat for the rest of the patterns in the second string.
You don't need to store the pattern counts of the second string.

EDIT
I just read that the question had the patterns in a string and this answer pertains to comparing collections of varying types. I suppose the answer still holds a little water if the 2 input strings were first converted :)
I would not say this is the most efficient solution, but I like how it is extensible.
Firstly, there is the PatternResult class. It stores the result of a pattern:
class PatternResult {
private:
std::vector<int> result_;
public:
PatternResult(const std::vector<int>& result) : result_(result) {
};
bool operator == (const PatternResult& rhs) const {
if(result_.size() != rhs.result_.size())
return false;
else {
for(std::vector<int>::size_type r(0);
r < result_.size();
++r) {
if(result_[r] != rhs.result_[r])
return false;
};
return true;
};
};
}; // eo class PatternResult
It takes a vector of integers in which each entry is the sequence number assigned to the element at that position. We overload == to compare two pattern results; equal results mean the inputs have the same sequence, irrespective of the source data.
Then we need a pattern counter that can assign the same sequence numbers, but take any type:
template<class T>
class PatternCounter {
private:
typedef std::vector<T> vec_type;
typedef std::map<T, int> map_type;
map_type found_;
int counter_;
public:
PatternCounter() : counter_(1) {
};
PatternResult count(const vec_type& input ){
std::vector<int> ret;
for(typename vec_type::const_iterator cit(input.begin());
cit != input.end();
++cit) {
if(found_.find(*cit) != found_.end()) {
ret.push_back(found_[*cit]);
} else {
found_[*cit] = counter_;
ret.push_back(counter_);
++counter_;
};
};
return PatternResult(ret);
};
};
And we're done. Test code:
std::vector<std::string> inp1;
inp1.push_back("red");
inp1.push_back("blue");
inp1.push_back("blue");
inp1.push_back("red");
inp1.push_back("yellow");
std::vector<char> inp2;
inp2.push_back('a');
inp2.push_back('b');
inp2.push_back('b');
inp2.push_back('a');
inp2.push_back('c');
PatternCounter<std::string> counter1;
PatternCounter<char> counter2;
PatternResult res1(counter1.count(inp1));
PatternResult res2(counter2.count(inp2));
if(res1 == res2) {
// pattern sequences are equal
};
Note this was quick and dirty, I am sure it could be made more efficient.

Basically, you want to check that the sequence follows the same order. You're not worried about what the sequence actually is: first second first first third is good enough. Now, you could do this with a container that maps a string to an int in some way. However, you would be storing copies of each string and you're ignoring the fact that you don't really care about string values. For tiny test cases, this wouldn't matter, but for a large sequence of long words, you're quickly chewing up memory when you don't need to.
So let's use the fact that we don't care about the string values or about storing them. If that's the case, we can use a hash function to transform our strings to simple size_t values with a fairly strong guarantee that they're going to be unique. However, the hashes are not sequential and we will need to retrieve the sequence based on the hash value. The simplest way to record their sequence is to map them to the size of the map for easy lookup. The last piece of the puzzle is to check that the hashes are in the same sequence.
I'm also assuming that you don't just want to compare a sentence with a word, but maybe 2 words or two sentences. Here's a quick C++11 sample that basically does the above and doesn't hold anything in memory unless it needs to.
Of course, this can still be optimized further, for example by executing things in parallel.
#include <iostream>
#include <vector>
#include <string>
#include <map>
#include <functional>
#include <sstream>
/*
s1 = "red blue blue red red yellow"
s2 = "abbaac"
This would match because they have the same pattern.
*/
typedef std::map<size_t,size_t> hash_map;
typedef std::vector<std::string> wordlist;
size_t ordered_symbol( hash_map &h, std::string const& word )
{
std::hash<std::string> hash_fn;
size_t hash = hash_fn(word);
if(h.find(hash)==h.end())
{
size_t const sequence = h.size();
h[hash] = sequence;
return sequence;
}
return h[hash];
}
wordlist create_wordlist( std::string const& str )
{
if(str.find_first_of(' ') != std::string::npos)
{
wordlist w1;
std::stringstream sstr(str);
std::string s;
while(sstr>>s)
w1.push_back(s);
return w1;
}
wordlist w2;
for(auto i : str)
{
std::string s;
s.append(1,i);
w2.push_back(s);
}
return w2;
}
bool pattern_matches( std::string const& s1, std::string const& s2 )
{
wordlist const w1 = create_wordlist(s1);
wordlist const w2 = create_wordlist(s2);
if(w1.size()!=w2.size())
return false;
hash_map h1,h2;
for( size_t i = 0; i!=w1.size(); ++i)
if(ordered_symbol(h1,w1[i])!=ordered_symbol(h2,w2[i]))
return false;
return true;
}
void test( std::string const& s1, std::string const& s2 )
{
std::cout<<"["<<s1<<"] "
<<(pattern_matches(s1,s2)? "<==>" : "<=!=>")
<<"["<<s2<<"]\n";
}
int main()
{
test("red blue blue red red yellow","abbaac");
test("red blue blue red red yellow","first second second first first third");
test("abbaac","12211g");
test("abbaac","red blue blue red red yellow");
test("abbgac","red blue blue red red yellow");
return 0;
}
//Output:
//[red blue blue red red yellow] <==>[abbaac]
//[red blue blue red red yellow] <==>[first second second first first third]
//[abbaac] <==>[12211g]
//[abbaac] <==>[red blue blue red red yellow]
//[abbgac] <=!=>[red blue blue red red yellow]
EDIT: Here's a non C++11 version that should work on VS2010. However, since C++03 does not include a string hash function in the standard library, this example uses a hash function taken from stack overflow. A much better hash function to use would be this one if you have access to the boost libraries.
