I'm working on a program that checks whether a particular word is an anagram, using std::count. However, I don't think my function logic is correct, and I can't figure out what is wrong.
Assume there are the following words in the file:
Evil
Vile
Veil
Live
My code is as follows:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;
struct Compare {
std::string str;
Compare(const std::string& str) : str(str) {}
};
bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
return c.str == p.second;
}
bool operator==(const Compare& c, const std::pair<int, std::string>&p) {
return c.str == p.second;
}
std::vector<std::string> readInput(ifstream& file)
{
std::vector<std::string> temp;
string word;
while (file >> word)
{
temp.push_back(word);
}
std::sort(temp.begin(), temp.end());
return temp;
}
int main(int argc, char *argv[]) {
string file = "testing.txt";
ifstream ss(file.c_str());
if(!ss.is_open())
{
cerr << "Cannot open the text file";
return 1;
}
std::vector<std::string> words = readInput(ss);
std::map<int, std::string> wordsMap;
//std::map<std::string value, int key> values;
for(unsigned i=0; (i < words.size()); i++)
{
wordsMap[i] = words[i];
}
int count = std::count(wordsMap.begin(), wordsMap.end(), Compare("Evil"));
cout << count << endl;
}
I'm pretty sure the logic in my functions is simply wrong. I hope someone can help :)
The simplest approach is to sort the letters of both words and compare the results:
bool isAnagram(std::string s, std::string t) { std::sort(s.begin(), s.end()); std::sort(t.begin(), t.end()); return s == t; }
So use something like the following; there is no need for std::map:
struct Compare {
std::string str;
Compare(const std::string& x) : str(x) {
// Normalize case before sorting, so that the key matches the one built in operator().
std::transform(str.begin(), str.end(), str.begin(), ::toupper);
std::sort(str.begin(), str.end()); }
bool operator ()(const std::string& t) const
{
std::string s= t;
std::transform(s.begin(), s.end(),s.begin(), ::toupper);
std::sort(s.begin(),s.end());
return s == str;
}
};
And then
int count = std::count_if(words.begin(), words.end(), Compare("Evil"));
This is not the most efficient algorithm, but a quick change to your program that would work could be:
bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
std::string a = c.str;
std::transform(a.begin(), a.end(), a.begin(), ::tolower);
std::sort(a.begin(), a.end());
std::string b = p.second;
std::transform(b.begin(), b.end(), b.begin(), ::tolower);
std::sort(b.begin(), b.end());
return a == b;
}
EDIT: It seems in your present code, you are checking whether the strings are exactly equal to each other (not anagrams).
INSTEAD:
For each word, make an array of 26 elements, each element corresponding to a letter of the alphabet. Parse each word character by character, and increase the count of the particular character in the respective array.
For example for evil, the array would be:
0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0. // It has 1's for letters e,v,i and l
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
You make this array for each word that you have. In your case, all the words will have the same array. You then compare these arrays element-wise and proceed accordingly.
Now you just need to see which words have the same corresponding array.
If you want to compare all the N words pair-wise, you can do so using two nested loops in O(N^2) complexity.
The complexity for comparing one pair is O(1).
Complexity of creating the arrays = O(L) where L is the length of the string.
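The counting idea above could be sketched as follows. This is only a minimal illustration, assuming plain ASCII letters a-z; letterCounts and sameLetters are hypothetical names, not part of the question's code.

```cpp
#include <array>
#include <cctype>
#include <string>

// Hypothetical helper: counts how often each letter a-z occurs in a word.
// Characters outside a-z (after lowercasing) are ignored.
std::array<int, 26> letterCounts(const std::string& word) {
    std::array<int, 26> counts{};  // value-initialized: all zeros
    for (unsigned char c : word) {
        const int lower = std::tolower(c);
        if (lower >= 'a' && lower <= 'z') {
            ++counts[lower - 'a'];
        }
    }
    return counts;
}

// Two words are anagrams when their 26-element count arrays are equal.
bool sameLetters(const std::string& a, const std::string& b) {
    return letterCounts(a) == letterCounts(b);
}
```

With the words from the question, sameLetters("Evil", "Vile") returns true, since both words produce the same count array.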
Consider the following:
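map<string, set<string>> anagrams;
// All anagrams share the same key: the word's letters in sorted order.
string sorted_key(string word) { sort(word.begin(), word.end()); return word; }
for (const auto& word : words)
anagrams[sorted_key(word)].insert(word);
const set<string>& find_anagrams(const string& word)
{
return anagrams[sorted_key(word)];
}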
When you have a lot of relatively short words (or if you can work with a big-number library), you can use a solution similar to what I wrote here:
Generate same unique hash code for all anagrams
Essentially, map each character to a unique prime number (it doesn't have to be big; you can map the entire alphabet to the primes up to 101), and for each word multiply the primes of its characters. Since multiplication is commutative, anagrams give the same result, and because prime factorization is unique, words that are not anagrams give different results. So you can compare that result, hash it, or do whatever you want with it.
Keep in mind that for long words the values grow quickly, so you might need a big-number library.
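A minimal sketch of the prime-product idea, assuming ASCII letters a-z and a 64-bit unsigned product; primeSignature is a hypothetical name. Note the assumption that words are short: once the product overflows, it wraps around and different letter sets could collide.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: multiplies one prime per letter, so every anagram of a
// word yields the same value. The product wraps modulo 2^64 for long words,
// after which the value is no longer guaranteed to be unique.
std::uint64_t primeSignature(const std::string& word) {
    static const int primes[26] = {2,  3,  5,  7,  11, 13, 17, 19, 23,
                                   29, 31, 37, 41, 43, 47, 53, 59, 61,
                                   67, 71, 73, 79, 83, 89, 97, 101};
    std::uint64_t product = 1;
    for (unsigned char c : word) {
        const int lower = std::tolower(c);
        if (lower >= 'a' && lower <= 'z') {
            product *= static_cast<std::uint64_t>(primes[lower - 'a']);
        }
    }
    return product;
}
```

For example, primeSignature("ab") and primeSignature("ba") are both 2 * 3 = 6.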
I have been trying to remove the values False and 0;0 from a vector<string> plan containing the following:
1003;2021-03-09;False;0;0;1678721F
1005;2021-03-05;False;0;0;1592221D
1005;2021-03-06;False;0;0;1592221D
1003;2021-03-07;False;0;0;1592221D
1003;2021-03-08;False;0;0;1592221D
1004;2021-03-09;False;0;0;1592221D
1004;2021-03-10;False;0;0;1592221D
1001;2021-03-11;False;0;0;1592221D
but the solutions I have found only work with int. I tried the following:
remove(plan.begin(), plan.end(), "False");
and also erase, but neither worked.
What mistake am I making, and how can I remove the values I want, which are at positions [2], [3] and [4]? Thanks for any help.
[Note: With the assumption 1003;2021-03-09;False;0;0;1678721F corresponding to a row inside std::vector<string>]
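std::remove: Moves the elements that are not equal to the given value to the front of the range [first, last) and returns an iterator to the new logical end. It does not shrink the vector by itself, so it is combined with std::vector::erase (the erase-remove idiom).
If std::vector<string> plan contains an element equal to False, that element is removed:
std::vector < std::string > plan =
{
"1003","2021-03-09","False","0;0","1678721F"
};
plan.erase(std::remove(plan.begin(), plan.end(), "False"), plan.end());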
In your case you need to remove the given sub-string from each row of plan, so iterate over all the rows and remove it with std::string::erase.
std::vector < std::string > plan =
{
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"};
for (auto & e:plan)
{
//As position of False;0;0; is at a fixed index, i.e: from index:16, 10 characters are removed
e.erase (16, 10);
}
To generalize, you can use std::string::find to find a sub-string and then erase it.
void removeSubstrs(string& s, string p) {
string::size_type n = p.length();
for (string::size_type i = s.find(p);
i != string::npos;
i = s.find(p))
s.erase(i, n);
}
int
main ()
{
std::vector < std::string > plan =
{
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"};
for (auto & e:plan)
{
removeSubstrs (e, ";False;0;0");
}
for (auto e:plan)
std::cout << e << std::endl;
return 0;
}
[Note: This answer assumes that each line corresponds to an element in the vector]
With the statement
remove(plan.begin(), plan.end(), "False");
you try to remove all elements from the vector that are equal to "False".
You need to iterate over the vector and erase the sub-string from each and every string in the vector.
For example you can use a range for loop to iterate over all the strings (or rather references to them), and then use the std::string functions find to find the sub-strings you want to remove and replace to replace the sub-strings with empty strings (i.e. nothing).
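The loop described above could look roughly like this. It is a sketch only; removeFromAll is a hypothetical name, and it assumes you want every occurrence of the sub-string removed from every element.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: removes every occurrence of `sub` from each string in `rows`.
void removeFromAll(std::vector<std::string>& rows, const std::string& sub) {
    if (sub.empty()) return;  // an empty pattern would match forever
    for (std::string& row : rows) {
        std::string::size_type pos = 0;
        // find() locates the next occurrence; replace() swaps it for an empty string.
        while ((pos = row.find(sub, pos)) != std::string::npos) {
            row.replace(pos, sub.size(), "");
        }
    }
}
```

Calling removeFromAll(plan, "False;0;0;") on the data from the question turns "1003;2021-03-09;False;0;0;1678721F" into "1003;2021-03-09;1678721F".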
If you are sure that there is only one occurrence of "False" and "0;0" in each string, you can use something like this:
std::string EraseFirstSubString(
const std::string & main_str,
const std::string & sub_str)
{
std::string new_main_str = main_str;
size_t pos = new_main_str.find(sub_str);
if (pos != std::string::npos)
{
new_main_str.erase(pos, sub_str.length());
}
return new_main_str;
}
int main()
{
std::vector<std::string> plan = {
"1003;2021-03-09;False;0;0;1678721F",
"1005;2021-03-05;False;0;0;1592221D",
"1005;2021-03-06;False;0;0;1592221D",
"1003;2021-03-07;False;0;0;1592221D",
"1003;2021-03-08;False;0;0;1592221D",
"1004;2021-03-09;False;0;0;1592221D",
"1004;2021-03-10;False;0;0;1592221D",
"1001;2021-03-11;False;0;0;1592221D"
};
for (std::string & str : plan)
{
str = EraseFirstSubString(str, "False");
str = EraseFirstSubString(str, "0;0");
}
};
But if those sub-strings may occur several times, you should extend the removal function like this:
std::string EraseSubStrings(
const std::string & main_str,
const std::string & sub_str)
{
std::string new_main_str = main_str;
size_t pos = new_main_str.find(sub_str);
while (pos != std::string::npos)
{
new_main_str.erase(pos, sub_str.length());
pos = new_main_str.find(sub_str);
}
return new_main_str;
}
If you already have a vector of individual std::string objects, you can easily use the operations that the strings library offers.
#include <algorithm>
#include <vector>
#include <string>
// before C++20 change constexpr to inline
constexpr void change(std::vector<std::string>& sv, std::string const& rem) {
std::for_each(std::begin(sv), std::end(sv), [&rem](std::string& s) {
s.erase(std::min(s.size(),s.find(rem)), rem.size());
});
}
I'm currently trying to make a very fast anagram solver, and right now it's bottlenecked by the creation of the permutations. Is there another way to write the whole program, or a way to optimize the permutation creation?
Here's my code:
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>
#include <fstream>
#include <unordered_set>
#include <ctime>
#include <boost/asio/thread_pool.hpp>
#include <boost/asio/post.hpp>
void get_permutations(std::string s, std::vector<std::string> &permutations)
{
std::sort(s.begin(), s.end());
do
{
permutations.push_back(s);
} while (std::next_permutation(s.begin(), s.end()));
}
void load_file(std::unordered_set<std::string> &dictionary, std::string filename)
{
std::ifstream words(filename);
std::string element;
while (words >> element)
{
std::transform(element.begin(), element.end(), element.begin(), ::tolower);
dictionary.insert(element);
}
}
void print_valid(const std::unordered_set<std::string>& dictionary, const std::vector<std::string>::const_iterator start, const std::vector<std::string>::const_iterator stop)
{
for (auto iter = start; iter != stop; iter++)
{
if (dictionary.contains(*iter) == true)
{
std::cout << *iter << "\n";
}
}
}
int main()
{
const std::string s = "asdfghjklq";
std::vector<std::string> permutations;
boost::asio::thread_pool pool(2);
std::cout << "Loading english dictionary\n";
std::unordered_set<std::string> dictionary;
load_file(dictionary, "words");
std::cout << "Done\n";
//std::cout << "Enter the anagram: ";
//getline(std::cin, s);
clock_t start = clock();
get_permutations(s, permutations);
//std::cout << permutations.size() << std::endl;
std::cout << "finished permutations\n";
if (permutations.size() > 500000)
{
std::cout << "making new\n";
for (size_t pos = 0; pos < permutations.size(); pos += (permutations.size() / 3))
{
boost::asio::post(pool, [&dictionary, &permutations, pos] { print_valid(dictionary, (permutations.begin() + pos), (permutations.begin() + pos + (permutations.size() /3) ) ); });
}
pool.join();
}
else
{
print_valid(dictionary, permutations.begin(), permutations.end());
}
clock_t finish = clock();
double time_elapsed = (finish - start) / static_cast<double>(CLOCKS_PER_SEC);
std::cout << time_elapsed << "\n";
std::cout << permutations.size() << std::endl;
return 0;
}
The creation of the permutations is in get_permutations.
The thread pool was something to test for very large sets of permutations.
Think about how you would go about this by hand - how do you check if two words are anagrams of each other?
e.g.: banana <-> aaannb
How would you solve this on a piece of paper? Would you create all 720 permutations and check if any one matches? Or is there an easier, more intuitive way?
So what makes a word an anagram of another word, i.e. what condition needs to be satisfied?
It's all about letter counts. If both words contain an equal amount of all letters, they're anagrams of each other.
e.g.:
banana -> 3x a, 2x n, 1x b
aaannb -> 3x a, 2x n, 1x b
same letter counts so they must be anagrams!
So armed with this knowledge can you construct an algorithm that doesn't require iterating all possible permutations?
Solution
I'd only recommend to read this once you've tried to come up with your own optimized algorithm
You just need to build a lookup-table of letter-counts to dictionary words, e.g.:
1x a, 1x n -> ["an"]
3x a, 1x b, 2x n -> ["banana", "nanaba"]
1x a, 1x p, 1x r, 1x t -> ["part", "trap"]
... etc ...
then you can decompose your search word as well into letter counts, e.g. banana -> 3x a, 1x b, 2x n and search for the decomposition in your lookup table.
The result will be the list of words from your dictionary you can build with the given collection of letters - aka all possible anagrams for the given string.
Assuming some kind of structure named letter_counts that contains the letter composition, the algorithm could look like:
std::vector<std::string> find_anagrams(std::vector<std::string> const& dictionary, std::string const& wordToCheck) {
// build a lookup map for letter composition -> word
std::unordered_map<letter_counts, std::vector<std::string>> compositionMap;
for(auto& str : dictionary)
compositionMap[letter_counts{str}].push_back(str);
// get all words that are anagrams of the given one
auto it = compositionMap.find(letter_counts{wordToCheck});
// no matches in dictionary
if(it == compositionMap.end())
return {};
// list of all anagrams
auto result = it->second;
// remove wordToCheck from result if it is present
result.erase(std::remove_if(result.begin(), result.end(), [&wordToCheck](std::string const& str) { return str == wordToCheck; }), result.end());
return result;
}
This will run in O(n) time and has a space-complexity of O(n), with n being the number of words in the dictionary.
(Each lookup would be O(1) on average if you don't include the construction of the compositionMap as part of the algorithm.)
By comparison, a permutation-based approach has O(k!) time complexity, with k being the length of the search word (or, as I like to call it, O(scary)).
Here's a full code example that only deals with letters a-z, but you can easily modify letter_counts to make it work with other characters as well:
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>
struct letter_counts {
static const int num_letters = 26;
int counts[num_letters];
explicit letter_counts(std::string_view str) : counts{0} {
for(char c : str) {
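// std::tolower expects a value representable as unsigned char
c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));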
if(c >= 'a' && c <= 'z')
counts[c - 'a']++;
}
}
};
bool operator==(letter_counts const& lhs, letter_counts const& rhs) {
for(int i = 0; i < letter_counts::num_letters; i++) {
if(lhs.counts[i] != rhs.counts[i]) return false;
}
return true;
}
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
std::hash<T> hasher;
seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}
namespace std {
template<>
struct hash<letter_counts> {
size_t operator()(const letter_counts& letterCounts) const
{
size_t result = 0;
auto hasher = std::hash<int>{};
for(int i : letterCounts.counts)
hash_combine(result, hasher(i));
return result;
}
};
}
std::vector<std::string> find_anagrams(std::vector<std::string> const& dictionary, std::string const& wordToCheck) {
// build a lookup map for letter composition -> word
std::unordered_map<letter_counts, std::vector<std::string>> compositionMap;
for(auto& str : dictionary)
compositionMap[letter_counts{str}].push_back(str);
// get all words that are anagrams of the given one
auto it = compositionMap.find(letter_counts{wordToCheck});
// no matches in dictionary
if(it == compositionMap.end())
return {};
// list of all anagrams
auto result = it->second;
// remove wordToCheck from result if it is present
result.erase(std::remove_if(result.begin(), result.end(), [&wordToCheck](std::string const& str) { return str == wordToCheck; }), result.end());
return result;
}
int main() {
std::vector<std::string> dict = {
"banana",
"nanaba",
"foobar",
"bazinga"
};
std::string word = "aaannb";
for(auto& str : find_anagrams(dict, word)) {
std::cout << str << std::endl;
}
}
The permutation method you have is far too slow, since the number of permutations of a string of n distinct characters is n!, which grows faster than any exponential. Try hashing with an equality predicate, where the hash is based on the sorted string and the equality predicate only tests whether the sorted versions of two strings are equal. You can use boost::unordered_map with custom hash functions and add the words that fit the anagram to the key set.
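As a rough sketch of that idea, here is a version that uses std::unordered_map from the standard library instead of boost::unordered_map (both accept a custom hash and equality functor). The names sortedLetters, AnagramHash, AnagramEqual and groupAnagrams are made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns the letters of a word in sorted order.
inline std::string sortedLetters(std::string word) {
    std::sort(word.begin(), word.end());
    return word;
}

// Hash functor: anagrams hash identically because the hash uses the sorted letters.
struct AnagramHash {
    std::size_t operator()(const std::string& word) const {
        return std::hash<std::string>{}(sortedLetters(word));
    }
};

// Equality functor: two keys count as equal when their sorted letters match.
struct AnagramEqual {
    bool operator()(const std::string& a, const std::string& b) const {
        return sortedLetters(a) == sortedLetters(b);
    }
};

using AnagramGroups =
    std::unordered_map<std::string, std::vector<std::string>, AnagramHash, AnagramEqual>;

// Groups dictionary words so that all anagrams land under the same key.
AnagramGroups groupAnagrams(const std::vector<std::string>& dictionary) {
    AnagramGroups groups;
    for (const std::string& word : dictionary) {
        groups[word].push_back(word);
    }
    return groups;
}
```

Looking up any arrangement of the letters, for example groups.find("rapt"), then returns the group that contains "part" and "trap" without generating a single permutation. Sorting inside the functors on every call is wasteful; storing the pre-sorted string as the key would avoid that.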
Note that the number of permutations tends to become very large very quickly. Two words are anagrams if you sort the characters of both words alphabetically and the sorted strings then match. Based on that fact I made the following example, which puts a dictionary into a multimap where it is possible to find all anagrams of a word quickly. It does this by using the alphabetically sorted input string as the key into the map.
Live demo : https://onlinegdb.com/fXUVZruwq
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <locale>
#include <map>
#include <set>
#include <string>
#include <vector>
// create a class to hold anagram information
class anagram_dictionary_t
{
public:
// create a dictionary based on an input list of words.
template<std::size_t N>
explicit anagram_dictionary_t(const std::string (&words)[N])
{
for (std::string word : words)
{
auto key = make_key(word);
std::string lower{ word };
to_lower(lower);
m_anagrams.insert({ key, lower});
}
}
// find all the words that match the anagram
auto find_words(const std::string& anagram)
{
// get the unique key for input word
// this is done by sorting all the characters in the input word alphabetically
auto key = make_key(anagram);
// lookup all the words with the same key in the dictionary
auto range = m_anagrams.equal_range(key);
// create a set of found words
std::set<std::string> words;
for (auto it = range.first; it != range.second; ++it)
{
words.insert(it->second);
}
// return the words
return words;
}
// function to check if two words are an anagram
bool is_anagram(const std::string& anagram, const std::string& word)
{
auto words = find_words(anagram);
return (words.find(word) != words.end());
}
private:
// make a unique key out of an input word
// all anagrams should map to the same key value
static std::string make_key(const std::string& word)
{
std::string key{ word };
to_lower(key);
// two words are anagrams if they sort to the same key
std::sort(key.begin(), key.end());
return key;
}
static void to_lower(std::string& word)
{
for (char& c : word)
{
c = std::tolower(c, std::locale());
}
}
std::multimap<std::string, std::string> m_anagrams;
};
int main()
{
anagram_dictionary_t anagram_dictionary{ {"Apple", "Apricot", "Avocado", "Banana", "Bilberry", "Blackberry", "Blueberry" } };
std::string anagram{ "aaannb"};
auto words = anagram_dictionary.find_words(anagram);
std::cout << "input word = " << anagram << "\n found words : ";
for (const auto& word : words)
{
std::cout << word << "\n";
}
return 0;
}
I am trying to write a program that removes all anagrams from a vector. I sorted each word's letters alphabetically and put the words in a struct, hoping that would make it easier to delete the anagrams, but it didn't actually help.
INPUT:{"have","ahve","frame","avhe","farme","code","abc","ehav"}
OUTPUT:{"code","abc"}
Note: all words are lowercase.
#include <iostream>
#include<vector>
#include<algorithm>
using namespace std;
struct Fraza{
string word;
string word_alph;
Fraza(){};
Fraza(string words,string word_alph){
this->word=move(word);
this->word_alph=move(word_alph);
}
void printverify() const{
cout<<word<<" is sorted "<<word_alph<<endl;
}
void print() const{
cout<<word<<endl;
}
};
int main() {
Fraza fraza [100];
vector<Fraza>text;
string tmp;
string tmpa;
// i am taking the input, store the words temporary in tmpa and sort tmpa, then i put them in struct
while(cin>>tmp){
tmpa=tmp;
sort(tmpa.begin(),tmpa.end());
text.emplace_back(Fraza(tmp,tmpa));
}
// trying to delete the words
for(int i=0;i<text.size();i++){
for(int j=i+1;j<text.size();j++) {
if (fraza[i].word_alph==fraza[j].word_alph) {
text.erase(text.begin()+i);
text.erase(text.begin()+j);
}
}
}
for(const auto& it:text)
it.printverify();
for(const auto& it:text)
it.print();
return 0;
}
So far I know that comparing the sorted words might work: if two sorted words are the same, I should delete the words at those positions.
There are several bugs:
The Fraza constructor has a parameter called "words" that you don't use; this->word = std::move(word) moves from the member to itself. (Get familiar with the member initializer list.)
You have both an array fraza and a vector text. All the words are in text, but you compare the (empty) strings in fraza.
After you have erased the i:th string, the previously j:th string is the j-1:th string.
Fixing these bugs makes your program almost work, except it leaves behind one copy of anagrams that occur an odd number of times (since you remove an even number of them, there will always be one left).
I would approach this with a table.
First, encapsulate the sorting and give Fraza an ordering that makes anagrams equivalent:
class Fraza
{
public:
Fraza(const std::string& w)
: word(w), sorted(w)
{
std::sort(sorted.begin(), sorted.end());
}
bool operator< (const Fraza& other) const
{
return sorted < other.sorted;
}
const std::string& str() const { return word; }
private:
std::string word;
std::string sorted;
};
Then use std::map as a counting table:
int main()
{
// Count the words.
std::map<Fraza, int> table;
std::string word;
while (std::cin >> word)
{
table[word] += 1;
}
// Print the unique ones.
for (const auto& it: table)
{
if (it.second == 1)
{
std::cout << it.first.str() << std::endl;
}
}
}
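If the original input order matters (the expected output in the question lists "code" before "abc"), a variation could count the sorted keys first and then filter in a second pass. This is only a sketch under that assumption; withoutAnagrams is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: keeps only the words that have no anagram partner in
// the input, preserving the input order.
std::vector<std::string> withoutAnagrams(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> groupSize;  // sorted letters -> number of words
    std::vector<std::string> keys;                   // sorted letters of each input word
    keys.reserve(words.size());
    for (const std::string& w : words) {
        std::string key = w;
        std::sort(key.begin(), key.end());
        ++groupSize[key];
        keys.push_back(std::move(key));
    }
    std::vector<std::string> result;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (groupSize[keys[i]] == 1) {
            result.push_back(words[i]);
        }
    }
    return result;
}
```

For the input from the question, {"have","ahve","frame","avhe","farme","code","abc","ehav"}, this returns {"code","abc"}.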
Background:
I got asked this question today in a online practice interview and I had a hard time figuring out a custom comparator to sort. Here is the question
Question:
Implement a document scanning function wordCountEngine, which receives a string document and returns a list of all unique words in it and their number of occurrences, sorted by the number of occurrences in descending order. If two or more words have the same count, they should be sorted according to their order in the original sentence. Assume that all letters are in the English alphabet. Your function should be case-insensitive, so for instance, the words “Perfect” and “perfect” should be considered the same word.
The engine should strip out punctuation (even in the middle of a word) and use whitespaces to separate words.
Analyze the time and space complexities of your solution. Try to optimize for time while keeping a polynomial space complexity.
Examples:
input: document = "Practice makes perfect. you'll only
get Perfect by practice. just practice!"
output: [ ["practice", "3"], ["perfect", "2"],
["makes", "1"], ["youll", "1"], ["only", "1"],
["get", "1"], ["by", "1"], ["just", "1"] ]
My idea:
The first thing I wanted to do was get the string, without punctuation and all in lower case, into a vector of strings. Then I used an unordered_map container to store each string and a count of its occurrences. Where I got stuck was creating a custom comparator to make sure that strings with the same count are sorted by their order of appearance in the given string.
Code:
#include <cctype>
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <sstream>
#include <iterator>
#include <numeric>
#include <algorithm>
using namespace std;
struct cmp
{
bool operator()(std::string& word1, std::string& word2)
{
}
};
vector<vector<string>> wordCountEngine( const string& document )
{
// your code goes here
// Step 1
auto doc = document;
std::string str;
remove_copy_if(doc.begin(), doc.end(), std::back_inserter(str),
[](unsigned char c) { return std::ispunct(c) != 0; }); // std::ptr_fun was removed in C++17
for(int i = 0; i < str.size(); ++i)
str[i] = tolower(str[i]);
std::stringstream ss(str);
istream_iterator<std::string> begin(ss);
istream_iterator<std::string> end;
std::vector<std::string> vec(begin, end);
// Step 2
std::unordered_map<std::string, int> m;
for(auto word : vec)
m[word]++;
// Step 3
std::vector<std::vector<std::string>> result;
for(auto it : m)
{
result.push_back({it.first, std::to_string(it.second)});
}
return result;
}
int main() {
std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
auto result = wordCountEngine(document);
for(int i = 0; i < result.size(); ++i)
{
for(int j = 0; j < result[i].size(); ++j)
{
std::cout << result[i][j] << " ";
}
std::cout << "\n";
}
return 0;
}
If anyone can help me with learning how to build a custom comparator for this code I would really appreciate it.
You could use a std::vector<std::pair<std::string, int>>, with each pair representing one word and the number of occurrences of that word in the sequence. Using a vector will help to maintain the order of the original sequence when two or more words have the same count. Finally sort by occurrences.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
std::vector<std::vector<std::string>> wordCountEngine(const std::string& document)
{
std::vector<std::pair<std::string, int>> words;
std::istringstream ss(document);
std::string word;
//Loop through whitespace-separated words in sequence
while (ss >> word)
{
//Convert to lowercase
std::transform(word.begin(), word.end(), word.begin(), [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
//Remove punctuation characters
auto it = std::remove_if(word.begin(), word.end(), [](unsigned char c) { return !std::isalpha(c); });
word.erase(it, word.end());
//Skip tokens that consisted only of punctuation
if (word.empty()) continue;
//Find this word in the result vector
auto pos = std::find_if(words.begin(), words.end(),
[&word](const std::pair<std::string, int>& p) { return p.first == word; });
if (pos == words.end()) {
words.push_back({ word, 1 }); //Doesn't occur -> add it
}
else {
pos->second++; //Increment count
}
}
//Sort vector by word occurrences; stable_sort keeps words with equal counts in their original order
std::stable_sort(words.begin(), words.end(),
[](const std::pair<std::string, int>& p1, const std::pair<std::string, int>& p2) { return p1.second > p2.second; });
//Convert to vector<vector<string>>
std::vector<std::vector<std::string>> result;
result.reserve(words.size());
for (auto& p : words)
{
std::vector<std::string> v = { p.first, std::to_string(p.second) };
result.push_back(v);
}
return result;
}
int main()
{
std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
auto result = wordCountEngine(document);
for (auto& word : result)
{
std::cout << word[0] << ", " << word[1] << std::endl;
}
return 0;
}
Output:
practice, 3
perfect, 2
makes, 1
youll, 1
only, 1
get, 1
by, 1
just, 1
In step 2, try this:
std::vector<std::pair<std::pair<std::string, int>, int>> m;
Here, the inner pair stores the string and the index of its first occurrence, and the outer pair adds the count of its occurrences. Write the comparison so that it sorts by count first and, if the counts are the same, by the position of the first occurrence.
bool sort_vector(const std::pair<std::pair<std::string,int>,int> &a, const std::pair<std::pair<std::string,int>,int> &b)
{
if(a.second==b.second)
{
// If the number of occurrences is the same, order by the position of the string.
return a.first.second<b.first.second;
}
// Otherwise, the string with the higher number of occurrences comes first.
return a.second>b.second;
}
You still have to write the logic that counts the occurrences and records the index of the first occurrence of each word in the string.
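One possible way to fill in that remaining logic is sketched below. The Entry alias and the countWords name are assumptions made for this example; it also assumes words are separated by whitespace and that everything except letters should be stripped.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Same layout as in the answer: ((word, index of first occurrence), count).
using Entry = std::pair<std::pair<std::string, int>, int>;

// Hypothetical helper: counts words and sorts by count (descending),
// breaking ties by first occurrence.
std::vector<Entry> countWords(const std::string& document) {
    std::unordered_map<std::string, std::size_t> slot;  // word -> position in `entries`
    std::vector<Entry> entries;
    std::istringstream in(document);
    std::string token;
    int index = 0;
    while (in >> token) {
        // Keep letters only, lowercased.
        std::string word;
        for (unsigned char c : token) {
            if (std::isalpha(c)) word += static_cast<char>(std::tolower(c));
        }
        if (word.empty()) continue;
        auto found = slot.find(word);
        if (found == slot.end()) {
            slot[word] = entries.size();
            entries.emplace_back(std::make_pair(word, index), 1);
        } else {
            ++entries[found->second].second;
        }
        ++index;
    }
    std::sort(entries.begin(), entries.end(), [](const Entry& a, const Entry& b) {
        if (a.second != b.second) return a.second > b.second;  // higher count first
        return a.first.second < b.first.second;                // earlier word first
    });
    return entries;
}
```

For the example document, the first entries are practice (3) and perfect (2), followed by the words that occur once in their original order, starting with makes.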
I have strings like 7X1234 XY1236 NM1235. I want to sort these strings by the last four digits only, ignoring the first two characters. I also want to compare those digits to see whether they are sequential.
One way I can think of is to split each string into its prefix and its number (7X and 1234), lexical-cast the numeric string to int, and work on that. But how can I then associate the prefix with the numeric part again, that is, how do I put 7X back in front of 1234 once the numbers are sorted and compared in C++?
In short if I have 7X1234 XY1236 NM1235 BV1238 I need to get 7X1234 NM1235 XY1236 BV1238
I did not elaborate that I wanted to find out if the numerical part of strings are sequential. Right now when I have just ints like 1234 1236 1235 1238 I do something like below
std::vector<int> sortedDigits{1234, 1235, 1236, 1238};
int count = 1;
int pos = 0;
std::vector<std::pair<int, int> > myVec;
myVec.push_back(std::make_pair(sortedDigits[pos], count));
for(size_t i = 1; i < sortedDigits.size(); ++i)
{
if(sortedDigits[i] != (sortedDigits[i-1] + 1))
{
count = 1;
myVec.push_back(std::make_pair(sortedDigits[i], count) );
++pos;
}
else
{
myVec[pos].second = ++count;
}
}
So at the end I get (1234, 3) and (1238, 1)
I don't know how to get something like this when the values are strings.
Since the character encoded values of numerals are ordered in the same order as the numbers they represent, you can do string comparison on the last four digits:
#include <cstring>
#include <string>
// Requires: a.size() >= 2, b.size() >= 2
bool two_less(std::string const & a, std::string const & b)
{
return std::strcmp(a.data() + 2, b.data() + 2) < 0;
}
Now use sort with predicate:
#include <algorithm>
#include <vector>
std::vector<std::string> data { "7X1234", "YX1236" };
std::sort(data.begin(), data.end(), two_less);
In C++11, and in particular if you have no repeated use for this, you can also use a lambda directly in the sort call:
std::sort(data.begin(), data.end(),
[](std::string const & a, std::string const & b)
{ return std::strcmp(a.data() + 2, b.data() + 2) < 0; });
Then you can even make the number "2" a captured variable if you need to vary it.
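A small sketch of that last remark, with the offset captured by the lambda. sortBySuffix is a hypothetical name, and the sketch uses std::string::compare instead of std::strcmp; it assumes every string has at least offset characters (compare throws std::out_of_range otherwise).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sorts strings by everything after the first `offset` characters.
void sortBySuffix(std::vector<std::string>& data, std::size_t offset) {
    std::sort(data.begin(), data.end(),
              [offset](const std::string& a, const std::string& b) {
                  // Compares the two suffixes in place, without creating substrings.
                  return a.compare(offset, std::string::npos,
                                   b, offset, std::string::npos) < 0;
              });
}
```

With data = {"7X1234", "XY1236", "NM1235", "BV1238"}, calling sortBySuffix(data, 2) leaves the vector as 7X1234, NM1235, XY1236, BV1238.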
Use qsort and provide a comparator function that compares from the start of each string plus an offset of two, rather than from the beginning of the string.
For example, if the strings are stored as fixed-size char arrays (so that each element pointer is the start of a string), your comparator function could look like this:
int compare (const void * a, const void * b)
{
const char * a_cmp = ((const char *)a)+2;
const char * b_cmp = ((const char *)b)+2;
return strcmp(a_cmp, b_cmp);
}
You can, for example, make a struct like this:
struct combined{
string alph;
int numeral;
};
Put these in a C++ standard container and use std::sort from <algorithm> with a user-defined comparison object.
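Roughly, that could look like the sketch below. The helpers split and sortByNumeral are hypothetical, and the code assumes a two-character prefix followed only by decimal digits. Note that reassembling with std::to_string drops leading zeros, so keep the original string in the struct if that matters.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct combined {
    std::string alph;
    int numeral;
};

// Hypothetical helper: splits "7X1234" into {"7X", 1234}.
combined split(const std::string& s) {
    return combined{s.substr(0, 2), std::stoi(s.substr(2))};
}

// Hypothetical helper: sorts by the numeric part and puts the prefix back in front.
std::vector<std::string> sortByNumeral(const std::vector<std::string>& input) {
    std::vector<combined> parts;
    for (const std::string& s : input) {
        parts.push_back(split(s));
    }
    std::sort(parts.begin(), parts.end(),
              [](const combined& a, const combined& b) { return a.numeral < b.numeral; });
    std::vector<std::string> out;
    for (const combined& p : parts) {
        out.push_back(p.alph + std::to_string(p.numeral));
    }
    return out;
}
```

For the input 7X1234 XY1236 NM1235 BV1238 this yields 7X1234 NM1235 XY1236 BV1238.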
You should create a class that encapsulates your string and has an int field and a string field. This class can overload the comparison operators.
class NumberedString
{
private:
int number;
string originalString;
public:
NumberedString(string original) { ... }
friend bool operator> (const NumberedString &left, const NumberedString &right);
friend bool operator<=(const NumberedString &left, const NumberedString &right);
friend bool operator< (const NumberedString &left, const NumberedString &right);
friend bool operator>=(const NumberedString &left, const NumberedString &right);
};
You can just define your comparator
bool mycomparator(const std::string& a, const std::string& b) {
return a.substr(2) < b.substr(2);
}
then you can sort your std::vector<std::string> passing mycomparator as third parameter.
In C++11 this is also a case in which an anonymous lambda is a good fit...
#include <vector>
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, const char *argv[])
{
std::vector<std::string> data = {"7X1234", "XY1236", "NM1235", "BV1238"};
std::sort(data.begin(), data.end(),
[](const std::string& a, const std::string& b) {
return a.substr(2) < b.substr(2);
});
for (auto x : data) {
std::cout << x << std::endl;
}
return 0;
}
If you're 100% sure that the strings in the array are in XX9999 format, you can instead use
return std::strncmp(a.data()+2, b.data()+2, 4) < 0;
which is more efficient because it doesn't need any memory allocation to do the comparison (it requires #include <cstring>).
Use a std::map<int, std::string>, using the int value as key and the respective string as value. You can then simply iterate over the map and retrieve the strings; they will already be in sorted order. Note that a map keeps only one string per number, so use a std::multimap if numbers can repeat.
How about something like this:
std::string str[] = { "7X1234", "XY1236", "NM1235" };
std::map<int, std::string> m;
for (const auto& s : str)
{
// Parse the digits after the two-character prefix.
std::istringstream ss(s.substr(2));
int num;
ss >> num;
m[num] = s;
}
for (const auto& i : m)
{
std::cout << i.second << " ";
}
std::cout << std::endl;
I just typed this in, so minor typos/bugs may be there, but principle should work.
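For the second part of the question (finding runs of sequential numbers while keeping the full strings), a sketch along the lines of the int-based loop in the question might look like this. numericPart and consecutiveRuns are hypothetical names, and the code assumes a two-character prefix followed only by digits.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: numeric value of everything after the two-character prefix.
int numericPart(const std::string& s) {
    return std::stoi(s.substr(2));
}

// Sorts the strings by numeric part, then returns (first string of the run,
// run length) for each run of consecutive numbers.
std::vector<std::pair<std::string, int>> consecutiveRuns(std::vector<std::string> data) {
    std::sort(data.begin(), data.end(), [](const std::string& a, const std::string& b) {
        return numericPart(a) < numericPart(b);
    });
    std::vector<std::pair<std::string, int>> runs;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i > 0 && numericPart(data[i]) == numericPart(data[i - 1]) + 1) {
            ++runs.back().second;           // extends the current run
        } else {
            runs.emplace_back(data[i], 1);  // starts a new run
        }
    }
    return runs;
}
```

For 7X1234 XY1236 NM1235 BV1238 this gives (7X1234, 3) and (BV1238, 1), matching the (1234, 3) and (1238, 1) result from the int version.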