Find the most frequent word using hash maps in C++

I need to find the most frequently occurring word and return that value. I must use hash maps, and the function should take a file name. This is what I've done so far, but I'm very confused.
int most_frequent_word(string filename)
{
string words;
ifstream in(filename.c_str());
unordered_map<string, int> word_map;
while(in >> words)
{
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
}
return words;
}
Any help would be appreciated. Thanks!

There are several issues in your code that might keep it from working as expected.
The first is the for loop over i. Why would you need that loop at all? You need to count words, so leave it like this:
while(in >> words)
{
word_map[words]++;
}
The second: rename words to word, since in >> words actually reads one word at a time.
The third is the return statement. You cannot return a string from a function that is declared to return int.
However, there is nothing to return yet, because so far we only know how many times each word occurred. Run a loop to find the maximum value:
int result = 0;
for(unordered_map<string, int>::iterator it = word_map.begin(); it != word_map.end(); it++)
result = max(result, it->second);
return result;
Here word_map consists of pairs of a word and its number of occurrences. We need to iterate over all of these pairs looking for the maximum number of occurrences, and to do this we use the iterator it. (max comes from <algorithm>.)

I'm also confused!
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
What are you doing here? Where does the 100 come from? Why do you care about single letters of your words (which is what words[i] gets you)?
If I understand your task correctly, wouldn't it suffice to
++word_map[words];
instead?
Also, why do you return words? It's a string, and your function should return an int. Instead, find the largest value in your map and you're done.

Related

How to break a long string into words with stringstream, iterate through each character of each word, and increment a char count when characters match

int MatchString::comparsion(string newq, string oldq){
//breaks down the string into the smaller strings
stringstream s1(newq);
stringstream s2(oldq);
string new_words;
string old_words;
int word_count = 0;
while(s1>>new_words&&s2>>old_words){
for(int i = 0; i<new_words.length();i++){
for(int j = 0; j<old_words.length();j++){
char a = new_words[i];
char b = old_words[j];
if(a == b){
char_count++;
}
else{
j++;
}
}//end of 2nd for
}//end of for
}
return char_count;
}
I'm currently trying to write a function that takes in two strings, breaks them down into words, and then breaks the words into chars. Afterward, I compare the chars to see whether they equal each other. If they do, I increment char_count by 1; otherwise I increment j so that I compare the next char in string 2 with string 1. I need this char_count value later for another algorithm, because I use it to calculate a percentage difference between the two strings. That is why I return it at the end: including that calculation in this method would be a bit messy. However, when I cout the return value, I get something completely wrong. I don't know what I'm doing wrong; can you please help?
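If I'm correct, the j++ under else in your inner for loop is not just redundant but harmful. The for loop already increments j, so the extra increment skips the next character after every mismatch. Let the for loop advance its counter naturally; don't force it within else{}. Also note that char_count is never declared: the variable you declared and initialized is word_count.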

Partial string search in C++

Let's say I have a vector of strings called info that holds website names read from a file one by one.
This is what I have. It searches for names, but only by the complete name:
int linearSearch(vector <string> inputs, string search_key){
for (int x=0; x<inputs.size(); x++){
if (search_key==inputs[x]){
return x;
}
}
return -1;
}
Now, what if I wanted to COUNT the number of websites that contain a particular word?
So if I had
apple.com
mac.com
macapple.com
applepie.com
potato.com
and I searched for "apple", it would return 3.
You can use string::find to perform a partial search of the string and store the result in a size_t variable.
Compare that to std::string::npos and increment the count if they are not equal.
Here is a simple example using an array instead of a vector, so you can learn from it and modify it as required.
#include <iostream>
#include <string>
using namespace std;

int main() {
string inputs[2] = {"stack overflow", "stack exchange"};
string search_key = "stack";
int count = 0; // must be initialized before counting
for(size_t i = 0; i < sizeof(inputs)/sizeof(inputs[0]); i++)
{
// find returns string::npos (the largest size_t value) if the substring is not found.
// If the substring is found, the result differs from npos and count is incremented.
if (inputs[i].find(search_key) != string::npos)
count++;
}
cout << count;
return 0;
}
Running this code prints 2, as expected, because the word "stack" occurs in both strings of the inputs array.

Independent things influence each other (I have no idea what is going on)

Sorry for the title, but I really have no idea what the problem is. The code looks like this (here it makes no sense, but in the bigger project it does, so please do not ask "why do you want to do..."):
#include <iostream>
#include <vector>
#include <fstream>
using namespace std;
string sort (string slowo){
string litery = slowo;
for (int i=0; i<litery.length()-1; i++)
for (int j=0; j<litery.length()-1; j++)
if (litery[j]>litery[j+1])
swap(litery[j], litery[j+1]); // (3)
return litery;
}
int main()
{
fstream wordlist;
wordlist.open("wordlist_test",ios::in);
vector<string> words;
while (!wordlist.eof()){ // (4)
bool ok = true;
string word;
getline(wordlist,word);
string sorted = sort(word);
if (ok){
cout<<word<<endl; // (1)
words.push_back(word);
}
}
for (int i = 0; i<words.size(); i++){
cout<<words[i]<<endl; // (2)
}
}
There are four words in the file "wordlist_tests". In the end, the program should just store them in a vector and print the vector's contents to standard output. The problem is:
line (1) shows that all words are read correctly,
but the vector appears to be empty at line (2).
Now the interesting (probably just for me) part:
there are two ways to make it work:
I can just remove line (3) (however, if I am right, the variable is passed to the sort function by value, so it just swaps two letters in an independent variable; it has nothing to do with my vector), or
I can change the condition in the while loop (4),
for example like this:
int tmp = 0;
while (tmp < 5){
tmp++;
/..../
What is wrong with this code? What should I do to write these words into the vector while still sorting them and using this while loop? I cannot find the connection between these things (OK, I see that the connection is the variable word, but I do not know in what way). Any help is appreciated.
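What happens in sort() if one of the words is the empty string ""?
If this happens, litery = "".
Because litery.length() returns an unsigned value (std::string::size_type), litery.length() - 1 wraps around to a very large number instead of -1. The loop conditions are therefore true, and the loops run.
You'll then execute if (litery[0] > litery[1]).
litery[1] accesses memory beyond the end of the empty string, and the swap can then write there. This causes undefined behavior, which can corrupt unrelated memory such as your vector. That would explain why removing line (3) appears to help.
An empty word is also the likely reason the loop condition (4) matters. while (!wordlist.eof()) checks for end of file before reading. If the file ends with a newline, the last getline fails, word stays empty, and the empty string is still passed to sort(). Reading with while (getline(wordlist, word)) avoids that extra iteration.
The common fix for the sort itself is to iterate from 1 to litery.length() and compare litery[j-1] with litery[j]. Here's an example: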
string sort (string litery){
for (int i=1; i<litery.length(); i++)
for (int j=1; j<litery.length(); j++)
if (litery[j-1]>litery[j])
swap(litery[j-1], litery[j]);
return litery;
}

C++ Dynamic Array Inputs

I am using two dynamic arrays to read from a file. They keep track of each word and the number of times it appears. If a word has already appeared, I must update its count in one array and not add it to the other array, since it already exists. However, I am getting blank entries in my array whenever I meet a duplicate. I think it's because my pointer keeps advancing when it shouldn't, and I do not know how to prevent this. The only workaround I have found is to skip empty entries with continue when I print the results: if (*(words + i) == "") continue;. This ignores the blanks in the array, but I think it is messy. I just want to figure out how to move the pointer back in this method. words and frequency are my dynamic arrays.
I would like guidance in what my problem is, rather than solutions.
I have now changed my outer loop to be a while loop, and only increment when I have found the word. Thank you WhozCraig and poljpocket.
Now this occurs.
Instead of incrementing your loop variable [i] on every iteration, you need to increment it only when a NEW word is found (i.e., one not already in the words array).
Also, you're wasting time in your inner loop by looping through your entire words array, since words only exist up to the current insertion index.
bool isFound = false;
int idx = 0;
while (idx < count && file >> hold) {
if (!valid_word(hold)) {
continue;
}
// You don't need to check past idx because you
// only have <idx> words so far.
for (int i = 0; i < idx; i++) {
if (toLower(words[i]) == toLower(hold)) {
frequency[i]++;
isFound = true;
break;
}
}
if (!isFound) {
words[idx] = hold;
frequency[idx] = 1;
idx++;
}
isFound = false;
}
First, to address your code, this is what it should probably look like. Note how we only increment i as we add words, and we only ever scan the words we've already added for duplicates. Note also how the first pass will skip the j-loop entirely and simply insert the first word with a frequency of 1.
void addWords(const std::string& fname, int count, string *words, int *frequency)
{
std::ifstream file(fname);
std::string hold;
int i = 0;
while (i < count && (file >> hold))
{
int j = 0;
for (; j<i; ++j)
{
if (toLower(words[j]) == toLower(hold))
{
// found a duplicate at j
++frequency[j];
break;
}
}
if (j == i)
{
// didn't find a duplicate
words[i] = hold;
frequency[i] = 1;
++i;
}
}
}
Second, to really address your code, this is what it should actually look like:
#include <iostream>
#include <fstream>
#include <map>
#include <string>
//
// Your implementation of toLower() goes here.
//
typedef std::map<std::string, unsigned int> WordMap;
WordMap addWords(const std::string& fname)
{
WordMap words;
std::ifstream inf(fname);
std::string word;
while (inf >> word)
++words[toLower(word)];
return words;
}
If it isn't obvious by now how a std::map<> makes this task easier, it never will be.
Check out SEEK_CUR if you want to set the file cursor back. (It is a whence argument for fseek; with iostreams, the equivalent is seekg with std::ios_base::cur.)
The problem is a logical one. Consider two situations:
1. Your algorithm does not find the current word. It is inserted at position i of your arrays.
2. Your algorithm does find the word. The frequency of the word is incremented, but i is incremented as well. This leaves a blank entry in your arrays whenever a word is already present.
In short, case 1 works as expected but case 2 doesn't.
My advice is not to rely on a for loop to traverse the input, but to use a "get next until end" approach with a while loop. That way you can track your next insertion point separately and get rid of the blank entries.
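int currentCount = 0;
while (file >> hold)
{
bool found = false;
// your inner for loop: compare hold with the first currentCount entries only,
// and on a match increment that word's frequency and set found = true
if (!found)
{
*(words + currentCount) = hold;
*(frequency + currentCount) = 1;
currentCount++;
}
}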
Why not use a std::map?
void collect( std::string name, std::map<std::string,int> & freq ){
std::ifstream file;
file.open(name.c_str(), std::ifstream::in );
std::string word;
// Test the extraction itself: checking eof() after >> would drop the
// last word when the file does not end with whitespace.
while( file >> word ){
freq[word]++; // add toLower
}
file.close();
}
The problem with your solution is the use of count in the inner loop where you look for duplicates. You'll need another variable, say nocc, initially 0, used as limit in the inner loop and incremented whenever you add another word that hasn't been seen yet.

Creating own input masks

Basically, I want to validate a string against a mask that is stored in the DB. However, to validate against it, I need to assign a rule to each part of the mask, e.g. [D] = 0<=10. So I retrieved the mask, separated the [] brackets from the letters, and stored them in two different vectors. My question is: can you assign a rule to individual cells of the vector?
For example:
a[0] = 0<=10
a[1] = "H"
Something along those lines. Below is my code. Bear in mind that the string at the top is not from the DB; it is just a string I created as a stand-in, because the process will be the same:
string s("[sh][a][mar][i]");
vector< vector<char> > Vect;
vector<char> vect;
int i = 0;
while(i < s.size()) {
if(s[i]=='[') {
i++;
vect.push_back(s[i]);
i++;
}
else if(s[i] == ']') {
i++;
Vect.push_back(vect);
vect.clear();
}
else {
vect.push_back(s[i]);
i++;
}
}
vector< vector<char> >::iterator it;
vector<char>::iterator itera;
vector<std::string> vectString;
for (it = Vect.begin() ; it != Vect.end() ; ++it ) {
string a;
for (itera = it->begin() ; itera != it->end() ; ++itera) {
cout << *itera;
a += *itera;
}
vectString.push_back(a);
}
I don't really understand your question, but here is what I can suggest:
Instead of using a large if-else-if chain, try using std::string::find and std::string::substr to extract the elements.
I don't really see why you convert the std::string to chars and then back to strings. Using std::string::find and std::string::substr to build vectString in one step is probably a better idea.
std::string offers powerful functions, if you are not familiar with it, you may want to take a look at : http://www.cplusplus.com/reference/string/string/
Do you want to verify if a given string can be accepted by some regular expressions?
If so, why not build regular expression objects that represent your rules and check whether the whole string matches them?