End of Line Word Counting (C++)

I need to create a program that reads in a file, counts the words inside of it, and lists unique words with their frequency. The program considers any series of characters without spaces a word (so things like "hello." "hello" and ",.?" are all different words). I am having difficulty with using an if statement and adding a word at the end of the line to my word count. It counts the words that have spaces after them but not '/n'. This is the code I have for counting the words:
in.get(last);
in.get(current);
while(!in.eof())
{
if((current == ' ' && last != ' ') || (current == '/n' && last != ' ' && last != '/n'))
count++;
last = current;
in.get(current);
}

This is a painful way to do it... You are better off reading strings, which are automatically delimited by whitespace.
string word;
map<string,int> freq;
while( in >> word ) {
freq[word]++;
}
Note that in the example you gave, you used '/n', which should be '\n'. In my example, you don't even need it.

I would create a map (http://www.cplusplus.com/reference/map/map/): if the word already exists, increment its frequency; otherwise add the word to the map.
This way you can quickly check whether a word exists, which gives you a unique list.

Related

Remove apostrophes that are at the ends but not the ones in the middle

I need to use a binary search tree to count the number of unique words in a file. Words like students and students' would make the count for the word students 2 as plurals are indistinguishable, but don't and dont are separate words. How would I go about this? I am thinking of removing the ending apostrophes of words before adding them into my tree but I'm not sure how to do so without removing all apostrophes, including the ones in the middle of words.
This is the code I am using to add the words into my tree:
while(std::cin.get(letter)) {
if(std::isalpha(letter) || letter == 39) {
letter = std::tolower(letter);
word.push_back(letter);
}
else {
if (word.size() != 0) {
tree.add(word);
word.clear();
}
}
}
If your goal is to remove all the ending apostrophes from a std::string, that can easily be done with a loop (the empty() check keeps it safe even if word is empty or consists only of apostrophes):
std::string word;
//...
while (!word.empty() && word.back() == '\'')
word.pop_back();
See std::string::pop_back.

How to use SAS to find the last sentence in a document?

I am trying to create a variable that contains the last sentence of a document.
The last sentence of the text can be separated by periods, question marks or exclamation points. The ending punctuation may be omitted.
if find(text, '.') >0 then last = strip(scan(text,-1,'.'));
else if find(text, '?') >0 then last = strip(scan(text,-1,'?'));
else if find(text, '!') >0 then last = strip(scan(text,-1,'!'));
SCAN(string, count <, character-list <, modifier>>)
Try
last = scan ( text, -1, '.?!', 't' );
The scan function goes from right to left when the count is negative. Use your sentence delimiters as the character-list so that a sentence is treated as a 'word'. Use the t modifier to trim trailing blanks from the string before scanning.

return the i (index) word from string

Given a string, a function should return word number i:
char* getWord(char str[], int n)
So if str is "My,. name is Jeff" and I call getWord(str, 2), the return value should be name.
Note that I cannot use string.h.
I tried counting the ' ' or '.' characters between words, but it gets complicated when several of them come one after the other.
So what is the proper algorithm for this?
You will need an outer loop that counts words and contains two inner loops. The first inner loop skips whitespace characters (the whitespace between words). The second inner loop skips non-whitespace characters (the words themselves).
Use the strtok function, passing all the special characters as delimiters (e.g. " ,."), for example strtok(str, delim). This way you can tokenize all the words and then easily return the one at the requested index.
Let me know if this works.

String find, replace, and append algorithm in C++

I need to use getline(infile, aSentence) on 4 different sentences in a file and store them as strings. Then, I have to create an algorithm to move the first letter of every word to the end of the word, then append "ay" to the word.
For example: "you may call me claptrap" will become "ouyay aymay allcay emay laptrapcay"
What's the best way to do this? I was thinking about using aSentence.find(" ") for the white space and aSentence.append to add "ay". I have no idea how to move the letter position though.
Hopefully this makes sense, thanks.
Code I have so far (incomplete, but it's the concept):
int characterIndex = 0;
char firstChar = sentence.at(characterIndex);
char currentChar = sentence.at(characterIndex);
while (currentChar != '.');
{
if(currentChar == ' ')
{
sentence.replace(characterIndex, "ay")
}
}
First thing is to write your function prototype
std::string toPigLatin(const std::string &english);
Now write a unit test for it. Pass in Hello world! and get back elloHay orldway! or whatever you should get.
Now get it through the unit test. If you pass an index variable i through the English and append to the Pig Latin, then i can be in one of three states: off-word (in whitespace or punctuation), on-word, or on the initial letter. We can have one-letter words, so we can go from the initial letter state straight to off-word, but not from off-word to on-word; we have to go through the initial letter state.
When you enter the initial letter state, store the letter in a temporary. When you go into the off-word state, write it out and append an "ay". Otherwise write out the character you just read. Initial state is off-word.
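The state machine described above could be sketched roughly as follows. A few things are assumptions of this sketch: a "word" is a run of characters for which std::isalpha is true, everything else is copied through unchanged, and a single bool replaces the three named states, since the initial-letter state only lasts for the one character that gets stored.

```cpp
#include <cctype>
#include <string>

// Sketch: move the first letter of every word to its end and append "ay".
std::string toPigLatin(const std::string& english) {
    std::string out;
    char first = '\0';      // the held-back initial letter of the current word
    bool inWord = false;    // initial state is off-word

    for (char c : english) {
        if (std::isalpha(static_cast<unsigned char>(c))) {
            if (!inWord) {          // initial letter: store it, do not write it yet
                first = c;
                inWord = true;
            } else {
                out.push_back(c);   // on-word: write the character as read
            }
        } else {
            if (inWord) {           // leaving a word: emit stored letter + "ay"
                out.push_back(first);
                out += "ay";
                inWord = false;
            }
            out.push_back(c);       // whitespace or punctuation passes through
        }
    }
    if (inWord) {                   // the input ended in the middle of a word
        out.push_back(first);
        out += "ay";
    }
    return out;
}
```

For the sentence in the question, toPigLatin("you may call me claptrap") gives "ouyay aymay allcay emay laptrapcay", and "Hello world!" becomes "elloHay orldway!". Each line read with getline(infile, aSentence) can be passed through the function as is.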

C++ Find Word in String without Regex

I'm trying to find a certain word in a string, but find that word alone. For example, if I had a word bank:
789540132143
93
3
5434
I only want a match to be found for the value 3, as the other values do not match exactly. I used the normal string::find function, but that found matches for all four values in the word bank because they all contain 3.
There is no whitespace surrounding the values, and I am not allowed to use Regex. I'm looking for the fastest implementation of completing this task.
If you want to count the words, you should use a string-to-int map. Read a word from your file into a string using >>, then increment the map accordingly:
string word;
map<string,int> count;
ifstream input("file.txt");
while (input >> word) { // test the read itself, so a failed final read is never counted
    count[word]++;
}
Using >> has the benefit that you don't have to worry about whitespace.
It all depends on the definition of a word: is it a string separated from the others by whitespace? Or are other word separators (e.g. comma, dot, semicolon, colon, parentheses...) relevant as well?
How to parse for words without regex:
Here is an acceptable approach using find() and its variant find_first_of():
string myline; // line to be parsed
string what = "3"; // string to be found
string separator = " \t\n,;.:()[]"; // word separators
while (getline(cin, myline)) {
    size_t nxt = 0;
    while ((nxt = myline.find(what, nxt)) != string::npos) { // search occurrences of what
        if (nxt == 0 || separator.find(myline[nxt-1]) != string::npos) { // if at the beginning of a word
            size_t nsep = myline.find_first_of(separator, nxt+1); // check whether it runs to the end of the word
            if ((nsep == string::npos && myline.length()-nxt == what.length()) || nsep-nxt == what.length()) {
                cout << "Line: " << myline << endl; // bingo!
                cout << "from pos " << nxt << " to " << nsep << endl;
            }
        }
        nxt++; // ready for the next occurrence
    }
}
The principle is to check whether each occurrence found corresponds to a whole word, i.e. it starts at the beginning of the line or of a word (the previous character is a separator) and it extends to the next separator (or the end of the line).
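The same principle can be packaged as a small predicate. This is only a sketch: the name containsWholeWord and the default separator list are choices made for the example, and an empty search string is assumed never to match.

```cpp
#include <cstddef>
#include <string>

// Sketch: true if `what` occurs in `line` as a whole word, i.e. bounded on both
// sides by a separator or by the start/end of the line.
bool containsWholeWord(const std::string& line, const std::string& what,
                       const std::string& separators = " \t\n,;.:()[]") {
    if (what.empty())
        return false;
    std::size_t pos = 0;
    while ((pos = line.find(what, pos)) != std::string::npos) {
        const std::size_t end = pos + what.size();
        const bool startsWord =
            pos == 0 || separators.find(line[pos - 1]) != std::string::npos;
        const bool endsWord =
            end == line.size() || separators.find(line[end]) != std::string::npos;
        if (startsWord && endsWord)
            return true;
        ++pos;   // try the next occurrence
    }
    return false;
}
```

Applied to the word bank in the question with what set to "3", only the entry 3 matches; 93, 5434 and 789540132143 are rejected because the digit is not bounded on both sides. If every entry sits alone on its own line, a plain equality test (entry == what) would do the same job with less work.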
How to solve your real problem:
You can have the fastest word search function, but if you use it to solve your problem of counting words, as you explained in your comment, you'll waste a lot of effort!
The best way to achieve this would certainly be to use a map<string, int> to store and update a counter for each string encountered in the file.
You then just have to parse each line into words (you could use find_first_of() as suggested above) and use the map:
mymap[word]++;
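A possible sketch of that parsing step is shown below. The function name countWordsInLine and the default separator list are assumptions made for the example; find_first_not_of() is used alongside find_first_of() to locate where each word starts.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Sketch: split one line on the given separators and update the counters.
void countWordsInLine(const std::string& line,
                      std::map<std::string, int>& counts,
                      const std::string& separators = " \t\n,;.:()[]") {
    std::size_t start = line.find_first_not_of(separators);    // start of first word
    while (start != std::string::npos) {
        std::size_t end = line.find_first_of(separators, start);   // separator after it
        std::size_t len = (end == std::string::npos) ? std::string::npos : end - start;
        counts[line.substr(start, len)]++;                      // operator[] starts new words at 0
        start = line.find_first_not_of(separators, end);       // npos once the line is exhausted
    }
}
```

Calling it once per line read with getline() fills the map for the whole file in a single pass, so no separate search per word is needed.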