How to count how many words are in a line? Smarter way? - c++

How can I find out how many words are in a line? I know the method where you count the spaces. But what if someone types two spaces in a row, or starts the line with a space?
Is there another or smarter way to solve this?
Are there any remarks on my approach or my code?
I solved it like this:
#include <iostream>
#include <cctype>
#include <cstring>
using namespace std;
int main( )
{
char str[80];
cout << "Enter a string: ";
cin.getline(str,80);
int len;
len=strlen(str);
int words = 0;
for(int i = 0; str[i] != '\0'; i++) //is space after character
{
if (isalpha(str[i]))
{
if(isspace(str[i+1]))
words++;
}
}
if(isalpha(str[len]))
{
words++;
}
cout << "The number of words = " << words+1 << endl;
return 0;
}

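The standard-library version is nearly a one-liner. The stream must be a named object, because istream_iterator takes it by non-const reference and cannot bind to a temporary (this needs <sstream> and <iterator>):
istringstream iss(str);
words = distance(istream_iterator<string>(iss), istream_iterator<string>());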

Streams skip whitespace by default, including runs of several spaces.
So if you do something like:
string word;
int numWords = 0;
while (cin >> word) ++numWords;
That counts the words in simple cases (it skips whitespace and does not care what a word looks like).
To count per line, read the line first, create a stream from that string, and do the same thing:
string line, word;
int wordCount = 0;
getline(cin, line);
stringstream lineStream(line);
while (lineStream >> word) ++wordCount;
Prefer the free function std::getline over cin.getline. It reads into a std::string that grows as needed, so there is no fixed buffer size to get wrong and long lines are not truncated.

First, you need a very specific definition of "word." Most of the answers will give slightly different counts than your attempt because you're using different definitions of what constitutes a word. Your example specifically requires alpha characters in certain positions. The answers based on streams will allow any non-space character to be part of a word.
The general solution is to come up with a precise definition of a word, transform this into a regular expression or finite state machine, and then count each instance of a match.
Here's a sample state machine solution:
std::size_t CountWords(const std::string &line) {
std::size_t count = 0;
enum { between_words, in_word } state = between_words;
for (const unsigned char c : line) { // unsigned char: std::isalpha and std::isspace are undefined for negative char values
switch (state) {
case between_words:
if (std::isalpha(c)) {
state = in_word;
++count;
}
break;
case in_word:
if (std::isspace(c)) state = between_words;
break;
}
}
return count;
}
Some test cases to consider (and that highlight the differences among the definitions of a word):
"" empty string
" " just spaces
"a"
" one "
"count two"
"hyphenated-word"
"\"That's Crazy!\" she said." punctuation between alpha characters and adjacent spaces
"the answer is 42" should the number count as a word?

Related

C++ : find word in a string, count how many times was found, then print meaning of the word

I'm working on an assignment and I'm at my wits' end. Right now I can't figure out what's missing or what I could change.
I need the program to read a file. If it finds a word that begins with the search string, it should print the word and its meaning. If it finds more than one such word, it should print only the words, without their meanings.
Right now, if the program finds several words, it prints the meaning for the first one and only the word for the others.
I don't know what other loop I could use. If you could help me, I would be grateful.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include<bits/stdc++.h>
using namespace std;
int main()
{
ifstream dictionary("dictionary.txt");
if(!dictionary.is_open()){
cout<< "File failed to open" << endl;
return 0;
}
int option;
cout << "1.<starting>" << endl;
cout << "4.<stop>" << endl;
cin >> option;
string find_word;
string word, meaning;
string line;
string found;
int count = 0;
if (option == 1)
{
cout << "Find the meaning of the word beginning with the characters:";
cin >> find_word;
while (getline(dictionary,line))
{
stringstream ss(line);
getline (ss, word, ';');
getline (ss, meaning, ';');
if (word.rfind(find_word, 0) != string::npos)
{
count++;
if (count <=1)
{
found = word + meaning;
cout << found << endl;
}
if (count >= 2)
{
found = word ;
cout << found << endl;
}
}
}
}
if (option == 4)
{
return 0;
}
dictionary.close();
return 0;
}
EDIT
dictionary.txt looks like this:
attention; attentionmeaning
attention; attentionmeaning2
computer; computermeaning
criminal; criminalmeaning
boat; boatmeaning
alien; alienmeaning
atter; meaning
.
.
etc.
For example input is:
Find the meaning of the word beginning with the characters: att
This is what I get now (output):
attention attentionmeaning
attention
atter
This is what I expect (desired output):
attention
attention
atter
If the program finds only one matching word, it should print this:
Find the meaning of the word beginning with the characters: bo
output:
boat boatmeaning
As was already suggested, while reading the file you don't know whether more than one entry will match your search term. So you need some intermediate structure to store all the matching entries.
After you have gathered all the results, you can easily check whether there is more than one, in which case you print only the "word" without the meaning. If there is only one result, you print the "word" together with its meaning.
The code for that could look something like this:
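#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
struct Entry {
std::string name;
std::string meaning;
// True only if name begins with str; find() would also match str in the middle of name.
bool startsWith(const std::string& str) const {
return name.rfind(str, 0) == 0;
}
};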
Entry createEntry(const std::string& line) {
Entry entry;
std::stringstream ss(line);
std::getline(ss, entry.name, ';');
std::getline(ss, entry.meaning, ';');
return entry;
}
int main() {
std::string query = "att";
std::ifstream dictionary("dictionary.txt");
std::vector<Entry> entries;
std::string line;
while (std::getline(dictionary, line)) {
Entry entry = createEntry(line);
if (entry.startsWith(query)) {
entries.emplace_back(std::move(entry));
}
}
for (const Entry& entry : entries) {
std::cout << entry.name << (entries.size() > 1 ? "\n" : " " + entry.meaning + '\n');
}
}
This code could definitely be more optimized, but for the sake of simplicity, this should suffice.
The problem is that the first time through the loop you do not know whether one or more matching words follow. I would suggest creating an empty list outside the loop and pushing every matching word and meaning pair onto it. After the loop, if the size of the list is 1, output the word and meaning pair; otherwise loop through the list and print just the words.

Replacing word in position X of a string C++

I am trying to write a function that takes a string, a word, and an integer, and uses the integer as the index and the word as the replacement value. For example, if the string is "This is a test.", the word is "example", and the number is 4, then the result would be "This is an example". This is what I have so far (I had to make multiple copies of the string because eventually I am going to pass it into two other functions by reference instead of by value). Right now it uses the character index instead of the word index for the replacement. How do I fix that?
#include "pch.h"
#include<iostream>
#include<string>
#include<sstream>
using namespace std;
int main()
{
string Input = "";
string Word = "";
int Number = 0;
cout << "Pleas enter a string using only lower case letters. \n";
getline(cin, Input);
cout << "Please enter a word using only lower case lettersS. \n";
getline(cin, Word);
cout << "Please enter a number. \n";
cin >> Number;
string StringCopy1 = Input;
string StringCopy2 = Input;
string StringCopy3 = Input;
}
void stringFunctionValue(string StringCopy1, int Number, string Word)
{
StringCopy1.replace(Number, Word.length, Word);
return StringCopy1;
}
First thing you have to do is find the nth word.
The first thing to come to mind is using std::istringstream to pull the string apart with >> and a std::ostringstream to write the new string.
std::istringstream in(StringCopy1);
std::string token;
std::ostringstream out;
int count = 0;
while (in >> token) // while we can get more tokens
{
if (++count != Number) // not the position of the token to replace
{
out << token << " "; // write the token
}
else
{
out << Word << " "; // write the replacement word
}
}
}
return out.str();
While this is easy to write, it has two problems: It loses the correct type of whitespace in the string AND places an extra space on the end of the string. It's also kind of slow and uses a lot more memory than if you modify the string in place.
Alternatively, use std::string::find_first_not_of to find the first non-whitespace character. This will be the start of the first word. Then use std::string::find_first_of to find the next whitespace character. This will be the end of the word. Alternate back and forth finding non-whitespace then whitespace until you find the beginning and ending of the nth word. std::string::replace that word. This approach requires more code, and more complicated code, but is much more satisfying. This is why I outlined it rather than fully implementing it: to leave you the joy of writing it yourself.
Note: void stringFunctionValue(string StringCopy1, int Number, string Word) gives you no way to provide the result back to the user. This makes for an unhelpful function. Consider returning a string rather than void.

Splitting sentences and placing in vector

I was given code by my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input, split them into sentences at periods, and put those sentences into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.
This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.
You have:
string l; // string to hold a line
cin >> l; // get line
The second line does not read a line: operator>> skips leading whitespace and reads a single whitespace-delimited word. To read a whole line of text, use:
std::getline(std::cin, l);
It's hard to tell whether that is what trips up your code, since you haven't posted any sample input.
I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
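while (std::getline(std::cin, temp, '.')) {
// Skip whitespace-only pieces (e.g. the newline after the final period);
// substr(npos) below would otherwise throw std::out_of_range.
if (temp.find_first_not_of(" \t\n") != std::string::npos)
s.push_back(temp);
}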
std::transform(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"),
[](std::string const &s) { return s.substr(s.find_first_not_of(" \t\n")); });
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

How to input text containing less than 1000 words with spaces and punctuations?

I am able to input string using the following code:
string str;
getline(cin, str);
But I want to know how to put an upper limit on the number of words that can be given as input.
You cannot do what you are asking with just getline or even read. If you want to limit the number of words, you can use a simple for loop and the stream extraction operator (>>).
#include <cstddef>
#include <iostream>
#include <vector>
#include <string>
int main()
{
std::string word;
std::vector<std::string> words;
for (size_t count = 0; count < 1000 && std::cin >> word; ++count)
words.push_back(word);
}
This will read up to 1000 words and stuff them into a vector.
getline() reads characters and has no notion of what a word is. The definition of a word is likely to change with context and language. You'll need to read a stream one character at a time, extracting words that match your definition of a word and stop when you have met your limit.
You can either read one character at a time, or only process 1000 characters from your string(s).
You may be able to set a limit on std::string and use that.
The following reads only the first count whitespace-separated words into a vector and discards the rest (it still consumes the input to the end).
Punctuation is read as part of a word, because words are delimited only by whitespace; you need to strip it from the vector's elements yourself.
std::vector<std::string> v;
int count=1000;
std::copy_if(std::istream_iterator<std::string>(std::cin),
// can use a ifstream here to read from file
std::istream_iterator<std::string>(),
std::back_inserter(v),
[&](const std::string & s){return --count >= 0;}
);
Hope this program helps you out. This code also handles multiple words entered on a single line.
#include<iostream>
#include<string>
#include<cstring> // strtok
#include<cstdio> // getchar
using namespace std;
int main()
{
const int LIMIT = 5;
int counter = 0;
string line;
string words[LIMIT];
bool flag = false;
char* word;
do
{
cout<<"enter a word or a line";
getline(cin,line);
word = strtok(&line[0], " "); // strtok writes into its argument, so pass the string's own buffer, not c_str()
while(word)
{
if(LIMIT == counter)
{
cout<<"Limit reached";
flag = true;
break;
}
words[counter] = word;
word = strtok(NULL," ");
counter++;
}
if(flag)
{
break;
}
}while(counter>0);
getchar();
}
As of now, this program has the limit to accept only 5 words and put it in a string array.
Use the following function:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684961%28v=vs.85%29.aspx
You can specify the third argument to limit the amount of read characters.

Counting occurrences of letter in a file

I'm trying to count the number of times each letter appears in a file. When I run the code below it counts "Z" twice. Can anyone explain why?
The test data is:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
#include <iostream> //Required if your program does any I/O
#include <iomanip> //Required for output formatting
#include <fstream> //Required for file I/O
#include <string> //Required if your program uses C++ strings
#include <cmath> //Required for complex math functions
#include <cctype> //Required for letter case conversion
using namespace std; //Required for ANSI C++ 1998 standard.
int main ()
{
string reply;
string inputFileName;
ifstream inputFile;
char character;
int letterCount[127] = {};
cout << "Input file name: ";
getline(cin, inputFileName);
// Open the input file.
inputFile.open(inputFileName.c_str()); // Need .c_str() to convert a C++ string to a C-style string
// Check the file opened successfully.
if ( ! inputFile.is_open())
{
cout << "Unable to open input file." << endl;
cout << "Press enter to continue...";
getline(cin, reply);
exit(1);
}
while ( inputFile.peek() != EOF )
{
inputFile >> character;
//toupper(character);
letterCount[static_cast<int>(character)]++;
}
for (int iteration = 0; iteration <= 127; iteration++)
{
if ( letterCount[iteration] > 0 )
{
cout << static_cast<char>(iteration) << " " << letterCount[iteration] << endl;
}
}
system("pause");
exit(0);
}
As others have pointed out, you have two Qs in the input. The reason you have two Zs is that the last
inputFile >> character;
(probably when only a newline character is left in the stream, so peek() does not yet return EOF) fails to extract anything, leaving the 'Z' from the previous iteration in character. Check inputFile.fail() after the extraction to see this:
while (inputFile.peek() != EOF)
{
inputFile >> character;
if (!inputFile.fail())
{
letterCount[static_cast<int>(character)]++;
}
}
The idiomatic way to write the loop, and which also fixes your 'Z' problem, is:
while (inputFile >> character)
{
letterCount[static_cast<int>(character)]++;
}
There are two Q's in your uppercase string. I believe the reason you get two counts for Z is that you should check for EOF after reading the character, not before, but I am not sure about that.
Well, others already have pointed out the error in your code.
But here is one elegant way you can read the file and count the letters in it:
struct letter_only: std::ctype<char>
{
letter_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
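// Mark only the letters; a single fill from 'A' to 'z' would also include [ \ ] ^ _ `
std::fill(&rc['A'], &rc['Z'+1], std::ctype_base::alpha);
std::fill(&rc['a'], &rc['z'+1], std::ctype_base::alpha);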
return &rc[0];
}
};
struct Counter
{
std::map<char, int> letterCount;
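void operator()(char item)
{
// The imbued facet classifies non-letters as whitespace, so the stream has already skipped them;
// comparing a char with a ctype mask value would not filter anything.
++letterCount[tolower(item)]; //remove tolower if you want case-sensitive solution!
}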
operator std::map<char, int>() { return letterCount ; }
};
int main()
{
ifstream input;
input.imbue(std::locale(std::locale(), new letter_only())); //read letters only
input.open("filename.txt");
istream_iterator<char> start(input);
istream_iterator<char> end;
std::map<char, int> letterCount = std::for_each(start, end, Counter());
for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
{
cout << it->first <<" : "<< it->second << endl;
}
}
This is a modified (untested) version of this solution:
Elegant ways to count the frequency of words in a file
For one thing, you do have two Q's in the input.
Regarding Z, @Jeremiah is probably right in that it is doubly counted due to it being the last character, and your code not detecting EOF properly. This can be easily verified by e.g. changing the order of input characters.
As a side note, here
for (int iteration = 0; iteration <= 127; iteration++)
your index goes out of bounds; either the loop condition should be iteration < 127, or your array declared as int letterCount[128].
Given that you apparently only want to count English letters, it seems like you should be able to simplify your code considerably:
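#include <cctype>
#include <fstream>
#include <iostream>
int main(int argc, char **argv) {
if (argc < 2) return 1; // expects the input file name as the first argument
std::ifstream infile(argv[1]);
char ch;
static int counts[26];
while (infile >> ch)
if (std::isalpha(static_cast<unsigned char>(ch)))
++counts[std::tolower(static_cast<unsigned char>(ch))-'a'];
for (int i=0; i<26; i++)
std::cout << static_cast<char>('A' + i) << ": " << counts[i] <<"\n"; // cast so the letter is printed, not its integer code
return 0;
}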
Of course, there are quite a few more possibilities. Compared to @Nawaz's code (for example), this is obviously quite a bit shorter and simpler -- but it's also more limited (e.g., as it stands, it only works with un-accented English characters). It's pretty much restricted to the basic ASCII letters -- EBCDIC encoding, ISO 8859-x, or Unicode will break it completely.
His also makes it easy to apply the "letters only" filtration to any file. Choosing between them depends on whether you want/need/can use that flexibility or not. If you only care about the letters mentioned in the question, and only on typical machines that use some superset of ASCII, this code will handle the job more easily -- but if you need more than that, it's not suitable at all.