Number of Words in a file, c++ [duplicate] - c++

This question already has answers here:
Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?
(5 answers)
Closed 7 years ago.
I am trying to count the number of words in a file. I know this question has been asked before, and I have tried some of the implementations I have seen, but I keep getting the wrong count.
The line in the file I am reading is "Super Chill", but when I run the code I get a count of 3: >> reads "Super" the first time and then "Chill" twice. I have a couple of questions regarding this method:
1) What does while (in) look for? How does it know when to stop?
2) Why is "Chill" getting stored twice with >>?
Here is the code
int countWords(std::istream& in){ // line in file is -> Super Chill
int count = 0;
std::string word;
while (in) {
in >> word;
if (word != "") {
count+= 1;
}
}
return count;
}

while (in) checks if no error has occurred. It's the same as writing while (!in.fail())
After you call in >> word and get the first "Chill", while (in) still is true, until the next call to in >> word. When you hit in >> word again it fails because it's at the end of the file and doesn't write anything to word, but the word variable still has "Chill" in it from the last time, so you count it a second time. Then the while (in) finally fails on the next iteration.
Calling while (in >> word) { ++count; } works because in >> word is actually a call to the function operator>>(in, word), which returns an istream&, and an istream has an operator bool conversion that lets you use it in a condition instead of writing !in.fail(). Sort of roundabout, I know. The point is that it calls in >> word, then checks if (in), and only if that passes does it run ++count and iterate again. Your original technique, by contrast, counted the previous word even when in >> word failed.
To make this clearer, it might help to know that changing your original code's if statement to if (in) would have also worked, but would be sort of bad code.
As a final conclusion, the entire function could be written as:
int countWords(std::istream& in) {
int count = 0;
for (std::string word; in >> word; ++count) {}
return count;
}

I see you've already gotten one solution to the problem you posted. You might want to consider another possibility though:
int countWords(std::istream& in){
return std::distance(std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>());
}
This doesn't actually eliminate the loop, but it hides it inside of std::distance where it's pretty difficult to mess things up.

Related

infinite loop in the first function dont know how to fix [duplicate]

This question already has answers here:
Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?
(5 answers)
Closed 4 years ago.
I keep getting stuck in an infinite loop and don't know where my logic went wrong.
I used while (!eof()) and don't know what else is missing. Adding a break statement didn't do anything except print my test statement once.
void readSetupData(string sfile, string efile, string afile, students
studArray[])
{
ifstream inS(sfile.c_str());
ifstream inA(afile.c_str());
ifstream inE(efile.c_str());
int numStudents = 0;
while (!inS.eof())
{
cout << "BEEP" << endl;
int id;
int examScore;
string name;
inS >> name >> studArray[numStudents].id >>
studArray[numStudents].name;
int examId;
inE >> id >> examId >> examScore;
int studentIndex = findStudent(id, studArray);
int examIndex = findExam(examId,
studArray[studentIndex].examArray);
studArray[studentIndex].examArray[examIndex].pointsScored
=
examScore;
int pointsAvail =
studArray[studentIndex].examArray[examIndex].pointsAvail;
studArray[studentIndex].examArray[examIndex].percentage =
(float)examScore / (float)pointsAvail;
}
while (!inA.eof())
{
int id;
int assignId;
int assignScore;
inA >> id >> assignId >> assignScore;
int studentIndex = findStudent(id, studArray);
int assignIndex = findAssignment(assignId,
studArray[studentIndex].assignArray);
studArray[studentIndex].assignArray[assignIndex].pointsScored
= assignScore;
}
}
The first void function is the problem: the test statement BEEP is printed repeatedly when the program is compiled and run with
./a.out student_info.txt exam_info assignment_info.txt exam_scores.txt assignment_scores grades.out
You are expecting eof to predict the future and assure you that a subsequent read won't encounter an end-of-file. That's just not what it does. Don't use the eof function at all.
You have no handling for any errors except to repeat the operation that failed which, with near certainty, will fail again and repeat the cycle. Instead, check each and every operation to see whether it succeeded or failed. If it fails, don't just keep going because it will just fail again leading to an endless loop.
One of the first things you should do when you don't understand why your code is doing what it's doing is add error checking to every single library call or function call that can possibly error in any conceivable way.

Ifstream in c++

I need some help with a code.
I need to take this information to my c++ code from another file, the last one is just like this:
Human:3137161264 46
This is what I wrote for it, it takes the word "Human" correctly but then it takes random numbers, not the ones written on the file I just wrote:
struct TSpecie {
string id;
int sizeGen;
int numCs; };
TSpecie readFile(string file){
TSpecie a;
ifstream in(file);
if (in){
getline(in,a.id,':');
in >> a.sizeGen;
in >> a.numCs;
}
else
cout << "File not found";
return a; }
Hope you can solve it and thanks for your help
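3137161264 does not fit in a 32-bit int (maximum 2147483647), so in >> a.sizeGen fails. Since C++11 the failed extraction stores the largest int value in a.sizeGen and sets failbit; the following read into a.numCs then does nothing and leaves it uninitialized, which is where the "random" numbers come from.
So unsigned int sizeGen would be enough for this value where unsigned int is 32 bits, but consider long long or unsigned long long too.
Edit 1: As pointed out by @nwp in the comments to your question, you can also check your stream to see whether any error has occurred: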
//read something and then
if (!in) {
// error occurred during the last read
}
Always test whether input was successful after reading from the stream:
if (std::getline(in, a.id, ':') >> a.sizeGen >> a.numCs) {
// ...
}
Most likely the input just failed: for example, the first number probably can't be read successfully. Also note that std::getline() is an unformatted input function, i.e., it won't skip leading whitespace. The newline after the last number read is therefore still in the stream; because your use of std::getline() stops at a colon, a later read will not come back empty, but it will produce an odd ID that starts with that newline.

How to use input stream overloading to insert item to map member in class?

I have C++ class Question to hold data from a file questions.txt of multiple choice questions and answers:
Update: I have updated my operator>> overload, but it only inserts the first of the two multiple-choice questions (it reads only the first Question).
Data in file questions.txt:
A programming language is used in this Course? 3
1. C
2. Pascal
3. C++
4. Assembly
What compiler can you use to compile the programs in this Course? 4
1. Dev-C++
2. Borland C++Builder
3. Microsoft Visual C++
4. All of the above
I'm trying to insert the multiple answers into a map. I just want to ask how to overload operator>> to iterate over multiple answers to insert them into a map:
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
class Question
{
string question;
int correctIndex;
map<int,string> answers;
friend std::istream &operator>>(std::istream &is, Question &q) {
getline(is, q.question, '?'); // stops at '?'
is>> q.correctIndex;
string line;
while (getline(is, line) && line.empty()) // skip leading blank lines
;
while (getline(is,line) && !line.empty()) // read until blank line
{
int id;
string ans;
char pt;
stringstream sst(line); // parse the line;
sst>>id>>pt; // take number and the following point
if (!sst || id==0 || pt!='.')
cout << "parsing error on: "<<line<<endl;
else {
getline (sst, ans);
q.answers[id] = ans;
}
}
return is;
}
};
int main()
{
ifstream readFile("questions.txt");//file stream
vector<Question> questions((istream_iterator<Question>(readFile)), istream_iterator<Question>());
}
There are two issues with your code: skipping the first answer and reading through the end of the file.
In this pair of loops:
while (getline(is, line) && line.empty()) // skip leading blank lines
;
while (getline(is,line) && !line.empty()) // read until blank line
{
the first non-empty line will terminate the first loop, but then you immediately call getline() again without having used its contents. This skips the first answer choice. You'll want to make sure that you don't call getline() again before processing that line. Something like...
// skip leading blank lines
while (getline(is, line) && line.empty()) {
;
}
for (; is && !line.empty(); getline(is, line)) {
// ...
}
But the second and bigger problem is that if you read through the end of the file (as your code does right now), the last operator>> leaves the istream in a failed state at eof(), and istream_iterator then discards the last Question that you streamed. This is tricky since you have variable-length input: we don't know that we've run out of input until we've actually run out of input.
Thankfully, we can do everything quite a bit simpler. First, instead of reading off the end of the input to trigger the error, we'll use the first read to cause us to stop:
friend std::istream &operator>>(std::istream &is, Question &q) {
if (!getline(is, q.question, '?')) { // stops at '?'
return is;
}
This way, if we hit EOF early, we stop early. For the rest, we can simplify the reading greatly by using the skipws manipulator. Instead of manually looping through the empty lines (which is hard to do right, as per your initial bug), we can let operator>> do this for us by just skipping ahead.
When we run out of things to read, we just clear the error flags, since we don't want fail() (set if we try to read the next index and it's actually the next question) or eof() (set when we're done) to remain triggered.
Altogether:
friend std::istream &operator>>(std::istream &is, Question &q) {
if (!getline(is, q.question, '?')) { // stops at '?'
return is;
}
is >> q.correctIndex;
int id;
char pt;
string ans;
is >> skipws;
while (is >> id >> pt && getline(is, ans)) {
q.answers[id] = ans;
}
// keep the bad bit, clear the rest
is.clear(is.rdstate() & ios::badbit);
return is;
}
Now that's also a little incomplete. Perhaps you want to indicate error if you don't read into answers anything that matched correctIndex? In that case, you would set the ios::failbit too.
First improvement
When operator>> is used for a string, it stops at the first whitespace separator. So to read the question correctly you should consider:
friend std::istream &operator>>(std::istream &is, Question &q) {
    getline(is, q.question, '?'); // stops at '?'
    is >> q.correctIndex;
    ... // to do: read the answers (see below)
    return is;
}
You could consider a similar approach for reading each answer, starting with its id. Unfortunately, using operator>> on an int will not let us detect the last answer cleanly: the read attempt fails when it meets the non-numeric text that starts the next question.
The problem with the format
The format that you use has some ambiguities:
Are blank lines mandatory, marking the beginning and end of the answers? In this case the last question is invalid (an end-of-answers marker is missing).
Or are the blank lines optional, to be ignored? In this case, the first char determines whether it's the start of a new question (non-numeric) or a new answer (numeric).
Or is it always expected that there are exactly 4 answers for a question?
Alternative 1: a blank line marks end of question
The idea is to read line by line and parsing each line separately:
...
string line;
while (getline(is, line) && line.empty()) // skip leading blank lines
    ;
do // read until blank line
{
    int id;
    string ans;
    char pt;
    stringstream sst(line); // parse the line
    sst >> id >> pt; // take the number and the following point
    if (!sst || id == 0 || pt != '.')
        cout << "parsing error on: " << line << endl;
    else {
        getline(sst, ans);
        q.answers[id] = ans;
    }
} while (getline(is, line) && !line.empty()); // the only place the next line is read
Attention: under this hypothesis, a missing end-of-answers blank line will cause the reading of the last question to fail. Ideally, you'd issue an error message to clarify (e.g. unexpected end of file). Correcting the input file by adding a final blank line (an empty line ended with a newline) will work.
Alternative 2: test first char of line to see if it's still next answer
The other alternative peeks the first character to read in order to check if it is an answer (starts with a digit), an empty line (to be skipped) and if not, it exits the loop.
...
string line, ans;
int c, id;
char pt;
while ((c = is.peek())!=EOF && (isdigit(c) || c=='\n')) { // test first char without reading it
getline(is, line);
if (!line.empty()) {
stringstream sst(line);
... // parse the line as above
}
}
}
With this option, the requirement is that the answers end with a newline (i.e. a trailing '\n'). An unfinished line interrupted by EOF will cause the last question to be ignored as failed.

Simple C++ not reading EOF

I'm having a hard time understanding why while (cin.get(Ch)) doesn't see the EOF. I read in a text file with 3 words, and when I debug my WordCount is at 3 (just what I hoped for). Then it goes back to the while loop and gets stuck. Ch then has no value. I thought that after the newline it would read the EOF and break out. I am not allowed to use <fstream>, I have to use redirection in DOS. Thank you so much.
#include <iostream>
using namespace std;
int main()
{
char Ch = ' ';
int WordCount = 0;
int LetterCount = 0;
cout << "(Reading file...)" << endl;
while (cin.get(Ch))
{
if ((Ch == '\n') || (Ch == ' '))
{
++WordCount;
LetterCount = 0;
}
else
++LetterCount;
}
cout << "Number of words => " << WordCount << endl;
return 0;
}
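while (cin >> noskipws >> Ch)
{ // we get in here if, and only if, the >> was successful
    if ((Ch == '\n') || (Ch == ' '))
    {
        ++WordCount;
        LetterCount = 0;
    }
    else
        ++LetterCount;
}
That's a safe and common way to write the loop with minimal changes. Note that >> skips whitespace by default, so noskipws is needed here for the ' ' and '\n' tests to ever match; your original cin.get(Ch) condition is equally valid.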
(Your code is unusual, trying to scan all characters and count whitespace and newlines. I'll give a more general answer to a slightly different question - how to read in all the words.)
The safest way to check whether a stream is still usable is if(stream). Beware of if(stream.good()): it doesn't always work as expected and will sometimes quit too early. The last >> into a char will not take us to EOF, but the last >> into an int or string will take us to EOF. This inconsistency can be confusing. Therefore, it is not correct to use good(), or any other test that tests EOF.
string word;
while(cin >> word) {
++word_count;
}
There is an important difference between if(cin) and if(cin.good()). The former is the operator bool conversion. Usually, in this context, you want to test:
"did the last extraction operation succeed or fail?"
This is not the same as:
"are we now at EOF?"
After the last word has been read by cin >> word, the stream is at EOF. But the word is still valid and contains the last word.
TLDR: The eof bit is not important. The fail bit (together with the bad bit) is: it tells us that the last extraction was a failure.
The Counting
The program counts newline and space characters as words. In your file contents "this is fun!" I see two spaces and no newline. This is consistent with the observed output indicating two words.
Have you tried looking at your file with a hex editor or something similar to be sure of the exact contents?
You could also change your program to count one more word if the last character read in the loop was a letter. This way you don't have to have newline terminated input files.
Loop Termination
I have no explanation for your loop termination issues. The while-condition looks fine to me. istream::get(char&) returns a stream reference. In a while-condition, depending on the C++ level your compiler implements, operator bool or operator void* will be applied to the reference to indicate if further reading is possible.
Idiom
The standard idiom for reading from a stream is
char c = 0;
while( cin >> c )
process(c);
I do not deviate from it without serious reason.
Your input file is
this is fun!{EOF}
The two spaces increase WordCount to 2, and then EOF exits the loop. If you add a newline, your input file is
this is fun!\n{EOF}
and the newline brings WordCount to 3.
I took your program, loaded it into Visual Studio 2013, and changed cin to an fstream object that opened a file called stuff.txt containing the exact characters "This is fun!\n\r", and the program worked. As previous answers have indicated, be careful, because if there's no \n at the end of the text the program will miss the last word. However, I wasn't able to replicate the application hanging in an infinite loop. The code as written looks correct to me.
cin.get(char) returns a reference to an istream object, which then has its operator bool() called; that returns false when failbit or badbit is set. There are some better ways to write this code to deal with other error conditions... but this code works for me.
In your case, the correct way to bail out of the loop is:
while (cin.good()) {
char Ch = cin.get();
if (cin.good()) {
// do something with Ch
}
}
That said, there are probably better ways to do what you're trying to do.

Cleaning a string of punctuation in C++

OK, so before I even ask my question I want to make one thing clear. I am currently a student at NIU for Computer Science and this does relate to one of my assignments for a class there. So if anyone has a problem with that, read no further and just go on about your business.
Now, for anyone who is willing to help, here's the situation. For my current assignment we have to read a file that is just a block of text. For each word in the file we are to clear any punctuation in the word (e.g. "can't" would end up as "can" and "that--to" would end up as "that", without the quotes; the quotes are only there to mark the examples).
The problem I've run into is that I can clean the string fine and then insert it into the map that we are using, but for some reason the code I have written allows an empty string to be inserted into the map. I've tried everything I can come up with to stop this from happening, and the only thing that works is to use the erase method of the map itself.
So what I am looking for is two things: any suggestions about how I could a) fix this without simply erasing the entry and b) improve the code I have already written.
Here are the functions I have written to read in from the file and then the one that cleans it.
Note: the function that reads in from the file calls the clean_entry function to get rid of punctuation before anything is inserted into the map.
Edit: Thank you, Chris. Numbers are allowed :). If anyone has improvements to the code I've written or criticisms of something I did, I'll listen. At school we really don't get feedback on the correct, proper, or most efficient way to do things.
int get_words(map<string, int>& mapz)
{
int cnt = 0; //set out counter to zero
map<string, int>::const_iterator mapzIter;
ifstream input; //declare instream
input.open( "prog2.d" ); //open instream
assert( input ); //assure it is open
string s; //temp strings to read into
string not_s;
input >> s;
while(!input.eof()) //read in until EOF
{
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
}
input.close(); //close instream
for(mapzIter = mapz.begin(); mapzIter != mapz.end(); mapzIter++)
cnt = cnt + mapzIter->second;
return cnt; //return number of words in instream
}
void clean_entry(const string& non_clean, string& clean)
{
int i, j, begin, end;
for(i = 0; isalnum(non_clean[i]) == 0 && non_clean[i] != '\0'; i++);
begin = i;
if(begin ==(int)non_clean.length())
return;
for(j = begin; isalnum(non_clean[j]) != 0 && non_clean[j] != '\0'; j++);
end = j;
clean = non_clean.substr(begin, (end-begin));
for(i = 0; i < (int)clean.size(); i++)
clean[i] = tolower(clean[i]);
}
The problem with empty entries is in your while loop. If you get an empty string, you clean the next one, and add it without checking. Try changing:
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
to
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() > 0)
{
mapz[not_s]++; //increment occurence
}
input >>s;
EDIT: I notice you are checking if the characters are alphanumeric. If numbers are not allowed, you may need to revisit that area as well.
Further improvements would be to
declare variables only when you use them, and in the innermost scope
use c++-style casts instead of the c-style (int) casts
use empty() instead of length() == 0 comparisons
use the prefix increment operator for the iterators (i.e. ++mapzIter)
A blank string is a valid instance of the string class, so there's nothing special about adding it into the map. What you could do is first check whether it's empty, and only increment if it is not:
if (!not_s.empty())
mapz[not_s]++;
Style-wise, there's a few things I'd change, one would be to return clean from clean_entry instead of modifying it:
string not_s = clean_entry(s);
...
string clean_entry(const string &non_clean)
{
string clean;
... // as before
if(begin ==(int)non_clean.length())
return clean;
... // as before
return clean;
}
This makes it clearer what the function is doing (taking a string, and returning something based on that string).
The function 'get_words' is doing a lot of distinct actions that could be split out into other functions. There's a good chance that by splitting it up into its individual parts, you would have found the bug yourself.
From the basic structure, I think you could split the code into (at least):
getNextWord: Return the next (non blank) word from the stream (returns false if none left)
clean_entry: What you have now
getNextCleanWord: Calls getNextWord, and if 'true' calls clean_entry. Returns 'false' if no words left.
The signatures of 'getNextWord' and 'getNextCleanWord' might look something like:
bool getNextWord (std::ifstream & input, std::string & str);
bool getNextCleanWord (std::ifstream & input, std::string & str);
The idea is that each function does a smaller more distinct part of the problem. For example, 'getNextWord' does nothing but get the next non blank word (if there is one). This smaller piece therefore becomes an easier part of the problem to solve and debug if necessary.
The main component of 'get_words' then can be simplified down to:
std::string nextCleanWord;
while (getNextCleanWord (input, nextCleanWord))
{
    ++mapz[nextCleanWord];
}
An important aspect to development, IMHO, is to try to Divide and Conquer the problem. Split it up into the individual tasks that need to take place. These sub-tasks will be easier to complete and should also be easier to maintain.