How can you ignore from cin until it goes to the next line? - c++

If you have a text file that you're reading character by character with cin:
char text;
cin >> text;
cout << text << endl;
Suppose you want to ignore any lines that start with ">" until the new line, how can you do that?

You can compare the char you read against '>'. Since it is a single character, a plain == comparison is enough; strncmp,
int strncmp ( const char * str1, const char * str2, size_t num );
is only needed when you are comparing C-strings. If you do find '>', keep reading and discarding characters until the character read equals '\n'.
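A minimal sketch of that skip, reading with std::cin.get() so that the whitespace and the newline are actually seen (operator>> would skip them); the echo to cout is just a placeholder for whatever you do with the kept characters:
#include <iostream>

int main() {
    bool atLineStart = true;
    char text;
    while (std::cin.get(text)) {
        if (atLineStart && text == '>') {
            // Discard the rest of this line, character by character.
            while (std::cin.get(text) && text != '\n')
                ;
            continue;                   // atLineStart stays true for the next line
        }
        atLineStart = (text == '\n');
        std::cout << text;              // keep characters from the other lines
    }
}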

Typically you want to use std::istream::ignore, something like:
static const int max_line = 65536;
")">
if (text == '>')
    std::cin.ignore(max_line, '\n');
Note that I've specified a maximum distance to skip of 64K bytes. Many people recommend something like std::numeric_limits<std::streamsize>::max(), which basically means to skip any amount of text until you find the delimiter (new-line, in this case).
IMO, specifying such a huge number is usually a poor idea -- if you go for too long without seeing a new-line, it's safe to stop and assume that you've gotten bad data. As soon as you've read enough to be reasonably certain there's a problem, it's better to stop and warn the user, rather than spending minutes with the program apparently locked up, reading gigabytes of useless data (and then probably giving the user an error message anyway).
Another possibility (especially if you're fairly sure you'll get good input) is to start by reading a full line (e.g., with std::getline), then if it starts with a >, just skip processing the line, and go back to read the next one.
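A minimal sketch of that line-based approach (the file name and the echo to cout are placeholders):
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream in("input.txt");    // placeholder file name
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '>')
            continue;                 // skip lines that start with '>'
        std::cout << line << '\n';    // process the line (here: just echo it)
    }
}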

Related

How exactly does the extraction operator >> work in C++

I am a computer science student, and so do not have much experience with the C++ language (this is my first semester using it), or with coding for that matter.
I was given an assignment to read integers from a text file in the simple form of:
19 3 -2 9 14 4
5 -9 -10 3
.
.
.
This sent me off on a journey to understand I/O operators better, since I am required to do certain things with this stream (duh.)
I was looking everywhere and could not find a simple explanation as to how the extraction operator>> works internally. Let me clarify my question:
I know that operator>> extracts one continuous element until it hits a space, tab, or newline. What I am trying to figure out is where the pointer(?) or read-location(?) will be AFTER it extracts an element. Will it be on the last char of the element just removed, or was that removed and therefore gone? Will it be on the space/tab/'\n' character itself? Perhaps at the beginning of the next element to extract?
I hope I was clear enough. I lack all the appropriate jargon to describe my problem clearer.
Here is why I need to know this: (in case anyone is wondering...)
One of the requirements is to sum all integers in each line separately.
I have created a loop to extract all integers one-by-one until it reaches the end of the file. However, I soon learned that operator>> ignores spaces/tabs/newlines. What I want to try is to extract an element with >>, and then use inputFile.get() to get the space/tab/newline. Then, if it's a newline, do what I gotta do.
This will only work if the stream pointer is in a good position to extract the space/tab/newline after the last extraction.
In my previous question, I tried to solve it using getline() and a stringstream.
SOLUTION:
For the sake of answering my specific question, of how operator>> works, I had to accept Ben Voigt's answer as the best one.
I have used the other solutions suggested here (using a stringstream for each line) and they did work! (You can see it in my previous question's link.) However, I implemented another solution using Ben's answer and it also worked:
.
.
.
if(readFile.is_open()) {
    while (readFile >> newInput) {
        char isNewLine = readFile.get();  // get() the next char after extraction
        if(isNewLine == '\n')             // This is just a test!
            cout << isNewLine;            // If it's a newline, feed a newline.
        else
            cout << "X" << isNewLine;     // Else, show X & feed a space or tab

        lineSum += newInput;
        allSum += newInput;
        intCounter++;
        minInt = min(minInt, newInput);
        maxInt = max(maxInt, newInput);

        if(isNewLine == '\n') {
            lineCounter++;
            statFile << "The sum of line " << lineCounter
                     << " is: " << lineSum << endl;
            lineSum = 0;
        }
    }
.
.
.
Disregarding my numerical values, the form is correct! Both spaces and '\n's were caught:
Thank you Ben Voigt :)
Nonetheless, this solution is very format-dependent and very fragile. If any of the lines has anything else before the '\n' (like a space or tab), the code will miss the newline char. Therefore, the other solution, using getline() and stringstreams, is much more reliable.
After extraction, the stream pointer will be placed on the whitespace that caused extraction to terminate (or other illegal character, in which case the failbit will also be set).
This doesn't really matter though, since you aren't responsible for skipping over that whitespace. The next extraction will ignore whitespaces until it finds valid data.
In summary:
leading whitespace is ignored
trailing whitespace is left in the stream
There's also the noskipws modifier which can be used to change the default behavior.
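A small sketch illustrating both points (the literal input text here is made up for the example):
#include <iostream>
#include <sstream>

int main() {
    std::istringstream in("  42\n7");
    int n;
    in >> n;                          // leading spaces skipped, stops at '\n'
    std::cout << n << '\n';           // 42
    std::cout << in.peek() << '\n';   // 10, i.e. the '\n' is still in the stream

    std::istringstream in2(" x");
    char c;
    in2 >> std::noskipws >> c;        // with noskipws the space itself is read
    std::cout << int(c) << '\n';      // 32
}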
The operator>> leaves the current position in the file one character beyond the last character extracted (which may be at end of file). That doesn't necessarily help with your problem, though; there can be spaces or tabs after the last value in a line. You could skip forward reading each character and checking whether it is a whitespace character other than '\n', but a far more idiomatic way of reading line-oriented input is to use std::getline to read the line, then initialize an std::istringstream to extract the integers from the line:
std::string line;
while ( std::getline( source, line ) ) {
    std::istringstream values( line );
    // ...
}
This also ensures that in case of a format error in the line, the error state of the main input is unaffected, and you can continue with the next line.
According to cppreference.com the standard operator>> delegates the work to std::num_get::get. This takes an input iterator. One of the properties of an input iterator is that you can dereference it multiple times without advancing it. Thus when a non-numeric character is detected, the iterator will be left pointing to that character.
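A quick way to see this (the literal "12abc" is a made-up example): after the numeric extraction stops, the stream is left positioned on the first non-numeric character.
#include <iostream>
#include <sstream>

int main() {
    std::istringstream in("12abc");
    int n;
    in >> n;                    // reads 12, stops at 'a'
    char c = in.get();          // 'a': extraction left the stream right here
    std::cout << n << ' ' << c << '\n';   // prints "12 a"
}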
In general, the behavior of an istream is not set in stone. There exist multiple flags to change how any istream behaves, which you can read about here. In general, you should not really care where the internal pointer is; that's why you are using a stream in the first place. Otherwise you'd just dump the whole file into a string or equivalent and manually inspect it.
Anyway, going back to your problem, a possible approach is to use the getline method provided by istream to extract a string. From the string, you can either manually read it, or convert it into a stringstream and extract tokens from there.
Example:
std::ifstream ifs("myFile");
std::string str;
while ( std::getline(ifs, str) ) {
    std::stringstream ss( str );
    double sum = 0.0, value;
    while ( ss >> value ) sum += value;
    // Process sum
}

Count the number of unique words and occurrence of each word

CSCI-15 Assignment #2, String processing. (60 points) Due 9/23/13
You MAY NOT use C++ string objects for anything in this program.
Write a C++ program that reads lines of text from a file using the ifstream getline() method, tokenizes the lines into words ("tokens") using strtok(), and keeps statistics on the data in the file. Your input and output file names will be supplied to your program on the command line, which you will access using argc and argv[].
You need to count the total number of words, the number of unique words, the count of each individual word, and the number of lines. Also, remember and print the longest and shortest words in the file. If there is a tie for longest or shortest word, you may resolve the tie in any consistent manner (e.g., use either the first one or the last one found, but use the same method for both longest and shortest). You may assume the lines comprise words (contiguous lower-case letters [a-z]) separated by spaces, terminated with a period. You may ignore the possibility of other punctuation marks, including possessives or contractions, like in "Jim's house". Lines before the last one in the file will have a newline ('\n') after the period. In your data files, omit the '\n' on the last line. You may assume that the lines will be no longer than 100 characters, the individual words will be no longer than 15 letters and there will be no more than 100 unique words in the file.
Read the lines from the input file, and echo-print them to the output file. After reaching end-of-file on the input file (or reading a line of length zero, which you should treat as the end of the input data), print the words with their occurrence counts, one word/count pair per line, and the collected statistics to the output file. You will also need to create other test files of your own. Also, your program must work correctly with an EMPTY input file – which has NO statistics.
Test file looks like this (exactly 4 lines, with NO NEWLINE on the last line):
the quick brown fox jumps over the lazy dog.
now is the time for all good men to come to the aid of their party.
all i want for christmas is my two front teeth.
the quick brown fox jumps over a lazy dog.
Copy and paste this into a small file for one of your tests.
Hints:
Use a 2-dimensional array of char, 100 rows by 16 columns (why not 15?), to hold the unique words, and a 1-dimensional array of ints with 100 elements to hold the associated counts. For each word, scan through the occupied lines in the array for a match (use strcmp()), and if you find a match, increment the associated count, otherwise (you got past the last word), add the word to the table and set its count to 1.
The separate longest word and the shortest word need to be saved off in their own C-strings. (Why can't you just keep a pointer to them in the tokenized data?)
Remember – put NO NEWLINE at the end of the last line, or your test for end-of-file might not work correctly. (This may cause the program to read a zero-length line before seeing end-of-file.)
This is not a long program – no more than about 2 pages of code
Here is what I have so far:
#include<iostream>
#include<iomanip>
#include<fstream>
#include<string>
#include<cstring>
using namespace std;
void totalwordCount(ifstream &inputFile)
{
    char words[100][16]; // Holds the unique words.
    char *token;
    int totalCount = 0;  // Counts the total number of words.

    // Read every word in the file.
    while(inputFile >> words[99])
    {
        totalCount++; // Increment the total number of words.
        // Tokenize each word and remove spaces, periods, and newlines.
        token = strtok(words[99], " .\n");
        while(token != NULL)
        {
            token = strtok(NULL, " .\n");
        }
    }
    cout << "Total number of words in file: " << totalCount << endl;
}
void uniquewordCount(ifstream &inputFile)
{
    char words[100][16]; // Holds the unique words
    int counter[100];
    char *tok = "0";
    int uniqueCount = 0; // Counts the total number of unique words

    while(!inputFile.eof())
    {
        uniqueCount++;
        tok = strtok(words[99], " .\n");
        while(tok != NULL)
        {
            tok = strtok(NULL, " .\n");
            inputFile >> words[99];
            if(strcmp(tok, words[99]) == 0)
            {
                counter[99]++;
            }
            else
            {
                words[99][15] += 1;
            }
            uniqueCount++;
        }
    }
    cout << counter[99] << endl;
}
int main(int argc, char *argv[])
{
    ifstream inputFile;
    char inFile[12] = "string1.txt";
    char outFile[16] = "word result.txt";

    // Get the name of the file from the user.
    cout << "Enter the name of the file: ";
    cin >> inFile;

    // Open the input file.
    inputFile.open(inFile);

    // If successfully opened, process the data.
    if(inputFile)
    {
        while(!inputFile.eof())
        {
            totalwordCount(inputFile);
            uniquewordCount(inputFile);
        }
    }
    return 0;
}
I already took care of how to count the total number of words in the file in the totalwordCount() function, but in the uniquewordCount() function, I am having trouble counting the total number of unique words and counting the number of occurrences of each word. Is there anything that I need to change in the uniquewordCount() function?
This program contains several issues which are to be considered harmful! To prevent bad software being created based on entirely nonsensical assignments like the above, here are a number of hints:
Always test the stream for success after reading from it. Using in.eof() to determine if the stream is in a good state does not work! One of the problems is that you will get an infinite loop if the stream goes bad for a different reason than end of file, e.g., failure to correctly parse a value (this will set std::ios_base::failbit but not std::ios_base::eofbit).
Reading into a fixed-size char array a using in >> a without having set up a limit for the number of characters to be read is the C++ way to spell gets()! If you really think that using in >> a is the right way to go (but see the next item), you absolutely need to set up the array's width, e.g., using in >> std::setw(sizeof(a)) >> a. You still need to check that this extraction was successful, of course.
From the looks of it, your teacher wants you to actually use std::istream::getline() to read the array, e.g., using in.getline(a, sizeof(a)) (which, of course, needs to be checked for success).
Note that the formatted input, i.e., in >> a already tokenizes the stream being received by spaces! There is no need to faff about with strtok() after that.
Once you have consumed a stream, it is consumed. Assuming the characters don't come from a file but rather from something like standard input, you also can't rewind the stream to read it again. I'd think you want to tokenize the values once and use them for both purposes.
This is more of a sidenote: after you created a stream, its nature should be entirely immaterial for the processing of the stream's content (although, e.g., for string streams you might want to eventually collect the result using the str() member): implement your stream processing functions in terms of std::istream rather than std::ifstream!
Since you have a concrete question ("Is there anything that I need to change in the uniquewordCount() function?"): yes, everything! Throw away this function entirely and rethink what you need to do. Basically, the structure of the functionality should be along the lines of
char buffer[100];
while (in.getline(buffer, sizeof(buffer))) {
    // tokenize buffer into words
    // for each word check if it already exists
    // if the word does not exist, append it to the array of known words and set count to 1
    // if the word exists, increment the count
    // determine if the word is shorter or longer than the shortest or longest word so far
    // if it is the case, remember the word's index or a pointer to it
}
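A minimal sketch of how that skeleton could be filled in, staying within the assignment's constraints (fixed-size char arrays, strtok(), no std::string objects); the file handling and output are simplified, and the array sizes and variable names are only illustrative:
#include <cstring>
#include <fstream>
#include <iostream>

int main(int argc, char *argv[]) {
    std::ifstream in(argc > 1 ? argv[1] : "input.txt");  // placeholder fallback name

    char words[100][16];     // table of unique words
    int  counts[100] = {0};  // occurrence count for each unique word
    int  uniqueWords = 0, totalWords = 0, lines = 0;
    char longest[16] = "", shortest[16] = "";

    char buffer[101];
    // A zero-length line is treated as the end of the data, per the assignment.
    while (in.getline(buffer, sizeof(buffer)) && std::strlen(buffer) > 0) {
        ++lines;
        for (char *tok = std::strtok(buffer, " .\n"); tok != NULL;
             tok = std::strtok(NULL, " .\n")) {
            ++totalWords;

            // Scan the table for the word; increment its count or append it.
            int i = 0;
            while (i < uniqueWords && std::strcmp(words[i], tok) != 0)
                ++i;
            if (i == uniqueWords) {
                std::strcpy(words[uniqueWords], tok);
                counts[uniqueWords++] = 1;
            } else {
                ++counts[i];
            }

            // Track longest/shortest word (first one found wins ties).
            if (longest[0] == '\0' || std::strlen(tok) > std::strlen(longest))
                std::strcpy(longest, tok);
            if (shortest[0] == '\0' || std::strlen(tok) < std::strlen(shortest))
                std::strcpy(shortest, tok);
        }
    }

    for (int i = 0; i < uniqueWords; ++i)
        std::cout << words[i] << ' ' << counts[i] << '\n';
    std::cout << totalWords << " words, " << uniqueWords << " unique, "
              << lines << " lines, longest \"" << longest
              << "\", shortest \"" << shortest << "\"\n";
}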

Line Breaks when reading an input file by character in C++

Ok, just to be up front, this IS homework, but it isn't due for another week, and I'm not entirely sure the final details of the assignment. Long story short, without knowing what concepts he'll introduce in class, I decided to take a crack at the assignment, but I've run into a problem. Part of what I need to do for the homework is read individual characters from an input file, and then, given the character's position within its containing word, repeat the character across the screen. The problem I'm having is, the words in the text file are single words, each on a different line in the file. Since I'm not sure we'll get to use <string> for this assignment, I was wondering if there is any way to identify the end of the line without using <string>.
Right now, I'm using a simple ifstream fin; to pull the chars out. I just can't figure out how to get it to recognize the end of one word and the beginning of another. For the sake of including code, the following is all that I've got so far. I was hoping it would display some sort of endl character, but it just prints all the words out run together style.
ifstream fin;
char charIn;
fin.open("Animals.dat");
fin >> charIn;
while(!fin.eof()){
    cout << charIn;
    fin >> charIn;
}
A few things I forgot to include originally:
I must process each character as it is input (my loop to print it out needs to run before I read in the next char and increase my counter). Also, the lengths of the words in 'Animals.dat' vary, which keeps me from being able to just set a number of iterations. We also haven't covered fin.get() or .getline(), so those are off limits as well.
Honestly, I can't imagine this is impossible, but given the restraints, if it is, I'm not too upset. I mostly thought it was a fun problem to sit on for a while.
Why not use an array of chars? You can try it as follows:
#define MAX_WORD_NUM 20
#define MAX_STR_LEN 40 //I think 40 is big enough to hold one word.
char words[MAX_WORD_NUM][MAX_STR_LEN];
Then you can read a word into the array:
cin >> words[i];
The >> operator ignores whitespace, so you'll never get the newline character. You can use c-strings (arrays of characters) even if the <string> class is not allowed:
ifstream fin;
char animal[64];
fin.open("Animals.dat");
while(fin >> animal) {
    cout << animal << endl;
}
When reading characters from a c-string (which is what animal is above), the last character is always 0, sometimes represented '\0' or NULL. This is what you check for when iterating over characters in a word. For example:
char c = animal[0];
for(int i = 1; c != 0 && i < 64; i++)
{
    // do something with c
    c = animal[i];
}
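Putting the two pieces together, a sketch of the kind of loop the assignment seems to describe; the exact repetition rule (each character repeated position + 1 times) is only a guess at the assignment's intent, and the file name comes from the question:
#include <fstream>
#include <iostream>
using namespace std;

int main() {
    ifstream fin("Animals.dat");
    char animal[64];
    while (fin >> animal) {                  // one word per line in the file
        for (int i = 0; animal[i] != '\0'; i++) {
            for (int j = 0; j <= i; j++)     // repeat char (position + 1) times
                cout << animal[i];
            cout << '\n';
        }
        cout << '\n';                        // blank line between words
    }
}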

Tokenization of a text file with frequency and line occurrence. Using C++

Once again I ask for help. I haven't coded anything for some time!
Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.
What really stumps me is how I will determine what line the word is on. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
    inputFile >> currentWord;
    words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); //Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
    lines.push_back(currentLine);
You can split a line apart by putting it into an std::istringstream first and then using operator>>. (Alternatively, you could cobble up some sort of splitter using std::find and other algorithmic primitives.)
EDIT: This is the same thing as in #dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int> > words;  // each word paired with the line it appeared on
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
    istringstream inputString(line);
    string word;
    while (inputString >> word)
        words.push_back(make_pair(word, currentLine));
    ++currentLine;
}
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
    line_word_counts.push_back( map< string, size_t >() );
    map< string, size_t > &word_counts = line_word_counts.back();
    istringstream line_is( line );
    while ( line_is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
     << " times\n";
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your cin >> line and the rest of the code doesn't need to know the details of your block buffering.
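A simplified sketch of such a wrapper (it reads one character at a time through the stream's own buffering rather than managing 4096-byte blocks itself, and the class and method names are only illustrative):
#include <cctype>
#include <iostream>
#include <string>

class WordReader {
public:
    explicit WordReader(std::istream &in) : in_(in), lineNumber_(1) {}

    // Returns the next word; an empty string means end of file.
    std::string GetWord() {
        char c;
        // Skip whitespace, counting newlines as we pass them.
        while (in_.get(c) && std::isspace(static_cast<unsigned char>(c))) {
            if (c == '\n') ++lineNumber_;
        }
        std::string word;
        if (!in_) return word;            // hit end of file while skipping
        do {
            word += c;
        } while (in_.get(c) && !std::isspace(static_cast<unsigned char>(c)));
        if (in_ && c == '\n') in_.unget(); // let the next call count this newline
        return word;
    }

    int lineNumber() const { return lineNumber_; }

private:
    std::istream &in_;
    int lineNumber_;
};

int main() {
    WordReader reader(std::cin);
    for (std::string w = reader.GetWord(); !w.empty(); w = reader.GetWord())
        std::cout << w << " (line " << reader.lineNumber() << ")\n";
}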
An alternative approach is to read things a line at a time, but all the read functions that would work for you require you to create a fixed-size buffer to read into beforehand, and if the line is longer than that buffer, you have to deal with it somehow. It could get more complicated than the class I described.

C++: Why does space always terminate a string when read?

Using type std::string to accept a sentence, for practice (I haven't worked with strings in C++ much) I'm checking if a character is a vowel or not. I got this:
for(i = 0; i <= analyse.length(); i++) {
    if(analyse[i] == 'a' || analyse[i] == 'e' [..etc..]) {
        ...vowels++;
    } else { ...
        ...consonants++;
    }
This works fine if the string is all one word, but the second I add a space (e.g., "aeio aatest") it will only count the first block, count the space as a consonant, and quit reading the sentence (exiting the for loop or something).
Does a space count as no character == null? Or is it some oddity with std::string? It would be helpful to know why this is happening!
EDIT:
I'm simply accepting the string through std::cin, such as:
std::string analyse = "";
std::cin >> analyse;
I'd guess you're reading your string with something like your_stream >> your_string;. Operator >> for strings is defined to work (about) the same as scanf's %s conversion, which reads up until it encounters whitespace -- therefore, operator>> does the same.
You can read an entire line of input instead with std::getline. You might also want to look at an answer I posted to a previous question (provides some alternatives to std::getline).
I can't tell from the code that you have pasted, but I'm going to go out on a limb and guess that you're reading into the string using the stream extraction operator (stream >> string).
The stream extraction operator stops when it encounters whitespace.
If this isn't what's going on, can you show us how you're populating your string, and what its contents are?
If I'm right, then you're going to want a different method of reading content into the string. std::getline() is probably the easiest method of reading from a file. It stops at newlines instead of at whitespace.
Edit based on edited question:
use this (double-check the syntax; I'm not in front of my compiler):
std::getline(std::cin, analyse);
This ought to stop reading when you press "enter".
If you want to read in an entire line (including the blanks) then you should read using getline. Schematically it looks like this:
#include <string>
istream& std::getline( istream& is, string& s );
To read the whole line you do something like this:
string s;
getline( cin, s );
cout << "You entered " << s << endl;
PS: the word is "consonant", not "consenent".
The >> operator on an istream separates strings on whitespace. If you want to get a whole line, you can use getline(cin, destination_string).