Read number of lines, words, characters from a file - c++

I can read the number of lines easy, using:
ifstream in(file);
string content;
while(getline(in, content))
{
// do stuff
}
Or I can read the number of words and characters easy using something like:
ifstream in(file)
string content;
int numOfCharacters = 0;
int numOfWords = 0;
while(in >> content)
{
++numOfWords;
numOfCharacters += content.size();
}
But I dont want to read the file twice. How can I read the file once, and find out the number of lines, words and characters?
PS: I would welcome a Boost sugestion, if there is a easy way.
Thank you.

Read the line and for each line count the words. See stringstream for the second part.
(I'm not giving more information, that looks too much like an homework).

This could be done with a trivial boost.spirit.qi parser.

Sticking with the iostreams solution: you could create a strstream out of each line read via getline(), and do the word/char counting operations on it, accumulating across all the lines.

Related

Reading / Writing Files for a calculator: atof error

I currently have a text file that is as follows:
12 6 4 9
It is a very simple text file since I want to just get one line working and then maybe expand to multiple lines later. Extra aside: this is for a RPN calculator I am working on.
I want to go through this text file character by character. The way I currently have it implemented is with a simple while loop:
string line;
while (!infile.eof()){
getline(infile, line);
if (isdigit(line[0])){
rpn_stack.push_back(atof(line.c_str()));
}
}
rpn_stack is a vector since I will not be using the built in stack libraries in C++.
The problem I am currently having is that the output is just outputting "12". Why is this?
Is there a way that I can traverse through the file character by character instead of reading as a line? Is it breaking because it finds a white space (would that be considered the EOF)?
EDIT:
The code has been rewritten to be as the following:
string line;
while (!infile.eof()){
getline(infile, line);
for (int i = 0; i < line.size(); i++){
if (isdigit(line[i])){
rpn_stack.push_back(atof(line.c_str()));
}
}
}
The output is 12 5 different times, which is obviously wrong. Not only are there 4 items in the txt document, but only one of them is a 12. Can someone give some insight?
This will read as many doubles from infile as possible (i.e. until the end of file or until it comes across a token that isn't a double), separated by whitespace.
for (double d; infile >> d;)
rpn_stack.push_back(d);
If you need parse line-by-line, as #ooga says you will need a two-stage reader that looks something like this:
for (std::string line; getline(infile, line);) {
std::istringstream stream{line};
for (double d; stream >> d;)
rpn_stack.push_back(d);
}
Bonus hint: don't use .eof()

Efficiently read CSV file with optional columns

I'm trying to write a program that reads in a CSV file (no need to worry about escaping anything, it's strictly formatted with no quotes) but any numeric item with a value of 0 is instead just left blank. So a normal line would look like:
12,string1,string2,3,,,string3,4.5
instead of
12,string1,string2,3,0,0,string3,4.5
I have some working code using vectors but it's way too slow.
int main(int argc, char** argv)
{
string filename("path\\to\\file.csv");
string outname("path\\to\\outfile.csv");
ifstream infile(filename.c_str());
if(!infile)
{
cerr << "Couldn't open file " << filename.c_str();
return 1;
}
vector<vector<string>> records;
string line;
while( getline(infile, line) )
{
vector<string> row;
string item;
istringstream ss(line);
while(getline(ss, item, ','))
{
row.push_back(item);
}
records.push_back(row);
}
return 0;
}
Is it possible to overload operator<< of ostream similar to How to use C++ to read in a .csv file and output in another form? when fields can be blank?
Would that improve the performance?
Or is there anything else I can do to get this to run faster?
Thanks
The time spent reading the string data from the file is greater than the time spent parsing it. You won't make significant time savings in the parsing of the string.
To make your program run faster, read bigger "chunks" into memory; get more data per read. Research on memory mapped files.
One alternative way to handle this to get better performance is to read the whole file into a buffer. Then go through the buffer and set pointers to where the values start, if you find a , or end of line put in a \0.
e.g. https://code.google.com/p/csv-routine/

Reading input into dynamically-sized array

What I've been trying to do, is read a line from stdin and split it, by using whitespace as seperators.
Let's say I have this as input:
2
1 2
3 4
The first line gives me the amount of lines I'd like to read, they're all lines with integers seperated by an unknown amount of whitespace (i.e. could be 1 space, but it could also be 10 spaces).
The thing I've been trying to do is reading those lines into dynamically sized arrays of integers.
This was extremely easy in Python:
foo = raw_input()
array = foo.split()
or even shorter:
foo = raw_input().split()
However, because of the circumstances, I have to learn the beauty of C++.
So I tried to create something akin to the above Python code:
#include <iostream>
using namespace std;
int lines;
int *array;
int main() {
cin >> lines;
for (int line = 0; line < lines; line++) {
// Something.
}
}
I don't seem to know a way to split the line of input. I know that std::cin reads until it reaches a whitespace. However, I can't seem to think of something to count the amount of numbers on the line...
A little nudge into the right direction would be appreciated, thanks.
so given all you wanted is a nudge, here are a couple of hints..
std::getline() - allows you to read from a stream into a std::string.
You can then construct a std::istringstream using this string which you've just read in. Then use this stream to read your ints
for example:
std::string line;
if(std::getline(std::cin, line))
{
std::istringstream str(line);
int lc;
if (str >> lc) // now you have the line count..
{
// now use the same technique above
}
}
oh and for your "dynamically sized array", you need to look at std::vector<>
In C++ you can access characters in a string with [], just as if that string were an array. I suggest you read a line from cin into a string, iterate over the string with a for loop and check each character to see whether it is whitespace. Whenever you find a non-whitespace character, store it in your array.

How to construct a parser for an input file

can i please get some guidance to constructing a parser for an input file, I've been looking for a help for weeks, the assignment is already past due, I would just like to know how to do it.
The commented code is what I've tried, but i have a feeling it is more serious than that. I have a text file and I want to parse it to count the number of times that words appear in the document.
Parser::Parser(string filename) {
//ifstream.open(filename);
// source (filename, fstream::in | fstream::out);
}
The commented code is what I've tried, but i have a feeling it is more serious than that.
I have a feeling you haven't tried a thing. So I am going to do the same.
Google is your friend.
To read a word:
std::ifstream file("FileName");
std::string word;
file >> word; // reads one word from a file.
// Testing a word:
if (word == "Floccinaucinihilipilification")
{
++count;
}
// Count multiple words
std::map<std::string, int> count;
// read a word
++count[word];
// To read many words from a file:
std::string word;
while(file >> word)
{
// You have now read a word from a file
}
Note: That is a real word :-)
http://dictionary.reference.com/browse/floccinaucinihilipilification
Take a look at the answers in How do you read a word in from a file in C++? . The easiest way is to use an ifstream and operator>> to read single words. You can then use a standard container like vector (as mentioned in the link above) or map<string, int> to remember the actual count.

Tokenization of a text file with frequency and line occurrence. Using C++

once again I ask for help. I haven't coded anything for sometime!
Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.
What really stumps me is how I will determine what line the word is in. Gut instinct tells me to look for the newline character at the end of each line. However I have to do this while going through the text file the first time right? Since if I do it afterwords it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
If you need like by line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...) //Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
lines.push_back(currentLine);
You can split a line apart by putting it into an std::istringstream first and then using operator>>. (Alternately, you could cobble up some sort of splitter using std::find and other algorithmic primitaves)
EDIT: This is the same thing as in #dash-tom-bang's answer, but modified to be correct with respect to error handing:
vector<string> words;
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
istringstream inputString(line);
string word;
while (inputString >> word)
words.push_back(pair(word, currentLine));
}
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
line_word_counts.push_back();
map< string, size_t > &word_counts = line_word_counts.back();
istringstream line_is( line );
while ( is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
<< " times\n";
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If the you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your cin >> line and the rest of the code doesn't need to know the details of your block buffering.
An alternative approach is to read things a line at a time, but all the read functions that would work for you require you to create a fixed-size buffer to read into beforehand, and if the line is longer than that buffer, you have to deal with it somehow. It could get more complicated than the class I described.