Can I please get some guidance on constructing a parser for an input file? I've been looking for help for weeks, and the assignment is already past due; I would just like to know how to do it.
The commented-out code is what I've tried, but I have a feeling the problem is more serious than that. I have a text file and I want to parse it to count the number of times each word appears in the document.
Parser::Parser(string filename) {
//ifstream.open(filename);
// source (filename, fstream::in | fstream::out);
}
The commented code is what I've tried, but i have a feeling it is more serious than that.
I have a feeling you haven't tried a thing. So I am going to do the same.
Google is your friend.
To read a word:
std::ifstream file("FileName");
std::string word;
file >> word; // reads one word from a file.
// Testing a word:
if (word == "Floccinaucinihilipilification")
{
++count;
}
// Count multiple words
std::map<std::string, int> count;
// read a word
++count[word];
// To read many words from a file:
std::string word;
while(file >> word)
{
// You have now read a word from a file
}
Note: That is a real word :-)
http://dictionary.reference.com/browse/floccinaucinihilipilification
Take a look at the answers in How do you read a word in from a file in C++? . The easiest way is to use an ifstream and operator>> to read single words. You can then use a standard container like vector (as mentioned in the link above) or map<string, int> to remember the actual count.
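Putting the pieces above together, a minimal sketch of the counting step could look like this. The function name countWords is made up for illustration, and words are taken to be whatever operator>> returns, so punctuation and case are not normalised:

```cpp
#include <fstream>   // std::ifstream, for reading a real file
#include <istream>
#include <map>
#include <sstream>   // std::istringstream, handy for trying this on an in-memory string
#include <string>

// Hypothetical helper (not from the original posts): counts how often each
// whitespace-separated word occurs in any input stream.
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {   // the loop ends as soon as a read fails (end of input)
        ++counts[word];    // operator[] creates a zero entry the first time a word is seen
    }
    return counts;
}
```

To use it on a file, construct std::ifstream file("FileName") and pass it to countWords(file); a std::istringstream works the same way for quick experiments.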
I have got a text file with contents:
Artificial neural networks (ANNs) or connectionist systems are
computing systems vaguely inspired by the biological neural networks
that constitute animal brains.[1] Such systems "learn" (i.e.
progressively improve performance on) tasks by considering examples,
generally without task-specific programming. For example, in image
recognition, they might learn to identify images that contain cats by
analyzing example images that have been manually labeled as "cat" or
"no cat" and using the results to identify cats in other images. They
do this without any a priori knowledge about cats, e.g., that they
have fur, tails, whiskers and cat-like faces. Instead, they evolve
their own set of relevant characteristics from the learning material
that they process.
and I'm using this code to read the contents.
ifstream file("/Users/sourav/Desktop/stl/stl/stl/testdata.txt");
while (! file.eof()) {
string word;
file >> word ;
cout << word << "\n";
}
This is the first few lines of output:
Artificial
neural
(ANNs)
are
vaguely
Notice that the contents are not read properly: I don't see "or connectionist systems are computing systems" in the output.
Some of the words from the text file are missing when I read it.
Note: I'm using Xcode.
ifstream file("/Users/sourav/Desktop/stl/stl/stl/dictionary.txt");
string line;
if (file.is_open()) // true if the file was opened successfully
{
    while(getline(file,line,'\r')){
        transform(line.begin(), line.end(), line.begin(), ::tolower);
        Dictionary.insert(line);
    }
    cout<<Dictionary.size()<<" words read from dictionary\n";
    file.close();
}
Why does Dictionary.size() change in value when I transform the words to lowercase?
While this may not explain why it does not work, your code should look more like this:
ifstream file("testdata.txt");
do {
string word;
file >> word ;
if (!file.good()) break;
cout << word << "\n";
} while (!file.eof());
It is not correct to test for the eof condition before you have tried to read something: the flag is only set once a read actually runs into the end of the file.
This code works perfectly, and so does yours, despite being logically incorrect. So something else is happening (and it is not related to Xcode).
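To illustrate why testing eof() before reading is logically wrong, here is a small sketch comparing the two loop shapes on the same input. Both function names are made up for this comparison. As far as I can tell, it does not reproduce the missing words from the question; it only shows the extra entry the eof-first loop produces when the input ends in whitespace:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: mirrors the question's loop, which tests eof() before reading.
std::vector<std::string> readWithEofLoop(std::istream& in) {
    std::vector<std::string> words;
    while (!in.eof()) {
        std::string word;
        in >> word;             // may fail, leaving word empty
        words.push_back(word);  // stored without checking whether the read worked
    }
    return words;
}

// Hypothetical helper: the read itself is the loop condition.
std::vector<std::string> readWithExtractionLoop(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {        // false as soon as a read fails
        words.push_back(word);
    }
    return words;
}
```

With the input "one two\n", the first function returns three entries (the last one empty), because eof is not yet set after "two" has been read; the second returns exactly two.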
Try using something along the lines of this:
ifstream file("/Users/sourav/Desktop/stl/stl/stl/testdata.txt");
string word;
while(file >> word) //While there is a word to get... get it and put it in word
{
cout << word <<"\n";
}
A little more explanation can be found in the accepted answer to the question "read word by word from file in C++".
I don't see much of a difference between this logic and yours, though.
Basically, I'm trying to read in the words from a file and, without punctuation, read each word into a multimap which is then inserted into a vector with each pair being a word and the line of the file that word is found. I've got the function to remove punctuation working perfectly and I'm fairly certain my insert code works properly, but I can't seem to get around the line number part. I've included this section of my code as follows:
ifstream in("textfile.txt");
string line;
string keys;
stringstream keystream;
int line_number = 1;
while (getline(in, line, '\n')) {
alphanum(line);
keystream << line;
while(getline(keystream, keys, ' '))
table.insert(keys, line_number); //this just inserts the pair into my vector (table is an instance of a class I created)
keystream.str("");
line_number++;
}
The problem seems to be related to the stringstream. It doesn't seem to clear when I use keystream.str(""). This particular method only seems to read line 1 and then exits the loop, whereas some other variations I've tried (I can't remember exactly what I did) read the entire file but don't flush the stringstream, so it reads like word 1, word 1, word 2, word 1, word 2, word 3, etc. Anyway, if anyone could point me in the right direction, or perhaps link to a guide specific to parsing input in C++, that would be greatly appreciated! Thanks!
Don't keep the string stream object; just make a new one in each round:
string line;
while (getline(in, line, '\n'))
{
alphanum(line);
istringstream keystream(line);
string keys;
while (getline(keystream, keys, ' ')) // or even "while (keystream >> keys)"
{
}
}
I think the problem is that the second getline() loop sets the EOF flag on the stringstream, and this is not cleared when you call str(). You need to call .clear() also on 'keystream'.
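If you do want to keep a single stream object, a sketch of the clear() approach might look like the following. The function name and the return type are assumptions made for illustration; the original code inserted into its own table class and stripped punctuation with alphanum() first, which is left out here:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (word, line number) pairs, reusing one stream object.
std::vector<std::pair<std::string, int> > wordsWithLineNumbers(std::istream& in) {
    std::vector<std::pair<std::string, int> > result;
    std::istringstream keystream;   // created once and reused for every line
    std::string line, key;
    int line_number = 1;
    while (std::getline(in, line)) {
        keystream.clear();          // reset the eof/fail flags left by the previous inner loop
        keystream.str(line);        // replace the buffer contents with the new line
        while (keystream >> key) {
            result.push_back(std::make_pair(key, line_number));
        }
        ++line_number;
    }
    return result;
}
```

Without the clear() call the inner loop never runs again after the first line, because the stream is still in a failed state even though str() gave it new contents.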
I can count the number of lines easily, using:
ifstream in(file);
string content;
while(getline(in, content))
{
// do stuff
}
Or I can count the number of words and characters easily using something like:
ifstream in(file);
string content;
int numOfCharacters = 0;
int numOfWords = 0;
while(in >> content)
{
++numOfWords;
numOfCharacters += content.size();
}
But I don't want to read the file twice. How can I read the file once and find out the number of lines, words and characters?
PS: I would welcome a Boost suggestion, if there is an easy way.
Thank you.
Read the file line by line and, for each line, count the words. See stringstream for the second part.
(I'm not giving more information; this looks too much like homework.)
This could be done with a trivial boost.spirit.qi parser.
Sticking with the iostreams solution: you could create an istringstream out of each line read via getline(), and do the word/char counting operations on it, accumulating across all the lines.
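A rough sketch of that single-pass idea follows. The Counts struct and the countAll name are invented for this example, and "characters" is taken to mean non-whitespace characters only, as in the question's own loop:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct Counts {
    std::size_t lines;
    std::size_t words;
    std::size_t characters;   // non-whitespace characters only
};

// Hypothetical helper: reads the stream once, line by line, and counts everything.
Counts countAll(std::istream& in) {
    Counts c = {0, 0, 0};
    std::string line, word;
    while (std::getline(in, line)) {
        ++c.lines;
        std::istringstream lineStream(line);   // fresh stream for each line
        while (lineStream >> word) {
            ++c.words;
            c.characters += word.size();
        }
    }
    return c;
}
```

If whitespace should count as characters too, adding line.size() per line (plus one for each newline) instead of word.size() would be one way to do it.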
How can I go about detecting a space OR another specific character/symbol in one line of a file using the fstream library?
For example, the text file would look like this:
Dog Rover
Cat Whiskers
Pig Snort
I need the first word to go into one variable, and the second word to go into another separate variable. This should happen for every line in the text file.
Any suggestions?
This is pretty simple.
string a;
string b;
ifstream fin("bob.txt");
fin >> a;
fin >> b;
If that's not quite what you want, please elaborate your question.
Perhaps a better way overall is to use a vector of strings...
vector<string> v;
string tmp;
ifstream fin("bob.txt");
while(fin >> tmp)
v.push_back(tmp);
This will give you a vector v that holds all the words in your file.
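If the goal is to keep the two words of each line together, one possible sketch is below. It assumes every line holds exactly two whitespace-separated words, as in the example file; readPairs is a made-up name:

```cpp
#include <fstream>   // std::ifstream, for reading a real file such as bob.txt
#include <istream>
#include <sstream>   // std::istringstream, for trying this on an in-memory string
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads "first second" pairs until the input runs out.
std::vector<std::pair<std::string, std::string> > readPairs(std::istream& in) {
    std::vector<std::pair<std::string, std::string> > pairs;
    std::string first, second;
    while (in >> first >> second) {   // fails, and stops, if either word is missing
        pairs.push_back(std::make_pair(first, second));
    }
    return pairs;
}
```

For a separator other than whitespace, std::getline(in, first, ',') with the delimiter as the third argument is the usual starting point.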
Once again I ask for help. I haven't coded anything for some time!
Now I have a text file filled with random gibberish. I already have a basic idea of how I will count the number of occurrences of each word.
What really stumps me is how to determine which line a word is on. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards, it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); //Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
lines.push_back(currentLine);
You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble together some sort of splitter using std::find and other algorithmic primitives.)
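Combining the map with the line-splitting idea, a sketch could record the line numbers for each word, so that the occurrence count is just the length of that list. The name indexWords and the choice of a vector of line numbers as the mapped type are assumptions made for illustration:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: maps each word to the (1-based) lines it appears on.
std::map<std::string, std::vector<int> > indexWords(std::istream& in) {
    std::map<std::string, std::vector<int> > index;
    std::string line, word;
    int lineNumber = 1;
    while (std::getline(in, line)) {
        std::istringstream lineStream(line);   // split the current line into words
        while (lineStream >> word) {
            index[word].push_back(lineNumber);
        }
        ++lineNumber;
    }
    return index;
}
```

Here index[word].size() gives the number of occurrences, and a word that appears twice on one line has that line listed twice.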
EDIT: This is the same thing as in #dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int> > words; // each entry is (word, line number)
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
    istringstream inputString(line);
    string word;
    while (inputString >> word)
        words.push_back(make_pair(word, currentLine));
    ++currentLine; // move on to the next line
}
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
    line_word_counts.push_back( map< string, size_t >() ); // one map per line
    map< string, size_t > &word_counts = line_word_counts.back();
    istringstream line_is( line );
    while ( line_is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
     << " times\n";
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of the file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your cin >> line and the rest of the code doesn't need to know the details of your block buffering.
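The steps above can be sketched roughly as follows. The class name WordReader, its member names and the lineNumber() accessor are all invented for this example; only GetWord() and the 4096-byte block size come from the description:

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, only needed to try the class on an in-memory string
#include <string>

// Hypothetical front end: buffers the input in blocks and hands out one word at a time.
class WordReader {
public:
    explicit WordReader(std::istream& in)
        : in_(in), pos_(0), size_(0), lineNumber_(1) {}

    // Returns the next word, or an empty string once the end of input is reached.
    std::string GetWord() {
        std::string word;
        char c;
        // Skip leading whitespace, counting every newline on the way.
        while (nextChar(c)) {
            if (!std::isspace(static_cast<unsigned char>(c))) {
                word.push_back(c);
                break;
            }
            if (c == '\n') ++lineNumber_;
        }
        // Collect the non-whitespace characters that make up the word.
        while (nextChar(c)) {
            if (std::isspace(static_cast<unsigned char>(c))) {
                --pos_;   // leave the whitespace in the buffer so the next call counts its newline
                break;
            }
            word.push_back(c);
        }
        return word;
    }

    // Line on which the most recently returned word was found (1-based).
    int lineNumber() const { return lineNumber_; }

private:
    // Hands out one buffered character, refilling the block when it is used up.
    bool nextChar(char& c) {
        if (pos_ == size_) {
            in_.read(buffer_, sizeof buffer_);
            size_ = static_cast<std::size_t>(in_.gcount());
            pos_ = 0;
            if (size_ == 0) return false;   // nothing left to read
        }
        c = buffer_[pos_++];
        return true;
    }

    std::istream& in_;
    char buffer_[4096];
    std::size_t pos_;
    std::size_t size_;
    int lineNumber_;
};
```

The caller would loop on GetWord() until it returns an empty string, reading lineNumber() after each call to learn where that word was found.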
An alternative approach is to read things a line at a time. The member functions istream::getline and istream::get require you to create a fixed-size buffer to read into beforehand, and if the line is longer than that buffer, you have to deal with it somehow, which could get more complicated than the class I described. (The free function std::getline, which reads into a std::string, does not have this limitation.)