Behaviour of fstream in C++ - c++

I have made the following script, that is supposed to read from a file:
char match[] = "match";
int a;
int b;
inp >> lin;
while(!inp.eof()) {
if(!strcmp(lin, match)) {
inp >> a >> b;
cout << a << " " << b <<endl;
}
inp >> lin;
}
inp.close();
return num_atm;
}
It is supposed to read all words, and if a line starts with match, it should then also print the rest of the line.
My input file is this:
match 1 2 //1
match 5 2 //2
nope 3 6 //3
match 5 //4
match 1 4 //5
match 5 9 //6
It will correctly print 1 2, 5 2, and skip 3 6. But then, it will get stuck and keep printing 5 0 and continue printing 5 0 for ever. I get that match is put into b, which is an integer, but I don't get why this is looped. Shouldn't the input read match 4 once, try to read/write 5 and match, and then be done with line 4 and the match from line 5? Then it should next read the number 1 and 4 and then match from number 6.
I would also understand that due to the word not fitting into the integer, it would read match in the fifth line again, but that's not what it does.
It goes back to the match in the fourth line which it already read, and reads it again. Why is this?

When you are reading with >>, line endings are handled the same as spaces: they are just more whitespace that is skipped. That means you see
match 1 2
match 5 2
nope 3 6
match 5
match 1 4
match 5 9
But the program sees
match 1 2 match 5 2 nope 3 6 match 5 match 1 4 match 5 9
Let's fast forward to where things go south
Contents of stream:
nope 3 6 match 5 match 1 4 match 5 9
Processing
inp >> lin; // reads nope stream: 3 6 match 5 match 1 4 match 5 9
if(!strcmp(lin, match)) { // nope != match skip body
}
inp >> lin; // reads 3 stream: 6 match 5 match 1 4 match 5 9
if(!strcmp(lin, match)) { // 3 != match skip body
}
inp >> lin; // reads 6 stream: match 5 match 1 4 match 5 9
if(!strcmp(lin, match)) { // 6 != match skip body
}
inp >> lin; // reads match stream: 5 match 1 4 match 5 9
if(!strcmp(lin, match)) { // match == match Enter body
inp >> a >> b; // reads 5 and fails to parse match into an integer.
// stream: match 1 4 match 5 9
// stream now in failure state
cout << a << " " << b <<endl; // prints 5 and 0: since C++11 a failed extraction writes 0 to b (before C++11, b was left untouched and held garbage)
}
inp >> lin; // reads nothing. Stream failed, so lin still holds match
if(!strcmp(lin, match)) { // match == match Enter body
inp >> a >> b; // reads nothing. Stream failed, a and b keep their old values
// stream: match 1 4 match 5 9
// stream still in failure state
cout << a << " " << b <<endl; // prints 5 and 0 again
}
Because nothing is ever read once the stream has failed, while(!inp.eof()) is worthless as a loop condition: the end of the file can never be reached. The program will loop forever, printing whatever is left in a and b from the last attempt.
Fixing this depends entirely on what you want to do if you have a match line without 2 numbers on it, but a typical framework looks something like
std::string line;
while(std::getline(inp, line)) // get a whole line. Exit if line can't be read for any reason.
{
std::istringstream strm(line);
std::string lin;
if(strm >> lin && lin == match) // enters if lin was read and lin == match
// if lin can't be read, it doesn't matter.
// strm is disposable
{
int a;
int b;
if (strm >> a >> b) // enters if both a and b were read
{
cout << a << " " << b <<"\n"; // endl flushes. Very expensive. just use a newline.
}
}
}
Output from this should be something like
1 2
5 2
1 4
5 9
If you want to make some use of match 5... Well it's up to you what you want to put in b if there is no b in the file.

Related

Read a file line by line with specific data C++

I have a file with this format:
11
1 0
2 8 0
3 8 0
4 5 10 0
5 8 0
6 1 3 0
7 5 0
8 11 0
9 6 0
10 5 7 0
11 0
The first line is the number of lines, so I can make a loop to read the file with the number of lines.
For the other lines, I would like to read the file line by line and store the data until I get a "0" on the line that's why there is a 0 at the end of each line.
The first column is the task name.
The others columns are the constraints name.
I tried to code something but It doesn't seem to work
printf("Constraints :\n");
for (int t = 1; t <= numberofTasks; t++)
{
F >> currentTask;
printf("%c\t", currentTask);
F >> currentConstraint;
while (currentConstraint != '0')
{
printf("%c", currentConstraint);
F >> currentConstraint;
};
printf("\n");
};
The "0" represents the end of the constraints for a task.
I think my code doesn't work properly because the constraint 10 for the task 4 contains a "0" too.
Thanks in advance for your help
Regards
The problem is that you are reading individual characters from the file, not reading whole integers, or even line-by-line. Change your currentTask and currentConstraint variables to int instead of char, and use std::getline() to read lines that you then read integers from.
Try this:
F >> numberofTasks;
F.ignore();
std::cout << "Constraints :" << std::endl;
for (int t = 1; t <= numberofTasks; ++t)
{
std::string line;
if (!std::getline(F, line)) break;
std::istringstream iss(line);
iss >> currentTask;
std::cout << currentTask << "\t";
while ((iss >> currentConstraint) && (currentConstraint != 0))
{
std::cout << currentConstraint << " ";
}
std::cout << std::endl;
}
That being said, the terminating 0 on each line is unnecessary. std::getline() will stop reading when it reaches the end of a line, and operator>> will stop reading when it reaches the end of the stream.

Reading from file returns a result I didn't expect, trying to understand why

I have this code which contains a class and a main function:
class Employee {
int m_id;
string m_name;
int m_age;
public:
Employee(int id, string name, int age) :m_id(id), m_name(name), m_age(age) {}
friend ostream& operator<<(ostream& os, const Employee& emp)
{
os << emp.m_id << " " << emp.m_name << " " << emp.m_age
<< endl;
return os;
}
};
int main() {
const int Emp_Num = 3;
fstream fs("dataBase.txt", ios::out);
if (!fs) {
cerr << "Failed opening file. Aborting.\n";
return -1;
}
Employee* list[Emp_Num] =
{ new Employee(1234, "Avi", 34),
new Employee(11111, "Beni", 24),
new Employee(5621, "Reut", 26) };
for (int i = 0; i < Emp_Num; i++)
{
fs << (*list[i]);
delete list[i];
}
fs.close();
fs.open("dataBase.txt");
if (!fs) {
cerr << "Failed opening file. Aborting.\n";
return -1;
}
fs.seekg(4);
string strRead;
fs >> strRead;
cout << strRead << endl;
fs.seekg(6, ios::cur);
fs >> strRead;
cout << strRead << endl;
fs.seekg(-9, ios::end);
fs >> strRead;
cout << strRead << endl;
}
Here is how I understand it, after the first file open and close, the file dataBase.txt should look like this:
1234 Avi 34
11111 Beni 24
5621 Reut 26
My problem is with the reading and output to the console.
After I open the file, the pointer of my current position is at the first byte, which is the 1 before the 1234.
I seek 4 from the beginning of the file,
so my pointer should be at (before) the space between the 1234 and Avi.
Now I get the next string into my string variable strRead,
now strRead contains "Avi" and the pointer should be between the i of Avi and the space after it.
Now i seek 6 from my current position,
by my count those are the 6 bytes i pass through:
Space
3
4
Line break (return)
1
1
So my pointer should be at the second line, after the two first ones.
I mean like this:
11|111 Beni 24
Now I read a string into strRead. By my understanding of the code, strRead should now contain "111"; instead, for some reason, it contains (and later outputs) "1111".
Could someone explain to me why it works that way?
There is no character between the first line drop and the first letter of the second line, so it should count as only 1 byte...
I did the following test:
I have run the second part of your code (which read from file) on a file with the text:
1234 Avi 34 11111 Beni 24 5621 Reut 26
So, I have replaced end of lines with spaces, and the code print to console output the expected result 111. I then began to be suspicious about seek skipping end of lines.
Then I changed the code instead (without file modifications) and worked with the file in binary mode:
//...
fstream fs("dataBase.txt", ios::out | ios::binary);
//...
fs.open("dataBase.txt", ios::in | ios::binary );
//...
Again the result was the expected: 111.
What change in both cases?
Well, on Windows a file written in text mode (not in binary mode) stores each end of line as two bytes, \r and \n (this varies between platforms), and seekg counts the bytes on disk. That's why you're reading four ones (1111) instead of three (111).
Counting 6 positions from the space after Avi:
A v i _ 3 4 \r \n 1 1 1 1 1
      1 2 3 4  5  6
The sixth byte is the first 1 of 11111, so the next read yields 1111.
In the first test I performed, a space (only one character) replaced the two of them.
A v i _ 3 4 _ 1 1 1 1 1
      1 2 3 4 5 6
In binary mode no translation happens, so endl writes a single \n byte and the line break again occupies one position.
A v i _ 3 4 \n 1 1 1 1 1
      1 2 3 4  5 6

File stops being read after a newline character

I'm trying to create a spam filter. I need to train the model first. I read the words from a text file which has the word "spam" or "ham" as the first word of a paragraph, and then the words in the mail and number of its occurrences just after the word. There are paragraphs in the file. My program is able to read the first paragraph that is the words and their number of occurrences.
The problem is that the program stops reading after encountering the newline and doesn't read the next paragraph. I have a feeling that the way I am checking for the newline character that ends a paragraph is not entirely correct.
I have given two paragraphs so you just get the idea of the train text.
Train text file.
/000/003 ham need 1 fw 1 35 2 39 1 thanks 1 thread 2 40 1 copy 1 else 1 correlator 1 under 1 companies 1 25 1 he 2 26 2 168 1 29 2 content 4 1 1 6 1 5 1 4 1 review 2 we 1 john 3 17 1 use 1 15 1 20 1 classes 1 may 1 a 1 back 1 l 1 01 1 produced 1 i 1 yes 1 10 2 713 2 v6 1 p 1 original 2
/000/031 ham don 1 kim 5 dave 1 39 1 customer 1 38 2 thanks 1 over 1 thread 2 year 1 correlator 1 under 1 williams 1 mon 2 number 2 kitchen 1 168 1 29 1 content 4 3 2 2 6 system 2 1 2 7 1 6 1 5 2 4 1 9 1 each 1 8 1 view 2
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
int V = 0; // Total number of words
ifstream fin;
fin.open("train", ios::in);
string word;
int wordnum;
int N[2] = {0};
char c, skip;
for (int i = 0; i < 8; i++) fin >> skip; // There are 8 characters before the first word of the paragraph
while (!fin.fail())
{
fin >> word;
if (word == "spam") N[0]++;
else if (word == "ham") N[1]++;
else
{
V++;
fin >> wordnum;
}
int p = fin.tellg();
fin >> c; //To check for newline. If its there, we skip the first eight characters of the new paragraph because those characters aren't supposed to be read
if (c == '\n')
{
for (int i = 0; i < 8; i++) fin >> skip;
}
else fin.seekg(p);
}
cout << "\nSpam: " << N[0];
cout << "\nHam :" << N[1];
cout << "\nVocab: " << V;
fin.close();
return 0;
}
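operator>> never stores \n in c: formatted extraction skips leading whitespace, newlines included, before it reads anything. So c == '\n' is never true, the 8-character skip for later paragraphs never runs, and fin >> wordnum then fails on a non-numeric token such as ham, which ends the loop. If you need to see whitespace and \n characters, use std::ifstream::get() (or peek()), which reads unformatted characters.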

Read from string into stringstream

When I tried to parse whitespace-separated double values from a string, I found the curious behaviour that the string is read out in a cyclic manner.
Here's the program:
stringstream ss;
string s("1 2 3 4");
double t;
list<double> lis;
for(int j=0; j!=10; ++j){
ss << s;
ss >> t;
lis.push_back(t);
}
for(auto e : lis){
cout << e << " ";
}
Here the output:
1 2 3 41 2 3 41 2 3 41
If I append a trailing space as s= "1 2 3 4 "; I get
1 2 3 4 1 2 3 4 1 2
Now the questions:
1) If I don't know how many entries are in the string s, how do I read all into the list l?
2) which operator<< am I actually calling in ss << s;? Is it specified to read circularly?
3) Can I do the parsing in a better way?
Thanks already!
Here's the fixed code (thanks to timrau):
// declarations as before
ss << s;
while(ss >> t){
lis.push_back(t);
}
// output as before
This produces:
1 2 3 4
as desired. (Don't forget to reset your stringstream before treating the next input: ss.clear() only resets the error flags, so also call ss.str("") to empty the buffer. ;))
Another useful comment from HeywoodFloyd: One could also use boost/tokenizer to "split" the string, see this post
You can test the return value of >>.
while (ss >> t) {
lis.push_back(t);
}
It's not specified to read circularly. It's ss << s appending "1 2 3 4" to the end of the stream.
Before the 1st loop:
""
After 1st ss << s:
"1 2 3 4"
After 1st ss >> t:
" 2 3 4"
After 2nd ss << s:
" 2 3 41 2 3 4"
Then it's clear why you get 1 2 3 41 2 3 41 2 3 41 if there is no trailing space in s.
then use s.length() for strings containing unknown number of entries, if you use your approach.
Or, as suggested by timrau, just initialize your stringstream once.
stringstream ss;
string s("1 2 3 4 5 6 7 8");
ss << s;
double t;
list<double> lis;
while (ss >> t) {
lis.push_back(t);
}
for(auto e : lis){
cout << e << " ";
}
This stackoverflow post includes a boost tokenizer example. You may want to tokenize your string and iterate through it that way. That will solve the no trailing space problem timrau pointed out.

C++ Splitting the input problem

I am being given input in the form of:
(8,7,15)
(0,0,1) (0,3,2) (0,6,3)
(1,0,4) (1,1,5)
(2,1,6) (2,2,7) (2,5,8)
(3,0,9) (3,3,10) (3,4,11) (3,5,12)
(4,1,13) (4,4,14)
(7,6,15)
where I have to remember the amount of triples there are. I wrote a quick testing program to try read the input from cin and then split string up to get the numbers out of the input. The program doesn't seem to read all the lines, it stops after (1,1,5) and prints out a random 7 afterwards
I created this quick testing function for one of the functions I am trying to create for my assignment:
int main ()
{
string line;
char * parse;
while (getline(cin, line)) {
char * writable = new char[line.size() + 1];
copy (line.begin(), line.end(), writable);
parse = strtok (writable," (,)");
while (parse != NULL)
{
cout << parse << endl;
parse = strtok (NULL," (,)");
cout << parse << endl;
parse = strtok (NULL," (,)");
cout << parse << endl;
parse = strtok (NULL," (,)");
}
}
return 0;
}
Can someone help me fix my code or give me a working sample?
You can use this simple function:
istream& read3(int& a, int& b, int& c, istream& stream = cin) {
stream.ignore(INT_MAX, '(');
stream >> a;
stream.ignore(INT_MAX, ',');
stream >> b;
stream.ignore(INT_MAX, ',');
stream >> c;
stream.ignore(INT_MAX, ')');
return stream;
}
It expects the stream to start at a (, so it skips any characters and stops after the first ( it sees. It reads in an int into a which is passed by reference (so the outside a is affected by this) and then reads up to and skips the first comma it sees. Wash, rinse, repeat. Then after reading the third int in, it skips the closing ), so it is ready to do it again.
It also returns an istream&, which converts to bool in a condition and yields false once a read has failed (for example, because the input ran out). That is what breaks the while loop in the example.
You use it like this:
// don't forget the appropriate headers...
#include <climits> // INT_MAX
#include <iostream>
#include <sstream>
#include <string>
int a, b, c;
while (read3(a, b, c)) {
cout << a << ' ' << b << ' ' << c << endl;
}
That prints:
8 7 15
0 0 1
0 3 2
0 6 3
1 0 4
1 1 5
2 1 6
2 2 7
2 5 8
3 0 9
3 3 10
3 4 11
3 5 12
4 1 13
4 4 14
7 6 15
When you give it your input.
Because this is an assignment, I leave it to you to add error handling, etc.
I've written a blog 9 days back exactly to parse such inputs:
Playing around with Boost.Spirit - Parsing integer triplets
And you can see the output here for your input : http://ideone.com/qr4DA