Reading/parsing text file input in C++

A little background: I am working on a sliding block puzzle for a school project, and this is our first project using C++ instead of Java. This is the first time I have had to implement something that reads data from a file.
I have a simple question regarding reading input from a text file.
I understand how to read the file line by line and hold each line in a string; I want to know whether I can parse the string into different data types as the file is read.
Currently I am reading each line and storing it as a string in a vector for parsing later, and I know there must be a much simpler way to implement this.
The first line holds 2 integers, which indicate the length and width of the grid; each following line has 4 integers and a char to be used as arguments when creating the blocks.
My question is this: if I read the file character by character instead, is there a function I can use that will detect whether a character is a digit or a letter (and ignore the spaces), so I can store the values immediately and create the block objects as the file is read? How would I deal with integers that have more than one digit (10 and up) in that case?
EDIT: Just noting that I am using fstream to read the files; I am unfamiliar with other input methods.
A sample input:
4 4
3 1 2 1 b
1 1 1 1 a

To detect whether a piece of string can be parsed as an integer, you just have to parse it and see if you succeed. The best function for that would probably be std::strtoul(), since it can be made to tell you how many characters it consumed, so that you can continue parsing after that. (See the man page for details.)
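A minimal sketch of that approach, assuming a hypothetical helper called leading_numbers (not part of any library): it calls std::strtoul repeatedly, uses the end pointer to continue right after each number, and stops at the first token that is not a number, such as the block's letter. Overflow (errno == ERANGE) is not checked here, and note that strtoul also accepts a leading minus sign.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: collects the unsigned integers at the start of `line`.
// On return, `rest` is the index of the first non-space character that was
// not part of a number (e.g. the 'b' in "3 1 2 1 b"), or line.size().
std::vector<unsigned long> leading_numbers(const std::string& line, std::size_t& rest)
{
    std::vector<unsigned long> numbers;
    const char* const begin = line.c_str();
    const char* cur = begin;
    for (;;) {
        char* end = nullptr;
        const unsigned long value = std::strtoul(cur, &end, 10);
        if (end == cur)              // nothing consumed: next token is not a number
            break;
        numbers.push_back(value);    // multi-digit values arrive in one piece
        cur = end;                   // resume right after the digits
    }
    while (*cur != '\0' && std::isspace(static_cast<unsigned char>(*cur)))
        ++cur;                       // skip blanks before the remaining token
    rest = static_cast<std::size_t>(cur - begin);
    return numbers;
}
```

For "3 1 2 1 b" this yields 3, 1, 2, 1 and sets rest to 8, so line[rest] is 'b'; a value such as 12 comes out as a single number.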
However, if you already know the format of your file, you can use iostream formatted extraction. This is quite straightforward:
#include <fstream>
int main()
{
    std::ifstream infile("thefile.txt");
    int n1, n2, x1, x2, x3, x4;
    char c;
    if (!(infile >> n1 >> n2)) { return 1; /* error, could not read first line! Abort. */ }
    while (infile >> x1 >> x2 >> x3 >> x4 >> c)
    {
        // successfully extracted one line, data is in x1, ..., x4, c.
    }
}
Another alternative is to read every line into a string (using std::getline), then create a stringstream from that line and parse the stringstream with >>. This has the added benefit that you can detect bad lines, skip them, and recover, whereas with the direct formatted extraction presented above, a single bad line ends the loop and leaves the stream in a failed state.
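A sketch of that line-by-line variant, assuming a hypothetical Block struct for the four integers and the letter; read_puzzle is likewise an illustrative name. A line that does not parse is counted and skipped instead of ending the read, and because the function takes any std::istream, it works with a std::ifstream as well as with an in-memory stream.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one block line: four integers and a letter.
struct Block { int x1, x2, x3, x4; char id; };

// Reads the grid size from the first line, then one Block per line.
// Lines that do not parse are counted in `bad_lines` and skipped.
bool read_puzzle(std::istream& in, int& rows, int& cols,
                 std::vector<Block>& blocks, int& bad_lines)
{
    std::string line;
    bad_lines = 0;
    if (!std::getline(in, line)) return false;       // no first line at all
    std::istringstream header(line);
    if (!(header >> rows >> cols)) return false;     // first line malformed

    while (std::getline(in, line)) {
        std::istringstream iss(line);                // fresh stream per line
        Block b;
        if (iss >> b.x1 >> b.x2 >> b.x3 >> b.x4 >> b.id)
            blocks.push_back(b);
        else
            ++bad_lines;                             // skip it and keep going
    }
    return true;
}
```

Whether a bad line should be skipped like this or should abort the whole load is a design choice; the per-line stream makes either easy.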

If you know the type of each value in advance, I suggest using the stream operators, just as you would with cin.
#include <fstream>
using namespace std;
int main()
{
    fstream fileStream;
    fileStream.open("inputfile.txt");
    int firstNumber;
    fileStream >> firstNumber;
    char firstChar;
    fileStream >> firstChar;
}
This way, you read value by value instead of reading a line and then parsing it: just read each value into a variable of the right type as soon as you need it.

I would read each line into a string (as you have been doing).
Then I would read the tokens from that line into the appropriate variables.
operator>>, when applied to a stream, converts the next value in the stream into the correct type. If this is not possible, it sets flags on the stream indicating failure, which are easy to test.
int x;
stream >> x; // ignores white space then: reads an integer from the stream into x
char c;
stream >> c; // ignores white space then: reads a char from the stream into c
double d;
stream >> d; // ignores white space then: reads a double from the stream into d
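To see the failure flags in action, here is a sketch with a hypothetical helper, read_int_or_skip: if the next token is not an integer, operator>> sets failbit, so the helper resets the flag with clear() and discards the offending token before the next read. Without clear(), every later extraction on that stream would fail as well.

```cpp
#include <sstream>   // std::istream, plus std::istringstream for testing
#include <string>

// Tries to read one int. On a non-numeric token, clears failbit and throws
// that token away so the next call starts fresh. Returns false at end of input
// and for a skipped token.
bool read_int_or_skip(std::istream& in, int& value)
{
    if (in >> value) return true;   // success: the stream converts to true
    if (in.eof()) return false;     // nothing left to read
    in.clear();                     // reset failbit so reading can continue
    std::string junk;
    in >> junk;                     // discard the token that was not an int
    return false;
}
```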
Assuming your input:
4 4
3 1 2 1 b
1 1 1 1 a
Not knowing what the values mean, I will put my assumptions in comments.
// Assume that stream is a std::fstream already opened on your file
// (needs <fstream>, <sstream>, <stdexcept> and <string>).
std::string line1;
std::getline(stream, line1); // reads "4 4" into `line1`
std::stringstream line1stream(line1); // convert line1 into a stream for reading.
int a;
int b;
line1stream >> a >> b; // reads "4 4" from 'line1' into a (now 4) b (now 4)
if (!stream || !line1stream)
{
    // failed reading the first line.
    // either reading the line failed (!stream)
    // or reading 2 integers from line1stream failed (!line1stream)
    throw std::runtime_error("Failed");
}
std::string line2;
std::getline(stream, line2); // reads "3 1 2 1 b" into line2
std::stringstream line2stream(line2); // converts line2 into a stream for reading.
int data[4];
char c;
line2stream >> data[0] >> data[1] >> data[2] >> data[3] >> c;
if (!stream || !line2stream)
{
    // failed reading the second line.
    // either reading the line failed (!stream)
    // or reading 4 integers and one char from line2stream failed (!line2stream)
    throw std::runtime_error("Failed");
}

ifstreams are also istreams, so you can use the same operator >> as with std::cin.
#include <fstream>

int main()
{
    std::ifstream s("test.txt");
    if (s.is_open())
    {
        int i, j, k;
        s >> i >> j >> k;
    }
}
Note that this is by no means the fastest way to parse, but that is probably irrelevant to you.

Related

unexpected behavior when reading from istringstream

I have a question on stream behavior; see the following example. What I was expecting is that, since there are only 5 chars in the string, the stream read would get stuck when I try to read 10 chars. Instead, the output is "helloooooo": the last char gets repeated.
My question is twofold: first, why? Second, is there any way to make the stream stop repeating the last char?
#include <sstream>
#include <iostream>
#include <string>
using namespace std;
int main(void) {
    char c;
    string msg("hello");
    istringstream iss(msg);
    unsigned int i = 0;
    while (i < 10) {
        iss >> c;
        cout << c;
        i++;
    }
    cout << endl;
    return 0;
}
What you see is the result of reading from a stream in an error state. When you read past the last element in the stream (here, a string stream), the stream enters a failed state, and every further attempt to read from it fails (leaving the extraction variable untouched).
You will have to check if the extraction operation succeeded before reading further:
if (iss >> c) {
    // success
} else {
    // failed to extract, handle error
}
Were you to use a stream connected to the console (for example), your call to >> would have blocked, as you expected. A stringstream behaves differently: it cannot miraculously acquire more data, so once its contents are used up, extraction fails instead of waiting.
The reason is that when you've read to the end of the stream, all attempts to read after that just fail, leaving the last value read in your c.
If you want to read at most 10 characters:
while (i < 10 && iss >> c) {
    cout << c;
    i++;
}
This works because a stream can be converted to bool, and it's true if the stream is in a "good" state.
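Put together, here is a sketch of the corrected loop as a reusable function (read_up_to is an illustrative name, not a library call). The extraction result is tested on every iteration, so once the string is exhausted the loop ends instead of reprinting the stale value of c.

```cpp
#include <sstream>
#include <string>

// Collects up to `limit` non-whitespace chars, stopping at the first failed
// extraction, so the last char is never repeated.
std::string read_up_to(std::istream& in, unsigned limit)
{
    std::string out;
    char c;
    for (unsigned i = 0; i < limit && in >> c; ++i)
        out += c;
    return out;
}
```

Called on an istringstream holding "hello" with a limit of 10, it returns "hello" and leaves the stream with eofbit and failbit set.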
"the last char get repeated"
When iss >> c fails, c stays unmodified.
Check whether the extraction succeeded by evaluating the expression directly, as in if (iss >> c), but don't even think about calling iss.good() instead. Also have a look at:
How does that funky while (std::cin >> foo) syntax work?
Why does my input seem to process past the end of file?

Using C++ ifstream extraction operator>> to read formatted data from a file

As my learning, I am trying to use c++ ifstream and its operator>> to read data from a text file using code below. The text file outdummy.txt has following contents:
just dummy
Hello ofstream
555
My questions is how to read char data present in the file into a char array or string. How to do this using the ifstream::operator>> in code below.
#include <iostream>
#include <fstream>
int main()
{
int a;
string s;
char buf[100];
ifstream in("outdummy.txt",ios_base::in);
in.operator>>(a); //How to read integer? How to read the string data.??
cout << a;
in.close();
getchar();
return 0;
}
If you want to use formatted input, you have to know in advance what data to expect and read it into variables of the corresponding data type. For example, if you know that the number is always the fifth token, as in your example, you could do this:
std::string s1, s2, s3, s4;
int n;
std::ifstream in("outdummy.txt");
if (in >> s1 >> s2 >> s3 >> s4 >> n)
{
std::cout << "We read the number " << n << std::endl;
}
On the other hand, if you know that the number is always on the third line, by itself:
std::string line;
std::getline(in, line); // have line 1
std::getline(in, line); // have line 2
std::getline(in, line); // have line 3
std::istringstream iss(line);
if (iss >> n)
{
std::cout << "We read the number " << n << std::endl;
}
As you can see, to read a token as a string, you just stream it into a std::string. It's important to remember that the formatted input operator works token by token, and tokens are separated by whitespace (spaces, tabs, newlines). The usual fundamental choice to make is whether you process a file entirely in tokens (first version), or line by line (second version). For line-by-line processing, you use getline first to read one line into a string, and then use a string stream to tokenize the string.
A word about validation: you cannot know whether a formatted extraction will actually succeed, because that depends on the input data. Therefore, you should always check whether an input operation succeeded, and abort parsing if it didn't: after a failure your variables won't contain the correct data, and you have no way of knowing that later. So always write it like this:
if (in >> v) { /* ... */ } // v is some suitable variable
else { /* could not read into v */ }
if (std::getline(in, line)) { /* process line */ }
else { /* error, no line! */ }
The latter construction is usually used in a while loop, to read an entire file line by line:
while (std::getline(in, line)) { /* process line */ }
ifstream has ios_base::in by default. You don't need to specify it.
operator>> can be invoked directly as an operator: in >> a.
Reading strings is the same: in >> s, but the caveat is that it is whitespace-delimited, so it will read "just" by itself, without "dummy".
If you want to read complete lines, use std::getline(in, s).
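A small sketch of the difference on the question's own file contents, using a hypothetical helper split_first_line: operator>> stops at the first space, and a following std::getline returns the rest of that line. Note that the rest begins with the space that >> left behind, which is a common surprise when mixing the two styles.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: the first token and the remainder of its line.
struct FirstLine { std::string token; std::string rest; };

FirstLine split_first_line(std::istream& in)
{
    FirstLine r;
    in >> r.token;              // whitespace-delimited: reads "just" only
    std::getline(in, r.rest);   // everything up to '\n', leading space included
    return r;
}
```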
Since you have elected to use C-strings, you can use the getline method of your ifstream object (not std::getline() which works with std::strings), which will allow you to specify the C-string and a maximum size for the buffer.
Based on what you had, and adding an additional buffer for the second line:
char buf[100];
char buf2[100];
in.getline(buf,sizeof(buf));
in.getline(buf2,sizeof(buf2));
in >> a;
However, as the other poster has proposed, try using std::string and its methods; it will make your life easier.
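One detail worth knowing about the member getline: if a line does not fit, it stores size - 1 characters, sets failbit, and leaves the rest of the line in the stream. Here is a sketch of handling that with a hypothetical helper, read_c_line, which clears the flag, discards the remainder of the long line, and reports the truncation.

```cpp
#include <limits>
#include <sstream>   // std::istream, plus std::istringstream for testing

// Reads one line into buf (size chars including the terminating '\0').
// Returns false only when no more lines are available.
bool read_c_line(std::istream& in, char* buf, std::streamsize size, bool& truncated)
{
    truncated = false;
    if (in.getline(buf, size)) return true;   // the whole line fit
    if (in.eof()) return false;               // end of input, nothing read
    truncated = true;                         // failbit without eof: line too long
    in.clear();                               // allow further reads
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // drop the rest
    return true;
}
```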
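You can read the file contents and use a finite state machine for parsing.
Example (Parse and GetBufferSize stand for your own functions):
#include <cstring>
#include <fstream>
#include <vector>
void Parse(const char* buffer, size_t length);
size_t GetBufferSize();
std::vector<char> buffer(GetBufferSize()); // no manual delete[] needed
std::ifstream in("input.txt");
while (in.getline(buffer.data(), static_cast<std::streamsize>(buffer.size()))) {
    // gcount() also counts the extracted '\n', so measure the stored text instead
    Parse(buffer.data(), std::strlen(buffer.data()));
}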
Alternatively, you can use a tool like Flex to write your parser.
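The Parse function is left to you; as one possible sketch, here is a two-state machine (parse_numbers is a hypothetical name) that scans a buffer and emits every run of decimal digits as a number, treating everything else as a separator. It ignores signs and does not check for overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <vector>

// Two states: between numbers, or inside a run of digits.
std::vector<long> parse_numbers(const char* buffer, std::size_t length)
{
    enum class State { Between, InNumber };
    State state = State::Between;
    std::vector<long> numbers;
    long current = 0;
    for (std::size_t i = 0; i < length; ++i) {
        const bool digit = std::isdigit(static_cast<unsigned char>(buffer[i])) != 0;
        switch (state) {
        case State::Between:
            if (digit) { current = buffer[i] - '0'; state = State::InNumber; }
            break;
        case State::InNumber:
            if (digit) current = current * 10 + (buffer[i] - '0');
            else { numbers.push_back(current); state = State::Between; }
            break;
        }
    }
    if (state == State::InNumber) numbers.push_back(current);  // number at end of buffer
    return numbers;
}
```

Called on the text "1 - 2: 3", it yields 1, 2 and 3.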

read space delimited number from file up to newline character

I have a text file that contains the following data.
The first line is this:
5 4 3 2 1
The second line is this:
1 2 3 4 5
I am trying to read data from one line at a time because my first linked-list object is going to use the data from the first line and my second linked-list object is going to use data from the second line. The best I have been able to come up with is the following function:
void polynomial::allocate_poly(std::ifstream& in, const char* file, const char* number)
{
in.open(file);
std::string str;
char b;
int m = 0;
for(int i = 0; !in.eof(); ++i)
{
in >> b;
m = b - '0';
a.insert(m);
}
There are a few problems with this approach. I have tried different comparison operators in my for loop, such as b == '\n', and none of them seem to trigger when b is a newline character.
Also, after allocating the numbers from the file this way, the list looks like
5 5 4 3 2 1 1 2 3 4 5, so it seems to be copying an extra 5 somewhere; I am not sure whether this is the eof bit or not.
I have also attempted to use the getline function, but for some reason it seems to copy only the first integer and then dump the rest of the file. I know I am not using it correctly, but all the examples I can find use cin.getline, and I want to be able to pass my file name as a command-line argument when running the program instead.
My question is how can I allocate the numbers on the first row up to the newline char and then pass the ifstream in variable to another object to allocate the second line? Thanks for your help.
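You don't say what a is, but never mind: if you want line-based parsing, you need getline in the outer loop. (Your b == '\n' test never fires because >> skips all whitespace, newlines included.) Also, never use eof() as the loop condition: it only becomes true after a read has already failed, so the loop body runs one extra time with the stale value of b, which is very likely where the extra 5 comes from. Rather, use the stream's conversion to bool to check whether an operation succeeded.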
Here's the typical gig:
#include <sstream>
#include <fstream>
#include <string>
std::string line;
std::ifstream infile("myfile.txt");
while (std::getline(infile, line)) // this does the checking!
{
std::istringstream iss(line);
char c;
while (iss >> c)
{
int value = c - '0';
// process value
}
}
However, this conversion from char to int is cumbersome and fragile. Why not read an integer directly?
int value;
while (iss >> value) { /* ... */ }
Edit: Based on your comment, here's the code to read exactly two lines:
std::string line;
int value;
if (std::getline(infile, line))
{
std::istringstream iss(line);
while (iss >> value) { /* process first line */ }
}
if (std::getline(infile, line))
{
std::istringstream iss(line);
while (iss >> value) { /* process second line */ }
}
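Since the question also asks how to hand the same stream to a second object, here is a sketch of a helper (read_line_ints is a hypothetical name) that consumes exactly one line per call. The stream keeps its position between calls, so each linked-list object can call it with the same std::ifstream, passed by reference.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads the next line of `in` and returns every integer on it.
// Returns an empty vector when there is no further line.
std::vector<int> read_line_ints(std::istream& in)
{
    std::vector<int> values;
    std::string line;
    if (std::getline(in, line)) {
        std::istringstream iss(line);
        int value;
        while (iss >> value)        // multi-digit numbers come out whole
            values.push_back(value);
    }
    return values;
}
```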

How to read integers elegantly using C++ stream?

I have a file full of lines in this format:
1 - 2: 3
I want to load only the numbers using C++ streams. What's the most elegant way to do it? All I have thought of is cin.get() and checking each char to see whether it is a digit.
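I think this would be the fastest, yet still elegant, way (it uses C stdio rather than C++ streams; the spaces in the format string make it skip the blanks before '-' and ':'):
#include <cstdio>
int a, b, c;
if (std::scanf("%d -%d :%d", &a, &b, &c) == 3) { /* all three numbers were read */ }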
You can imbue the stream with a locale that changes how characters are classified as the file is read, so that every non-digit is treated as whitespace and only the numbers remain:
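#include <algorithm>
#include <fstream>
#include <iterator>
#include <locale>
#include <vector>

struct numeric_only: std::ctype<char>
{
    numeric_only(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        // classify every character as whitespace...
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        // ...except '0' through '9' (the range ends at ':', which follows '9')
        std::fill(&rc['0'], &rc[':'], std::ctype_base::digit);
        return &rc[0];
    }
};
std::fstream myFile("foo.txt");
myFile.imbue(std::locale(std::locale(), new numeric_only()));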
Then, when you read your file, every non-digit character is classified as whitespace, so only the numbers are left. After that, you can simply use the normal conversions to read what remains as ints.
std::vector<int> intFromFile;
std::istream_iterator<int> myFileIter(myFile);
std::istream_iterator<int> eos;
std::copy(myFileIter, eos, std::back_inserter(intFromFile));
Response to the comments below:
Here is what I did to get it to work
int main(int args, char** argv){
std::fstream blah;
blah.open("foo.txt", std::fstream::in);
if(!blah.is_open()){
std::cout << "no file";
return 0;
}
blah.imbue(std::locale(std::locale(), new numeric_only()));
std::vector<int> intFromFile;
std::istream_iterator<int> myFileIter(blah);
std::istream_iterator<int> eos;
std::copy(myFileIter, eos, std::back_inserter(intFromFile));
return 0;
}
And this put only the ints into the vector, nothing more, nothing less. The reason it wasn't working before was twofold:
I was filling up to '9' but not '9' itself. I've changed the fill to ':'.
Numbers larger than what an int can hold are a problem. I'd suggest using longs (or long long, since on some platforms, 64-bit Windows among them, long is no wider than int).
I would recommend doing at least cursory sanity checks when reading this:
int a, b, c;
char dash, colon;
if (not (cin >> a >> dash >> b >> colon >> c) or dash != '-' or colon != ':')
{
    // Failure. Do something.
}
Sorry Konrad, but I recommend: never on pain of death, never never never (is that clear enough? :-) read formatted data from a file. Just don't.
There is only one correct way to do input of formatted data: read chunks of characters (typically lines but you can also read fixed length blocks).
Then parse the input text. You're not going to do a cursory check; you're going to use a parser that is guaranteed to catch any formatting error, report that error in a comprehensible way, and take appropriate action (termination, skipping the line and continuing, whatever).
Separate input (the I/O operation) from parsing.
This advice comes from decades of experience as a commercial programmer: reading formatted input is for Mickey Mouse proof-of-principle programs. Even if you have exclusive control over the creation of the file, always parse, check, and report errors: after all, stuff changes, and it may work today but not tomorrow.
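A sketch of that separation for the "1 - 2: 3" format: input is done with std::getline only, and a separate parse_triple function (like Triple and read_triples, a hypothetical name) checks both separators and rejects trailing text. Instead of aborting, this version collects an error message with the line number, which is one of the possible actions mentioned above.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Triple { int a, b, c; };   // hypothetical record for "a - b: c"

// Parsing step: succeeds only for "<int> - <int> : <int>" plus optional spaces.
bool parse_triple(const std::string& line, Triple& t)
{
    std::istringstream iss(line);
    char dash = 0, colon = 0;
    if (!(iss >> t.a >> dash >> t.b >> colon >> t.c)) return false;
    if (dash != '-' || colon != ':') return false;
    iss >> std::ws;               // trailing whitespace is allowed...
    return iss.eof();             // ...but nothing else may follow
}

// Input step: whole lines only; bad lines are reported, not silently dropped.
std::vector<Triple> read_triples(std::istream& in, std::vector<std::string>& errors)
{
    std::vector<Triple> out;
    std::string line;
    for (int lineno = 1; std::getline(in, line); ++lineno) {
        Triple t;
        if (parse_triple(line, t))
            out.push_back(t);
        else
            errors.push_back("line " + std::to_string(lineno) + ": cannot parse \"" + line + "\"");
    }
    return out;
}
```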
I've been writing C++ for decades and I've never read an integer.
Simply,
ifstream file("file.txt");
int n1, n2, n3;
char tmp;
while (file >> n1 >> tmp >> n2 >> tmp >> n3) {
    // one "n1 - n2: n3" record read; tmp swallows the '-' and the ':'
}
int a,b,c;
cin >> a;
cin.ignore(100,'-');
cin >> b;
cin.ignore(100,':');
cin >> c;
cout << "a = "<< a <<endl;
cout << "b = "<< b <<endl;
cout << "c = "<< c <<endl;
Input:
1 - 2: 3
Output:
a = 1
b = 2
c = 3
See it for yourself here: http://www.ideone.com/DT9KJ
Note: this also handles extra spaces, so it can even read this:
1 - 2 : 3
Similar topic:
Using ifstream as fscanf

How would I read a file that has 3 columns, each containing 100 numbers, into arrays?

int exam1[100];// array that can hold 100 numbers for 1st column
int exam2[100];// array that can hold 100 numbers for 2nd column
int exam3[100];// array that can hold 100 numbers for 3rd column
int main()
{
ifstream infile;
int num;
infile.open("example.txt");// file containing numbers in 3 columns
if(infile.fail()) // checks to see if file opened
{
cout << "error" << endl;
}
while(!infile.eof()) // reads file to end of line
{
for(i=0;i<100;i++); // array numbers less than 100
{
while(infile >> [exam]); // while reading get 1st array or element
???// how will i go read the next number
infile >> num;
}
}
infile.close();
}
#include <fstream>
#include <iostream>
using namespace std;

int exam1[100];// array that can hold 100 numbers for 1st column
int exam2[100];// array that can hold 100 numbers for 2nd column
int exam3[100];// array that can hold 100 numbers for 3rd column
int main() // int main NOT void main
{
    ifstream infile;
    int num = 0; // num must start at 0
    infile.open("example.txt");// file containing numbers in 3 columns
    if(infile.fail()) // checks to see if file opened
    {
        cout << "error" << endl;
        return 1; // no point continuing if the file didn't open...
    }
    // read one row (three numbers) per iteration; stop at end of file,
    // on bad data, or when the arrays are full (testing eof() instead
    // would run one extra iteration after the last row)
    while(num < 100 && infile >> exam1[num] >> exam2[num] >> exam3[num])
    {
        ++num; // go to the next row
    }
    infile.close();
    return 0; // everything went right.
}
I assume you always have 3 numbers per line. If you know the exact number of lines, replace the while with a for from 0 to the number of lines.
Rule #1 about reading data from a file: don't trust the contents of the file. You never know with absolute certainty what is in the file until you've read it.
That said, one correct way to read lines of data from a file, where each line is composed of multiple whitespace-delimited fields would be to use a combination of getline and stringstream:
std::string line;
while (std::getline(infile, line))
{
std::stringstream ss(line);
int a, b, c;
if (ss >> a >> b >> c)
{
// Add a, b, and c to their respective arrays
}
}
In English, we get each line from the file stream using getline, then parse the line into three integers using a stringstream. This allows us to be certain that each line is formatted correctly.
We check to ensure the extraction of the integers succeeded before we add them to the arrays to ensure that the arrays always have only valid data.
Other error handling might be desirable:
In the example, if extraction of the integers from the line fails, we just ignore that line; it could be a good idea to add logic to abort the process or report an error.
After we get three integers, we ignore the rest of the line; it might be a good idea to add checks to ensure that there is no more data on the line after the required integers, depending on how strict the file's formatting needs to be.
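After we finish reading the file, we should check that eof() is set and bad() is not; if bad() is set, or the loop stopped before the end of the file, some error occurred when reading the file. (fail() will be set at this point either way, because the final getline call, the one that ran into end of file, fails.)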
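A sketch that applies those checks, using std::vector instead of fixed arrays so that a long file cannot overflow them (read_columns is a hypothetical name). It stops at the first malformed row; depending on how strict the format must be, you might prefer to skip or report such rows instead.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads "a b c" rows into three column vectors. A row must hold exactly
// three integers; afterwards the stream must have stopped at end of file
// rather than on a hard I/O error.
bool read_columns(std::istream& in, std::vector<int>& col1,
                  std::vector<int>& col2, std::vector<int>& col3)
{
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        int a, b, c;
        if (!(ss >> a >> b >> c)) return false;   // fewer than three integers
        std::string extra;
        if (ss >> extra) return false;            // unexpected data after them
        col1.push_back(a);
        col2.push_back(b);
        col3.push_back(c);
    }
    // getline fails at end of file, so fail() is set here; what matters is
    // that eof() is set and bad() is not.
    return in.eof() && !in.bad();
}
```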