Calling unget when EOF is triggered - c++

I am reading characters from an ifstream. If those characters don't match certain criteria, I call unget() once for each character I read. This all works fine until I get to the end of the file. If I then try to unget(), the good bit is set to 0.
This means no characters are put back, and when I read again I get blank characters. How can I unget properly once I have reached EOF?
Thanks :)
EDIT:
I have attempted to replace my unget() with putback(), but now I am encountering a different problem: rather than putting characters back, it seems to be overwriting characters in the ifstream. This is the code I am using:
if (substr.length() > 0)
{
    for (int i = substr.length()-1; i >= 0; --i)
    {
        std::cout << "at " << i << ": " << substr.at(i) << " ";
        infile.putback(substr.at(i));
    }
}
Where substr is the string to putback to the stream. I put it back in reverse order, so it is read in the correct order next time.
Suppose the stream has ", r1" up next and I put " #1" back into the stream. After the putback, I expect the next 5 characters to read as " #1, ", but they read as "#r1"... Somewhere the "1, " is being overwritten.
EDIT/SOLVED:
Rather than using unget() or putback(), I instead used seekg() and tellg(). I store the starting position with tellg(), then read however many characters, find the matching string and seek back to the start position + the length of the matching string.
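A minimal sketch of that tellg()/seekg() idea, in a simplified form: it tries to read one expected token and rewinds to the saved position if the token is not there. The helper name try_consume and the token-based interface are assumptions made for illustration, not the code from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: consume `token` if it is next in the stream,
// otherwise leave the read position where it was.
// Assumes the stream is seekable and in a good state on entry.
bool try_consume(std::istream& in, const std::string& token) {
    const std::istream::pos_type start = in.tellg();  // remember where we began

    std::string buf(token.size(), '\0');
    in.read(&buf[0], static_cast<std::streamsize>(buf.size()));

    const bool matched =
        in.gcount() == static_cast<std::streamsize>(buf.size()) && buf == token;

    if (!matched) {
        in.clear();       // a short read at EOF sets eofbit and failbit
        in.seekg(start);  // rewind as if nothing had been read
    }
    return matched;
}
```

Because the sketch only ever seeks back to a position that tellg() returned earlier, it does not rely on position arithmetic, which can be unreliable for files opened in text mode on some platforms. A std::istringstream is enough to try it out.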

First of all, you need to be sure that no iostate bits are set (fail, bad or eof) before calling unget. Since C++11, unget clears eofbit automatically, but failbit still has to be cleared by hand.
Then check the stream state after the operation. If failbit is set, then you forgot to clear the stream state before calling unget. If badbit is set, then there is an internal error; possibly the stream cannot back up any further.
You are not guaranteed that you can unget even one character, unlike putback, where putting back at least one character should be supported by the library.
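As a rough sketch of the steps above (the helper name unget_after_eof is made up for this example): clear the state left behind by the failed read, call unget(), then check whether it worked.

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: step back one character after a read has hit EOF.
// Returns false if the stream could not back up.
bool unget_after_eof(std::istream& in) {
    in.clear();         // drop the eofbit/failbit left by the failed read
    in.unget();         // try to make the last extracted character available again
    return !in.fail();  // fail() is also true when badbit is set
}
```

With a std::istringstream holding "ab", reading past the end and then calling this helper should make 'b' readable again. Whether a file stream can back up at that point depends on its buffer, so the return value is worth checking.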

Related

How to detect the newline character with gzgetc in c/c++

I want to read an entire line of a file character by character using gzgetc and stop when the newline is encountered. I know there is a function to grab the entire line but I would like to try to do it this way first. I tried:
int c;
do {
    c = gzgetc((gzFile) fp);
    cout << c;
} while (c != '\n');
The result was an infinite loop. I tried adding (char) before c, still the same result. What am I doing wrong? The data file I am trying to read is encoded in base64 and I want to read in each token separated by a space. Some of the lines are variable length and have a mixture of encoded and unencoded data, which I have set up an algorithm for; I just need to know how to stop at a newline.
You also need to check for gzgetc() returning -1, which indicates an error or end of file, and exit the loop in that case. Your infinite loop is likely due to one of those.
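A sketch of such a loop, written against any "get one character" callable so it can be tried without zlib. The name read_line and the callable parameter are assumptions for illustration; with zlib you would pass something like a lambda that returns gzgetc(fp).

```cpp
#include <string>

// Reads characters from next_char() until '\n' or end of input.
// next_char must return the character as an int, or -1 on EOF or error
// (the same contract as zlib's gzgetc).
// Returns false only when nothing at all could be read.
template <typename NextChar>
bool read_line(NextChar next_char, std::string& line) {
    line.clear();
    for (;;) {
        const int c = next_char();
        if (c == -1) {              // EOF or error: stop instead of looping forever
            return !line.empty();
        }
        if (c == '\n') {            // end of line: the newline is not stored
            return true;
        }
        line.push_back(static_cast<char>(c));
    }
}

// Possible use with zlib (not compiled here):
//   std::string line;
//   while (read_line([&] { return gzgetc(fp); }, line)) { /* split on spaces */ }
```

Keeping c as an int matters: if it were narrowed to char before the comparison, a data byte could be mistaken for the -1 sentinel.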

How Can I Detect That a Binary File Has Been Completely Consumed?

If I do this:
ofstream output("foo.txt");
output << 13;
output.close();
ifstream input("foo.txt");
int dummy;
input >> dummy;
cout << input.good() << endl;
I'll get the result: "0"
However if I do this:
ofstream output("foo.txt", ios_base::binary);
auto dummy = 13;
output.write(reinterpret_cast<const char*>(&dummy), sizeof(dummy));
output.close();
ifstream input("foo.txt", ios_base::binary);
input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy));
cout << input.good() << endl;
I'll get the result: "1"
This is frustrating to me. Do I have to resort to inspecting the ifstream's buffer to determine whether it has been entirely consumed?
Regarding
How Can I Detect That a Binary File Has Been Completely Consumed?
A slightly inefficient but easy to understand way is to measure the size of the file:
ifstream input("foo.txt", ios_base::binary);
input.seekg(0, ios_base::end); // go to end of the file
auto filesize = input.tellg(); // current position is the size of the file
input.seekg(0, ios_base::beg); // go back to the beginning of the file
Then check current position whenever you want:
if (input.tellg() == filesize)
cout << "The file was consumed";
else
cout << "Some stuff left in the file";
This way has some disadvantages:
Not efficient - goes back and forth in the file
Doesn't work with special files (e.g. pipes)
Doesn't work if the file is changed (e.g. you open your file in read-write mode)
Only works for binary files (seems your case, so OK), not text files
So better just use the regular way people do it, that is, try to read and bail if it fails:
if (input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy)))
cout << "I have read the stuff, will work on it now";
else
cout << "No stuff in file";
Or (in a loop)
while (input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy)))
{
cout << "Working on your stuff now...";
}
You are doing totally different things.
The operator>> is greedy and will read as much as possible into dummy. It so happens that while doing so, it runs into the end of file. That sets the input.eof(), and the stream is no longer good(). As it did find some digits before the end, the operation is still successful.
In the second read, you ask for a specific number of bytes (4 most likely) and the read is successful. So the stream is still good().
The stream interface doesn't predict the outcome of any future I/O, because in the general case it cannot know. If you use cin instead of input there might now be more to read, if the user continued typing.
Specifically, the eof() state doesn't appear until someone tries to read past end-of-file.
For text streams: as you have written only the integer value, with neither a space nor an end of line after it, the library must try to read one character past the 1 and the 3 at read time, and so it hits the end of file. So the good bit is false and eof is true.
For binary streams: you have written 4 bytes (sizeof(int), assuming ints are 32 bits wide), and you read 4 bytes. Fine. No problem has occurred yet, so the good bit is true and eof is false. Only the next read will hit the end of file.
But beware. In the text example, if you open the text file in an editor and simply save it without changing anything, chances are that the editor automatically adds an end of line. In that case, the read will stop at the end of line and, as in the binary case, the good bit will be true and eof false. The same happens if you write with output << 13 << std::endl;
All of this means that you must never assume that more data follows just because good is true and eof is false: the end of file may be hit only on the next read, even though that read returns nothing.
TL;DR: the only foolproof way to know that there is nothing left in a file is when you are no longer able to read something from it.
You do not need to resort to inspecting the buffer. You can determine if the whole file has been consumed:
cout << (input.peek() != char_traits<char>::eof()) << endl;
This uses peek, which:
Reads the next character from the input stream without extracting it
good in the case of the example is:
Returning false after the last extraction operation, because the int extraction operator has to keep reading until it finds a character that is not a digit. Here it runs into the end of the file instead, and that attempt to read past the last character sets the stream's eofbit, so good fails
Returning true after calling read, because read extracts exactly sizeof(int) bytes and never tries to look beyond them, leaving the stream's eofbit unset and good passing
peek can be used after either of these and will correctly return char_traits<char>::eof() in both cases. Effectively this is inspecting the buffer for you, but with one vital distinction for binary files: if you were to inspect a binary file yourself as plain chars, you'd find that it may contain bytes that look like EOF. (EOF is typically -1, and a 0xFF byte stored in a signed char also compares equal to -1; the binary representation of the int -1 contains four such bytes.) If you only look at the buffer's next char, you won't know whether that's actually the end of the file or not.
peek doesn't just return a char though, it returns an int_type. If peek returns 0x000000FF then you're looking at a 0xFF data byte, not the end of file. If peek returns char_traits<char>::eof() (typically 0xFFFFFFFF, that is -1) then you're looking at the end of the file.
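A small sketch of the peek() check wrapped in a function (the name fully_consumed is made up here). It compares the int_type result against char_traits<char>::eof() without narrowing it to char, so a 0xFF data byte is not mistaken for the end of the file.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true when no more characters can be read from `in`.
// Note that peek() itself sets eofbit when it finds nothing left.
bool fully_consumed(std::istream& in) {
    return in.peek() == std::char_traits<char>::eof();
}
```

The same call gives the expected answer after a formatted read that already set eofbit and after a read() of exactly the remaining bytes, which left the stream good().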

input from a file using get line

I am trying to read words from a file in which they are separated by newline characters. I am using this code:
fstream input("wordfile.dat", ios::in);
char b[10];
while (!input.eof())
{
input.getline(b, 10);
cout << b << endl;
}
If I change the loop statement from while(!input.eof()) to while(input) , the program will output a blank line before the loop ends. But now it won't. The question is, in both statements the while condition must first input a line and by inputting it, it will know if it has reached end of file or if there is still more information. So input.eof() must act just like the other statement and output a blank line. First I thought it was a mistake, but I wondered why it was acting correctly. What is the difference between these two conditions?
Looking at operator bool we see ...
Notice that this function does not return the same as member good [...]
... that if (stream) is not the same as if (stream.good()), but also learn that it ...
Returns whether an error flag is set (either failbit or badbit).
So it's basically the same as not stream.fail() (fail() is true if either failbit or badbit is set).
This also explains the different behavior between while (stream) and while (not stream.eof()):
When the input file does not end with a newline, stream.getline(buffer, size) will encounter the end of file before reaching the delimiting newline character (or the 10 character limit) and thus set the eofbit. Testing the stream with its operator bool will then still be true (since neither failbit nor badbit is set); only the next attempt to read with getline sets the failbit, because no characters are extracted.
But when testing with not stream.eof(), the eofbit alone will end the loop.
If you test the stream itself,
if (stream) // ...
then you are checking that it has neither failed nor gone bad (failbit and badbit are both clear); eofbit is not part of that test.
So a stream that is not at the end of file could still have failed or be in a bad state, and a stream whose eofbit is set can still test true.
See the table here.
When reading (or writing) a stream, test the stream itself after the operation unless you have a specific reason not to do so.
As a side note, this happens when you do input like the following, since getline returns a reference to the instance it's called on:
while (stream.getline(buffer, size)) {
// ..
}
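To make the difference concrete, here is a sketch that runs both loop styles over the same input. The function names are made up for this example, and it assumes every line fits in the 10-character buffer (a longer line would set failbit and end either loop early).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Idiomatic loop: the body only runs when getline() actually succeeded.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    char buffer[10];
    while (in.getline(buffer, sizeof buffer)) {
        lines.push_back(buffer);
    }
    return lines;
}

// eof()-controlled loop: the body runs once more after the last line
// whenever the input ends with a newline, because eofbit is only set
// by the read that finds nothing.
std::vector<std::string> read_lines_eof_loop(std::istream& in) {
    std::vector<std::string> lines;
    char buffer[10];
    while (!in.eof()) {
        in.getline(buffer, sizeof buffer);
        lines.push_back(buffer);
    }
    return lines;
}
```

For input that ends with a newline, the second version picks up one extra empty entry; for input without a trailing newline, both return the same lines.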

Having problems with 0x0A character in C++ even in binary mode. (interprets it as new file)

Hi, this might seem a bit of a newbie question, but here we go. I'm developing a program that downloads leaderboards of a certain game from the internet and transforms them into a proper format to work with (elaborate rankings, etc.).
The files contain the names, ordered by rank, but between each name there are 7 random control codes (obviously unprintable). The txt file looks like this:
..C...hName1..)...&Name2......)Name3..é...þName4..Ü...†Name5..‘...QName6..~...bName7..H...NName8..|....Name9..v...HName10.
I checked with a hex editor and saw that the first control code after each name is always a null character (0x00). So, what I do is read everything, and then cout every character. When a 0x00 character is found, skip 7 characters and keep couting. Therefore you end up with the list, right?
At first I had the problem that among those random control codes you would sometimes find a "soft EOF" (0x1A), and the program would stop reading there. So I finally figured out that I should open the file in binary mode. It worked, and then everything would be couted... or that's what I thought.
But I came across another file which still didn't work, and finally found out that there was an EOF character (0x0A)! That doesn't make sense, since I'm opening the file in binary mode. But still, after reading that character, C++ interprets that as a new file, and hence skips 7 characters, so the name after that character will always appear cut.
Here's my current code:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
    string scores;
    system("wget http://certainwebsite/001.txt"); //download file
    ifstream highin ("001.txt", ios::binary);
    ofstream highout ("board.txt", ios::binary);
    if (highin.is_open())
    {
        while ( highin.good() )
        {
            getline (highin, scores);
            for (int i=0;i<scores.length(); i++)
            {
                if (scores[i]==0x00){
                    i=i+7; //skip 7 characters if 'null' is found
                    cout << endl;
                    highout << endl;
                }
                cout << scores[i];
                highout << scores[i]; //cout names and save them in output file
            }
        }
        highin.close();
    }
    else cout << "Unable to open file";
    system("pause>nul");
}
I'm not sure how to ignore that character if being in binary mode already doesn't work. Sorry for the long question, but I wanted to be detailed and specific. In this case, the EOF character is located before Name3, and hence this is what the output looks like:
http://i.imgur.com/yu1NjoZ.png
By default getline() reads until the end of line and discards the newline character. However, the delimiter character could be customized (by supplying the third parameter). If you wish to read until the null character (not until the end of line), you could try using getline (highin, scores, '\0'); (and adjusting the logic of skipping the characters).
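A sketch of what that adjusted logic might look like, assuming the layout described in the question: every name is followed by a null byte plus 6 more control bytes, and the file also starts with such a 7-byte group. The function name extract_names and the `extra` parameter are made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the stream on '\0' and skip the control bytes
// that follow each null. `extra` is the assumed number of control bytes
// after the null (6, so that null + extra = 7 as in the question).
std::vector<std::string> extract_names(std::istream& in, std::streamsize extra = 6) {
    std::vector<std::string> names;
    std::string name;
    while (std::getline(in, name, '\0')) {  // reads up to, and discards, the null
        if (!name.empty()) {
            names.push_back(name);
        }
        in.ignore(extra);                   // skip the remaining control bytes, whatever they are
    }
    return names;
}
```

Since '\0' is the only delimiter, bytes such as 0x0A or 0x1A inside the control codes are skipped by ignore() like any other byte. The stream still needs to be opened with ios::binary when it is a file.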
I'm glad you figured it out and it doesn't surprise me that getline() was the culprit. I had a similar issue dealing with the newline character when I was trying to read in a CSV file. There are several different getline() functions in C++ depending on how you call the function and each seems to handle the newline character differently.
As a side note, in your for loop, I'd recommend against performing a method call in your test. That adds unnecessary overhead to the loop. It'd be better to call the method once and put that value into a variable, then enter the loop and test i against the length variable. Unless you expect the length to change, calling the length() method each iteration is a waste of system resources.
Thank you all guys, it worked, it was the getline() which was giving me problems indeed. Due to the 'while' loop, each time it found a new line character, it restarted the process, hence skipping those 7 characters.

Actual difference between end of line and end of file under windows?

I understand EOF and EOL, but when I was reading this question (second part of the answer), my understanding fell apart.
Especially this paragraph:
It won't stop taking input until it finds the end of file (cin uses stdin, which is treated very much like a file)
So I want to know: when we do something like this in C++ under Windows:
std::cin >> int_var;
and press Enter, this ends the input, but according to the linked answer it should only stop taking input after Ctrl+Z.
So I would love to know how the std::*stream classes deal with EOF and EOL.
Second part:
please have a look at this example :
std::cin.getline(char_array_of_size_256 ,256);
cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
cout << "artist is " << artist << endl;
If I remove std::cin.ignore(), it simply stops taking input (which is the known case), but when I keep it, it waits for new input ended by '\n'. But it should simply clear the stream rather than wait for any new input ending with '\n'.
Thanks for giving your time.
End-of-line and end-of-file are very different concepts.
End-of-line is really just another input character (or character sequence) that can appear anywhere in an input stream. If you're reading input one character at a time from a text stream, end-of-line simply means that you'll see a new-line ('\n') character. Some input routines treat this character specially; for example, it tells getline to stop reading. (Other routines treat ' ' specially; there's no fundamental difference.)
Different operating systems use different conventions for marking the end of a line. On Linux and other Unix-like systems, the end of a line in a file is marked with a single ASCII linefeed (LF, '\n') character. When reading from a keyboard, both LF and CR are typically mapped to '\n' (try typing either Enter, Control-J, or Control-M). On Windows, the end of a line in a file is marked with a CR-LF pair (\r\n). The C and C++ I/O systems (or the lower-level software they operate on top of) map all these markers to a single '\n' character, so your program doesn't have to worry about all the possible variations.
End-of-file is not a character, it's a condition that says there are no more characters available to be read. Different things can trigger this condition. When you're reading from a disk file, it's just the physical end of the file. When you're reading from a keyboard on Windows, control-Z denotes end-of-file; on Unix/Linux, it's typically control-D (though it can be configured differently).
(You'll usually have an end-of-line (character sequence) just before end-of-file, but not always; input can sometimes end in an unterminated line, on some systems.)
Different input routines have different ways of indicating that they've seen an end-of-file condition. Read the documentation for each one for the details.
As for EOF, that's a macro defined in <stdio.h> or <cstdio>. It expands to a negative integer constant (typically -1) that's returned by some functions to indicate that they've reached an end-of-file condition.
EDIT: For example, suppose you're reading from a text file containing two lines:
one
two
Let's say you're using C's getchar(), getc(), or fgetc() function to read one character at a time. The values returned on successive calls will be:
'o', 'n', 'e', '\n', 't', 'w', 'o', '\n', EOF
Or, in numeric form (on a typical system):
111, 110, 101, 10, 116, 119, 111, 10, -1
Each '\n', or 10 (0x0a) is a new-line character read from the file. The final -1 is the value of EOF; this isn't a character, but an indication that there are no more characters to be read.
Higher-level input routines, like C's fgets() and C++'s std::cin >> s or std::getline(std::cin, s), are built on top of this mechanism.
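The same sequence can be observed in C++ with std::istream::get(), which also returns an int-like value rather than a char. This is only a sketch; the function name read_codes is made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collects the value returned by every get() call, including the final
// end-of-file indicator (char_traits<char>::eof(), typically -1).
std::vector<int> read_codes(std::istream& in) {
    std::vector<int> codes;
    for (;;) {
        const int c = in.get();
        codes.push_back(c);
        if (c == std::char_traits<char>::eof()) {
            break;  // not a character: there is simply nothing left to read
        }
    }
    return codes;
}
```

Feeding it a std::istringstream that holds the two lines above should give eight character codes followed by the end-of-file value.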
First "part"
So I want to know: when we do something like std::cin >> int_var; in C++ under Windows and press Enter, this ends the input, but according to the linked answer it should only stop taking input after Ctrl+Z.
No, you're confusing formatted input operations with stream iterators. The following will use the formatted input operation (operator>>) repeatedly until the end of file is reached because the "end iterator" represents the end of the stream.
std::vector<int> integers;
std::copy(
std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::back_inserter(integers));
If you use the following:
int i = 0;
std::cin >> i;
in an interactive shell (e.g. in console mode), std::cin will block on user input which is acquired line by line. So, if no data (or only white space) is available, this operation will actually force the user to type a line of input and press the enter key.
However,
int i = 0;
int j = 0;
std::cin >> i >> j;
may block on one or two lines of input, depending on what the user types. In particular, if the user types
1<space>2<enter>
then the two input operations will be applied using the same line of input.
Second "part"
Considering the code snippet:
std::cin.getline(char_array_of_size_256 ,256);
cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
cout << "artist is " << artist << endl;
If the line contains 255 or fewer characters, std::cin.getline() will consume the end-of-line character. Thus, the ignore() call on the second line has nothing left to discard and consumes all characters until the next line is completed. If you want to capture only the current line and ignore all characters past the first 255, I suggest you use something like:
std::cin.getline(char_array_of_size_256, 256);
if (std::cin.fail() && std::cin.gcount() == 255) {
    // The line was too long: getline() stored 255 characters and set failbit,
    // so clear the state before discarding the rest of the line.
    std::cin.clear();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
cout << "artist is " << artist << endl;
On the second part:
When the linked answer said "read into a string", I guess they meant
std::string s;
std::getline(std::cin, s);
which always reads the entire line into the string s (while setting s to the proper size).
That way there is nothing left over from the input line to clean up.
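A short sketch of that approach (the function name read_whole_line is made up here): because the string grows as needed, a line longer than 255 characters is read in one go and the next read starts on the following line.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read one complete line, whatever its length.
// The trailing '\n' is consumed but not stored in the result.
std::string read_whole_line(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
}
```

With std::cin this would be called as read_whole_line(std::cin); no ignore() call is needed afterwards.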