C++ file stream - c++

Could someone please explain how c++ reads in files? I'm not asking the code to read in a file but after the ifstream >> variable, what are the rules to how c++ grabs the data ?
Is the file read like how a cin would read in the user input? meaning it stops after each whitespace? What happens after it reaches the end of a line? does it automatically proceed to the next line or do I have to write code for that? I know that it stops after eof, but I'm unsure of the process of extracting data and I can't write code if I don't understand the process. Thanks

Yes the input operator >> always "tokenizes" (stops at) whitespace. And reading from any input stream is working the same.
For very good information I suggest this reference. Especially the reference for the input operator is very detailed.

Basically when you call ifstream f;, you are creating a variable with access to the library. From there you must declare your intentions with that variable. Using f.open(fileName, ios::in); you can input from fileName using the >> operator, which actually operates like cin. It stops at white spaces like you'd expect. Once it reaches the end of a line, it continues as long as you have code that asks the operator to extract more. You dont have to do anything extra to tell it to move on to the next line.
More info can be found here.

The iostream formatted input and output operators are essentially defined in terms of the C library functions strtol/strtoul/strtod (cf. 22.4.2.1.2) and sprintf (cf. 22.4.2.2.2), respectively.

In C/C++ there is generally no difference in any stream input (besides details eg.: seeking).
Having C++ there are two distinguished ways of input:
Formatted Input
Unformatted Input
All formatted input operations involve an operator stream& operator >> (stream&, T). However, not all stream& operator >> (stream&, T) are performing formatted input (eg.: some are involving manipulators or a stream buffer)
Each formatted input starts with skipping white spaces and stops at the first character not being part of the input format (Note: It may be any character, it is not limited to white spaces).
Unformatted input reads all characters (does not ignore any white space) and stops if a requested amount of characters is retrieved or the stream reaches the end (EOF). Specialized functions (like std::getline) might stop early and ignore the delimiting condition character.

Related

Is there any way to read characters that satisfy certain conditions only from stdin in C++?

I am trying to read some characters that satisfy certain condition from stdin with iostream library while leave those not satisfying the condition in stdin so that those skipped characters can be read later. Is it possible?
For example, I want characters in a-c only and the input stream is abdddcxa.
First read in all characters in a-c - abca; after this input finished, start read the remaining characters dddx. (This two inputs can't happen simultaneously. They might be in two different functions).
Wouldn't it be simpler to read everything, then split the input into the two parts you need and finally send each part to the function that needs to process it?
Keeping the data in the stdin buffer is akin to using globals, it makes your program harder to understand and leaves the risk of other code (or the user) changing what is in the buffer while you process it.
On the other hand, dividing your program into "the part that reads the data", "the part that parses the data and divides the workload" and the "part that does the work" makes for a better structured program which is easy to understand and test.
You can probably use regex to do the actual split.
What you're asking for is the putback method (for more details see: http://www.cplusplus.com/reference/istream/istream/putback/). You would have to read everything, filter the part that you don't want to keep out, and put it back into the stream. So for instance:
cin >> myString;
// Do stuff to fill putbackBuf[] with characters in reverse order to be put back
pPutbackBuf = &putbackBuf[0];
do{
cin.putback(*(pPutbackBuf++));
while(*pPutbackBuf);
Another solution (which is not exactly what you're asking for) would be to split the input into two strings and then feed the "non-inputted" string into a stringstream and pass that to whatever function needs to do something with the rest of the characters.
What you want to do is not possible in general; ungetc and putback exist, but they're not guaranteed to work for more than one character. They don't actually change stdin; they just push back on an input buffer.
What you could do instead is to explicitly keep a buffer of your own, by reading the input into a string and processing that string. Streams don't let you safely rewind in many cases, though.
No, random access is not possible for streams (except for fstream an stringstream). You will have to read in the whole line/input and process the resulting string (which you could, however, do using iostreams/std::stringstream if you think it is the best tool for that -- I don't think that but iostreams gurus may differ).

Why does string extraction from a stream set the eof bit?

Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
int main(int argc, const char* argv[])
{
std::stringstream ss("hello");
std::string result;
ss >> result;
std::cout << ss.eof() << std::endl; // Outputs 1
return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_stream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object which proceeds to eat away whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor continues to get on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
The consequence of the eof bit being set is that the only reason the evil-idiom while (!stream.eof()) doesn't work when reading files is because of the extra \n at the end and not because the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
21.4.8.9 Inserters and extractors [string.io]
27.7.2.2 Formatted input functions [istream.formatted]
27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as
explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_-
base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or
extractors) and the unformatted input functions. Both groups of input functions are described as if they
obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use
other public members of istream.
So eof must be set.
Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.

Read binary data from std::cin

What is the easiest way to read binary (non-formated) data from std::cin into either a string or a stringstream?
std::cin is not opened with ios_binary. If you must use cin, then you need to reopen it, which isn't part of the standard.
Some ideas here: https://comp.unix.programmer.narkive.com/jeVj1j3I/how-can-i-reopen-std-cin-and-std-cout-in-binary-mode
Once it's binary, you can use cin.read() to read bytes. If you know that in your system, there is no difference between text and binary (and you don't need to be portable), then you can just use read without worrying.
For windows, you can use the _setmode function in conjunction with cin.read(), as already mentioned.
_setmode(_fileno(stdin), _O_BINARY);
cin.read(...);
See solution source here: http://talmai-oliveira.blogspot.com/2011/06/reading-binary-files-from-cin.html
cin.read would store a fixed number of bytes, without any logic searching for delimiters of the type that #Jason mentioned.
However, there may still be translations active on the stream, such as CRLF -> NL, so it still isn't ideal for binary data.
On a Unix/POSIX system, you can use the cin.get() method to read byte-by-byte and save the data into a container like a std::vector<unsigned int>, or you can use cin.read() in order to read a fixed amount of bytes into a buffer. You could also use cin.peek() to check for any end-of-data-stream indicators.
Keep in mind to avoid using the operator>> overload for this type of operation ... using operator>> will cause breaks to occur whenever a delimiter character is observed, and it will also remove the delimiting character from the stream itself. This would include any binary values that are equivalent to a space, tab, etc. Thus the binary data your end up storing from std::cin using that method will not match the input binary stream byte-for-byte.
All predefined iostream objects are obligated to be bound to corresponding C streams:
The object cin controls input from a stream buffer associated with the object stdin, declared in <cstdio>.
http://eel.is/c++draft/narrow.stream.objects
and thus the method of obtaining binary data is same as for C:
Basically, the best you can really do is this:
freopen(NULL, "rb", stdin);
This will reopen stdin to be the same input stream, but in binary
mode. In the normal mode, reading from stdin on Windows will convert
\r\n (Windows newline) to the single character ASCII 10. Using the
"rb" mode disables this conversion so that you can properly read in
binary data.
https://stackoverflow.com/a/1599093/6049796
cplusplus.com:
Unformatted input
Most of the other member functions of the istream
class are used to perform unformatted input, i.e. no interpretation is
made on the characters got form the input. These member functions can
get a determined number of characters from the input character
sequence (get, getline, peek, read, readsome)...
As Lou Franco pointed out, std::cin isn't opened with std::ios_base::binary, but one of those functions might get you close to the behavior you're looking for.
With windows/mingw/msys/bash, if you need to pipe different commands with binary streams in between, you need to manipulate std::cin and std::cout as binary streams.
The _setmode solution from Mikhail works perfectly.
Using MinGW, the neaded headers are the following:
#include <io.h>
#include <fcntl.h>

Why doesn't whitespace act as a delimiter when printing strings with ostream operator<<?

I know that when writing code like std::cin >> some_var; , where some_var is a string variable, only the first word that was inputted will be stored in some_var. But I do not understand why std::cout << "something here"; does not only output "something". Am I missing something?
When reading input using cin >> some_var, the delimiter is space by default (you can change this though), while when printing, cout prints till its find \0 which is end of the string.
If you want to read till it finds \0 in the input stream, then you've to write this:
std::getline(std::cin, some_var, '\0');
You can give any other character as delimiter as third argument of std::getline function.
Note that there is a member function with same name getline which is slightly different than the one I used above which is a free standalone function.
Compare:
std::getline - free standalone function. An overloaded function is also available.
istream::getline - member function of std::istream
I used the first one.
It was deemed useful that cin, when reading into a string terminates at whitespace. But it wasn't deemed useful that only the first word is printed when you print a string.
After all you said print "something here" to cout· In the other case, you just said read something from cin. The choice between a word, between a line and between the whole content of stdin (until an EOF is received) is arbitrary and the design happens to be to read a word. That makes it easy to quickly read a record line like "john 10 2.15" (first read into string, then into an int, and then into a float). Use std::getline to read a whole line into a string.
Because when you take input from std::cin, it needs to know when to stop taking input. If it didn't stop at the first word, when would it stop? However, when you output a variable, the answer is easy- you output that one variable.
Yes... cin stops at a space because, by default, it thinks you want a word. cout doesn't follow the same restriction, because, well, why would it? Think about it for a second — would it make any sense whatsoever if printing out "Hello, world!" actually just printed out "Hello,"? Of course it doesn't make any sense. The developer knows what they want to output (at least, we hope :D).
I think they took a convention and stick to it. In the case of <<, it is clear that you specify what to be written. In the case of a >>, they have to decide where to stop. Newline? Maybe this is the most natural alternative, but they just decided to stop reading at a space.
The delimiters are different for cin and for cout. For cin, input is split at whitespaces, for cout, at end of lines. You can change those default delimiters though.

How do I set EOF on an istream without reading formatted input?

I'm doing a read in on a file character by character using istream::get(). How do I end this function with something to check if there's nothing left to read in formatted in the file (eg. only whitespace) and set the corresponding flags (EOF, bad, etc)?
Construct an istream::sentry on the stream. This will have a few side effects, the one we care about being:
If its skipws format flag is set, and the constructor is not passed true as second argument (noskipws), all leading whitespace characters (locale-specific) are extracted and discarded. If this operation exhausts the source of characters, the function sets both the failbit and eofbit internal state flags
You can strip any amount of leading (or trailing, as it were) whitespace from a stream at any time by reading to std::ws. For instance, if we were reading a file from STDIN, we would do:
std::cin >> std::ws
Credit to this comment on another version of this question, asked four years later.
How do I end this function with something to check if there's nothing left to read in formatted in the file (eg. only whitespace)?
Whitespace characters are characters in the stream. You cannot assume that the stream will do intelligent processing for you. Until and unless, you write your own filtering stream.
By default, all of the formatted extraction operations (overloads of operator>>()) skip over whitespace before extracting an item -- are you sure you want to part ways with this approach?
If yes, then you could probably achieve what you want by deriving a new class, my_istream, from istream, and overriding each operator>>() to call the following method at the end:
void skip_whitespace() {
char ch;
ios_base old_flags = flags(ios_base::skipws);
*this >> ch; // Skips over whitespace to read a character
flags(old_flags);
if (*this) { // I.e. not at end of file and no errors occurred
unget();
}
}
It's quite a bit of work. I'm leaving out a few details here (such as the fact that a more general solution would be to override the class template basic_istream<CharT, Traits>).
istream is not going to help a lot - it functions as designed. However, it delegates the actual reading to streambufs. If your streambuf wrapper trims trailing whitespace, an istream reading from that streambuf won't notice it.