C++ getline()'s undocumented behavior - c++

In C++, when you use getline() with a delimiter on a stringstream, there are a few behaviours that I couldn't find documented, but that are handy and don't raise an error:
delimiter is not found => then simply whole string/rest of it is returned
there is delimiter but nothing before it => empty string is returned
getting something that isn't really there => returns the last thing that could be read with it
Some test code (simplified):
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
string test(const string &s, char delim, int parseIndex ){
stringstream ss(s);
string parsedStr = "";
for( int i = 0; i < (parseIndex+1); i++ ) getline(ss, parsedStr, delim);
return parsedStr;
}
int main() {
stringstream ss("something without delimiter");
string s1;
getline(ss,s1,';');
cout << "'" << s1 << "'" << endl; //no delim
cout << endl;
string s2 = "321;;123";
cout << "'" << test(s2,';',0) << "'" << endl; //classic
cout << "'" << test(s2,';',1) << "'" << endl; //nothing before
cout << "'" << test(s2,';',2) << "'" << endl; //no delim at the end
cout << "'" << test(s2,';',3) << "'" << endl; //this shouldn't be there
cout << endl;
return 0;
}
Test code output:
'something without delimiter'
'321'
''
'123'
'123'
Test code fiddle: http://ideone.com/ZAuydR
The Question
Can this be relied on? If so, where is it documented?
Thanks for answers and clarifying :)

The behavior of getline is explicitly documented in the standard (C++11 §21.4.8.9 ¶7-10), which is the only normative document about C++.
The behavior you asked about in the first two questions is guaranteed, while the third one is a consequence of how your test rig is made.
template<class charT, class traits, class Allocator>
basic_istream<charT,traits>&
getline(basic_istream<charT,traits>& is,
basic_string<charT,traits,Allocator>& str,
charT delim);
template<class charT, class traits, class Allocator>
basic_istream<charT,traits>&
getline(basic_istream<charT,traits>&& is,
basic_string<charT,traits,Allocator>& str,
charT delim);
Effects: Behaves as an unformatted input function (27.7.2.3), except that it does not affect the value
returned by subsequent calls to basic_istream<>::gcount(). After constructing a sentry object,
if the sentry converts to true, calls str.erase() and then extracts characters from is and appends
them to str as if by calling str.append(1, c) until any of the following occurs:
end-of-file occurs on the input sequence (in which case, the getline function calls is.setstate(ios_base::eofbit)).
traits::eq(c, delim) for the next available input character c (in which case, c is extracted but
not appended) (27.5.5.4)
str.max_size() characters are stored (in which case, the function calls is.setstate(ios_base::failbit)) (27.5.5.4)
The conditions are tested in the order shown. In any case, after the last character is extracted, the
sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios_base::failbit) which may throw
ios_base::failure (27.5.5.4).
Returns: is.
Coming to your questions:
delimiter is not found => then simply whole string/rest of it is returned
That's a consequence of the first exit condition: when the input string ends, the string stream reaches end-of-file, so the extraction terminates (after having appended all the preceding characters to the output string).
there is delimiter but nothing before it => empty string is returned
That's just a special case of the second point - the extraction terminates when the delimiter is found (traits::eq(c, delim) normally boils down to c==delim), even if no other character has been extracted before.
getting something that isn't really there => returns the last thing that could be read with it
It doesn't go exactly like this. If the stream is in an error condition (the sentry object does not convert to true, in the description above), which in your case is an EOF, getline leaves your string alone and returns. In your test code you see the last read data only because you are reusing the same string without clearing it between reads.

The behavior of C++ facilities is described by the ISO C++ standard. But, it's not the most readable resource. In this case, cppreference.com has good coverage.
Here's what they have to say. The quote blocks are copy-pasted; I've interspersed explanations to your questions.
Behaves as UnformattedInputFunction, except that input.gcount() is not affected. After constructing and checking the sentry object, performs the following:
"Constructing and checking the sentry" means that if an error condition has been detected on the stream, the function will return without doing anything. This is why in #3 you observe the last valid input when "nothing should be there."
1) Calls str.erase()
So, if nothing is subsequently found before the delimiter, you'll get an empty string.
2) Extracts characters from input and appends them to str until one of the following occurs (checked in the order listed)
a) end-of-file condition on input, in which case, getline sets eofbit.
This is an error condition which causes the string local variable to be unchanged by subsequent getlines.
It also allows you to observe the last segment of input before the end, so you may treat the end-of-file as a delimiter if you wish.
b) the next available input character is delim, as tested by Traits::eq(c, delim), in which case the delimiter character is extracted from input, but is not appended to str.
c) str.max_size() characters have been stored, in which case getline sets failbit and returns.
3) If no characters were extracted for whatever reason (not even the discarded delimiter), getline sets failbit and returns.

Related

It seems to me as if std::getline doesn't handle new lines correctly?

With the code below, I want the user to write a text in the terminal, and then print out the last sentence of the text. Maybe I should mention that I'm running on a Linux desktop.
#include <string>
#include <iostream>
int main()
{
std::string user_text{};
while(std::getline(std::cin, user_text))
{
}
std::cout << "Text: " << user_text << std::endl;
return 0;
}
Anyways if I, after running the program, write for example:
Hi my name is
And then press 'ctrl+d', the output will indeed be "Text: Hi my name is"
However if I instead do this:
Hi my name is 'press enter'
Name my is hi 'press enter'
And then press 'ctrl+d'. The output will be "Text: ". Why is this? Shouldn't getline stop when I have pressed 'ctrl+d'?
Thanks in advance!
std::getline() erases the output std::string before attempting to read from the stream.
In your second case, the first 2 calls to std::getline() have already read everything you have typed in, there is nothing left when you press CTRL-D during the 3rd call, so there is nothing for std::getline() to output into the std::string.
Save the last successful line read to a separate variable, eg:
std::string user_text, line;
while(std::getline(std::cin, line))
{
user_text = line;
}
std::cout << "Text: " << user_text << std::endl;
std::getline is working as intended: it's getting a line. If you press enter, it creates a new, empty line; if you then press ctrl+d, you're terminating std::getline, which returns that (empty) line's contents.
From the docs:
getline reads characters from an input stream and places them into a string:
Behaves as UnformattedInputFunction, except that input.gcount() is not affected. After constructing and checking the sentry object, performs the following:
Calls str.erase()
Extracts characters from input and appends them to str until one of the following occurs (checked in the order listed)
a) end-of-file condition on input, in which case, getline sets eofbit.
b) the next available input character is delim, as tested by Traits::eq(c, delim), in which case the delimiter character is extracted from input, but is not appended to str.
c) str.max_size() characters have been stored, in which case getline sets failbit and returns.
If no characters were extracted for whatever reason (not even the discarded delimiter), getline sets failbit and returns.
Same as getline(input, str, input.widen('\n')), that is, the default delimiter is the endline character.
Ctrl+D causes the process's read from the terminal to return immediately. If you press Ctrl+D after typing: Hi my name is, the process will read: Hi my name is. getline will not find a \n and will restart reading. Then you press Ctrl+D a second time (you didn't say it but I am sure you did). And this will interrupt the read, causing it to return 0, which is as-if the terminal was closed. getline will then return the current value: Hi my name is.
In the second case, you haven't typed anything since the last \n, so when you press Ctrl+D, read directly returns 0 and getline returns with an empty string.

Why is C++'s file I/O ignoring initial empty lines when reading a text file? How can I make it NOT do this?

I'm trying to build myself a mini programming language using my own custom regular expression and abstract syntax tree parsing library 'srl.h' (aka. "String and Regular-Expression Library") and I've found myself an issue I can't quite seem to figure out.
The problem is this: When my custom code encounters an error, it obviously throws an error message, and this error message contains information about the error, one bit being the line number from which the error was thrown.
The issue comes in the fact that C++ seems to just be flat out ignoring the existence of lines which contain no characters (ie. lines that are just the CRLF) until it finds a line which does contain characters, after which point it stops ignoring empty lines and treats them properly, thus giving all errors thrown an incorrect line number, with them all being incorrect by the same offset.
Basically, if given a file which contains the contents "(crlf)(crlf)abc(crlf)def", it'll be read as though its content were "abc(crlf)def", ignoring the initial new lines and thus reporting the wrong line number for any and all errors thrown.
Here's a copy of the (very messily coded) function I'm using to get the text of a text file. If one of y'all could tell me what's going on here, that'd be awesome.
template<class charT> inline std::pair<bool, std::basic_string<charT>> load_text_file(const std::wstring& file_path, const char delimiter = '\n') {
std::ifstream fs(file_path);
std::string _nl = srl::get_nlp_string<char>(srl::newline_policy);
if (fs.is_open()) {
std::string s;
char b[SRL_TEXT_FILE_MAX_CHARS_PER_LINE];
while (!fs.eof()) {
if (s.length() > 0)
s += _nl;
fs.getline(b, SRL_TEXT_FILE_MAX_CHARS_PER_LINE, delimiter);
s += std::string(b);
}
fs.close();
return std::pair<bool, std::basic_string<charT>>(true, srl::string_cast<char, charT>(s));
}
else
return std::pair<bool, std::basic_string<charT>>(false, std::basic_string<charT>());
}
std::ifstream::getline() does not input the delimiter (in this case, '\n') into the string and also flushes it from the stream, which is why all the newlines from the file (including the leading ones) are discarded upon reading.
The reason it seems the program does not ignore newlines between other lines is because of:
if (s.length() > 0)
s += _nl;
All the newlines are really coming from here, but this cannot happen at the very beginning, since the string is empty.
This can be verified with a small test program:
#include <iostream>
#include <fstream>
#include <string>
int main()
{
std::ifstream inFile{ "test.txt" }; //(crlf)(crlf)(abc)(crlf)(def) inside
char line[80]{};
int lineCount{ 0 };
std::string script;
while (inFile.peek() != EOF) {
inFile.getline(line, 80, '\n');
lineCount++;
script += line;
}
std::cout << "***Captured via getline()***" << std::endl;
std::cout << script << std::endl; //prints "abcdef"
std::cout << "***End***" << std::endl << std::endl;
std::cout << "Number of lines: " << lineCount; //result: 5, so leading \n processed
}
If the if condition is removed, so the program has just:
s += _nl;
, newlines will be inserted instead of the discarded ones from the file, but as long as '\n' is the delimiter, std::ifstream::getline() will continue discarding the original ones.
As a final touch, I would suggest using
while (fs.peek() != EOF){};
instead of
while(fs){}; or while(!fs.eof()){};
If you look at int lineCount's final value in the test program, the latter two give 6 instead of 5, as they make a redundant iteration in the end.

Why does reading '\n' character from keyboard into string variable not work & how can I do that? [duplicate]

How do I also read a new line using C++ >> operator?
ifstream input("doc.txt".c_str());
vector<string> contents;
while (input >> word) {
contents.push_back(word);
}
For a file:
hello
world
C++ is the best tool
should return
hello
\n
world
\n
C++
is
the
best
tool
P/S: this is a reduced problem from a bigger one. The way I parse file lead to this problem.
You can use std::getline, and push_back the "\n" yourself, as mentioned by jaggedSpire:
std::ifstream input("doc.txt");
std::vector<std::string> contents;
for (std::string line; std::getline(input, line);) {
std::istringstream str(line);
for (std::string word; str >> word;) {
contents.push_back(word);
}
contents.push_back("\n");
}
If you're looking to specifically use operator>> and you don't technically need to use strings specifically, you can simply make a custom class with the behavior you want when it's read in from an istream. It can even be (mostly) a wrapper for a string, with custom behavior when reading initial whitespace.
class StringAndNewline{
std::string str_;
friend std::istream& operator>>(std::istream& in, StringAndNewline& str);
public:
StringAndNewline() : str_(){}
StringAndNewline(std::string str) : str_(str){}
const std::string& str() const noexcept {return str_;}
std::string release() {return std::move(str_);}
};
The string read in operator automatically ignores all preceding whitespace to a sequence of non-whitespace characters, as defined by the present locale. This is the behavior you want to change, and as it turns out it's pleasantly simple to do so.
Disposal of the initial whitespace is commonly performed by something called a sentry object, which also checks that the stream is valid and sets the stream's failbit if it's at the end of the file. While its default behavior is to consume whitespace until it encounters a non-whitespace character, this is controlled by a flag in its constructor, so we can use that very nice encapsulated stream validity check it offers.
The string overload of operator>> makes and checks a sentry, then reads until it encounters whitespace, the end of the stream, or a read fails. We can simply ensure that its sentry never encounters whitespace by dealing with it ourselves.
Thus the ultimate read-in structure for our custom class' custom operator>> will look something like this:
make non-whitespace eating sentry
check sentry, returning the failed stream if it's invalid
deal with whitespace
read data into wrapped string
return the stream
Since we're only concerned with the '\n' characters in our whitespace, that's simple too: just loop while the stream is valid (if it runs out of space before hitting either of our conditions, it sets failbit like we would want) and exit the loop if one of two conditions is met: we get a newline character, or we get a non-whitespace character. Again, pleasantly simple:
std::istream& operator>>(std::istream& in, StringAndNewline& str){
std::istream::sentry sentry{in, true}; // make a sentry that doesn't eat whitespace
if(!sentry){return in;} // check the sentry
std::locale presentLocale{}; // get the present locale
char presentChar;
while(in.get(presentChar)){ // while the stream is valid
if(presentChar == '\n'){ // if we get a newline
str.str_ = "\\n"; // set the string to an escaped newline
break; // exit the loop
}
// if we get a non-whitespace character
else if(!std::isspace(presentChar, presentLocale)){
in.unget(); // replace the character in the stream
in >> str.str_; // take advantage of the existing string operator
break; // done with loop
}
}
return in; // return the istream, whatever state it might be in
}
Once this is done, we set up an ostream operator for ease of printing:
std::ostream& operator<<(std::ostream& out, const StringAndNewline& str){
return out << str.str();
}
and test our code:
int main (){
std::istringstream file(
"hello\n"
"world\n"
"C++ is the best tool"
);
StringAndNewline wordOrNewline;
while(file >> wordOrNewline){
std::cout << wordOrNewline << '\n';
}
}
which prints this:
hello
\n
world
\n
C++
is
the
best
tool
just like we wanted!
You could even write a string conversion operator if you really wanted to easily convert the wrapper class to strings, but I'll leave that to you.
Try using getline (http://www.cplusplus.com/reference/istream/istream/getline/). getline reads one line at a time (up to the newline character), and the stream it returns evaluates to false once it reaches the end of the file. So after each call to getline, print the line and then print the \n as well. Here is an example for your problem; randFile is a random file with text in it.
#include <cstdio> // BUFSIZ
#include <iostream>
#include <fstream>
int main(){

std::ifstream myFile("randFile", std::ifstream::in);
char s[BUFSIZ];

// the loop stops when getline fails, i.e. at end of file
while(myFile.getline(s, BUFSIZ)){
std::cout << s << std::endl;
std::cout << "\\n"<< std::endl;
}

return 0;
}
First off, you are already passing a const char* to the constructor of the stream, so the .c_str() call is not needed.
Secondly, the stream reader reads non-whitespace characters, not whitespace; this is how it knows where to split the strings.
When reading a file, there is usually a character known to the reader that marks where a line ends, the famous \n, but it differs between platforms (Windows, Unix).

Why istringstream appends '-1' character at the end of stream?

I've noticed that when I'm using istringstream eof() doesn't return true even if the whole string is "consumed". For example:
char ch;
istringstream ss{ "0" };
ss >> ch;
cout << ss.peek() << " " << (ss.eof() ? "true" : "false");
Outputs(VS2015):
-1 false
eof() isn't supposed to return true when all the data is consumed. It's supposed to return true when you attempt to read more data than is available.
In this example, you never do that.
In particular, peek is a "request", that won't set EOF even when there's nothing left to read; because you're, well, peeking. However, it will return the value of the macro EOF (commonly -1), which is what you're seeing when you output peek()'s result. Nothing is "appended" to the stream.
Read the documentation for functions that you use.
std::istream::peek
Peek next character Returns the next character in the input sequence,
without extracting it: The character is left as the next character to
be extracted from the stream.
If any internal state flags is already set before the call or is set
during the call, the function returns the end-of-file value (EOF).
http://www.cplusplus.com/reference/istream/istream/peek/

Understanding a C++ program [ Bjarne Stroustrup's book ]

I need your precious help with a small question!
I'm reading Bjarne Stroustrup's book and I found this example:
int main()
{
string previous = " ";
string current;
while (cin >> current) {
if(previous == current)
cout << "repeated word: " << current << '\n';
previous = current;
}
return 0;
}
My question is: What does string previous = " "; do?
It initializes previous to the character space (like when you press space). But I thought in C++ it doesn't read it, something about the compiler skipping over whitespace. Why initialize it to that then?
I have tried writing it like this: string previous; and the program still works properly... so? What is the difference? Please enlighten me x)
You seem to be confused about what it means in C++ to ignore whitespace. In C++
std::string the_string = something;
is treated the same as
std::string the_string=something ;
Now, when you have a string literal, the whitespace in the literal is not ignored, as it is part of the characters of the string. So
std::string foo = " ";
creates a string with one space, whereas
std::string foo = "    ";
Creates a string with 4 spaces in it.
You are right, a whitespace is something you will never get when reading input using std::cin. Therefore, a previous string is initialized with a value that could never (i.e. when reading the first word) possibly match word read into current string.
In this case previous could also be initialized to an empty string, because istream::operator>> skips all the whitespace and you would never get an empty string by reading from std::cin that way. However, there are other ways of using std::cin (e.g. together with getline()), which may lead to reading an empty string.
The book explains this example in every detail.
string previous = " ";
assigns a space to the string variable 'previous'.
It may still 'work', but if you were to simply press enter on the first try, the 'repeated word' message should appear.
He could just write :)
string previous;
The idea is that operator >> cannot read an empty string as long as the stream is set, as it is by default, to skip whitespace.
So any comparison of current with an empty string or a string that contains whitespace will yield false.