Why is getline written so strangely? - c++

I don't understand the design decisions behind the C++ getline function.
Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in? It seems more intuitive to only take the stream as an argument, then return the string that was read. Returning the same stream lets you chain the call, but would anyone really want to use getline(getline(stream, x), y)?
Additionally, why is the function not in the std namespace like the rest of the standard library?

If the function returned a string, there would be no way of indicating that the read failed, as all string values are valid values that could be returned by this (or any other) function. On the other hand, a stream has lots of error indicator flags that can be tested by the code that calls getline. So people can write code like:
while( std::getline( std::cin, somestring )) {
// do stuff with somestring
}
and it is hard to see how you could write similar code if getline returned a string.
why is the function not in the std namespace like the rest of the standard library?
It is in the std namespace - what makes you think otherwise?

Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in?
It is a common pattern in the stream library to do that. It means you can test the operation being performed as you perform it. For example:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}
You can also make succinct parsers by "chaining" stream functions:
std::string key, value;
if(std::getline(std::getline(in, key, '='), value))
my_map[key] = value;
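To make the chaining idea concrete, here is a minimal sketch of a key=value reader built on it. The names parse_pairs and the std::map result type are assumptions for illustration, not part of the original answer, and the sketch assumes every line contains an '='.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "key=value" lines from a stream.
// Assumes each line contains '='; a line without one is merged
// into the next key, so real input needs extra validation.
std::map<std::string, std::string> parse_pairs(std::istream& in)
{
    std::map<std::string, std::string> result;
    std::string key, value;
    // The inner call reads up to '=', the outer call reads the rest of
    // the line. The loop stops as soon as either read fails.
    while (std::getline(std::getline(in, key, '='), value))
        result[key] = value;
    return result;
}

// Convenience overload that parses text held in a string.
std::map<std::string, std::string> parse_pairs(const std::string& text)
{
    std::istringstream in(text);
    return parse_pairs(in);
}
```

Calling parse_pairs("a=1\nb=two\n") yields a map with two entries. Because both reads share one stream, a single test of the returned stream covers both operations.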
It seems more intuitive to only take the stream as an argument, then return the string that was read.
The problem with returning a new string on every call is that you constantly allocate new memory for each one, instead of reusing the capacity that the string you passed in already has or has gained on earlier iterations of a loop.
// Here line will not need to allocate memory every time
// through the loop. Only when it finds a longer line than
// it has capacity for:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}

Related

Wrapping std::getline()

I am struggling with the problem of reading input from file on a per-line basis, in a cross-platform way.
Different platforms use different sequences of characters to represent a new line/end of line.
std::getline doesn't deal with these in a cross platform way.
What do I mean by this?
std::getline changes its behavior depending on the platform on which an executable is compiled. On Windows platforms, it expects to see CRLF to denote line endings. On Linux, it expects just LF.
It does not handle cases where a file contains a line ending which is not what the platform expects. For example a file created on a Windows machine is likely to have CRLF line endings. If that file is copied to a Linux machine without changing the line ending format then std::getline "breaks".
It seemed to me that the easiest way to work around this would be to create a new function which wraps std::getline. Something like this:
return_type GetLine(stream_type ifs, string_type s)
{
return_type ret = std::getline(ifs, s);
s.erase(std::remove(s.begin(), s.end(), '\r' ), s.end());
s.erase(std::remove(s.begin(), s.end(), '\n' ), s.end());
return ret;
}
However at this point I'm stuck. From some searching, although getline returns a stream object (?) it also has an implicit cast-to-bool operator.
I could force return_type to be bool, but then this prevents my wrapper function from returning a stream object, if such a thing were to be required in future.
I also haven't been able to make sense of the STL templates well enough to determine what stream_type and string_type should be. I can force them to be std::ifstream and std::string, but I think this decision would also make the function less generic.
How should I proceed here?
You should take the stream by reference because streams typically cannot be copied. Also the string should be passed by reference because you want to write to it.
To be generic you can use the same interface as std::getline does. As you want to use specific delimiters, they need not be passed as arguments. If you make the function a template then it will work with any stream that also works for std::getline:
#include <iostream>
#include <sstream>
#include <string>
template< class CharT, class Traits, class Allocator >
std::basic_istream<CharT,Traits>& my_getline(
std::basic_istream<CharT,Traits>& input,
std::basic_string<CharT,Traits,Allocator>& str)
{
return std::getline(input,str);
}
int main() {
std::istringstream s{"hello world"};
std::string foo;
my_getline(s,foo);
std::cout << foo;
}
However at this point I'm stuck. From some searching, although getline returns a stream object (?) it also has an implicit cast-to-bool operator.
It's not getline that converts to bool but the stream returned by getline can be converted to bool. Your line is almost correct, but it needs to be a reference (and you need not spell out the type explicitly):
auto& ret = std::getline(ifs, s);
// more code
return ret;
Note that I didn't address the actual issue of extracting characters until any of the delimiters is encountered (rather than only the platform specific newline that you already get with bare std::getline).
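For the line-ending part, one possible approach (a sketch, not something the answer above proposes) is to let std::getline split on '\n' and then drop a single trailing '\r'. The names getline_any_eol and split_lines are hypothetical. The sketch handles LF and CRLF only; a file that uses a lone CR as its line ending would still come back as one long line.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper with the same interface as std::getline.
template <class CharT, class Traits, class Allocator>
std::basic_istream<CharT, Traits>& getline_any_eol(
    std::basic_istream<CharT, Traits>& input,
    std::basic_string<CharT, Traits, Allocator>& str)
{
    std::getline(input, str);  // splits on '\n' and discards it
    // A CRLF file leaves a '\r' at the end of each line; remove it.
    if (!str.empty() && Traits::eq(str[str.size() - 1], input.widen('\r')))
        str.erase(str.size() - 1);
    return input;  // return the stream so callers can test it
}

// Usage: collect all lines of a text, whatever the line ending.
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (getline_any_eol(in, line))
        lines.push_back(line);
    return lines;
}
```

Returning the stream by reference keeps the wrapper usable in a while condition, exactly like std::getline itself.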

How to use stringstream constructor in getline?

Following up https://stackoverflow.com/a/1120224/390066.
Why can't I use
getline(stringstream(line),cell,','){}
instead of
stringstream lineStream(line);
getline(lineStream,cell,','){}
?
update
I should have clarified that I want to use getline within a loop.
Furthermore, I should have also noted that my initial intention was to read a file line-by-line using getline and use the line from that in the new getline that would divide on ',', which is more intuitive imo.
From what I understood so far, getline is not designed for that because it takes a non-const input and gives const token; therefore, getline cannot be blindly recursed.
As shown by @James Kanze, you can.
The question is do you really want to?
The stream is destroyed at the end of the expression so you are only reading one cell from it.
Look at this in the context of the original question, where the stream is read in a loop:
std::string line = /* Init */;
std::stringstream lineStream(line);
std::string cell;
while(std::getline(lineStream, cell, ','))
{
// Do stuff with cell.
}
If you place your code into this context it will not work as expected:
std::string cell;
while(std::getline(std::istringstream(line).seekg(0), cell, ','))
{
// Do stuff with cell.
}
The expression inside the while() is fully evaluated each time, so a new stream is created on every iteration. You end up in an infinite loop that reads the first cell over and over.
You can, but it's ugly:
std::getline( std::istringstream( line ).seekg( 0 ), cell, ',' );
The problem is that std::getline takes a non-const reference (which is
logical, since it is going to modify the stream), and you cannot
initialize a non-const reference with a temporary. You can, however,
call member functions on it. std::istream has no flush member (that
belongs to std::ostream), but std::istream::seekg is a member
function which returns a non-const reference to the stream on which it
was called (and seeking to position 0 on a freshly constructed
std::istringstream doesn't change anything else). Since C++11,
std::getline also has an overload that takes the stream by rvalue
reference, so the temporary can be passed directly.
FWIW: you'd probably find:
cell = std::string( line.cbegin(), std::find( line.cbegin(), line.cend(), ',' ) );
a bit more efficient. And, at least in my opinion, easier to read and
maintain.
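Putting the loop form into a function keeps the stream alive for the whole line. The following is a minimal sketch; the name split_cells and the std::vector return type are assumptions made for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into cells on a delimiter.
std::vector<std::string> split_cells(const std::string& line, char delim = ',')
{
    std::vector<std::string> cells;
    // A named stream lives for the whole loop, so each getline call
    // continues where the previous one stopped.
    std::istringstream line_stream(line);
    std::string cell;
    while (std::getline(line_stream, cell, delim))
        cells.push_back(cell);
    return cells;
}
```

For example, split_cells("a,b,,c") returns four cells, the third of which is empty. A trailing delimiter, as in "a,", does not produce a final empty cell, because the last getline call finds no characters to extract and fails.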

C++ stringstream tellg()

I'm writing a little parser in c++98 (yup, cannot use 11).
I'm working with a std::stringstream which I pass by reference to different functions, let's call them subparsers.
In order to know which subparser to call I need to know the next word in the stringstream.
As stringstream is an istream it does have a peek function which returns the next character without moving the iterator / pointer / whatever it is that marks the current location within the stringstream, but as I need the next word I wrote a function peekWord
(ignore the commented line for now):
std::string Parser::peekWord(std::stringstream& sstream){
std::string myString = "EOF";
if(!sstream.eof()){
unsigned pos = sstream.tellg();
sstream >> myString;
//sstream.tellg();
sstream.seekg(pos);
}
return myString;
}
which seems to work nicely.
While debugging I noticed that as soon as I call tellg() after the pointer/marker/thing has been moved past the final word (in which case it returns -1), seekg(xBeforeLastPosition) doesn't work anymore and the position stays at -1.
Does calling tellg() at the end of a stringstream set the failbit or something like that? I would intuitively have hoped that tellg(), which takes no arguments, has no side effects.
Looking forward to hearing from you guys :)
pip
tellg is specified as such:
Returns: After constructing a sentry object, if fail() != false, returns pos_type(-1) to indicate failure. Otherwise, returns rdbuf()->pubseekoff(0, cur, in).
(istream::sentry objects are used to check that input is available.)
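So, yes, it will set failbit on EOF: once an earlier read has set eofbit, the sentry constructed by tellg() finds that the stream is no longer good() and sets failbit, after which seekg() does nothing. You can detect this, however, by checking eof() and using clear() to return to normal processing.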
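A minimal sketch of a peek function that follows this advice is shown below. It is written as a free function named peek_word (a hypothetical name, not the asker's Parser::peekWord) and sticks to C++98 features. The key step is calling clear() before seekg(), so that an eofbit set while reading the last word cannot block the seek.

```cpp
#include <sstream>
#include <string>

// Hypothetical C++98-compatible variant of peekWord: returns the next
// word without consuming it, or "EOF" if there is none.
std::string peek_word(std::stringstream& sstream)
{
    std::string word = "EOF";
    // Remember the position first. tellg() returns -1 (and sets
    // failbit) if the stream is already at EOF or has failed.
    const std::stringstream::pos_type pos = sstream.tellg();
    if (pos == std::stringstream::pos_type(-1))
        return word;
    if (!(sstream >> word))
        word = "EOF";      // only whitespace was left
    sstream.clear();       // drop eofbit/failbit set by the read
    sstream.seekg(pos);    // rewind to where we started
    return word;
}
```

With a stream holding "alpha beta", peeking returns "alpha" repeatedly until the caller actually extracts it, then returns "beta", and returns "EOF" once both words have been consumed.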

performance overhead of c++ string tokenize via istringstream

I would like to know what's the performance overhead of
string line, word;
while (std::getline(cin, line))
{
istringstream istream(line);
while (istream >> word)
// parse word here
}
I think this is the standard c++ way to tokenize input.
To be specific:
Is each line copied three times, first via getline, then via the istream constructor, and last via operator>> for each word?
Would frequent construction & destruction of istream be an issue? What's the equivalent implementation if I define istream before the outer while loop?
Thanks!
Update:
An equivalent implementation
string line, word;
stringstream stream;
while (std::getline(cin, line))
{
stream.clear();
stream << line;
while (stream >> word)
// parse word here
}
uses a stream as a local stack, that pushes lines, and pops out words.
This would get rid of the possibly frequent constructor and destructor calls in the previous version, and would make use of the stream's internal buffering (is this point correct?).
Alternative solutions might be to extend std::string to support operator<< and operator>>, or to extend iostream to support something like locate_new_line. Just brainstorming here.
Unfortunately, iostreams is not for performance-intensive work. The problem is not copying things in memory (copying strings is fast), it's virtual function dispatches, potentially to the tune of several indirect function calls per character.
As for your question about copying, yes, as written everything gets copied when you initialize a new stringstream. (Characters also get copied from the stream to the output string by getline or >>, but that obviously can't be prevented.)
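Since C++20, std::istringstream has a constructor that takes the string by rvalue reference, so you can eliminate that extraneous copy by moving the string in (with C++11 to C++17 the code below compiles, but std::move has no effect and the string is still copied):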
string line, word;
while (std::getline(cin, line)) // initialize line
{ // move data from line into istream (so it's no longer in line):
istringstream istream( std::move( line ) );
while (istream >> word)
// parse word here
}
All that said, performance is only an issue if a measurement tool tells you it is. Iostreams is flexible and robust, and filebuf is basically fast enough, so you can prototype the code so it works and then optimize the bottlenecks without rewriting everything.
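If a profiler does point at the inner istringstream, one possible replacement (a sketch, not something measured here) is to split the line with std::string search functions and skip the stream entirely. The name split_words and the fixed whitespace set are assumptions; unlike operator>>, this version does not consult the stream's locale.

```cpp
#include <string>
#include <vector>

// Hypothetical stream-free tokenizer: splits a line on ASCII whitespace.
std::vector<std::string> split_words(const std::string& line)
{
    static const char* const kSpace = " \t\n\r\f\v";
    std::vector<std::string> words;
    // Find the start of the first word, if any.
    std::string::size_type begin = line.find_first_not_of(kSpace);
    while (begin != std::string::npos) {
        // The word ends at the next whitespace character or at the
        // end of the line (end == npos).
        const std::string::size_type end = line.find_first_of(kSpace, begin);
        words.push_back(line.substr(begin, end - begin));
        if (end == std::string::npos)
            break;
        begin = line.find_first_not_of(kSpace, end);
    }
    return words;
}
```

The outer std::getline loop from the question stays as it is; only the per-line istringstream is replaced by a call such as split_words(line).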
When you define a variable inside a block, it is allocated on the stack, and when you leave the block it is popped from the stack again. With this code you therefore perform a lot of operations on the stack. The same goes for 'word'. You can operate on pointers instead of variables. Pointers are stored on the stack too, but what they point to lives in heap memory.
Such operations can add overhead for creating the variables, pushing them onto the stack and popping them off again. With pointers you allocate the space once and then work with that allocated space on the heap. Pointers can also be much smaller than the objects themselves, so allocating them is faster.
As you can see, std::getline accepts a reference (a kind of pointer) to the line object, which lets it work on that object without creating a new string each time.
In your code, the line and word variables are created once and references to them are used. The only object you create in each iteration is the istringstream. If you do not want to create it in each iteration, you can create it before the loop and reinitialize it with one of its member functions instead of the constructor.
You can use this :
string line, word ;
istringstream ss ;
while (std::getline(cin, line))
{
ss.clear() ;
ss.str(line) ;
while (ss >> word) {
// parse word here
}
}
See also the reference documentation for istringstream.
EDIT: Thanks for the comment, @jrok. Yes, you should clear the error flags before assigning the new string. See the reference for istringstream::str.

Reading a fixed number of chars with >> on an istream

I was trying out a few file reading strategies in C++ and I came across this.
ifstream ifsw1("c:\\trys\\str3.txt");
char ifsw1w[3];
do {
ifsw1 >> ifsw1w;
if (ifsw1.eof())
break;
cout << ifsw1w << flush << endl;
} while (1);
ifsw1.close();
The contents of the file were
firstfirst firstsecond
secondfirst secondsecond
When I see the output it is printed as
firstfirst
firstsecond
secondfirst
I expected the output to be something like:
fir
stf
irs
tfi
.....
Moreover I see that "secondsecond" has not been printed. I guess that the last read has met the eof and the cout might not have been executed. But the first behavior is not understandable.
The extraction operator has no concept of the size of the ifsw1w variable, and (by default) is going to extract characters until it hits whitespace, null, or eof. These are likely being stored in the memory locations after your ifsw1w variable, which would cause bad bugs if you had additional variables defined.
To get the desired behavior, you should be able to use
ifsw1.width(3);
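to limit the number of characters to extract. Note that width(n) stores at most n-1 characters plus the terminating null, and that the width is reset to 0 after each extraction, so it has to be set again before every read.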
It's virtually impossible to use std::istream& operator>>(std::istream&, char *) safely -- it's like gets in this regard -- there's no way for you to specify the buffer size. The stream just writes to your buffer, going off the end. (Your example above invokes undefined behavior). Either use the overloads accepting a std::string, or use std::getline(std::istream&, std::string).
Checking eof() is incorrect. You want fail() instead. You really don't care if the stream is at the end of the file, you care only if you have failed to extract information.
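A minimal sketch of both recommendations together: extract into a std::string and let the result of the extraction itself control the loop. The function name read_words is hypothetical.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, used to try the function out
#include <string>
#include <vector>

// Hypothetical helper: reads every whitespace-separated word.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;  // grows as needed, so no buffer overrun
    // The loop body runs only when the extraction succeeded. The last
    // word is kept even though reading it also reaches end of file.
    while (in >> word)
        words.push_back(word);
    return words;
}
```

Run on the two lines from the question, this returns all four words, including "secondsecond", which the eof() test in the original loop discarded.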
For something like this you're probably better off just reading the whole file into a string and using string operations from that point. You can do that using a stringstream:
#include <string> //For string
#include <sstream> //For stringstream
#include <fstream> //For ifstream
#include <iostream> //As before
std::ifstream myFile(...);
std::stringstream ss;
ss << myFile.rdbuf(); //Read the file into the stringstream.
std::string fileContents = ss.str(); //Now you have a string, no loops!
You're trashing the memory... it's reading past the 3 chars you defined (it's reading until a space or a new line is met...).
Read char by char to achieve the output you had mentioned.
Edit : Irritate is right, this works too (with some fixes and not getting the exact result, but that's the spirit):
char ifsw1w[4];
do{
ifsw1.width(4);
ifsw1 >> ifsw1w;
if(ifsw1.eof()) break;
cout << ifsw1w << flush << endl;
}while(1);
ifsw1.close();
The code has undefined behavior. When you do something like this:
char ifsw1w[3];
ifsw1 >> ifsw1w;
The operator>> receives a pointer to the buffer, but has no idea of the buffer's actual size. As such, it has no way to know that it should stop reading after two characters (and note that it should be 2, not 3 -- it needs space for a '\0' to terminate the string).
Bottom line: in your exploration of ways to read data, this code is probably best ignored. About all you can learn from code like this is a few things you should avoid. It's generally easier, however, to just follow a few rules of thumb than try to study all the problems that can arise.
Use std::string to read strings.
Only use fixed-size buffers for fixed-size data.
When you do use fixed buffers, pass their size to limit how much is read.
When you want to read all the data in a file, std::copy can avoid a lot of errors:
std::vector<std::string> strings;
std::copy(std::istream_iterator<std::string>(myFile),
std::istream_iterator<std::string>(),
std::back_inserter(strings));
To read the whitespace, you could use "noskipws", which stops the stream from skipping whitespace.
ifsw1 >> noskipws >> ifsw1w;
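But if you want to get only 3 characters, I suggest you use the get method. It stores at most n-1 characters followed by a terminating null, so the buffer has to be declared as char ifsw1w[4]:
ifsw1.get(ifsw1w,4);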
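For the output the question expected (fir, stf, irs, ...), another option is unformatted input with read() and gcount(). The sketch below is an illustration under assumptions: the name read_chunks is hypothetical, and unlike >> it treats spaces and newlines as ordinary characters.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, used to try the function out
#include <string>
#include <vector>

// Hypothetical helper: reads the stream in pieces of n characters.
// The last piece may be shorter if the input runs out.
std::vector<std::string> read_chunks(std::istream& in, std::size_t n)
{
    std::vector<std::string> chunks;
    if (n == 0)
        return chunks;
    std::vector<char> buffer(n);  // the size is passed to read() below
    for (;;) {
        in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // characters actually read
        if (got <= 0)
            break;
        chunks.push_back(std::string(&buffer[0], static_cast<std::size_t>(got)));
    }
    return chunks;
}
```

On the input "firstfirst" with n set to 3, this gives "fir", "stf", "irs" and a final short chunk "t".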