Following up https://stackoverflow.com/a/1120224/390066.
Why can't I use
getline(stringstream(line),cell,','){}
instead of
stringstream lineStream(line);
getline(lineStream,cell,','){}
?
update
I should have clarified that I want to use getline within a loop.
Furthermore, I should have also noted that my initial intention was to read a file line-by-line using getline and use the line from that in the new getline that would divide on ',', which is more intuitive imo.
From what I understood so far, getline is not designed for that because it takes a non-const input and gives const token; therefore, getline cannot be blindly recursed.
As shown by @James Kanze, you can.
The question is do you really want to?
The stream is destroyed at the end of the expression so you are only reading one cell from it.
In the context of the original question, this means you cannot use it in place of the named stream in a loop such as:
std::string line = /* Init */;
std::stringstream lineStream(line);
std::string cell;
while(std::getline(lineStream, cell, ','))
{
// Do stuff with cell.
}
If you place your code into this context it will not work as expected:
std::string cell;
while(std::getline(std::istringstream(line), cell, ','))
{
// Do stuff with cell.
}
The expression inside the while() is fully evaluated on each iteration, so a new stream is constructed every time and you end up in an infinite loop, reading the first cell over and over.
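If the goal is only to avoid spelling out the stream at every call site, one option is to hide the named stream inside a small helper. The following is a minimal sketch under that assumption; `split_cells` is a hypothetical name, not something from the answers above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into cells on `delim`.
// The stream is a named local, so it keeps its read position
// across iterations of the loop.
std::vector<std::string> split_cells(const std::string& line, char delim = ',')
{
    std::vector<std::string> cells;
    std::istringstream lineStream(line);
    std::string cell;
    while (std::getline(lineStream, cell, delim))
    {
        cells.push_back(cell);
    }
    return cells;
}
```

A caller can then write `for (const std::string& cell : split_cells(line))` and never sees the stream.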
You can, but it's ugly:
std::getline( std::istringstream( line ).ignore( 0 ), cell, ',' );
The problem is that std::getline takes a non-const reference (which is
logical, since it is going to modify the stream), and you cannot
initialize a non-const reference with a temporary. You can, however,
call member functions on it. std::istream::ignore is a member
function which returns a non-const reference to the stream on which it
was called (and with a count of 0, it extracts nothing).
FWIW: you'd probably find:
cell = std::string( line.cbegin(), std::find( line.cbegin(), line.cend(), ',' ) );
a bit more efficient. And, at least in my opinion, easier to read and
maintain.
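The one-liner above extracts only the first cell. As a rough sketch of how the same std::find idea might extend to every cell in the line (the name `split_with_find` is made up for illustration):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on `delim` using std::find only.
std::vector<std::string> split_with_find(const std::string& line, char delim = ',')
{
    std::vector<std::string> cells;
    auto first = line.cbegin();
    while (true)
    {
        // [first, last) is the next cell; last is the delimiter or the end.
        auto last = std::find(first, line.cend(), delim);
        cells.emplace_back(first, last);
        if (last == line.cend())
        {
            break;
        }
        first = last + 1; // skip the delimiter
    }
    return cells;
}
```

Unlike a getline loop, this version reports a trailing empty cell: "a,b," yields three cells, and an empty line yields one empty cell.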
Related
C++11 adds a new overload of std::getline that accepts the input stream as an rvalue. But why take an rvalue parameter? Does that mean the function consumes the input stream, leaving it empty after the call?
I imagine it's for code where the stream is constructed as part of the call to getline, e.g.:
std::string s = ...;
std::string s2;
getline(std::istringstream(s), s2);
This code would take the first line from s and put it in s2, for instance.
Code like this would not be legal with the older version of getline because the first parameter of that is a non-const reference.
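Wrapped in a function, the same idea might look like the sketch below. It assumes a C++11 (or later) compiler; `first_line` is a hypothetical name used only for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the text of `s` up to the first newline.
std::string first_line(const std::string& s)
{
    std::string result;
    // The temporary stream binds to the rvalue-reference overload
    // of std::getline that C++11 added.
    std::getline(std::istringstream(s), result);
    return result;
}
```

The temporary stream is destroyed at the end of the full expression, which is fine here because only one read is needed.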
I don't understand the design decisions behind the C++ getline function.
Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in? It seems more intuitive to only take the stream as an argument, then return the string that was read. Returning the same stream lets you chain the call, but would anyone really want to use getline(getline(stream, x), y)?
Additionally, why is the function not in the std namespace like the rest of the standard library?
If the function returned a string, there would be no way of indicating that the read failed, as all string values are valid values that could be returned by this (or any other) function. On the other hand, a stream has lots of error indicator flags that can be tested by the code that calls getline. So people can write code like:
while( std::getline( std::cin, somestring )) {
// do stuff with somestring
}
and it is hard to see how you could write similar code if getline returned a string.
why is the function not in the std namespace like the rest of the standard library?
It is in the std namespace - what makes you think otherwise?
Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in?
It is a common pattern in the stream library to do that. It means you can test the operation being performed as you perform it. For example:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}
You can also make succinct parsers by "chaining" stream functions:
std::string key, value;
if(std::getline(std::getline(in, key, '='), value))
my_map[key] = value;
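Put into a loop, the chained call can read a whole stream of key=value lines. This is only a sketch: `parse_pairs` is a hypothetical name, and it assumes one pair per line with '=' as the separator.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "key=value" lines until a read fails.
std::map<std::string, std::string> parse_pairs(std::istream& in)
{
    std::map<std::string, std::string> result;
    std::string key, value;
    // The inner getline reads up to '='; the outer one reads the rest
    // of the line. If either fails, the returned stream tests false.
    while (std::getline(std::getline(in, key, '='), value))
    {
        result[key] = value;
    }
    return result;
}
```

For example, passing a `std::istringstream` holding "a=1\nb=2\n" should yield a map with two entries.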
It seems more intuitive to only take the stream as an argument, then return the string that was read.
The problem with returning a new string every call is that you are constantly allocating new memory for them instead of reusing the memory already allocated to the string you passed in or that it gained while iterating through a loop.
// Here line will not need to allocate memory every time
// through the loop. Only when it finds a longer line than
// it has capacity for:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}
I am trying to write my program so that it can process either StdIn or a file specified on the command line.
I'm doing this by trying to initialize a reference to an istream to either refer to cin or an ifstream, using a conditional.
(similar techniques are described here and here)
But when I try with ifstream, I seem to get an error that the basic_istream move-constructor is declared protected.
istream& refToCIN ( cin ); // This is OK
const istream& refToFile = ifstream(args[1]); // This is OK
const istream& inStream ( FileIsProvided()? ifstream(args[1]) : cin );
// This causes error:
// std::basic_istream<char,std::char_traits<char>>::basic_istream' :
// cannot access protected member declared in class std::basic_istream<char,std::char_traits<char>>
ProcessStream(inStream); // This could either be a file or cin
Can this be reasonably done this way? Is there a good alternative I'm overlooking?
The problem with your code is following:
The second operand of your ternary operator is a temporary (an rvalue), while the third operand is an lvalue (cin is an lvalue). As a result, the compiler tries to create a temporary out of cin, and fails because the copy constructor is not accessible.
As for solutions, you can simply replace the rdbuf() of cin with the rdbuf() of your file, and use cin everywhere.
Here's the ultimate solution OP came up with:
ifstream file;
std::streambuf* old_cin_buf = cin.rdbuf(); // Store the old value
if (FileIsProvided())
{
file.open(args[1]);
old_cin_buf = cin.rdbuf(file.rdbuf()); // Replace the ReadBuffer on cin.
// Store the previous value as well.
}
// Use cin for all operations now. It will either use the File or StdIn as appropriate.
...
// Restore the original value, in case it was changed by using a file.
cin.rdbuf(old_cin_buf); // This is best done before the file object goes out of scope.
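The restore step is easy to forget on early returns or exceptions. One possible way to make it automatic is a small RAII guard. This is a sketch, not part of the OP's solution; `RdbufGuard` and `read_line_via_cin` are hypothetical names.

```cpp
#include <iostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical RAII helper: points `stream` at `replacement` and
// restores the previous buffer when the guard is destroyed.
class RdbufGuard
{
public:
    RdbufGuard(std::istream& stream, std::streambuf* replacement)
        : stream_(stream), old_(stream.rdbuf(replacement)) {}
    ~RdbufGuard() { stream_.rdbuf(old_); }
    RdbufGuard(const RdbufGuard&) = delete;
    RdbufGuard& operator=(const RdbufGuard&) = delete;

private:
    std::istream& stream_;
    std::streambuf* old_;
};

// Reads one line from std::cin while it is temporarily backed by `source`.
std::string read_line_via_cin(std::istream& source)
{
    RdbufGuard guard(std::cin, source.rdbuf());
    std::string line;
    std::getline(std::cin, line);
    return line;
} // guard restores the original buffer of std::cin here
```

In the file case, the guard should be declared after the ifstream so that it is destroyed first and cin never points at a dead buffer.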
This smells like an XY problem because you don't need a ternary conditional or reference here.
As a matter of convention, many programs use - to denote stdin rather than omitting a filename. That's one possible avenue. On a similar line of thought, I would use Boost.ProgramOptions or getopt instead of manually parsing the command line. This will indirectly solve your XY problem as it'll make the FileIsProvided() function redundant and you'll be getting your options via other methods than using argv[1] directly.
If you have C++11, there are smart pointers or std::reference_wrapper, which allow you to "reseat" references.
As an anti-motivator, consider that classes like ostream_joiner keep a pointer to their internal stream objects, not a reference. Besides, I doubt that you enjoy the thought of having to deal with dangling references from innocuous-looking code.
Otherwise...
if (FileIsProvided())
{
std::ifstream ifs(argv[1]);
if (ifs)
{
ProcessStream(ifs);
}
} else {
ProcessStream(std::cin);
}
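If a single `std::istream&` is still wanted at the call site, a function with two separate return statements sidesteps the conditional operator, because each return converts to `std::istream&` on its own. A minimal sketch, assuming the file name is in argv[1]; `select_input` is a hypothetical name:

```cpp
#include <fstream>
#include <iostream>

// Hypothetical helper: opens argv[1] into `file` when an argument is
// given and returns the stream to read from. The caller owns `file`,
// so the returned reference cannot outlive it by accident.
std::istream& select_input(int argc, char* argv[], std::ifstream& file)
{
    if (argc > 1)
    {
        file.open(argv[1]);
        return file;
    }
    return std::cin;
}
```

The caller would still check the returned stream (for example `if (!in)`) before use, since opening the file may fail.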
I've been using this:
ifstream in("file.txt");
string line;
getline(in,line);
istringstream iss(line);
...
for some simple parsing.
I would like to avoid unnecessary copying in order to improve performance so I tried:
ifstream in("huge_line.txt");
string line;
getline(in,line);
istringstream ss;
ss.rdbuf()->pubsetbuf(const_cast<char*>(line.c_str()), line.size());
...
and it seems to do the job (significantly improve performance, that is). My question is, is this safe given the const_cast?
I mean, as long as I'm working with an istringstream, the internal buffer should never get written to by the istringstream class, so the ss variable should remain in a valid state as long as the line variable is valid and unchanged, right?
The const_cast is safe, because the underlying buffer of std::string is not const. And yes, as long as line does not expire while ss is being read from, your program should be fine.
The effect of ss.rdbuf()->pubsetbuf is implementation-defined and hence doesn't necessarily do what you expect.
So, the effect of your altered code doesn't need to be equivalent to the initial one.
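A way to get the no-copy behaviour without relying on pubsetbuf is to derive a small read-only stream buffer and set its get area directly with setg. The sketch below is one possible shape of that idea; `StringReadBuf` and `count_words` are hypothetical names.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical read-only buffer over an existing string. No copy is
// made, so `s` must outlive the buffer and must not be modified.
class StringReadBuf : public std::streambuf
{
public:
    explicit StringReadBuf(const std::string& s)
    {
        // setg takes char*, hence the const_cast; nothing in this
        // class writes through the pointer.
        char* begin = const_cast<char*>(s.data());
        setg(begin, begin, begin + s.size());
    }
};

// Example use: counts whitespace-separated words without copying `line`.
std::size_t count_words(const std::string& line)
{
    StringReadBuf buf(line);
    std::istream in(&buf);
    std::size_t n = 0;
    std::string word;
    while (in >> word)
    {
        ++n;
    }
    return n;
}
```

Because seekoff and seekpos are not overridden here, tellg and seekg will fail on such a stream; that is a limitation of this sketch.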
I would like to know what's the performance overhead of
string line, word;
while (std::getline(cin, line))
{
istringstream istream(line);
while (istream >> word)
{
// parse word here
}
}
I think this is the standard c++ way to tokenize input.
To be specific:
Is each line copied three times: first via getline, then via the istream constructor, and last via operator>> for each word?
Would frequent construction & destruction of istream be an issue? What's the equivalent implementation if I define istream before the outer while loop?
Thanks!
Update:
An equivalent implementation
string line, word;
stringstream stream;
while (std::getline(cin, line))
{
stream.clear();
stream << line;
while (stream >> word)
{
// parse word here
}
}
uses a stream as a local stack, that pushes lines, and pops out words.
This would get rid of the possibly frequent constructor and destructor calls of the previous version, and make use of the stream's internal buffering (is this point correct?).
Alternative solutions might be to extend std::string to support operator<< and operator>>, or to extend iostream to support something like locate_new_line. Just brainstorming here.
Unfortunately, iostreams is not for performance-intensive work. The problem is not copying things in memory (copying strings is fast), it's virtual function dispatches, potentially to the tune of several indirect function calls per character.
As for your question about copying, yes, as written everything gets copied when you initialize a new stringstream. (Characters also get copied from the stream to the output string by getline or >>, but that obviously can't be prevented.)
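Using move semantics, you can eliminate the extraneous copy. Note that the std::istringstream constructor taking a std::string&& was only added in C++20; under earlier standards the code below still compiles, but the string is copied: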
string line, word;
while (std::getline(cin, line)) // initialize line
{ // move data from line into istream (so it's no longer in line):
istringstream istream( std::move( line ) );
while (istream >> word)
{
// parse word here
}
}
All that said, performance is only an issue if a measurement tool tells you it is. Iostreams is flexible and robust, and filebuf is basically fast enough, so you can prototype the code so it works and then optimize the bottlenecks without rewriting everything.
When you define a variable inside a block, it is allocated on the stack, and when you leave the block it is popped off again. With this code you perform a lot of stack operations, and the same goes for 'word'. You can use pointers and operate on them instead of on the variables. Pointers are stored on the stack too, but what they point to lives in heap memory.
Such operations carry overhead for creating the variables, pushing them onto the stack, and popping them off again. With pointers, you allocate the space once and keep working with that allocated space on the heap. Pointers can also be much smaller than the objects themselves, so allocating them is faster.
As you can see, std::getline() accepts a reference (a kind of pointer) to the line object, which lets it work with that object without creating a new string each time.
In your code, the line and word variables are created once and used by reference. The only object created on each iteration is the string stream. If you do not want to create it on each iteration, you can create it before the loop and reinitialize it through its member functions instead of its constructor.
You can use this:
string line, word ;
istringstream ss ;
while (std::getline(cin, line))
{
ss.clear() ;
ss.str(line) ;
while (ss >> word) {
// parse word here
}
}
You can also consult the reference for istringstream.
EDIT: Thanks for the comment, @jrok. Yes, you should clear the error flags before assigning a new string. See the reference for istringstream::str.