I have an instance of stringstream that I am reading from. At a certain point of getting data out of the stream, I need to read an identifier that may or may not be there. The logic is something like this:
std::string identifier;
sstr >> identifier;
if( identifier == "SomeKeyword" )
// process the rest of the string stream using method 1
else
// back up to before we tried to read "identifier" and process the stream using method 2
How can I achieve the above logic?
Use the stream's tellg() and seekg() methods, eg:
std::string identifier;
std::stringstream::pos_type pos = sstr.tellg();
sstr >> identifier;
if (identifier == "SomeKeyword")
{
//process the rest of the string stream using method 1
}
else
{
// back up to before we tried to read "identifier"
sstr.seekg(pos);
// process the stream using method 2
}
You can get the get pointer in the stream before you get the identifier and restore the position if the identifier is wrong:
std::string identifier;
std::stringstream::pos_type pos = sstr.tellg();
sstr >> identifier;
if( identifier == "SomeKeyword") {
// process the rest of the string stream using method 1
} else {
sstr.clear();
sstr.seekg(pos, sstr.beg);
// process the stream using method 2
}
The page on tellg at cplusplus.com has a very nice example. The purpose of calling clear() is to ensure that seekg works even if the previous read reached end-of-file. This is only necessary before C++11: since C++11, seekg clears the EOF bit itself, so the clear() line can be dropped. Note that seekg still does nothing if failbit is set, for example when the extraction found no word at all. Thanks to @metal for pointing this out.
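For reference, here is a minimal, self-contained sketch of the tellg()/seekg() rollback described above. The dispatch function and the strings it returns are hypothetical, chosen only to make visible what each "method" would see; clear() is kept so the sketch also works before C++11.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reports which method would run and what text that
// method would find in the stream.
std::string dispatch(std::stringstream& sstr) {
    // Remember the read position before attempting the extraction.
    std::stringstream::pos_type pos = sstr.tellg();

    std::string identifier;
    sstr >> identifier;

    std::string rest;
    if (identifier == "SomeKeyword") {
        // Method 1: continue after the keyword.
        std::getline(sstr, rest);
        return "method 1:" + rest;
    }

    // Method 2: reset any eofbit/failbit, then rewind to the saved position.
    sstr.clear();
    sstr.seekg(pos);
    std::getline(sstr, rest);
    return "method 2:" + rest;
}
```

With the input "SomeKeyword 1 2" method 1 sees " 1 2"; with the input "other" the word is the last thing in the stream, so eofbit is set by the extraction and the clear() call is what lets the rewind succeed.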
You can directly inspect a stringstream's contents. That may be a clearer approach than extracting and rolling back, because the stream's state after an extraction is not guaranteed. For example, if your string contained only one word, extracting it would set the ios_base::eofbit flag.
You could accomplish inspecting the stringstream's contents like this:
if(sstr.str().compare(0, identifier.length(), identifier) == 0) {
sstr.ignore(identifier.length());
// process the rest of the string stream using method 1
} else {
// process the stream using method 2
}
One risk this takes on: if you were depending on the stringstream's extraction operator to skip leading white-space, you'll need to skip it yourself before doing the compare. This can be done before your if-block with sstr >> std::ws; (not skipws, which only sets a formatting flag and consumes nothing). Also note that str() returns the whole underlying buffer regardless of the read position, so once anything has been consumed the compare has to start at the offset reported by tellg() rather than at 0.
While I do consider this method safer, it should be noted that if you depend on leading white-space being in sstr for "method 2" then you should use one of the other answers (but you should also reconsider your use of stringstream, since the formatted extraction operators skip leading white-space by default).
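A minimal sketch of the inspect-without-extracting idea, assuming you want the comparison to start at the current read position rather than at the beginning of the buffer. nextIs is a hypothetical helper name, not part of the standard library.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: true if the unread part of sstr begins with keyword.
// Leading whitespace is skipped (as operator>> would do); nothing else is
// consumed. Like the compare() above, this is a prefix match, so
// "SomeKeywordX" would also match "SomeKeyword".
bool nextIs(std::stringstream& sstr, const std::string& keyword) {
    sstr >> std::ws;                 // skip leading whitespace
    if (!sstr.good()) {
        return false;                // nothing left to inspect
    }
    const std::streamoff off = sstr.tellg();
    const std::string::size_type start =
        static_cast<std::string::size_type>(off);
    // str() copies the whole buffer, so compare from the read offset.
    return sstr.str().compare(start, keyword.length(), keyword) == 0;
}
```

Because str() copies the buffer on every call, this is only reasonable for short streams; for long ones the tellg()/seekg() approach avoids the copy.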
I don't understand the design decisions behind the C++ getline function.
Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in? It seems more intuitive to only take the stream as an argument, then return the string that was read. Returning the same stream lets you chain the call, but would anyone really want to use getline(getline(stream, x), y)?
Additionally, why is the function not in the std namespace like the rest of the standard library?
If the function returned a string, there would be no way of indicating that the read failed, as all string values are valid values that could be returned by this (or any other) function. On the other hand, a stream has lots of error indicator flags that can be tested by the code that calls getline. So people can write code like:
while( std::getline( std::cin, somestring )) {
// do stuff with somestring
}
and it is hard to see how you could write similar code if getline returned a string.
why is the function not in the std namespace like the rest of the standard library?
It is in the std namespace - what makes you think otherwise?
Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in?
It is a common pattern in the stream library to do that. It means you can test the operation being performed as you perform it. For example:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}
You can also make succinct parsers by "chaining" stream functions:
std::string key, value;
if(std::getline(std::getline(in, key, '='), value))
my_map[key] = value;
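To make the chained call concrete, here is a small sketch that parses key=value lines into a map. The parsePairs name and the one-pair-per-line input format are assumptions for illustration only.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "key=value" lines until a read fails.
std::map<std::string, std::string> parsePairs(std::istream& in) {
    std::map<std::string, std::string> result;
    std::string key, value;
    // The inner getline reads up to '=', the outer one reads the rest of
    // the line. If either fails, the returned stream tests false.
    while (std::getline(std::getline(in, key, '='), value)) {
        result[key] = value;
    }
    return result;
}
```

This only works because getline returns the stream: the result of the inner call is both the input to the outer call and, in the end, the loop condition.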
It seems more intuitive to only take the stream as an argument, then return the string that was read.
The problem with returning a new string every call is that you are constantly allocating new memory for them instead of reusing the memory already allocated to the string you passed in or that it gained while iterating through a loop.
// Here line will not need to allocate memory every time
// through the loop. Only when it finds a longer line than
// it has capacity for:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}
I'm writing a little parser in C++98 (yup, cannot use C++11).
I'm working with a std::stringstream which I pass by reference to different functions, let's call them subparsers.
In order to know which subparser to call, I need to know the next word in the stringstream.
As stringstream is an istream, it has a peek function which returns the next character without moving the iterator / pointer / whatever it is that marks the current location within the stringstream. But as I need the next word, I wrote a function peekWord
(ignore the commented line for now):
std::string Parser::peekWord(std::stringstream& sstream){
std::string myString = "EOF";
if(!sstream.eof()){
unsigned pos = sstream.tellg();
sstream >> myString;
//sstream.tellg();
sstream.seekg(pos);
}
return myString;
}
which seems to work nicely.
While debugging I noticed that as soon as I call tellg() after the pointer/marker/thing has been moved past the final word (where it then returns -1), seekg(xBeforeLastPosition) doesn't work anymore and still leaves the position at -1.
Does calling tellg() at the end of a stringstream set the failbit or something like that? Intuitively, I would have hoped that a query function like tellg() has no side effects.
Looking forward to hearing from you guys :)
pip
tellg is specified as such:
Returns: After constructing a sentry object, if fail() != false, returns pos_type(-1) to indicate failure. Otherwise, returns rdbuf()->pubseekoff(0, cur, in).
(istream::sentry objects are used to check that input is available.)
So yes: calling tellg() once the stream has hit EOF will set failbit. You can detect this by checking eof(), and call clear() to return to normal processing.
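A possible C++98-compatible rewrite of peekWord along those lines, written as a free function for brevity. This is only a sketch: it assumes it is acceptable for peekWord to reset the stream's error flags, which may not be what every caller wants.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Sketch: returns the next word without consuming it, or "EOF" if there is
// none. Clears eofbit/failbit before and after so that tellg() and seekg()
// operate on a stream in a good state.
std::string peekWord(std::stringstream& sstream) {
    sstream.clear();                          // drop stale eofbit/failbit
    const std::streampos pos = sstream.tellg();

    std::string word;
    if (!(sstream >> word)) {
        word = "EOF";                         // nothing left to read
    }

    sstream.clear();                          // the read may have hit EOF
    sstream.seekg(pos);                       // rewind to where we started
    return word;
}
```

Storing the position in a std::streampos rather than an unsigned also avoids silently converting the -1 failure value into a large positive offset.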
I'm having a problem reading from a binary file (*.dat) using the .read(reinterpret_cast<char*>(&x), sizeof(x)) command: there is always an error about the existence of the file, even when the file exists or has been created successfully. Here is the code:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
struct x{
char name[10],pass[10];
};
int main()
{
x x1,x2;
fstream inout;
inout.open("test.dat" ,ios::binary);
if(!inout)
{
cout<<"Error";
exit(1);
}
cout<<"Enter your name:";
cin>>x1.name;
inout.write(reinterpret_cast <const char*> (&x1.name), sizeof(x1));
cout<<"Enter your name:";
cin>>x1.pass;
inout.write(reinterpret_cast <const char*> (&x1.pass), sizeof(x1));
while(inout.read(reinterpret_cast <char*> (&x2.name), sizeof(x1)))
{
cout<<x2.name;//here is my problem cannot read!!
}
inout.close();
}
Use std::flush after your write operations.
// ... Write x1.name and x1.pass
inout << std::flush;
// ... Read x2.name in while loop.
inout.close();
There is a problem with your output to the file.
First you are writing the struct x1 to the file where only the name field is filled
inout.write(reinterpret_cast <const char*> (&x1.name), sizeof(x1));
and afterwards:
inout.write(reinterpret_cast <const char*> (&x1.pass), sizeof(x1));
You start writing from the address of x1.pass but you are writing sizeof(x1) bytes.
sizeof(x1) is 20 here, but it's only 10 bytes from the start of x1.pass to the end of the struct, so you are writing 10 bytes of unknown data from the stack into your file.
So the first problem is that your file may not contain what you expect it to contain.
The next thing is that after writing your data the stream is sitting at the end of the file and you try to read from there. You have to move the position back to the beginning of the stream to read the stuff you just wrote. For example use:
inout.seekg(0, std::ios::beg);
If you mix reads and writes on the same stream, you have to call flush or a file positioning function when switching between them.
MSDN says:
When a basic_fstream object is used to perform file I/O, although the underlying buffer contains separately designated positions for reading and writing, the current input and current output positions are tied together, and therefore, reading some data moves the output position.
GNU Stdlib:
As you can see, ‘+’ requests a stream that can do both input and output. When using such a stream, you must call fflush (see Stream Buffering) or a file positioning function such as fseek (see File Positioning) when switching from reading to writing or vice versa. Otherwise, internal buffers might not be emptied properly.
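Putting the points from these answers together, a write-then-read round trip might look like the sketch below. Record, makeRecord and roundTrip are hypothetical names. The open mode is an addition not discussed above: as far as I know, opening a std::fstream with only ios::binary (no in or out) is not a valid mode combination and makes open() fail, which would explain the "Error" branch being taken even when the file exists; trunc is assumed here so that the file is created if missing.

```cpp
#include <cstring>
#include <fstream>
#include <string>

struct Record {          // mirrors the question's struct x
    char name[10];
    char pass[10];
};

// Hypothetical helper: copies at most 9 characters per field so that each
// array always stays null-terminated.
Record makeRecord(const std::string& name, const std::string& pass) {
    Record r = Record();                       // zero-initialize both arrays
    std::strncpy(r.name, name.c_str(), sizeof r.name - 1);
    std::strncpy(r.pass, pass.c_str(), sizeof r.pass - 1);
    return r;
}

// Writes one record to path, rewinds, and reads it back into out.
// Returns false if opening, writing or reading failed.
bool roundTrip(const char* path, const Record& in, Record& out) {
    std::fstream inout(path, std::ios::in | std::ios::out |
                             std::ios::binary | std::ios::trunc);
    if (!inout) {
        return false;
    }
    // Write the whole struct once: sizeof in bytes starting at its address.
    inout.write(reinterpret_cast<const char*>(&in), sizeof in);
    // Switch from writing to reading via a positioning call.
    inout.seekg(0, std::ios::beg);
    inout.read(reinterpret_cast<char*>(&out), sizeof out);
    return !inout.fail();
}
```

Writing the struct in one call, with the size matching the address that is passed, avoids the over-long second write from the question.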
Reading into raw C-style arrays from an input stream is not as idiomatic as a simple call to operator>>(). You also have to prevent buffer overruns by keeping track of both the bytes allocated for the buffer and the bytes being read into it.
Reading into the buffer can be done by using the input stream method getline(). The following example shows the extraction into x1.name; the same would be done for x1.pass:
if (std::cin.getline(x1.name, sizeof(x1.name))) {
}
The second argument is the size of the buffer: at most that many characters minus one are stored, followed by a terminating null character. It is useful in that the stream won't write past the allocated bounds of the array. The next thing to do is just write it to the file as you have done:
if (std::cin.getline(x1.name, sizeof(x1.name))) {
inout.write(reinterpret_cast<char*>(&x1.name), std::cin.gcount());
}
std::cin.gcount() is the number of characters that were extracted from the input stream by the last unformatted read. It is a more reliable alternative to sizeof(x1.name) in that it reflects the number of characters actually read, not the number allotted.
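One detail worth knowing when using gcount() as the byte count: getline() extracts the delimiter and counts it, even though it is not stored in the buffer. The sketch below (readField is a hypothetical helper) makes that visible with a string stream standing in for std::cin.

```cpp
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical helper: reads one line into a fixed-size buffer and returns
// the number of characters extracted, as reported by gcount().
std::streamsize readField(std::istream& in, char* buf, std::streamsize size) {
    in.getline(buf, size);
    return in.gcount();
}
```

For the input line "bob\n" and a 10-byte buffer, the returned count is 4 (three letters plus the extracted newline), so writing gcount() bytes writes the letters and the terminating null character. If a line does not fit, getline() stores size - 1 characters and sets failbit, so the stream state needs checking as well.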
Now, bidirectional file streams are a bit tricky. They have to be coordinated in the right way. As explained in the other answers, bidirectional file streams (or std::fstreams) share a joint buffer for both input and output. The position indicators that mark positions in the input and output sequence are both affected by any input and output operations that may occur. As such, the file stream position has to be "moved" back before performing input. This can be done by a call to either seekg() or seekp(). Either will suffice since, as I said, the position indicators are bound to each other:
if (std::cin.getline(x1.pass, sizeof(x1.pass))) {
inout.write(reinterpret_cast<char*>(&x1.pass), std::cin.gcount());
inout.seekg(0, std::ios_base::beg);
}
Notice how this was done after the extraction into x1.pass. We can't do it after x1.name because we would be overwriting the stream on the second call to write().
As you can see, extracting into raw C-style arrays isn't pretty: you have to manage more things than you should. Fortunately, C++ comes to the rescue with its standard string class, std::string. Use this for simpler I/O:
Make both name and pass standard C++ strings (std::string) instead of raw C-arrays. This allows you to pass the string's size as the second argument to your write() calls:
#include <string>
struct x {
std::string name;
std::string pass;
};
// ...
if (std::cin >> x1.name) {
inout.write(x1.name.data(), x1.name.size());
}
if (std::cin >> x1.pass) {
inout.write(x1.pass.data(), x1.pass.size());
inout.seekg(0, std::ios_base::beg);
}
std::string allows us to leverage its dynamic nature and its capacity for maintaining the size of the buffer. We no longer have to use getline() but now a simple call to operator>>() and an if() check.
This was not possible before, but now that we're using std::string we can also combine both extractions to achieve the following:
if (std::cout << "Enter your name: " && std::cin >> x1.name &&
std::cout << "Enter your pass: " && std::cin >> x1.pass) {
inout.write(x1.name.data(), x1.name.size());
inout.write(x1.pass.data(), x1.pass.size());
inout.seekg(0, std::ios_base::beg);
}
And finally, the last extraction would simply be this:
while (inout >> x2.name)
{
std::cout << x2.name;
}
I am given a list of words in a text file, all separated by newlines. I am reading them using fstream and >>, without knowing how many words there are. How do I tell the program when to stop? I've tested it out, and the value of the variable just stays at the last word read.
Checking the state of the stream after extraction is always a good idea. It tells you if there were any problems while performing the extraction, or whether the file stream has reached the end of the file (EOF).
The latter case is what you're dealing with. All you need to do is perform the extraction while the stream is in a good state, which is idiomatically done in the following way:
while (in >> str) {
// ...
}
After the stream performs the extraction, it is converted to bool (via operator bool() since C++11), which returns !fail(). Using a while loop will allow the next extraction to be performed automatically. The loop stops as soon as an extraction fails: because the input was malformed, because the end of the file was reached, or because of some other error on the stream.
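As a self-contained sketch of that idiom, the function below collects every word from any input stream. readWords is a hypothetical name, and it takes a std::istream& so the same code works for a file stream or a string stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated words until an
// extraction fails, which for well-formed input means end of file.
std::vector<std::string> readWords(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {          // the test happens after each read
        words.push_back(word);
    }
    return words;
}
```

After the loop, in.eof() tells you whether it stopped because the input ran out rather than because of some other failure.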
You've failed the fundamental principle of I/O: You must check whether your input operation succeeds. You cannot know that in advance, you only learn that after you have tried:
for (std::string word; std::cin >> word; )
// ^^^^^^^^^^^^^^^^<----------- test for success
{
std::cout << "Here is one word: " << word << std::endl;
}
You have to remember that the input operator >> returns the stream it uses, and that streams can be used as boolean conditions. That means you can use it as a loop condition:
while (some_stream >> some_variable)
{
...
}
I would like to test if a std::istream has reached the end without reading from it.
I know that I can check for EOF like this:
if (is >> something)
but this has a series of problems. Imagine there are many, possibly virtual, methods/functions which expect std::istream& passed as an argument.
This would mean I have to do the "housework" of checking for EOF in each of them, possibly with different type of something variable, or create some weird wrapper which would handle the scenario of calling the input methods.
All I need to do is:
if (!IsEof(is)) Input(is);
the method IsEof should guarantee that the stream is not changed for reading, so that the above line is equivalent to:
Input(is)
as regards the data read in the Input method.
If there is no generic solution which would work for any std::istream, is there any way to do this for std::ifstream or cin?
EDIT:
In other words, the following assert should always pass:
while (!IsEof(is)) {
int something;
assert(is >> something);
}
The istream class has an eof bit that can be checked by using the is.eof() member.
Edit: So you want to see if the next character is the EOF marker without removing it from the stream? if (is.peek() == EOF) is probably what you want then. See the documentation for istream::peek
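A minimal sketch of the peek() check, with a hypothetical atEnd helper. Note what it does and does not promise: it only says whether another character can be read, so trailing whitespace counts as "not at end" even though a following is >> something would fail.

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: true if no further character can be read from is.
// peek() does not consume the character, but it does set eofbit when it
// finds the end of the stream.
bool atEnd(std::istream& is) {
    return is.peek() == std::istream::traits_type::eof();
}
```

For the input "7", atEnd is false before reading the int and true afterwards. For "7\n" it is still false after reading the int, because the newline is left in the stream.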
That's impossible. How is the IsEof function supposed to know that the next item you intend to read is an int?
Should the following also not trigger any asserts?
while(!IsEof(in))
{
int x;
double y;
if( rand() % 2 == 0 )
{
assert(in >> x);
} else {
assert(in >> y);
}
}
That said, you can use the exceptions method to keep the "house-keeping" in one place.
Instead of
if(!IsEof(is)) Input(is)
try
is.exceptions( ifstream::eofbit /* | ifstream::failbit etc. if you like */ );
try {
Input(is);
} catch(const ifstream::failure& ) {
}
It doesn't stop you from reading before it's "too late", but it does obviate the need to have if(is >> x) if(is >> y) etc. in all the functions.
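Here is a runnable sketch of that approach. sumTwo stands in for the question's Input() and trySum is a hypothetical wrapper. One deliberate difference from the snippet above: the mask uses failbit | badbit instead of eofbit, because with eofbit in the mask a successful read that merely touches the end of the input (the last number with no trailing newline) would also throw.

```cpp
#include <ios>
#include <istream>
#include <sstream>

// Stand-in for Input(): reads two ints with no per-read checks.
int sumTwo(std::istream& is) {
    int a = 0, b = 0;
    is >> a >> b;
    return a + b;
}

// Hypothetical wrapper: returns true and stores the sum in out if both
// reads succeeded, false otherwise. Restores the stream's old mask.
bool trySum(std::istream& is, int& out) {
    const std::ios_base::iostate old = is.exceptions();
    bool ok = true;
    try {
        // Setting the mask throws right away if the stream already failed.
        is.exceptions(std::ios_base::failbit | std::ios_base::badbit);
        out = sumTwo(is);
    } catch (const std::ios_base::failure&) {
        ok = false;
    }
    is.exceptions(old);   // assumes the old mask does not itself throw
    return ok;
}
```

The error handling lives in one place, and the reading code stays free of if (is >> x) checks.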
Normally,
if (is)
{
}
is enough. There are also .good(), .bad(), and .fail() for more exact information.
Here is a reference link: http://www.cplusplus.com/reference/iostream/istream/
There are good reasons why there is no isEof function: it is hard to specify in a usable way. For instance, operator>> usually begins by skipping white space (depending on a flag), while some other input functions are able to read white space. How would your isEof() handle the situation? Begin by skipping spaces or not? Would it depend on the flag used by operator>> or not? Would it restore the white space in the stream or not?
My advice is to use the standard idiom and characterize input failure instead of trying to predict only one cause of it: you'd still need to characterize and handle the others.
No, in the general case there is no way of knowing if the next read operation will reach eof.
If the stream is connected to a keyboard, the EOF condition is that I will type Ctrl+Z/Ctrl+D at the next prompt. How would IsEof(is) detect that?