Pre-history: I'm trying to ensure that some function foo(std::stringstream&) consumes all data from the stream.
Answers to a previous question suggest that using stringstream::str() is the right way of getting the content of a stringstream. I've also seen it used to convert an arbitrary type to a string like this:
std::stringstream sstr;
sstr << 10;
assert(sstr.str() == std::string("10")); // Conversion to std::string for clarity.
However, the notion of "content" is somewhat vague. For example, consider the following snippet:
#include <assert.h>
#include <sstream>
#include <iostream>
int main() {
std::stringstream s;
s << "10 20";
int x;
s >> x;
std::cout << s.str() << "\n";
return 0;
}
On Ideone (as well as on my system) this snippet prints 10 20, meaning that reading from a stringstream does not modify what str() returns. So my assumption is that str() returns some internal buffer, and it's up to the stringstream (or, probably, its internal rdbuf, which is a stringbuf by default) to handle the "current position in that buffer". That part is well known.
Looking at stringbuf::overflow() function (which re-allocates the buffer if there is not enough space), I can see that:
this may modify the pointers to both the input and output controlled sequences (up to all six of eback, gptr, egptr, pbase, pptr, epptr).
So, basically, there is no theoretical guarantee that writing to stringstream won't allocate a bigger buffer. Therefore, even using stringstream::str() for converting int to string is flawed: assert(sstr.str() == std::string("10")) from my first snippet can fail, because internal buffer is not guaranteed to be precisely of the necessary size.
Question is: what is the correct way of getting the "content" of a stringstream, where "content" is defined as "all characters which could be consumed from the stream"?
Of course, one can read char by char, but I hope for a less verbose solution. I'm interested in the case where nothing has been read from the stringstream (my first snippet), as I have never seen it fail.
You can use the tellg() function (inherited from std::basic_istream) to find the current input position. If it returns -1, there are no further characters to be consumed. Otherwise you can use s.str().substr(s.tellg()) to return the unconsumed characters in stringstream s.
I tested the following code to clarify my understanding of istream::getline():
#include <iostream>
#include <sstream>
#include <string>
#include <cstdio> // for getchar()
using namespace std;
int main()
{
string s("abcd efgh\nijklmnopqrst");
string s1;
stringstream ss(s);
ss >> s1;
cout << s1 << endl;
ss.getline(&s1[0], 250, '\n');
cout << s1 << endl;
ss >> s1;
cout << s1 << endl;
getchar();
return 1;
}
then the console printed:
abcd
efg
ijklmnopqrst
but in my opinion it should be
abcd
efgh
ijklmnopqrst
Besides, I found that the size of s1 after calling ss.getline() was the same as after the first ss >> call, but it changed after calling ss >> once more. Can anyone help me understand this?
ss.getline(&s1[0], 250, '\n');
The first parameter to this getline() call is a char *. ss knows absolutely nothing about the fact that this char buffer actually comes from a std::string, and that it's actually the string's internal buffer.
Complicating this entire affair is the fact that this std::string is under the impression that it contains four characters. Because that's all it has, at this point.
And there is absolutely nothing whatsoever that could lead this std::string to change its mind. A pointer to its internal character buffer was passed to getline(), which proceeded to rather rudely scribble all over it (resulting in undefined behavior, as I'll elaborate in a moment), but the std::string still believes that it contains only four characters.
Meanwhile, the initial formatted input operator, >>, extracted the first word, but did not extract the space that follows it. So when the stream subsequently got this getline() call, it started extracting characters with that space character, up until the next newline character: five characters (if I count on my fingers). It dumped them into a buffer that's guaranteed by the std::string to be only long enough to hold four characters (because, keep in mind, the initial formatted extraction operator, >>, put only four characters inside it).
I'm ignoring some details, such as the fact that std::string takes care of automatically tacking on a trailing '\0', but the bottom line is that this is undefined behavior. The getline call extracts more characters than the buffer it's given is guaranteed to hold. Undefined behavior. A whole big heap of undefined behavior. It's not just that the four characters in your second line of output aren't the four characters you expected to see: getline() actually extracted more characters, but the std::string being printed has every right under the constitution to believe that it still has only four characters, while its internal buffer got stomped all over.
Two things.
First, >> stops at (and does not consume) the whitespace that follows a word, so getline will retrieve it.
Secondly, this line is not correct:
ss.getline(&s1[0], 250, '\n');
The member istream::getline() takes a char* buffer, not a std::string. To read into the string, use the free function std::getline() from <string> instead:
std::getline(ss, s1, '\n');
In your code, &s1[0] gets access to the underlying buffer, which is written to, but the string's length is stored separately and is still what it was after the previous read (which is why the h gets dropped). Though, at this point you've already invoked undefined behaviour due to a buffer overflow.
I saw a lot of questions on the peek method, but mine concerns a topic which would be almost obvious, but nevertheless (I think) interesting.
Suppose you have a binary file to read, and you choose to load it as a whole into program memory and use an istringstream object to perform the reading.
For instance, if you are searching for the position of a given byte in the stream, accessing the hard disk repeatedly would waste time and resources...
But once you create the istringstream object, any NULL byte is treated as an EOF signal.
At least this is what happened to me in the following short code:
// obvious parts omitted
std::istringstream is(buffer);
// where buffer is declared as char *
// and filled up with the contents of
// a binary file
char sample = 'a';
while(!is.eof() && is.peek() != sample)
{ is.get(); }
std::cout << "found " << sample << " at " << is.tellg() << std::endl;
This code works with neither g++ 4.9 nor clang 3.5 when there is a null byte inside buffer before a match with sample can be found, since that null byte sets the eof bit.
So my question is: should this kind of approach be avoided altogether, or is there some way to teach peek that a null byte is not "necessarily" the end of the stream?
If you look at your std::istringstream constructors, you'll see (2) takes a std::string. That can have embedded NULs, but if you pass buffer and that's a character array or char*, the string constructor you implicitly invoke will use a strlen-style ASCIIZ length determination to work out how much data to load. You should instead specify the buffer size explicitly - something like:
std::string str(buffer, bytes);
std::istringstream is(str);
Then your while(!is.eof() ...) loop is bodgy; there are hundreds of S.O. Q&As about that issue.
The following code seems to be running when it shouldn't. In this example:
#include <iostream>
using namespace std;
int main()
{
char data[1];
cout<<"Enter data: ";
cin>>data;
cout<<data[2]<<endl;
}
Entering a string longer than one character (e.g., "Hello") produces output as if the array were large enough to hold it (e.g., "l"). Shouldn't this throw an error when it tries to store a value longer than the array, or when it tries to retrieve a value at an index beyond the array length?
The following code seems to be running when it shouldn't.
It is not about "should" or "shouldn't". It is about "may" or "may not".
That is, your program may run, or it may not.
This is because your program invokes undefined behavior. Writing or reading beyond the end of the array invokes undefined behavior, which means anything could happen.
The proper way to write your code is to use std::string as:
#include <iostream>
#include <string>
//using namespace std; DON'T WRITE THIS HERE
int main()
{
std::string data;
std::cout<<"Enter data: ";
std::cin>>data; //read the entire input string, no matter how long it is!
std::cout<<data<<std::endl; //print the entire string
if ( data.size() > 2 ) //check if data has at least 3 characters
{
std::cout << data[2] << std::endl; //print 3rd character
}
}
It can crash with different compiler options or when compiled on another machine, because running that code gives an undefined result according to the documentation.
It is not safe to be doing this. What it is doing is writing over the memory that happens to lie after the buffer. Afterwards, it is then reading it back out to you.
This only works because your cin and cout operations don't say: "This is a pointer to one char, I will only write one char." Instead they say: "Enough space is allocated for me to write to." cin >> data keeps writing characters until it hits whitespace and then appends the null terminator \0, and cout << for a char* keeps reading until it hits the \0.
To fix this, you can replace this with:
std::string data;
C++ will let you make big memory mistakes.
Some 'rules' that will save you most of the time:
1: Don't use char[]. Instead use string.
2: Don't use pointers to pass or return arguments. Pass by reference, return by value.
3: Don't use arrays (e.g. int[]). Use vectors. You still have to check your own bounds.
With just those three you'll be writing somewhat "safe" and non-C-like code.
I was trying out a few file reading strategies in C++ and I came across this.
ifstream ifsw1("c:\\trys\\str3.txt");
char ifsw1w[3];
do {
ifsw1 >> ifsw1w;
if (ifsw1.eof())
break;
cout << ifsw1w << flush << endl;
} while (1);
ifsw1.close();
The contents of the file were:
firstfirst firstsecond
secondfirst secondsecond
The output is printed as:
firstfirst
firstsecond
secondfirst
I expected the output to be something like:
fir
stf
irs
tfi
.....
Moreover, I see that "secondsecond" has not been printed. I guess that the last read met EOF and the cout was not executed. But I don't understand the first behavior.
The extraction operator has no concept of the size of the ifsw1w variable, and (by default) is going to extract characters until it hits whitespace or end of file. The extra characters are likely being stored in the memory locations after your ifsw1w variable, which would cause bad bugs if you had additional variables defined there.
To get the desired behavior, you should be able to use
ifsw1.width(3);
to limit the number of characters extracted (with width(3), at most 2 characters are stored, leaving room for the terminating '\0').
It's virtually impossible to use std::istream& operator>>(std::istream&, char *) safely -- it's like gets in this regard -- there's no way for you to pass the buffer size (other than remembering to set the stream's width() beforehand). The stream just writes to your buffer, going off the end. (Your example above invokes undefined behavior.) Either use the overloads accepting a std::string, or use std::getline(std::istream&, std::string&).
Checking eof() is incorrect. You want fail() instead. You really don't care if the stream is at the end of the file, you care only if you have failed to extract information.
For something like this you're probably better off just reading the whole file into a string and using string operations from that point. You can do that using a stringstream:
#include <string> //For string
#include <sstream> //For stringstream
#include <iostream> //As before
#include <fstream> //For ifstream
std::ifstream myFile(...);
std::stringstream ss;
ss << myFile.rdbuf(); //Read the file into the stringstream.
std::string fileContents = ss.str(); //Now you have a string, no loops!
You're trashing the memory... it's reading past the 3 chars you defined (it's reading until a space or a newline is met...).
Read char by char to achieve the output you had mentioned.
Edit: Irritate is right, this works too (with some fixes, and it doesn't give the exact result, but that's the spirit):
char ifsw1w[4];
do{
ifsw1.width(4);
ifsw1 >> ifsw1w;
if(ifsw1.fail()) break; // fail(), not eof(): the last chunk can set eofbit and still be valid
cout << ifsw1w << flush << endl;
}while(1);
ifsw1.close();
The code has undefined behavior. When you do something like this:
char ifsw1w[3];
ifsw1 >> ifsw1w;
The operator>> receives a pointer to the buffer, but has no idea of the buffer's actual size. As such, it has no way to know that it should stop reading after two characters (and note that it should be 2, not 3 -- it needs space for a '\0' to terminate the string).
Bottom line: in your exploration of ways to read data, this code is probably best ignored. About all you can learn from code like this is a few things you should avoid. It's generally easier, however, to just follow a few rules of thumb than try to study all the problems that can arise.
Use std::string to read strings.
Only use fixed-size buffers for fixed-size data.
When you do use fixed buffers, pass their size to limit how much is read.
When you want to read all the data in a file, std::copy can avoid a lot of errors:
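#include <algorithm> // std::copy
#include <iterator>  // std::istream_iterator, std::back_inserter
#include <string>
#include <vector>
std::vector<std::string> strings;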
std::copy(std::istream_iterator<std::string>(myFile),
std::istream_iterator<std::string>(),
std::back_inserter(strings));
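To keep the whitespace, you could use "noskipws", which stops >> from skipping leading whitespace (note that when extracting into a char array, extraction still stops at the next whitespace character):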
ifsw1 >> noskipws >> ifsw1w;
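But if you want to get only a fixed number of characters, I suggest using the get method. Note that the count includes the terminating '\0', so with char ifsw1w[3] this reads at most 2 characters: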
ifsw1.get(ifsw1w,3);
So...
when I go:
cout<<stringName<<endl;
I get:
NT
But when I go:
cout<<stringName.c_str()<<endl;
I get:
NTNT
Why?
A quick test with the following code:
#include <string>
#include <iostream>
using namespace std;
int main(void) {
string str = "NT";
cout << str.c_str() << endl;
return 0;
}
produces one instance of NT so it looks like you probably have another output call somewhere.
A traditional C string (accessed through a char const*) has a sequence of characters terminated by a character 0. (Not the numeral 0, but an actual zero value, which we write as '\0'.) There's no explicit length, so various string operations just read one character at a time until they hit the '\0'.
A C++ std::string has an explicit length in its structure.
Is it possible that the memory layout of your string's characters looks like this:
'NTNT\0'
but the string's length is set to 2?
That would result in exactly this behavior: manipulating the std::string directly will act as if it's just two characters long, but if you do traditional C operations using s.c_str(), it will look like "NTNT".
I'm not sure what shenanigans would get you into this state, but it would certainly match the symptoms.
One way you could get into this state would be to write to the string's characters directly (which is undefined behavior, since the array returned by c_str() must not be modified), something like: strcat((char *)s.c_str(), "NT")
Show more code. It seems like you did cout << earlier and forgot that you did it. What does it print if you do cout << "mofo" << stringName.c_str() << "|||" << endl; ? Does it say NTmofoNT||| ? If so, that may well be what happened ;)
This is not a problem with c_str(), but probably related to some other anomaly in the rest of the program.
Make a "hello world" application that does these same operations and you'll see it works fine there.