I am looking over some legacy code that makes fairly heavy use of stringstream. The code generates messages from various types, which is fine so far, except that in some cases it does the following:
std::ostringstream f1;
f1 << sDirectory << mFileName << sFileExtension << '\0';
and in others (just an illustration):
std::ostringstream f1;
f1 << sDirectory << mFileName << sFileExtension << std::ends;
I believe these calls are there because further on the code accesses f1.str().c_str() and needs the result to be null-terminated.
Is there any difference between these calls? I see from http://en.cppreference.com/w/cpp/io/manip/ends that std::ends doesn't flush. Does std::ends behave differently across platforms (Linux/Windows/Mac)? Should I prefer one over the other?
Further to that, I read that there should be a call to freeze(false) later in the scope (after the str() use) to allow the buffer to be deallocated (http://en.cppreference.com/w/cpp/io/ostrstream/freeze). Possibly I misread or misunderstood, but there is no call to freeze(false), so does that indicate that every stream above is leaking?
N.B. This is Visual Studio 2005/Windows 7, but I don't know if that has any bearing.
Apologies if I'm being dense...
std::ends is defined as having the following effect:
Inserts a null character into the output sequence: calls os.put(charT()).
When charT is char, it is value initialized to have the value 0, which is equivalent to the character literal \0. So when charT is char, which it usually is, the two lines of code are exactly the same.
However, using std::ends will work well even when the character type of your stream is not char. (As for freeze(false): that applies to the deprecated std::ostrstream, whose str() hands you its internal buffer; std::ostringstream::str() returns a std::string by value, so there is nothing to freeze and nothing is leaking.)
Related
Consider the following program:
#include <iostream>
#include <sstream>
#include <string>
int main(int, char **) {
    std::basic_stringstream<char16_t> stream;
    stream.put(u'\u0100');
    std::cout << " Bad: " << stream.bad() << std::endl;
    stream.put(u'\uFFFE');
    std::cout << " Bad: " << stream.bad() << std::endl;
    stream.put(u'\uFFFF');
    std::cout << " Bad: " << stream.bad() << std::endl;
    return 0;
}
The output is:
Bad: 0
Bad: 0
Bad: 1
It seems the badbit gets set because put() sets it if the character equals std::char_traits<char16_t>::eof(). I can now no longer put to the stream.
At http://en.cppreference.com/w/cpp/string/char_traits it states:
int_type: an integer type that can hold all values of char_type plus EOF
But if char_type is the same as int_type (uint_least16_t) then how can this be true?
The standard is quite explicit, std::char_traits<char16_t>::int_type is a typedef for std::uint_least16_t, see [char.traits.specializations.char16_t], which also says:
The member eof() shall return an implementation-defined constant that cannot appear as a valid UTF-16 code unit.
I'm not sure precisely how that interacts with http://www.unicode.org/versions/corrigendum9.html but existing practice in the major C++ implementations is to use the all-ones bit pattern for char_traits<char16_t>::eof(), even when uint_least16_t has exactly 16 bits.
After a bit more thought, I think it's possible for implementations to meet the Character traits requirements by making std::char_traits<char16_t>::to_int_type(char_type) return U+FFFD when given U+FFFF. This satisfies the requirement for eof() to return:
a value e such that X::eq_int_type(e,X::to_int_type(c)) is false for all values c.
This would also ensure that it's possible to distinguish success and failure when checking the result of basic_streambuf<char16_t>::sputc(u'\uFFFF'), so that it only returns eof() on failure, and returns u'\ufffd' otherwise.
I'll try that. I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80624 to track this in GCC.
I've also reported an issue against the standard, so we can fix the "cannot appear as a valid UTF-16 code unit" wording, and maybe fix it some other way too.
The interesting part of the behavior is that:
stream.put(u'\uFFFF');
sets the badbit, while neither:
stream << u'\uFFFF';
nor:
char16_t c = u'\uFFFF'; stream.write(&c, 1);
sets it.
This answer focuses only on that difference.
So let's check GCC's source code. In bits/ostream.tcc, lines 164~165, we can see that put() checks whether the value equals eof() and, if so, sets the badbit:
if (traits_type::eq_int_type(__put, traits_type::eof())) // <== It checks the value!
__err |= ios_base::badbit;
From line 196 we can see that write() does not have this logic; it only checks whether all the characters were written to the buffer.
This explains the behavior.
From std::basic_ostream::put's description:
Internally, the function accesses the output sequence by first
constructing a sentry object. Then (if good), it inserts c into its
associated stream buffer object as if calling its member function
sputc, and finally destroys the sentry object before returning.
It says nothing about a check against eof(). So I would think this is a bug in either the documentation or the implementation.
That really depends on what you mean by "large enough". char16_t is not "a type large enough to hold any Unicode character, including those I'm not allowed to use". You chose to cram \uFFFF, which is "reserved for internal use", into a char16_t, and thus you are the one at fault. The program is simply doing as you instructed.
Pre-history: I'm trying to ensure that some function foo(std::stringstream&) consumes all data from the stream.
Answers to a previous question suggest that using stringstream::str() is the right way of getting content of a stringstream. I've also seen it being used to convert arbitrary type to string like this:
std::stringstream sstr;
sstr << 10;
assert(sstr.str() == std::string("10")); // Conversion to std::string for clarity.
However, the notion of "content" is somewhat vague. For example, consider the following snippet:
#include <assert.h>
#include <sstream>
#include <iostream>
int main() {
    std::stringstream s;
    s << "10 20";
    int x;
    s >> x;
    std::cout << s.str() << "\n";
    return 0;
}
On Ideone (as well as on my system) this snippet prints 10 20, meaning that reading from the stringstream does not modify what str() returns. So my assumption is that str() returns some internal buffer, and it's up to the stringstream (or, probably, its internal rdbuf, which is a stringbuf by default) to track the current position in that buffer. It's a known thing.
Looking at stringbuf::overflow() function (which re-allocates the buffer if there is not enough space), I can see that:
this may modify the pointers to both the input and output controlled sequences (up to all six of eback, gptr, egptr, pbase, pptr, epptr).
So, basically, there is no theoretical guarantee that writing to stringstream won't allocate a bigger buffer. Therefore, even using stringstream::str() for converting int to string is flawed: assert(sstr.str() == std::string("10")) from my first snippet can fail, because internal buffer is not guaranteed to be precisely of the necessary size.
Question is: what is the correct way of getting the "content" of a stringstream, where "content" is defined as "all characters which could be consumed from the stream"?
Of course, one can read char-by-char, but I hope for a less verbose solution. I'm interested in the case where nothing is read from stringstream (my first snippet) as I never saw it fail.
You can use the tellg() function (inherited from std::basic_istream) to find the current input position. If it returns -1, there are no further characters to be consumed. Otherwise you can use s.str().substr(s.tellg()) to return the unconsumed characters in stringstream s.
I have seen a lot of questions on the peek method, but mine concerns a point which seems almost obvious, yet is (I think) interesting.
Suppose you have a binary file to read, and you choose to load it wholly into program memory and use an istringstream object to perform the reading.
For instance, if you are searching for the position of a given byte in the stream, accessing the hard disk repeatedly would waste time and resources...
But once you create the istringstream object, any NUL byte is treated as an end-of-stream signal.
At least this is what happened to me in the following short code:
// obvious omissis
std::istringstream is(buffer);
// where buffer is declared as char *
// and filled up with the contents of
// a binary file
char sample = 'a';
while(!is.eof() && is.peek() != sample)
{ is.get(); }
std::cout << "found " << sample << " at " << is.tellg() << std::endl;
This code doesn't work with either g++ 4.9 or clang 3.5 when there is a null byte inside buffer before a match with sample can be found, since that null byte sets the eof bit.
So my question is: Is this kind of approach to be avoided at all or there is some way to teach peek that a null byte is not "necessarily" the end of the stream?
If you look at the std::istringstream constructors, you'll see that (2) takes a std::string. A std::string can hold embedded NULs, but if you pass buffer and that's a character array or char*, the string constructor you implicitly invoke uses a strlen-style ASCIIZ length determination to work out how much data to load, stopping at the first NUL. You should instead specify the buffer size explicitly - something like:
std::string str(buffer, bytes);
std::istringstream is(str);
Then your while (!is.eof() && ...) loop is bodgy... there are hundreds of S.O. Q&As about that issue.
I have a input stream IPCimstream, which returns a pointer to the character buffer of its stream with dataBuf() function.
Say I have
IPCimstream ims;
What is the difference between printing
1.
cout << ims.dataBuf() << endl;
and
2.
cout << (void*)ims.dataBuf() << endl;
If possible, please explain with an example - say ims.dataBuf() holds "Hello world\n" - or other examples which you feel explain the difference well. Sorry, I am new to input streams and I couldn't come up with more interesting examples.
Also, what would be the difference if IPCimstream is a character stream vs. a binary stream? Thanks.
Well, the difference is that the char* overload of operator<< treats the pointer as a zero-terminated C string (C strings are just char pointers anyway), so it outputs the string itself. If your buffer is not a zero-terminated string, cout's guess is wrong, so it will output random garbage up to the first \0.
The void* version of the same operator doesn't know what is the object behind the pointer, so everything it can do is just to output the pointer value.
You see, this behaviour is not connected with the IPCimstream class; it's just how cout works. (Look at the example at http://ideone.com/1ErtV.)
Edit:
In the case of dataBuf containing "Hello world\n", the char* version interprets the pointer as a zero-terminated string. So it will output the characters "Hello world", output the newline character, and then all the characters that happen to be in memory after \n up to the next \0. If there is no such character in memory, the program may just crash. (For language purists: you'll get undefined behaviour.)
The void* version doesn't know how to treat the value pointed to by the pointer -- so it outputs the pointer value (i.e., the address) itself.
Edit 2:
The difference between the character stream and the binary stream may lie only in the data they hold. In any case, if dataBuf() returns a char*, cout will output all the characters found in the buffer (and potentially beyond it) until the first \0 (or nothing at all if \0 is at the beginning), and with the cast you'll get just the buffer's address, printed as text.
This is trivial, probably silly, but I need to understand what state cout is left in after you try to print the contents of a character pointer initialized to '\0' (or 0). Take a look at the following snippet:
const char* str;
str = 0; // or str = '\0';
cout << str << endl;
cout << "Welcome" << endl;
In the code snippet above, line 4 won't print "Welcome" to the console after the attempt to print str on line 3. Is there some behavior I should be aware of? If I substitute lines 1-3 with cout << '\0' << endl; the message "Welcome" on the following line is successfully printed to the console.
NOTE: Line 4 just silently fails to print - no warning, error message, or anything (at least not with the MinGW (g++) compiler). It spewed an exception when I compiled the same code with the MS cl compiler.
EDIT: To dispel the notion that the code fails only when you assign '\0' to str, I modified the code to assign 0, with '\0' now shown in the comment.
If you insert a const char* value to a standard stream (basic_ostream<>), it is required that it not be null. Since str is null you violate this requirement and the behavior is undefined.
The relevant paragraph in the standard is at §27.7.3.6.4/3.
The reason it works with '\0' directly is because '\0' is a char, so no requirements are broken. However, Potatoswatter has convinced me that printing this character out is effectively implementation-defined, so what you see might not quite be what you want (that is, perform your own checks!).
Don't use '\0' when the value in question isn't a "character"
(terminator for a null terminated string or other). That is, I think,
the source of your confusion. Something like:
char const* str = "\0";
std::cout << str << std::endl;
is fine, where str points to a string which contains a '\0' (in this
case, two '\0'). Something like:
char const* str = NULL;
std::cout << str << std::endl;
is undefined behavior; anything can happen.
For historical reasons (dating back to C), '\0' and 0 will convert
implicitly to any pointer type, resulting in a null pointer.
A char* that points to a null character is simply a zero-length string. No harm in printing that.
But a char* whose value is null is a different story. Trying to print that would mean dereferencing a null pointer, which is undefined behavior. A crash is likely.
Assigning '\0' to a pointer isn't really correct, by the way, even if it happens to work: you're assigning a character value to a pointer variable. Use 0 or NULL, or nullptr in C++11, when assigning to a pointer.
Just regarding the cout << '\0' part…
"Terminating the string" of a file or stream in text mode has an undefined effect on its contents. The C++ standard defers to the C standard on matters of text semantics (C++11 27.9.1.1/2), and C is pretty draconian (C99 §7.19.2/2):
Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character.
Since '\0' is a control character and cout is a text stream, the resulting output may not read as you wrote it.
Take a look at this example:
http://ideone.com/8MHGH
The main problem you have is that str is a pointer to char, not a char, so you should assign it a string: str = "\0";
When you assign it '\0', the character converts to a null pointer; inserting a null char* puts cout into a failed state, so you can no longer print to it. Here is another example where this is fixed:
http://ideone.com/c4LPh