Why does std::operator>>(istream&, char&) extract whitespace? - c++

I was compiling the following program and learned that the extractor for a char& extracts a character even if it is a whitespace character. I disabled the skipping of leading whitespace characters, expecting the subsequent read attempts to fail (because formatted extraction stops at whitespace), but was surprised when they succeeded.
#include <iostream>
#include <sstream>

int main()
{
    std::istringstream iss("a b c");
    char a, b, c;
    iss >> std::noskipws;
    if (iss >> a >> b >> c)
    {
        std::cout << "a = \"" << a
                  << "\"\nb = \"" << b
                  << "\"\nc = \"" << c << "\"\n";
    }
}
Output:
a = "a"
b = " "
c = "b"
As you can see from the output, b was given the value of the space between "a" and "b", and c was given the following character, "b". I expected neither b nor c to receive a value, since the extraction should fail because of the leading whitespace. What is the reason for this behavior?

In IOStreams, characters have virtually no formatting requirements. Any and all characters in the character sequence are valid candidates for an extraction. For the extractors that use the numeric facets, extraction is defined to stop at whitespace. However, the extractor for charT& works directly on the buffer, indiscriminately returning the next available character presumably by a call to rdbuf()->sbumpc().
Do not assume that this behavior extends to the extractor for pointers to characters: for them, extraction is explicitly defined to stop at whitespace.

Related

When parsing a space-delimited string, is there any advantage to using getline over stringstream::operator>>?

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string s = "my name is joe";
    std::stringstream ss{s};
    std::string temp;
    while (std::getline(ss, temp, ' '))
    {
        std::cout << temp.size() << " " << temp << std::endl;
    }
    //----------------------------//
    ss = std::stringstream{s};
    while (ss >> temp)
    {
        std::cout << temp.size() << " " << temp << std::endl;
    }
}
I've always used the former, but I'm wondering whether there's any advantage to using the latter. I've typically used the former because, if someone were to change the string to a comma-delimited one, all I need to do is pass a new delimiter, whereas operator>> would read in the commas. But for space delimitation, there seems to be no difference.
std::getline() and operator>> are intended for different purposes. It is not a matter of which one is more advantageous than the other. Use the one that is better suited for the task at hand.
operator>> is for formatted input. It reads in and parses many different data types, including strings. If there is no error state on the input stream, it skips leading whitespace (unless the skipws flag on the input stream is disabled, such as with the std::noskipws manipulator), and then it reads and parses characters until it encounters whitespace, a character that does not belong to the data type being parsed, or the end of the stream.
std::getline() is for unformatted input of strings only. If there is no error state on the input stream, it does not skip leading whitespace, and then it reads characters until it encounters the specified delimiter (or '\n' if not specified), or the end of the stream.

C++: how to input values separated by comma(,)

int a, b, c, d;
There are 4 variables.
I want the user to input 4 values, each separated by a comma (,).
Just like this:
stdin:
1,2,3,4
The following code works in C
scanf("%d,%d,%d,%d", &a, &b, &c, &d);
But how should I code in C++?
I’m kind of surprised at the incorrect commentary here[1].
There are two basic routes you can take:
handle the separator with a manipulator-style object, or
imbue the stream with a specialized ctype facet that classifies the comma as whitespace.
I will focus on the first; it is typically a bad idea to imbue shared streams with weird behaviors even temporarily (“shared” in the sense that other parts of your code have access to it as well; a local stringstream would be an ideal candidate for imbuing with specialized behaviors).
A ‘next item must be a comma’ extractor:
#include <cctype>
#include <iostream>
#include <string>

struct extract
{
    char c;
    extract( char c ): c(c) { }
};

std::istream& operator >> ( std::istream& ins, extract e )
{
    // Skip leading whitespace IFF user is not asking to extract a whitespace character
    if (!std::isspace( static_cast<unsigned char>( e.c ) )) ins >> std::ws;
    // Attempt to get the specific character
    if (ins.peek() == std::char_traits<char>::to_int_type( e.c )) ins.get();
    // Failure works as always
    else ins.setstate( std::ios::failbit );
    return ins;
}

int main()
{
    int a, b;
    std::cin >> a >> extract(',') >> b;
    if (std::cin)
        std::cout << a << ',' << b << "\n";
    else
        std::cout << "quiznak.\n";
}
When you run this code, the extract manipulator/extractor/whatever succeeds only if the next non-whitespace item is a comma; otherwise it fails.
You can easily modify this to make the comma optional:
struct optional_extract
{
    char c;
    optional_extract( char c ): c(c) { }
};

std::istream& operator >> ( std::istream& ins, optional_extract e )
{
    // Skip leading whitespace IFF user is not asking to extract a whitespace character
    if (!std::isspace( static_cast<unsigned char>( e.c ) )) ins >> std::ws;
    // Attempt to get the specific character
    if (ins.peek() == std::char_traits<char>::to_int_type( e.c )) ins.get();
    // There is no failure!
    return ins;
}
...
std::cin >> a >> optional_extract(',') >> b;
Etc.
[1] cin >> a >> b; is not equivalent to scanf( "%d,%d", ...);. C++ does not magically ignore commas. Just as in C, you must treat them explicitly.
The same goes for the answer using getline() and a stringstream: while the combination is valid, the actual problem is just shifted from std::cin to another stream object, and it must still be handled.

Inconsistent behavior when parsing numbers from stringstream on different platforms

In a project I'm using a stringstream to read numeric values with operator>>. I'm now getting reports that the parsing behaviour is inconsistent across platforms when additional characters are appended to the number (for instance "2i"). Compiling the sample below with GCC/VCC/LLVM on Linux results in:
val=2; fail=0
Compiling and running it on iOS with either GCC or LLVM reportedly yields:
val=0; fail=1
What does the standard say about the behavior of operator>> in such a case?
--- Sample Code ---------------------------------------------
#include <sstream>
#include <iostream>

int main()
{
    double val;
    std::stringstream ss("2i");
    ss >> val;
    std::cout << "val=" << val << "; fail=" << ss.fail() << std::endl;
    return 0;
}
According to this reference:
Thus, in either case:
if your compiler is pre-C++11 and reading fails, it leaves the value of val intact and sets failbit.
if your compiler is C++11 or later and reading fails, it sets val to 0 and sets failbit.
However, operator>> extracts characters from the stream and parses them with num_get::get [27.7.2.2.2 Arithmetic extractors] for as long as it can interpret them as the representation of a value of the proper type.
Thus, in your case num_get::get accepts the first character (i.e., 2) and then looks at the next one (i.e., i). Since i cannot be part of a numeric value, parsing stops there. The characters already accepted form a valid number, so they are converted and assigned to val, failbit is not set, and the remaining characters stay in the stringstream. To illustrate this, here is an example:
#include <sstream>
#include <iostream>
#include <string>

int main()
{
    double val(0.0);
    std::stringstream ss("2i");
    ss >> val;
    std::cout << "val=" << val << "; fail=" << ss.fail() << std::endl;
    std::string str;
    ss >> str;
    std::cout << str << std::endl;
    return 0;
}
Output:
val=2; fail=0
i
As you can see, if I extract again into a std::string, the character i is extracted.
The above, however, doesn't explain why you don't get the same behaviour on iOS.
This is a known bug in libc++ that was submitted to Bugzilla. The problem, as I see it, is that the double overload of std::num_get::do_get() somehow keeps consuming the characters a, b, c, d, e, f, i, x, p, n and their capital equivalents, even though they are invalid characters for a decimal number (other than e, which denotes scientific notation but must be followed by a numeric value, otherwise the parse fails). Normally do_get() would stop when it finds an invalid character and would not set failbit as long as characters were successfully extracted (as explained above).

Demonstration of noskipws in C++

I was trying out the noskipws manipulator in C++ and wrote the following code.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    string first, middle, last;
    istringstream("G B Shaw") >> first >> middle >> last;
    cout << "Default behavior: First Name = " << first << ", Middle Name = " << middle << ", Last Name = " << last << '\n';
    istringstream("G B Shaw") >> noskipws >> first >> middle >> last;
    cout << "noskipws behavior: First Name = " << first << ", Middle Name = " << middle << ", Last Name = " << last << '\n';
}
I expect the following output:
Expected Output
Default behavior: First Name = G, Middle Name = B, Last Name = Shaw
noskipws behavior: First Name = G, Middle Name = , Last Name = B
Output
Default behavior: First Name = G, Middle Name = B, Last Name = Shaw
noskipws behavior: First Name = G, Middle Name = , Last Name = Shaw
I modified this code to use chars instead, like this, and it works perfectly fine.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    char first, middle, last;
    istringstream("G B S") >> first >> middle >> last;
    cout << "Default behavior: First Name = " << first << ", Middle Name = " << middle << ", Last Name = " << last << '\n';
    istringstream("G B S") >> noskipws >> first >> middle >> last;
    cout << "noskipws behavior: First Name = " << first << ", Middle Name = " << middle << ", Last Name = " << last << '\n';
}
I know how cin works, but I wasn't able to figure out why it behaves this way for strings.
std::istringstream("G B S") >> std::noskipws >> first >> middle >> last;
When an extraction is performed on strings, the string is first cleared and characters are inserted into its buffer.
21.4.8.9 Inserters and extractors
template<class charT, class traits, class Allocator>
basic_istream<charT, traits>&
operator>>(basic_istream<charT, traits>& is,
basic_string<charT, traits, Allocator>& str);
Effects: Behaves as a formatted input function (27.7.2.2.1). After constructing a sentry object, if the sentry converts to true, calls str.erase() and then extracts characters from is and appends them to str as if by calling str.append(1, c). [...]
The first read extracts the string "G" into first. For the second extraction, the std::noskipws format flag is set, which disables the skipping of leading whitespace, so the next available character is the space after "G". The string is therefore cleared, and then the extraction fails because no characters were appended. Here is the continuation of the above clause:
21.4.8.9 Inserters and extractors (Cont.)
[...] Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c, is.getloc()) is true for the next available input character c.
When the extraction stores no characters, std::ios_base::failbit is set in the stream state, indicating an error.
From this point on, every attempt at I/O fails until the stream state is cleared: the sentry sees the error state, so the extractor does not run. This means that the extraction into last does nothing, and last retains the value it had from the previous extraction (the one without std::noskipws) because the stream never cleared the string.
As for why using char works: characters have no formatting requirements in C or C++. Any character can be extracted into an object of type char, which is why you see the correct output despite std::noskipws being set:
27.7.2.2.3/1 [istream::extractors]
template<class charT, class traits>
basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>& in,
charT& c);
Effects: Behaves like a formatted input member (as described in 27.7.2.2.1) of in. After a sentry object is constructed a character is extracted from in, if one is available, and stored in c. Otherwise, the function calls in.setstate(failbit).
The extractor stores a character into its operand if one is available. It doesn't stop at whitespace (or even at the EOF character!); it extracts it just like a normal character.
The basic algorithm for >> of a string is:
1) skip whitespace
2) read and extract until the next whitespace
If you use noskipws, the first step is skipped.
After the first read, you are positioned on a whitespace character, so the next read stops immediately, extracting nothing and setting failbit; all following reads then fail as well.
For more information you can see this.
From cplusplus.com:
many extraction operations consider the whitespaces themselves as the terminating character, therefore, with the skipws flag disabled, some extraction operations may extract no characters at all from the stream.
So, remove the noskipws when reading strings.
The reason is that in the second example you are not reading into the last variable at all; instead, you are printing its old value. Using a different input for the second stream makes this visible:
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string first, middle, last;
    std::istringstream iss("G B S");    // first input: "G B S"
    iss >> first >> middle >> last;
    std::cout << "Default behavior: First Name = " << first
              << ", Middle Name = " << middle << ", Last Name = " << last << '\n';
    std::istringstream iss2("G B T");   // different input: "G B T"
    iss2 >> std::noskipws >> first >> middle >> last;
    std::cout << "noskipws behavior: First Name = " << first
              << ", Middle Name = " << middle << ", Last Name = " << last << '\n';
}
Default behavior: First Name = G, Middle Name = B, Last Name = S
noskipws behavior: First Name = G, Middle Name = , Last Name = S
This happens because after the read into first, the stream is positioned on whitespace: the read into middle extracts nothing and sets failbit, so the read into last never takes place.

c++ Reading text file into array of structs not working

I have been working on this for a while and can't fix it. I am very new to C++. So far I can get 10 things into my array, but the output is not legible; it's just a bunch of numbers. I have read other posts with similar code, but for some reason mine isn't working.
The input text file is 10 lines of fake data like this:
56790 "Comedy" 2012 "Simpsons" 18.99 1
56791 "Horror" 2003 "The Ring" 11.99 7
My code is here:
(My output is below my code)
#include <iostream>
#include <string>
#include <fstream>
using namespace std;

struct DVD {
    int barcode;
    string type;
    int releaseDate;
    string name;
    float purchaseprice;
    int rentaltime;
    void printsize();
};

int main () {
    ifstream in("textfile.exe");
    DVD c[10];
    int i;
    for (int i=0; i < 10; i++){
        in >> c[i].barcode >> c[i].type >> c[i].releaseDate >>
              c[i].name >> c[i].purchaseprice >> c[i].rentaltime;
    }
    for (int i=0;i< 10;i++) {
        cout << c[i].barcode<<" ";
        cout << c[i].type<<" ";
        cout << c[i].releaseDate<<" ";
        cout << c[i].name << " ";
        cout << c[i].purchaseprice << " ";
        cout << c[i].rentaltime << "\n";
    }
    return 0;
}
My output looks similar to garbage, but there are 10 lines of it like my array:
-876919876 -2144609536 -2.45e7 2046
A comment on what to study to modify my code would be appreciated.
As suggested by cmbasnett, ifstream in("textfile.exe") opens textfile.exe, an executable, rather than your text file. If you want the program to read a text file, changing it to ifstream in("textfile.txt") should work.
You always need to check that your input is actually correct. Since it may fail prior to reading 10 lines, you should probably also keep a count of how many entries you could successfully read:
int i(0);
for (; i < 10
       && in >> c[i].barcode >> c[i].type >> c[i].releaseDate
             >> c[i].name >> c[i].purchaseprice >> c[i].rentaltime; ++i) {
    // process c[i] here if needed; the reads happen in the loop condition
}
Your actual problem reading the second line is that your strings are quoted, but formatted reading of strings doesn't care about quotes. Instead, strings are terminated by whitespace: the formatted input for strings skips leading whitespace and then reads characters until another whitespace character is found. On your second line, it will read "The and then stop. The attempt to read purchaseprice then fails because Ring" isn't a valid numeric value.
To deal with that problem, you might want to make name a quoted_string and define input and output operators for it, e.g.:
struct quoted_string { std::string value; };

std::istream& operator>> (std::istream& in, quoted_string& string) {
    std::istream::sentry cerberos(in); // skips leading whitespace, etc.
    if (in && in.peek() == '"') {
        std::getline(in.ignore(), string.value, '"');
    }
    else {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}

std::ostream& operator<< (std::ostream& out, quoted_string const& string) {
    return out << '"' << string.value << '"';
}
(Note that the code isn't tested, but I'm relatively confident that it should work.)
Just to briefly explain how the input operator works:
The sentry is used to prepare the input operation:
It flushes the tie()d std::ostream (if any; normally there is none except for std::cin).
It skips leading whitespace (if any).
It checks whether the stream is still not in failure mode (i.e., neither std::ios_base::failbit nor std::ios_base::badbit is set).
To see whether the input starts with a quote, in.peek() is used: this function returns an int indicating either that the operation failed (i.e., it returns std::char_traits<char>::eof()) or the next character in the stream. The code just checks whether it returns '"': an error return or any other character counts as a failure.
If there is a quote, it is skipped using in.ignore(), which by default ignores just one character (it can also ignore more characters and stop at a specified character).
After skipping the leading quote, std::getline() is used to read from in into string.value until another quote is found. The delimiter defaults to '\n', but for reading a quoted string '"' is the correct value to use. The terminating quote is, conveniently, extracted but not stored.