When to use a blank cin.get()? - c++

As the title says - when should I use a blank cin.get() ?
I encountered situations when the program acted strange until I added a few blank cin.get()s between reading lines. (e.g. in a struct when reading its fields I had to enter a cin.get() between each non-blank cin.get())
So what does a blank cin.get() do and when should I use it?
Thanks.

There are two broad categories of stream input operations: formatted and unformatted. Formatted operations expect input in a particular form; they start out by skipping whitespace, then looking for text that matches what they expect. They typically are written as extractors; that's the >> that you see so often:
int i;
std::cin >> i; // formatted input operation
Here, the extractor is looking for digits, and will translate the digits that it sees into an integer value. When it sees something that isn't a digit it stops looking.
Unformatted input operations just do something, without regard to any rules about what the input should look like. basic_istream::get() is one of those: it simply reads a character or a sequence of characters. If you ask it to read a sequence it doesn't care what's in that sequence, except that the form that takes a delimiter looks for that delimiter. Other than that, it just copies text.
When you mix formatted and unformatted operations they fight with each other.
int i;
std::cin >> i;
If std::cin is reading from the console (that is, you haven't redirected it at the command line), you'll typically type in some digits followed by the "Enter" key. The extractor reads the digits, and when it hits the newline character (that's what the "Enter" key looks like on input) it stops reading, and leaves the newline character alone. That's fine, if the next operation on that stream is also a formatted extractor: it skips the newline character and any other whitespace until it hits something that isn't whitespace, and then it starts translating the text into the appropriate value.
There's a problem, though, if you use a formatted operation followed by an unformatted operation. This is a common problem when folks mix extractors (>>) with getline(): the extractor reads up to the newline, and the call to getline() reads the newline character, says "Hey, I've got an empty line", and returns an empty string.
Same thing for the version of basic_istream::get() that reads a sequence of characters: when it hits the delimiter (newline if you haven't specified something else) it stops reading. If that newline was a leftover from an immediately preceding formatted extractor, it's probably not what you're looking for.
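The leftover-newline effect described above can be reproduced without a console. The following is a minimal sketch, assuming a hypothetical helper named read_number_then_line that takes any std::istream, so a std::istringstream can stand in for std::cin:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a number with >>, then a line with getline(),
// and returns whatever getline() produced.
std::string read_number_then_line(std::istream& in, int& number) {
    in >> number;            // formatted: stops at the '\n' and leaves it in the stream
    std::string line;
    std::getline(in, line);  // sees that leftover '\n' first, so the line comes back empty
    return line;
}
```

Fed the text "42\nhello\n", this should store 42 in number and return an empty string; "hello" is still waiting in the stream.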
One (really really ugly) solution is the brute force cin.ignore(256, '\n');, which discards up to 256 characters, stopping early once it has discarded a newline.
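For reference, here is roughly what the ignore() approach looks like in code. This is only a sketch: the helper name read_number_then_line_fixed is made up, and it uses std::numeric_limits<std::streamsize>::max() rather than 256 so that the length of the discarded remainder does not matter.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: same as reading a number and then a line,
// but the rest of the number's line is thrown away first.
std::string read_number_then_line_fixed(std::istream& in, int& number) {
    in >> number;
    // Discard everything left on the current line, including the '\n'.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::string line;
    std::getline(in, line);  // now starts at the beginning of the next line
    return line;
}
```

With the input "42   \nhello\n" this should return "hello" instead of the blank remainder of the first line.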
A more delicate solution is to not create the problem in the first place. If you need to read lines, read lines. If you need to read lines and sometimes extract values from the text in a line, read the line, then create a std::stringstream object and extract from that.
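A sketch of that "read the line, then extract from a stringstream" approach might look like the following. The Person struct and the read_person function are hypothetical names chosen for illustration; the input format (a name and an age on one line) is also an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type for the example.
struct Person {
    std::string name;
    int age = 0;
};

// Reads exactly one line from `in` and extracts "name age" from it.
// Returns false if no line could be read or the line did not parse.
bool read_person(std::istream& in, Person& p) {
    std::string line;
    if (!std::getline(in, line)) {
        return false;                 // no more lines
    }
    std::istringstream fields(line);  // formatted extraction happens on the copy
    return static_cast<bool>(fields >> p.name >> p.age);
}
```

Because the outer stream is only ever touched by getline(), there is never a stray newline to clean up, and a malformed line cannot leave std::cin itself in a failed state.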

Related

Two cins in a row, what exactly happens with whitespaces?

cin >> name;
cin >> age;
cout << name << age;
What exactly is happening here if I type a string, then some whitespace and a number? For example Something 20. Does it read Something then sees the whitespace and goes okay that's the end of this first line because a whitespace terminates the reading of the string, goes to the next input and reads 20?
But I also wanna be a bit more specific. Is it okay to say at first when I'm in the console typing Something, that's going into the standard input stream, then getting stored in the buffer and when I press that space it's like pressing enter? And that Something gets extracted and assigned to name? Then that 20 I type is like a whole new unrelated line because I pressed space earlier and so that gets extracted and assigned to age?
How they'll get extracted
The integer gets extracted via std::basic_istream::operator>>:
Extracts values from an input stream
1-4 ) Extracts an integer value potentially skipping preceding
whitespace. The value is stored to a given reference value.
This function behaves as a FormattedInputFunction. After constructing and
checking the sentry object, which may skip leading whitespace,
extracts an integer value by calling std::num_get::get().
The string gets extracted via the non-member operator>> for std::basic_string:
2 ) Behaves as a FormattedInputFunction. After constructing and
checking the sentry object, which may skip leading whitespace, first
clears str with str.erase(), then reads characters from is and appends
them to str as if by str.append(1, c), until one of the following
conditions becomes true:
N characters are read, where N is is.width() if is.width() > 0,
otherwise N is str.max_size()
the end-of-file condition occurs in the stream is
std::isspace(c,is.getloc()) is true for the next character c
in is (this whitespace character remains in the input stream).
And in FormattedInputFunction:
if ios_base::skipws flag is set on this input stream, extracts and
discards characters from the input stream until one of the following
becomes true:
the next available character on the input stream is not
a whitespace character, as tested by the std::ctype facet of the
locale currently imbued in this input stream. The non-whitespace
character is not extracted.
the end of the stream is reached, in which
case failbit and eofbit are set and if the stream is on for exceptions
on one of these bits, ios_base::failure is thrown.
And as stated in Basic Input/Output from cplusplus.com:
...Note that the characters introduced using the keyboard are only transmitted to the
program when the ENTER (or RETURN) key is pressed.
...
...cin extraction always considers spaces (whitespaces, tabs,
new-line...) as terminating the value being extracted, and thus
extracting a string means to always extract a single word, not a
phrase or an entire sentence.
Testing
Compiling and testing your program with leading and trailing whitespaces via MSVC-v142 compiler:
AA 123 some trailing whitespaces
Prints out:
AA123
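The same test can be repeated without typing at a console. Below is a small sketch that wraps the question's two extractions in a hypothetical function, read_name_and_age, and runs them against any std::istream:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper around the question's code: two formatted
// extractions in a row, returning what cout << name << age would print.
std::string read_name_and_age(std::istream& in) {
    std::string name;
    int age = 0;
    in >> name;  // skips leading whitespace, stops before the next whitespace
    in >> age;   // skips the whitespace in between, then reads the digits
    std::ostringstream out;
    out << name << age;
    return out.str();
}
```

Whether the two values are separated by spaces, tabs, or a newline makes no difference to the result; each extractor simply skips whatever whitespace precedes its value.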
Read also
Stackoverflow: Clarify the difference between input/output stream and input/output buffer
Learn cpp: Input and output streams

What does the format string "%*[^\n]" in a scanf() statement instruct? How do assignment suppressor (*) and negated scanset ([^) work together?

I know that the [ conversion specifier introduces a scanset, which lists the characters to match, or, with a ^ symbol placed right after the bracket, the characters not to match.
For this, in ISO/IEC 9899/1999 (C99) is stated:
The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket.
So, the expression [^\n] means, that it is scanning characters until a \n character is found in the according stream, here at scanf(), stdin. \n is not taken from stdin and scanf() proceeds with the next format string if any remain, else it skips to the next C statement.
Next there is the assignment-suppression-operator *:
For this, in ISO/IEC 9899/1999 (C99) is stated:
Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result.
Meaning, for example, that with scanf("%*100s",a); a sequence of up to 100 characters is taken from stdin, stopping early if a white-space character is found, but is not assigned to a, where a is a properly defined char array of 101 elements (char a[101];).
But what does now the format string "%*[^\n]" in a scanf()-statement achieve?
Does \n remain in stdin?
How do assignment suppressor * and negated scanset [^ work together?
Does it mean, that:
By using * all characters matching to this format string are taken from stdin, but are sure not assigned?, and
\n isn't taken from stdin but it is used to determine the scan-operation for the according format string?
I know what each of those [^ and * do alone, but not together. The question is what is the result of the mix of those two together, incorporated with the negated scanset of \n.
I know that there is a similar question on Stack Overflow which covers the understanding of %[^\n] only, here: What does %[^\n] mean in a scanf() format string. But the answers there do not help me with my problem.
%[^\n] reads up to but not including the next \n character. In plain English, it reads a line of text. Normally, the line would be stored in a char * string variable.
char line[SIZE];
scanf("%[^\n]", line);
The * modifier suppresses that behavior. The line is simply discarded after being read and no variable is needed.
scanf("%*[^\n]");
* doesn't alter how the input is processed. In either case, everything up to but not including the next \n is read from stdin. Assuming no I/O errors, it is guaranteed that the next read from stdin will see either \n or EOF.
Which scanf() statement should I use if I want to read and thereafter discard every character in stdin including the \n character?
Add %*c to also consume the \n.
scanf("%*[^\n]%*c");
Why %*c instead of just \n? If you used \n it wouldn't just consume a single newline character, it would consume any number of spaces, tabs, and newlines. \n matches any amount of whitespace. It's better to use %*c to consume exactly 1 character.
// Incorrect
scanf("%*[^\n]\n");
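The difference between the two format strings can be made visible with sscanf and the %n directive, which stores the number of characters consumed so far. This is just a sketch: the function names are invented, it is written as C++ using <cstdio>, and it reads from a string instead of stdin so that the result is reproducible.

```cpp
#include <cstdio>

// Hypothetical helpers: each returns how many characters of `input`
// its format string consumed, or -1 if %n was never reached.
int consumed_with_percent_c(const char* input) {
    int n = -1;
    std::sscanf(input, "%*[^\n]%*c%n", &n);  // rest of line, then exactly one character
    return n;
}

int consumed_with_newline_directive(const char* input) {
    int n = -1;
    std::sscanf(input, "%*[^\n]\n%n", &n);   // rest of line, then any run of whitespace
    return n;
}
```

For the input "abc\n\n  next", the first version should consume 4 characters ("abc" plus one newline), while the second should consume 7, swallowing the blank line and the indentation of the following line as well.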
See also:
How to skip a line when fscanning a text file?
Could I use fflush() instead?
No, don't. The behavior of fflush(stdin) is undefined in standard C.
Isn't the negated scanset of [\n] completely redundant because scanf() terminate the scan process of the according format string at first occurrence of a white space character by default?
With %s, yes, it will stop reading at the first whitespace character. %s only reads a single word. %[^\n], by contrast, reads an entire line. It will not stop at spaces or tabs, only newlines.
More generally, with square brackets only the exact characters listed are relevant. There is no special behavior for whitespace. Unlike %s it does not skip leading whitespace, nor does it stop processing early if it encounters whitespace.
Does \n remain in stdin?
Yes, it does.
But what does now the format string "%*[^\n]" in a scanf()-statement achieve?
It reads all characters from the input stream until it reaches a newline and discards them, without removing the newline from the input stream.
By using * all characters matching to this format string are taken from stdin, but are not assigned?
Correct.
\n isn't taken from stdin but it is used to determine the scan-operation for the according format string?
Exactly. When \n is reached, most scanfs use ungetc to push the character back to the input stream.
I know what each of those [^ and * do alone, but not together.
Putting * before [^ does exactly what [^ alone does except that it does not read the input into an argument and instead discards it.
If you want to discard the \n afterwards, use this format string:
"%*[^\n]%*c"
Since it doesn't appear to be covered, the working way to read everything before newline, and then the newline, is:
scanf("%*[^\n]%*c");
%*[^\n] reads and discards until next character is newline
%*c reads and discards just one character, which per above will be newline
You could also read the newline with %c into a variable and check whether you really got a newline, but you could also just check for EOF or an error directly and not bother with this.
%[^\n] tells scanf to read everything until a newline character ('\n') and store it in its corresponding argument.
%*[^\n] tells scanf to read everything until a newline character ('\n') and discard it instead of storing it.
Examples:
Input Hi there\n into scanf("%[^\n]", buffer); results in buffer content Hi there and leftover stdin content \n
Input Hi there\n into scanf("%*[^\n]"); results in Hi there getting scanned and discarded from the stdin and leftover stdin content \n.
Note that both %[^\n] and %*[^\n] will fail if the first character that it encounters is a \n character. Once it fails, the stdin is left untouched and the scanf returns resulting in the rest of the format string getting ignored.
If you wish to clear a line of stdin up to and including the newline character using scanf, use
scanf("%*[^\n]"); /* Read and discard everything until a newline character */
scanf("%*c"); /* Discard the newline character */
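Why two separate calls matter on an empty line can be sketched with sscanf and %n standing in for scanf on stdin. The names skip_line and skip_line_single_call are hypothetical; each returns a pointer to wherever reading would continue.

```cpp
#include <cstdio>

// Two-step version: mirrors the two scanf calls above.
const char* skip_line(const char* text) {
    int n = 0;
    std::sscanf(text, "%*[^\n]%n", &n);  // fails on an empty line; n then stays 0
    text += n;
    n = 0;
    std::sscanf(text, "%*c%n", &n);      // consumes one character (the newline), if any
    return text + n;
}

// Single-call version: the whole format is abandoned if the scanset fails.
const char* skip_line_single_call(const char* text) {
    int n = 0;
    std::sscanf(text, "%*[^\n]%*c%n", &n);
    return text + n;
}
```

On "Hi there\nnext" both versions should end up pointing at "next". On "\nnext" only the two-step version gets past the newline; the single-call version stops at the failed scanset and consumes nothing.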

istringstream ignores first letter

I am trying to access different words in a string using std::istringstream and I am also doing so with multiple test cases.
int t;
cin>>t;
while(t--)
{
string arr;
cin.ignore();
getline(cin,arr);
istringstream iss(arr);
string word;
while(iss>>word)
{
cout<<word;
}
}
For the first test case, everything is perfect (i.e. it outputs the correct words). But for every subsequent test case, the first letter of the first word is left out.
Example:
Input:
4
hey there hey
hi hi hi
my name is xyz
girl eats banana
And I'm getting:
Output:
hey there hey
i hi hi
y name is xyz
irl eats banana
Can anyone please suggest me what to do and why is this error occurring?
Your problem is that formatted input, i.e., something like in >> value, conventionally skips leading whitespace before attempting to read. Unformatted input, on the other hand, doesn't skip leading whitespace. With the std::cin.ignore(); in your loop you make the assumption that std::getline(std::cin, arr) would leave the newline in the input like the input of t does. That is not so. std::getline() extracts and stores all characters up to the first newline, where it stops; the newline itself is extracted too, but not stored. So, you'd remove the cin.ignore(); from the loop.
The key question becomes how to switch between formatted input and unformatted input. Since the newline upon entry of a numeric value may be preceded with arbitrary spaces which you probably also want to ignore, there are essentially two ways:
std::cin >> std::ws; skips all leading whitespace. That may include multiple newlines and spaces at the beginning of the line. Skipping those may not necessarily be desirable.
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); ignores all characters up to and including the first newline. That would allow for empty lines to follow up as well as lines starting with leading whitespace.
This line is the culprit: cin.ignore();.
When std::basic_istream::ignore is called without any arguments, it ignores exactly 1 character.
In your case, the first std::cin.ignore() discards the newline that cin >> t left in the stream, so the first test case is read correctly. On every later iteration there is no leftover newline (getline already consumed it), so ignore() discards the first character of the next line instead.
According to the documentation of std::basic_istream::ignore:
Extracts and discards characters from the input stream until and
including delim. ignore behaves as an UnformattedInputFunction
It's worth mentioning that std::basic_istream::ignore will block and wait for user input if there is nothing to ignore in the stream.
With this in mind, let's break down what your code does:
1. The first time you call this function in your loop, it ignores the newline character that is still in the buffer from the previous cin>>t operation. Then the getline statement waits for and reads a line from the user.
2. The next time around, there is nothing in the buffer to ignore (std::getline doesn't leave the newline character in the buffer), so ignore() blocks and waits for input. So the next time the program blocks and waits for input, it is because of the ignore() function, not the getline function as you would have hoped, and the moment you provide input (i.e. your second test case), one character of it is ignored.
3. The next getline call does not block, since the rest of that input is still in the buffer after ignore() discarded its first character, so getline reads the remaining string, which is one character short.
4. The process continues from step 2 until your loop terminates.
int t;
cin>>t;//this leaves new line character in the buffer
while(t--)
{
string arr;
cin.ignore();//this will ignore one character from the buffer,so the first time
//it will ignore the new line character from the previous cin input
//but next time it will block and wait for input to ignore
getline(cin,arr);//this will not block if there is something in the buffer
//to read
...
}
The solution would be to move the ignore statement out of the loop and next to your cin>>t statement. It's also better to write ignore(INT_MAX,'\n'); in this case. You might also want to read this answer to see when and how to use ignore.
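Put together, the fix suggested in these answers could look something like the sketch below. The function name echo_words and the stream parameters are assumptions made so the code can be run against a string; in the original program they would simply be cin and cout. Printing a space between words and a newline after each test case is also an addition, made only to keep the output readable.

```cpp
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical rewrite of the question's loop: ignore() runs once,
// right after the count is read, and never inside the loop.
void echo_words(std::istream& in, std::ostream& out) {
    int t = 0;
    in >> t;
    // Drop the rest of the count's line, including its '\n'.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    while (t-- > 0) {
        std::string line;
        if (!std::getline(in, line)) {
            break;  // fewer lines than announced
        }
        std::istringstream iss(line);
        std::string word;
        const char* sep = "";
        while (iss >> word) {
            out << sep << word;
            sep = " ";
        }
        out << '\n';
    }
}
```

Since getline() already removes the newline that ends each test case, nothing inside the loop needs to be skipped, and no first letters go missing.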

Reading whole line with std::cin

I would like to figure out how to read a whole line (including spaces) with std::cin. I am aware of the existence of std::getline, I would just like to figure out how to do it with std::cin so I can better understand iostream in C++. I've tried using a for loop with std::cin, however it keeps reading past the end of the line. Any help would be greatly appreciated.
Also, cin >> only allows us to enter one word into a string.
However, there is a cin function that reads text containing blanks.
std::cin.get(name, MAX);
get will read all characters, including spaces, until MAX - 1 characters have been read or the end-of-line character ('\n') is reached, and will put them into the name variable followed by a terminating null character. The '\n' itself is not extracted; it stays in the stream.
You should decide what is MAX.
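As a rough sketch of how this fits together, the hypothetical function below reads one line with get(name, MAX) and then uses an argument-less get() to remove the newline that was left behind. The buffer size of 64 is an arbitrary assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line into a fixed-size buffer with get(),
// then removes the '\n' that get() leaves in the stream.
std::string read_line_with_get(std::istream& in) {
    constexpr std::streamsize MAX = 64;  // assumed buffer size
    char name[MAX];
    in.get(name, MAX);          // at most MAX - 1 characters, stops before '\n'
    if (in.peek() == '\n') {
        in.get();               // the "blank" get(): consume the newline
    }
    return name;
}
```

Without the second get(), the next call would immediately hit the same newline and read nothing. Note that get(name, MAX) sets failbit when it extracts no characters at all, so an empty line would need an in.clear() before reading can continue.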

Differentiating between delimiter and newline in getline

ifstream file;
file.open("file.csv");
string str;
while(file.good())
{
getline(file,str,',');
if (___) // string was split from delimiter
{
[do this]
}
else // string was split from eol
{
[do that]
}
}
file.close();
I'd like to read from a csv file, and differentiate between what happens when a string is split off due to a new line and what happens when it is split off due to the desired delimiter -- i.e. filling in the ___ in the sample code above.
The approaches I can think of are:
(1) manually adding a character to the end of each line in the original file,
(2) automatically adding a character to the end of each line by writing to another file,
(3) using getline without the delimiter and then making a function to split the resulting string by ','.
But is there a simpler or direct solution?
(I see that similar questions have been asked before, but I didn't see any solutions.)
My preference for clarity of the code would be to use your option 3) - use getline() with the standard '\n' delimiter to read the file into a buffer line by line and then use a tokenizer like strtok() (if you want to work on the C level) or boost::tokenizer to parse the string you read from the file.
You're really dealing with two distinct steps here, first read the line into the buffer, then take the buffer apart to extract the components you're after. Your code should reflect that and by doing so, you're also avoiding having to deal with odd states like the ones you describe where you end up having to do additional parsing anyway.
There is no easy way to determine "which delimiter terminated the string", and it gets "consumed" by getline, so it's lost to you.
Read the line, and split on commas yourself. You can use std::string::find() to find commas - however, if your file contains strings that in themselves contain commas, you will have to parse the string character by character, since you need to distinguish between commas in quoted text and commas in unquoted text.
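For the simple case without quoted fields, reading by line and then splitting might be sketched as follows. The function name read_csv_rows is made up, and a second getline() with ',' as the delimiter is used here in place of std::string::find(); both are assumptions for the sake of a short example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one inner vector per line, one string per field.
// Does not handle quoted fields that contain commas.
std::vector<std::vector<std::string>> read_csv_rows(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {            // step 1: split on '\n'
        std::vector<std::string> fields;
        std::istringstream cells(line);
        std::string cell;
        while (std::getline(cells, cell, ',')) {  // step 2: split on ','
            fields.push_back(cell);
        }
        rows.push_back(fields);
    }
    return rows;
}
```

The end of the inner loop is the "split by end of line" case and each iteration of it is the "split by delimiter" case. One limitation of this sketch is that an empty field at the very end of a line (as in "a,b,") is silently dropped.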
Your big problem is your code does not do what you think it does.
getline with a delimiter treats \n as just another character from my reading of the docs. It does not split on both the delimiter and newline.
The efficient way to do this is to write your own custom splitting getline: cppreference has a pretty clear description of what getline does, and mimicking it should be easy (and safer than shooting from the hip; files are tricky).
Then return both the string, and information about why you finished your parse in a second channel.
Now, using getline naively then splitting is also viable, and will be much faster to write, and probably less error prone to boot.
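A custom splitting getline of the kind described above could be sketched like this. The enum Stop, its enumerators, and the function read_field are all hypothetical names; the function stops at either the delimiter or a newline and reports which one it was through its return value.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical "second channel": why reading the field stopped.
enum class Stop { Delimiter, Newline, EndOfInput };

// Reads characters into `field` until `delim`, '\n', or end of input.
// The terminating character is consumed but not stored.
Stop read_field(std::istream& in, std::string& field, char delim = ',') {
    field.clear();
    char c;
    while (in.get(c)) {
        if (c == delim) {
            return Stop::Delimiter;
        }
        if (c == '\n') {
            return Stop::Newline;
        }
        field += c;
    }
    return Stop::EndOfInput;
}
```

The caller can then branch on the returned value where the question has the ___ placeholder. Like the simpler approach, this sketch does not treat quoted commas specially.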