What does '|' mean in scanf format string - c++

char szA[256]={0};
scanf("%[^a]%s",&szA); //failed when trailing string
scanf("%[^a]|%s",&szA); //worked whatever the input
What does '|' mean in a format string? I cannot find an official specification. Can anyone give me a clue?
When I input something containing several '|' characters, the latter call still works (meaning the program does not crash). Shouldn't it need two buffers after the format string? The former call crashes when the input can be split into more than one string, so there must be some other difference between them. What is it?
So I cannot understand why the latter works even though fewer buffers are given than there are conversion directives, while the former fails. Alternatively, can someone give me an input string that makes the latter one crash?

What does '|' mean in a format string? I cannot find an official specification. Can anyone give me a clue?
It means that the code expects a literal | in the input stream.
Having said that, the format string is not going to work the way you might expect.
The %[^a] part will capture all characters that are not a, which means it will even capture a | from the input stream. It stops capturing when the character a is encountered in the stream. Of course that character does not match the literal | in the format string, so nothing after that point is processed.
If I provide the input def|akdk to the following program
#include <stdio.h>
int main()
{
    char szA[256] = {0};
    char temp[100] = {0};
    int n = scanf("%[^a]|%s", szA, temp);
    printf("%d\n%s\n%s\n", n, szA, temp);
}
I get the following output
1
def|
which makes perfect sense. BTW, the last line in the output is an empty line. I'm not sure how to show that in an answer.
When I change the scanf line to
int n = scanf("%[^a]a%s", szA, temp);
I get the following output
2
def|
kdk
which makes perfect sense.

It's not one of the conversion specifiers, so it's a literal | character, meaning it must be present in the input stream. The official specification is the section entitled The fscanf function in the ISO standard (e.g., C11 7.21.6.2), and the relevant part states:
The format is composed of zero or more directives: one or more white-space characters, an ordinary multibyte character (neither % nor a white-space character), or a conversion specification.
A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If any of those characters differ from the ones composing the directive, the directive fails and the differing and subsequent characters remain unread.
You can see the effect in the following complete program which fails to scan "four|1" when you're looking for the literal _ but works fine when you're looking for |.
#include <stdio.h>
int main(void) {
    char cjunk[100];
    int ijunk;
    char inStr[] = "four|1";
    if (sscanf(inStr, "%4s_%d", cjunk, &ijunk) != 2)
        printf("Could not scan\n");
    if (sscanf(inStr, "%4s|%d", cjunk, &ijunk) == 2)
        printf("Scanned okay\n");
    return 0;
}

So, after some conversation, my understanding is this: the latter format requires the remaining stream to start with '|' when it reaches the '|%s' part, but the %[^a] conversion excludes 'a' and leaves the remaining stream starting with 'a'. The trailing directive therefore always fails to match and never needs to write anything into a buffer, so it never crashes even though no second buffer is given.
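To check that reasoning, here is a minimal sketch of mine (using sscanf on a fixed string so the result is reproducible, and with a second buffer supplied so nothing undefined happens): the '|' directive fails after %[^a] stops in front of the 'a', so the trailing %s conversion is never executed.

#include <cstdio>

int main() {
    char szA[256] = {0};
    char rest[256] = {0};
    // %[^a] stops in front of the 'a'; the literal '|' directive then fails to match,
    // so %s is never executed and 'rest' is never written to.
    int n = std::sscanf("def|akdk", "%[^a]|%s", szA, rest);
    std::printf("conversions: %d, szA=\"%s\", rest=\"%s\"\n", n, szA, rest); // 1, "def|", ""
    return 0;
}

Since %s never runs, it never needs a destination, which is presumably why the original one-buffer call does not crash in practice, though supplying fewer arguments than conversions is still nothing to rely on.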

Does istream::ignore discard more than n characters?

(this is possibly a duplicate of Why does std::basic_istream::ignore() extract more characters than specified?, however my specific case doesn't involve the delim parameter)
From cppreference, the description of istream::ignore is the following:
Extracts and discards characters from the input stream until and including delim.
ignore behaves as an UnformattedInputFunction. After constructing and checking the sentry object, it extracts characters from the stream and discards them until any one of the following conditions occurs:
count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
end of file conditions occurs in the input sequence, in which case the function calls setstate(eofbit)
the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()
However, let's say I've got the following program:
#include <iostream>
int main(void) {
    int x;
    char p;
    if (std::cin >> x) {
        std::cout << x;
    } else {
        std::cin.clear();
        std::cin.ignore(2);
        std::cout << "________________";
        std::cin >> p;
        std::cout << p;
    }
}
Now, let's say I input something like p when my program starts. I expect cin to 'fail', then clear to be called, and ignore to discard 2 characters from the buffer, so the 'p' and '\n' that are left in the buffer should be discarded. However, the program still expects input after ignore gets called, so in reality it only gets to the final std::cin >> p after I've given it more than 2 characters to discard.
My issue:
Inputting something like 'b' and hitting Enter immediately after the first input (so after the 2 characters 'p' and '\n' have been discarded) keeps 'b' in the buffer and immediately passes it to cin, without first printing the message. How can I make it so that the message gets printed immediately after the two characters are discarded, and only then is >> called?
After a lot of back and forth in the comments (and reproducing the problem myself), it's clear the problem is that:
1. You enter p<Enter>, which isn't parsable
2. You try to discard exactly two characters with ignore
3. You output the underscores
4. You prompt for the next input
but in fact things seem to stop at step 2 until you give it more input, and the underscores only appear later. Well, bad news, you're right, the code is blocking at step 2 in ignore. ignore is blocking waiting for a third character to be entered (really, checking if it's EOF after those two characters), and by the spec, this is apparently the correct thing to do, I think?
The problem here is the same basic issue as the problem you linked, just a different manifestation. When ignore terminates because it has read the number of characters requested, it always attempts to read one more character, because it needs to know whether condition 2 might also be true (whether it happened to read the last character), so it can take the appropriate action: putting cin in the EOF state, or otherwise leaving the next character in the buffer for the next read:
Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
Since you didn't provide an end character for ignore, it's looking for EOF, and if it doesn't find it after two characters, it must read one more to see if it shows up after the ignored characters (if it does, it'll leave cin in EOF state, if not, the character it peeked at will be the next one you read).
Simplest solution here is to not try to specifically discard exactly two characters. You want to get rid of everything through the newline, so do that with:
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
instead of std::cin.ignore(2); that will read any and all characters until the newline (or EOF), consume the newline, and never overread (in the sense that it keeps going until the delimiter or EOF is found; there is no condition under which it finishes reading a count of characters and then needs to peek further).
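For completeness, here is a minimal sketch of the program with just that one change (note that std::numeric_limits lives in the <limits> header):

#include <iostream>
#include <limits>

int main() {
    int x;
    char p;
    if (std::cin >> x) {
        std::cout << x;
    } else {
        std::cin.clear();
        // Discard everything up to and including the newline; this never peeks past it.
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        std::cout << "________________";
        std::cin >> p;
        std::cout << p;
    }
}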
If for some reason you want to specifically ignore exactly two characters (how do you know they entered p<Enter> and not pabc<Enter>?), just call .get() on it a couple of times, or .read(two_byte_buffer, 2), or the like, so you read the raw characters without any possibility of peeking beyond them.
For the record, this seems to deviate a little from the cppreference description (which may be wrong); condition 2 in the spec doesn't say it needs to verify whether it is at EOF after reading count characters, and cppreference claims condition 3 (which would need to peek) is explicitly not checked when the "delimiter" is the default Traits::eof(). But the spec quote found in your linked question's answer doesn't include that line about condition 3 not applying for Traits::eof(), and condition 2 might allow for checking whether you're at EOF, which would produce the observed behavior.
Your problem is related to your terminal. When you press ENTER, you are most likely getting two characters -- '\r' and '\n'. Consequently, there is still one character left in the input stream to read from. Change that line to:
std::cin.ignore(10, '\n'); // 10 is not magical. You may use any number > 2
to see the behavior you are expecting.
Passing the exact number of characters currently in the buffer will do the trick:
std::cin.ignore(std::cin.rdbuf()->in_avail());

Is the end character `\0` considered as one character or two characters?

While trying to better understand the behavior of some functions, I took two examples:
char str[]="Hello\0World"
and
char str[100];
scanf("%s",str);// enter the same string "Hello\0world"
The problem here is that in the first example I got Hello and in the second I got Hello\0world.
Why are the two characters \ and 0 interpreted as the end of the string in the first case but not in the second?
\0 is an escape sequence, and while it consists of two characters in the source file, it is interpreted as a single character in the string, namely the null character. However, this special interpretation only happens in the source file; if you input \0 when you run the program, it gets interpreted literally as the two characters \ and 0.
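As a small illustrative sketch (not from the original post), sizeof sees every byte of the array while strlen stops at the embedded null character:

#include <cstdio>
#include <cstring>

int main() {
    char str[] = "Hello\0World";
    // sizeof counts all 12 bytes, including the embedded '\0' and the final terminator;
    // strlen stops at the first '\0'.
    std::printf("sizeof = %zu, strlen = %zu, str = \"%s\"\n",
                sizeof str, std::strlen(str), str);   // sizeof = 12, strlen = 5, str = "Hello"
    return 0;
}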
Because when you enter "Hello\0World" for the scanf, you don't actually enter the same thing.
When you use \0 in the code, it means a character with ASCII code 0, whereas escape sequences are not interpreted when you type at the prompt; you are actually entering a backslash and a zero character.
So your input consists of the literal characters Hello\0World (with a real backslash and a real zero), not Hello followed by a null character and World.
\0 is one character. In the first case it is written in the code, so it is treated as a special character, like a newline (\n), and the printing function stops there. In the second case you are literally inputting two separate characters: one byte holds \ and the next holds 0, so it is not treated as a terminator. If you want to enter a "real" \0 from the keyboard, look here: How to simulate NUL character from keyboard?
This is a really good question! My assumption is that when you declare the string on line 3, as expected, the compiler is able to identify that there is an escape sequence and place the \0 appropriately. When you are reading with scanf, on the other hand, your code is already compiled; the program has no way of knowing what characters will be entered, so it doesn't treat the \ as starting an escape sequence and just treats it as a backslash.
Here is a sample code I wrote based off of yours to try and solve this problem:
#include <stdio.h>
int main() {
    char str1[] = "Hello\0World";
    printf("str1 = %s\n", str1);
    char str2[100];
    scanf("%s", str2);
    printf("str2 = %s", str2);
}
In your first statement, char str[]="Hello\0World", the '\0' is a null terminator, and string functions treat '\0' as the end of the string.
In your second statement, scanf() takes input until a space occurs.
Overall, '\0' is just a single character; the '\' introduces an escape sequence.
In the first case:
char str[]="Hello\0World"
the compiler has interpreted the string and placed a null terminator (the \0) in the middle of the string.
In the second case:
char str[100];
scanf("%s",str);// enter the same string "Hello\0world"
You've entered the string, and there's been no compiler interpretation. It's the same as if you'd written:
char str[]="Hello\\0World"

How does `cin>>` take three values even though they are separated by spaces?

When looking at some code online I found
cin>>arr[0][0]>>arr[0][1]>>arr[0][2]
where I enter a line of three integer values separated by spaces. I see that those three integers become the values of arr[0][0], arr[0][1] and arr[0][2].
It doesn't cause any trouble if there is more than one space between them.
Please, can anyone explain to me how this works?
Most overloads of operator>> consume and discard all whitespace characters first thing. They begin parsing the actual value (say, an int) starting from the first non-whitespace character in the stream.
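As an illustrative sketch (mine, using std::istringstream so the input is fixed), extra spaces between the numbers make no difference:

#include <iostream>
#include <sstream>

int main() {
    std::istringstream in("  7    42        1000");
    int arr[3];
    // Each operator>> first skips any whitespace, then parses one integer.
    in >> arr[0] >> arr[1] >> arr[2];
    std::cout << arr[0] << ' ' << arr[1] << ' ' << arr[2] << '\n';   // prints: 7 42 1000
    return 0;
}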
Reading almost any type of input from a stream will skip any leading whitespace first, unless you explicitly turn that feature off. You should read the std::basic_istream documentation for more information:
Extracts an integer value potentially skipping preceding whitespace. The value is stored to a given reference value.
This function behaves as a FormattedInputFunction. After constructing and checking the sentry object, which may skip leading whitespace, extracts an integer value by calling std::num_get::get().
The same applies to other stream input functions, including the scanf family where most format specifiers will consume all whitespace characters before reading the value:
All conversion specifiers other than [, c, and n consume and discard all leading whitespace characters (determined as if by calling isspace) before attempting to parse the input. These consumed characters do not count towards the specified maximum field width.
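To illustrate the scanf side with a sketch of my own: %d skips leading whitespace on its own, while %c does not, which is why an explicit space is often written before %c.

#include <cstdio>

int main() {
    int a, b;
    char c1, c2;
    // %c does not skip whitespace, so c1 picks up the space that follows 42.
    std::sscanf("  7   42 x", "%d%d%c", &a, &b, &c1);
    // A space before %c tells scanf to skip whitespace first, so c2 gets 'x'.
    std::sscanf("  7   42 x", "%d%d %c", &a, &b, &c2);
    std::printf("a=%d b=%d c1='%c' c2='%c'\n", a, b, c1, c2);   // a=7 b=42 c1=' ' c2='x'
    return 0;
}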

scanf() curious behaviour!

I recently stumbled upon a curious case (at least for me, since I hadn't encountered this before). Consider the simple code below:
int x;
scanf("%d",&x);
printf("%d",x);
The above code takes a normal integer input and displays the result as expected.
Now, if I modify the above code to the following:
int x;
scanf("%d ",&x);//notice the extra space after %d
printf("%d",x);
This takes in an additional input before it gives the result of the printf statement. I don't understand why a space changes the behaviour of scanf(). Can anyone explain this to me?
From http://beej.us/guide/bgc/output/html/multipage/scanf.html:
The scanf() family of functions reads data from the console or from a FILE stream, parses it, and stores the results away in variables you provide in the argument list.
The format string is very similar to that in printf() in that you can tell it to read a "%d", for instance for an int. But it also has additional capabilities, most notably that it can eat up other characters in the input that you specify in the format string.
What's happening is that scanf is pattern matching against the format string (kind of like a regular expression). scanf keeps consuming text from standard input (e.g. the console) until the entire pattern is matched.
In your second example, scanf reads in a number and stores it in x. But it has not yet reached the end of the format string -- there is still a space character left. So scanf reads additional whitespace character(s) from standard input in order to (try to) match it.
From the man page:
The format string consists of a sequence of directives which describe
how to process the sequence of input characters. If processing of a
directive fails, no further input is read, and scanf() returns. A
"failure" can be either of the following: input failure, meaning that
input characters were unavailable, or matching failure, meaning that
the input was inappropriate (see below).
A directive is one of the following:
· A sequence of white-space characters (space, tab, newline, etc.;
see isspace(3)). This directive matches any amount of white
space, including none, in the input.
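A reproducible way to see this is a sketch (mine) using sscanf with %n, which records how many characters have been consumed so far: the format with the trailing space keeps consuming whitespace after the number until it reaches something that is not whitespace.

#include <cstdio>

int main() {
    int x, consumed;
    // Without the trailing space: scanning stops right after the digits.
    std::sscanf("42   \n\t  7", "%d%n", &x, &consumed);
    std::printf("x=%d, consumed=%d\n", x, consumed);   // x=42, consumed=2
    // With the trailing space: all the following whitespace is eaten too, which on
    // stdin means waiting until something non-whitespace is typed.
    std::sscanf("42   \n\t  7", "%d %n", &x, &consumed);
    std::printf("x=%d, consumed=%d\n", x, consumed);   // x=42, consumed=9
    return 0;
}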
man scanf
[...]
A sequence of white-space characters (space, tab, newline, etc.;
see isspace(3)). This directive matches any amount of white
space, including none, in the input.

istream get method behavior

I read about istream::get and one doubt still remains. Let's say my delimiter is actually the null '\0' character; what happens in this case? From what I read:
If the delimiting character is found, it is not extracted from the input sequence and remains as the next character to be extracted. Use getline if you want this character to be extracted (and discarded). The ending null character that signals the end of a c-string is automatically appended at the end of the content stored in s.
The reason I would prefer "get" over "readline" is because of the capability to extract the character stream into a "streambuf".
I don't quite get your problem.
On the msdn website, for the get function, it says:
In all cases, the delimiter is neither extracted from the stream nor returned by the function. The getline function, in contrast, extracts but does not store the delimiter.
http://msdn.microsoft.com/en-us/library/aa277360(VS.60).aspx
I don't think you're going to have a problem, since the MSDN site says that the delimiter is neither extracted from the stream nor returned by the function.
Or maybe I'm missing a point here?
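Here is a minimal comparison sketch (not from the original answer) using std::istringstream with '\0' as the delimiter: get leaves the delimiter in the stream, while getline extracts and discards it.

#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::istringstream in1(std::string("abc\0def", 7));
    std::istringstream in2(std::string("abc\0def", 7));
    char buf[16];

    in1.get(buf, sizeof buf, '\0');      // stops before the '\0' and leaves it in the stream
    std::cout << buf << ' ' << in1.peek() << '\n';   // prints: abc 0   (peek sees the NUL, value 0)

    in2.getline(buf, sizeof buf, '\0');  // extracts and discards the '\0'
    std::cout << buf << ' ' << in2.peek() << '\n';   // prints: abc 100 (peek sees 'd', value 100)
    return 0;
}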
If you have something like this, then the delimiter will not get stuck in the input stream:
#include <istream>
#include <string>

std::string read_str(std::istream & in)
{
    const int size = 1024;
    char pBuffer[size];
    // getline with '\0' as the delimiter extracts and discards the '\0'.
    in.getline(pBuffer, size, '\0');
    return std::string(pBuffer);
}
Just an example, assuming '\0' is the delimiter and strings are not bigger than 1024 bytes.
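For instance, a hypothetical usage sketch (assuming the read_str function above is in scope) reading two '\0'-terminated fields out of a string stream:

#include <iostream>
#include <sstream>
#include <string>

int main() {
    // Two '\0'-terminated fields packed into one stream (10 bytes in total).
    std::istringstream in(std::string("first\0two\0", 10));
    std::cout << read_str(in) << '\n';   // first
    std::cout << read_str(in) << '\n';   // two
    return 0;
}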