While looking at some code online, I found
cin >> arr[0][0] >> arr[0][1] >> arr[0][2];
When I enter a line of three integer values separated by spaces, those three integers become the values of arr[0][0], arr[0][1] and arr[0][2].
It still works if there is more than one space between them.
Can anyone please explain how this works?
Most overloads of operator>> first consume and discard all leading whitespace characters. They begin parsing the actual value (say, an int) at the first non-whitespace character in the stream.
Reading almost any type of input from a stream skips leading whitespace first, unless you explicitly turn that feature off (for example with the std::noskipws manipulator). See the std::basic_istream documentation for more information:
Extracts an integer value potentially skipping preceding whitespace. The value is stored to a given reference value.
This function behaves as a FormattedInputFunction. After constructing and checking the sentry object, which may skip leading whitespace, extracts an integer value by calling std::num_get::get().
The same applies to other input functions, including the scanf family, where most conversion specifiers consume all leading whitespace before reading the value:
All conversion specifiers other than [, c, and n consume and discard all leading whitespace characters (determined as if by calling isspace) before attempting to parse the input. These consumed characters do not count towards the specified maximum field width.
I read an integer:
is >> myInteger;
Now I want to know how many digits were read (including any leading zeros). How can I do that?
You can:
get the value as a string, then parse it separately however you wish (check the length, count zeros, etc.);
use is.tellg for this; keep in mind that tellg gives you buffer positions, not what was at those positions (it could be space characters or zeros);
read the buffer character by character using is.get, then process the values according to your needs.
You could get the value of is.tellg() before you stream in the integer, then get it again and find the difference.
EDIT: As pointed out in the comments, that will only tell you how many characters of the stream were consumed, and some of them may be whitespace.
How do I create a regex that matches all invalid Base64 characters?
I found [^a-zA-Z0-9+/=\n\r].*$ on Stack Overflow, but when I try it, I still get a string with a - sign in the result.
I don't know regex at all; can anyone tell me whether this is a good or a bad regex?
The short answer to your question is that if the message contains any match for a character from the class [^A-Za-z0-9+/=\s], then it contains an invalid base-64 character. The exception is MIME messages, which may freely mix other data (for various purposes) together with the base-64 stream. (These other characters are deleted before decoding the base-64 object.)
As someone who was lucky enough to help write the internals of a very fast base-64 encoding program, which processed multi-byte blocks with each machine instruction, let me add a few remarks:
The base-64 alphabet is: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
The output must be padded with zero or more = signs as necessary, so that the total number of non-whitespace characters is a multiple of four.
Those equals signs can only occur at the end of the base-64 message, and there can be at most two of them.
Whitespace of any kind should be ignored. Messages are usually wrapped to a certain margin (which must be a multiple of four), but this is not necessary. The purpose of a base-64 encoding is to transfer arbitrary values, especially binary data, as plain text. You could theoretically even read someone a JPEG image over the phone using base-64 encoding.
My suggestion for validating a base-64 message is therefore to do more than just apply a regular expression. Instead:
Eliminate all whitespace and call the length of the resulting output z.
Count the number x of base-64 alphabet characters.
Count the number y of equals signs at the end of the message.
Return valid if y is at most 2 and x + y = z and invalid otherwise.
Note 1: The padding characters == or = do not serve any purpose in protecting the integrity of the data, and there are many derivatives of base-64 encoding which do not use them. Many consider the padding to be almost as useless and wasteful of processing time as the CR portion of the CRLF line-ending sequence.
Note 2: The variant used for MIME encoding allows characters outside the base-64 alphabet to appear within the message stream, but simply discards them when decoding the base-64 data object.
Note 3: I dislike the modern term "Base64" since it is an extremely ugly word. This fake word was never used by the original base-64 writers, but was adopted sometime in the next nine years.
You can encode most of this into a regular expression as follows (without the precise length checks on the last block of base-64 data):
^\s*(?:(?:[A-Za-z0-9+/]{4})+\s*)*[A-Za-z0-9+/]*={0,2}\s*$
Your regex should probably be ^[a-zA-Z0-9+/\r\n]+={0,2}$ instead.¹
Currently it only matches one valid character and then allows anything after it. For instance:
aGVsbG8sIHdvcmxkIQ== match
aGV%sb-G8sIHdvcmxkIQ== also a match (starts with "a")
Removing the .* at the end and adding a quantifier to the class forces the entire string to be valid:
aGVsbG8sIHdvcmxkIQ== match
aGV%sb-G8sIHdvcmxkIQ== not a match
¹ As @p.s.w.g pointed out, a valid base64 string shouldn't contain = in the middle of the value, since = has a special meaning: it is used as padding at the end.
Why do the two functions istream::get(char*, streamsize) and istream::get(char*, streamsize, char) set cin's failbit when they find '\n' as the first character in the cin buffer?
As can be seen here, that's the behavior of the two overloads mentioned above. I'd like to know the purpose of designing these functions this way. Note that both functions leave the '\n' character in the buffer, so if you call either of them a second time, it will fail because of the newline character, as shown in the link. Wouldn't it make more sense for these two functions not to leave '\n' in the buffer, as the istream::get() overload and istream::getline() do?
With std::istream::getline, if the delimiting character is found it is extracted and discarded. With std::istream::get the delimiting character remains in the stream.
With getline you don't know whether the delimiting character was read and discarded or whether just n - 1 characters were read. If you only want to read whole lines, you can use get and then peek at the next character to see whether it is a newline or the given delimiter.
But if you want to read whole lines up to some delimiter, you might also use std::getline, which reads the complete line in any case.
I recently stumbled upon a curious case (at least for me, since I hadn't encountered it before). Consider the simple code below:
int x;
scanf("%d",&x);
printf("%d",x);
The above code takes a normal integer input and displays the result as expected.
Now, suppose I modify the above code as follows:
int x;
scanf("%d ",&x);//notice the extra space after %d
printf("%d",x);
This waits for an additional input before printing the result of the printf statement. I don't understand why a space changes the behaviour of scanf(). Can anyone explain this to me?
From http://beej.us/guide/bgc/output/html/multipage/scanf.html:
The scanf() family of functions reads data from the console or from a FILE stream, parses it, and stores the results away in variables you provide in the argument list.
The format string is very similar to that in printf() in that you can tell it to read a "%d", for instance for an int. But it also has additional capabilities, most notably that it can eat up other characters in the input that you specify in the format string.
What's happening is that scanf is pattern matching the format string (kind of like a regular expression). scanf keeps consuming text from standard input (e.g. the console) until the entire pattern is matched.
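In your second example, scanf reads in a number and stores it in x. But it has not yet reached the end of the format string: there is still a space character left. So scanf reads additional whitespace character(s) from standard input in order to (try to) match it. A whitespace directive matches any amount of whitespace, so scanf cannot tell it is finished until it sees a non-whitespace character (which it leaves unread) or the end of input. That is why pressing Enter alone is not enough and the call keeps waiting for the next input.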
From the man page:
The format string consists of a sequence of directives which describe how to process the sequence of input characters. If processing of a directive fails, no further input is read, and scanf() returns. A "failure" can be either of the following: input failure, meaning that input characters were unavailable, or matching failure, meaning that the input was inappropriate (see below).
A directive is one of the following:
• A sequence of white-space characters (space, tab, newline, etc.; see isspace(3)). This directive matches any amount of white space, including none, in the input.
When you use ignore() in C++, is there a way to check which characters were ignored? I'm reading some number of characters and want to know whether I ignored normal characters in the text or hit the newline character first. Thanks.
I don't believe so - you'd have to "roll your own".
In other words, I think you'd have to write some code that reads from the stream using get(), and then add some logic for keeping what you need and ignoring the rest (whilst checking to see what you're ignoring).
If you don't actually want to ignore the characters, don't use ignore() to extract them. get() can do the same job but also stores the extracted characters so that you can inspect them later.
If you provide the optional delim parameter to ignore(), then it can stop at a newline:
std::streampos old = is.tellg();
is.ignore(num, '\n');
if (is.tellg() != old + num) {
    // didn't ignore "num" characters; if not EOF or error, then we
    // must have reached a newline character.
}
There's a snag, though - when ignore() hits the delimiter, it ignores that too. So if you hit the delimiter exactly at the end of your set of ignored characters, then tellg() will return old + num. AFAIK there's no way to tell whether or not the last character ignored was the delimiter. There's also no way to specify a delimiter that isn't a single character.
I also don't know whether and when this is likely to be any faster than just reading num bytes and searching them for newlines. My initial thought was, "which part of the difference between ignore() and read() is non-obvious?" ;-)