Unreported error VS 2015: Hex char specifier [duplicate] - c++

I wanted this: char c = '\x20' ;
But by mistake I typed this: char c = 'x20';
The VS2015 compiler reported a warning, 'converting integer to char', but no error. The code ran, but the value of c was 48 (decimal). Can anyone explain how this conversion works, assuming 'x20' is a valid form (I didn't think it was)? Or is this an error that VS2015 fails to recognise?

'x20' is a multicharacter literal. Per [lex.ccon]/2:
A character literal that does not begin with u8, u, U, or L is
an ordinary character literal. An ordinary character literal that
contains a single c-char representable in the execution character
set has type char, with value equal to the numerical value of the
encoding of the c-char in the execution character set.
An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an
ordinary character literal containing a single c-char not
representable in the execution character set, is
conditionally-supported, has type int, and has an
implementation-defined value.
Therefore, from a standard perspective, your implementation supports this conditionally-supported construct, and you get an implementation-defined value of type int which, when converted to type char, results in char(48).
Per Microsoft Visual Studio C++ Documentation:
Microsoft Specific
Multiple characters in the literal fill corresponding bytes as needed
from high-order to low-order. To create a char value, the compiler
takes the low-order byte. To create a wchar_t or char16_t value,
the compiler takes the low-order word. The compiler warns that the
result is truncated if any bits are set above the assigned byte or
word.
char c0 = 'abcd'; // C4305, C4309, truncates to 'd'
wchar_t w0 = 'abcd'; // C4305, C4309, truncates to '\x6364'
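In your case, you use 'x20'. Its three characters fill the bytes 0x78 ('x'), 0x32 ('2'), and 0x30 ('0') from high-order to low-order, and the compiler takes the low-order byte, '0', which is char(48) under ASCII encoding.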

Related

How to mix hexadecimal char and normal char in string literal in C++? [duplicate]

Is it possible to mix '\xfd' and 'a' in a single string literal?
For example:
unsigned char buff1[] = "\xfda";
unsigned char buff1[] = "\x0f\x0015899999999";
VC++2015 reports:
Error C2022 '-1717986919': too big for character
As mentioned in the other answer, '\xfda' is parsed as a single hexadecimal escape sequence. To get a string literal containing '\xfd' followed by 'a', you need to split the string:
"\xfd" "a"
Adjacent string literal tokens are concatenated, which means that for example "ab" "cd" is the same as "abcd".
You cannot do this with a hex escape inside a single string literal. [lex.ccon]/8 states:
The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_t (for character literals prefixed by L). [ Note: If the value of a character literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed. — end note ]
emphasis mine
This means '\xfda' is treated as a single hex escape, because f, d, and a are all valid hex digits. Instead, you can break it up into multiple string literals, which are concatenated for you:
unsigned char buff1[] = "\xfd" "a";
Another option, if you want 'a' to be part of the same literal, is to use an octal escape, which ends after at most three digits. That would be "\375a".
It is not possible, as NathanOliver's answer explains well. But there is also no need: you can simply use two literals:
unsigned char buff1[] = "\x0f\x00""15899999999";

How compiler identifies the ASCII code of multiple characters

int var;
var = ' ';   // single space
cout << var; // prints 32
var = '  ';  // two spaces
cout << var; // prints 8224. Why?
How does the compiler calculate this value (8224) for two spaces?
This happens with every multi-character literal.
This is what C++ standard N3690 mentions about multicharacter literals:
An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character
set, is conditionally-supported, has type int, and has an implementation-defined value.
So the answer is that the corresponding int value is implementation-defined.
While for single-char literal:
An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
A char in C++ occupies one byte, which on most platforms has 8 bits and can therefore take 256 distinct values (0 to 255 when viewed as unsigned).
So in your case, when the two-space literal '  ' is evaluated as an int, what happens behind the scenes is just a base-256 conversion. To be more precise:
The space character ' ' has ASCII code 32.
So two spaces yield the int value 32 + 256*32 = 8224.
EDIT
This is how your two characters are represented in memory, where each char block is one byte that can hold a value in the range 0-255:
|char|char|
When you convert these two blocks to an int, you perform a base-256 conversion: the ASCII code of the right char block, 32, is multiplied by 256^0, and the ASCII code of the next char block, also 32, is multiplied by 256^1.
Step 2 is implementation-dependent, as @saurav-sahu mentions, e.g. depending on whether the byte order is big endian or little endian.
I tried to give you an intuition for what goes on behind the scenes, but as pete_becker correctly pointed out, it is highly implementation-specific; for example, the char type can be interpreted as a signed or an unsigned value.

char val = 'abcd'. Using multi character char

I am confused about how the compiler handles a char variable initialized with multiple characters. I understand that a char is 1 byte and can contain one character, such as an ASCII character.
But when I try:
char _val = 'ab';
char _val = 'abc';
char _val = 'abcd';
They compile fine, and when I print _val it always prints the last character. But when I tried
char _val = 'abcde';
Then I got a compiler error:
Error 1 error C2015: too many characters in constant
So my questions are:
Why does the compiler always take the last character when multiple characters are used? What is the compiler mechanism in this situation?
Why did I get a "too many characters" error only when I used 5 characters? 2 characters are already more than a char can hold, so why 5?
I am using Visual Studio 2013.
Thank you.
[lex.ccon]/1:
An ordinary character literal that contains more than one c-char is a
multicharacter literal. A multicharacter literal [..] is conditionally-supported, has type int, and
has an implementation-defined value.
Why does the compiler always take the last character when multiple
characters are used? What is the compiler mechanism in this situation?
Most compilers just shift the character values together in order: That way the last character occupies the least significant byte, the penultimate character occupies the byte next to the least significant one, and so forth.
That is, 'abc' would be equivalent to 'c' + (((int)'b') << 8) + (((int)'a') << 16).
Converting this int back to a char yields an implementation-defined value, which might simply be the value of the int modulo 256. That would give you the last character.
Why did I get a "too many characters" error only when I used 5 characters? 2
characters are already more than a char can hold, so why 5?
Because on your machine an int is probably four bytes wide. If the above is indeed how your compiler arranges multicharacter constants, it cannot fit five char values into an int.

What is the difference of the U prefix for a character literal vs. a string literal?

In The C++ Programming Language, 4th edition, section 6.2.6, it says:
Combinations of R, L, and u prefixes are allowed, for example, uR"**(foo\(bar))**". Note the dramatic difference in the meaning of a U prefix for a character (unsigned) and for a string UTF-32 encoding (§7.3.2.2).
I don't quite understand what the author is trying to say here. What is the "dramatic difference" indeed? Why is the word "(unsigned)" used here?
Per my understanding, a U-prefixed character literal contains the ISO-10646 code point value of the quoted character, which is basically of the same idea as the U prefix of a string literal, and has nothing to do with the concept of "unsigned".
unsigned is a C++ keyword. It means that the integer type that (in most cases) follows it in a declaration holds only non-negative values.
For reference look here:
http://en.cppreference.com/w/cpp/language/types
For character and string literals you have:
char16_t c = u'\u00F6';
char32_t d = U'\U0010FFFF';
char16_t C[] = u"Hell\u00F6";
char32_t D[] = U"Hell\U000000F6\U0010FFFF";
For further reference to string literals:
Unicode encoding for string literals in C++11
So there is indeed a difference between the u and U prefixes and unsigned, but I wouldn't consider it dramatic.

Printing char by integer qualifier

I am trying to execute the below program.
#include <stdio.h>
#include <string.h>
int main(void)
{
    char c = '\08';
    printf("%d", c);
    return 0;
}
I'm getting 56 as the output. For any number other than 8 the output is the number itself, but for 8 the answer is 56.
Can somebody explain?
An escape sequence that starts with a backslash followed by digits is an octal escape. Octal is the base-8 number system and uses the digits 0 to 7. So \08 is not a valid octal escape, because 8 ∉ [0, 7]: the escape ends after \0, and you get an implementation-defined value.
Your compiler probably treats '\08' as a multi-character constant made up of '\0' as one character and '8' as another, and after truncation to char the result is '8'. If you look at an ASCII table, you'll see that the decimal value of '8' is 56.
Thanks to @DarkDust, @GrijeshChauhan and @EricPostpischil.
The value '\08' is considered to be a multi-character constant, consisting of \0 (which evaluates to the number 0) and the ASCII character 8 (which evaluates to decimal 56). How it's interpreted is implementation defined. The C99 standard says:
An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.
So if you assigned '\08' to something bigger than a char, like an int or a long, it would even be valid. But since you assign it to a char, you're "chopping off" part of it. Which part is probably also implementation- or machine-dependent. In your case it happens to give you the value of the 8 (the ASCII character that evaluates to the number 56).
Both GCC and Clang do warn about this problem with "warning: multi-character character constant".
A backslash followed by digits represents an octal number in C/C++. Octal digits range from 0 to 7, so \08 is a multi-character constant consisting of \0 and 8. The compiler interprets it as \0 followed by '8' and keeps the '8', whose ASCII value is 56. That's why you are getting 56 as output.
As other answers have said, these escapes are written in octal (base 8). This means that you have to write '\010' for 8, '\011' for 9, etc.
There are other ways to write your assignment:
char c = 8;
char c = '\x8'; // hexadecimal (base 16) numbers