I know that a multi-character character constant has type int, and that its value is compiler-dependent. My question is why it behaves differently when I store a multi-character character constant in a char variable.
#include <iostream>
int main() {
std::cout << 'asb';
return 0;
}
output: 6386530
#include <iostream>
int main() {
char a = 'asb';
std::cout << a;
return 0;
}
output: b
Case 1 :
You are getting 'a'*256²+'s'*256+'b' = 6386530
because 'a' = 97, 's' = 115, 'b' = 98 (see an ASCII table).
'asb' is interpreted as an integer.
typeid('asb').name()[0] == 'i' && sizeof('asb') == 4; // with GCC, where typeid(int).name() is "i"
An int is 32 bits on common platforms, and you can store 'asb' (24 bits) in an int.
That's why std::cout interprets it as an integer and displays 6386530.
Note that also:
typeid('xxxxabcd').name()[0] == 'i' && sizeof('xxxxabcd') == 4;
but 'xxxxabcd' would need 64 bits, so the first 32 bits (the leading 'xxxx') are lost.
std::cout << 'xxxxabcd';
std::cout << 'abcd';
would print the same thing.
Case 2 :
'asb' is interpreted as an integer, and it is implicitly converted to a char (8 bits).
As @BenjaminJones pointed out, only the low 8 bits (98 == 'b') are kept.
And std::cout interprets it as a char so it displays 'b'.
Either way, both cases produce compiler warnings such as:
warning: multi-character character constant [-Wmultichar]
warning: multi-character character constant [-Wmultichar] In function 'int main()'
warning: overflow in implicit constant conversion [-Woverflow]
The values are implementation-defined, so the exact behavior depends on the compiler.
You don't store the value of the multi-char constant as is. You convert the value into another value that fits in the range of a char. Since you are also printing the value via entirely different overloads of operator<< (the one for char instead of the one for int), it stands to reason the output would be different on account of that too.
In your second example, the literal is an int that gets converted to a char. Therefore, there are a few implementation specific issues here:
Interpretation of a multi-character literal
Whether char is signed or unsigned
Conversion from an int to a signed char (if char is signed)
It appears that 'asb' is interpreted as 6386530 and then truncated according to the rules for conversion from int to unsigned char. In other words, 6386530 % 256 == 98 == 'b'.
Related
Is this behavior expected and standard-conforming (using the VC compiler)?
Example 1 (char, which is signed by default on VC):
char s = 'R';
std::cout << s << std::endl; // Prints R.
std::cout << std::format("{}\n", s); // Prints R.
Example 2 (unsigned char):
unsigned char u = 'R';
std::cout << u << std::endl; // Prints R.
std::cout << std::format("{}\n", u); // Prints 82.
In the second example with std::format, u is printed as 82 instead of R. Is this a bug or expected behavior?
Without using std::format, if just by std::cout, I get R in both examples.
This is intentional and specified as such in the standard.
Both char and unsigned char are fundamentally numeric types. Normally only char has the additional meaning of representing a character. For example there are no unsigned char string literals. If unsigned char is used, often aliased to std::uint8_t, then it is normally supposed to represent a numeric value (or a raw byte of memory, although std::byte is a better choice for that).
So it makes sense to choose a numeric interpretation for unsigned char and a character interpretation for char by default. In both cases the default can be overridden with {:c} for a character interpretation and {:d} for a numeric interpretation.
I think operator<<'s behavior is the non-intuitive one, but that has been around for much longer and probably can't be changed.
Also note that signed char is a completely distinct type from both char and unsigned char, and that it is implementation-defined whether char is a signed or unsigned integer type (but char is always distinct from both signed char and unsigned char).
If you used signed char it would also be interpreted as numeric by default for the same reason as unsigned char is.
In the second example, std::format prints 82 instead of 'R'.
Is it an issue or standard behavior?
This is behavior defined by the standard, according to [format.string.std]:
Type | Meaning
... | ...
c | Copies the character static_cast<charT>(value) to the output. Throws format_error if value is not in the range of representable values for charT.
d | to_chars(first, last, value).
... | ...
none | The same as d. [Note 8: If the formatting argument type is charT or bool, the default is instead c or s, respectively. — end note]
For integer types, if no type option is specified, d is the default. Since unsigned char is an integer type, it is formatted as an integer, using the characters produced by std::to_chars.
(The exceptions are charT and bool, whose defaults are c and s respectively.)
When we compare numbers in a string/character format, how does the C++ compiler interpret them? The example below will make it clear.
#include <iostream>
using namespace std;
int main() {
// your code goes here
if ('1'<'2')
cout<<"true";
return 0;
}
The output is
true
What is happening inside the compiler? Is there an implicit conversion from string to integer, just like when we index an array using a character:
arr['a']
=> arr[97]
In C++, '1' has type char and an implementation-defined value, although the ASCII value (49) is common; it cannot be negative.
The expression arr['a'] is defined as per pointer arithmetic: *(arr + 'a'). If this is outside the bounds of the array then the behaviour of the program is undefined.
Note that '1' < '2' is true on any platform, because the standard requires the digits '0' to '9' to be contiguous and increasing. The same cannot be said for 'a' < 'b', although I've never come across a platform where it is not true. That said, in ASCII 'A' is less than 'a', but in EBCDIC (in all variants) 'A' is greater than 'a'!
The behaviour of an expression like "ab" < "cd" is unspecified. This is because both const char[3] constants decay to const char* types, and the behaviour of comparing two pointers that do not point to objects in the same array is unspecified.
(A final note: in C, '1', '2', and 'a' all have type int.)
The operands '1' and '2' are not strings, they're char literals.
The characters represent specific numbers of type char, typically defined by the ASCII table, specifically 49 for '1' and 50 for '2'.
The operator < compares those numbers, and since the number representing '1' is less than the one representing '2', the result of '1'<'2' is true.
#include <iostream>
using namespace std;
int main()
{
char MCU = 0b00000000;
char al_av = 0b10100000;
// Before bit operation
cout << "MCU = " << int(MCU) << endl;
MCU = MCU | al_av;
// After the bit operation
cout << "MCU = " << int(MCU) << endl; // Expected 160, got -96
char temp = 160;
cout << temp; // got the a with apostrophe
return 0;
}
I expected the output of char temp to be a negative number (or a warning/error) because 160 exceeds the [-127,127] interval, but instead the output was a character from the ASCII table (a with an acute accent).
On cpp reference:
char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type)
I don't understand what is written in italic (also I'm not sure it helps a lot for this question). Is there any implicit conversion?
Why can a signed char hold values bigger than 127?
It cannot.
char x = 231;
here, there is an (implicit) integer conversion: 231 is a prvalue of type int, and converting it to char (which is signed on your system) yields -25 (231 - 256). You can ask your compiler to warn you about it with -Wconstant-conversion (a Clang flag).
char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type)
I don't understand what is written in italic
This isn't related to what the type can hold, it only ensures that the three types char, signed char and unsigned char have common properties.
From C++14 char, if signed, must be a 2's complement type. That means that it has the range of at least -128 to +127. It's important to know that the range could be larger than this so it's incorrect to assume that a number greater than 127 cannot be stored in a char if signed. Use
std::numeric_limits<char>::max()
to get the real upper limit on your platform.
If you do assign a value larger than this to a char and char is signed, then before C++20 the behaviour of your code is implementation-defined; since C++20 the result is defined as the value modulo 2^N, where N is the width of char. Either way, it typically means wrap-around to a negative value, which is practically universal behaviour for a signed char type.
Note also that ASCII is a 7-bit encoding, so it's wrong to say that any character outside the range 0 - 127 is ASCII. Nor is ASCII the only encoding supported by C++; there are others.
Finally, the distinct types: Even if char is signed, it is a different type from signed char. This means that the code
#include <utility>  // for std::swap
int main() {
char c;
signed char d;
std::swap(c, d);
}
will always result in a compile error.
char temp = 160;
It is actually negative. The point is that cout does not print a char as a number at all: it writes the character's byte (0xA0) to the output, and the terminal renders that byte as a non-ASCII character. In effect the value is treated like an unsigned char (160) rather than as -96.
If you use printf and tell it to interpret it as an integer you will see that it is a negative value.
printf("%d\n", temp); // prints -96
Help me understand the following:
cout<<'a'; //prints a and it's okay but
cout<<'ab'; //prints 24930 but I was expecting an error because 'ab' has two characters in single quotes
cout<<'a'+1; //prints 98
cout<<"ab"; // prints ab and it's okay but
cout<<"ab"+1; // prints b, why?
cout<<"a"+1; // prints nothing ?
cout<<'a'+'b'; // prints 195 ?
cout<<"a"+"b"; // gives error ?
Please help me understand all these things in detail. I am very confused and would be very thankful.
'a' is a char type in C++. std::cout overloads << for a char to output the character, rather than the character number.
'ab' is a multicharacter literal in C++. It has type int. Its value is implementation-defined, but 'a' * 256 + 'b' is common. With ASCII encoding, that is 24930. The overloaded << operator for an int outputs the number.
'a' + 1 is an arithmetic expression. 'a' is converted to an int type prior to the addition according to the standard integral type promotion rules.
"ab" + 1 is executing pointer arithmetic on the const char[3] type, so it's equivalent to "b". Remember that << has lower precedence than +.
"a" + 1 is similar to the above, but it points at the NUL terminator, i.e. an empty string, so nothing is output.
'a' + 'b' is an int type. Both arguments are converted to an int prior to the addition.
The arguments of "a" + "b" decay to const char* types prior to the addition. But that's the addition of two pointers, which is not valid C++.
Can someone please guide me as to how these answers are being produced. For ii.) Why are the letters being turned into numbers? For iii.) What is going on here?
Problem 23: Suppose that a C++ program called prog.cpp is compiled and correctly executed on venus with the instructions:
venus> g++ prog.cpp
venus> a.out file1 file2 file3
For each of the following short segments of the program prog.cpp write exactly what output is produced. Each answer should consist of those symbols printed by the given part of the program and nothing else.
(ii)
char a = 'a';
while (a <= 'f') {
cout << 'a' - a;
a = a + 1; }
Answer:
0-1-2-3-4-5
(iii)
int main(int argc, char *argv[]) {
cout << argc;
Answer:
4
'a' - a
yields a number: the ASCII value of the variable a is subtracted from the ASCII value of the character 'a' (97), and the difference is printed as an integer.
In the second case, argc holds the number of command-line arguments, including the program name itself, so a.out file1 file2 file3 gives 4.
cout << 'a' - a;
This is actually a very interesting case. Clearly the values should be 0, then -1, then -2, etc., but if you cout those numbers with char type (e.g. cout << char(0) << char(-1);) you'll get "garbage" (or perhaps nothing) on your terminal - 0 is a non-printable NUL character, and (char)-1, (char)-2 might end up being rendered as some strange graphics character (e.g. blank and a square dot per http://www.goldparser.org/images/chart-set-ibm-pc2.gif)...
The reason they're being printed as readable numbers is that the 'a' - a expression evaluates to an int type - not char - and ints do print in a human-readable numeric format.
'a' is a character literal of type char, encoded using the ASCII value 97. a is a variable but also of type char. And yet, before one can be subtracted from the other they undergo Integral Promotion to type int; from the C++11 Standard:
4.5 Integral promotions [conv.prom]
1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
Given 4.13 says char has lower rank than int, this means char can be converted to int if needed, but why is it needed?
The compiler conceptually provides an int operator-(int, int) function but no char operator-(char, char), and that forces the subtraction of two char values to be shoehorned into the int operator after Integral Promotion.
(ii)
char a = 'a';
while (a <= 'f') {
cout << 'a' - a;
a = a + 1;
}
Answer: 0-1-2-3-4-5
Explanation:
This loop compares the ASCII value of a with that of 'f', so it runs for a = 'a' through 'f' inclusive and terminates once a becomes 'g'.
Inside the loop it subtracts the value of the variable a from the ASCII value of 'a', so it prints 0, -1, -2, -3, -4, -5 back to back, which reads as 0-1-2-3-4-5. To understand this better, you can put print statements before each statement.
(iii)
int main(int argc, char *argv[]) {
cout << argc;
}
Answer: 4
Explanation: Since the answer is 4, I presume there is no syntax error and that the missing closing brace '}' in the question was just a typo.
argv - the array of pointers to the argument strings
argc - the count of elements in argv
When you run the executable and pass it some command-line parameters, they are stored in argv, and the number of elements is stored in argc.
argc being 4 means the program was called as myexe.exe param1 param2 param3: the three parameters plus myexe.exe itself make the count 4.