Printing char by integer qualifier - c++

I am trying to execute the below program.
#include "stdio.h"
#include "string.h"
int main()
{
    char c='\08';
    printf("%d",c);
}
I'm getting 56 as the output. For any number other than 8 the output is the number itself, but for 8 the output is 56.
Can somebody explain?

An escape sequence that begins with \0 is an octal escape. Octal is the base-8 number system and uses the digits 0 to 7, so the 8 in \08 cannot be part of the octal number (8 ∉ [0, 7]), and you get implementation-defined behavior.
Your compiler probably treats '\08' as a multi-character constant made of two characters, '\0' and '8', and ends up with the value of '8'. If you look at the ASCII table, you'll see that the decimal value of '8' is 56.
Thanks to #DarkDust, #GrijeshChauhan and #EricPostpischil.

The value '\08' is considered to be a multi-character constant, consisting of \0 (which evaluates to the number 0) and the ASCII character 8 (which evaluates to decimal 56). How it's interpreted is implementation defined. The C99 standard says:
An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.
So if you assigned '\08' to something bigger than a char, like int or long, it would even be valid. But since you assign it to a char, you're "chopping off" part of the value. Which part is probably also implementation/machine dependent. In your case it happens to give you the value of '8' (the ASCII character, which evaluates to the number 56).
Both GCC and Clang do warn about this problem with "warning: multi-character character constant".

\0 introduces an octal escape in C/C++. Octal digits run from 0 to 7, so '\08' is a multi-character constant consisting of \0 and 8. The compiler interprets it as \0 followed by '8' and keeps '8', whose ASCII value is 56. That's why you are getting 56 as output.

As other answers have said, escapes of this kind represent characters in octal (base 8). This means that you have to write '\010' for 8, '\011' for 9, etc.
There are other ways to write your assignment:
char c = 8;
char c = '\x8'; // hexadecimal (base 16) numbers

Related

Unreported error VS 2015: Hex char specifier [duplicate]

I wanted this: char c = '\x20' ;
But by mistake I typed this: char c = 'x20';
The VS2015 compiler reported a warning 'converting integer to char', there was no error, the code ran but the value of c was 48 (decimal). Can anyone explain how the erroneous format conversion works, assuming it is a valid form (I didn't think it was). Or is this maybe an error that VS15 doesn't recognise?
'x20' is a multicharacter literal. Per [lex.ccon]/2:
A character literal that does not begin with u8, u, U, or L is
an ordinary character literal. An ordinary character literal that
contains a single c-char representable in the execution character
set has type char, with value equal to the numerical value of the
encoding of the c-char in the execution character set.
An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an
ordinary character literal containing a single c-char not
representable in the execution character set, is
conditionally-supported, has type int, and has an
implementation-defined value.
Therefore, from a standard perspective, your implementation supports this conditionally-supported construct, and you get an implementation-defined value of type int which, when converted to type char, results in char(48).
Per Microsoft Visual Studio C++ Documentation:
Microsoft Specific
Multiple characters in the literal fill corresponding bytes as needed
from high-order to low-order. To create a char value, the compiler
takes the low-order byte. To create a wchar_t or char16_t value,
the compiler takes the low-order word. The compiler warns that the
result is truncated if any bits are set above the assigned byte or
word.
char c0 = 'abcd'; // C4305, C4309, truncates to 'd'
wchar_t w0 = 'abcd'; // C4305, C4309, truncates to '\x6364'
In your case, you use 'x20'. The compiler takes the low-order byte — '0', which is char(48) under ASCII encoding.

How to mix hexadecimal char and normal char in string literal in C++? [duplicate]

Is it possible to mix '\xfd' and 'a' in a single string literal?
For example:
unsigned char buff1[] = "\xfda";
unsigned char buff1[] = "\x0f\x0015899999999";
VC++2015 reports:
Error C2022 '-1717986919': too big for character
As mentioned by the other answer '\xfda' is considered as a single hex character literal. To get a string literal with '\xfd' and 'a' you need to split the string.
"\xfd" "a"
Adjacent string literal tokens are concatenated, which means that for example "ab" "cd" is the same as "abcd".
You will not be able to do so using a hex character literal in a single string. [lex.ccon]/8 states
The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_­t (for character literals prefixed by L). [ Note: If the value of a character literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed. — end note ]
emphasis mine
This means '\xfda' is considered a single hex character literal since all of its digits are valid hex digits. What you can do is use multiple string literals that will be concatenated for you to break it up like
unsigned char buff1[] = "\xfd" "a";
Another option would be to switch to using an octal literal if you want 'a' to be part of the string. That would be "\375a".
It is not possible, as NathanOliver's answer explains well. But there is also no need: you can simply use two literals:
unsigned char buff1[] = "\x0f\x00""15899999999";

How compiler identifies the ASCII code of multiple characters

int var;
var=' '; // this is a single space
cout << var; // prints 32
var = '  '; // double space
cout << var; // prints 8224. Why?
How the compiler calculates this (8224) for two spaces?
This happens with every multi-character literal.
This is what C++ standard N3690 mentions about multicharacter literals:
An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character
set, is conditionally-supported, has type int, and has an implementation-defined value.
So the answer is that the corresponding int value is implementation-specific.
While for single-char literal:
An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
A char in C++ is a byte (with possible values from 0 to 255 when treated as unsigned).
So in your case, when the two white-spaces '  ' are converted to an int, behind the scenes it's just a base-256 conversion. To be more precise:
the white-space ' ' has an ASCII code of 32.
So two white-spaces become an int of 32 + 256*32 = 8224.
EDIT
This is how your two characters are represented in memory, where one char block is a byte, which can have values ranging from 0 to 255:
|char|char|
When you convert these two blocks to an int, you make a base-256 conversion: the ASCII code of the right char block, which is 32, is multiplied by 256^0. Then the ASCII code of the next char block, also 32, is multiplied by 256^1.
The second step is implementation dependent, as #saurav-sahu mentions, e.g. on whether the machine is big endian or little endian.
I tried to give you an intuition of what goes on behind the scenes, but as pete_becker has correctly pointed out, it's highly implementation specific, e.g. the char type can be interpreted as a signed or unsigned value and so on.

C++ casting a single element of a string

I recently tried to convert an element of a string (built from digits only) to int, and in order to avoid bizarre results (like 49 instead of 1) I had to use 'stringstream' (without even knowing what it is and how it works) instead of int variable = static_cast<int>(my_string_name[digit_position]);.
According to what I've read about these 'streams' the type is irrelevant when you use them so I guess that was the case. The question is: What type did I actually want to convert FROM and why didn't it work? I thought that a string element is a char but apparently not if the conversion didn't work.
I thought that a string element is a char but apparently not if the conversion didn't work.
Yes, it is a char. However, the value of this char is not how it is rendered on the screen. This actually depends on the encoding of the string. For example, if your string is encoded in ASCII, the "bizarre" value 49 that you got is the code of the character '1'. Building on that, you can subtract the ASCII code of the character '0' to get the numeric values of these characters. However, be very careful: this greatly depends on your encoding. It is possible that your string uses multi-byte characters, in which case you can't even index them this naively.
I believe that what you are looking for is not a cast, but the std::stoi (http://en.cppreference.com/w/cpp/string/basic_string/stol) function, which converts a string representation of an integer to int.
char values are already numeric, so such a static_cast won't help you much here. What you actually need is:
int variable = my_string_name[digit_position] - '0';
In ASCII the digits '0' to '9' are encoded as the decimal numbers 48 to 57. So to get the numerical value that the digit represents, we need to subtract the value of the first digit, 48.
To keep the code portable to other character encoding tables (e.g. EBCDIC), the literal '0' is subtracted instead of the number 48.
Assume you have a string: std::string myString("1");. You could access the first element which is type char and you convert it to type int via static_cast. However, your expectation of what will happen is incorrect.
Upon looking at the ASCII table for number 1 you will see it has a base 10 integer value of 49, which is why the cast gives the output you're seeing. Instead you could use a conversion function like atoi (e.g., atoi(&myString[0]);).
Your "bizarre results" aren't actually that bizarre. When accessing a single character of a std::string like so: my_string_name[digit_position], you're getting a char. Putting that char in an int stores the character code of '1', not the number 1. Looking at http://www.asciitable.com/ we can see that 49 is the representation for '1'. To get the actual number you should do int number = my_string_name[digit_position] - '0';
You are working with the ASCII char code for a number, not an actual number. However, the ASCII char codes for the digits are laid out in order:
'0'
'1'
'2'
'3'
'4'
'5'
'6'
'7'
'8'
'9'
So if you are converting a single character and your character is a base-10 digit, then you can use:
int variable = my_string_name[digit_position] - '0';
But this depends upon the use of character codes which are in sequence. Specifically, if there were any other characters interspersed here this wouldn't work at all (I don't know any character code mapping that doesn't have these listed sequentially.)
But to guarantee the conversion you should use stoi:
int variable = stoi(my_string_name.substr(digit_position, 1));

subtracting characters of numbers

I am trying to understand what is going on with the code:
cout << '5' - '3';
Is what I am printing an int? Why does it automatically change them to ints when I use the subtraction operator?
In C++ character literals just denote integer values.
A basic literal like '5' denotes a char integer value, which with almost all extant character encodings is 48 + 5 (because the character 0 is represented as value 48, and the C++ standard guarantees that the digit values are consecutive, although there's no such guarantee for letters).
Then, when you use them in an arithmetic expression, or even just write +'5', the char values are promoted to int. Or less imprecisely, the “usual arithmetic conversions” kick in, and convert up to the nearest type that is int or *higher that can represent all char values. This change of type affects how e.g. cout will present the value.
* Since a char is a single byte by definition, and since int can't be less than one byte, and since in practice all bits of an int are value representation bits, it's at best only in the most pedantic formal that a char can be converted up to a higher type than int. If that possibility exists in the formal, then it's pure language lawyer stuff.
What you're doing here is subtracting the value of the ASCII character '3' from the value of the ASCII character '5'. So '5' - '3' is equivalent to 53 - 51, which results in 2.
The ASCII values of the characters can be found in an ASCII table.
Every character in C programming is given an integer value to represent it. That integer value is known as the ASCII value of that character. For example, the ASCII value of 'a' is 97: if you store the character 'a' in a char variable, the value that is actually stored is 97.
Subtracting '3' from '5' means subtracting their ASCII values. So cout << '5' - '3'; is the same as cout << 53 - 51;.
This is a subtraction of two integers, so it prints the integer 2.