I am trying to understand what is going on with the code:
cout << '5' - '3';
Is what I am printing an int? Why does it automatically change them to ints when I use the subtraction operator?
In C++ character literals just denote integer values.
A basic literal like '5' denotes a char integer value, which with almost all extant character encodings is 48 + 5 (because the character 0 is represented as value 48, and the C++ standard guarantees that the digit values are consecutive, although there's no such guarantee for letters).
Then, when you use them in an arithmetic expression, or even just write +'5', the char values are promoted to int. Or more precisely, the “usual arithmetic conversions” kick in and convert each operand up to the nearest type that is int or higher* and that can represent all char values. This change of type affects how e.g. cout will present the value.
* Since a char is a single byte by definition, since an int can't be smaller than one byte, and since in practice all bits of an int are value-representation bits, it's only in the most pedantic formal sense that a char could be promoted to a type higher than int. If that possibility exists in the formal wording at all, it's pure language-lawyer stuff.
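A minimal sketch of both points (C++11, assuming an ASCII-compatible encoding): the literal by itself prints as a character, the subtraction prints as a number, and the result type of the subtraction really is int.

#include <iostream>
#include <type_traits>

int main() {
    std::cout << '5' << '\n';        // prints the character 5
    std::cout << '5' - '3' << '\n';  // prints 2: both operands are promoted to int
    // The subtraction really does yield an int:
    static_assert(std::is_same<decltype('5' - '3'), int>::value,
                  "usual arithmetic conversions yield int here");
}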
What you're doing here is subtracting the value of the ASCII character '3' from the value of the ASCII character '5'. So '5' - '3' is equivalent to 53 - 51, which results in 2.
The ASCII value of each character can be looked up in an ASCII table.
Every character in C programming is given an integer value to represent it. That integer value is known as the ASCII value of that character. For example, the ASCII value of 'a' is 97: if you store the character 'a' in a char variable, what actually gets stored is the ASCII value of that character, 97.
Subtracting '5' and '3' means subtracting their ASCII values. So cout << '5' - '3'; is equivalent to cout << 53 - 51;, because every character in C programming is given an integer value to represent it.
The subtraction is between two integers, so it prints the integer 2.
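For instance, a small sketch (assuming an ASCII-based encoding) that prints the two character codes and then their difference:

#include <iostream>

int main() {
    std::cout << static_cast<int>('5') << ' '
              << static_cast<int>('3') << '\n';  // 53 51 with an ASCII-based encoding
    std::cout << '5' - '3' << '\n';              // 2
}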
Related
When we compare numbers in string/character format, how does the C++ compiler interpret it? The example below will make it clear.
#include <iostream>
using namespace std;

int main() {
    // your code goes here
    if ('1' < '2')
        cout << "true";
    return 0;
}
The output is
true
What is happening inside the compiler? Is there an implicit conversion happening from string to integer, just like when we index an array using a character,
arr['a']
=> arr[97]
'1' is of type char in C++, with an implementation-defined value (although the ASCII value of the character 1 is the common case), and it cannot be negative.
The expression arr['a'] is defined as per pointer arithmetic: *(arr + 'a'). If this is outside the bounds of the array then the behaviour of the program is undefined.
Note that '1' < '2' is true on any platform, because the standard requires the digits 0 to 9 to have consecutive values. The same cannot be said for 'a' < 'b' always being true, although I've never come across a platform where it is not. That said, in ASCII 'A' is less than 'a', but in EBCDIC (in all variants) 'A' is greater than 'a'!
The behaviour of an expression like "ab" < "cd" is unspecified. This is because both const char[3] constants decay to const char* types, and the behaviour of comparing two pointers that do not point to objects in the same array is unspecified.
(A final note: in C '1', '2', and 'a' are all int types.)
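If you actually want to compare the contents of two string literals rather than their addresses, a sketch like the following (using std::string or std::strcmp) does it safely:

#include <cstring>
#include <iostream>
#include <string>

int main() {
    // "ab" < "cd" would compare two unrelated pointers; compare contents instead:
    std::string a = "ab", b = "cd";
    std::cout << (a < b) << '\n';                        // 1: lexicographic comparison
    std::cout << (std::strcmp("ab", "cd") < 0) << '\n';  // 1: C-style content comparison
}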
The operands '1' and '2' are not strings, they're char literals.
The characters represent specific numbers of type char, typically taken from the ASCII table: 49 for '1' and 50 for '2'.
The operator < compares those numbers, and since the number representation of '1' is lesser than that of '2', the result of '1'<'2' is true.
I have recently tried to convert an element of a string (built from digits only) to int, and in order to avoid bizarre results (like 49 instead of 1) I had to use std::stringstream (not even knowing what it is and how it works) instead of int variable = static_cast<int>(my_string_name[digit_position]);.
According to what I've read about these streams, the type is irrelevant when you use them, so I guess that was the case. The question is: what type did I actually want to convert FROM, and why didn't it work? I thought that a string element is a char, but apparently not, if the conversion didn't work.
I thought that a string element is a char but apparently not if the conversion didn't work.
Yes, it is a char. However, the value of this char is not how it is rendered on the screen. That depends on the encoding of the string. For example, if your string is encoded in ASCII, you can see that the "bizarre" 49 you got is exactly how the character '1' is represented. Building on that, you can subtract the ASCII code of the character '0' to get the numeric values of these digit characters. However, be very careful: this greatly depends on your encoding, and your string may even use multi-byte characters, in which case you can't index it this naively at all.
I believe that what you are looking for is not a cast, but the std::stoi (http://en.cppreference.com/w/cpp/string/basic_string/stol) function, which converts a string representation of an integer to int.
char values are already numeric, so such a static_cast won't help you much here. What you actually need is:
int variable = my_string_name[digit_position] - '0';
For ASCII the digits '0' to '9' are encoded as the decimal values 48 to 57. So to get the numerical value that a digit character represents, we need to subtract the first digit's value of 48.
To make the code portable to other character encoding tables (e.g. EBCDIC), '0' is subtracted instead of the literal 48; every encoding is required to keep the digits consecutive.
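As a rough sketch of the whole idea (the string name and its contents here are just placeholders), converting a digits-only string to an int with the '0' trick looks like this:

#include <iostream>
#include <string>

int main() {
    std::string digits = "2024";          // hypothetical input made of digits only
    int value = 0;
    for (char ch : digits)
        value = value * 10 + (ch - '0');  // the digits are guaranteed to be consecutive
    std::cout << value << '\n';           // 2024
}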
Assume you have a string: std::string myString("1");. You could access the first element, which is of type char, and convert it to type int via static_cast. However, your expectation of what will happen is incorrect.
Looking at the ASCII table for the character 1, you will see it has a base-10 integer value of 49, which is why the cast gives the output you're seeing. Instead you could use a conversion function like atoi (e.g., atoi(&myString[0]);).
Your "bizzare results" aren't actually that bizzare. When accessing a single character of a std::string like so : my_string_name[digit_position] you're getting a char. Putting that char in an int will convert '1' to an int. Looking at http://www.asciitable.com/ we can see that 49 is the representation for '1'. To get the actual number you should do int number = my_string_name[digit_position] - '0';
You are working with the ASCII character code for a number, not the actual number. However, because the ASCII character codes for the digits are laid out in order:
'0'
'1'
'2'
'3'
'4'
'5'
'6'
'7'
'8'
'9'
So if you are converting a single character and your character is a base-10 digit, then you can use:
int variable = my_string_name[digit_position] - '0'
But this depends upon the character codes for the digits being in sequence. Specifically, if there were any other characters interspersed here this wouldn't work at all (I don't know of any character code mapping that doesn't list the digits sequentially, and the standard in fact requires them to be consecutive).
But to guarantee the conversion you should use stoi:
int variable = stoi(my_string_name.substr(digit_position, 1))
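A self-contained sketch of that stoi approach (the string contents and index are made-up values):

#include <iostream>
#include <string>

int main() {
    std::string my_string_name = "7421";   // hypothetical input
    std::size_t digit_position = 2;
    int variable = std::stoi(my_string_name.substr(digit_position, 1));
    std::cout << variable << '\n';                    // 2
    std::cout << std::stoi(my_string_name) << '\n';   // 7421: stoi also parses whole strings
}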
I am trying to execute the below program.
#include "stdio.h"
#include "string.h"
void main()
{
char c='\08';
printf("%d",c);
}
I'm getting the output as 56. But for any number other than 8, the output is the number itself; for 8, the answer is 56.
Can somebody explain?
A character constant that begins with \0 represents an octal number; octal is the base-8 number system and uses only the digits 0 to 7. So \08 is not a valid octal escape, because 8 ∉ [0, 7], and hence you're getting implementation-defined behaviour.
Your compiler most likely recognizes '\08' as a multi-character constant: '\0' as one character and '8' as another, and combines them so that the result is the value of '8'. Looking at the ASCII table, the decimal value of '8' is 56.
Thanks to @DarkDust, @GrijeshChauhan and @EricPostpischil.
The value '\08' is considered to be a multi-character constant, consisting of \0 (which evaluates to the number 0) and the ASCII character 8 (which evaluates to decimal 56). How it's interpreted is implementation defined. The C99 standard says:
An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
So if you assigned '\08' to something bigger than a char, like int or long, it would even be valid. But since you assign it to a char you're "chopping off" part of the value, and which part you keep is probably also implementation/machine dependent. In your case it happens to give you the value of the '8' (the ASCII character, which evaluates to the number 56).
Both GCC and Clang do warn about this problem with "warning: multi-character character constant".
\0 is used to introduce octal numbers in C/C++. Octal digits run from 0 to 7, so \08 is not a valid octal escape; instead '\08' is a multi-character constant consisting of '\0' and '8'. The compiler effectively interprets it as '8', whose ASCII value is 56. That's why you are getting 56 as output.
As other answers have said, these kinds of escapes represent octal (base-8) character codes. This means that you have to write '\010' for 8, '\011' for 9, etc.
There are other ways to write your assignment:
char c = 8;
char c = '\x8'; // hexadecimal (base 16) numbers
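To illustrate, a small sketch showing the three equivalent ways to store the value 8 in a char:

#include <cstdio>

int main() {
    char a = '\010';  // octal escape: decimal 8
    char b = '\x8';   // hexadecimal escape: decimal 8
    char c = 8;       // plain decimal
    std::printf("%d %d %d\n", a, b, c);  // prints: 8 8 8
}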
What's the point of negative ASCII values?
int a = '«'; // a = -85, but according to the ASCII table '«' should be 174
There are no negative ASCII values. ASCII includes definitions for 128 characters. Their indexes are all positive (or zero!).
You're seeing this negative value because the character is from an extended ASCII set and its code is too large to fit into the positive range of your char. The value therefore spills into the sign bit of your char (signed on your system, apparently), making it negative.
The workaround is to write the value directly:
unsigned char a = 0xAE; // «
I've written it in hexadecimal notation for convention and because I think it looks prettier than 174. :)
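A small sketch of the difference (it assumes an 8-bit char and a code page where « is 0xAE; whether plain char is signed is implementation-defined):

#include <iostream>

int main() {
    unsigned char a = 0xAE;                    // the byte for « in the code page assumed above
    std::cout << static_cast<int>(a) << '\n';  // 174

    char b = static_cast<char>(0xAE);          // same bit pattern in a plain char
    std::cout << static_cast<int>(b) << '\n';  // typically -82 if char is signed on your system
}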
This is an artefact of your compiler's char type being a signed integer type, and int being a wider signed integer type, and thus the character constant is considered a negative number and is sign-extended to the wider integer type.
There is not much sense in it, it just happens. The C standard allows for compiler implementations to choose whether they consider char to be signed or unsigned. Some compilers even have compile time switches to change the default. If you want to make sure about the signedness of the char type, explicitly write signed char or unsigned char, respectively.
Use an unsigned char to be extended to an int to avoid the negative int value, or open a whole new Pandora's box and enjoy wchar_t.
There is no such thing. ASCII is a table of characters, each character has an index, or a position, in the table. There are no "negative" indices.
Some compilers, though, consider char to be a signed integral data type, which is probably the reason for the confusion here.
If you print it as an unsigned char, you will get the same bits interpreted as an unsigned (positive) value.
ASCII ranges 0..127, ANSI (also called 'extended ASCII') ranges 0..255.
ANSI range won't fit in a signed char (the default type for characters in most compilers).
Most compilers have an option like 'char' Type is Unsigned (GCC).
I ran into this artifact myself. When you use a char as a symbol you have no problem. But when you use it as an integer (with isalpha(), etc.) and the character code is greater than 127, the char is interpreted as a signed char, and calling isalpha() on the resulting negative value is undefined behaviour (on some implementations it asserts). When I need to use a char as an integer I cast it to unsigned:
isalpha((unsigned char)my_char);
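A minimal sketch of that pattern (the byte value 0xE9 is just an arbitrary example above 127):

#include <cctype>
#include <iostream>

int main() {
    char my_char = static_cast<char>(0xE9);  // some byte above 127; may be negative
    // Passing a negative value (other than EOF) to isalpha() is undefined behaviour,
    // so convert to unsigned char first:
    if (std::isalpha(static_cast<unsigned char>(my_char)))
        std::cout << "alphabetic in the current locale\n";
    else
        std::cout << "not alphabetic in the current locale\n";
}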
@n0rd: the KOI8 code page uses the range from 128 to 255, as do other national code pages: http://www.asciitable.com/
In a character representation, you have 8 bits (1 byte) allotted.
In a signed char, the most significant bit is used as the sign bit. An unsigned char uses all 8 bits to represent the value, allowing 0 to 255, where 128 to 255 are called extended ASCII.
Due to this representation in memory, -1 has the same bit pattern as 255, and char(-2) == char(254).
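A quick sketch of that bit-pattern equivalence (assuming an 8-bit char; the wrap on conversion to a signed char is the behaviour of typical two's complement platforms):

#include <iostream>

int main() {
    signed char a = -1;
    unsigned char b = 255;
    // With an 8-bit char, -1 and 255 share the same bit pattern:
    std::cout << (static_cast<unsigned char>(a) == b) << '\n';  // 1

    // Likewise -2 and 254 end up as the same char value on typical platforms:
    std::cout << (static_cast<char>(-2) == static_cast<char>(254)) << '\n';  // 1
}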
C++ Standard §3.9.1 Fundamental types
Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types. <...>
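The "three distinct types" wording can be checked directly; this C++11 sketch compiles on any conforming implementation:

#include <type_traits>

// Plain char, signed char, and unsigned char are three distinct types,
// even though plain char behaves like one of the other two:
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

int main() {}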
I could not make sense of unsigned char.
A number may be +1 or -1.
I cannot think of -A and +A in a similar manner.
What is the historical reason for introducing unsigned char?
A char is actually an integral type. It is just that the type is also used to represent a character too. Since it is an integral type, it is valid to talk about signedness.
(I don't know exactly about the historical reason. Probably to save a keyword for byte by conflating it with char.)
In C (and thus C++), char does not mean character. It means a byte (essentially int_least8_t). This is a historical legacy from the pre-Unicode days when a character could actually fit in a char, but it is now a flaw in the language.
Since char is really a small integer, having signed char and unsigned char makes sense. There are actually three distinct char types: char, signed char, and unsigned char. A common convention is that unsigned char represents bytes while plain char represents characters or UTF-8 code units.
Computers do not "understand" the concept of alphabets or characters; they only work on numbers. So a bunch of people got together and agreed on what number maps to what letter. The most common one in use is ASCII (although the language does not guarantee that).
In ASCII, the letter A has the code 65. In environments using ASCII, the letter A would be represented by the number 65.
The char datatype also serves as an integral type - meaning that it can hold just numbers, so unsigned and signed was allowed. On most platforms I've seen, char is a single 8-bit byte.
You're reading too much into it. A char is a small integral type that can hold a character, end of story. Unsigned char was never specially introduced or intended; it's just how it is, because char is an integral type just like int or long or short, only the size is different. The fact is that there's little reason to use unsigned char, but people do if they want one-byte unsigned integral storage.
If you want a small memory footprint and want to store a number, then signed and unsigned char are useful.
unsigned char is needed if you want to use a value between 128 and 255:
unsigned char score = 232;
signed char is useful if you want to store the difference between two characters:
signed char diff = 'D' - 'A';
char is distinct from the other two because you can not assume it is either.
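Putting those two examples together in a runnable sketch (assuming an 8-bit char and an ASCII-style encoding for the letters):

#include <iostream>

int main() {
    unsigned char score = 232;     // fits, since the unsigned range is 0 to 255
    signed char diff = 'D' - 'A';  // a small signed difference: 3

    std::cout << static_cast<int>(score) << ' '
              << static_cast<int>(diff) << '\n';  // 232 3
}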
You can use the overflow from 255 to 0? (I don't know. Just a guess.)
Maybe it is not only about characters but also about numbers in the ranges -128 to 127 and 0 to 255.
Think of the ASCII character set.
Historically, all characters used for text in computing were defined by the ASCII character set. Each character was represented by an 8-bit byte, which was unsigned, hence each character had a value in the range 0 to 255.
The word character was reduced to char for coding.
An 8-bit char used the same memory as an 8-bit byte, and as such they were interchangeable as far as a compiler was concerned.
The keyword unsigned (all numbers were signed by default, as two's complement is used to represent negative numbers in binary), when applied to a byte or a char, forced it to have a value in the range 0 to 255.
If signed, it instead had a value in the range -128 to +127.
Nowadays, with the advent of Unicode and multi-byte character sets, this relationship between byte and char no longer exists.