How do C++ compiler interpret comparison logic in string/character?

How do C++ compiler interpret comparison logic in string/character? - c++

When we compare numbers in a string/character format, how does the c++ compiler interpret it? The example below will make it clear.
#include <iostream>
using namespace std;
int main() {
// your code goes here
if ('1'<'2')
cout<<"true";
return 0;
}
The output is
true
What is happening inside the compiler? Is there an implicit conversion happening from string to integer just like when we refer an index in an array using a character,
arr['a']
=> arr[97]

'1' is a char type in C++ with an implementation defined value - although the ASCII value of the character 1 is common, and it cannot be negative.
The expression arr['a'] is defined as per pointer arithmetic: *(arr + 'a'). If this is outside the bounds of the array then the behaviour of the program is undefined.
Note that '1' < '2' is true on any platform. The same cannot be said for 'a' < 'b' always being true although I've never come across a platform where it is not true. That said, in ASCII 'A' is less than 'a', but in EBCDIC (in all variants) 'A' is greater than 'a'!
The behaviour of an expression like "ab" < "cd" is unspecified. This is because both const char[3] constants decay to const char* types, and the behaviour of comparing two pointers that do not point to objects in the same array is unspecified.
(A final note: in C '1', '2', and 'a' are all int types.)

The operands '1' and '2' are not strings, they're char literals.
The characters represent specific numbers of type char, typically defined by the ASCII table, specifically 49 for '1' and 50 for '2'.
The operator < compares those numbers, and since the number representation of '1' is lesser than that of '2', the result of '1'<'2' is true.

Related

characters in c and c++

What I ask about is(is my understanding totally true ?)
char x='b'; (x has one value can be represented with two ways one of them is int and the other is char in the two languages ( C and C++))
in C
if I let the compiler choose it will choose (Will give priority to) the int value because the standard of the C makes character literal as int (give priority to int not char) like in this code
int main(){
printf("%d", sizeof('b'));
return 0;}
output is: 4 (int size) that is meaning the compiler treat b as 98 first and get the size it as int
but I can use the char value if I choose like in this code
int main(){
char x = 'b';
printf("%c", x);
return 0;}
output is: b (char)
in C ++
if I let the compiler choose it will choose (Will give priority to) the char value because the standard of the C++ makes character literal as char (give priority to char not int) like in this code
int main(){
printf("%d", sizeof('b'));
return 0;}
output is:1 (char size) that is meaning the compiler treat b as char
but I can use the int value if I choose like in this code
int main(){
char x = 'b';
printf("%d", x);
return 0;}
output is 98 (98 is the int which represents b in the ASCII Code )

Character constants, this: 'A', are of type int in C and of type char in C++. The compiler picks the type specified by the respective language standard.
Declared character variables of type char are always char and 1 byte large (typically 8 bits on non-exotic systems).
printf("%c", some_char); is a variadic function (accepts any number of parameters) and those have special implicit type promotion rules. Something called default argument promotion will integer-promote the passed character variable to int. Read about integer promotion here: Implicit type promotion rules.
printf expects that promotion to happen, so %c will mean that parameter is converted back to char according to the internals of printf. This holds true in C++ as well, though stdio.h should be avoided.
printf("%d", x); In case x is a char this would pedantically be undefined behavior. But in practice the above mentioned integer promotion is likely to occur and so it prints an integer. Also note that there's nothing magic with char as such, they are just 1 byte large integers. So it already has the value 98 before conversion.

Some of the things you've said are true, some are mistaken. Let's go through them in turn.
char x='b'; (x has two values one of them is int and the other is char in the two languages ( C and C++))
Well, no, not really. When you say char x you get a variable generally capable of holding one character, and this is perfectly, equally true in both C and C++. The only difference, as we'll see, is the type (not the value) of the character constant 'b'.
in C if I let the compiler choose
I'm not sure what you mean by "choose", because there's not really any choice in the matter.
it will choose (Will give priority to) the int value like in this code
printf("%d", sizeof('b'));
output is: 4 (int size) that is meaning the compiler change b to 98 first and get the size it as int
Not exactly. You got 4 because, yes, the type of a character constant like 'b' is int. In ASCII, the character constant 'b' has the value 98 no matter what. The compiler didn't change a character to an int here, and the value didn't change from 'b' to 98 here. It was an int all along (because that's the type of character constants in C), and it had the value 98 all along (because that's the value of the letter b in ASCII).
but I can use the char value if I choose like in this code
`printf("%c", x);
Right. But there's nothing magical about that. Consider:
char c1 = 'b';
int i1 = 'b';
char c2 = 98;
int i2 = 98;
printf("%c %c %c %c\n", c1, i1, c2, i2);
printf("%d %d %d %d\n", c1, i1, c2, i2);
This prints
b b b b
98 98 98 98
You can print an int, or a char, using %c, and you'll get the character with that value.
You can print an int, or a char, using %d, and you'll get a numeric value.
(More on this later.)
in C ++ if I let the compiler choose it will choose (Will give priority to) the char value like in this code
printf("%d", sizeof('b'));
output is:1 (char size) that is meaning the compiler did not change b to int
What you're seeing is one of the big differences between C and C++: character constants like 'b' are type int in C, but type char in C++.
(Why is it this way? I'll speculate on that later.)
but I can use the int value if I choose like in this code
char x = 'b';
printf("%d", x);
output is 98 (98 is the int which represents b in the ASCII Code )
Right. And this would work exactly the same in C and C++. (Also my earlier printfs of c1, i1, c2, and i2 would work exactly the same in C and C++.)
So why are the types of character constants different? I'm not sure, but I believe it's like this:
C likes to promote everything to int. (Incidentally, that's why we were able to pass characters straight to printf and print them using %d: the characters all get promoted to int before being passed to printf, so it works just fine.) So there would be no point having character constants of type char, because any time you used one, for anything, it would get promoted to int. So character constants might as well start out being int.
In C++, on the other hand, the type of things matters more. And you might have two overloaded functions, f(int) and f(char), and if you called f('b'), clearly you want the version of f() called that accepts a char. So in C++ there was a reason, a good reason, to have character constants be type char, just like it looks like they are.
Addendum:
The fundamental issue that you're asking about here, that we've been kind of dancing around in this answer and these comments, is that in C (as in most languages) there are several forms of constant, that let you write constants in forms that are convenient and meaningful to you. For any different form of constant you can write, there are several things of interest, but most importantly what is the actual value? and what is the type?.
It may be easier to show this by example. Here is a rather large number of ways of representing the constant value 98 in a C or C++ program:
Form of constant
base
type
value
98
10
int
98
0142
8
unsigned
98
0x62
16
unsigned
98
98.
10
double
98
9.8e1
10
double
98
98.f
10
float
98
9.8e1f
10
float
98
98L
10
long
98
98U
10
unsigned
98
'b'
ASCII
int/char
98
This table is not even complete; there are more ways than these to write constants in C and C++.
But with one exception, every row in this table is equally true of both C and C++, except the last row. In C, the type of a character constant is int. In C++, the type of a character constant is char.
The type of a constant determines what happens when a constant appears in a larger expression. In that respect the type of a constant functions analogously to the type of a variable. If I write
int a = 1, b = 3;
double c = a / b;
it doesn't work right, because the rule in C is that when you divide an int by an int, you get truncating integer division. But the point is that the type of an operand directly determines the meaning of an expression. So the type of a constant becomes very interesting, too, as seen by the different behavior of these two lines:
double c2 = 1 / 3;
double c3 = 1. / 3;
Similarly, it can make a difference whether a constant has type int or type char. An expression that depends on whether a character constant has type int or type char will behave slightly differently in C versus C++. (In practice, pretty much the only difference that can be easily seen concerns sizeof.)
For completeness, it may be useful to look at the several other forms of character constant, in the same framework:
Form of constant
base
type
value
'b'
ASCII
int/char
98
'\142'
8
int/char
98
'\x62'
16
int/char
98

How to understand following single and double quotes terminology in C++?

Help me understand the following:
cout<<'a'; //prints a and it's okay but
cout<<'ab'; //prints 24930 but I was expecting an error due to term 'ab' having two character in single quote
cout<<'a'+1; //prints 98
cout<<"ab"; // prints ab and it's okay but
cout<<"ab"+1; // prints b, why?
cout<<"a"+1; // prints nothing ?
cout<<'a'+'b'; // prints 195 ?
cout<<"a"+"b"; // gives error ?
Please help me to understand all these things in details. I am very confused. I would be very thankful.

'a' is a char type in C++. std::cout overloads << for a char to output the character, rather than the character number.
'ab' is a multicharacter literal in C++. It must have an int type. Its value is implementation defined, but 'a' * 256 + 'b' is common. With ASCII encoding, that is 24930. The overloaded << operator for an int outputs the number.
'a' + 1 is an arithmetic expression. 'a' is converted to an int type prior to the addition according to the standard integral type promotion rules.
"ab" + 1 is executing pointer arithmetic on the const char[3] type, so it's equivalent to "b". Remember that << has lower precedence than +.
"a" + 1 is similar to the above but only the NUL-terminator is output.
'a' + 'b' is an int type. Both arguments are converted to an int prior to the addition.
The arguments of "a" + "b" decay to const char* types prior to the addition. But that's the addition of two pointers, which is not valid C++.

C++ casting a single element of a string

I have recently tried to convert an element of a string (built from digits only) to int and in order to avoid bizzare results (like 49 instead of 1) I had to use 'stringstream' (not even knowing what it is and how it works) instead of int variable = static_cast<int>(my_string_name[digit_position]);.
According to what I've read about these 'streams' the type is irrelevant when you use them so I guess that was the case. The question is: What type did I actually want to convert FROM and why didn't it work? I thought that a string element is a char but apparently not if the conversion didn't work.

I thought that a string element is a char but apparently not if the conversion didn't work.
Yes, it is a char. However, the value of this char is not how it is rendered on the screen. This actually depends on the encoding of the string. For example, if your string is encoded in ASCII, you can see that the "bizarre" 49 value that you got must have been represented as a '1' character. Building on that, you can subtract the ASCII code of the character '0' to get the numeric values of these characters. However, be very careful: this greatly depends on your encoding. It is possible that your string may use multi-byte characters when you can't even index them this naively.

I believe that what you are looking for is not a cast, but the std::stoi (http://en.cppreference.com/w/cpp/string/basic_string/stol) function, which converts a string representation of an integer to int.

char values are already numeric, such a static_cast won't help you much here. What you actually need is:
int variable = my_string_name[digit_position] - '0';
For ASCII values the digits '0' - '9' are encoded as decimal numbers from 49 - 59. So to get the numerical value that is represented by the digit we need to substract the 1st digits value of 49.
To make the code portable for other character encoding tables (e.g. EBCDIC) the '0' is substracted.

Assume you have a string: std::string myString("1");. You could access the first element which is type char and you convert it to type int via static_cast. However, your expectation of what will happen is incorrect.
Upon looking at the ASCII table for number 1 you will see it has a base 10 integer value of 49, which is why the cast gives the output you're seeing. Instead you could use a conversion function like atoi (e.g., atoi(&myString[0]);).

Your "bizzare results" aren't actually that bizzare. When accessing a single character of a std::string like so : my_string_name[digit_position] you're getting a char. Putting that char in an int will convert '1' to an int. Looking at http://www.asciitable.com/ we can see that 49 is the representation for '1'. To get the actual number you should do int number = my_string_name[digit_position] - '0';

You are working with the ASCII char code for a number, not an actual number. However because the ASCII char codes are laid out in order:
'0'
'1'
'2'
'3'
'4'
'5'
'6'
'7'
'8'
'9'
So if you are converting a single character and you're character is in base-10 then you can use:
int variable = my_string_name[digit_position] - '0'
But this depends upon the use of character codes which are in sequence. Specifically, if there were any other characters interspersed here this wouldn't work at all (I don't know any character code mapping that doesn't have these listed sequentially.)
But to guarantee the conversion you should use stoi:
int variable = stoi(my_string_name.substr(digit_position, 1))

subtracting characters of numbers

I am trying to understand what is going on with the code:
cout << '5' - '3';
Is what I am printing an int? Why does it automatically change them to ints when I use the subtraction operator?

In C++ character literals just denote integer values.
A basic literal like '5' denotes a char integer value, which with almost all extant character encodings is 48 + 5 (because the character 0 is represented as value 48, and the C++ standard guarantees that the digit values are consecutive, although there's no such guarantee for letters).
Then, when you use them in an arithmetic expression, or even just write +'5', the char values are promoted to int. Or less imprecisely, the “usual arithmetic conversions” kick in, and convert up to the nearest type that is int or *higher that can represent all char values. This change of type affects how e.g. cout will present the value.
* Since a char is a single byte by definition, and since int can't be less than one byte, and since in practice all bits of an int are value representation bits, it's at best only in the most pedantic formal that a char can be converted up to a higher type than int. If that possibility exists in the formal, then it's pure language lawyer stuff.

What you're doing here is subtracting the value for ASCII character '5' from the value for ASCII character '3'. So '5' - '3' is equivalent to 53 - 51 which results in 2.

The ASCII value of character is here
Every character in C programming is given an integer value to represent it. That integer value is known as ASCII value of that character. For example: ASCII value of 'a' is 97. For example: If you try to store character 'a' in a char type variable, ASCII value of that character is stored which is 97.
Subtracting between '5' and '3' means subtracting between their ASCII value. So, replace cout << '5' - '3'; with their ASCII value cout << 53 - 51;. Because Every character in C programming is given an integer value to represent it.
There is a subtraction operation between two integer number, so, it prints a integer 2

'a' == 'b'. It's a good way to do?

What happens if I compare two characters in this way:
if ('a' == 'b')
doSomething();
I'm really curious to know what the language (and the compiler) does when it finds a comparison like this. And, of course, if it is a correct way to do something, or if I have to use something like strcmp().
EDIT
Wait wait.
Since someone haven't understood what I really mean, I decided to explain in another way.
char x, y;
cout << "Put a character: ";
cin >> x;
cout << "Put another character: ";
cin >> y;
if (x == y)
doSomething();
Of course, in the if brackets you can replace == with any other comparison operator.
What really I want to know is: how the character are considered in C/C++? When the compiler compares two characters, how does it know that 'a' is different than 'b'? It refers to the ASCII table?

you can absolutely securely compare fundamental types by comparison operator ==

In C and C++, single character constants (and char variables) are integer values (in the mathematical sense, not in the sense of int values). The compiler compares them as integers when you use ==. You can also use the other integer comparison operators (<, <=, etc.) You can also add and subtract them. (For instance, a common idiom to change a digit character into its numerical value is c - '0'.)

For single chars, this form is correct. If both operands are known at compile time as in your example, then the condition can (and almost certainly will) be evaluated at compile time and not result in any code.
Note that a char ('a') is different from a single-character string ("a"). For the latter, comparison has a different meaning: it would compare the pointers rather than the characters.

Your processor would subtract both operands and if it's zero, zero condition bit is set, your values were the same.
For example: on arm machines you have the nzcv (negative, zero, carry, overflow) bits which tell you what happened.

Nothing will happen as a doesn't equal b.
If you question is just about is that the correct way, then the answer is yes.

First 'a' and 'b' are not strings, they are characters. The nuance is important because of its implications.
You can compare characters to characters just fine the same way you can compare integers to integers and floats with floats. It's usually not done because the outcome will always be the same. i.e. 'a' == 'b' will always be false.
If you're comparing strings, however, you'll have to use something like strcmp().

Compiler simply inserts an instruction for comparing two bytes for equality - a very efficient operation. Of course in your case 'a'=='b' is equivalent to a constant false.

The compiler will compare the numeric ASCII codes. So, 'a' is never equal to 'b'. But,
'a' < 'b' evaluates to true, since 'a' appears before 'b' in the ASCII table.

Of course, you want to use variables like
char myChr = 'a' ;
if( myChr == 'b' ) puts( "It's b" ) ;
Now you can start to think about "Yoda conditions", where you would do
if( 'b' == myChr ) puts( "It's a b" ) ;
so that in case you accidently typed one equals sign in the 2nd example:
if( 'b' = myChr ) puts( "It's a b" ) ;
that would raise a compiler error

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How do C++ compiler interpret comparison logic in string/character? - c++

Related

characters in c and c++

How to understand following single and double quotes terminology in C++?

C++ casting a single element of a string

subtracting characters of numbers

'a' == 'b'. It's a good way to do?

Categories

Resources