How does s[i]^=32 convert upper to lower case? - c++

int main()
{
string s;
cout << "enter the string :" << endl;
cin >> s;
for (int i = 0; i < s.length(); i++)
s[i] ^= 32;
cout << "modified string is : " << s << endl;
return 0;
}
I saw this code which converts uppercase to lowercase on stackoverflow.
But I don't understand the line s[i] = s[i]^32.
How does it work?

^= is the exclusive-or assignment operator. 32 is 100000 in binary, so ^= 32 toggles bit 5 of the destination (counting from 0 at the least significant bit). In ASCII, the lowercase and uppercase forms of a letter are 32 positions apart, so this converts lower to upper case, and also the other way.
But it only works for ASCII, not for Unicode for example, and only for letters. To write portable C++, you should not assume the character encoding to be ASCII, so please don't use such code. @πάντα ῥεῖ's answer shows a way to do it properly.
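To make the "only for letters" caveat concrete, here is a minimal sketch that applies the XOR only to ASCII letters and leaves everything else untouched. The name toggle_case_ascii is made up for this example, and the code assumes an ASCII-compatible character set. The unguarded loop from the question would also alter digits, spaces and punctuation (for example, '1' ^ 32 is the control code 0x11).

```cpp
#include <string>

// Hypothetical helper: toggles the case of ASCII letters only.
// Assumes an ASCII-compatible execution character set.
std::string toggle_case_ascii(std::string s)
{
    for (char& c : s) {
        const bool is_upper = c >= 'A' && c <= 'Z';
        const bool is_lower = c >= 'a' && c <= 'z';
        if (is_upper || is_lower) {
            c = static_cast<char>(c ^ 0x20);  // 0x20 == 32: flip bit 5
        }
    }
    return s;
}
```

With this guard, toggle_case_ascii("Hello, World 42!") gives "hELLO, wORLD 42!".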

How does it work?
Let's see for ASCII value 'A':
'A' is binary 1000001
XORed with 32 (binary 100000)
yields the same value with the case bit (bit 5) set, which is the lowercase letter:
1000001
XOR
100000
= 1100001 == 'a' in ASCII.
Any sane and portable c or c++ application should use tolower():
int main()
{
string s;
cout<<"enter the string :"<<endl;
cin>>s;
for (int i=0;i<s.length();i++) s[i] = tolower( (unsigned char)s[i] );
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cout<<"modified string is : "<<s<<endl;
return 0;
}
The s[i]=s[i]^32 (cargo cult) magic relies on the ASCII-specific mapping of characters to numeric char values.
There are other char code tables, e.g. EBCDIC, where the s[i]=s[i]^32 method miserably fails to retrieve the corresponding lowercase letters.
There's a more sophisticated c++ version of converting to lower case characters shown in the reference documentation page of std::ctype::tolower().
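As a rough sketch of a more idiomatic loop-free variant (not the std::ctype version from the reference page), std::transform can be combined with std::tolower from <cctype>. The function name to_lower_copy is an assumption for this example. The lambda takes unsigned char for the same reason as the cast above: passing a negative char value to tolower is undefined behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lowercased copy using the current C locale.
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

Unlike the XOR trick, this leaves digits, punctuation and already-lowercase letters unchanged.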

In C++, like its predecessor C, a char is a numeric type. This is after all how characters are represented on the hardware and these languages don't hide that from you.
In ASCII, letters have the useful property that the difference between an uppercase and a lowercase letter is a single binary bit: the 5th bit (if we start numbering from the right starting at 0).
Uppercase A is represented by the byte 0b01000001 (0x41 in hex), and lowercase a is represented by the byte 0b01100001 (0x61 in hex). Notice that the only difference between uppercase and lowercase A is the fifth bit. This pattern continues from B to Z.
So, when you do ^= 32 (which, incidentally, is 2 to the 5th power) on a number that represents an ASCII character, what that does is toggle the 5th bit - if it is 0, it becomes 1, and vice versa, which changes the character from upper to lower case and vice versa.
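Since XOR toggles the bit, it turns lowercase into uppercase just as happily as the reverse. If you really want one direction only, you can set or clear the bit instead. A small sketch, assuming ASCII; the names ascii_to_lower and ascii_to_upper are invented for illustration:

```cpp
// Assumes ASCII. Hypothetical helper names.

// Set bit 5 (value 32) for 'A'..'Z'; leave every other character unchanged.
constexpr char ascii_to_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c | 0x20) : c;
}

// Clear bit 5 for 'a'..'z'; leave every other character unchanged.
constexpr char ascii_to_upper(char c)
{
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c & ~0x20) : c;
}
```

Here ascii_to_lower('A') and ascii_to_lower('a') both give 'a', whereas 'a' ^ 32 would flip back to 'A'.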

Related

Difference between converting int to char by (char) and by ASCII

I have an example:
int var = 5;
char ch = (char)var;
char ch2 = var+48;
cout << ch << endl;
cout << ch2 << endl;
I had some other code. (char) returned wrong answer, but +48 didn't. When I changed ONLY (char) to +48, then my code got corrected.
What is the difference between converting int to char by using (char) and +48 (ASCII) in C++?
char ch=(char)var; has the same effect as char ch=var; and assigns the numeric value 5 to ch. You're using ASCII (supported by all modern systems) and ASCII character code 5 represents Enquiry 'ENQ' an old terminal control code. Perhaps some old timer has a clue what it did!
char ch2 = var+48; assigns the numeric value 53 to ch2 which happens to represent the ASCII character for the digit '5'. ASCII 48 is zero (0) and the digits all appear in the ASCII table in order after that. So 48+5 lands on 53 (which represents the character '5').
In C++ char is an integer type. The value is interpreted as representing an ASCII character but it should be thought of as holding a number.
Its numeric range is either [-128,127] or [0,255], depending on whether char is signed on your platform. That's because C++ requires sizeof(char)==1 and all modern platforms have 8 bit bytes.
NB: C++ doesn't actually mandate ASCII, but again that will be the case on all modern platforms.
PS: I think it's an unfortunate artifact of C (inherited by C++) that sizeof(char)==1 and there isn't a separate fundamental type called byte.
A char is simply the base integral denomination in c++. Output statements, like cout and printf map char integers to the corresponding character mapping. On Windows computers this is typically ASCII.
Note that character code 5 in ASCII maps to the Enquiry control character, which has no printable form, while character code 53 maps to the printable character 5.
A generally accepted hack to store a number 0-9 in a char is to do: const char ch = var + '0'; It's important to note what this relies on:
The characters 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 must be laid out contiguously and in order. The C++ standard guarantees this for every execution character set, so this holds even on a non-ASCII character mapping, provided you write '0' rather than 48
If var is outside the 0 - 9 range this var + '0' will map to something other than a digit character
A way to get the most significant digit of a non-negative number, whatever its range, is to use:
const auto ch = to_string(var).front();
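To guard against the out-of-range case mentioned above, one option is to check the value before adding '0'. This is only a sketch; digit_to_char is a made-up name and throwing std::out_of_range is one possible policy among several.

```cpp
#include <stdexcept>

// Hypothetical helper: converts 0..9 to '0'..'9' and rejects anything else.
char digit_to_char(int var)
{
    if (var < 0 || var > 9) {
        throw std::out_of_range("digit_to_char: expected a value in 0..9");
    }
    return static_cast<char>('0' + var);  // digits are contiguous in any C++ character set
}
```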
Generally char represents a number as int does. Casting an int value to char doesn't provide its ASCII representation.
The ASCII codes as numbers for digits range from 48 (== '0') to 57 (== '9'). So to get the printable digit you have to add '0' (or 48).
The difference is that casting with (char) only changes the type and keeps the numeric value, while adding 48 changes the value and leaves the type as int.
It's important to note that an int is typically 32 bits and a char is typically 8 bits. This means that the number you can store in a char ranges from -128 to +127 (or 0 to 255, i.e. 2^8 - 1, if you use unsigned char) and in an int from −2,147,483,648 (−2^31) to 2,147,483,647 (2^31 − 1) (or 0 to 2^32 - 1 for unsigned).
Adding 48 to a value is not changing the type to char.
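The type and value claims above can be checked at compile time. The following is a small sketch using std::is_same; the function names cast_only and add_zero are invented for this example.

```cpp
#include <type_traits>

// The cast changes the type; the value 5 is kept as it is.
static_assert(std::is_same<decltype(static_cast<char>(5)), char>::value,
              "a (char) cast yields a char");
// The sum is computed as an int; it only becomes a char when stored in one.
static_assert(std::is_same<decltype(5 + 48), int>::value,
              "adding 48 leaves the type as int");
// Arithmetic on two chars is also carried out in int.
static_assert(std::is_same<decltype('5' - '0'), int>::value,
              "char arithmetic promotes to int");

// Hypothetical helpers mirroring the two lines from the question.
char cast_only(int var) { return static_cast<char>(var); }        // code 5 (ENQ in ASCII)
char add_zero(int var)  { return static_cast<char>(var + '0'); }  // the digit character
```

So cast_only(5) holds the number 5, while add_zero(5) holds '5'.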

Char Subtraction in c++

I am fairly new to C++ and I have some trouble understanding character subtraction in C++.
I had this code initially
char x='2';
x-='0';
if(x) cout << "More than Zero" << endl;
This returned More than Zero as output, so to find out the value of x I tried this code.
char x='2';
x-='0';
if(x) cout << x << endl;
And I am getting a null character (or new line) as output.
Any help is appreciated.
According to the C++ Standard (2.3 Character sets)
...In both the source and execution basic character sets, the value of
each character after 0 in the above list of decimal digits shall be
one greater than the value of the previous.
So the codes of adjacent digits in any character set differ by 1.
Thus in this code snippet
char x='2';
x-='0';
if(x) cout << x << endl;
the difference between '2' and '0' (the difference between codes that represent these characters; for example in ASCII these codes are 0x32 and 0x30 while in EBCDIC they are 0xF2 and 0xF0 correspondingly) is equal to 2.
You can check this for example the following way
if(x) cout << ( int )x << endl;
or
if(x) cout << static_cast<int>( x ) << endl;
If you just write
if(x) cout << x << endl;
then the operator << outputs the character whose code is 2, rather than the number 2, because x is of type char.
On most platforms, characters in C/C++ are stored as 8-bit integers using ASCII encoding. So when you do x-='0'; you're subtracting the ASCII value of '0', which is 48, from the ASCII value of '2', which is 50. x is then equal to 2, which in ASCII is the control character STX (start of text) and is not printable.
If you want to perform arithmetic on characters it's better to subtract '0' from every character before any operation and add '0' to the result. To avoid problems like overflowing the range of the 8-bit value I'd suggest converting them to ints or longs.
char x = '2';
int tempVal = x - '0';
/*
Some operations are performed here
*/
x = tempVal % 10 + '0';
// % 10 - in case it exceeds the range reserved for digits
cout << x << endl;
It's much safer to perform these operations on larger value types, and subtracting the '0' character allows you to perform operations independently of the character encoding, like you'd do with ordinary integers. Then you add '0' to go back to the character encoding, which allows you to print the digit.
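As one concrete instance of the pattern above, here is a sketch that adds two digit characters and keeps only the last digit of the sum. The name add_digits is hypothetical, and both arguments are assumed to be in '0'..'9'.

```cpp
// Hypothetical helper: adds two digit characters, returning the last digit of the sum.
// Assumes both a and b are in the range '0'..'9'.
char add_digits(char a, char b)
{
    const int sum = (a - '0') + (b - '0');     // work on plain ints
    return static_cast<char>(sum % 10 + '0');  // back to a printable digit
}
```

For example, add_digits('2', '3') gives '5', and add_digits('7', '8') also gives '5' because 15 % 10 is 5.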
You are subtracting 48 (ASCII char '0') from the character 50 (ASCII '2')
50 - 48 = 2
if (x) // true, because 2 is nonzero
In C++, characters are represented by numeric codes, which are ASCII codes on virtually all current platforms (see http://www.asciitable.com/)
I guess that doing :
'2' - '0'
is like doing
50 - 48 = 2
According to the ASCII table, the ASCII code 2 stands for start of text, which is not displayed by cout.
Hope it helps.
So what your code is doing is the following:
x = '2', which represents 50 as a decimal value in the ASCII table.
then you are basically saying:
x = x - '0', where zero in the ASCII table is represented as 48 in decimal, which equates to x = 50 - 48 = 2.
Note that 2 != '2' . If you look up 2(decimal) in the ASCII table that will give you a STX (start of text). This is what your code is doing. So keep in mind that the subtraction is taking place on the decimal value of the char.

Regarding conversion of text to hex via ASCII in C++

So, I've looked up how to do conversion from text to hexadecimal according to ASCII, and I have a working solution (proposed on here). My problem is that I don't understand why it works. Here's my code:
#include <string>
#include <iostream>
int main()
{
std::string str1 = "0123456789ABCDEF";
std::string output[2];
std::string input;
std::getline(std::cin, input);
output[0] = str1[input[0] & 15];
output[1] = str1[input[0] >> 4];
std::cout << output[1] << output[0] << std::endl;
}
Which is all well and good - it returns the hexadecimal value for single characters, however, what I don't understand is this:
input[0] & 15
input[0] >> 4
How can you perform bitwise operations on a character from a string? And why does it oh-so-nicely return the exact values we're after?
Thanks for any help! :)
In C++ a character is 8 bits long on virtually all platforms.
If you '&' it with 15 (binary 1111), you keep only the least significant 4 bits, which become the last hex digit (output[0]).
When you shift right by 4, it is equivalent to dividing the character value by 16. This gives you the most significant 4 bits, which become the first hex digit (output[1]).
Once the above digit values are calculated, the required character is picked up from the constant string str1 having all the characters in their respective positions.
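The same lookup can be wrapped in a small function. This is a sketch with an invented name, byte_to_hex. It converts to unsigned char first, because where char is signed a value above 127 would otherwise be negative and the shifted result could not be used as an index.

```cpp
#include <string>

// Hypothetical helper: two uppercase hex digits for one byte.
std::string byte_to_hex(char c)
{
    static const char digits[] = "0123456789ABCDEF";
    const unsigned char u = static_cast<unsigned char>(c);  // avoid negative values
    std::string out;
    out += digits[u >> 4];   // high nibble: value / 16
    out += digits[u & 15];   // low nibble:  value % 16
    return out;
}
```

In ASCII, byte_to_hex('A') gives "41".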
"Characters in a string" are not characters (individual strings of one character only). In some programming languages they are. In Javascript, for example,
var string = "testing 1,2,3";
var character = string[0];
returns "t".
In C and C++, however, 'strings' are arrays of 8-bit characters; each element of the array is an 8-bit number (0..255, or -128..127 where char is signed).
Characters are just integers. In ASCII the character '0' is the integer 48. C++ makes this conversion implicitly in many contexts, including the one in your code.

convert char[] of hexadecimal numbers to char[] of letters corresponding to the hexadecimal numbers in ascii table and reversing it

I have a char a[] of hexadecimal characters like this:
"315c4eeaa8b5f8aaf9174145bf43e1784b8fa00dc71d885a804e5ee9fa40b16349c146fb778cdf2d3aff021dfff5b403b510d0d0455468aeb98622b137dae857553ccd8883a7bc37520e06e515d22c954eba5025b8cc57ee59418ce7dc6bc41556bdb36bbca3e8774301fbcaa3b83b220809560987815f65286764703de0f3d524400a19b159610b11ef3e"
I want to convert it to letters corresponding to each hexadecimal number like this:
68656c6c6f = hello
and store it in char b[] and then do the reverse
I don't want a block of code please, I want explanation and what libraries was used and how to use it.
Thanks
Assuming you are talking about ASCII codes. Well, first step is to find the size of b. Assuming you have all characters by 2 hexadecimal digits (for example, a tab would be 09), then size of b is simply strlen(a) / 2 + 1.
That done, you need to go through letters of a, 2 by 2, convert them to their integer value and store it as a string. Written as a formula you have:
b[i] = (to_digit(a[2*i]) << 4) + to_digit(a[2*i+1])
where to_digit(x) converts '0'-'9' to 0-9 and 'a'-'f' or 'A'-'F' to 10-15.
Note that if characters below 0x10 are shown with only one hex digit (the only one I can think of is tab), then instead of using 2*i as index to a, you should keep a next_index in your loop which is advanced by 2 if a[next_index] < '8', or by 1 otherwise. In the latter case, b[i] = to_digit(a[next_index]).
The reverse of this operation is very similar. Each character b[i] is written as:
a[2*i] = to_char(b[i] >> 4)
a[2*i+1] = to_char(b[i] & 0xf)
where to_char is the opposite of to_digit.
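The reverse direction described above might look like the sketch below. to_char follows the name used in the formulas; hexify is a name chosen for this example, and std::string is used instead of raw char arrays for brevity. It always writes two digits per byte, so a tab comes out as 09.

```cpp
#include <cstddef>
#include <string>

// Maps 0..15 to '0'..'9', 'a'..'f' (the opposite of to_digit).
char to_char(unsigned value)
{
    return "0123456789abcdef"[value & 0xf];
}

// Hypothetical helper: writes each byte of b as two hex digits.
std::string hexify(const std::string& b)
{
    std::string a(b.size() * 2, '0');
    for (std::size_t i = 0; i != b.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(b[i]);
        a[2 * i]     = to_char(byte >> 4);    // high nibble
        a[2 * i + 1] = to_char(byte & 0xf);   // low nibble
    }
    return a;
}
```

In ASCII, hexify("hello") gives "68656c6c6f", matching the example in the question.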
Converting the hexadecimal string to a character string can be done by using std::string::substr to get the next two characters of the hex string, then using std::stoi (with base 16) to convert the substring to an integer. This can be cast to a character that is added to a std::string. The std::stoi function is C++11 only, and if you don't have it you can use e.g. std::strtol.
To do the opposite you loop over each character in the input string, cast it to an integer and put it in an std::ostringstream preceded by manipulators to have it presented as a two-digit, zero-prefixed hexadecimal number. Append to the output string.
Use std::string::c_str to get an old-style C char pointer if needed.
No external library, only using the C++ standard library.
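A minimal sketch of this approach, assuming well-formed input with an even number of hex digits (std::stoi throws on invalid digits, which is not handled here). The function names hex_to_text and text_to_hex are made up for the example.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: "68656c6c6f" -> "hello". Ignores a trailing odd digit.
std::string hex_to_text(const std::string& hex)
{
    std::string text;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        const int value = std::stoi(hex.substr(i, 2), nullptr, 16);
        text += static_cast<char>(value);
    }
    return text;
}

// Hypothetical helper: "hello" -> "68656c6c6f".
std::string text_to_hex(const std::string& text)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (unsigned char c : text) {
        out << std::setw(2) << static_cast<int>(c);  // setw applies to the next output only
    }
    return out.str();
}
```

Call .c_str() on either result if a C-style char pointer is needed.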
Forward:
1. Read two hex chars from input.
2. Convert to int (0..255). (hint: sscanf is one way)
3. Append int to output char array
4. Repeat 1-3 until out of chars.
5. Null terminate the array
Reverse:
1. Read single char from array
2. Convert to 2 hexadecimal chars (hint: sprintf is one way).
3. Concat buffer from (2) to final output string buffer.
4. Repeat 1-3 until out of chars.
Almost forgot to mention: only stdio.h and the regular C runtime are required, assuming you're using sscanf and sprintf. You could alternatively create a pair of conversion tables that would radically speed up the conversions.
Here's a simple piece of code to do the trick:
#include <cstddef>
#include <string>

// Returns the value 0..15 of a hex digit, or -1 if c is not a hex digit.
int hex_digit_value(char c)
{
if ('0' <= c && c <= '9') { return c - '0'; }
if ('a' <= c && c <= 'f') { return c + 10 - 'a'; }
if ('A' <= c && c <= 'F') { return c + 10 - 'A'; }
return -1;
}
std::string dehexify(std::string const & s)
{
// One output byte per pair of hex digits.
std::string result(s.size() / 2, '\0');
for (std::size_t i = 0; i != result.size(); ++i)
{
result[i] = static_cast<char>(hex_digit_value(s[2 * i]) * 16
+ hex_digit_value(s[2 * i + 1]));
}
return result;
}
Usage:
char const a[] = "12AB";
std::string s = dehexify(a);
Notes:
A proper implementation would add checks that the input string length is even and that each digit is in fact a valid hex numeral.
Dehexifying has nothing to do with ASCII. It just turns any hexified sequence of nibbles into a sequence of bytes. I just use std::string as a convenient "container of bytes", which is exactly what it is.
There are dozens of answers on SO showing you how to go the other way; just search for "hexify".
Each hexadecimal digit corresponds to 4 bits, because 4 bits has 16 possible bit patterns (and there are 16 possible hex digits, each standing for a unique 4-bit pattern).
So, two hexadecimal digits correspond to 8 bits.
And on most computers nowadays (some Texas Instruments digital signal processors are an exception) a C++ char is 8 bits.
This means that each C++ char is represented by 2 hex digits.
So, simply read two hex digits at a time, convert to int using e.g. an istringstream, convert that to char, and append each char value to a std::string.
The other direction is just opposite, but with a twist.
Because char is signed on most systems, you need to convert to unsigned char before converting that value again to hex digits.
Conversion to and from hexadecimal can be done using hex, like e.g.
cout << hex << x;
cin >> hex >> x;
for a suitable definition of x, e.g. int x
This should work for string streams as well.
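For string streams, a sketch could look like this. parse_hex and format_hex are names invented here, x is an int as suggested above, and stream failures are not checked.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads a hexadecimal number from text.
int parse_hex(const std::string& text)
{
    std::istringstream in(text);
    int x = 0;
    in >> std::hex >> x;   // no error handling in this sketch
    return x;
}

// Hypothetical helper: writes x in lowercase hexadecimal, without padding.
std::string format_hex(int x)
{
    std::ostringstream out;
    out << std::hex << x;
    return out.str();
}
```

For example, parse_hex("6c") gives 108 and format_hex(255) gives "ff".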

How do I convert a single char in string to an int

Keep in mind, if you choose to answer the question, I am a beginner in the field of programming and may need a bit more explanation than others as to how the solutions work.
Thank you for your help.
My problem is that I am trying to do computations with parts of a string (consisting only of numbers), but I do not know how to convert an individual char to an int. The string is named "message".
for (int place = 0; place < message.size(); place++)
{
if (secondPlace == 0)
{
cout << (message[place]) * 100 << endl;
}
}
Thank you.
If you mean that you want to convert the character '0' to the integer 0, and '1' to 1, et cetera, then the simplest way to do this is probably the following:
int number = message[place] - '0';
Since the characters for digits are encoded in ASCII in ascending numerical order (and the C++ standard guarantees this for any character set), you can subtract the value of '0' from the value of the character in question and get a number equal to the digit.
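Applied to a whole string, the subtraction could be used as in the sketch below. The name digits_of is hypothetical, and skipping non-digit characters is just one possible choice for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the numeric value of every digit character in message.
std::vector<int> digits_of(const std::string& message)
{
    std::vector<int> digits;
    for (char c : message) {
        if (c >= '0' && c <= '9') {      // skip anything that is not a digit
            digits.push_back(c - '0');   // '4' becomes 4, and so on
        }
    }
    return digits;
}
```

Each element is now a plain int, so something like digits_of(message)[place] * 100 does ordinary arithmetic on the digit rather than on its character code.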