Regarding conversion of text to hex via ASCII in C++

So, I've looked up how to do conversion from text to hexadecimal according to ASCII, and I have a working solution (proposed on here). My problem is that I don't understand why it works. Here's my code:
#include <string>
#include <iostream>
int main()
{
    std::string str1 = "0123456789ABCDEF";
    std::string output[2];
    std::string input;
    std::getline(std::cin, input);
    output[0] = str1[input[0] & 15];
    output[1] = str1[input[0] >> 4];
    std::cout << output[1] << output[0] << std::endl;
}
Which is all well and good: it returns the hexadecimal value for single characters. However, what I don't understand is this:
input[0] & 15
input[0] >> 4
How can you perform bitwise operations on a character from a string? And why does it oh-so-nicely return the exact values we're after?
Thanks for any help! :)

In C++ a char is 8 bits long (on all common platforms).
If you '&' it with 15 (binary 1111), you keep only the least significant 4 bits, which gives you the low hex digit.
When you shift right by 4, which is equivalent to dividing the character's value by 16, you get the most significant 4 bits: the high hex digit.
Once those two digit values are computed, the corresponding characters are picked from the constant string str1, which holds the sixteen hex characters in their respective positions.
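To see the whole mechanism in one place, here is a minimal sketch extending the same idea to every character of the input; the cast to unsigned char is my addition, guarding the right shift against sign extension for byte values above 127:
#include <iostream>
#include <string>
int main()
{
    const std::string digits = "0123456789ABCDEF";
    std::string input;
    std::getline(std::cin, input);
    for (char c : input)
    {
        unsigned char b = static_cast<unsigned char>(c); // avoid sign extension in >>
        std::cout << digits[b >> 4]   // high nibble -> first hex digit
                  << digits[b & 15];  // low nibble -> second hex digit
    }
    std::cout << '\n';
}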

"Characters in a string" are not characters (individual strings of one character only). In some programming languages they are. In Javascript, for example,
var string = "testing 1,2,3";
var character = string[0];
returns "t".
In C and C++, however, strings are arrays of char; each element of the array is a small integer (0..255, or -128..127 where plain char is signed), which is why bitwise operations on it just work.

Characters are just integers. In ASCII the character '0' is the integer 48. C++ makes this conversion implicitly in many contexts, including the one in your code.
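A one-line illustration of that implicit conversion, assuming an ASCII system:
#include <iostream>
int main()
{
    char c = '0';
    int n = c; // implicit char -> int conversion; n == 48 on ASCII systems
    std::cout << n << '\n';          // prints 48
    std::cout << ('A' & 15) << '\n'; // 'A' is 65, so the bitwise AND yields 1
}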

Related

Difference between converting int to char by (char) and by ASCII

I have an example:
#include <iostream>
using namespace std;
int main()
{
    int var = 5;
    char ch = (char)var;
    char ch2 = var + 48;
    cout << ch << endl;
    cout << ch2 << endl;
}
I had some other code where (char) produced the wrong answer, but +48 didn't; when I changed ONLY (char) to +48, my code was corrected.
What is the difference between converting int to char by using (char) and +48 (ASCII) in C++?
char ch = (char)var; has the same effect as char ch = var; and assigns the numeric value 5 to ch. Assuming ASCII (supported by all modern systems), character code 5 represents Enquiry ('ENQ'), an old terminal control code. Perhaps some old-timer has a clue what it did!
char ch2 = var + 48; assigns the numeric value 53 to ch2, which happens to be the ASCII code for the digit '5'. ASCII 48 is the digit zero ('0'), and the digits all appear in the ASCII table in order after that, so 48 + 5 lands on 53 (the code for the character '5').
In C++, char is an integer type. The value is often interpreted as an ASCII character, but it should be thought of as holding a number.
Its numeric range is either [-128,127] or [0,255]. That's because C++ requires sizeof(char)==1 and all modern platforms have 8 bit bytes.
NB: C++ doesn't actually mandate ASCII, but again that will be the case on all modern platforms.
PS: I think it's an unfortunate artifact of C (inherited by C++) that sizeof(char)==1 and there isn't a separate fundamental type called byte.
A char is simply the smallest integral type in C++. Output facilities like cout and printf map char values to the platform's character mapping; on Windows computers this is typically ASCII.
Note that ASCII code 5 maps to the Enquiry control character, which has no printable form, while code 53 maps to the printable character '5'.
A generally accepted hack to store a number 0-9 in a char is: const char ch = var + '0'; It's important to note the caveats here:
In fact, both C and C++ guarantee that the characters '0' through '9' are laid out contiguously in the character set, so var + '0' is portable even to non-ASCII mappings; no such guarantee exists for letters (EBCDIC, for example, does not keep the alphabet contiguous).
If var is outside the 0-9 range, var + '0' will map to something other than a numeric character.
A way to get the most significant digit of a number, sidestepping both concerns, is to use:
const auto ch = std::to_string(var).front();
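A minimal sketch contrasting the two approaches, assuming var holds a single decimal digit for the first:
#include <iostream>
#include <string>
int main()
{
    int var = 5;
    const char ch = var + '0';                    // digit-to-char trick: 0-9 only
    const auto ch2 = std::to_string(var).front(); // first digit of any non-negative value
    std::cout << ch << ' ' << ch2 << '\n';        // prints "5 5"
}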
Generally, char represents a number just as int does. Casting an int value to char doesn't produce its ASCII representation.
The ASCII codes for the digits range from 48 (== '0') to 57 (== '9'). So to get the printable digit you have to add '0' (or 48).
The difference is that casting to char with (char) explicitly converts the value to a char, while adding 48 does not.
It's important to note that an int is typically 32 bits and a char is typically 8 bits. This means the range of values you can store in a char is -128 to +127 (or 0 to 255, i.e. 2^8 - 1, for unsigned char), while an int ranges from -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1) (or 0 to 2^32 - 1 for unsigned).
Adding 48 to a value does not change its type to char.

Why does the size of this std::string change, when characters are changed?

I have an issue in which the size of the string is affected by the presence of a '\0' character. I searched all over SO and still could not find the answer.
Here is the snippet.
#include <string>
#include <iostream>
int main()
{
    std::string a = "123123\0shai\0";
    std::cout << a.length();
}
http://ideone.com/W6Bhfl
The output in this case is
6
Whereas the same program with a different string, having numerals instead of letters,
#include <string>
#include <iostream>
int main()
{
    std::string a = "123123\0123\0";
    std::cout << a.length();
}
http://ideone.com/mtfS50
gives an output of
8
What exactly is happening under the hood? How does presence of a '\0' character change the behavior?
The sequence \012 when used in a string (or character) literal is an octal escape sequence. It's the octal number 12 which corresponds to the ASCII linefeed ('\n') character.
That means your second string is actually equal to "123123\n3\0" (plus the actual string literal terminator).
It would have been very clear if you tried to print the contents of the string.
Octal sequences are one to three digits long, and the compiler will use as many digits as possible.
If you check the coloring at ideone you will see that \012 has a different color. That is because this is a single character written in octal.
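As an aside, if you want the embedded '\0' characters kept, pass the length to std::string explicitly, since the const char* constructor stops at the first NUL. A minimal sketch:
#include <iostream>
#include <string>
int main()
{
    // 6 digits + '\0' + "shai" + '\0' = 12 characters, length given explicitly.
    std::string a("123123\0shai\0", 12);
    std::cout << a.length() << '\n'; // prints 12
}
With C++14 you can get the same effect with the s literal suffix from std::string_literals, which also takes its length from the literal rather than stopping at the first NUL.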

Why was the C++ string converted to int?

In the following code, I cannot understand why the string is converted to int in this way.
Why is it using a sum with 0 ?
string mystring;
vector<int> myint;
mystring[i+1] = myint[i] + '0';
This code converts an int (presumably a digit) to the character that represents it.
Since the digit characters are sequential, and chars can be treated as integers, the character representing a given digit can be described by its distance from '0'. This way, 0 turns into the character '0', 5 becomes the character that is greater than '0' by five, and so on.
This is an efficient, old-school and dangerous method to get a char representation of a single digit. '0' is converted to an int containing its ASCII code (0x30 for '0'), and then that is added to myint[i]. If myint[i] is between 0 and 9, casting the sum to char gives you the resulting digit as text.
Things will not go as expected if you add more than 9 to '0'.
You can also get a number from its char representation :
char text = '5';
int digit = text - '0';
The '0' expression isn't a string; it's a char, which stores an ASCII character code and behaves like an integer in arithmetic operations (its range is 0 to 255, or -128 to 127 where char is signed).
In C, strings are represented as arrays of char: static (char str[N]) or dynamic (char *str = new char[n]). String literals are written in double quotes ("string").
So '0' is a char, while "0" is a char[2]: the digit plus the terminating '\0'.
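Putting it together, a minimal sketch of the pattern from the question; the names myint and mystring come from the question, while the sizing and zero-based indexing are my assumptions (the original writes to mystring[i+1]):
#include <iostream>
#include <string>
#include <vector>
int main()
{
    std::vector<int> myint = {1, 2, 3};
    std::string mystring(myint.size(), ' ');
    for (std::size_t i = 0; i < myint.size(); ++i)
        mystring[i] = myint[i] + '0'; // each digit becomes its character representation
    std::cout << mystring << '\n';    // prints "123"
}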

Convert a string variable of two chars to hex

I have code where the user inputs two chars into a string variable. I have a function that verifies that the user input is only two chars long, and that it only contains valid hexadecimal digits.
I want to write these digits to a binary file that's 32 bytes long. I tried:
outFile.write((char*)&string[0], 1);
In a loop that runs 32 times (I want to write one byte at a time) to test, but it just writes the ASCII code for the char, not the actual value itself. I expected it to write a nybble and skip a nybble, but it wrote a full byte of ASCII information instead. So I tried:
outFile.write((unsigned char*)&string[0], 1);
But my compiler complains about it being an invalid cast.
I want to solve this problem without converting the string into a c-style string. In other words, I want string to contain two chars and represent one byte of information. Not four (plus null characters).
You have a string that represents an integer. So convert the string to an integer:
unsigned char byte = (unsigned char)std::stoi(string, 0, 16);
outFile.write(reinterpret_cast<const char*>(&byte), 1);
As a workaround for your missing stoi you can do this:
#include <iostream>
#include <sstream>
#include <ios>
char hexnum[]{"2F"}; // or whatever, upper or lowercase hex digits allowed.
std::istringstream input(hexnum);
int num=0;
input >> std::hex >> num;
unsigned char byte = num;
outFile.write(reinterpret_cast<const char*>(&byte), 1);
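For reference, a self-contained sketch of the std::stoi approach; the file name and the hex value are placeholders:
#include <fstream>
#include <string>
int main()
{
    std::string hex = "2F"; // two validated hex digits (placeholder)
    unsigned char byte = static_cast<unsigned char>(std::stoi(hex, nullptr, 16));
    std::ofstream outFile("out.bin", std::ios::binary); // hypothetical file name
    outFile.write(reinterpret_cast<const char*>(&byte), 1);
}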

Convert a char[] of hexadecimal numbers to a char[] of the letters corresponding to those numbers in the ASCII table, and reverse it

I have a char a[] of hexadecimal characters like this:
"315c4eeaa8b5f8aaf9174145bf43e1784b8fa00dc71d885a804e5ee9fa40b16349c146fb778cdf2d3aff021dfff5b403b510d0d0455468aeb98622b137dae857553ccd8883a7bc37520e06e515d22c954eba5025b8cc57ee59418ce7dc6bc41556bdb36bbca3e8774301fbcaa3b83b220809560987815f65286764703de0f3d524400a19b159610b11ef3e"
I want to convert it to letters corresponding to each hexadecimal number like this:
68656c6c6f = hello
and store it in char b[] and then do the reverse
I don't want a block of code, please; I want an explanation, including which libraries are used and how to use them.
Thanks
Assuming you are talking about ASCII codes: the first step is to find the size of b. Assuming every character is represented by 2 hexadecimal digits (for example, a tab would be 09), the size of b is simply strlen(a) / 2 + 1.
That done, you need to go through the letters of a, two by two, convert each pair to its integer value and store it in b. Written as a formula you have:
b[i] = (to_digit(a[2*i]) << 4) + to_digit(a[2*i+1])
where to_digit(x) converts '0'-'9' to 0-9 and 'a'-'z' or 'A'-'Z' to 10-15.
Note that if characters below 0x10 are written with only one digit (the only one I can think of is tab), then instead of using 2*i as the index into a, you should keep a next_index in your loop, which is advanced by 2 if a[next_index] < '8', or by 1 otherwise. In the latter case, b[i] = to_digit(a[next_index]).
The reverse of this operation is very similar. Each character b[i] is written as:
a[2*i] = to_char(b[i] >> 4)
a[2*i+1] = to_char(b[i] & 0xf)
where to_char is the opposite of to_digit.
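The two helpers might look like this; these definitions are mine, sketched from the formulas above:
int to_digit(char x)
{
    if (x >= '0' && x <= '9') return x - '0';
    if (x >= 'a' && x <= 'f') return x - 'a' + 10;
    return x - 'A' + 10; // assumes the input was already validated
}
char to_char(int d)
{
    return d < 10 ? '0' + d : 'a' + (d - 10); // lowercase output is my choice
}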
Converting the hexadecimal string to a character string can be done by using std::substr to get the next two characters of the hex string, then using std::stoi to convert the substring to an integer. The result can be cast to a character and appended to a std::string. The std::stoi function is C++11 only; if you don't have it you can use e.g. std::strtol.
To do the opposite, you loop over each character in the input string, cast it to an integer and put it in a std::ostringstream, preceded by manipulators to present it as a two-digit, zero-filled hexadecimal number. Append to the output string.
Use std::string::c_str to get an old-style C char pointer if needed.
No external library, only using the C++ standard library.
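A minimal sketch of both directions along those lines, using only the standard library; the function names from_hex and to_hex are mine:
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
// Hex string -> character string, two hex digits per character.
std::string from_hex(const std::string& hex)
{
    std::string out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out += static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16));
    return out;
}
// Character string -> hex string, zero-filled to two digits per character.
std::string to_hex(const std::string& text)
{
    std::ostringstream oss;
    for (unsigned char c : text)
        oss << std::hex << std::setw(2) << std::setfill('0')
            << static_cast<int>(c);
    return oss.str();
}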
Forward:
1. Read two hex chars from the input.
2. Convert them to an int (0..255). (Hint: sscanf is one way.)
3. Append the int to the output char array.
4. Repeat 1-3 until out of chars.
5. Null-terminate the array.
Reverse:
1. Read a single char from the array.
2. Convert it to 2 hexadecimal chars. (Hint: sprintf is one way.)
3. Concat the buffer from (2) to the final output string buffer.
4. Repeat 1-3 until out of chars.
Almost forgot to mention: only stdio.h and the regular C runtime are required, assuming you're using sscanf and sprintf; see the sketch after this list. You could alternatively create a pair of conversion tables that would radically speed up the conversions.
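A sketch of those steps; the buffer sizes and the sample input are my assumptions:
#include <cstdio>
#include <cstring>
int main()
{
    const char a[] = "68656c6c6f";
    char b[64] = {};
    // Forward: each pair of hex chars becomes one character.
    std::size_t n = std::strlen(a) / 2;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned int value = 0;
        std::sscanf(a + 2 * i, "%2x", &value);
        b[i] = static_cast<char>(value);
    }
    b[n] = '\0'; // null terminate
    std::printf("%s\n", b); // prints "hello"
    // Reverse: each character becomes two hex chars.
    char hex[128] = {};
    for (std::size_t i = 0; i < n; ++i)
        std::sprintf(hex + 2 * i, "%02x",
                     static_cast<unsigned int>(static_cast<unsigned char>(b[i])));
    std::printf("%s\n", hex); // prints "68656c6c6f"
}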
Here's a simple piece of code to do the trick:
unsigned int hex_digit_value(char c)
{
    if ('0' <= c && c <= '9') { return c - '0'; }
    if ('a' <= c && c <= 'f') { return c + 10 - 'a'; }
    if ('A' <= c && c <= 'F') { return c + 10 - 'A'; }
    return -1; // wraps to UINT_MAX; input is assumed valid
}

std::string dehexify(std::string const & s)
{
    std::string result(s.size() / 2, '\0'); // the fill character is required here
    for (std::size_t i = 0; i != s.size() / 2; ++i)
    {
        result[i] = hex_digit_value(s[2 * i]) * 16
                  + hex_digit_value(s[2 * i + 1]);
    }
    return result;
}
Usage:
char const a[] = "12AB";
std::string s = dehexify(a);
Notes:
A proper implementation would add checks that the input string length is even and that each digit is in fact a valid hex numeral.
Dehexifying has nothing to do with ASCII. It just turns any hexified sequence of nibbles into a sequence of bytes. I just use std::string as a convenient "container of bytes", which is exactly what it is.
There are dozens of answers on SO showing you how to go the other way; just search for "hexify".
Each hexadecimal digit corresponds to 4 bits, because 4 bits has 16 possible bit patterns (and there are 16 possible hex digits, each standing for a unique 4-bit pattern).
So, two hexadecimal digits correspond to 8 bits.
And on most computers nowadays (some Texas Instruments digital signal processors are an exception) a C++ char is 8 bits.
This means that each C++ char is represented by 2 hex digits.
So, simply read two hex digits at a time, convert to int using e.g. an istringstream, convert that to char, and append each char value to a std::string.
The other direction is just opposite, but with a twist.
Because char is signed on most systems, you need to convert to unsigned char before converting that value again to hex digits.
Conversion to and from hexadecimal can be done using the hex manipulator, e.g.
cout << hex << x;
cin >> hex >> x;
for a suitable definition of x, e.g. int x
This should work for string streams as well.
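A minimal sketch of that round trip with string streams; the literal "ff" is a placeholder:
#include <iostream>
#include <sstream>
int main()
{
    int x = 0;
    std::istringstream in("ff"); // stands in for cin here
    in >> std::hex >> x;         // x is now 255
    std::ostringstream out;
    out << std::hex << x;        // writes "ff"
    std::cout << x << ' ' << out.str() << '\n'; // prints "255 ff"
}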