C++: casted char pointer points to unexpected part of the data

When I try to run the code below:
void* data = new char[SIZE]();
int16_t* num = static_cast<int16_t*>(data);
char* c = static_cast<char*>(data);
num[0] = 49;
num[1] = 50;
num[2] = 48;
for(int i = 0; i < 3; ++i)
cout << num[i] << " ";
cout << endl;
for(int i = 0; i < 6; ++i)
cout << "'" << c[i] << "' ";
cout << endl;
c[1] = 1;
cout << num[0] << endl;
I got some unexpected result:
49 50 48
'1' '' '2' '0' ''
305
So the first line of the output confirms that num[0] == 49 (int16), which in binary form is 00000000 00110001. Converted to char, the first byte should be an unprintable character and the second byte should be '1'. But the second line shows that it's the other way around.
Also, the third line shows the attempt to change the second byte to 00000001 changed the first byte instead. I expected it to be 00000000 00000001, but the int16 value is 305 which is 00000001 00110001.
What's going on here?

This happens because of the way the computer stores multi-byte values in memory, a property known as endianness.
A 16-bit variable can be laid out in two ways: big-endian stores the most significant byte at the lowest address (49 is stored as the bytes 00 31 in hex), while little-endian stores the least significant byte first (31 00).
In your case, since you are setting the values through an int16_t pointer and your machine is little-endian, the two bytes of each value are stored in reverse order; thus the '1' is printed before the unprintable character.
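To check this on a given machine, one option is to copy the bytes of an int16_t into a small array and look at them in address order. This is only a minimal sketch; bytes_of, is_little_endian and with_byte are hypothetical helper names, not part of the question's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the two bytes of `value` in the order they sit in memory.
std::array<unsigned char, 2> bytes_of(std::int16_t value) {
    static_assert(sizeof(std::int16_t) == 2, "int16_t must be two bytes");
    std::array<unsigned char, 2> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// True when the least significant byte is stored at the lowest address.
bool is_little_endian() {
    return bytes_of(1)[0] == 1;
}

// Overwrites the byte at `index` (0 or 1, in memory order) and returns the
// resulting value, mirroring the `c[1] = 1;` step from the question.
std::int16_t with_byte(std::int16_t value, std::size_t index, unsigned char byte) {
    assert(index < sizeof(value));
    auto bytes = bytes_of(value);
    bytes[index] = byte;
    std::memcpy(&value, bytes.data(), sizeof(value));
    return value;
}
```

On a little-endian machine, bytes_of(49) yields the bytes 31 00 (hex) and with_byte(49, 1, 1) yields 305, which matches the output in the question. On a big-endian machine the same calls would give 00 31 and 1.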
Extra reading (Elaborating Martin Bonner's comment)
For a 16-bit variable, there are only two possible byte orders, the ones presented above. For 32-bit variables, there are a total of 24 (4!) possible byte orderings, and at least three of them are or were used: big-endian, little-endian and PDP-endian (a.k.a. middle-endian).

Related

Printing Latin characters in Linux terminal using `std::wstring` and `std::wcout`

I'm coding in C++ on Linux (Ubuntu) and trying to print a string that contains some Latin characters.
Trying to debug, I have something like the following:
std::wstring foo = L"ÆØÅ";
std::wcout << foo;
for(int i = 0; i < foo.length(); ++i) {
std::wcout << std::hex << (int)foo[i] << " ";
std::wcout << (char)foo[i];
}
Characteristics of output I get:
The first print shows: ???
The loop prints the hex for the three characters as c6 d8 c5
When foo[i] is cast to char (or wchar_t), nothing is printed
Environmental variable $LANG is set to default en_US.UTF-8
In the conclusion of the answer I linked (which I still recommend reading) we can find:
When I should use std::wstring over std::string?
On Linux? Almost never, unless you use a toolkit/framework.
Short explanation why:
First of all, Linux natively uses UTF-8 and is consistent about it (in contrast to e.g. Windows, where files have one encoding and cmd.exe another).
Now let's have a look at such simple program:
#include <iostream>
#include <string>
int main()
{
std::string foo = "ψA"; // character 'A' is just control sample
std::wstring bar = L"ψA"; // --
for (int i = 0; i < foo.length(); ++i) {
std::cout << static_cast<int>(foo[i]) << " ";
}
std::cout << std::endl;
for (int i = 0; i < bar.length(); ++i) {
std::wcout << static_cast<int>(bar[i]) << " ";
}
std::cout << std::endl;
return 0;
}
The output is:
-49 -120 65
968 65
What does it tell us? 65 is the ASCII code of the character 'A', which means that -49 -120 and 968 correspond to 'ψ'.
In the char case, the character 'ψ' actually takes two chars. In the wchar_t case, it takes just one wchar_t.
Let's also check sizes of those types:
std::cout << "sizeof(char) : " << sizeof(char) << std::endl;
std::cout << "sizeof(wchar_t) : " << sizeof(wchar_t) << std::endl;
Output:
sizeof(char) : 1
sizeof(wchar_t) : 4
1 byte on my machine has standard 8 bits. char has 1 byte (8 bits), while wchar_t has 4 bytes (32 bits).
UTF-8 operates on, nomen omen, code units having 8 bits. There is also a fixed-length UTF-32 encoding that uses exactly 32 bits (4 bytes) per code point, but it is UTF-8 that Linux uses.
Ergo, the terminal expects to get those two negative values in order to print the character 'ψ', not a single value that lies far outside the ASCII table (whose codes are defined only up to 127, half of the possible char values).
That's why std::cout << char(-49) << char(-120); will also print ψ.
But it shows the const char[] as printing correctly. But when I typecast to (char), nothing is printed.
The character was already encoded differently, so different values are stored in there; a simple cast is not enough to convert them.
And as I've shown, the size of char is 1 byte and the size of wchar_t is 4 bytes. You can safely cast upward, not downward.
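To make the relation between 968 and the pair -49 -120 concrete, here is a rough sketch of how one code point is turned into UTF-8 bytes. encode_utf8 is a hypothetical helper written for illustration, not a standard library function, and it does not reject surrogate code points.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as a UTF-8 byte sequence.
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF);  // largest valid Unicode code point
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

encode_utf8(968) produces the two bytes 0xCF 0x88, which read as -49 and -120 when char is signed. Likewise 0xC6 ('Æ' from the question) becomes 0xC3 0x86, so casting the wchar_t value 0xC6 to a single char cannot produce what a UTF-8 terminal expects.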

Accessing array by char in C++

Usually, I access an array in C++ by the syntax foo[2], where 2 is the index of an array.
In the code below, I don't understand how the array is accessed with the indices 'b' and 'c', or how the code produces this output. Is that an array index or something else?
int count[256] = {0};
count['b'] = 2;
cout << count['b'] << endl; //output 2
cout << count['c'] << endl; //output 0
Output
2
0
Remember that in C++ characters are represented as numbers. Take a look at this ASCII table: http://www.asciitable.com
According to it, the character 'b' is represented as 98 and 'c' as 99. Therefore what your program is really saying is...
int count[256] = {0};
count[98] = 2;
cout << count[98] << endl; //output 2
cout << count[99] << endl; //output 0
Also, in case you don't know, initializing an array with = {0} zero-initializes every element, which is why count['c'] is 0.
C/C++ has no separate built-in type for an 8-bit (1-byte) integer. We simply use the char type to represent a single byte, and you can put signed or unsigned in front of char to choose its signedness. char really is just another integer type that we happen to use to express characters. You can also do the following.
char b = 98;
char c = 99;
char diff = c - b; //diff is now 1
Type char is actually an integral type. Every char value represented by a character literal has an underlying integral value it corresponds to in a given code page, which is probably an ASCII table. When you do:
count['b'] = 2;
you actually do:
count[98] = 2;
as character 'b' corresponds to an integral value of 98, character 'c' corresponds to an integral value of 99 and so on. To illustrate, the following statement:
char c = 'b';
is equivalent of:
char c = 98;
Here c has the same underlying value, it's the representation that differs.
Because characters are always represented as integers in the computer, they can be used as array indices.
You can verify by this:
char ch = 'b';
count[ch] = 2;
int i = ch;
cout << i << endl;
cout << count[i] << endl;
Usually the output is 98 2, but the first number may vary depending on the encoding of your environment.
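One caveat when indexing with a char: plain char may be signed, so a byte above 127 would become a negative index. A small sketch of a character counter that guards against this is shown here; count_chars is a hypothetical name used only for illustration.

```cpp
#include <array>
#include <climits>
#include <string>

// Counts how often each byte value occurs in `text`.
std::array<int, UCHAR_MAX + 1> count_chars(const std::string& text) {
    std::array<int, UCHAR_MAX + 1> count{};  // value-initialized: every element is 0
    for (char ch : text) {
        // Convert to unsigned char first so the index is never negative.
        ++count[static_cast<unsigned char>(ch)];
    }
    return count;
}
```

With this, count_chars("abcb")['b'] is 2 and count_chars("abcb")['z'] is 0, just like the count['b'] and count['c'] accesses in the question.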

Byte Swap with an array?

First of all, forgive my extremely amateur coding knowledge.
I am an intern at a company and have been assigned to write C++ code that swaps bytes in order to get the correct checksum value.
I am reading a list that resembles something like:
S315FFF200207F7FFFFF42A000000000001B000000647C
S315FFF2003041A00000FF7FFFFF0000001B00000064ED
S315FFF2004042480000FF7FFFFF0000001E000000464F
I have made the program convert this string to hex and then int so that it can be read correctly. I am not reading the first 12 chars or last 2 chars of each line.
My question is how do I make the converted int do a byte swap (little endian to big endian) so that it is readable to the computer?
Again I'm sorry if this is a terrible explanation.
EDIT: I need to essentially take each byte (4 letters) and flip them. i.e: 64C7 flipped to C764, etc etc etc. How would I do this and put it into a new array? Each line is a string right now...
EDIT2: This is part of my code as of now...
int j = 12;
for (i = 0; i < hexLength2 - 5; i++){
string convert1 = ODL.substr(j, 4);
short input_int = stoi(convert1);
short lowBit = 0x00FF & input_int;
short hiBit = 0xFF00 & input_int;
short byteSwap = (lowBit << 8) | (hiBit >> 8);
I think I may need to convert my STOI to a short in some way..
EDIT3: Using the answer code below I get the following...
HEX: 8D --> stored to memory (myMem = unsigned short) as 141 (decimal) -->when byte swapped: -29440
Whats wrong here??
for (i = 0; i < hexLength2 - 5; i++){
string convert1 = ODL.substr(j, 2);
stringstream str1;
str1 << convert1;
str1 >> hex >> myMem[k];
short input_int = myMem[k]; //byte swap
short lowBit = 0x00FF & input_int;
short hiBit = 0xFF00 & input_int;
short byteSwap = (lowBit << 8) | (hiBit >> 8);
cout << input_int << endl << "BYTE SWAP: " <<byteSwap <<"Byte Swap End" << endl;
k++;
j += 2;
You can always do it bitwise too. (Assuming 16-bit word) For example, if you're byte swapping an int:
short input_int = 123; // each of the ints that you have
short input_lower_half = 0x00FF & input_int;
short input_upper_half = 0xFF00 & input_int;
// size of short is 16-bits, so shift the bits halfway in each direction that they were originally
short byte_swapped_int = (input_lower_half << 8) | (input_upper_half >> 8);
EDIT: My exact attempt at using your code
unsigned short myMem[20];
int k = 0;
string ODL = "S315FFF2000000008DC7000036B400003030303030319A";
int j = 12;
for(int i = 0; i < (ODL.length()-12)/4; i++) { // not exactly sure what your loop condition was
string convert1 = ODL.substr(j, 4);
cout << "substring is: " << convert1 << endl;
stringstream str1;
str1 << convert1;
str1 >> hex >> myMem[k];
short input_int = myMem[k]; //byte swap
unsigned short lowBit = 0x00FF & input_int; // changed this to unsigned
unsigned short hiBit = 0xFF00 & input_int; // changed this to unsigned
short byteSwap = (lowBit << 8) | (hiBit >> 8);
cout << hex << input_int << " BYTE SWAPed as: " << byteSwap <<", Byte Swap End" << endl;
k++;
j += 4;
}
It only matters that loBit and hiBit are changed to unsigned, since those are the temporary values being shifted.
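As for the -29440 in EDIT3: the swap itself looks right. 0x008D swapped is 0x8D00, which is 36096 in decimal; that does not fit in a signed 16-bit short, so it is shown as 36096 - 65536 = -29440. Keeping everything unsigned avoids this. A minimal sketch follows, with swap16 and parse_hex16 as hypothetical helper names and the assumption that each field is at most four hex digits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Swaps the two bytes of a 16-bit value. Working on an unsigned type keeps
// the right shift from dragging the sign bit into the result.
std::uint16_t swap16(std::uint16_t value) {
    return static_cast<std::uint16_t>((value << 8) | (value >> 8));
}

// Parses up to four hex digits such as "64C7" into a 16-bit value.
// Note the base 16: plain stoi(text) would try to read the digits as decimal.
std::uint16_t parse_hex16(const std::string& digits) {
    const unsigned long parsed = std::stoul(digits, nullptr, 16);
    assert(parsed <= 0xFFFF);
    return static_cast<std::uint16_t>(parsed);
}
```

For example, swap16(parse_hex16("64C7")) gives 0xC764, and swap16(0x008D) gives 0x8D00; that value only turns into -29440 once it is stored in or printed as a signed short.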
If you're asking what I think you're asking-
First, you need to make sure you know what size your integers are. 32 bits is nice and standard, but check and make sure.
Second, cast your integer array as a char array. Now you can access and manipulate the array one byte at a time.
Third- just reverse the order of every four bytes (after your first 12 char offset). Swap the first and fourth and the second and third.
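The three steps above could be sketched roughly as follows for a single 32-bit value. This version copies the bytes with memcpy rather than casting the pointer, which is a substitution on my part to stay clear of aliasing problems; reverse_bytes is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstring>

// Reverses the byte order of a 32-bit value (first <-> fourth, second <-> third).
std::uint32_t reverse_bytes(std::uint32_t value) {
    std::array<unsigned char, sizeof(value)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(value));   // view the integer as bytes
    std::reverse(bytes.begin(), bytes.end());           // flip their order
    std::memcpy(&value, bytes.data(), sizeof(value));   // and copy them back
    return value;
}
```

Because it only reorders bytes in memory, reverse_bytes(0x11223344) gives 0x44332211 regardless of the byte order of the machine it runs on.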

Two bytes into one

First off, I apologize if this is a duplicate; but my Google-fu seems to be failing me today.
I'm in the middle of writing an image format module for Photoshop, and one of the save options for this format, includes a 4-bit alpha channel. Of course, the data I have to convert is 8-bit/1 byte alpha - so I need to essentially take every two bytes of alpha, and merge it into one.
my attempt (below), I believe has a lot of room for improvement:
for(int x=0,w=0;x < alphaData.size();x+=2,w++)
{
short ashort=(alphaData[x] << 8)+alphaData[x+1];
alphaFinal[w]=(unsigned char)ashort;
}
alphaData and alphaFinal are vectors that contains the 8-bit alpha data and the 4-bit alpha data, respectively. I realize that reducing two bytes into the value of one, is bound to result in loss of "resolution", but I can't help but think there's a better way of doing this.
For extra information, here's the loop that does the reverse (converts 4-bit alpha from the format to 8-bit for Photoshop)
alphaData serves the same purpose as above, and imgData is an unsigned char vector that holds the raw image data. (alpha data is tacked on after the actual rgb data for the image in this particular variant of the format)
for(int b=alphaOffset,x2=0;b < (alphaOffset+dataLength); b++,x2+=2)
{
unsigned char lo = (imgData[b] & 15);
unsigned char hi = ((imgData[b] >> 4) & 15);
alphaData[x2]=lo*17;
alphaData[x2+1]=hi*17;
}
Are you sure that it's
alphaData[x2]=lo*17;
alphaData[x2+1]=hi*17;
and not
alphaData[x2]=lo*16;
alphaData[x2+1]=hi*16;
In any case, to generate the values that work with the decoding function you have posted, you just have to reverse the operations. So multiplying by 17 becomes dividing by 17 and the shifts and masks get reordered to look like this:
for(int x=0,w=0;x < alphaData.size();x+=2,w++)
{
unsigned char alpha1 = alphaData[x] / 17;
unsigned char alpha2 = alphaData[x+1] / 17;
Assert(alpha1 < 16 && alpha2 < 16);
alphaFinal[w]=(alpha2 << 4) | alpha1;
}
short ashort=(alphaData[x] << 8)+alphaData[x+1];
alphaFinal[w]=(unsigned char)ashort;
You're actually losing alphaData[x] in alphaFinal. You shift alphaData[x] by 8 bits to the left and then assign 8 low bits.
Also your for loop is unsafe, if for some reason alphaData.size() is odd, you'll run out of range.
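Putting the two points above together (divide by 17 to invert the decoder, and don't read past the end on an odd size), a packing routine might look like this sketch. pack_alpha and unpack_alpha are hypothetical names, and padding a missing last value with 0 is an assumption, not something the format is known to require.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs pairs of 8-bit alpha values into one byte each: the first value of
// a pair goes into the low nibble, the second into the high nibble.
std::vector<std::uint8_t> pack_alpha(const std::vector<std::uint8_t>& alpha8) {
    std::vector<std::uint8_t> packed;
    packed.reserve((alpha8.size() + 1) / 2);
    for (std::size_t i = 0; i < alpha8.size(); i += 2) {
        const int lo = alpha8[i] / 17;  // 0..255 -> 0..15
        // An odd-sized input has no partner for its last value; assume 0 here.
        const int hi = (i + 1 < alpha8.size()) ? alpha8[i + 1] / 17 : 0;
        packed.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
    }
    return packed;
}

// Mirrors the decoding loop from the question: low nibble first, times 17.
std::vector<std::uint8_t> unpack_alpha(const std::vector<std::uint8_t>& packed) {
    std::vector<std::uint8_t> alpha8;
    alpha8.reserve(packed.size() * 2);
    for (std::uint8_t byte : packed) {
        alpha8.push_back(static_cast<std::uint8_t>((byte & 15) * 17));
        alpha8.push_back(static_cast<std::uint8_t>((byte >> 4) * 17));
    }
    return alpha8;
}
```

Values that are exact multiples of 17 survive a round trip unchanged; everything else is rounded down to the nearest multiple of 17.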
What you want to do, I think, is to truncate an 8-bit value into a 4-bit one, not to combine two 8-bit values. In other words, you want to drop the four least significant bits of each alpha value, not to combine two different alpha values.
So, basically, you want to right-shift by 4.
output = (input >> 4); /* truncate four bits */
in case you're not familiar with binary shifts, take this random 8-bit number:
10110110
>> 1
= 01011011
>> 1
= 00101101
>> 1
= 00010110
>> 1
= 00001011
so,
10110110
>> 4
= 00001011
and to reverse, left-shift instead...
input = (output << 4); /* expand four bits */
which, using the result from that same random 8-bit number as before, would be
00001011
<< 4
= 10110000
Obviously, as you noted, 4 bits of precision are lost. But you'd be surprised how little it's noticed in a fully composited work.
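One detail worth seeing in code is how the expansion back to 8 bits behaves at the top of the range. The sketch below compares the plain left shift with the multiply-by-17 used in the question's decoder; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Keeps the four most significant bits of an 8-bit value.
std::uint8_t to_4bit(std::uint8_t value) {
    return static_cast<std::uint8_t>(value >> 4);
}

// Expands by shifting: the low four bits of the result are always zero.
std::uint8_t expand_shift(std::uint8_t nibble) {
    assert(nibble < 16);
    return static_cast<std::uint8_t>(nibble << 4);
}

// Expands by repeating the nibble in both halves, which equals nibble * 17.
std::uint8_t expand_replicate(std::uint8_t nibble) {
    assert(nibble < 16);
    return static_cast<std::uint8_t>((nibble << 4) | nibble);
}
```

With the shift, a fully opaque 15 comes back as 240 rather than 255; with replication it comes back as 255. That is presumably why the decoder in the question multiplies by 17 instead of 16.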
This code
for(int x=0,w=0;x < alphaData.size();x+=2,w++)
{
short ashort=(alphaData[x] << 8)+alphaData[x+1];
alphaFinal[w]=(unsigned char)ashort;
}
Is broken. Given
#include <iostream>
using std::cout;
using std::endl;
typedef unsigned char uchar;
int main() {
uchar x0 = 1; // for alphaData[x]
uchar x1 = 2; // for alphaData[x+1]
short ashort = (x0 << 8) + x1; // The value 0x0102
uchar afinal = (uchar)ashort; // truncates to 0x02.
cout << std::hex
<< "x0 = 0x" << (unsigned int)x0 << " << 8 = 0x" << (x0 << 8) << endl // cast so the value prints as a number, not a character
<< "x1 = 0x" << (unsigned int)x1 << endl
<< "ashort = 0x" << ashort << endl
<< "afinal = 0x" << (unsigned int)afinal << endl
;
}
If you are saying that your source stream contains sequences of 4-bit pairs stored in 8-bit storage values, which you need to re-store as a single 8-bit value, then what you want is:
for(int x=0,w=0;x < alphaData.size();x+=2,w++)
{
unsigned char aleft = alphaData[x] & 0x0f; // 4 bits.
unsigned char aright = alphaData[x + 1] & 0x0f; // 4 bits.
alphaFinal[w] = (aleft << 4) | (aright);
}
"<<4" is equivalent to "*16", as ">>4" is equivalent to "/16".

Hex bitwise operation in c++

By using filestreaming in c++, I have read a string in the binary file into a buffer (4 bytes). I know that the buffer contains "89abcdef". The buffer is such that:
buffer[0] = 89
buffer[1] = ab
buffer[2] = cd
buffer[3] = ef
Now, I want to recover these numbers into one single hex number 0x89abcdef. However, this is not as simple as I thought. I tried the following code:
int num = 0;
num |= buffer[0];
num <<= 24;
cout << num << endl;
at this point, num is displayed to be
ea000000
When I tried the same algorithm for the second element of the buffer:
num = 0;
num |= buffer[1];
num <<= 16;
cout << num << endl;
output:
ffcd0000
The ff in front of the cd is highly inconvenient for me to add them all together (I was planning to make it something looks like 00cd0000, and add it to the first num).
Could anyone help me to recover the hex number 0x89abcdef? Thanks.
Don't modify the actual number until the end:
num = buffer [0] << 24 | buffer [1] << 16 | buffer [2] << 8 | buffer [3];
buffer [0] << 24 gives you your first result, which is combined with the second result independent of the first, and so on.
Also, as pointed out, operations like this should be done on unsigned numbers (an unsigned char buffer and an unsigned int result), so that sign extension doesn't interfere with the result.
For all of your bitwise operations, you're going to want to use unsigned int instead of int. This way you can avoid the kinds of sign-extension problems you're seeing.
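A short sketch of that advice, assuming the four bytes were read into a plain char buffer with the most significant byte first; combine_be is a hypothetical name.

```cpp
#include <cstdint>

// Combines four bytes, most significant first, into one 32-bit value.
std::uint32_t combine_be(const char* buffer) {
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        // Go through unsigned char first: a plain (signed) char holding 0xcd
        // would otherwise be sign-extended to 0xffffffcd before the OR.
        result = (result << 8) | static_cast<unsigned char>(buffer[i]);
    }
    return result;
}
```

For a buffer holding the bytes 89 ab cd ef this returns 0x89abcdef; print it with std::hex to see it in that form.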