I have a CFBitVector that looks like '100000000000000'
I pass a two-byte array to CFBitVectorGetBits, which fills it with the values from this CFBitVector. After this call, the array bytes looks like:
bytes[0] == '0x80'
bytes[1] == '0x00'
This is exactly what I would expect. However, when copying the contents of bytes into the unsigned int bytesValue, the value is 128 when I expect it to be 32768. The decimal value 128 is represented by the hex value 0x0080. Essentially it seems that the byte order is reversed during the memcpy. What is going on here? Is this just an issue with endianness?
Thanks
CFMutableBitVectorRef bitVector = CFBitVectorCreateMutable(kCFAllocatorDefault, 16);
CFBitVectorSetCount(bitVector, 16);
CFBitVectorSetBitAtIndex(bitVector, 0, 1);
CFRange range = CFRangeMake(0, 16);
Byte bytes[2] = {0,0};
unsigned int bytesValue = 0;
CFBitVectorGetBits(bitVector, range, bytes);
memcpy(&bytesValue, bytes, sizeof(bytes));
return bytesValue;
What is going on here? Is this just an issue with endianness?
Yes.
Your computer is little-endian. On a little-endian machine, the 16-bit value 32768 is represented in memory as:
00 80
What you actually have is:
80 00
which is the opposite, and reads back as 128, exactly as you're seeing.
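If the goal is to read those two bytes back as 32768 regardless of the host's byte order, one option (just a small sketch reusing the bytes array from the question, not the only way) is to combine them explicitly instead of using memcpy:
// Treat bytes[0] as the most significant byte; this does not depend
// on the endianness of the machine running the code.
unsigned int bytesValue = ((unsigned int)bytes[0] << 8) | bytes[1];
// 0x80 << 8 == 0x8000, so bytesValue == 32768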
So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0, and the second uint8_t is 1, I expect the uint16_t to also be one.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to reinterpret the uint8_t pointer bytes as a uint16_t pointer, and then read the value. I expect that value to be 1 since x86 is little endian. However it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!
There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
Does endianness also affect the order of the bits within a byte, or not?
There is no way to access a bit by "address"1, so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on left and least on right; just like digits of decimal numbers. I don't know if this is true in right to left writing systems.
1 You can sort of create "virtual addresses" for bits using bit-fields. The order of bit-fields, i.e. whether the first bit-field is most or least significant, is implementation-defined and not necessarily related to byte endianness at all.
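To make that footnote concrete, here is a small sketch (the printed byte is only one possible outcome, since the layout of bit-fields is implementation-defined):
#include <cstdint>
#include <cstdio>
#include <cstring>

// Whether 'low' occupies the least or the most significant four bits of
// the byte is up to the implementation; two compilers may disagree.
struct Nibbles {
    std::uint8_t low  : 4;
    std::uint8_t high : 4;
};

int main() {
    Nibbles n;
    n.low  = 0x1;
    n.high = 0xF;

    unsigned char raw;
    std::memcpy(&raw, &n, 1);               // inspect the underlying byte
    std::printf("0x%02X\n", (unsigned)raw); // often 0xF1, but not guaranteed
}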
Here is a correct way to set the two octets of a uint16_t. The result will depend on the endianness of the system:
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus the following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);  // needs <cstddef>
data[0] = std::byte{1};  // std::byte is a scoped enum, so brace-initialise from an int
data[1] = std::byte{0};
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through network, or accessing files that may be shared to other systems.
In these cases, the communication protocol, or the file format typically specify that the data is in specific endianness which may or may not be the same as the native endianness of your target system. De facto standard in network communication is to use big endian. Data in particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
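For example, a 16-bit field arriving in network (big endian) order could be decoded like this; this is only a sketch, and read_be16 is a made-up helper name:
#include <cstdint>

// Decode a 16-bit big-endian value from a byte buffer, independent of
// the native endianness of the machine this runs on.
std::uint16_t read_be16(const unsigned char* buf) {
    return static_cast<std::uint16_t>((buf[0] << 8) | buf[1]);
}

// e.g. the bytes {0x01, 0x90} decode to 0x0190 == 400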
In a little-endian system the least significant bytes are placed first. In other words: the low byte is placed at offset 0, and the high byte at offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between Arithmetic and Logical shifts - in C/C++ unsigned values use logical shifts, and signed use arithmetic shifts.
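As a small, hedged demonstration of the shift approach with arbitrary example values:
#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t lowByte  = 0x34;  // arbitrary example values
    std::uint8_t highByte = 0x12;

    // Same formula as above; the result is the same on any endianness.
    std::uint16_t out = lowByte + (highByte << 8);

    std::cout << std::hex << out << '\n';  // prints 1234
}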
If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since Intel processors are little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
Your code works, but you misinterpreted how to read bytes:
#include <cstdint>
#include <cstddef>
#include <iostream>
int main()
{
    uint8_t *in = new uint8_t[2];

    in[0] = 3;
    in[1] = 1;

    uint16_t out = *((uint16_t*)in);

    std::cout << "out: " << out << "\n in: " << in[1]*256 + in[0] << std::endl;
    return 0;
}
By the way, you should take care of alignment when casting this way.
One way to think about numbers is in MSB/LSB terms, where the MSB is the highest bit and the LSB is the lowest bit of the value. On a little-endian machine, the byte containing the LSB is stored first.
For example:
(u)int32: MSB:Bit 31 ... LSB: Bit 0
(u)int16: MSB:Bit 15 ... LSB: Bit 0
(u)int8 : MSB:Bit 7 ... LSB: Bit 0
With your cast to a 16-bit value, the bytes arrange like this:
16 bit                <=   8 bit        8 bit
MSB ........... LSB        BYTE[1]      BYTE[0]
Bit 15 ...... Bit 0        Bit 7 .. 0   Bit 7 .. 0
0000 0001 0000 0000        0000 0001    0000 0000
which is 256, the correct value.
How can I concatenate bytes? For example:
I have a byte array, BYTE buffer[2] = {0x00, 0x02}, but I want to concatenate these two bytes, backwards.
Something like this:
0x0200 <---
and later convert those bytes to decimal: 0x0200 = 512.
But I don't know how to do it in C, because I can't use memcpy or strcat, since buffer is BYTE and not char, and I don't even know if I can do that.
Can somebody help me with code, or explain how I can concatenate bytes and convert them to decimal?
I ask because I have another byte array, buff = {0x00, 0x00, 0x0C, 0x00, 0x00, 0x00}, and need to do the same with it.
Help please.
Regards.
BYTE is not a standard type and is probably a typedef for unsigned char. Here, I'll use the definitions from <stdint.h>, which define integers of specified widths, where a byte is uint8_t.
Concatenating two bytes "backwards" is easy if you think about it:
uint8_t buffer[2] = {0x00, 0x02};
uint16_t x = buffer[1] * 256 + buffer[0];
It isn't called backwards, by the way, but Little Endian byte order. The opposite would be Big Endian, where the most significant byte comes first:
uint16_t x = buffer[0] * 256 + buffer[1];
Then, there's no such thing as "converting to decimal". Internally, all numbers are binary. You can print them as decimal numbers or as hexadecimal numbers or as numbers of any base or even as Roman numerals if you like, but it's still the same number:
printf("dec: %u\n", x); // prints 512
printf("hex: %x\n", x); // prints 200
Now let's look what happens for byte arrays of any length:
uint8_t buffer[4] = {0x11, 0x22, 0x33, 0x44};
uint32_t x = buffer[3] * 256 * 256 * 256
           + buffer[2] * 256 * 256
           + buffer[1] * 256
           + buffer[0];
See a pattern? You can rewrite this as:
uint32_t x = ( ( (buffer[3]) * 256
             + buffer[2]) * 256
             + buffer[1]) * 256
             + buffer[0];
You can convert this logic to a function easily:
uint64_t int_little_endian(uint8_t *arr, size_t n)
{
    uint64_t res = 0ul;
    while (n--) res = res * 256 + arr[n];
    return res;
}
Likewise for Big Endian, where you move "forward":
uint64_t int_big_endian(uint8_t *arr, size_t n)
{
    uint64_t res = 0ul;
    while (n--) res = res * 256 + *arr++;
    return res;
}
Lastly, code that deals with byte conversions usually doesn't use the arithmetic operations of multiplication and addition, but instead so-called bit-wise operators. A multiplication by 2 is represented by shifting all bits of a number left by one. (Much as a multiplication by 10 in decimal is done by shifting all digits left by one and appending a zero.) Our multiplication by 256 becomes a bit-shift of 8 bits to the left, which in C notation is x << 8.
Addition is done by applying the bit-wise or. These two operations are not identical, because the bit-wise or operates on bits and does not account for carry. In our case, where the bits being combined never overlap, they behave the same. Your Little-Endian conversion function now looks like this:
uint64_t int_little_endian(uint8_t *arr, size_t n)
{
    uint64_t res = 0ul;
    while (n--) res = res << 8 | arr[n];
    return res;
}
And if that doesn't look like some nifty C code, I don't know what does. (If these bitwise operators confuse you, leave them for now. In your example, you're fine with multiplication and addition.)
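As a rough usage sketch (it assumes the int_little_endian function above plus <stdio.h>, and that your arrays really are in little-endian order):
uint8_t buffer[2] = {0x00, 0x02};
uint8_t buff[6]   = {0x00, 0x00, 0x0C, 0x00, 0x00, 0x00};

printf("%llu\n", (unsigned long long)int_little_endian(buffer, 2)); /* 512 */
printf("%llu\n", (unsigned long long)int_little_endian(buff, 6));   /* 786432, i.e. 0x0C0000 */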
I haven't found a question answering this exact behaviour, and somehow I just don't understand what is going on:
I read the contents of a Windows Bitmap File (bmp) into an array and use this array later to extract the required information:
char biHeader[40];
// ...
source.read(biHeader,40);
// ...
int biHeight = biHeader[8] | (biHeader[9] << 8) | (biHeader[10] << 16) | (biHeader[11] << 24);
After this, biHeight shows as -112 which is totally wrong because it should be 400.
So, I took a look at a hexdump of the file. The contents read are:
90 01 00 00
Changing the byte order to big endian gives 0x190 which is 400 in decimal, as expected.
If I change above code to:
unsigned char biHeader[40];
// ...
source.read((char*)biHeader,40);
// ...
int biHeight = ... (same as before)
... then I get the expected value. What is going on here?
And: How would you read this data?
As a signed 8-bit two's complement integer, 0x90 is -112. When that is converted to int for the |, its value is preserved. Since all bits from the seventh on are set if the representation is two's complement, a bitwise or with values shifted left by at least eight bits doesn't change the value anymore.
As an unsigned 8-bit integer, the value of 0x90 is 144, a positive number with no bits beyond the 2^7 bit set. Then, a bitwise or with biHeader[9] << 8 changes the value to the desired 144 + 256 = 400.
When working with bitwise operators, (almost) always use unsigned types; signed types often lead to unpleasant surprises (and undefined behaviour if the shift result is out of range or a negative integer is shifted left).
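If the buffer has to stay plain char, a masking variant along the same lines also works; this is only a sketch of the idea, reusing the names from the snippet above:
// & 0xFFu keeps only the low 8 bits after the (possibly negative) char has
// been promoted, and makes the operands unsigned so the << 24 is safe.
int biHeight = (int)((biHeader[8]   & 0xFFu)
                   | ((biHeader[9]  & 0xFFu) << 8)
                   | ((biHeader[10] & 0xFFu) << 16)
                   | ((biHeader[11] & 0xFFu) << 24));
// 0x90 | (0x01 << 8) | 0 | 0 == 0x190 == 400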
I need to combine two signed 8-bit _int8 values into a signed short (16-bit) value. It is important that the sign is not lost.
My code is:
unsigned short lsb = -13;
unsigned short msb = 1;
short combined = (msb << 8 )| lsb;
The result I get is -13. However, I expect it to be 499.
For the following examples, I get the correct results with the same code:
msb = -1; lsb = -6; combined = -6;
msb = 1; lsb = 89; combined = 345;
msb = -1; lsb = 13; combined = -243;
However, msb = 1; lsb = -84; combined = -84; where I would expect 428.
It seems that if the lsb is negative and the msb is positive, something goes wrong!
What is wrong with my code? How does the computer get to these unexpected results (Win7, 64 Bit and VS2008 C++)?
Your lsb in this case contains 0xfff3. When you OR it with 1 << 8 nothing changes because there is already a 1 in that bit position.
Try short combined = (msb << 8 ) | (lsb & 0xff);
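Tracing through the asker's values shows why the mask matters (just arithmetic, not compiler output):
// msb = 1, lsb = -13 stored in an unsigned short, so lsb == 0xFFF3
// lsb & 0xff                -> 0x00F3  (243)
// msb << 8                  -> 0x0100  (256)
// (msb << 8) | (lsb & 0xff) -> 0x01F3  (499)
short combined = (1 << 8) | ((unsigned short)-13 & 0xff);  // 499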
Or using a union:
#include <iostream>
union Combine
{
    short target;
    char dest[ sizeof( short ) ];
};

int main()
{
    Combine cc;
    cc.dest[0] = -13, cc.dest[1] = 1;
    std::cout << cc.target << std::endl;
}
It is possible that lsb is being automatically sign-extended to 16 bits. I notice you only have a problem when it is negative and msb is positive, and that is what you would expect to happen given the way you're using the or operator. Although, you're clearly doing something very strange here. What are you actually trying to do here?
The Raisonance C compiler for STM8 (and, possibly, many other compilers) generates ugly code from classic C when writing 16-bit variables into 8-bit hardware registers.
Note: STM8 is big-endian; for little-endian CPUs the code must be slightly modified. Read/write byte order is important too.
So, standard C code piece:
unsigned int ch1Sum;
...
TIM5_CCR1H = ch1Sum >> 8;
TIM5_CCR1L = ch1Sum;
Is being compiled to:
;TIM5_CCR1H = ch1Sum >> 8;
LDW X,ch1Sum
CLR A
RRWA X,A
LD A,XL
LD TIM5_CCR1,A
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Too long, too slow.
My version:
unsigned int ch1Sum;
...
TIM5_CCR1H = ((u8*)&ch1Sum)[0];
TIM5_CCR1L = ch1Sum;
That is compiled into an adequate two MOVs:
;TIM5_CCR1H = ((u8*)&ch1Sum)[0];
MOV TIM5_CCR1,ch1Sum
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Opposite direction:
unsigned int uSonicRange;
...
((unsigned char *)&uSonicRange)[0] = TIM1_CCR2H;
((unsigned char *)&uSonicRange)[1] = TIM1_CCR2L;
instead of
unsigned int uSonicRange;
...
uSonicRange = TIM1_CCR2H << 8;
uSonicRange |= TIM1_CCR2L;
Some things you should know about the datatypes (un)signed short and char:
char is an 8-bit value, which is what you were looking for for lsb and msb. short is 16 bits in length.
You should also not store signed values in unsigned variables unless you know what you are doing.
You can take a look at the two's complement. It describes the representation of negative values (for integers, not for floating-point values) in C/C++ and many other programming languages.
There are multiple versions of making your own two's complement:
int a;
// setting a
a = -a; // Clean version. Easier to understand and read. Use this one.
a = (~a)+1; // The arithmetical version. Does the same, but takes more steps.
// Don't use the last one unless you need it!
// It can be 'optimized away' by the compiler.
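A tiny sketch showing that the two forms agree (on the usual two's-complement targets):
#include <cstdio>

int main() {
    int a = 13;
    int neg1 = -a;        // clean version
    int neg2 = (~a) + 1;  // arithmetic version: flip all bits, then add one
    std::printf("%d %d\n", neg1, neg2);  // prints: -13 -13
}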
stdint.h (along with inttypes.h) exists for the purpose of giving your variables exact widths. If you really need a variable to have a specific bit-width, you should use it (and here you do).
You should always use the datatypes that fit your needs best. Your code should therefore look like this:
signed char lsb; // signed 8-bit value
signed char msb; // signed 8-bit value
signed short combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
or like this:
#include <stdint.h>
int8_t lsb; // signed 8-bit value
int8_t msb; // signed 8-bit value
int16_t combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
For the latter, the compiler will use signed 8/16-bit values every time, regardless of how wide int is on your platform. Wikipedia has a nice explanation of the int8_t and int16_t datatypes (and all the other fixed-width types).
By the way, cppreference.com is useful for looking up the C standard and other things that are worth knowing about C/C++.
You wrote that you need to combine two 8-bit values. Why are you using unsigned short then?
As Dan already said, lsb is automatically extended to 16 bits. Try the following code:
uint8_t lsb = -13;
uint8_t msb = 1;
int16_t combined = (msb << 8) | lsb;
This gives you the expected result: 499.
If this is what you want:
msb: 1, lsb: -13, combined: 499
msb: -6, lsb: -1, combined: -1281
msb: 1, lsb: 89, combined: 345
msb: -1, lsb: 13, combined: -243
msb: 1, lsb: -84, combined: 428
Use this:
short combine(unsigned char msb, unsigned char lsb) {
    return (msb << 8u) | lsb;
}
I don't understand why you would want msb -6 and lsb -1 to generate -6 though.
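A quick usage sketch against the table above (it assumes the combine function just shown; the negative arguments wrap into unsigned char exactly as the parameter types dictate):
// inside some test function, with combine() from above in scope
printf("%d\n", combine(1, (unsigned char)-13)); // 499
printf("%d\n", combine(-6, -1));                // -1281 on the usual two's-complement targets
printf("%d\n", combine(1, 89));                 // 345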
Write a program to determine whether a computer is big-endian or little-endian.
bool endianness() {
    int i = 1;
    char *ptr;
    ptr = (char*) &i;
    return (*ptr);
}
So I have the above function. I don't really get it. ptr = (char*) &i, which I think means a pointer to a character at the address where i is sitting. So if an int is 4 bytes, say ABCD, are we talking about A or D when you cast to char*, and why?
Would some one please explain this in more detail? Thanks.
So specifically, ptr = (char*) &i; when you cast it to char*, what part of &i do I get?
If you have a little-endian architecture, i will look like this in memory (in hex):
01 00 00 00
^
If you have a big-endian architecture, i will look like this in memory (in hex):
00 00 00 01
^
The cast to char* gives you a pointer to the first byte of the int (to which I have pointed with a ^), so the value pointed to by the char* will be 01 if you are on a little-endian architecture and 00 if you are on a big-endian architecture.
When you return that value, 0 is converted to false and 1 is converted to true. So, if you have a little-endian architecture, this function will return true and if you have a big-endian architecture, it will return false.
Whether ptr points to byte A or D depends on the endianness of the machine. ptr points to the byte of the integer that is at the lowest address (the other bytes would be at ptr+1, ...).
On a big-endian machine the most significant byte of the integer (which is 0x00) will be stored at this lowest address, so the function will return zero.
On a little-endian machine it is the opposite: the least significant byte of the integer (0x01) will be stored at the lowest address, so the function will return one in this case.
This is using type punning to access an integer as an array of characters. If the machine is big-endian, this will be the most significant byte and will have a value of zero; if the machine is little-endian, it will be the least significant byte, which will have a value of one. (Instead of accessing i as a single integer, the same memory is accessed as an array of four chars.)
Whether *((char*)&i) is byte A or byte D gets to the heart of endianness. On a little endian system, the integer 0x41424344 will be laid out in memory as: 0x44 43 42 41 (least significant byte first; in ASCII, this is "DCBA"). On a big endian system, it will be laid out as: 0x41 42 43 44. A pointer to this integer will hold the address of the first byte. Consider the pointer as an integer pointer, and you get the whole integer. Consider the pointer as a char pointer, and you get the first byte, since that's the size of a char.
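If you would rather avoid the pointer cast, two hedged alternatives: a memcpy version, and (if C++20 is available) std::endian from <bit>:
#include <bit>      // std::endian, C++20
#include <cstdint>
#include <cstring>

bool is_little_endian() {
    std::uint32_t i = 1;
    unsigned char bytes[sizeof i];
    std::memcpy(bytes, &i, sizeof i);   // copy instead of aliasing through a pointer
    return bytes[0] == 1;               // least significant byte at the lowest address
}

// Compile-time check, no object inspection needed:
constexpr bool little = (std::endian::native == std::endian::little);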
Assume int is 4 bytes (in C it may not be). This assumption is just to simplify the example...
You can look at each of these 4 bytes individually.
char is a byte, so casting lets you look at the first byte of the 4-byte buffer.
If that first byte is non-zero (here, 42), it tells you that the least significant byte of the value is stored at the lowest address.
I randomly chose the number 42 to avoid confusion of any special meaning in the value 1.
int num = 42;
if (*(char *)&num == 42)
{
    printf("\nLittle-Endian\n");
}
else
{
    printf("Big-Endian\n");
}
Breakdown:
int num = 42;
//memory of the 4 bytes is either: (where each byte is 0 to 255)
//1) 0 0 0 42
//2) 42 0 0 0
char*p = #/*Cast the int pointer to a char pointer, pointing to the first byte*/
bool firstByteOf4Is42 = *p == 42;/*Checks to make sure the first byte is 1.*/
//Advance to the 2nd byte
++p;
assert(*p == 0);
//Advance to the 3rd byte
++p;
assert(*p == 0);
//Advance to the 4th byte
++p;
bool lastByteOf4Is42 = *p == 42;
assert(firstByteOf4Is42 == !lastByteOf4Is42);
If firstByteOf4Is42 is true you have little-endian. If lastByteOf4Is42 is true then you have big-endian.
Sure, let's take a look:
bool endianness() {
    int i = 1;          // This is 0x1:
    char *ptr;
    ptr = (char*) &i;   // pointer to 0001
    return (*ptr);
}
If the machine is little-endian, then the data in *ptr will be 0000 0001.
If the machine is big-endian, then the byte order is reversed; that is, i will be laid out in memory as
i = 0000 0000 0000 0000 0000 0000 0000 0001
so *ptr (the first byte) will hold 0x0.
Finally, returning *ptr is equivalent to
if (*ptr == 0x1) // little endian
else             // big endian