uint8_t different decimal values for same binary - c++

I have the following issue using an ARM® Cortex™-M4F CPU running mbedOS 5.9.
Say I have the binary value 10101000 and that I also have the following union/struct:
union InputWord_u
{
uint8_t all;
struct BitField_s
{
uint8_t start : 1; // D7
uint8_t select : 3; // D6, D5, D4
uint8_t payload : 4; // D3, D2, D1, D0
} bits;
};
I have a simple program where I access my word and assign the values as such:
InputWord_u word;
word.bits.start = 0b1;
word.bits.select = 0b010;
word.bits.payload = 0b1000;
Therefore, word.all == 10101000 and is a uint8_t.
If I print this as such printf("%u", word.all); then I receive the value of 133.
If I then define the following uint8_t:
uint8_t value = 0b10101000;
And print this using printf("%u", value); then I receive the value 168.
I expect both values to equal 168.
I appreciate that this is likely me grossly misunderstanding how a struct is represented in memory. Nevertheless, could someone please explain what exactly is going on?
Thanks.

The standard guarantees hardly anything about the representation of bit-fields.
Therefore, word.all == 10101000
What you've tripped over here is that you've assumed that the bit-fields are packed starting from most significant bit to least significant.
However, it appears that your bit-fields were stored in the reverse order, and in fact word.all == 0b1000'010'1 (payload in the high bits, select in the middle, start in the low bit), which is 133. To get the result you expect, you could reorder the bit-fields:
struct BitField_s
{
uint8_t payload : 4; // D3, D2, D1, D0
uint8_t select : 3; // D6, D5, D4
uint8_t start : 1; // D7
} bits;
But be aware that bit-fields are not portable: other compilers and ABIs might not use the same order.

The problem is that you calculated the value assuming the bits are assembled like
(start << 7) | (select << 4) | payload
whereas the actual value is assembled like
(payload << 4) | (select << 1) | start
So your bit-field is packed starting from the less significant bits of the uint8_t. This has nothing to do with the little-endianness of the system, because little-endianness defines the order of bytes within a uint16_t, uint32_t, etc.
The order of bits of a bit-field inside a byte is defined by the compiler. For example, MSVC uses low-to-high order, as in your example.
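If you want a fixed layout regardless of the compiler, you can assemble the byte with plain shifts and masks instead of bit-fields. A minimal sketch (the helper name packWord is just illustrative) that builds the byte in the D7..D0 order the question assumes:
#include <cstdint>
#include <cstdio>

// start goes in D7, select in D6..D4, payload in D3..D0.
// The layout is fixed by the shifts, not by the compiler's bit-field rules.
uint8_t packWord(uint8_t start, uint8_t select, uint8_t payload)
{
    return (uint8_t)(((start & 0x01u) << 7) | ((select & 0x07u) << 4) | (payload & 0x0Fu));
}

int main()
{
    printf("%u\n", (unsigned)packWord(0b1, 0b010, 0b1000)); // prints 168
    return 0;
}
This prints 168 on any conforming compiler, because the bit positions are spelled out explicitly.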

Writing 133 and 168 in binary
133 = 10000101
168 = 10101000
shows that the actual layout is different from the layout you assumed.
The compiler arranged the byte in the following manner (most significant bits on the left):
| payload (4 bits) | select (3 bits) | start (1 bit) |
while you were assuming the following order:
| start (1 bit) | select (3 bits) | payload (4 bits) |
Different compilers can also use different bit-field layouts.


How does this Union and Bit field interaction work?

So here is an example:
#include <cstdio>

struct field
{
unsigned int a : 8;
unsigned int b : 8;
unsigned int c : 8;
unsigned int d : 8;
};
union test
{
unsigned int raw;
field bits;
};
int main()
{
test aUnion;
aUnion.raw = 0xabcdef;
printf("a: %x \n", aUnion.bits.a);
printf("b: %x \n", aUnion.bits.b);
printf("c: %x \n", aUnion.bits.c);
printf("d: %x \n", aUnion.bits.d);
return 0;
}
now running this I get:
a: ef
b: cd
c: ab
d: 0
And I guess I just don't really get what's happening here. I set raw to a value, and since this is a union, everything else pulls from that same storage, since the fields have all been set to be smaller than an unsigned int? So the bit field is based on raw? But how does that map out? Why is d: 0 in this instance?
I would appreciate any help here.
Using the hexadecimal representation of an integer is useful because it makes clear what the value of every byte of the integer is. So the assignment
aUnion.raw = 0xabcdef;
means that the value of the least significant byte is 0xef, that the second least significant byte has the value 0xcd, and so on. But you are setting the raw field of the union, which is an unsigned int, so it is 4 bytes long. In the previous representation the most significant byte is missing, so it can be written as
aUnion.raw = 0x00abcdef;
(it is like making explicit that an integer x = 42 has 0 hundreds, 0 thousands and so on).
Your union fields represent respectively a = byte[0], b = byte[1], c = byte[2] and d = byte[3] of the integer raw, since in a union all the elements share the same memory location. This is true because you are running your code on a little-endian architecture (least significant byte comes first).
So:
a = byte[0] of raw = 0xef
b = byte[1] of raw = 0xcd
c = byte[2] of raw = 0xab
d = byte[3] of raw = 0x00
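If you want the same per-byte breakdown without relying on the union (and therefore without depending on the machine's endianness), you can extract the bytes with shifts, which always select bytes by numeric significance. A small sketch:
#include <cstdint>
#include <cstdio>

int main()
{
    uint32_t raw = 0x00ABCDEFu;
    // byte 0 is the least significant byte by definition here,
    // no matter how the machine stores it
    for (int i = 0; i < 4; i++)
        printf("byte %d: %02x\n", i, (unsigned)((raw >> (8 * i)) & 0xFFu));
    return 0;
}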
It's because the value you assigned doesn't set all 32 bits of the unsigned int, so it cannot fill all the bit-field values. Because 0xabcdef is only 24 bits long, the bit field d shows the hex value 00. Try, for example,
aUnion.raw = 0xffabcdef;
which will produce
a: ef
b: cd
c: ab
d: ff
Since the d bit field occupies bits 24-31 (in this layout), that field only shows a non-zero value if the assigned unsigned int actually has some of those bits set.

C++ Compressing size of integer down to 2 bits?

I am doing a little game physics networking project right now, and I am trying to optimize the packets I am sending using this guide:
https://gafferongames.com/post/snapshot_compression/
In the "Optimize Quaternions" section it says:
Don’t always drop the same component due to numerical precision issues. Instead, find the component with the largest absolute value and ENCODE its index using two bits [0,3] (0=x, 1=y, 2=z, 3=w), then send the index of the largest component and the smallest three components over the network
Now my question is, how do I encode an integer down to 2 bits... or have I misunderstood the task?
I know very little about compressing data, but reducing a 4 byte integer (32 bits) down to ONLY 2 bits seems a bit insane to me. Is that even possible, or have I completely misunderstood everything?
EDIT:
Here is some code of what I have so far:
void HavNetConnection::sendBodyPacket(HavNetBodyPacket bp)
{
RakNet::BitStream bsOut;
bsOut.Write((RakNet::MessageID)ID_BODY_PACKET);
float maxAbs = std::abs(bp.rotation(0));
int maxIndex = 0;
for (int i = 1; i < 4; i++)
{
float rotAbs = std::abs(bp.rotation(i));
if (rotAbs > maxAbs) {
maxAbs = rotAbs;
maxIndex = i;
}
}
bsOut.Write(bp.position(0));
bsOut.Write(bp.position(1));
bsOut.Write(bp.position(2));
bsOut.Write(bp.linearVelocity(0));
bsOut.Write(bp.linearVelocity(1));
bsOut.Write(bp.linearVelocity(2));
bsOut.Write(bp.rotation(0));
bsOut.Write(bp.rotation(1));
bsOut.Write(bp.rotation(2));
bsOut.Write(bp.rotation(3));
bsOut.Write(bp.bodyId.toRawInt(bp.bodyId));
bsOut.Write(bp.stepCount);
// Send body packets over UDP (UNRELIABLE), priority could be low.
m_peer->Send(&bsOut, MEDIUM_PRIORITY, UNRELIABLE,
0, RakNet::UNASSIGNED_SYSTEM_ADDRESS, true);
}
The simplest solution to your problem is to use bitfields:
// working type (use your existing Quaternion implementation instead)
struct Quaternion{
float w,x,y,z;
Quaternion(float w_=1.0f, float x_=0.0f, float y_=0.0f, float z_=0.0f) : w(w_), x(x_), y(y_), z(z_) {}
};
struct PacketQuaternion
{
enum LargestElement{
W=0, X=1, Y=2, Z=3,
};
LargestElement le : 2; // 2 bits;
signed int i1 : 9, i2 : 9, i3 : 9; // 9 bits each
PacketQuaternion() : le(W), i1(0), i2(0), i3(0) {}
operator Quaternion() const { // convert packet quaternion to regular quaternion
const float s = 1.0f/float(1<<8); // scale int to [-1, 1]; you could also scale to [-sqrt(.5), sqrt(.5)]
const float f1=s*i1, f2 = s*i2, f3 = s*i3;
const float f0 = std::sqrt(1.0f - f1*f1-f2*f2-f3*f3);
switch(le){
case W: return Quaternion(f0, f1, f2, f3);
case X: return Quaternion(f1, f0, f2, f3);
case Y: return Quaternion(f1, f2, f0, f3);
case Z: return Quaternion(f1, f2, f3, f0);
}
return Quaternion(); // default, can't happen
}
};
If you have a look at the assembler code this generates, you will see a bit of shifting to extract le and i1 to i3 -- essentially the same code you could write manually as well.
Your PacketQuaternion structure will always occupy a whole number of bytes, so (on any non-exotic platform) you will still waste 3 bits (you could just use 10 bits per integer field here, unless you have other use for those bits).
I left out the code to convert from regular quaternion to PacketQuaternion, but that should be relatively simple as well.
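For illustration, a rough, untested sketch of that direction under the same assumptions as the code above (same 1/256 scale, and flipping the sign of the whole quaternion when the largest component is negative, since q and -q represent the same rotation); it additionally needs <cmath> and <algorithm>:
PacketQuaternion toPacket(const Quaternion& q)
{
    float v[4] = {q.w, q.x, q.y, q.z};
    // find the component with the largest absolute value
    int largest = 0;
    for (int i = 1; i < 4; i++)
        if (std::abs(v[i]) > std::abs(v[largest])) largest = i;
    // q and -q encode the same rotation, so make the dropped component non-negative
    if (v[largest] < 0.0f)
        for (int i = 0; i < 4; i++) v[i] = -v[i];
    PacketQuaternion p;
    p.le = static_cast<PacketQuaternion::LargestElement>(largest);
    const float s = float(1 << 8);   // inverse of the decode scale
    int out[3], k = 0;
    for (int i = 0; i < 4; i++)
        if (i != largest)            // keep the three smallest components, in w,x,y,z order
            out[k++] = (int)std::max(-256.0f, std::min(255.0f, v[i] * s));
    p.i1 = out[0]; p.i2 = out[1]; p.i3 = out[2];
    return p;
}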
Generally (as always when networking is involved), be extra careful that data is converted correctly in all directions, especially, if different architectures or different compilers are involved!
Also, as others have noted, make sure that network bandwidth is indeed a bottleneck before doing aggressive optimization here.
I'm guessing they want you to fit the 2 bits into some value you are already sending that doesn't need all of the available bits, or to pack several small bit fields into a single int for transmission.
You can do things like this:
// these are going to be used as 2 bit fields,
// so we can only go to 3.
enum addresses
{
x = 0, // 00
y = 1, // 01
z = 2, // 10
w = 3 // 11
};
int val_to_send;
// set the value to send, and shift it 2 bits left.
val_to_send = 1234;
// bit pattern: 0000 0100 1101 0010
// bit shift left by 2 bits
val_to_send = val_to_send << 2;
// bit pattern: 0001 0011 0100 1000
// set the address to the last 2 bits.
// this value is address w (bit pattern 11) for example...
val_to_send |= w;
// bit pattern: 0001 0011 0100 1011
send_value(val_to_send);
On the receive end:
receive_value(&rx_value);
// pick off the address by masking with the low 2 bits
address = rx_value & 0x3;
// address now = 3 (w)
// bit shift right to restore the value
rx_value = rx_value >> 2;
// rx_value = 1234 again.
You can 'pack' bits this way, any number of bits at a time.
int address_list;
// set address to w (11)
address_list = w;
// 0000 0011
// bit shift left by 2 bits
address_list = address_list << 2;
// 0000 1100
// now add address x (00)
address_list |= x;
// 0000 1100
// bit shift left 2 more bits
address_list = address_list << 2;
// 0011 0000
// add the address y (01)
address_list |= y;
// 0011 0001
// bit shift left 2 more bits
address_list = address_list << 2;
// 1100 0100
// add the address z. (10)
address_list |= z;
// 1100 0110
// w x y z are now in the lower byte of 'address_list'
This packs 4 addresses into the lower byte of 'address_list'.
You just have to do the unpacking on the other end.
This has some implementation details to work out. You only have 30 bits now for the value, not 32. If the data is a signed int, you have more work to do to avoid shifting the sign bit out to the left, etc.
But, fundamentally, this is how you can stuff bit patterns into data that you are sending.
Obviously this assumes that sending is more expensive than the work of packing bits into bytes and ints, etc. This is often the case, especially where low baud rates are involved, as in serial ports.
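If the sign bit is a concern, one way to sidestep it is to do the packing on an unsigned type. A minimal sketch (pack/unpack are hypothetical helper names, and the value is assumed to fit in 30 bits):
#include <cstdint>

// Put a 2-bit address into the low bits of a 32-bit word; the value uses the upper 30 bits.
uint32_t pack(uint32_t value, unsigned address)
{
    return (value << 2) | (address & 0x3u);
}

void unpack(uint32_t packed, uint32_t* value, unsigned* address)
{
    *address = packed & 0x3u;   // low 2 bits hold the address
    *value   = packed >> 2;     // unsigned right shift, so no sign bit is involved
}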
There are a lot of possible understandings and misunderstandings in play here.
ttemple addressed your technical problem of sending less than a byte.
I want to reiterate the more theoretical points.
This is not done
You originally misunderstood the quoted passage.
We do not use two bits to say “not sending 2121387”,
but to say “not sending z-component”.
That the four components and the four possible two-bit values match exactly should be easy to see.
This is impossible
If you want to send a 32 bit integer which might take any of the 2^32 possible values,
you need at least 32 bits.
As n bits can represent at most exactly 2^n states,
any smaller amount of bits just will not suffice.
This is kinda possible
Beyond your actual question:
When we relax the requirement that we will always use 2 bits
and have sufficiently strong assumptions
on the probability distribution of the values,
we can get the expected value of the number of bits down.
Ideas like this are used all over the place in the linked article.
Example
Let c be some integer that is 0 almost all the time (97%, say)
and can take any value the rest of the time (3%).
Then we can take one bit to say whether “c is zero”
and need no further bits most of the time.
In the cases where c is not zero,
we spend another 32 bits to encode it regularly.
In total we need 0.97*1+0.03*(1+32) = 1.96 bits on average.
But we need 33 bits sometimes,
which makes this compatible with my earlier assertion of impossibility.
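As a toy illustration of that scheme, here is a sketch that writes into a hypothetical bit sink (a std::vector<bool>, one element per bit, just so you can count what would be sent):
#include <cstdint>
#include <vector>

void encode(std::vector<bool>& bits, uint32_t c)
{
    bool nonZero = (c != 0);
    bits.push_back(nonZero);            // 1 flag bit: "a value follows"
    if (nonZero)
        for (int i = 31; i >= 0; --i)   // 32 more bits, only in the rare case
            bits.push_back(((c >> i) & 1u) != 0);
}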
This is complicated
Depending on your background (in math, bit-fiddling etc.) it might just seem like an enormous, unknowable piece of black magic.
(It isn't. You can learn this stuff.)
You do not seem completely lost, and you appear to be a quick learner,
but I agree with Remy Lebeau
that you seem to be out of your depth.
Do you really need to do this?
Or are you optimizing prematurely?
If it runs well enough, let it run.
Concentrate on the important stuff.

Two values in one byte

In a single nibble (0-F) I can store one number from 0 to 15. In one byte, I can store a single number from 0 to 255 (00 - FF).
Can I use a byte (00-FF) to store two different numbers each in the range 0-127 (00 - 7F)?
The answer to your question is NO. You can split a single byte into two numbers, but the sum of the bits in the two numbers must be <= 8. Since the range 0-127 requires 7 bits, the other number in the byte can only be 1 bit, i.e. 0-1.
For obvious cardinality reasons, you cannot store two small integers in the 0 ... 127 range in one byte with a 0 ... 255 range. In other words, the Cartesian product [0;127]×[0;127] has 2^14 elements, which is bigger than 2^8 (the cardinality of the [0;255] interval for bytes).
(If you can afford losing precision, which you didn't say, you could, e.g., store only the highest bits ...)
Perhaps your question is: could I store two small integers from [0;15] in a byte? Then of course you could:
#include <cassert>
#include <cstdint>

typedef unsigned unibble_t; // unsigned nibble in [0;15]
uint8_t make_from_two_nibbles(unibble_t l, unibble_t r) {
assert(l<=15);
assert(r<=15);
return (l<<4) | r;
}
unibble_t left_nibble (uint8_t x) { return x >> 4; }
unibble_t right_nibble (uint8_t x) { return x & 0xf; }
But I don't think you should always do that. First, you might use bit fields in a struct. Then (and most importantly), dealing with nibbles that way might be more inefficient and make for less readable code than using bytes.
And updating a single nibble, e.g. with
void update_left_nibble (uint8_t*p, unibble_t l) {
assert (p);
assert (l<=15);
*p = ((l<<4) | ((*p) & 0xf));
}
is sometimes expensive (it involves a memory load and a memory store, so it uses the CPU cache and cache-coherence machinery), and most importantly is generally a non-atomic operation (what would happen if two different threads simultaneously called update_left_nibble on the same address p, i.e. with pointer aliasing, is undefined behavior).
As a rule of thumb, avoid packing more than one data item in a byte unless you are sure it is worthwhile (e.g. you have a billion of such data items).
One byte is not enough for two values in 0…127, because each of those values needs log2(128) = 7 bits, for a total of 14, but a byte is only 8 bits.
You can declare variables with bit-packed storage using the C and C++ bitfield syntax:
struct packed_values {
uint8_t first : 7;
uint8_t second : 7;
uint8_t third : 2;
};
In this example only 16 bits of fields are declared, so sizeof(packed_values) could be as small as 2 despite having three fields; be aware, though, that packing is implementation-defined, and because a uint8_t bit-field typically cannot straddle a byte boundary, many compilers will actually make this struct 3 bytes (a 16-bit underlying type makes the 2-byte layout more likely).
This is simpler than using bitwise arithmetic with << and & operators, but it's still not quite the same as ordinary variables: bit-fields have no addresses, so you can't have a pointer (or C++ reference) to one.
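Because the packing is implementation-defined, it is worth checking what your compiler actually does. One way (a sketch; it uses a 16-bit underlying type, which makes the two-byte layout more likely) is a static_assert:
#include <cstdint>

struct packed_values16 {
    uint16_t first  : 7;
    uint16_t second : 7;
    uint16_t third  : 2;
};

// Documents the expectation and breaks the build if the compiler lays it out differently.
static_assert(sizeof(packed_values16) == 2, "unexpected bit-field packing");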
Can I use a byte to store two numbers in the range 0-127?
Of course you can:
uint8_t storeTwoNumbers(unsigned a, unsigned b) {
return ((a >> 4) & 0x0f) | (b & 0xf0);
}
void retrieveTwoNumbers(uint8_t byte, unsigned *a, unsigned *b) {
*b = byte & 0xf0;
*a = (byte & 0x0f) << 4;
}
Numbers are still in the range 0...127 (0...255, actually). You just lose some precision, similar to floating point types: their values increment in steps of 16.
You can store two values in the range 0-15 in a single byte, but you should not (one variable = one piece of data is a better design).
If you must, you can use bit-masks and bit-shifts to access to the two data in your variable.
uint8_t var; /* range 0-255 */
data1 = (var & 0x0F); /* range 0-15 */
data2 = (var & 0xF0) >> 4; /* range 0-15 */

How to set specific bits?

Let's say I've got a uint16_t variable where I must set specific bits.
Example:
uint16_t field = 0;
That would mean the bits are all zero: 0000 0000 0000 0000
Now I get some values that I need to set at specific positions.
val1=1; val2=2, val3=0, val4=4, val5=0;
The structure how to set the bits is the following
0|000| 0000| 0000 000|0
val1 should be set at the first bit on the left, so it's only one or zero.
val2 should be set at the next three bits, val3 at the next four bits, val4 at the next seven bits, and val5 at the last bit.
The result would be this:
1010 0000 0000 1000
I only found out how to set one specific bit, but not 'groups' (using shift or bitset).
Does anyone have an idea how to solve this issue?
There are (at least) two basic approaches. One would be to create a struct with some bitfields:
struct bits {
unsigned a : 1;
unsigned b : 3;
unsigned c : 4;
unsigned d : 7;
unsigned e : 1;
};
bits b;
b.a = val1;
b.b = val2;
b.c = val3;
b.d = val4;
b.e = val5;
To get the 16-bit value, you could (for one example) create a union of that struct with a uint16_t. Just one minor problem: the standard doesn't guarantee what order the bit fields will end up in when you look at the 16-bit value. Just for example, you might need to reverse the order I've given above to get the order from most to least significant bits that you really want (but changing compilers might muck things up again).
The other obvious possibility would be to use shifting and masking to put the pieces together into a number:
uint16_t result = (val1 << 15) | (val2 << 12) | (val3 << 8) | (val4 << 1) | val5;
For the moment, I've assumed each of the inputs starts out in the correct range (i.e., has a value that can be represented in the chosen number of bits). If there's a possibility that could be wrong, you'd want to mask it to the correct number of bits first. The usual way to do that is something like:
uint16_t result = input & ((1 << num_bits) - 1);
In case you're curious about the math there, it works like this. Let's assume we want to ensure an input fits in 4 bits. Shifting 1 left by 4 bits produces 00010000 (in binary). Subtracting one from that then clears the one bit that's set and sets all the less significant bits, giving 00001111 for our example. That gives us the four least significant bits set. When we do a bit-wise AND between that and the input, any higher bits that were set in the input are cleared in the result.
One of the solutions would be to set a K-bit value starting at the N-th bit of field as:
uint16_t value_mask = ((1<<K)-1) << N; // for K=4 and N=3 will be 00..01111000
field = field & ~value_mask; // zeroing according bits inside the field
field = field | ((value << N) & value_mask); // AND with value_mask is for extra safety
Or, if you can use a struct instead of uint16_t, you can use bit fields and let the compiler perform all these actions for you.
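Wrapped into a small helper (just a sketch of the same masking idea; set_bits is not a standard function):
#include <cstdint>

// Replace the K bits of 'field' starting at bit N (counted from the LSB) with 'value'.
uint16_t set_bits(uint16_t field, uint16_t value, unsigned N, unsigned K)
{
    uint16_t mask = (uint16_t)(((1u << K) - 1u) << N);
    return (uint16_t)((field & ~mask) | ((value << N) & mask));
}

// For the layout in the question:
// field = set_bits(field, val1, 15, 1); // leftmost bit
// field = set_bits(field, val2, 12, 3); // next three bits, and so on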
finalvle = 0;
finalvle = (val1&0x01)<<15;
finalvle += (val2&0x07)<<12;
finalvle += (val3&0x0f)<<8;
finalvle += (val4&0x7f)<<1;
finalvle += (val5&0x01);
You can use the bitwise or and shift operators to achieve this.
Use shift << to 'move bits to the left':
int i = 1; // ...0001
int j = i << 3; // ...1000
You can then use bitwise or | to put it at the right place, (assuming you have all zeros at the bits you are trying to overwrite).
int k = 0; // ...0000
k |= i; // ...0001
k |= j; // ...1001
Edit: Note that @Inspired's answer also explains zeroing out a certain area of bits, and overall explains how you would go about implementing this properly.
Try this code:
uint16_t set_bit(uint16_t num, int shift)
{
return num | (uint16_t)(1u << shift);
}
where shift is the position of the bit that you want to set.

UINT16 value appears to be "backwards" when printing

I have a UINT8 pointer mArray, which is being assigned information via a *(UINT16 *) cast. E.g.:
int offset = someValue;
UINT16 mUINT16 = 0xAAFF;
*(UINT16 *)&mArray[offset] = mUINT16;
for(int i = 0; i < mArrayLength; i++)
{
printf("%02X",*(mArray + i));
}
output: ... FF AA ...
expected: ... AA FF ...
The value I am expecting to be printed when it reaches offset is to be AA FF, but the value that is printed is FF AA, and for the life of me I can't figure out why.
You are using a little endian machine.
You didn't specify but I'm guessing your mArray is an array of bytes instead of an array of UINT16s.
You're also running on a little-endian machine. On little-endian machines the bytes are stored in the opposite order from big-endian machines. Big-endian machines store them pretty much the way humans read them.
You are probably using a computer that uses a "little-endian" representation of numbers in memory (such as the Intel x86 architecture). Basically this means that the least significant byte of any value will be stored at the lowest address of the memory location that is used to store the value. See Wikipedia for details.
In your case, the number 0xAAFF consists of the two bytes 0xAA and 0xFF, with 0xFF being the least significant one. Hence, a little-endian machine will store 0xFF at the lowest address, and then 0xAA. Hence, if you interpret the memory location to which you have written a UINT16 value as a UINT8, you will get the first byte stored at that location, which happens to be 0xFF.
If you want to write an array of UINT16 values into an appropriately sized array of UINT8 values such that the output will match your expectations you could do it in the following way:
/* copy inItems UINT16 values from inArray to outArray in
* MSB first (big-endian) order
*/
void copyBigEndianArray(UINT16 *inArray, size_t inItems, UINT8 *outArray)
{
for (size_t i = 0; i < inItems; i++)
{
// shift one byte right: AAFF -> 00AA
outArray[2*i] = inArray[i] >> 8;
// cut off left byte in conversion: AAFF -> FF
outArray[2*i + 1] = inArray[i];
}
}
You might also want to check out the hton*/ntoh*-family of functions if they are available on your platform.
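For example, a quick sketch using htons (assuming a POSIX-style <arpa/inet.h>; on Windows the same function comes from <winsock2.h>):
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>

void storeBigEndian(uint8_t* dst, uint16_t value)
{
    uint16_t be = htons(value);        // host order -> network (big-endian) order
    std::memcpy(dst, &be, sizeof be);  // 0xAAFF now comes out as AA FF
}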
It's because your computer's CPU uses a little-endian representation of integers in memory.