Endian-safe conversion from uint16 with value less than 256 to uint8 - c++

I am interacting with an API that returns uint16_t values; in this case I know that the value is never going to exceed 255. I need to convert the value to a uint8_t for usage with a separate API. I am currently doing this in the following way:
uint16_t u16_value = 100;
uint8_t u8_value = u16_value << 8;
This solution currently exposes endianness issues if moving from a little-endian (my current system) to a big-endian system.
What is the best way to mitigate against this?

From cppreference
For unsigned and positive a, the value of a << b is the value of a * 2^b, reduced modulo maximum value of the return type plus 1 (that is, bitwise left shift is performed and the bits that get shifted out of the destination type are discarded).
There's nothing about endianness here: shifts and integer conversions are defined on values, not on the order of bytes in memory. You can just do
uint16_t u16_value = 100;
uint8_t u8_value = u16_value;
or
uint16_t u16_value = 100;
uint8_t u8_value = static_cast<uint8_t>(u16_value);
To be explicit.
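To make the "never exceeds 255" promise checkable in debug builds, here is a minimal sketch; `to_u8` is a hypothetical helper name, not part of either API. Because the conversion works on values, it behaves identically on little- and big-endian hosts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: narrow a uint16_t that the caller knows is < 256.
// Integer conversion is defined on values (the result is v modulo 256),
// so the byte order in memory never enters into it.
inline std::uint8_t to_u8(std::uint16_t v) {
    assert(v <= 0xFF && "value does not fit in uint8_t");  // debug-only check
    return static_cast<std::uint8_t>(v);
}
```

By the same value-based rule, the `u16_value << 8` from the question yields 0 for an input of 100 on any platform: 100 << 8 is 0x6400, and that value modulo 256 is 0.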


How to deal with the sign bit of integer representations with odd bit counts?

Let's assume we have a representation of -63 as a signed seven-bit integer within a uint16_t. How can we convert that number to float and back again when we don't know the representation type (e.g. two's complement)?
An application for such an encoding could be that several numbers are stored in one int16_t. The bit count could be known for each number, and the data is read/written by a third-party library (see, for example, the encoding format of tivxDmpacDofNode() here: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/tiovx/docs/user_guide/group__group__vision__function__dmpac__dof.html; this is just an example). An algorithm should be developed that makes the compiler create the right encoding/decoding independently of the actual representation type. Of course, it is assumed that the compiler uses the same representation type as the library does.
One way that seems to work well is to shift the bits so that their sign bit coincides with the sign bit of an int16_t and let the compiler do the rest. Of course, this makes an appropriate multiplication or division necessary.
Please see this example:
#include <cmath>
#include <cstdint>
#include <iostream>
int main()
{
// -63 as signed seven-bits representation
uint16_t data = 0b1000001;
// Shift 9 bits to the left
int16_t correct_sign_data = static_cast<int16_t>(data << 9);
float f = static_cast<float>(correct_sign_data);
// Undo effect of shifting
f /= std::pow(2, 9);
std::cout << f << std::endl;
// Now back to signed bits
f *= std::pow(2, 9);
uint16_t bits = static_cast<uint16_t>(static_cast<int16_t>(f)) >> 9;
std::cout << "Equals: " << (data == bits) << std::endl;
return 0;
}
I have two questions:
This example actually uses a number with a known representation type (two's complement), converted using https://www.exploringbinary.com/twos-complement-converter/. Is the bit shifting still independent of that, and would it also work for other representation types?
Is this the canonical and/or most elegant way to do it?
Clarification:
I know the bit width of the integers I would like to convert (please check the link to the TIOVX example above), but the integer representation type is not specified.
The intention is to write code that can be recompiled without changes on a system with another integer representation type and still correctly converts from int to float and/or back.
My claim is that the example source code above does exactly that (except that the example input data is hardcoded and it would have to be different if the integer representation type were not two's complement). Am I right? Could such a "portable" solution be written also with a different (more elegant/canonical) technique?
Your question is ambiguous as to whether you intend to truly store odd-bit integers, or odd-bit floats represented by custom-encoded odd-bit integers. I'm assuming by "not knowing" the bit-width of the integer, that you mean that the bit-width isn't known at compile time, but is discovered at runtime as your custom values are parsed from a file, for example.
Edit by author of original post:
The assumption in the original question that the presented code is independent of the actual integer representation type is wrong (as explained in the comments). The integer representation is not specified; for example, it is not guaranteed that the leftmost bit is the sign bit. Therefore the presented code also contains assumptions; they are just different from (and most probably worse than) the assumption "integer representation type is two's complement".
Here's a simple example of storing an odd-bit integer. I provide a simple struct that lets you decide how many bits are in your integer. However, for simplicity in this example, I used uint8_t, which obviously has a maximum of 8 bits. There are several different assumptions and simplifications made here, so if you want help with any specific nuance, please specify more in the comments and I will edit this answer.
One key detail is to properly mask off your n-bit integer after performing 2's complement conversions.
Also please note that I have basically ignored overflow concerns and bit-width switching concerns that may or may not be a problem depending on how you intend to use your custom-width integers and the maximum bit-width you intend to support.
#include <cstdint>
#include <iostream>
struct CustomInt {
int bitCount = 7;
uint8_t value;
uint8_t mask = 0;
CustomInt(int _bitCount, uint8_t _value) {
bitCount = _bitCount;
value = _value;
mask = 0;
for (int i = 0; i < bitCount; ++i) {
mask |= (1 << i);
}
}
bool isNegative() {
return (value >> (bitCount - 1)) & 1;
}
int toInt() {
bool negative = isNegative();
uint8_t tempVal = value;
if (negative) {
tempVal = ((~tempVal) + 1) & mask;
}
int ret = tempVal;
return negative ? -ret : ret;
}
float toFloat() {
return toInt(); //Implied truncation!
}
void setFromFloat(float f) {
int intVal = f; //Implied truncation!
bool negative = f < 0;
if (negative) {
intVal = -intVal;
}
value = intVal;
if (negative) {
value = ((~value) + 1) & mask;
}
}
};
int main() {
CustomInt test(7, 0b01001110); // -50. Would be 78 if this were a normal 8-bit integer
std::cout << test.toFloat() << std::endl;
}
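A different technique avoids depending on how the host represents negative numbers: treat the n-bit field as an unsigned bit pattern and compute its value with ordinary arithmetic. The sketch below assumes the library's fields are two's complement (a property of the data format, which the library documentation has to state, not of the host) and that 1 <= n <= 16; `decode_twos` and `encode_twos` are hypothetical names. Since C++20 the language also requires two's complement for signed integers, but this code does not rely on that.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for an n-bit two's-complement field stored in the low
// bits of a uint16_t. Only unsigned bit operations and plain integer
// arithmetic are used, so the host's signed representation never matters.
inline int decode_twos(std::uint16_t raw, unsigned n) {
    assert(n >= 1 && n <= 16);
    const std::uint32_t mask = (1u << n) - 1u;  // n low bits set
    const std::uint32_t bits = raw & mask;      // the field's bit pattern
    const std::uint32_t sign = 1u << (n - 1);   // weight of the sign bit
    // A pattern with the sign bit set stands for bits - 2^n.
    return (bits & sign) ? static_cast<int>(bits) - static_cast<int>(mask) - 1
                         : static_cast<int>(bits);
}

inline std::uint16_t encode_twos(int value, unsigned n) {
    assert(n >= 1 && n <= 16);
    const std::uint32_t mask = (1u << n) - 1u;
    // For negative values, adding 2^n yields the n-bit two's-complement pattern.
    // Out-of-range values are not checked; the mask just keeps the low n bits.
    const long long wrapped = value < 0 ? static_cast<long long>(value) + mask + 1 : value;
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(wrapped) & mask);
}
```

Converting to float is then `static_cast<float>(decode_twos(raw, 7))`, and back with `encode_twos(static_cast<int>(f), 7)` (which truncates, like the struct above).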

2 int8_t's to uint16_t and back

I want to support some serial device in my application.
This device is used with another program and I want to interact with both the device and the save files this program creates.
For some reason I have yet to discover, weird integer casting is going on.
The device returns uint8's over a serial USB connection, the program saves them as int8 to a file, and when you read the file, you need to combine 2 int8's into a single uint16.
So when writing the save file after reading the device, I need to convert a uint8 to an int8, so any value higher than 127 is written as a negative number.
Then when I read the save file, I need to combine 2 int8's into a single uint16.
(So convert the negative value to positive and then stick them together)
And then when I save to a save file from within my application, I need to split my uint16 into 2 int8's.
I need to come up with the functions "encode", "combine" and "explode"
// When I read the device and write to the save file:
uint8_t val_1 = 255;
int8_t val_2 = encode(val_1);
REQUIRE(-1 == val_2);
// When I read from the save file to use it in my program.
val_1 = 7;
val_2 = -125;
uint16_t val_3 = combine(val_1, val_2);
REQUIRE(1923 == val_3);
// When I export data from my program to the device save-file
int8_t val_4;
int8_t val_5;
explode(val_3, &val_4, &val_5);
REQUIRE(7 == val_4);
REQUIRE(-125 == val_5);
Can anyone give me a head start here?
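Your encode method can just be an assignment. Implicit conversion from an unsigned integer type to a signed one wraps modulo 2^N when the value does not fit: this is guaranteed since C++20, and before that it was implementation-defined, with mainstream compilers wrapping the same way.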
uint8_t val_1 = 255;
int8_t val_2 = val_1;
REQUIRE(-1 == val_2);
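If the code must give the same answer on pre-C++20 compilers without relying on implementation-defined behavior, the mapping can be written with value arithmetic instead; `encode_portable` is a hypothetical name for this variant, not the asker's required function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of encode(): maps 0..127 to itself and 128..255 to
// -128..-1 by subtracting 256, so the final conversion is always in range.
inline std::int8_t encode_portable(std::uint8_t from) {
    const int v = from < 128 ? from : from - 256;  // now within -128..127
    return static_cast<std::int8_t>(v);
}
```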
As for combine - you'll want to cast your first value to a uint16_t to ensure you have enough bits available, and then bitshift it left by 8 bits. This causes the bits from your first value to make up the 8 most significant bits of your new value (the 8 least significant bits are zero). You can then add your second value, which will set the 8 least significant bits.
uint16_t combine(uint8_t a, uint8_t b) {
return ((uint16_t)a << 8) + b;
}
Explode is just going to be the opposite of this. You need to bitshift right 8 bits to get the first output value, and then just simply assign to get the lowest 8 bits.
void explode(uint16_t from, int8_t &to1, int8_t &to2) {
// This gets the lowest 8 bits, and implicitly converts
// from unsigned to signed
to2 = from;
// Move the 8 most significant bits to be the 8 least
// significant bits, and then just assign as we did
// for the other value
to1 = (from >> 8);
}
As a full program:
#include <cassert>
#include <cstdint>
#include <iostream>
using namespace std;
int8_t encode(uint8_t from) {
// implicit conversion from unsigned to signed
return from;
}
uint16_t combine(uint8_t a, uint8_t b) {
return ((uint16_t)a << 8) + b;
}
void explode( uint16_t from, int8_t &to1, int8_t &to2 ) {
to2 = from;
to1 = (from >> 8);
}
int main() {
uint8_t val_1 = 255;
int8_t val_2 = encode(val_1);
assert(-1 == val_2);
// When I read from the save file to use it in my program.
val_1 = 7;
val_2 = -125;
uint16_t val_3 = combine(val_1, val_2);
assert(1923 == val_3);
// When I export data from my program to the device save-file
int8_t val_4;
int8_t val_5;
explode(val_3, val_4, val_5);
assert(7 == val_4);
assert(-125 == val_5);
}
For further reading on bit-manipulation mechanics, you could take a look here.

How to grab specific bits from a 256 bit message?

I'm using Winsock to receive 256-bit UDP messages. I use eight 32-bit integers to hold the data.
int32_t dataReceived[8];
recvfrom(client, (char *)&dataReceived, 8 * sizeof(int), 0, &fromAddr, &fromLen);
I need to grab specific bits like, bit #100, #225, #55, etc. So some bits will be in dataReceived[3], some in dataReceived[4], etc.
I was thinking I need to bitshift each array, but things got complicated. Am I approaching this all wrong?
Why are you using int32_t type for buffer elements and not uint32_t?
I usually use something like this:
int bit_needed = 100;
uint32_t the_bit = dataReceived[bit_needed>>5] & (1U << (bit_needed & 0x1F));
Or you can use this one (but with signed elements, right-shifting a negative value is implementation-defined before C++20, another reason to prefer uint32_t):
int bit_needed = 100;
uint32_t the_bit = (dataReceived[bit_needed>>5] >> (bit_needed & 0x1F)) & 1U;
Note that the other answers, which index the array by byte, can only access the lowest 8 bits of each int32_t.
When you count bits and bytes from 0:
int bit_needed = 100;
So:
int byte = int(bit_needed / 8);
int bit = bit_needed % 8;
int the_bit = dataReceived[byte] & (1 << bit);
If the required bit is 0, the_bit will be zero. If it's 1, the_bit will hold 2 raised to the bit's position within the byte.
You can make a small function to do the job.
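uint8_t checkbit(const uint32_t *dataReceived, int bitToCheck)
{
int word = bitToCheck / 32; // index of the 32-bit element holding the bit
int bit = bitToCheck % 32;  // position of the bit inside that element
if( dataReceived[word] & (1U<< bit))
return 1;
else
return 0;
}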
Note that you should use uint32_t rather than int32_t if you are using bit shifting. Shifts on signed integers can give unexpected (implementation-defined or even undefined) results, especially if the most significant bit is 1.
You can use a macro in C or C++ to check for specific bit:
#define bit_is_set(var,bit) ((var) & (1ULL << (bit)))
and then a simple if:
if(bit_is_set(message,29)){
//bit is set
}
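The word-based versions above index the buffer as 32-bit integers, so which physical bit counts as "#100" depends on the host's byte order, because recvfrom stores the bytes exactly as they arrived. A sketch that works on the raw bytes instead, assuming the message numbers bit 0 as the most significant bit of the first byte (a common convention in protocol specs, but check yours); `get_bit` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: return bit n (0..255) of a 32-byte message in the
// order it was received. Assumes MSB-first numbering: bit 0 is the top bit
// of msg[0]. For LSB-first numbering, shift by (n % 8) instead.
inline bool get_bit(const unsigned char (&msg)[32], std::size_t n) {
    assert(n < 256);
    return ((msg[n / 8] >> (7 - n % 8)) & 1u) != 0;
}
```

The buffer would then be declared as `unsigned char msg[32];` and filled with `recvfrom(client, reinterpret_cast<char*>(msg), sizeof msg, 0, &fromAddr, &fromLen);`.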

How to store double - endian independent

Although big-endian computers are not very widely used, I want to store the double data type in an endian-independent format.
For int, this is really simple, since bit shifts make that very convenient.
int number = 0;
const int size = sizeof(number);
unsigned char bytes[size];
for (int i=0; i<size; ++i)
bytes[size-1-i] = (number >> 8*i) & 0xFF;
This code snippet stores the number in big-endian format regardless of the machine it runs on. What is the most elegant way to do this for double?
The best way for portability and taking format into account, is serializing/deserializing the mantissa and the exponent separately. For that you can use the frexp()/ldexp() functions.
For example, to serialize:
int exp;
unsigned long long mant;
// assumes number >= 0: frexp keeps the sign, so handle negative numbers separately
mant = (unsigned long long)(ULLONG_MAX * frexp(number, &exp));
// then serialize exp and mant.
And then to deserialize:
// deserialize to exp and mant.
double result = ldexp ((double)mant / ULLONG_MAX, exp);
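frexp returns a mantissa with the same sign as its argument, so the snippet above only covers non-negative numbers (a negative mantissa cannot be converted to unsigned long long). A variant that keeps the sign scales the mantissa by 2^53 into a signed 64-bit integer, which is exact for IEEE-754 binary64 (53-bit significand). The helper names, the 12-byte big-endian layout, and the restriction to finite values are assumptions of this sketch; the sign of -0.0 is not preserved.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: store a finite double as a signed 64-bit mantissa and
// a 32-bit exponent, both written as big-endian bytes (12 bytes in total).
// Assumes IEEE-754 binary64, where |mantissa| * 2^53 is always an integer.

static void put_be(std::uint64_t v, unsigned char* out, int n) {
    for (int i = 0; i < n; ++i)
        out[n - 1 - i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFF);
}

static std::uint64_t get_be(const unsigned char* in, int n) {
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i)
        v = (v << 8) | in[i];
    return v;
}

inline void serialize_double(double d, unsigned char (&out)[12]) {
    int exp = 0;
    const double m = std::frexp(d, &exp);  // d == m * 2^exp, 0.5 <= |m| < 1 or m == 0
    const auto mant = static_cast<std::int64_t>(std::ldexp(m, 53));  // exact integer
    put_be(static_cast<std::uint64_t>(mant), out, 8);  // two's-complement bit pattern
    put_be(static_cast<std::uint32_t>(exp - 53), out + 8, 4);
}

inline double deserialize_double(const unsigned char (&in)[12]) {
    // Converting back to signed types wraps modulo 2^N: guaranteed since C++20,
    // and what mainstream compilers do in earlier modes as well.
    const auto mant = static_cast<std::int64_t>(get_be(in, 8));
    const auto exp = static_cast<std::int32_t>(static_cast<std::uint32_t>(get_be(in + 8, 4)));
    return std::ldexp(static_cast<double>(mant), exp);
}
```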
The elegant thing to do is to limit the endianness problem to as small a scope as possible. That narrow scope is the I/O boundary between your program and the outside world. For example, the functions that send binary data to / receive binary data from some other application need to be aware of the endian problem, as do the functions that write binary data to / read binary data from some data file. Make those interfaces cognizant of the representation problem.
Make everything else blissfully ignorant of the problem. Use the local representation everywhere else. Represent a double precision floating point number as a double rather than an array of 8 bytes, represent a 32 bit integer as an int or int32_t rather than an array of 4 bytes, et cetera. Dealing with the endianness problem throughout your code is going to make your code bloated, error prone, and ugly.
The same as for int. Any numeric object, including double, is ultimately several bytes that are interpreted in a specific order according to endianness. So if you reverse the order of the bytes, you'll get exactly the same value in the reversed endianness.
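const char *src_data; // N doubles in native byte order
char *dst_data;       // receives N doubles in the target byte order
for (std::size_t i = 0; i < N * sizeof(double); i++) *dst_data++ = src_data[i ^ mask];
// where mask = 7, if native == little endian (target big endian)
//       mask = 0, if native == big endian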
The elegance lies in mask, which also handles short and int types: it is sizeof(elem)-1 if the source and target endianness differ, and 0 otherwise.
Not very portable and standards violating, but something like this:
std::array<unsigned char, 8> serialize_double( double const* d )
{
std::array<unsigned char, 8> retval;
char const* begin = reinterpret_cast<char const*>(d);
char const* end = begin + sizeof(double);
union
{
std::uint8_t i8s[8];
std::uint16_t i16s[4];
std::uint32_t i32s[2];
std::uint64_t i64s;
} u;
u.i64s = 0x0001020304050607ull; // one byte order
// u.i64s = 0x0706050403020100ull; // the other byte order
for (size_t index = 0; index < 8; ++index)
{
retval[ u.i8s[index] ] = begin[index];
}
return retval;
}
might handle a platform with 8-bit chars, 8-byte doubles, and any crazy-ass byte ordering (i.e., big-endian within words but little-endian between words for 64-bit values, for example).
Now, this doesn't cover the endianness of doubles being different than that of 64 bit ints.
An easier approach might be to cast your double into a 64 bit unsigned value, then output that as you would any other int.
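A sketch of that approach, assuming IEEE-754 doubles whose byte order matches that of uint64_t on the host (the caveat in the previous paragraph). std::memcpy is used instead of a pointer cast because it copies the object bytes without violating aliasing rules; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");

// Copy the double's bits into a uint64_t, then write them most significant byte first.
inline void double_to_be(double d, unsigned char (&out)[8]) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    for (int i = 0; i < 8; ++i)
        out[7 - i] = static_cast<unsigned char>(bits >> (8 * i));
}

// Reassemble the uint64_t from big-endian bytes, then copy its bits back into a double.
inline double double_from_be(const unsigned char (&in)[8]) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```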
#include <cstring>
#include <utility>
void reverse_endian(double number, char (&bytes)[sizeof(double)])
{
const int size=sizeof(number);
memcpy(bytes, &number, size);
for (int i=0; i<size/2; ++i)
std::swap(bytes[i], bytes[size-i-1]);
}

int to short Assignment failing

I've encountered some strange behaviour when trying to promote a short to an int where the upper 2 bytes are 0xFFFF after promotion. AFAIK the upper bytes should always remain 0. See the following code:
unsigned int test1 = proxy0->m_collisionFilterGroup;
unsigned int test2 = proxy0->m_collisionFilterMask;
unsigned int test3 = proxy1->m_collisionFilterGroup;
unsigned int test4 = proxy1->m_collisionFilterMask;
if( test1 & 0xFFFF0000 || test2 & 0xFFFF0000 || test3 & 0xFFFF0000 || test4 & 0xFFFF0000 )
{
std::cout << "test";
}
The values of the involved variables once cout is hit are: [debugger screenshot]
Note the two highlighted values. I also looked at the disassembly, which also looks fine to me: [disassembly screenshot]
My software is targeting x64 compiled with VS 2008 SP1. I also link in an out of the box version of Bullet Physics 2.80. The proxy objects are bullet objects.
The proxy class definition is as follows (with some functions trimmed out):
///The btBroadphaseProxy is the main class that can be used with the Bullet broadphases.
///It stores collision shape type information, collision filter information and a client object, typically a btCollisionObject or btRigidBody.
ATTRIBUTE_ALIGNED16(struct) btBroadphaseProxy
{
BT_DECLARE_ALIGNED_ALLOCATOR();
///optional filtering to cull potential collisions
enum CollisionFilterGroups
{
DefaultFilter = 1,
StaticFilter = 2,
KinematicFilter = 4,
DebrisFilter = 8,
SensorTrigger = 16,
CharacterFilter = 32,
AllFilter = -1 //all bits sets: DefaultFilter | StaticFilter | KinematicFilter | DebrisFilter | SensorTrigger
};
//Usually the client btCollisionObject or Rigidbody class
void* m_clientObject;
short int m_collisionFilterGroup;
short int m_collisionFilterMask;
void* m_multiSapParentProxy;
int m_uniqueId;//m_uniqueId is introduced for paircache. could get rid of this, by calculating the address offset etc.
btVector3 m_aabbMin;
btVector3 m_aabbMax;
SIMD_FORCE_INLINE int getUid() const
{
return m_uniqueId;
}
//used for memory pools
btBroadphaseProxy() :m_clientObject(0),m_multiSapParentProxy(0)
{
}
btBroadphaseProxy(const btVector3& aabbMin,const btVector3& aabbMax,void* userPtr,short int collisionFilterGroup, short int collisionFilterMask,void* multiSapParentProxy=0)
:m_clientObject(userPtr),
m_collisionFilterGroup(collisionFilterGroup),
m_collisionFilterMask(collisionFilterMask),
m_aabbMin(aabbMin),
m_aabbMax(aabbMax)
{
m_multiSapParentProxy = multiSapParentProxy;
}
}
;
I've never had this issue before and only started getting it after upgrading to 64 bit and integrating bullet. The only place I am getting issues is where bullet is involved so I suspect the issue is related to that somehow, but I am still super confused about what could make assignments between primitive types not behave as expected.
Thanks
You are requesting a conversion from signed to unsigned. This is pretty straightforward:
Your source value is -1. Since the type is short int, on your platform that has bits 0xFFFF.
The target type is unsigned int. -1 cannot be expressed as an unsigned int, but the conversion rule is defined by the standard: pick the non-negative value that's congruent to -1 modulo 2^N, where N is the number of value bits of the unsigned type.
On your platform, unsigned int has 32 value bits, so the modular representative of -1 modulo 2^32 is 0xFFFFFFFF.
If your own imaginary rules were to apply, you would want the result 0x0000FFFF, which is 65535 and not related to −1 in any obvious or useful way.
If you do want that conversion, you must perform the modular wrap-around on the short type manually:
short int mo = -1;
unsigned int weird = static_cast<unsigned short int>(mo);
Nutshell: C++ is about values, not about representations.
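To see the conversions side by side, here is a small sketch assuming 16-bit short and 32-bit unsigned int (as on the asker's x64 VS 2008 build); the helper names are hypothetical.

```cpp
#include <cassert>

// Converting a short straight to unsigned int wraps modulo 2^32, so -1
// becomes 0xFFFFFFFF. Two hypothetical ways to keep only the 16-bit pattern:

// Wrap modulo 2^16 first (short -> unsigned short), then widen; an
// unsigned short value always fits, so the upper 16 bits come out zero.
inline unsigned int widen_via_unsigned_short(short v) {
    return static_cast<unsigned short>(v);
}

// Wrap modulo 2^32, then mask off everything above the low 16 bits.
inline unsigned int widen_via_mask(short v) {
    return static_cast<unsigned int>(v) & 0xFFFFu;
}
```

In the question's test, either form keeps the upper 16 bits zero, e.g. `unsigned int test1 = static_cast<unsigned short>(proxy0->m_collisionFilterGroup);`.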
AFAIK the upper bytes should always remain 0
When promoting from short to int, sign extension is used (the same filling behavior as an arithmetic, or signed, right shift).
To answer your question, it's enough to know that the added high-order bytes are filled with copies of the sign bit of the original value.
Example:
short b = -1; // bit pattern 0xFFFF
int a = b; /* promotion is performed here: b's sign bit (1) is copied into the upper 16 bits, so a == -1 with bit pattern 0xFFFFFFFF (assuming 16-bit short and 32-bit int) */
It is important to notice that in the computer's memory, signed and unsigned values can have the same representation; the only difference is in the instructions generated by the compiler.
Example:
unsigned short i = -1; // 0xffff
short j = 65535; // 0xffff (implementation-defined before C++20)
So the bit pattern alone does not decide the result of promotion; the type does: j is sign-extended to 0xFFFFFFFF (-1), while i is zero-extended to 0x0000FFFF (65535).