How to safely extract a signed field from a uint32_t into a signed number (int or uint32_t) - c++

I have a project in which I am getting a vector of 32-bit ARM instructions, and a part of the instructions (offset values) needs to be read as signed (two's complement) numbers instead of unsigned numbers.
I used a uint32_t vector because all the opcodes and registers are read as unsigned and the whole instruction was 32-bits.
For example:
I have this 32-bit ARM instruction encoding:
uint32_t addr = 0b00110001010111111111111111110110;
The last 19 bits are the offset of the branch that I need to read as signed integer branch displacement.
This part: 1111111111111110110
I have this function in which the parameter is the whole 32-bit instruction:
I am shifting left 13 places and then right 13 places again to keep only the offset value and remove the other part of the instruction.
I have tried casting the result to different signed types, using different kinds of casts and other C++ functions, but it prints the number as if it were unsigned.
int getCat1BrOff(uint32_t inst)
{
uint32_t temp = inst << 13;
uint32_t brOff = temp >> 13;
return (int)brOff;
}
I get decimal number 524278 instead of -10.
The last option, which I don't think is the best one but may work, is to put all the binary digits in a string, invert the bits and add 1, and then convert the new binary number back into decimal, as I would do it on paper. But it is not a good solution.

It boils down to doing a sign extension where the sign bit is the 19th one.
There are two ways:
1. Use arithmetic shifts.
2. Detect the sign bit and OR the high bits with ones.
There is no portable way to do 1. in C++, but it can be checked at compile time. Please correct me if the code below is UB, but I believe it is only implementation-defined, which is what the compile-time check is for.
The only questionable parts are the unsigned-to-signed conversion of an out-of-range value and the right shift of a negative value, but both should be implementation-defined.
int getCat1BrOff(uint32_t inst)
{
if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
{
return int32_t(inst << uint32_t{13}) >> int32_t{13};
}
else
{
int32_t offset = inst & 0x0007FFFF;
if (offset & 0x00040000)
{
offset |= 0xFFF80000;
}
return offset;
}
}
or a more generic solution
template <uint32_t N>
int32_t signExtend(uint32_t value)
{
static_assert(N > 0 && N <= 32);
constexpr uint32_t unusedBits = (uint32_t(32) - N);
if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
{
return int32_t(value << unusedBits) >> int32_t(unusedBits);
}
else
{
constexpr uint32_t mask = uint32_t(0xFFFFFFFFu) >> unusedBits;
value &= mask;
if (value & (uint32_t(1) << (N-1)))
{
value |= ~mask;
}
return int32_t(value);
}
}
https://godbolt.org/z/rb-rRB
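For comparison, here is a minimal sketch of the second approach (detect the sign bit), written so that it needs neither a right shift of a negative value nor an out-of-range unsigned-to-signed conversion. The helper name `branchOffset19` is made up for illustration; the 19-bit field width comes from the question.

```cpp
#include <cstdint>

// Hypothetical helper (name not from the original post): sign-extends the
// low 19 bits of an instruction word using only well-defined arithmetic.
std::int32_t branchOffset19(std::uint32_t inst)
{
    constexpr std::uint32_t fieldMask = 0x0007FFFFu; // low 19 bits
    constexpr std::uint32_t signBit   = 0x00040000u; // bit 18, the field's sign bit

    const std::uint32_t field = inst & fieldMask;
    // The masked field is below 2^19, so this conversion cannot overflow.
    std::int32_t value = static_cast<std::int32_t>(field);
    if (field & signBit)
    {
        value -= std::int32_t{1} << 19; // subtract 2^19 to get the negative value
    }
    return value;
}
```

With the instruction from the question, `branchOffset19(0b00110001010111111111111111110110)` evaluates to -10.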

In practice, you just need to declare temp as signed:
int getCat1BrOff(uint32_t inst)
{
int32_t temp = inst << 13;
return temp >> 13;
}
Unfortunately this is not portable:
For negative a, the value of a >> b is implementation-defined (in most
implementations, this performs arithmetic right shift, so that the
result remains negative).
But I have yet to meet a compiler that doesn't do the obvious thing here.

Related

How to deal with the sign bit of integer representations with odd bit counts?

Let's assume we have a representation of -63 as a signed seven-bit integer within a uint16_t. How can we convert that number to float and back again when we don't know the representation type (e.g. two's complement)?
An application for such an encoding could be that several numbers are stored in one int16_t. The bit-count could be known for each number and the data is read/written from a third-party library (see for example the encoding format of tivxDmpacDofNode() here: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/tiovx/docs/user_guide/group__group__vision__function__dmpac__dof.html --- but this is just an example). An algorithm should be developed that makes the compiler create the right encoding/decoding independent from the actual representation type. Of course it is assumed that the compiler uses the same representation type as the library does.
One way that seems to work well, is to shift the bits such that their sign bit coincides with the sign bit of an int16_t and let the compiler do the rest. Of course this makes an appropriate multiplication or division necessary.
Please see this example:
#include <iostream>
#include <cmath>
#include <cstdint> // uint16_t, int16_t
int main()
{
// -63 as signed seven-bits representation
uint16_t data = 0b1000001;
// Shift 9 bits to the left
int16_t correct_sign_data = static_cast<int16_t>(data << 9);
float f = static_cast<float>(correct_sign_data);
// Undo effect of shifting
f /= pow(2, 9);
std::cout << f << std::endl;
// Now back to signed bits
f *= pow(2, 9);
uint16_t bits = static_cast<uint16_t>(static_cast<int16_t>(f)) >> 9;
std::cout << "Equals: " << (data == bits) << std::endl;
return 0;
}
I have two questions:
This example uses actually a number with known representation type (two's complement) converted by https://www.exploringbinary.com/twos-complement-converter/. Is the bit-shifting still independent from that and would it work also for other representation types?
Is this the canonical and/or most elegant way to do it?
Clarification:
I know the bit width of the integers I would like to convert (please check the link to the TIOVX example above), but the integer representation type is not specified.
The intention is to write code that can be recompiled without changes on a system with another integer representation type and still correctly converts from int to float and/or back.
My claim is that the example source code above does exactly that (except that the example input data is hardcoded and it would have to be different if the integer representation type were not two's complement). Am I right? Could such a "portable" solution be written also with a different (more elegant/canonical) technique?
Your question is ambiguous as to whether you intend to truly store odd-bit integers, or odd-bit floats represented by custom-encoded odd-bit integers. I'm assuming by "not knowing" the bit-width of the integer, that you mean that the bit-width isn't known at compile time, but is discovered at runtime as your custom values are parsed from a file, for example.
Edit by author of original post:
The assumption in the original question that the presented code is independent from the actual integer representation type, is wrong (as explained in the comments). Integer types are not specified, for example it is not clear that the leftmost bit is the sign bit. Therefore the presented code also contains assumptions, they are just different (and most probably worse) than the assumption "integer representation type is two's complement".
Here's a simple example of storing an odd-bit integer. I provide a simple struct that lets you decide how many bits are in your integer. However, for simplicity in this example, I used uint8_t, which obviously has a maximum of 8 bits. There are several different assumptions and simplifications made here, so if you want help on any specific nuance, please specify more in the comments and I will edit this answer.
One key detail is to properly mask off your n-bit integer after performing 2's complement conversions.
Also please note that I have basically ignored overflow concerns and bit-width switching concerns that may or may not be a problem depending on how you intend to use your custom-width integers and the maximum bit-width you intend to support.
#include <cstdint> // uint8_t
#include <iostream>
#include <string>
struct CustomInt {
int bitCount = 7;
uint8_t value;
uint8_t mask = 0;
CustomInt(int _bitCount, uint8_t _value) {
bitCount = _bitCount;
value = _value;
mask = 0;
for (int i = 0; i < bitCount; ++i) {
mask |= (1 << i);
}
}
bool isNegative() {
return (value >> (bitCount - 1)) & 1;
}
int toInt() {
bool negative = isNegative();
uint8_t tempVal = value;
if (negative) {
tempVal = ((~tempVal) + 1) & mask;
}
int ret = tempVal;
return negative ? -ret : ret;
}
float toFloat() {
return toInt(); //Implied truncation!
}
void setFromFloat(float f) {
int intVal = f; //Implied truncation!
bool negative = f < 0;
if (negative) {
intVal = -intVal;
}
value = intVal;
if (negative) {
value = ((~value) + 1) & mask;
}
}
};
int main() {
CustomInt test(7, 0b01001110); // -50. Would be 78 if this were a normal 8-bit integer
std::cout << test.toFloat() << std::endl;
}
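As a complement to the struct above, here is a minimal sketch of decoding and encoding an n-bit field with plain arithmetic instead of invert-and-add-one. It assumes, as the edit to the question concedes one must, that the stored field is two's complement. The function names `decodeField` and `encodeField` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads the low `bits` bits of `raw` as a
// two's-complement number (assumption: 1 <= bits <= 16).
std::int32_t decodeField(std::uint16_t raw, int bits)
{
    assert(bits >= 1 && bits <= 16);
    const std::uint32_t span  = std::uint32_t{1} << bits; // 2^bits
    const std::uint32_t field = raw & (span - 1);          // drop unrelated high bits
    const std::int32_t value  = static_cast<std::int32_t>(field);
    // Values in the upper half of the range represent value - 2^bits.
    return field >= (span >> 1) ? value - static_cast<std::int32_t>(span) : value;
}

// Hypothetical helper: stores `value` as a `bits`-wide two's-complement field.
// Conversion to an unsigned type is modular, so negative inputs are well defined.
std::uint16_t encodeField(std::int32_t value, int bits)
{
    assert(bits >= 1 && bits <= 16);
    const std::uint32_t span = std::uint32_t{1} << bits;
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(value) & (span - 1));
}
```

Converting to float is then an ordinary `static_cast<float>(decodeField(raw, 7))`, and the way back goes through `encodeField(static_cast<std::int32_t>(f), 7)`, which truncates toward zero.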

Converting 24 bit integer (2s complement) to 32 bit integer in C++

The dataFile.bin is a binary file with 6-byte records. The first 3
bytes of each record contain the latitude and the last 3 bytes contain
the longitude. Each 24 bit value represents radians multiplied by
0X1FFFFF
This is a task I've been working on. I haven't done C++ in years, so it's taking me way longer than I thought it would. After googling around I saw this algorithm, which made sense to me.
int interpret24bitAsInt32(byte[] byteArray) {
int newInt = (
((0xFF & byteArray[0]) << 16) |
((0xFF & byteArray[1]) << 8) |
(0xFF & byteArray[2])
);
if ((newInt & 0x00800000) > 0) {
newInt |= 0xFF000000;
} else {
newInt &= 0x00FFFFFF;
}
return newInt;
}
The problem is a syntax issue: I am restricted to working within the way the other guy programmed this. I am not understanding how I can store the CHAR "data" into an INT. Wouldn't it make more sense if "data" was an Array? Since its receiving 24 integers of information stored into a BYTE.
double BinaryFile::from24bitToDouble(char *data) {
int32_t iValue;
// ****************************
// Start code implementation
// Task: Fill iValue with the 24bit integer located at data.
// The first byte is the LSB.
// ****************************
//iValue +=
// ****************************
// End code implementation
// ****************************
return static_cast<double>(iValue) / FACTOR;
}
bool BinaryFile::readNext(DataRecord &record)
{
const size_t RECORD_SIZE = 6;
char buffer[RECORD_SIZE];
m_ifs.read(buffer,RECORD_SIZE);
if (m_ifs) {
record.latitude = toDegrees(from24bitToDouble(&buffer[0]));
record.longitude = toDegrees(from24bitToDouble(&buffer[3]));
return true;
}
return false;
}
double BinaryFile::toDegrees(double radians) const
{
static const double PI = 3.1415926535897932384626433832795;
return radians * 180.0 / PI;
}
I appreciate any help or hints; even if you don't fully understand the task, a clue or hint will help me a lot. I just need to talk to someone.
I am not understanding how I can store the CHAR "data" into an INT.
Since char is a numeric type, there is no problem combining them into a single int.
Since its receiving 24 integers of information stored into a BYTE
It's 24 bits, not bytes, so there are only three integer values that need to be combined.
An easier way of producing the same result without using conditionals is as follows:
int interpret24bitAsInt32(byte[] byteArray) {
return (
(byteArray[0] << 24)
| (byteArray[1] << 16)
| (byteArray[2] << 8)
) >> 8;
}
The idea is to store the three bytes supplied as an input into the upper three bytes of the four-byte int, and then shift it down by one byte. This way the program would sign-extend your number automatically, avoiding conditional execution.
Note on portability: This code is not portable, because it assumes 32-bit integer size. To make it portable use <cstdint> types:
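int32_t interpret24bitAsInt32(const std::array<uint8_t, 3>& byteArray) {
    // Assemble in unsigned arithmetic, then convert and shift right to sign-extend.
    return static_cast<int32_t>(
        (static_cast<uint32_t>(byteArray[0]) << 24)
        | (static_cast<uint32_t>(byteArray[1]) << 16)
        | (static_cast<uint32_t>(byteArray[2]) << 8)
    ) >> 8;
}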
It also assumes that the most significant byte of the 24-bit number is stored in the initial element of byteArray, then comes the middle element, and finally the least significant byte.
Note on sign extension: This code automatically takes care of sign extension by constructing the value in the upper three bytes and then shifting it to the right, as opposed to constructing the value in the lower three bytes right away. This additional shift operation ensures that C++ takes care of sign-extending the result for us.
When an unsigned char is cast to an int, the higher-order bits are filled with 0s.
When a signed char is cast to an int, the sign bit is extended.
ie:
int x;
char y;
unsigned char z;
y=0xFF;
z=0xFF;
x=y;
/*x will be 0xFFFFFFFF*/
x=z;
/*x will be 0x000000FF*/
So your algorithm uses 0xFF as a mask to remove C's sign extension, i.e.
0xFF == 0x000000FF
0xABCDEF10 & 0x000000FF == 0x00000010
Then uses bit shifts and logical ands to put the bits in their proper place.
Lastly, it checks the most significant bit of the 24-bit value, (newInt & 0x00800000) > 0, to decide whether to fill the highest byte with 0s or 1s.
int32_t upperByte = ((int32_t) dataRx[0] << 24);
int32_t middleByte = ((int32_t) dataRx[1] << 16);
int32_t lowerByte = ((int32_t) dataRx[2] << 8);
int32_t ADCdata32 = (((int32_t) (upperByte | middleByte | lowerByte)) >> 8); // Right-shift of signed data maintains signed bit
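The answers above assume the most significant byte comes first, whereas the task comment in the question says the first byte is the LSB. A minimal sketch for that byte order follows, assuming the three bytes hold a two's-complement value. The helper name `read24LittleEndian` is hypothetical and not part of the original BinaryFile class.

```cpp
#include <cstdint>

// Hypothetical helper: reads a 24-bit two's-complement value whose first
// byte is the least significant one.
std::int32_t read24LittleEndian(const char* data)
{
    // Go through unsigned char first so a signed char is not sign-extended.
    const std::uint32_t b0 = static_cast<unsigned char>(data[0]);
    const std::uint32_t b1 = static_cast<unsigned char>(data[1]);
    const std::uint32_t b2 = static_cast<unsigned char>(data[2]);

    const std::uint32_t raw = b0 | (b1 << 8) | (b2 << 16);

    // raw is below 2^24, so the conversion to int32_t cannot overflow.
    std::int32_t value = static_cast<std::int32_t>(raw);
    if (raw & 0x00800000u)
    {
        value -= std::int32_t{1} << 24; // bit 23 set: the value is raw - 2^24
    }
    return value;
}
```

Inside from24bitToDouble, the placeholder could then become `iValue = read24LittleEndian(data);`.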

How to convert an array of bits to a char

I am trying to edit each byte of a buffer by modifying the LSB(Least Significant Bit) according to some requirements.
I am using the unsigned char type for the bytes, so please let me know IF that is correct/wrong.
unsigned char buffer[MAXBUFFER];
Next, i'm using this function
char *uchartob(char s[9], unsigned char u)
which modifies and returns the first parameter as an array of bits. This function works just fine, as the bits in the array represent the second parameter.
Here's where the hassle begins. I am going to point out what I'm trying to do step by step so you guys can let me know where i'm taking the wrong turn.
I am saving the result of the above function (called for each element of the buffer) in a variable
char binary_byte[9]; // array of bits
I am testing the LSB by simply comparing it to some flag, as follows.
if (binary_byte[7]==bit_flag) // i go on and modify it like this
binary_byte[7]=0; // or 1, depending on the case
Next, I'm trying to convert the array of bits binary_byte (it is an array of bits, isn't it?) back into a byte/unsigned char and update the data in the buffer at the same time. I hope I am making myself clear enough, as I am really confused at the moment.
buffer[position_in_buffer]=binary_byte[0]<<7| // update the current byte in the buffer
binary_byte[1]<<6|
binary_byte[2]<<5|
binary_byte[3]<<4|
binary_byte[4]<<3|
binary_byte[5]<<2|
binary_byte[6]<<1|
binary_byte[7];
Keep in mind that the bit at the position binary_byte[7] may be modified, that's the point of all this.
The solution is not really elegant, but it's working, even though I am really unsure about what I did (I tried to do it with bitwise operators but without success).
The weird thing is when I am trying to print the updated character from the buffer. It has the same bits as the previous character, but it's a completely different one.
My final question is: what effect does changing only the LSB in a byte have? What should I expect? As you can see, I'm getting only "new" characters even when I shouldn't.
So I'm still a little unsure what you are trying to accomplish here but since you are trying to modify individual bits of a byte I would propose using the following data structure:
union bit_byte
{
struct{
unsigned bit0 : 1;
unsigned bit1 : 1;
unsigned bit2 : 1;
unsigned bit3 : 1;
unsigned bit4 : 1;
unsigned bit5 : 1;
unsigned bit6 : 1;
unsigned bit7 : 1;
} bits;
unsigned char all;
};
This will allow you to access each bit of your byte and still get your byte representation. Here is some quick sample code:
bit_byte myValue;
myValue.bits.bit0 = 1; // Set the LSB
// Test the LSB
if(myValue.bits.bit0 == 1) {
myValue.bits.bit7 = 1;
}
printf("%i", myValue.all);
bitwise:
set bit => a |= 1 << x;
reset bit => a &= ~(1 << x);
bit check => a & (1 << x);
flip bit => a ^= (1 << x);
If you can not manage this you can always use std::bitset.
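As a rough illustration of the std::bitset route, the sketch below sets or clears only the least significant bit of a byte. The function name `withLsbBitset` is made up for this example.

```cpp
#include <bitset>

// Hypothetical helper: returns `byte` with its least significant bit
// replaced by `bit`, using std::bitset instead of manual masks.
unsigned char withLsbBitset(unsigned char byte, bool bit)
{
    std::bitset<8> bits(byte); // position 0 is the least significant bit
    bits.set(0, bit);          // set(pos, value) writes a single bit
    return static_cast<unsigned char>(bits.to_ulong());
}
```

Because only bit 0 changes, the result differs from the input by at most 1: 'A' (0x41) becomes '@' (0x40) when the bit is cleared, and 'B' (0x42) becomes 'C' (0x43) when it is set.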
Helper macros:
#define SET_BIT(where, bit_number) ((where) |= 1 << (bit_number))
#define RESET_BIT(where, bit_number) ((where) &= ~(1 << (bit_number)))
#define FLIP_BIT(where, bit_number) ((where) ^= 1 << (bit_number))
#define GET_BIT_VALUE(where, bit_number) (((where) & (1 << (bit_number))) >> (bit_number)) // this will return 0 or 1
Helper application to print bits:
#include <iostream>
#include <cstdint>
#define GET_BIT_VALUE(where, bit_number) (((where) & (1 << (bit_number))) >> (bit_number))
template<typename T>
void print_bits(T const& value)
{
for(uint8_t bit_count = 0;
bit_count < (sizeof(T)<<3);
++bit_count)
{
std::cout << GET_BIT_VALUE(value, bit_count) << std::endl;
}
}
int main()
{
unsigned int f = 8;
print_bits(f);
}
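Applied to the buffer from the question, the set and reset operations listed above make the intermediate array of bits unnecessary. This is only a sketch: `setLsb` is a hypothetical name, and the buffer is assumed to be an array of unsigned char as in the question.

```cpp
#include <cstddef>

// Hypothetical helper: overwrites the least significant bit of buffer[pos]
// in place and leaves the other seven bits untouched.
void setLsb(unsigned char* buffer, std::size_t pos, bool bit)
{
    const unsigned cleared = buffer[pos] & ~1u; // reset bit 0
    buffer[pos] = static_cast<unsigned char>(cleared | (bit ? 1u : 0u));
}
```

A byte modified this way changes its numeric value by at most 1, so a printed character can only turn into its immediate neighbour in the character set.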

How to get values from unaligned memory in a standard way?

I know C++11 has some standard facilities which would allow to get integral values from unaligned memory. How could something like this be written in a more standard way?
template <class R>
inline R get_unaligned_le(const unsigned char p[], const std::size_t s) {
R r = 0;
for (std::size_t i = 0; i < s; i++)
r |= static_cast<R>(*p++ & 0xff) << (i * 8); // take the low 8 bits of the char and widen to R before shifting
return r;
}
To take the values stored in little-endian order, you can then write:
uint_least16_t value1 = get_unaligned_le<uint_least16_t > (&buffer[0], 2);
uint_least32_t value2 = get_unaligned_le<uint_least32_t > (&buffer[2], 4);
How did the integral values get into the unaligned memory to begin with?
If they were memcpyed in, then you can use memcpy to get them out.
If they were read from a file or the network, you have to know their
format: how they were written to begin with. If they are four byte
big-endian 2s complement (the usual network format), then something
like:
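// Assumes native unsigned int is at least 32 bits...
unsigned
getNetworkInt( unsigned char const* buffer )
{
    // Convert each byte to unsigned before shifting so the shift
    // cannot overflow a signed int.
    return static_cast<unsigned>(buffer[0]) << 24
        | static_cast<unsigned>(buffer[1]) << 16
        | static_cast<unsigned>(buffer[2]) << 8
        | static_cast<unsigned>(buffer[3]);
}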
This will work for any unsigned type, provided the type you're aiming
for is at least as large as the type you input. For signed, it depends
on just how portable you want to be. If all of your potential target
machines are 2's complement, and will have an integral type with the
same size as your input type, then you can use exactly the same code as
above. If your native machine is a 1's complement 36 bit machine (e.g.
a Unisys mainframe), and you're reading signed network format integers
(32 bit 2's complement), you'll need some additional logic.
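One possible shape for that additional logic, as a sketch only: assemble the bytes into an unsigned value and then map it to a signed one with arithmetic that never leaves the signed range. The name `readBigEndianInt32` is hypothetical, and the code assumes the platform provides std::int32_t and 8-bit bytes.

```cpp
#include <cstdint>

// Hypothetical helper: reads a 32-bit big-endian two's-complement integer
// without relying on an out-of-range unsigned-to-signed conversion.
std::int32_t readBigEndianInt32(const unsigned char* buffer)
{
    const std::uint32_t raw =
          (static_cast<std::uint32_t>(buffer[0]) << 24)
        | (static_cast<std::uint32_t>(buffer[1]) << 16)
        | (static_cast<std::uint32_t>(buffer[2]) << 8)
        |  static_cast<std::uint32_t>(buffer[3]);

    if (raw <= 0x7FFFFFFFu)
    {
        return static_cast<std::int32_t>(raw); // non-negative, fits as is
    }
    // Here raw encodes raw - 2^32. Since 0xFFFFFFFF - raw lies in
    // [0, 2^31 - 1], the subtraction below stays inside the int32_t range.
    const std::uint32_t distance = 0xFFFFFFFFu - raw;
    return -1 - static_cast<std::int32_t>(distance);
}
```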
As always, create the desired variable and populate it byte-wise:
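#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
template <typename R>
R get(const unsigned char * p, std::size_t len = sizeof(R))
{
    // The type requirement is known at compile time; the length is not.
    static_assert(std::is_trivially_copyable<R>::value, "R must be trivially copyable");
    assert(len >= sizeof(R));
    R result;
    // A pointer to R must be reinterpreted to address the object's bytes.
    std::copy(p, p + sizeof(R), reinterpret_cast<unsigned char *>(&result));
    return result;
}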
This only works universally for trivially copyable types, though you can probably use it for non-trivial types if you have additional guarantees from elsewhere.
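The memcpy route mentioned in the other answer looks much the same. A minimal sketch, with `loadUnaligned` as a made-up name, for the case where the bytes were written by the same program or by one with the same byte order and object layout:

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies sizeof(T) bytes from a possibly unaligned
// address into a properly aligned object of type T.
template <typename T>
T loadUnaligned(const void* p)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}
```

This does not convert byte order; data from a file or the network with a fixed format still needs the explicit shift-based decoding shown earlier.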

Writing values as an arbitrary amount of bits into a byte buffer in C++

Hey, I need to pack bit values into a byte buffer in C++. My Buffer class has a char array and a position, similar to Java's ByteBuffer. I need a good way to pack bits into this buffer, like so:
void put_bits(int amount, uint32_t value);
It needs to support up to 32 bits. I've seen a solution implemented in Java (that requires start/end access methods before bits can be packed) but I'm not sure how to do this in C++ because the endianness and other low level factors aren't hidden like they are in Java.
I have an inline function declared as endianness() which returns 0 (defined as BIG_ENDIAN) or 1 (defined as LITTLE_ENDIAN) that can be used, but I'm just not sure how to properly pack bits into a byte buffer.
This is the Java version of what I need to implement:
public void writeBits(int numBits, int value) {
int bytePos = bitPosition >> 3;
int bitOffset = 8 - (bitPosition & 7);
bitPosition += numBits;
for(; numBits > bitOffset; bitOffset = 8) {
buffer[bytePos] &= ~ bitMaskOut[bitOffset];
buffer[bytePos++] |= (value >> (numBits-bitOffset)) & bitMaskOut[bitOffset];
numBits -= bitOffset;
}
if(numBits == bitOffset) {
buffer[bytePos] &= ~ bitMaskOut[bitOffset];
buffer[bytePos] |= value & bitMaskOut[bitOffset];
}
else {
buffer[bytePos] &= ~ (bitMaskOut[numBits]<<(bitOffset - numBits));
buffer[bytePos] |= (value&bitMaskOut[numBits]) << (bitOffset - numBits);
}
}
Which requires these two methods as well:
public void initBitAccess() {
bitPosition = currentOffset * 8;
}
public void finishBitAccess() {
currentOffset = (bitPosition + 7) / 8;
}
How should I go about solving this? Thanks.
EDIT: I also still need to be able to write normal bytes before and after writing bits.
Just remove all the public keywords, and I would say that you have your C++ implementation right there.
As long as you use the byte buffer only as such, you can translate the Java code one-to-one. It only gets dangerous if you interpret a byte pointer as another type and try to store a complete int in the byte buffer.
You don't even need the endianness function in this case, since you store a byte in a byte buffer, and there is nothing to convert or adjust size or whatever.
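To make that concrete, here is a small C++ sketch. It is not a line-by-line port of the Java method: it writes one bit at a time, which is slower but easier to check, and it keeps the same most-significant-bit-first layout. The class `BitBuffer` and its member names are hypothetical stand-ins for the asker's Buffer class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the asker's Buffer class (not the original code).
class BitBuffer
{
public:
    // Appends the low `amount` bits of `value`, most significant bit first.
    void put_bits(int amount, std::uint32_t value)
    {
        assert(amount >= 0 && amount <= 32);
        for (int i = amount - 1; i >= 0; --i)
        {
            const std::size_t bytePos = bitPosition_ / 8;
            if (bytePos >= bytes_.size())
            {
                bytes_.push_back(0); // grow by one zeroed byte when needed
            }
            const unsigned shift = 7u - static_cast<unsigned>(bitPosition_ % 8);
            const unsigned bit = static_cast<unsigned>((value >> i) & 1u);
            bytes_[bytePos] = static_cast<unsigned char>(
                (bytes_[bytePos] & ~(1u << shift)) | (bit << shift));
            ++bitPosition_;
        }
    }

    // Writes a whole byte after padding to the next byte boundary,
    // which plays the role of finishBitAccess() in the Java version.
    void put_byte(unsigned char byte)
    {
        bitPosition_ = (bitPosition_ + 7) / 8 * 8;
        put_bits(8, byte);
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
    std::size_t bitPosition_ = 0;
};
```

Writing 3 bits (0b101), then 7 bits (all ones), then a normal byte 0xAB yields the bytes 0xBF, 0xC0, 0xAB. Only individual bytes are touched, so the endianness() function is never consulted.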