I have a multi-byte primitive type called s32 which I want to read from a byte array.
The specifications are:
It is a 32-bit signed integer value, stored in little-endian order.
Negative integers are represented using 2's complement.
It uses 1 to 5 bytes depending on the magnitude. Each byte contributes its low seven bits to the value. If the high (8th) bit is set, then the next byte is also a part of the value.
Sign extension is applied: the seventh bit of the last byte of the encoding is propagated to fill out the 32 bits of the decoded value.
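For example, the byte sequence 0xFF 0x7F decodes as 0x7F | (0x7F << 7) = 0x3FFF; the seventh bit (0x40) of the last byte is set, so the value is sign-extended to 0xFFFFFFFF, i.e. -1.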
For the case of u32 (unsigned 32-bit, same encoding without sign extension) I came up with this (any comments welcome!), but I'm not sure how to modify it for s32.
u32 temp = 0;   // unsigned, so the shifted byte can't overflow into the sign bit
u32 value = 0;
size_t index = 0;
for(int i = 0; i < 5; i++)
{
    if(i < 4)
    {
        temp = 0x7F & buffer[index];  // each byte contributes its low seven bits
    }
    else
    {
        temp = 0x0F & buffer[index];  // the fifth byte holds only the last four bits
    }
    value |= temp << (7 * i);
    if(!(0x80 & buffer[index])) break;  // high bit clear: this was the last byte
    ++index;
}
Thanks everyone!
Building on your loop, the following replacement for the final break should do the trick:
if(!(0x80 & buffer[index]))
{
    // sign bit (0x40) of the last byte set: propagate it through the upper bits
    if(i < 4 && (0x40 & buffer[index]))
        value |= 0xFFFFFFFFu << (7 * (i + 1));
    break;
}
Note that host endianness doesn't come into it: the value is assembled with shifts and masks rather than by overlaying memory, so the same code works on little-endian and big-endian systems alike.
I posted a SIGN_EXTEND macro in an answer to this question. For your code, I'd change your u32 value to s32 value, and apply SIGN_EXTEND after the loop. Since index holds the zero-based position of the last byte, the encoding occupies (index + 1) * 7 bits:
// after the loop exits
SIGN_EXTEND(value, (index + 1) * 7, u32);
Using the accepted answer for that question instead, you'd say:
if(value >= (1u << ((index + 1) * 7 - 1)))
    value -= 1u << ((index + 1) * 7);
(For a full five-byte encoding the value already fills all 32 bits, so no adjustment is needed.)
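Putting the pieces together, here is a minimal self-contained sketch of the whole s32 decode (decode_s32 and the consumed out-parameter are my own inventions; the function trusts the buffer to hold a well-formed encoding):
#include <stdint.h>
#include <stddef.h>

typedef uint32_t u32;
typedef int32_t s32;

s32 decode_s32(const uint8_t *buffer, size_t *consumed)
{
    u32 value = 0;
    size_t index = 0;
    for (int i = 0; i < 5; i++)
    {
        u32 temp = (i < 4 ? 0x7Fu : 0x0Fu) & buffer[index]; // low seven bits (four from the fifth byte)
        value |= temp << (7 * i);
        if (!(0x80 & buffer[index]))
        {
            // sign-extend: propagate bit 6 of the final byte through the upper bits
            if (i < 4 && (0x40 & buffer[index]))
                value |= 0xFFFFFFFFu << (7 * (i + 1));
            break;
        }
        ++index;
    }
    *consumed = index + 1;
    return (s32)value;
}
For example, decoding the bytes { 0xFF, 0x7F } returns -1 and consumes two bytes, matching the worked example above.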
My task seems simple: I need to calculate the minimum number of bytes required to represent an integer (for example, if the integer is 5 I would like to return 1; if it is 300 I would like to return 2). I'm not referring to the data type int, which is, as pointed out in comments, always just sizeof(int); I'm referring to a mathematical integer. And I almost have a solution. Here is my code:
int i;
cin >> i;
int length = 0;
while (i != 0) {
    i >>= 8;
    length++;
}
The problem is that this doesn't work for negative numbers (I have not been able to determine why), or for some positive numbers where the most significant bit of the top byte is set (because the sign bit then makes the value one byte larger)... Are there any hints or advice you can give me on how to account for those cases?
Stored as a single byte,
Positive numbers are in the range 0x00 to 0x7F
Negative numbers are in the range 0x80 to 0xFF
As 2-bytes,
Positive numbers are in the range 0x0000 to 0x7FFF
Negative numbers are in the range 0x8000 to 0xFFFF
As 4-bytes,
Positive numbers are in the range 0x00000000 to 0x7FFFFFFF
Negative numbers are in the range 0x80000000 to 0xFFFFFFFF
You can use a function like the following to get the minimum size:
int getmin(int64_t i)
{
    if(i == (int8_t)(i & 0xFF))
        return 1;
    if(i == (int16_t)(i & 0xFFFF))
        return 2;
    if(i == (int32_t)(i & 0xFFFFFFFF))
        return 4;
    return 8;
}
Then, for example, a single byte 0x80 is read back as -128, while 0x7F is 127; to store +128 you need two bytes (0x0080) so that the top bit is clear.
Note that this scheme doesn't accommodate storing numbers in three bytes; supporting sizes other than 1, 2, 4 and 8 would mean making up your own format, which is difficult and rather pointless, so it is best avoided.
The range of signed numbers that can be stored in x bytes in 2's complement is -2^(8*x-1) to 2^(8*x-1)-1. For example, 1 byte can store signed integers from -128 to 127. Your example would incorrectly calculate that only 1 byte is needed to represent 128 (if we are talking about signed numbers): right shifting by 8 makes it zero, but that last byte is required to show that this is not a negative number.
For handling negatives, flipping the bits (~i, which equals -i - 1; the minus one is because the negative range holds one extra value) turns the number into a non-negative one that can safely be right-shifted:
int i;
cin >> i;
unsigned bytes = 1;
unsigned max = 128;
if (i < 0) {
    i = ~i; // ~i == -i - 1
}
while (max <= i) {
    i >>= 8;
    bytes++;
}
cout << bytes;
Another option is to use __builtin_clz() if you are using gcc (clang has it too). It returns the number of leading zero bits, from which you can compute the minimum number of bytes directly.
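A minimal sketch of that approach, assuming a 32-bit int (min_bytes is my own name; note that __builtin_clz(0) is undefined, hence the early return):
#include <iostream>

unsigned min_bytes(int i) {
    unsigned u = (i < 0) ? ~(unsigned)i : (unsigned)i; // ~i maps the negative range onto the same scale
    if (u == 0) return 1;                              // __builtin_clz(0) is undefined
    unsigned value_bits = 32 - __builtin_clz(u);       // index of the highest set bit, plus one
    return (value_bits + 1 + 7) / 8;                   // +1 for the sign bit, rounded up to whole bytes
}

int main() {
    std::cout << min_bytes(5) << ' ' << min_bytes(300) << '\n'; // prints 1 2
}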
Can someone please explain what this code is doing? I have to interpret this code and use it for a checksum, but I am not sure it is absolutely correct. In particular, how do the overflows work, and what do *cp, const char* cp and sum & 0xFFFF mean? The basic idea was to take an input string from the user and convert it to binary form, 16 bits at a time, then add all the 16-bit words together (in binary) to get a 16-bit sum. If there is any overflow bit in the addition, add it to the LSB of the final sum. Then take a ones' complement of the result.
How close is this code to doing the above?
unsigned int packet::calculateChecksum()
{
    unsigned int c = 0;
    int i;
    string j;
    int k;
    cout << "enter a message" << message;
    getline(cin, message); // Some string.
    //std::string message =
    std::vector<uint16_t> bitvec;
    const char* cp = message.c_str() + 1;
    while (*cp) {
        uint16_t bits = *(cp-1)>>8 + *(cp);
        bitvec.push_back(bits);
        cp += 2;
    }
    uint32_t sum = 0;
    uint16_t overflow = 0;
    uint32_t finalsum = 0;
    // Compute the sum. Let overflows accumulate in upper 16 bits.
    for (auto j = bitvec.begin(); j != bitvec.end(); ++j)
        sum += *j;
    // Now fold the overflows into the lower 16 bits. Loop until no overflows.
    do {
        sum = (sum & 0xFFFF) + (sum >> 16);
    } while (sum > 0xFFFF);
    // Return the 1s complement sum in finalsum
    finalsum = 0xFFFF & sum;
    //cout << "the finalsum is" << c;
    c = finalsum;
    return c;
}
I see several issues in the code:
cp is a pointer into the zero-terminated char array holding the input message. The while (*cp) test is a problem: cp is incremented by 2 inside the loop body, so it can easily step right over the terminating \0 of the char array (e.g. when the input message has 2 characters) and read past the end of the buffer, resulting in a segmentation fault.
*(cp) and *(cp-1) fetch two neighbouring characters (bytes) of the input message. But why is the two-byte word formed as *(cp-1)>>8 + *(cp)? It would make sense to form the 16-bit word as (*(cp-1)<<8) + *(cp), i.e. with the preceding character in the high byte and the following character in the low byte. (Also note that + binds more tightly than the shift operators, so the original expression actually parses as *(cp-1) >> (8 + *cp), which surely isn't intended.)
To answer your question: sum & 0xFFFF just computes a number whose upper 16 bits are zero and whose lower 16 bits are the same as in sum; 0xFFFF is a bit mask.
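For example, if sum is 0x1A2B4, then sum & 0xFFFF is 0xA2B4 and sum >> 16 is 0x1, so one folding step gives 0xA2B5.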
The funny thing is that even the above code might not do exactly what you stated as the requirement. But as long as the sending and the receiving party use the same piece of incorrect code, checksum creation and verification will both pass, because the two ends are consistent with each other. :)
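For reference, a minimal sketch of the checksum described in the question, with the two fixes above applied, plus the final ones' complement that the original code's comment promises but never takes (internet_checksum is my own name, and treating a missing trailing byte as zero is an assumption):
#include <cstdint>
#include <string>

uint16_t internet_checksum(const std::string& message)
{
    uint32_t sum = 0;
    for (std::size_t i = 0; i < message.size(); i += 2) {
        uint16_t word = (uint8_t)message[i] << 8;   // preceding char in the high byte
        if (i + 1 < message.size())
            word |= (uint8_t)message[i + 1];        // following char in the low byte
        sum += word;                                // overflows collect above bit 15
    }
    while (sum > 0xFFFF)                            // fold the carries back into the low 16 bits
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;                          // ones' complement of the folded sum
}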
I have a vector<char> and I want to be able to get an unsigned integer from a range of bits within the vector, but I can't seem to write the correct operations to get the desired output. My intended algorithm goes like this:
& the first byte with (0xff >> the number of unused bits in the byte on the left)
<< the result left by (the number of output bytes * the number of bits in a byte)
| this with the final output
For each subsequent byte:
<< it left by ((byte width - index) * bits per byte)
| this byte with the final output
| the final byte (not shifted) with the final output
>> the final output right by the number of unused bits in the byte on the right
And here is my attempt at coding it, which does not give the correct result:
#include <vector>
#include <iostream>
#include <cstdint>
#include <bitset>

template<class byte_type = char>
class BitValues {
private:
    std::vector<byte_type> bytes;
public:
    static const auto bits_per_byte = 8;
    BitValues(std::vector<byte_type> bytes) : bytes(bytes) {
    }
    template<class return_type>
    return_type get_bits(int start, int end) {
        auto byte_start = (start - (start % bits_per_byte)) / bits_per_byte;
        auto byte_end = (end - (end % bits_per_byte)) / bits_per_byte;
        auto byte_width = byte_end - byte_start;
        return_type value = 0;
        unsigned char first = bytes[byte_start];
        first &= (0xff >> start % 8);
        return_type first_wide = first;
        first_wide <<= byte_width;
        value |= first_wide;
        for(auto byte_i = byte_start + 1; byte_i <= byte_end; byte_i++) {
            auto byte_offset = (byte_width - byte_i) * bits_per_byte;
            unsigned char next_thin = bytes[byte_i];
            return_type next_byte = next_thin;
            next_byte <<= byte_offset;
            value |= next_byte;
        }
        value >>= (((byte_end + 1) * bits_per_byte) - end) % bits_per_byte;
        return value;
    }
};

int main() {
    BitValues<char> bits(std::vector<char>({'\x78', '\xDA', '\x05', '\x5F', '\x8A', '\xF1', '\x0F', '\xA0'}));
    std::cout << bits.get_bits<unsigned>(15, 29) << "\n";
    return 0;
}
(In action: http://coliru.stacked-crooked.com/a/261d32875fcf2dc0)
I just can't seem to wrap my head around these bit manipulations, and I find debugging very difficult! If anyone can correct the above code, or help me in any way, it would be much appreciated!
Edit:
My bytes are 8 bits long
The integer to return could be 8, 16, 32 or 64 bits wide
The integer is stored in big-endian order
You made two primary mistakes. The first is here:
first_wide <<= byte_width;
You should be shifting by a bit count, not a byte count. Corrected code is:
first_wide <<= byte_width * bits_per_byte;
The second mistake is here:
auto byte_offset = (byte_width - byte_i) * bits_per_byte;
It should be
auto byte_offset = (byte_end - byte_i) * bits_per_byte;
The value in parentheses needs to be the number of bytes byte_i is away from the end, which is also the number of byte positions the byte must be shifted left by. The value byte_width - byte_i has no semantic meaning (one is a delta, the other is an index).
The rest of the code is fine, though the algorithm itself has two issues.
First, when using your result type to accumulate bits, you assume you have room on the left to spare. This isn't the case if there are set bits near the right boundary and the choice of range causes them to be shifted out. For example, try running
bits.get_bits<uint16_t>(11, 27);
You'll get the result 42, which corresponds to the bit string 00000000 00101010. The correct result is 53290, with the bit string 11010000 00101010. Notice how the rightmost 4 bits got zeroed out: the code starts off by overshifting the value variable, pushing those four bits out of the variable, so when it shifts back at the end they come back as zeros.
The second problem has to do with the right shift at the end. If the leftmost (sign) bit of the value variable happens to be 1 before that shift, and the template parameter is a signed type, then the shift performed is an 'arithmetic' right shift, which fills the vacated bits on the left with 1s, leaving you with an incorrect negative value.
For example, try running:
bits.get_bits<int16_t>(5, 21);
The expected result should be 6976 with the bit string 00011011 01000000, but the current implementation returns -1216 with the bit string 11111011 01000000.
I've put my implementation of this below which builds the bit string from the right to the left, placing bits in their correct positions to start with so that the above two problems are avoided:
template<class ReturnType>
ReturnType get_bits(int start, int end) {
    int max_bits = kBitsPerByte * sizeof(ReturnType);
    if (end - start > max_bits) {
        start = end - max_bits;
    }
    int inclusive_end = end - 1;
    int byte_start = start / kBitsPerByte;
    int byte_end = inclusive_end / kBitsPerByte;
    // Put in the partial byte on the right
    uint8_t first = bytes_[byte_end];
    int bit_offset = (inclusive_end % kBitsPerByte);
    first >>= 7 - bit_offset;
    bit_offset += 1;
    ReturnType ret = 0 | first;
    // Add the rest of the bytes
    for (int i = byte_end - 1; i >= byte_start; i--) {
        ReturnType tmp = (uint8_t) bytes_[i];
        tmp <<= bit_offset;
        ret |= tmp;
        bit_offset += kBitsPerByte;
    }
    // Mask out the partial byte on the left
    int shift_amt = (end - start);
    if (shift_amt < max_bits) {
        ReturnType mask = ((ReturnType)1 << shift_amt) - 1; // shift in the wide type
        ret &= mask;
    }
    return ret;
}
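Given the same byte vector as in the question, this version returns 53290 for get_bits<uint16_t>(11, 27) and 6976 for get_bits<int16_t>(5, 21), matching the expected bit strings above.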
There is one thing you certainly missed, I think: the way you index the bits in the vector differs from what you were given in the problem. With the algorithm you outlined, the order of the bits comes out as 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 | 23 22 21 .... Frankly, I didn't read through your whole algorithm, but this is missed in the very first step.
Interesting problem. I've done similar things for some systems work.
Is your char 8 bits wide, or 16? How big is your integer: 32 or 64 bits?
Ignore the vector complexity for a minute.
Think about it as just an array of bits.
How many bits do you have? You have 8 * (the number of chars).
You need to calculate a starting char, the number of bits to extract there, an ending char, the number of bits there, and the number of chars in the middle.
You will need bitwise-and (&) for the first partial char
You will need bitwise-and (&) for the last partial char
You will need left-shift (<<) or right-shift (>>), depending upon which order you start from
What is the endianness of your integer?
At some point you will calculate an index into your array of the form bitindex / char_bit_width. You gave the value 171 as your bitindex and 8 as your char_bit_width, so you will end up with these useful values calculated:
171/8 = 21 // location of the first byte
171%8 = 3 // bit offset within the first char/byte
8 - 171%8 = 5 // bits taken from the first char/byte
sizeof(integer) = 4
sizeof(integer) + ((171%8) > 0 ? 1 : 0) // how many array positions to examine
Some assembly required...
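To make that outline concrete, here is a minimal sketch under the question's assumptions (8-bit chars, big-endian bit order within the stream); read_bits is my own name, and it does no bounds checking:
#include <cstdint>
#include <vector>

uint32_t read_bits(const std::vector<char>& v, int bitindex, int nbits) {
    int first = bitindex / 8;                          // location of the first byte (171/8 = 21)
    int skip = bitindex % 8;                           // bits to drop from that byte (171%8 = 3)
    uint64_t acc = (uint8_t)v[first] & (0xFF >> skip); // mask off the leading unused bits
    int have = 8 - skip;                               // bits accumulated so far
    for (int i = first + 1; have < nbits; i++, have += 8)
        acc = (acc << 8) | (uint8_t)v[i];              // shift in whole bytes
    return (uint32_t)(acc >> (have - nbits));          // drop the trailing unused bits
}
For instance, read_bits(v, 171, 32) touches exactly the five array positions computed above.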
I have run into an interesting problem lately:
Let's say I have an array of bytes (uint8_t, to be exact) of length at least one. Now I need a function that will get a subsequence of bits from this array, starting with bit X (zero-based index, inclusive) and having length L, and will return this as a uint32_t. If L is smaller than 32, the remaining high bits should be zero.
Although this is not very hard to solve, my current thoughts on how to do it seem a bit cumbersome to me. I'm thinking of a table of all the possible masks for a given byte (start with bit 0-7, take 1-8 bits) and then constructing the number one byte at a time using this table.
Can somebody come up with a nicer solution? Note that I cannot use Boost or the STL for this (and no, it is not homework; it's a problem I ran into at work, and we do not use Boost or the STL in the code where this goes). You can assume that 0 < L <= 32 and that the byte array is large enough to hold the subsequence.
One example of correct input/output:
array: 00110011 1010 1010 11110011 01 101100
subsequence: X = 12 (zero based index), L = 14
resulting uint32_t = 00000000 00000000 00 101011 11001101
Only the first and last bytes in the subsequence involve some bit slicing to get the required bits out, while the intermediate bytes can be shifted in whole into the result. Here's some sample code; the bit-index arithmetic is the easiest place to slip up, so check it against the example above:
uint8_t bytes[]; /* the input array */
int X, L;
uint32_t result;
int startByte = X / 8,               /* starting byte number */
    startBit  = 7 - X % 8,           /* first bit's index within starting byte, from LSB */
    endByte   = (X + L - 1) / 8,     /* ending byte number */
    endBit    = 7 - (X + L - 1) % 8; /* last bit's index within ending byte, from LSB */
/* Special case where start and end are within the same byte:
   just take the bits from startBit down to endBit */
if (startByte == endByte) {
    uint8_t byte = bytes[startByte];
    result = (byte >> endBit) & ((1 << (startBit - endBit + 1)) - 1);
}
/* All other cases: take the low bits of the starting byte,
   all the bytes in between,
   and the high bits of the ending byte */
else {
    uint8_t byte = bytes[startByte];
    result = byte & ((1 << (startBit + 1)) - 1);
    for (int i = startByte + 1; i < endByte; i++)
        result = (result << 8) | bytes[i];
    byte = bytes[endByte];
    result = (result << (8 - endBit)) | (byte >> endBit);
}
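Checked against the example in the question (array 00110011 10101010 11110011 01101100 with X = 12, L = 14), this produces the expected 00 101011 11001101.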
Take a look at std::bitset and boost::dynamic_bitset.
I would be thinking something like loading the bytes into a uint64_t and then shifting left and right to lose the uninteresting bits.
uint32_t extract_bits(const uint8_t* bytes, int start, int count)
{
    bytes += start / 8;  /* skip whole bytes so the field lies in the 8-byte window */
    start %= 8;
    /* Assemble the window MSB-first; a direct uint64_t* cast would depend on
       host byte order and alignment, so build the value explicitly. */
    uint64_t hold = 0;
    for (int i = 0; i < 8; i++)
        hold = (hold << 8) | bytes[i];
    hold <<= start;      /* shift out the bits before the field */
    hold >>= 64 - count; /* right-align the field, zero-filling the top */
    return (uint32_t)hold;
}
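Note that this always reads eight whole bytes starting at the field's first byte, so the buffer needs enough padding after the subsequence for those reads to stay in bounds.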
For the sake of completeness, I'm adding my solution, inspired by the comments and answers here. Thanks to all who bothered to think about the problem.
static const uint8_t firstByteMasks[8] = { 0xFF, 0x7F, 0x3F, 0x1F, 0x0F, 0x07, 0x03, 0x01 };

uint32_t getBits( const uint8_t *buf, const uint32_t bitoff, const uint32_t len, const uint32_t bitcount )
{
    uint64_t result = 0;
    int32_t startByte = bitoff / 8;                  // starting byte number
    int32_t endByte = ((bitoff + bitcount) - 1) / 8; // ending byte number
    int32_t rightShift = 16 - ((bitoff + bitcount) % 8);

    if ( endByte >= len ) return -1;
    if ( rightShift == 16 ) rightShift = 8;

    result = buf[startByte] & firstByteMasks[bitoff % 8];
    result = result << 8;
    for ( int32_t i = startByte + 1; i <= endByte; i++ )
    {
        result |= buf[i];
        result = result << 8;
    }
    result = result >> rightShift;
    return (uint32_t)result;
}
A few notes: I tested the code and it seems to work just fine; however, there may be bugs, and if I find any I will update the code here. Also, there are probably better solutions!
I have an array of unsigned chars. Basically I have an array of bits.
I know that the first 16 bits correspond to an unsigned integer, and I retrieve its value using (u16)(*(buffer + 1) << 8 | *buffer)
Then comes a data type called u30 which is described as follows:
u30 - variable-length encoded 30-bit unsigned integer value. The variable encoding for u30 uses one to five bytes, depending on the magnitude of the value encoded. Each byte contributes its low seven bits to the value. If the high (8th) bit of a byte is set, then the next byte is also part of the value.
I don't understand this description: it says u30 (thirty!) and then it says 1 to 5 bytes? Also I have another data type called s24 - a three-byte signed integer value.
How should one read (retrieve the values of) such non-typical data types? Any help will be appreciated.
Thanks a lot!
i = 0;
val = buf[i] & 0x7F;     // low seven bits of the first byte
while (buf[i++] & 0x80)  // high bit set: the next byte is also part of the value
{
    val |= (buf[i] & 0x7F) << (i * 7);
}
Assuming I understand correctly (always a questionable matter), the following will read the value. It starts at position zero in this example (i would need to be offset by the actual position in the buffer):
unsigned int val;
unsigned char buf[300];
int i;
int shift;

i = 0;
buf[0] = 0x81;
buf[1] = 0x3;
val = 0;
shift = 0;
do
{
    val |= (0x7F & buf[i]) << shift;
    shift += 7;
    i++;
} while (( buf[i-1] & 0x80 ) && ( i < 5 ));

printf( "Val = %u\n", val );
The encoding format description is somewhat informal, perhaps, but it should be enough. The idea is that you read one byte (call it x), take its lowest 7 bits, x & 0x7F, and at the same time check whether its highest bit is set. You'll need to write a small loop that merges the 7-bit sequences into a uint variable until the current byte no longer has its highest bit set.
You will have to figure out whether to merge the new bits at the high end or the low end of the number (a = (a << 7) | (x & 0x7F) is the high-end variant). For that you need one test sequence of which you know the correct output.
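For example, with the low-end-first rule described here (the one the u30 answers above use), the two bytes 0xAC 0x02 should decode to 300: (0xAC & 0x7F) | (0x02 << 7) = 44 + 256.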
To read the variable-length 30-bit value, you could do something like this:
const unsigned char HIGH_BIT = 0x80;
const unsigned char DATA_MASK = 0x7F;
const unsigned char LAST_MASK = 0x03; // only need 2 bits of the last byte

unsigned char tmpValue = 0; // temporary holder for the value of a byte
int value = 0;              // holder for the actual value
char* ptr = buffer;         // assume buffer is at the start of the 30-bit number

for(int i = 0; i < 5; i++)
{
    if(i == 4)
    {
        tmpValue = LAST_MASK & *ptr;
    }
    else
    {
        tmpValue = DATA_MASK & *ptr;
    }
    value |= tmpValue << (7 * i);
    if(!(HIGH_BIT & *ptr))
    {
        break;
    }
    if(i != 4)
    {
        ++ptr;
    }
}
buffer = ptr; // advance the buffer afterwards.
@Mark: your answer was posted while I was typing this, and it would work except for the high byte: the value is only 30 bits, so only the low 2 bits of the fifth byte are part of the value, and your loop takes all 7 data bits of it.