General algorithm for reading n bits and padding with zeros - c++

I need a function to read n bits starting from bit x (bit indices start from zero), and if the result is not byte-aligned, pad it with zeros. The function will receive a uint8_t array as input, and should return a uint8_t array as well. For example, I have a file with the following contents:
1011 0011 0110 0000
Read three bits starting from the third bit (x=2, n=3); result:
1100 0000
There's no (theoretical) limit on the input and bit-pattern lengths.

Implementing such a bitfield extraction efficiently, beyond the direct bit-serial algorithm, isn't precisely hard but is a tad cumbersome.
Effectively it boils down to an inner loop reading a pair of bytes from the input for each output byte, shifting the resulting word into place based on the source bit offset, and writing back the upper or lower byte. In addition, the final output byte is masked based on the length.
Below is my (poorly-tested) attempt at an implementation:
#include <climits>
#include <cstddef>

void extract_bitfield(unsigned char *dstptr, const unsigned char *srcptr, size_t bitpos, size_t bitlen) {
    // Skip to the source byte covering the first bit of the range
    srcptr += bitpos / CHAR_BIT;
    // Similarly work out the expected, inclusive, final output byte
    unsigned char *endptr = &dstptr[bitlen / CHAR_BIT];
    // Truncate the bit-positions to offsets within a byte
    bitpos %= CHAR_BIT;
    bitlen %= CHAR_BIT;
    // Scan through and write out a correctly shifted version of every destination byte
    // via an intermediate shifter register
    unsigned long accum = *srcptr++;
    while (dstptr <= endptr) {
        accum = accum << CHAR_BIT | *srcptr++;
        *dstptr++ = accum << bitpos >> CHAR_BIT;
    }
    // Mask out the unwanted LSB bits not covered by the length
    *endptr &= ~(UCHAR_MAX >> bitlen);
}
Beware that the code above may read past the end of the source buffer, and somewhat messy special handling is required if you can't arrange for that overread to be harmless. It also assumes sizeof(long) != 1.
Of course, to get efficiency out of this you will want to use as wide a native word as possible. However, if the target buffer isn't necessarily word-aligned then things get even messier. Furthermore, little-endian systems will need byte-swizzling fix-ups.
Another subtlety to take heed of is the potential inability to shift a whole word, that is shift counts are frequently interpreted modulo the word length.
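For what it's worth, here is an equally untested sanity check against the example from the question, reusing extract_bitfield() from above:

#include <cstdio>

int main(void) {
    const unsigned char src[] = { 0xB3, 0x60 };  // 1011 0011 0110 0000
    unsigned char dst[1];
    extract_bitfield(dst, src, 2, 3);            // x = 2, n = 3
    printf("%02X\n", dst[0]);                    // prints C0, i.e. 1100 0000
    return 0;
}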
Anyway, happy bit-hacking!

Basically it's still a bunch of shift and addition operations.
I'll use a slightly larger example to demonstrate this.
Suppose we are given an input of 4 characters, and x = 10, n = 18.
00101011 10001001 10101110 01011100
First we need to locate the character that contains our first bit, via x / 8, which gives us 1 (the second character) in this case. We also need the offset within that character, via x % 8, which equals 2.
Now we can get the first character of the solution in three operations.
Left-shifting the second character 10001001 by 2 bits gives us 00100100.
Right-shifting the third character 10101110 by 6 bits (from 8 - 2) gives us 00000010.
Adding (or OR-ing; the set bits don't overlap) these two characters gives the first character of the result: 00100110.
Loop this routine for n / 8 rounds. And if n % 8 is not 0, extract that many bits from the next character; you can do this in many ways.
So in this example, our second round will give us 10111001, and in the last step we get 01 (bits 26 and 27); then pad the rest of the bits with 0s, giving 01000000.
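Put together, a minimal sketch of this routine might look as follows (an illustrative, lightly checked implementation; read_bits is a made-up name):

#include <cstdint>
#include <vector>

std::vector<uint8_t> read_bits(const std::vector<uint8_t>& in, size_t x, size_t n) {
    std::vector<uint8_t> out((n + 7) / 8, 0);
    size_t byte = x / 8;  // character containing the first bit
    size_t off  = x % 8;  // offset of the first bit within that character
    for (size_t i = 0; i < out.size(); ++i) {
        uint8_t b = in[byte + i] << off;         // upper part of the output byte
        if (off != 0 && byte + i + 1 < in.size())
            b |= in[byte + i + 1] >> (8 - off);  // lower part from the next character
        out[i] = b;
    }
    if (n % 8 != 0)
        out.back() &= uint8_t(0xFF << (8 - n % 8));  // pad trailing bits with 0s
    return out;
}

On the example above, read_bits({0x2B, 0x89, 0xAE, 0x5C}, 10, 18) yields {0x26, 0xB9, 0x40}, i.e. 00100110 10111001 01000000.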

16-bit to 10-bit conversion code explanation

I came across the following code to convert 16-bit numbers to 10-bit numbers and store it inside an integer. Could anyone maybe explain to me what exactly is happening with the AND 0x03?
// Convert the data to 10-bits
int xAccl = (((data[1] & 0x03) * 256) + data[0]);
if (xAccl > 511) {
    xAccl -= 1024;
}
Link to where I got the code: https://www.instructables.com/id/Measurement-of-Acceleration-Using-ADXL345-and-Ardu/
The bitwise operator & applies a mask; in this case, it clears the 6 highest bits of the byte.
Basically, this code does a modulo % 1024 (for unsigned values).
data[1] takes the 2nd byte; & 0x03 masks that byte with binary 11, i.e. keeps 2 bits; * 256 is the same as << 8, i.e. pushes those 2 bits into the 9th and 10th positions; adding data[0] then combines the two bytes (personally I'd have used |, not +).
So xAccl now holds the first 10 bits, with the bytes stored in little-endian order (low byte first).
The > 511 seems to be a sign check; essentially, it is saying "if the 10th bit is set, treat the entire thing as a negative integer as though we'd used 10-bit two's complement rules".
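In code, the same conversion can be written with shifts and an explicit sign extension; this is just an illustrative rewrite of the snippet above (convert10bit is a made-up helper name):

#include <cstdint>

int convert10bit(uint8_t lo, uint8_t hi) {
    int v = ((hi & 0x03) << 8) | lo;  // keep 2 bits of the high byte: range 0..1023
    if (v > 511)                      // 10th bit set, so the value is negative
        v -= 1024;                    // 10-bit two's complement sign extension
    return v;                         // final range: -512..511
}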

Use bit manipulation to convert a bit from each byte in an 8-byte number to a single byte

I have a 64-bit unsigned integer. I want to check the 6th bit of each byte and return a single byte representing those 6th bits.
The obvious, "brute force" solution is:
inline unsigned char Get6thBits(unsigned long long num) {
    unsigned char byte(0);
    for (int i = 7; i >= 0; --i) {
        byte <<= 1;
        // 0x20ULL so the shift by up to 56 bits happens in a 64-bit type
        byte |= bool((0x20ULL << 8 * i) & num);
    }
    return byte;
}
I could unroll the loop into a bunch of concatenated | statements to avoid the int allocation, but that's still pretty ugly.
Is there a faster, more clever way to do it? Maybe use a bitmask to get the 6th bits, 0x2020202020202020 and then somehow convert that to a byte?
If _pext_u64 is a possibility (this will work on Haswell and newer, it's very slow on Ryzen though), you could write this:
int extracted = _pext_u64(num, 0x2020202020202020);
This is a really literal way to implement it. pext takes a value (first argument) and a mask (second argument); at every position where the mask has a set bit, it takes the corresponding bit from the value, and all the selected bits are concatenated.
_mm_movemask_epi8 is more widely usable; you could use it like this:
__m128i n = _mm_set_epi64x(0, num);
int extracted = _mm_movemask_epi8(_mm_slli_epi64(n, 2));
pmovmskb takes the high bit of every byte in its input vector and concatenates them. The bits we want are not the high bit of every byte, so I move them up two positions with psllq (of course you could shift num directly). The _mm_set_epi64x is just some way to get num into a vector.
Don't forget to #include <intrin.h>, and none of this was tested.
Codegen seems reasonable enough.
A weirder option is gathering the bits with a multiplication: (only slightly tested)
int extracted = (num & 0x2020202020202020) * 0x08102040810204 >> 56;
The idea here is that num & 0x2020202020202020 only has very few bits set, so we can arrange a product that never carries into bits that we need (or indeed at all). The multiplier is constructed to do this:
a0000000b0000000c0000000d0000000e0000000f0000000g0000000h0000000 +
0b0000000c0000000d0000000e0000000f0000000g0000000h00000000000000 +
00c0000000d0000000e0000000f0000000g0000000h000000000000000000000 etc..
Then the top byte will have all the bits "compacted" together. The lower bytes actually have something like that too, but they're missing bits that would have to come from "higher" (bits can only move to the left in a multiplication).
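For reference, a small (untested, but believed-correct) harness comparing the multiplication trick against the brute-force loop from the question:

#include <cstdio>

unsigned char get6thBitsLoop(unsigned long long num) {
    unsigned char byte = 0;
    for (int i = 7; i >= 0; --i) {
        byte <<= 1;
        byte |= bool((0x20ULL << 8 * i) & num);
    }
    return byte;
}

unsigned char get6thBitsMul(unsigned long long num) {
    return (unsigned char)(((num & 0x2020202020202020ULL) * 0x08102040810204ULL) >> 56);
}

int main() {
    unsigned long long x = 0x1234567890ABCDEFULL;
    printf("%02X %02X\n", get6thBitsLoop(x), get6thBitsMul(x));  // should both print 55
    return 0;
}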

Unsure of normalising double values loaded as 2 bytes each

The code that I'm using for reading .wav file data into an 2D array:
int signal_frame_width = wavHeader.SamplesPerSec / 100; // 10ms frame
int total_number_of_frames = numSamples / signal_frame_width;
double** loadedSignal = new double *[total_number_of_frames]; // array that contains the whole signal
int iteration = 0;
int16_t* buffer = new int16_t[signal_frame_width];
while ((bytesRead = fread(buffer, sizeof(buffer[0]), signal_frame_width, wavFile)) > 0)
{
    loadedSignal[iteration] = new double[signal_frame_width];
    for (int i = 0; i < signal_frame_width; i++) {
        // value normalisation:
        int16_t c = (buffer[i + 1] << 8) | buffer[i];
        double normalisedValue = c / 32768.0;
        loadedSignal[iteration][i] = normalisedValue;
    }
    iteration++;
}
The problem is in this part; I don't exactly understand how it works:
int16_t c = (buffer[i + 1] << 8) | buffer[i];
It's an example taken from here.
I'm working with 16-bit .wav files only. As you can see, my buffer is loading (e.g. for a sampling freq. of 44.1 kHz) 441 elements, each a 2-byte signed sample. How should I change the above code?
The original example, from which you constructed your code, used an array where each individual element represented a byte. It therefore needs to combine two consecutive bytes into a 16-bit value, which is what this line does:
int16_t c = (buffer[i + 1] << 8) | buffer[i];
It shifts the byte at index i+1 (here assumed to be the most significant byte) left by 8 positions, and then ORs the byte at index i onto that. For example, if buffer[i+1]==0x12 and buffer[i]==0x34, then you get
buffer[i+1] << 8 == 0x12 << 8 == 0x1200
0x1200 | buffer[i] == 0x1200 | 0x34 == 0x1234
(The | operator is a bitwise OR.)
Note that you need to be careful whether your WAV file is little-endian or big-endian (but the original post explains that quite well).
Now, if you store the resulting value in a signed 16-bit integer, you get a value between −32768 and +32767. The point in the actual normalization step (dividing by 32768) is just to bring the value range down to [−1.0, 1.0).
In your case above, you appear to already be reading into a buffer of 16-bit values. Note that your code will therefore only work if the endianness of your platform matches that of the WAV file you are working with. But if this assumption is correct, then you don't need the code line which you do not understand. You can just convert every array element into a double directly:
double normalisedValue = buffer[i]/32768.0;
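Under that assumption, the whole inner loop reduces to something like this (a sketch using the variables from your snippet):

for (int i = 0; i < signal_frame_width; i++) {
    // each int16_t sample maps into [-1.0, 1.0)
    loadedSignal[iteration][i] = buffer[i] / 32768.0;
}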
If buffer were an array of bytes, then that piece of code would interpret two consecutive bytes as a single 16-bit integer (assuming little-endian encoding). The | operator performs a bitwise OR on the bits of the two bytes. Since we wish to interpret the two bytes as a single 2-byte integer, we must shift the bits of one of them 8 bits (1 byte) to the left. Which one depends on whether they are ordered in little-endian or big-endian order. Little-endian means that the least significant byte comes first, so we shift the second byte 8 bits to the left.
Example:
First byte: 0101 1100
Second byte: 1111 0100
Now shift second byte:
Second "byte": 1111 0100 0000 0000
First "byte": 0000 0000 0101 1100
Bitwise OR-operation (if either is 1, then 1. If both are 0, then 0):
16-bit integer: 1111 0100 0101 1100
In your case however, the bytes in your file have already been interpreted as 16-bit ints using whatever endianness the platform has, so you do not need this step. Note, though, that in order to correctly interpret the bytes in a file, one must assume the same byte order as they were written in. Therefore, one usually adds this step to ensure that the code works independently of the platform's endianness, relying instead on the expected byte order of the files (as most file formats specify what the byte order should be).
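As an illustration of that last point, a little-endian 16-bit sample could be decoded from raw bytes in a platform-independent way like this (a hypothetical helper, not from the original example):

#include <cstdint>

int16_t decode_le16(const uint8_t* p) {
    // assemble from explicit byte positions, so the result does not depend
    // on the endianness of the machine running the code
    return (int16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}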

Shift left/right adding zeroes/ones and dropping first bits

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ operator <<, zeroes are pushed in from behind but the leading bits aren't dropped... For example
shifLeftAddingZeroes(10001,1)
returns 100010 instead of 00010 that is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeroes than what you see; ergo your number is most likely 16 bits, meaning 00000000 00010001. Maybe try AND-ing it with a number having as many 1s as the number of bits you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
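For example (a quick sketch of that suggestion; the 0b binary literals need C++14):

int x = 0b10001;
int shifted = (x << 1) & 0b11111;  // keep only 5 bits: 0b00010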
What you want is to bit-shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bit-shifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits = some_default) {
    int_type mask = 0;
    for (unsigned int bit = 0; bit < numbits; bit++) {
        // int_type(1) keeps the shift within the destination type's width
        mask |= int_type(1) << bit;
    }
    return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.

C++: Bitwise AND

I am trying to understand how to use Bitwise AND to extract the values of individual bytes.
What I have is a 4-byte array, and I am casting the last 2 bytes into a single 2-byte value. Then I am trying to extract the original single-byte values from that 2-byte value. See the attachment for a screenshot of my code and values.
The problem I am having is I am not able to get the value of the last byte in the 2 byte value.
How would I go about doing this with Bitwise AND?
The problem I am having is I am not able to get the value of the last byte in the 2 byte value.
Your 2-byte integer is formed from the values 3 and 4 (since your pointer is to a[1]). As you have already seen in your tests, you can get the 3 by applying the mask 0xFF. Now, to get the 4 you need to remove the lower bits and shift the value down. In your example, the mask 0xFF00 effectively removes the 3 from the 16-bit number, but it leaves the 4 in the high byte of your 2-byte value: that is the 1024 == 2^10 you see, the 11th bit set, which is the third bit of the second byte (counting from the least significant).
You can shift that result 8 bits to the right to get your 4, or else you can skip the mask altogether, since just shifting to the right makes the lowest bits disappear:
4 == ( x>>8 )
More interesting results for testing bitwise AND can be obtained by working with a single number:
int x = 7; // or char, for what matters
(x & 0x1) == 1;
(x & (0x1 << 1)) == 2; // i.e. (x & 0x2)
(x & ~0x2) == 5;
You need to add some bit-shifting to convert the masked value from the upper byte to the lower byte.
The problem I am having is I am not able to get the value of the last byte in the 2 byte value.
Not sure where that "watch" table comes from or if there is more code involved, but it looks to me like the result is correct. Remember, one of the bytes is the high byte, so its value is shifted << 8 places. On a little-endian machine, the high byte would be the second one.
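Putting the mask and the shift together, a small sketch (with made-up values matching the discussion above):

#include <cstdint>
#include <cstdio>

int main() {
    uint16_t x = (4 << 8) | 3;        // high byte 4, low byte 3
    printf("%d\n", x & 0xFF);         // low byte: prints 3
    printf("%d\n", (x >> 8) & 0xFF);  // high byte: prints 4
    return 0;
}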