Efficiently removing lower half-byte in a byte array - C++

I have a long byte array and I want to remove the lower nibble (the lower 4 bits) of every byte and move the rest together such that the result occupies half the space as the input.
For example, if my input is 057ABC23, my output should be 07B2.
My current approach looks like this:
// in is unsigned char*
size_t outIdx = 0;
for(size_t i = 0; i < input_length; i += 8)
{
in[outIdx++] = (in[i ] & 0xF0) | (in[i + 1] >> 4);
in[outIdx++] = (in[i + 2] & 0xF0) | (in[i + 3] >> 4);
in[outIdx++] = (in[i + 4] & 0xF0) | (in[i + 5] >> 4);
in[outIdx++] = (in[i + 6] & 0xF0) | (in[i + 7] >> 4);
}
... where I basically process 8 bytes of input in every loop, to illustrate that I can assume input_length to be divisible by 8 (even though it's probably not faster than processing only 2 bytes per loop). The operation is done in-place, overwriting the input array.
Is there a faster way to do this? For example, since I can read in 8 bytes at a time anyway, the operation could be done on 4-byte or 8-byte integers instead of individual bytes, but I cannot think of a way to do that. The compiler doesn't come up with something itself either, as I can see the output code still operates on bytes (-O3 seems to do some loop unrolling, but that's it).
I don't have control over the input, so I cannot store it differently to begin with.

There is a general technique for bit-fiddling to swap bits around. Suppose you have a 64-bit number, containing the following nibbles:
HxGxFxExDxCxBxAx
Here by x I denote a nibble whose value is unimportant (you want to delete it). The result of your bit-operation should be a 32-bit number HGFEDCBA.
First, delete all the x nibbles:
HxGxFxExDxCxBxAx & *_*_*_*_*_*_*_*_ = H_G_F_E_D_C_B_A_
Here I denote 0 by _, and binary 1111 by * for clarity.
Now, replicate your data:
H_G_F_E_D_C_B_A_ << 4 = _G_F_E_D_C_B_A__
H_G_F_E_D_C_B_A_ | _G_F_E_D_C_B_A__ = HGGFFEEDDCCBBAA_
Notice how some of your target nibbles are together. You need to retain these places, and delete duplicate data.
HGGFFEEDDCCBBAA_ & **__**__**__**__ = HG__FE__DC__BA__
From here, you can extract the result bytes directly, or do another iteration or two of the technique.
Next iteration:
HG__FE__DC__BA__ << 8 = __FE__DC__BA____
HG__FE__DC__BA__ | __FE__DC__BA____ = HGFEFEDCDCBABA__
HGFEFEDCDCBABA__ & ****____****____ = HGFE____DCBA____
Last iteration:
HGFE____DCBA____ << 16 = ____DCBA________
HGFE____DCBA____ | ____DCBA________ = HGFEDCBADCBA____
HGFEDCBADCBA____ >> 32 = ________HGFEDCBA
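The steps above can be sketched in C++ roughly as follows. This is a minimal sketch rather than the answerer's own code: the names pack_high_nibbles and compress_in_place are made up for illustration, and the 64-bit word is assembled byte by byte so that the first input byte lands in the H position regardless of the machine's byte order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: keep the high nibble of each of the 8 bytes in v
// and pack them into 32 bits (HxGxFxExDxCxBxAx -> HGFEDCBA).
std::uint32_t pack_high_nibbles(std::uint64_t v)
{
    v &= 0xF0F0F0F0F0F0F0F0ULL;                  // H_G_F_E_D_C_B_A_
    v = (v | (v << 4)) & 0xFF00FF00FF00FF00ULL;  // HG__FE__DC__BA__
    v = (v | (v << 8)) & 0xFFFF0000FFFF0000ULL;  // HGFE____DCBA____
    v |= v << 16;                                // HGFEDCBADCBA____
    return static_cast<std::uint32_t>(v >> 32);  // HGFEDCBA
}

// Hypothetical in-place wrapper; assumes length is a multiple of 8.
void compress_in_place(unsigned char* data, std::size_t length)
{
    std::size_t out = 0;
    for (std::size_t i = 0; i < length; i += 8) {
        std::uint64_t v = 0;
        for (int k = 0; k < 8; ++k)              // data[i] becomes the most significant byte
            v = (v << 8) | data[i + k];
        const std::uint32_t packed = pack_high_nibbles(v);
        for (int k = 3; k >= 0; --k)             // write the most significant byte first
            data[out++] = static_cast<unsigned char>(packed >> (8 * k));
    }
}
```

Writing never overtakes reading here, because each group of 8 input bytes is fully loaded before its 4 output bytes are stored.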

All x86-64 (and most x86) CPUs have SSE2.
For each 16-bit lane do
t = (x & 0x00F0) | (x >> 12).
Then use the pack instruction to truncate each 16-bit lane to 8-bits.
For example, 0xABCD1234 would become 0x00CA0031 then the pack would make it 0xCA31.
#include <emmintrin.h>
void squish_32bytesTo16 (unsigned char* src, unsigned char* dst) {
const __m128i mask = _mm_set1_epi16(0x00F0);
__m128i src0 = _mm_loadu_si128((__m128i*)(void*)src);
__m128i src1 = _mm_loadu_si128((__m128i*)(void*)(src + sizeof(__m128i)));
__m128i t0 = _mm_or_si128(_mm_and_si128(src0, mask), _mm_srli_epi16(src0, 12));
__m128i t1 = _mm_or_si128(_mm_and_si128(src1, mask), _mm_srli_epi16(src1, 12));
_mm_storeu_si128((__m128i*)(void*)dst, _mm_packus_epi16(t0, t1));
}

Just to put the resulting code here for future reference, it now looks like this (assuming the system is little endian, and the input length is a multiple of 8 bytes):
void compress(unsigned char* in, size_t input_length)
{
unsigned int* inUInt = reinterpret_cast<unsigned int*>(in);
unsigned long long* inULong = reinterpret_cast<unsigned long long*>(in);
for(size_t i = 0; i < input_length / 8; ++i)
{
unsigned long long value = inULong[i] & 0xF0F0F0F0F0F0F0F0;
value = (value >> 4) | (value << 8);
value &= 0xFF00FF00FF00FF00;
value |= (value << 8);
value &= 0xFFFF0000FFFF0000;
value |= (value << 16);
inUInt[i] = static_cast<unsigned int>(value >> 32);
}
}
Benchmarked very roughly it's around twice as fast as the code in the question (using MSVC19 /O2).
Note that this is basically the solution anatolyg posted before (just put into code), so upvote that answer instead if you found this helpful.

Related

Reverse nibbles of a hexadecimal number in C++

What would be the fastest way possible to reverse the nibbles (e.g digits) of a hexadecimal number in C++?
Here's an example of what I mean : 0x12345 -> 0x54321
Here's what I already have:
unsigned int rotation (unsigned int hex) {
unsigned int result = 0;
while (hex) {
result = (result << 4) | (hex & 0xF);
hex >>= 4;
}
return result;
}
This problem can be split into two parts:
1. Reverse the nibbles of an integer: reverse the bytes, then swap the two nibbles within each byte.
2. Shift the reversed result right by some amount to adjust for the "variable length". The input has std::countl_zero(x) & -4 leading zero bits (the number of leading zeroes, rounded down to a multiple of 4) that form leading zero digits in hexadecimal. Shifting right by that amount keeps them out of the reversed result.
For example, using some of the new functions from <bit>:
#include <stdint.h>
#include <bit>
uint32_t reverse_nibbles(uint32_t x) {
// reverse bytes
uint32_t r = std::byteswap(x);
// swap adjacent nibbles
r = ((r & 0x0F0F0F0F) << 4) | ((r >> 4) & 0x0F0F0F0F);
// adjust for variable-length of input
int len_of_zero_prefix = std::countl_zero(x) & -4;
return r >> len_of_zero_prefix;
}
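That requires C++23 for std::byteswap (and C++20 for std::countl_zero), which may be a bit optimistic; you can substitute some other byteswap. Note that x == 0 needs a special case, because it would shift a 32-bit value by 32 bits, which is undefined.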
Easily adaptable to uint64_t too.
I would do it without loops, based on the assumption that the input is 32 bits:
result = (hex & 0x0000000f) << 28
| (hex & 0x000000f0) << 20
| (hex & 0x00000f00) << 12
....
I don't know if it's faster, but I find it more readable.
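A loop-free, fixed-width reversal can also be written with three swap stages instead of eight separate masks. The following is a minimal sketch, assuming a 32-bit input; reverse_nibbles_fixed is a hypothetical name. Unlike the loop in the question, it always reverses all eight nibbles, so 0x12345 (that is, 0x00012345) becomes 0x54321000 rather than 0x54321.

```cpp
#include <cstdint>

// Hypothetical helper: reverse all eight nibbles of a 32-bit value.
std::uint32_t reverse_nibbles_fixed(std::uint32_t x)
{
    // swap the two nibbles inside every byte
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    // swap the two bytes inside every 16-bit half
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    // swap the two 16-bit halves
    x = (x << 16) | (x >> 16);
    return x;
}
```

To get the variable-length behaviour, the result still has to be shifted right by the number of leading zero bits of the input, rounded down to a multiple of 4 (12 bits for 0x12345).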

8-digit BCD check

I have an 8-digit BCD number and need to check whether it is a valid BCD number. How can I do this programmatically (C/C++)?
Ex: 0x12345678 is valid, but 0x00f00abc isn't.
Thanks in advance!
You need to check each 4-bit quantity to make sure it's less than 10. For efficiency you want to work on as many bits as you can at a single time.
Here I break the digits apart to leave a zero between each one, then add 6 to each and check for overflow.
uint32_t highs = (value & 0xf0f0f0f0) >> 4;
uint32_t lows = value & 0x0f0f0f0f;
bool invalid = (((highs + 0x06060606) | (lows + 0x06060606)) & 0xf0f0f0f0) != 0;
Edit: actually we can do slightly better. It doesn't take 4 bits to detect overflow, only 1. If we divide all the digits by 2, it frees a bit and we can check all the digits at once.
uint32_t halfdigits = (value >> 1) & 0x77777777;
bool invalid = ((halfdigits + 0x33333333) & 0x88888888) != 0;
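To build some confidence in the single-addition check, it can be wrapped in a function and compared against a plain digit-by-digit loop. A minimal sketch; both function names are made up for illustration.

```cpp
#include <cstdint>

// Parallel check: halve every digit, add 3, and look at bit 3 of each nibble.
// A halved digit is at most 7, so adding 3 never carries into the next nibble.
bool is_valid_bcd_parallel(std::uint32_t value)
{
    const std::uint32_t halfdigits = (value >> 1) & 0x77777777u;
    return ((halfdigits + 0x33333333u) & 0x88888888u) == 0;
}

// Reference check: inspect the eight digits one at a time.
bool is_valid_bcd_reference(std::uint32_t value)
{
    for (int i = 0; i < 8; ++i, value >>= 4) {
        if ((value & 0xFu) >= 10u)
            return false;
    }
    return true;
}
```

With the values from the question, 0x12345678 is accepted and 0x00f00abc is rejected by both functions.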
The obvious way to do this is:
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
for (; x; x = x>>4)
{
if ((x & 0xf) >= 0xa)
return 0;
}
return 1;
}
This link tells you all about BCD, and recommends something like this as a more optimised solution (reworked to check all the digits, hence the 64-bit data type, and untested):
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
/* adding 6 to a digit >= 10 carries into the next nibble; the XOR exposes those carries */
return !((((uint64_t)x + 0x66666666ULL) ^ (uint64_t)x) & 0x111111110ULL);
}
For a digit to be invalid, it needs to be 10-15. That in turn means 8 + 4 or 8+2 - the low bit doesn't matter at all.
So:
uint32_t mask8 = value & 0x88888888;
uint32_t mask4 = value & 0x44444444;
uint32_t mask2 = value & 0x22222222;
return ((mask8 >> 2) & ((mask4 >> 1) | mask2)) == 0;
Slightly less obvious:
uint32_t mask8 = (value >> 2);
uint32_t mask42 = (value | (value >> 1));
return (mask8 & mask42 & 0x22222222) == 0;
By shifting before masking, we don't need 3 different masks.
Inspired by @Mark Ransom
bool invalid = (0x88888888 & (((value & 0xEEEEEEEE) >> 1) + (0x66666666 >> 1))) != 0;
// or
bool valid = !((((value & 0xEEEEEEEEu) >> 1) + 0x33333333) & 0x88888888);
Mask off each BCD digit's 1's place, shift right, then add 3 (6 shifted right by one) and check for BCD digit overflow.
How this works:
By adding 6 to each digit, we look for an overflow * out of the 4-bit sum.
  abcd
+  110
 -----
 *efgd
But the bit value of d does not contribute to the carry, so first mask off that bit and shift right. Now the overflow bit is in the 8's place. This is all done in parallel: we mask these carry bits with 0x88888888 and test whether any are set.
  0abc
+   11
 -----
  *efg

Bit shifts and their logical operators

The program below swaps the last (least significant) and the penultimate bytes of the variable i of type int. I'm trying to understand why the programmer wrote this:
i = (i & LEADING_TWO_BYTES_MASK) | ((i & PENULTIMATE_BYTE_MASK) >> 8) | ((i & LAST_BYTE_MASK) << 8);
Can anyone explain to me in plain English what's going on in the program below?
#include <stdio.h>
#include <cstdlib>
#define LAST_BYTE_MASK 255 //11111111
#define PENULTIMATE_BYTE_MASK 65280 //1111111100000000
#define LEADING_TWO_BYTES_MASK 4294901760 //11111111111111110000000000000000
int main(){
unsigned int i = 0;
printf("i = ");
scanf("%d", &i);
i = (i & LEADING_TWO_BYTES_MASK) | ((i & PENULTIMATE_BYTE_MASK) >> 8) | ((i & LAST_BYTE_MASK) << 8);
printf("i = %d", i);
system("pause");
}
Since you asked for plain English: he swaps the two least significant bytes of an integer.
The expression is indeed a bit convoluted but in essence the author does this:
// Mask out relevant bytes
unsigned higher_order_bytes = i & LEADING_TWO_BYTES_MASK;
unsigned first_byte = i & LAST_BYTE_MASK;
unsigned second_byte = i & PENULTIMATE_BYTE_MASK;
// Switch positions:
unsigned first_to_second = first_byte << 8;
unsigned second_to_first = second_byte >> 8;
// Concatenate back together:
unsigned result = higher_order_bytes | first_to_second | second_to_first;
Incidentally, defining the masks using hexadecimal notation is more readable than using decimal. Furthermore, using #define here is misguided. Both C and C++ have const:
unsigned const LEADING_TWO_BYTES_MASK = 0xFFFF0000;
unsigned const PENULTIMATE_BYTE_MASK = 0xFF00;
unsigned const LAST_BYTE_MASK = 0xFF;
To understand this code you need to know what &, | and bit shifts are doing on the bit level.
It's more instructive to define your masks in hexadecimal rather than decimal, because then they correspond directly to the binary representations and it's easy to see which bits are on and off:
#define LAST 0xFF // all bits in the least significant byte are 1
#define PEN 0xFF00 // all bits in the second-lowest byte are 1
#define LEAD 0xFFFF0000 // all bits in the two most significant bytes are 1
Then
i = (i & LEAD) // keep the two most significant bytes of the 32-bit integer unchanged
| ((i & PEN) >> 8) // take the second-lowest byte and shift it 8 bits right
| ((i & LAST) << 8); // take the lowest byte and shift it 8 bits left
So the expression is swapping the two least significant bytes while leaving the two most significant bytes the same.
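The same operation can be packaged as a small function with hexadecimal masks. A minimal sketch, assuming a 32-bit unsigned value; swap_low_two_bytes is a name chosen here for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: swap the two least significant bytes,
// leave the two most significant bytes untouched.
constexpr std::uint32_t swap_low_two_bytes(std::uint32_t i)
{
    return (i & 0xFFFF0000u)          // upper two bytes stay in place
         | ((i & 0x0000FF00u) >> 8)   // second-lowest byte moves down
         | ((i & 0x000000FFu) << 8);  // lowest byte moves up
}
```

For example, 0x11223344 turns into 0x11224433, and applying the function twice gives back the original value.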

How to convert 8 17-bit integers into 17 8-bit integers efficiently

Okay, I have the following problem: I have a set of 8 (unsigned) numbers that are all 17-bit (i.e. none of them is bigger than 131071). Since 17-bit numbers are annoying to work with (keeping them in a 32-bit int is a waste of space), I would like to turn these into 17 8-bit numbers, like so:
If I have these 8 17-bit integers:
[25409, 23885, 24721, 23159, 25409, 23885, 24721, 23159]
I would turn them into a base 2 representation:
["00110001101000001", "00101110101001101", "00110000010010001", "00101101001110111", "00110001101000001", "00101110101001101", "00110000010010001", "00101101001110111"]
Then join that into one big string:
"0011000110100000100101110101001101001100000100100010010110100111011100110001101000001001011101010011010011000001001000100101101001110111"
Then split that into 17 strings, each with 8 chars:
["00110001", "10100000", "10010111", "01010011", "01001100", "00010010", "00100101", "10100111", "01110011", "00011010", "00001001", "01110101", "00110100", "11000001", "00100010", "01011010", "01110111"]
And, finally, convert the binary representations back into integers
[49, 160, 151, 83, 76, 18, 37, 167, 115, 26, 9, 117, 52, 193, 34, 90, 119]
This method works, but it's not very efficient. I am looking for something more efficient than this, preferably coded in C++, since that's the language I am working with. I just can't think of a more efficient way to do this, and 17-bit numbers aren't exactly easy to work with (16-bit numbers would be much nicer).
Thanks in advance, xfbs
Store the lowest 16 bits of each number as-is (i.e. in two bytes). This leaves the most significant bit of each number. Since there are eight such numbers, simply combine the eight bits into one extra byte.
This will require exactly the same amount of memory as your method, but will involve a lot less bit twiddling.
P.S. Regardless of the storage method, you should be using bit-manipulation operators (<<, >>, &, | and so on) to do the job; there should not be any intermediate string-based representations involved.
Have a look at std::bitset<N>. Maybe you can stuff them into that?
Efficiently? Then don't use string conversions, bitfields, etc. Manage to do shifts yourself to achieve that. (Note that the arrays must be unsigned so that we don't encounter problems when shifting).
uint32 A[8]; //Your input, unsigned int
ubyte B[17]; //Output, unsigned byte
B[0] = (ubyte)A[0];
B[1] = (ubyte)(A[0] >> 8);
B[2] = (ubyte)A[1];
B[3] = (ubyte)(A[1] >> 8);
.
:
And for the last one, we do what ajx said. We take the most significant bit of each number (shifting right by 16 leaves only the 17th bit) and fill the bits of the last output byte by shifting these bits left by 0 to 7 positions:
B[16] = (A[0] >> 16) | ((A[1] >> 16) << 1) | ((A[2] >> 16) << 2) | ((A[3] >> 16) << 3) | ... | ((A[7] >> 16) << 7);
Well, "efficient" was this. Other easier methods exist, too.
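Written out in full, the "two low bytes plus one shared byte of top bits" layout could look like the sketch below. The function names pack17_split and unpack17_split are hypothetical, and the sketch assumes every input value fits in 17 bits.

```cpp
#include <cstdint>

// Hypothetical helper: the low 16 bits of each value go into two bytes,
// and the eight 17th bits are collected into out[16].
void pack17_split(const std::uint32_t in[8], std::uint8_t out[17])
{
    unsigned top = 0;
    for (int i = 0; i < 8; ++i) {
        out[2 * i]     = static_cast<std::uint8_t>(in[i]);       // bits 0..7
        out[2 * i + 1] = static_cast<std::uint8_t>(in[i] >> 8);  // bits 8..15
        top |= ((in[i] >> 16) & 1u) << i;                        // bit 16 of value i
    }
    out[16] = static_cast<std::uint8_t>(top);
}

// Hypothetical inverse of pack17_split.
void unpack17_split(const std::uint8_t in[17], std::uint32_t out[8])
{
    for (int i = 0; i < 8; ++i) {
        const std::uint32_t low  = in[2 * i] | (static_cast<std::uint32_t>(in[2 * i + 1]) << 8);
        const std::uint32_t high = (in[16] >> i) & 1u;
        out[i] = low | (high << 16);
    }
}
```

This layout differs from the bit-string layout in the question, so it only fits if both the writer and the reader of the 17 bytes agree on it.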
Though you say they are 17-bit numbers, they must be stored in an array of 32-bit integers, where only the 17 least significant bits are used. You can extract two bytes directly from the first number (dst[0] = src[0] >> 9 is the first, dst[1] = (src[0] >> 1) & 0xff the second); then you "push" the remaining bit of the first number in front of the second, as if it were its 18th bit, so that
dst[2] = (src[0] & 1) << 7 | src[1] >> 10;
dst[3] = (src[1] >> 2) & 0xff;
if you generalize it, you will see that this "formula" may be applied
dst[2*i] = src[i] >> (9+i) | (src[i-1] & BITS(i)) << (8-i);
dst[2*i + 1] = (src[i] >> (i+1)) & 0xff;
and for the last one: dst[16] = src[7] & 0xff;.
The whole code could look like
dst[0] = src[0] >> 9;
dst[1] = (src[0] >> 1) & 0xff;
for(i = 1; i < 8; i++)
{
dst[2*i] = src[i] >> (9+i) | (src[i-1] & BITS(i)) << (8-i);
dst[2*i + 1] = (src[i] >> (i+1)) & 0xff;
}
dst[16] = src[7] & 0xff;
With a closer analysis of the loop, it can likely be optimized so that the boundary cases don't need special treatment. The BITS macro creates a mask with the N least significant bits set to 1. Something like (to be checked for a better way, if any)
#define BITS(I) (~((~0u)<<(I)))
ADD
Here I supposed src is e.g. int32_t and dst int8_t or alike.
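Put together as a self-contained function, the formula might look like the sketch below. The name pack17_bitstream is made up for illustration, the BITS macro is replaced by an inline mask, and the sketch assumes unsigned 32-bit inputs that each fit in 17 bits.

```cpp
#include <cstdint>

// Hypothetical helper: concatenate eight 17-bit values, most significant
// bit first, into 17 bytes (the layout described in the question).
void pack17_bitstream(const std::uint32_t src[8], std::uint8_t dst[17])
{
    dst[0] = static_cast<std::uint8_t>(src[0] >> 9);   // top 8 bits of the first value
    dst[1] = static_cast<std::uint8_t>(src[0] >> 1);   // next 8 bits (the cast truncates)
    for (unsigned i = 1; i < 8; ++i) {
        // i low bits of the previous value are still waiting to be written
        const std::uint32_t carry = src[i - 1] & ((1u << i) - 1u);
        dst[2 * i]     = static_cast<std::uint8_t>((carry << (8 - i)) | (src[i] >> (9 + i)));
        dst[2 * i + 1] = static_cast<std::uint8_t>(src[i] >> (i + 1));
    }
    dst[16] = static_cast<std::uint8_t>(src[7]);       // last 8 bits of the last value
}
```

For the eight numbers from the question this produces the same 17 bytes as the string-based method, starting with 49, 160, 151, 83.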
This is in C; in C++ you can use std::vector instead of the plain arrays.
#define srcLength 8
#define destLength 17
int src[srcLength] = { 25409, 23885, 24721, 23159, 25409, 23885, 24721, 23159 };
unsigned char dest[destLength] = { 0 };
int srcElement = 0;
int bits = 0;
int i = 0;
int j = 0;
do {
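while( bits >= srcLength ) { // srcLength (8) doubles as the number of bits per output byte
dest[i++] = srcElement >> (bits - srcLength);
bits -= srcLength;
srcElement = srcElement & ((1 << bits) - 1); // keep only the bits not yet written
}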
if( j < srcLength ) {
srcElement <<= destLength;
bits += destLength;
srcElement |= src[j++];
}
} while (bits > 0);
Disclaimer: if you literally have seventeen integers (and not 100000 groups by 17), you should forget these optimizations as long as your program doesn't run veeery slowly.
I'd probably go about it this way. I don't want to deal with weird types when I'm doing my processing. Maybe I need to store them in some funky formatting due to legacy problems though. The values that are hard-coded should probably be based off of the 17 value, just didn't bother.
struct int_block {
static const uint32 w = 17;
static const uint32 m = 131071;
int_block() : data(151, 0) {} // w * 8 + (sizeof(uint32) - w)
uint32 get(size_t i) const {
uint32 retval = *reinterpret_cast<const uint32 *>( &data[i*w] );
retval &= m;
return retval;
}
void set(size_t i, uint32 val) {
uint32 prev = *reinterpret_cast<const uint32 *>( &data[i*w] );
prev &= ~m;
val |= prev;
*reinterpret_cast<uint32 *>( &data[i*w] ) = val;
}
std::vector<char> data;
};
TEST(int_block_test) {
int_block ib;
for (uint32 i = 0; i < 8; i++)
ib.set(i, i+25);
for (uint32 i = 0; i < 8; i++)
CHECK_EQUAL(i+25, ib.get(i));
}
You'd be able to break this by giving it bad values, but I'll leave that as an exercise for the reader. :))
Quite honestly, I think you'd be happier off representing them as 32-bit integers and just writing conversion functions. But I suspect you don't have control over that.

Given an array of uint8_t what is a good way to extract any subsequence of bits as a uint32_t?

I have run into an interesting problem lately:
Let's say I have an array of bytes (uint8_t to be exact) of length at least one. Now I need a function that will get a subsequence of bits from this array, starting with bit X (zero-based index, inclusive) and having length L, and will return this as a uint32_t. If L is smaller than 32, the remaining high bits should be zero.
Although this is not very hard to solve, my current thoughts on how to do this seem a bit cumbersome to me. I'm thinking of a table of all the possible masks for a given byte (start with bit 0-7, take 1-8 bits) and then construct the number one byte at a time using this table.
Can somebody come up with a nicer solution? Note that I cannot use Boost or STL for this. And no, it is not homework; it's a problem I ran into at work, and we do not use Boost or STL in the code where this thing goes. You can assume that 0 < L <= 32 and that the byte array is large enough to hold the subsequence.
One example of correct input/output:
array: 00110011 1010 1010 11110011 01 101100
subsequence: X = 12 (zero based index), L = 14
resulting uint32_t = 00000000 00000000 00 101011 11001101
Only the first and last bytes in the subsequence will involve some bit slicing to get the required bits out, while the intermediate bytes can be shifted in whole into the result. Here's some sample code, absolutely untested -- it does what I described, but some of the bit indices could be off by one:
uint8_t bytes[];
int X, L;
uint32_t result;
int startByte = X / 8, /* starting byte number */
startBits = 8 - X % 8, /* number of bits from bit X to the end of the starting byte */
endByte = (X + L - 1) / 8, /* byte that holds the last requested bit */
endShift = 7 - (X + L - 1) % 8; /* number of unused low bits in the ending byte */
/* Special case where start and end are within same byte:
shift the unused low bits out and keep the L lowest bits */
if (startByte == endByte) {
uint8_t byte = bytes[startByte];
result = (byte >> endShift) & ((1u << L) - 1);
}
/* All other cases: get ending bits of starting byte,
all other bytes in between,
starting bits of ending byte */
else {
uint8_t byte = bytes[startByte];
result = byte & ((1u << startBits) - 1);
for (int i = startByte + 1; i < endByte; i++)
result = (result << 8) | bytes[i];
byte = bytes[endByte];
result = (result << (8 - endShift)) | (byte >> endShift);
}
Take a look at std::bitset and boost::dynamic_bitset.
I would be thinking something like loading a uint64_t with a cast and then shifting left and right to lose the uninteresting bits.
uint32_t extract_bits(uint8_t* bytes, int start, int count)
{
int shiftleft = 32+start;
int shiftright = 64-count;
uint64_t *ptr = (uint64_t*)(bytes);
uint64_t hold = *ptr;
hold <<= shiftleft;
hold >>= shiftright;
return (uint32_t)hold;
}
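The pointer cast above depends on the machine's byte order and alignment rules, and it always reads 8 bytes. A byte-order-independent variant of the same "shift left, then shift right" idea might look like this minimal sketch. The name extract_bits_portable is hypothetical, bit 0 is taken to be the most significant bit of bytes[0] as in the question's example, and the sketch assumes 0 < count <= 32 with all touched bytes readable.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: return `count` bits starting at bit `start`,
// right-aligned in a uint32_t.
std::uint32_t extract_bits_portable(const std::uint8_t* bytes, std::size_t start, unsigned count)
{
    const std::size_t first = start / 8;
    const std::size_t last  = (start + count - 1) / 8;  // at most first + 4

    // Copy the touched bytes into the top of a 64-bit window, first byte highest.
    std::uint64_t hold = 0;
    unsigned n = 0;
    for (std::size_t i = first; i <= last; ++i, ++n)
        hold |= static_cast<std::uint64_t>(bytes[i]) << (56 - 8 * n);

    hold <<= start % 8;    // push the unwanted leading bits out at the top
    hold >>= 64 - count;   // drop the trailing bits and right-align the result
    return static_cast<std::uint32_t>(hold);
}
```

With the bytes from the question's example (0x33, 0xAA, 0xF3, 0x6C), start 12 and count 14, this yields the bit pattern 10 1011 1100 1101.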
For the sake of completeness, I'm adding my solution, inspired by the comments and answers here. Thanks to all who bothered to think about the problem.
static const uint8_t firstByteMasks[8] = { 0xFF, 0x7F, 0x3F, 0x1F, 0x0F, 0x07, 0x03, 0x01 };
uint32_t getBits( const uint8_t *buf, const uint32_t bitoff, const uint32_t len, const uint32_t bitcount )
{
uint64_t result = 0;
int32_t startByte = bitoff / 8; // starting byte number
int32_t endByte = ((bitoff + bitcount) - 1) / 8; // ending byte number
int32_t rightShift = 16 - ((bitoff + bitcount) % 8 );
if ( endByte >= len ) return -1;
if ( rightShift == 16 ) rightShift = 8;
result = buf[startByte] & firstByteMasks[bitoff % 8];
result = result << 8;
for ( int32_t i = startByte + 1; i <= endByte; i++ )
{
result |= buf[i];
result = result << 8;
}
result = result >> rightShift;
return (uint32_t)result;
}
A few notes: I tested the code and it seems to work just fine; however, there may be bugs. If I find any, I will update the code here. Also, there are probably better solutions!