Extract and combine bits from different bytes in C/C++

I have declared an array of bytes:
uint8_t memory[123];
which I have filled with:
memory[0]=0xFF;
memory[1]=0x00;
memory[2]=0xFF;
memory[3]=0x00;
memory[4]=0xFF;
Now I get requests from the user for specific bits. For example, I receive a request to send the bits in positions 10 to 35 (inclusive), and I must return those bits packed into bytes. In that case I would need 4 bytes containing:
response[0]=0b11000000;
response[1]=0b00111111;
response[2]=0b11000000;
response[3]=0b00000011; //padded with zeros for excess bits
This will be used for Modbus which is a big-endian protocol. I have come up with the following code:
for(int j=findByteINIT;j<(findByteFINAL);j++){
aux[0]=(unsigned char) (memory[j]>>(startingbit-(8*findByteINIT)));
aux[1]=(unsigned char) (memory[j+1]<<(startingbit-(8*findByteINIT)));
response[h]=(unsigned char) (aux[0] | aux[1] );
h++;
aux[0]=0x00;//clean aux
aux[1]=0x00;
}
which does not work but should be close to the ideal solution. Any suggestions?

I think this should do it.
int start_bit = 10, end_bit = 35; // input
int start_byte = start_bit / CHAR_BIT;
int shift = start_bit % CHAR_BIT;
int response_size = (end_bit - start_bit + CHAR_BIT) / CHAR_BIT; // ceil(bit count / CHAR_BIT), where bit count = end_bit - start_bit + 1
int zero_padding = response_size * CHAR_BIT - (end_bit - start_bit + 1);
for (int i = 0; i < response_size; ++i) {
response[i] =
static_cast<uint8_t>((memory[start_byte + i] >> shift) |
(memory[start_byte + i + 1] << (CHAR_BIT - shift)));
}
response[response_size - 1] &= static_cast<uint8_t>(~0) >> zero_padding;
If the input is a starting bit and a number of bits instead of a starting bit and an (inclusive) end bit, then you can use exactly the same code, but compute the above end_bit using:
int start_bit = 10, count = 9; // input
int end_bit = start_bit + count - 1;
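For reference, here is the same idea wrapped into a self-contained function. This is only a sketch: the name extract_bits, the explicit memory_size parameter, and the choice to treat bytes past the end of the buffer as zero are my assumptions, not part of the original answer. Bit 0 is taken to be the least significant bit of memory[0], as in the question's example.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies bits [start_bit, end_bit] (inclusive, bit 0 = LSB
// of memory[0]) into a packed byte vector, zero-padding the last byte.
std::vector<std::uint8_t> extract_bits(const std::uint8_t* memory, std::size_t memory_size,
                                       std::size_t start_bit, std::size_t end_bit) {
    assert(start_bit <= end_bit);
    assert(end_bit / CHAR_BIT < memory_size);

    const std::size_t count = end_bit - start_bit + 1;
    const std::size_t start_byte = start_bit / CHAR_BIT;
    const unsigned shift = start_bit % CHAR_BIT;
    std::vector<std::uint8_t> response((count + CHAR_BIT - 1) / CHAR_BIT);

    for (std::size_t i = 0; i < response.size(); ++i) {
        const std::size_t lo = start_byte + i;
        // Low part comes from the current byte, high part from the next one.
        unsigned value = memory[lo] >> shift;
        if (shift != 0 && lo + 1 < memory_size) {
            value |= static_cast<unsigned>(memory[lo + 1]) << (CHAR_BIT - shift);
        }
        response[i] = static_cast<std::uint8_t>(value);
    }

    // Clear the bits beyond `count` in the last byte.
    const std::size_t padding = response.size() * CHAR_BIT - count;
    response.back() &= static_cast<std::uint8_t>(0xFFu >> padding);
    return response;
}
```

With the question's data (0xFF, 0x00, 0xFF, 0x00, 0xFF) and the range 10 to 35, this returns the four bytes 0xC0, 0x3F, 0xC0, 0x03, which matches the expected response.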

Related

C++ - Converting char vector elements to single uint64

I have a vector which holds byte data (chars) received from a socket. This data holds different datatypes I want to extract. E.g. the first 8 elements (8 bytes) of the vector are a uint64_t. Now I want to convert these first 8 bytes to a single uint64_t.
A workaround I've found is:
// recv_buffer is the vector containing the received Bytes
std::vector<uint64_t> frame_number(recv_buffer.begin(), recv_buffer.begin() + sizeof(uint64_t));
uint64_t frame_num = frame_number.at(0);
Is there a way to extract the data without creating a new vector?
This is an effective method:
C/C++:
uint64_t hexToUint64(char *data, int32_t offset){
uint64_t num = 0;
for (int32_t i = offset; i < offset + 8; i++) {
num = (num << 8) + (data[i] & 0xFF);
}
return num;
}
Java:
long hexToUint64(byte[] data, int offset){
return
((long)data[offset++] << 56 & 0xFF00000000000000L) |
((long)data[offset++] << 48 & 0xFF000000000000L) |
((long)data[offset++] << 40 & 0xFF0000000000L) |
((long)data[offset++] << 32 & 0xFF00000000L) |
((long)data[offset++] << 24 & 0xFF000000L) |
((long)data[offset++] << 16 & 0xFF0000L) |
((long)data[offset++] << 8 & 0xFF00L) |
((long)data[offset++] & 0xFFL);
}
JavaScript (a Number is a double, so results above 2^53 - 1 lose precision):
function hexToUint64(data, offset) {
let num = 0;
let multiple = 0x100000000000000;
for (let i = offset; i < offset + 8; i++ , multiple /= 0x100) {
num += (data[i] & 0xFF) * multiple;
}
return num;
}
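One normally uses memcpy or similar to copy the bytes into a properly aligned variable, and then a function such as ntohl to convert the number from network byte order to host byte order. ntohl is not part of the C++ standard, but exists on Linux, Windows, and other platforms regardless. Note that ntohl only handles 32-bit values; for a uint64_t you need a 64-bit counterpart, for example be64toh from <endian.h> on Linux (glibc) or ntohll on Windows.
uint64_t frame_num;
std::copy(recv_buffer.begin(), recv_buffer.begin() + sizeof(uint64_t), reinterpret_cast<char*>(&frame_num));
//or memcpy(&frame_num, recv_buffer.data(), sizeof(frame_num));
frame_num = be64toh(frame_num); // 64-bit big-endian to host order (glibc); ntohl would only convert 32 bits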
It is tempting to do this for a struct that represents an entire network header, but since C++ compilers can inject padding bytes into structs, and it's undefined to write to the padding, it's better to do this one primitive at a time.
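If you would rather not depend on a platform-specific byte-swap function at all, the value can also be assembled from the individual bytes, which works the same way on any host byte order. A minimal sketch, assuming the sender uses network (big-endian) byte order; the name read_be64 and the offset parameter are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes 8 bytes starting at `offset` as a big-endian
// (network byte order) unsigned 64-bit integer.
std::uint64_t read_be64(const std::vector<char>& buffer, std::size_t offset = 0) {
    assert(offset <= buffer.size() && buffer.size() - offset >= sizeof(std::uint64_t));
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < sizeof(std::uint64_t); ++i) {
        // Go through unsigned char so that bytes >= 0x80 are not sign-extended.
        value = (value << 8) | static_cast<unsigned char>(buffer[offset + i]);
    }
    return value;
}
```

No new vector is created and no alignment requirement applies, since the buffer is only read one char at a time.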
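You could perform the conversion byte by byte like this (the example treats bytesArray[0] as the least significant byte, i.e. little-endian input):
#include <cstdint>
#include <iostream>
int main()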
{
unsigned char bytesArray[8];
bytesArray[0] = 0x05;
bytesArray[1] = 0x00;
bytesArray[2] = 0x00;
bytesArray[3] = 0x00;
bytesArray[4] = 0x00;
bytesArray[5] = 0x00;
bytesArray[6] = 0x00;
bytesArray[7] = 0x00;
uint64_t intVal = 0;
intVal = (intVal << 8) + bytesArray[7];
intVal = (intVal << 8) + bytesArray[6];
intVal = (intVal << 8) + bytesArray[5];
intVal = (intVal << 8) + bytesArray[4];
intVal = (intVal << 8) + bytesArray[3];
intVal = (intVal << 8) + bytesArray[2];
intVal = (intVal << 8) + bytesArray[1];
intVal = (intVal << 8) + bytesArray[0];
std::cout << intVal;
return 0;
}
I suggest doing the following:
uint64_t frame_num = *((uint64_t*)recv_buffer.data());
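You should of course first verify that the amount of data you have in recv_buffer is at least sizeof(frame_num) bytes. Be aware that this cast assumes the data is suitably aligned for uint64_t and formally violates the strict aliasing rules, and that the result is in host byte order; memcpy is the portable alternative.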

Vectorizing bits packing in C++

I'm writing a tool for operations on long strings of 6 different letters (e.g. >1000000 letters), so I'd like to encode each letter in fewer than eight bits (for 6 letters, 3 bits is sufficient).
Here is my code:
Rcpp::RawVector pack(Rcpp::RawVector UNPACKED,
const unsigned short ALPH_SIZE) {
const unsigned int IN_LEN = UNPACKED.size();
Rcpp::RawVector ret((ALPH_SIZE * IN_LEN + BYTE_SIZE - 1) / BYTE_SIZE);
unsigned int out_byte = ZERO;
unsigned short bits_left = BYTE_SIZE;
for (int i = ZERO; i < IN_LEN; i++) {
if (bits_left >= ALPH_SIZE) {
ret[out_byte] |= (UNPACKED[i] << (bits_left - ALPH_SIZE));
bits_left -= ALPH_SIZE;
} else {
ret[out_byte] |= (UNPACKED[i] >> (ALPH_SIZE - bits_left));
bits_left = ALPH_SIZE - bits_left;
out_byte++;
ret[out_byte] |= (UNPACKED[i] << (BYTE_SIZE - bits_left));
bits_left = BYTE_SIZE - bits_left;
}
}
return ret;
}
I'm using Rcpp, which is an R interface for C++. RawVector is in fact a vector of chars.
This code works just perfectly - except it is too slow. I'm performing operations bit by bit while I could vectorize it somehow. So here is my question: is there any library or tool to do it? I'm not familiar with C++ tools.
Thanks in advance!
"This code works just perfectly - except it is too slow."
Then you probably want to try 4 bits per letter, trading space for time. If 4 bits meets your compression needs (the result is just 33.3% larger than with 3 bits), then your code works on nibbles, which will be much faster and simpler than 3-bit groups.
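A nibble-based packer could look roughly like this. It is a sketch in plain C++ using std::vector<uint8_t> instead of Rcpp::RawVector so it can be compiled on its own; the name pack_nibbles and the choice to put the first letter in the high nibble are assumptions on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs values in the range 0..15 two per byte,
// first value in the high nibble.
std::vector<std::uint8_t> pack_nibbles(const std::vector<std::uint8_t>& unpacked) {
    std::vector<std::uint8_t> packed((unpacked.size() + 1) / 2);
    std::size_t i = 0;
    for (; i + 1 < unpacked.size(); i += 2) {
        assert(unpacked[i] < 16 && unpacked[i + 1] < 16);
        packed[i / 2] = static_cast<std::uint8_t>((unpacked[i] << 4) | unpacked[i + 1]);
    }
    if (i < unpacked.size()) {
        // Odd length: the last value occupies the high nibble, low nibble stays 0.
        assert(unpacked[i] < 16);
        packed[i / 2] = static_cast<std::uint8_t>(unpacked[i] << 4);
    }
    return packed;
}
```

Every output byte depends on exactly two input bytes and there is no branch inside the main loop, which is the kind of loop compilers tend to vectorize well.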
You need to unroll your loop so that the optimizer can make something useful out of it. Unrolling also gets rid of the if in the main loop, which otherwise kills any chance of good performance. Something like this (for ALPH_SIZE == 3, where 8 letters fill exactly 3 bytes):
int i = 0;
for(i = 0; i + 8 <= IN_LEN; i += 8) {
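// 8 letters of 3 bits each fill exactly 3 bytes; the layout is MSB-first,
// the same as in the original loop, so the tail loop below stays consistent.
ret[out_byte    ] = (UNPACKED[i] << 5) | (UNPACKED[i + 1] << 2) | (UNPACKED[i + 2] >> 1);
ret[out_byte + 1] = (UNPACKED[i + 2] << 7) | (UNPACKED[i + 3] << 4) | (UNPACKED[i + 4] << 1) | (UNPACKED[i + 5] >> 2);
ret[out_byte + 2] = (UNPACKED[i + 5] << 6) | (UNPACKED[i + 6] << 3) | (UNPACKED[i + 7]);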
out_byte += 3;
}
for (; i < IN_LEN; i++) {
if (bits_left >= ALPH_SIZE) {
ret[out_byte] |= (UNPACKED[i] << (bits_left - ALPH_SIZE));
bits_left -= ALPH_SIZE;
} else {
ret[out_byte] |= (UNPACKED[i] >> (ALPH_SIZE - bits_left));
bits_left = ALPH_SIZE - bits_left;
out_byte++;
ret[out_byte] |= (UNPACKED[i] << (BYTE_SIZE - bits_left));
bits_left = BYTE_SIZE - bits_left;
}
}
This will allow the optimizer to vectorize the whole thing (assuming it's smart enough). With your current implementation I doubt any current compiler can figure out that your code repeats the same pattern after every 3 written bytes and exploit it.
EDIT:
With sufficient constexpr / template magic you might be able to write a universal handler for the body of the loop. Or just cover all small values (for example, write a specialized template function for every bit count from 1 to, let's say, 16). Packing values bitwise beyond 16 bits is overkill.
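One possible shape for such a universal handler is sketched below. It is my own illustration, not tested against Rcpp: it uses std::vector<uint8_t>, the name pack_bits is made up, and it is limited to 1 to 8 bits per value so that a group of 8 values fits into one 64-bit accumulator. The layout is MSB-first, like the loop in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: packs BITS-bit values most-significant-bit first.
// Eight inputs always fill exactly BITS output bytes, so the main loop
// contains no data-dependent branch.
template <unsigned BITS>
std::vector<std::uint8_t> pack_bits(const std::vector<std::uint8_t>& unpacked) {
    static_assert(BITS >= 1 && BITS <= 8, "this sketch handles 1 to 8 bits per value");
    const std::uint64_t mask = (std::uint64_t{1} << BITS) - 1;
    std::vector<std::uint8_t> packed((unpacked.size() * BITS + 7) / 8);

    std::size_t in = 0, out = 0;
    for (; in + 8 <= unpacked.size(); in += 8, out += BITS) {
        std::uint64_t acc = 0;
        for (unsigned j = 0; j < 8; ++j) {
            acc = (acc << BITS) | (unpacked[in + j] & mask);
        }
        // acc now holds 8 * BITS bits; emit them as BITS bytes, highest first.
        for (unsigned j = 0; j < BITS; ++j) {
            packed[out + j] = static_cast<std::uint8_t>(acc >> (8 * (BITS - 1 - j)));
        }
    }

    // Tail: fewer than 8 values left, pad the group with zero values.
    if (in < unpacked.size()) {
        std::uint64_t acc = 0;
        for (unsigned j = 0; j < 8; ++j) {
            const std::uint64_t v = (in + j < unpacked.size()) ? (unpacked[in + j] & mask) : 0;
            acc = (acc << BITS) | v;
        }
        for (unsigned j = 0; out + j < packed.size(); ++j) {
            packed[out + j] = static_cast<std::uint8_t>(acc >> (8 * (BITS - 1 - j)));
        }
        assert(packed.size() - out <= BITS);
    }
    return packed;
}
```

For example, pack_bits<3> applied to the values 1, 2, 3, 4, 5, 0, 1, 2 yields the bytes 0x29, 0xCA, 0x0A. Because BITS is a compile-time constant, both inner loops have fixed trip counts and can be fully unrolled by the compiler.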

C++ with CUDA: how to express a byte as a char or set of chars?

INSIDE THE CUDA KERNEL
Suppose I have a byte that can have a binary value between 0 and 255.
I have a character array (char *) length three:
char * c = (char *) malloc(300000000*sizeof(char)); // 300 MB
Short of the following (as in, I would like to rule out “solutions” that involve a manual byte to char representation):
switch(my_byte){
case 0:
c[0] = '0';
case 1:
c[1] = '1';
...
case 255:
c[0] = '2';
c[1] = '5';
c[2] = '5';
}
How do I convert the byte to a char * style string in a Cuda kernel?
This is my solution, for now, in an effort to avoid the flow control issue in the vectorized code.
/*! \brief byte to raw chars; this is not a string! */
__device__ void byte_to_chars(uint8_t b,char * str_arr_ptr){
uint8_t buf[4];
buf[0] = b / 100;
buf[1] = (b % 100 - b % 10) / 10;
buf[2] = b % 10;
buf[3] = 3 - !buf[0] - !buf[0]*!buf[1]; // size: 3, 2 or 1 digits
// buf[3] = sz
// 3 - buf[3] = missing digits; i.e., 1 for 023, 2 for 003
for(int i = 0; i < buf[3]; i++) str_arr_ptr[i] = buf[ i + 3 - buf[3] ]+'0';
// modify function signature as needed -- i.e., return
// useful info
}
However, a solution based on library calls would be best.
First, don't use malloc() for a small, fixed amount of space; use an array. Second, don't switch, and in general, in kernel code, try to avoid diverging control paths. Finally, if it's supposed to be a C-style string, it needs to end with '\0'.
So consider something like:
#include <cstdint>
enum { max_digits = 3, modulus = 10 };
struct stringized_byte_t {
char buffer[max_digits+1];
};
stringized_byte_t stringize_a_byte(uint8_t my_byte)
{
uint8_t digits[max_digits];
uint8_t num_digits = 1;
uint8_t remainder = my_byte;
while(remainder >= modulus) {
uint8_t quotient = remainder / modulus;
digits[num_digits - 1] = remainder - quotient * modulus;
num_digits++;
remainder = quotient;
}
// at this point we have one digit left (although it might be 0),
// and we know the overall number of digits, so:
digits[num_digits - 1] = remainder;
// Now we need to flip the digit direction to fit the printing order,
// and terminate the string
stringized_byte_t sb;
for(int i = 0; i < num_digits; i++) {
sb.buffer[i] = '0' + digits[num_digits - i - 1];
}
sb.buffer[num_digits] = '\0';
return sb;
}
Note I used C-style coding rather than "pimping up" the class, so you can very easily convert this code into proper C.
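If a fixed-width result with leading zeros is acceptable (e.g. "007" instead of "7"), the loop can be dropped entirely, so that every thread executes exactly the same instructions. This is a sketch under that assumption; the function name is made up, it writes into a caller-provided buffer like the function in the question, and it is written as plain C++, so in a kernel it would additionally need the __device__ qualifier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-width variant: always writes three digits plus '\0'
// into `out`, which must have room for at least 4 chars.
// In CUDA this function would additionally be marked __device__.
inline void byte_to_three_digits(std::uint8_t b, char* out) {
    assert(out != nullptr);
    out[0] = static_cast<char>('0' + b / 100);        // hundreds
    out[1] = static_cast<char>('0' + (b / 10) % 10);  // tens
    out[2] = static_cast<char>('0' + b % 10);         // ones
    out[3] = '\0';
}
```

The trade-off is that the caller has to cope with leading zeros, but there is no data-dependent control flow left.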

Bitwise operator to calculate checksum

I am trying to come up with a C/C++ function to calculate the checksum of a given array of hex values.
char *hex = "3133455D332015550F23315D";
For example, the above buffer has 12 bytes, and the last byte is the checksum.
What needs to be done is: convert the first 11 individual bytes to decimal and then take their sum,
i.e., 31 = 49,
33 = 51,.....
So 49 + 51 + .....................
And then convert this decimal value to Hex. And then take the LSB of that hex value and convert that to binary.
Now take the 2's complement of this binary value and convert that to hex. At this step, the hex value should be equal to 12th byte.
But the above buffer is just an example and so it may not be correct.
So there are multiple steps involved in this.
I am looking for an easy way to do this using bitwise operators.
I did something like this, but it seems to take the first 2 bytes and doesn't give me the right answer.
int checksum (char * buffer, int size){
int value = 0;
unsigned short tempChecksum = 0;
int checkSum = 0;
for (int index = 0; index < size - 1; index++) {
value = (buffer[index] << 8) | (buffer[index]);
tempChecksum += (unsigned short) (value & 0xFFFF);
}
checkSum = (~(tempChecksum & 0xFFFF) + 1) & 0xFFFF;
}
I couldn't get this logic to work. I don't have enough embedded programming behind me to understand the bitwise operators. Any help is welcome.
ANSWER
I got this working with below changes.
for (int index = 0; index < size - 1; index++) {
value = buffer[index];
tempChecksum += (unsigned short) (value & 0xFFFF);
}
checkSum = (~(tempChecksum & 0xFF) + 1) & 0xFF;
Using addition to obtain a checksum is at least weird. Common checksums use bitwise xor or full crc. But assuming it is really what you need, it can be done easily with unsigned char operations:
#include <stdio.h>
char checksum(const char *hex, int n) {
unsigned char ck = 0;
for (int i=0; i<n; i+=1) {
unsigned val;
int cr = sscanf(hex + 2 * i, "%2x", &val); // convert 2 hexa chars to a byte value
if (cr == 1) ck += val;
}
return ck;
}
int main() {
char hex[] = "3133455D332015550F23315D";
char ck = checksum(hex, 11);
printf("%2x", (unsigned) (unsigned char) ck);
return 0;
}
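As the operations are performed on an unsigned char, everything exceeding a byte value is discarded and you obtain the low byte of the sum (0x26 in your example). To get the checksum described in the question, take the two's complement of that byte, i.e. (~ck + 1) & 0xFF, which gives 0xDA here.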
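Putting all the steps from the question together (parse, sum, keep the low byte, negate), a C++ version without sscanf might look like the sketch below. The function names are my own, and the input is assumed to contain only valid hexadecimal digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: converts one hexadecimal digit to its value (0..15).
static unsigned hex_digit(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a' + 10);
    assert(c >= 'A' && c <= 'F');
    return static_cast<unsigned>(c - 'A' + 10);
}

// Sums the first `n_bytes` bytes encoded in `hex` and returns the two's
// complement of the low byte of that sum.
std::uint8_t twos_complement_checksum(const std::string& hex, std::size_t n_bytes) {
    assert(hex.size() >= 2 * n_bytes);
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < n_bytes; ++i) {
        const unsigned byte = hex_digit(hex[2 * i]) * 16 + hex_digit(hex[2 * i + 1]);
        sum = static_cast<std::uint8_t>(sum + byte);  // wraps modulo 256
    }
    return static_cast<std::uint8_t>(~sum + 1u);
}
```

For the first 11 bytes of "3133455D332015550F23315D" the sum is 0x226, its low byte is 0x26, and the function returns 0xDA. That differs from the 12th byte (0x5D), which is consistent with the question's remark that the sample buffer may not be correct.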

Remove nth bit from buffer, and shift the rest

Given a uint8_t buffer of length x, I am trying to come up with a function or a macro that can remove the nth bit (or bits n to n+i) and then left-shift the remaining bits.
example #1:
for input 0b76543210 0b76543210 ... then output should be 0b76543217 0b654321 ...
example #2: if the input is:
uint8_t input[8] = {
0b00110011,
0b00110011,
...
};
the output without the first bit, should be
uint8_t output[8] = {
0b00110010,
0b01100100,
...
};
I have tried the following to remove the first bit, but it did not work for the second group of bits.
/* A macro to extract (a-b) range of bits without shifting */
#define BIT_RANGE(N,x,y) ((N) & ((0xff >> (7 - (y) + (x))) << ((x))))
void removeBit0(uint8_t *n) {
for (int i=0; i < 7; i++) {
n[i] = (BIT_RANGE(n[i], i + 1, 7)) << (i + 1) |
(BIT_RANGE(n[i + 1], 1, i + 1)) << (7 - i); /* This does not extract the next element bits */
}
n[7] = 0;
}
Update #1
In my case, the input will be a uint64_t number, and then I will use memmove to shift it one place to the left.
Update #2
The solution can be in C/C++, assembly(x86-64) or inline assembly.
This is really 2 subproblems: remove bits from each byte and pack the results. This is the flow of the code below. I wouldn't use a macro for this. Too much going on. Just inline the function if you're worried about performance at that level.
#include <stdio.h>
#include <stdint.h>
// Remove bits n to n+k-1 from x.
unsigned scrunch_1(unsigned x, int n, int k) {
unsigned hi_bits = ~0u << n;
return (x & ~hi_bits) | ((x >> k) & hi_bits);
}
// Remove bits n to n+k-1 from each byte in the buffer,
// then pack left. Return number of packed bytes.
size_t scrunch(uint8_t *buf, size_t size, int n, int k) {
size_t i_src = 0, i_dst = 0;
unsigned src_bits = 0; // Scrunched source bit buffer.
int n_src_bits = 0; // Initially it's empty.
for (;;) {
// Get scrunched bits until the buffer has at least 8.
while (n_src_bits < 8) {
if (i_src >= size) { // Done when source bytes exhausted.
// If there are left-over bits, add one more byte to output.
if (n_src_bits > 0) buf[i_dst++] = src_bits << (8 - n_src_bits);
return i_dst;
}
// Pack 'em in.
src_bits = (src_bits << (8 - k)) | scrunch_1(buf[i_src++], n, k);
n_src_bits += 8 - k;
}
// Write the highest 8 bits of the buffer to the destination byte.
n_src_bits -= 8;
buf[i_dst++] = src_bits >> n_src_bits;
}
}
int main(void) {
uint8_t x[] = { 0xaa, 0xaa, 0xaa, 0xaa };
size_t n = scrunch(x, 4, 2, 3);
for (size_t i = 0; i < n; i++) {
printf("%x ", x[i]);
}
printf("\n");
return 0;
}
This writes b5 ad 60, which by my reckoning is correct. A few other test cases work as well.
Oops, I coded it the first time shifting the wrong way, but I include that version here in case it's useful to someone.
#include <stdio.h>
#include <stdint.h>
// Remove bits n to n+k-1 from x.
unsigned scrunch_1(unsigned x, int n, int k) {
unsigned hi_bits = 0xffu << n;
return (x & ~hi_bits) | ((x >> k) & hi_bits);
}
// Remove bits n to n+k-1 from each byte in the buffer,
// then pack right. Return number of packed bytes.
size_t scrunch(uint8_t *buf, size_t size, int n, int k) {
size_t i_src = 0, i_dst = 0;
unsigned src_bits = 0; // Scrunched source bit buffer.
int n_src_bits = 0; // Initially it's empty.
for (;;) {
// Get scrunched bits until the buffer has at least 8.
while (n_src_bits < 8) {
if (i_src >= size) { // Done when source bytes exhausted.
// If there are left-over bits, add one more byte to output.
if (n_src_bits > 0) buf[i_dst++] = src_bits;
return i_dst;
}
// Pack 'em in.
src_bits |= scrunch_1(buf[i_src++], n, k) << n_src_bits;
n_src_bits += 8 - k;
}
// Write the lower 8 bits of the buffer to the destination byte.
buf[i_dst++] = src_bits;
src_bits >>= 8;
n_src_bits -= 8;
}
}
int main(void) {
uint8_t x[] = { 0xaa, 0xaa, 0xaa, 0xaa };
size_t n = scrunch(x, 4, 2, 3);
for (size_t i = 0; i < n; i++) {
printf("%x ", x[i]);
}
printf("\n");
return 0;
}
This writes d6 5a b. A few other test cases work as well.
Something similar to this should work:
template<typename S> void removeBit(S* buffer, size_t length, size_t index)
{
const size_t BITS_PER_UNIT = sizeof(S)*8;
// first we find which data unit contains the desired bit
const size_t unit = index / BITS_PER_UNIT;
// and which index has the bit inside the specified unit, starting counting from most significant bit
const size_t relativeIndex = (BITS_PER_UNIT - 1) - index % BITS_PER_UNIT;
// then we unset that bit
buffer[unit] &= ~(S(1) << relativeIndex); // S(1) keeps the shift valid for units wider than int
// now we have to shift what's on the right by 1 position
// we create a mask such that if 0b00100000 is the bit removed we use 0b00011111 as mask to shift the rest
const S partialShiftMask = (S(1) << relativeIndex) - 1;
// now we keep all bits left to the removed one and we shift left all the others
buffer[unit] = (buffer[unit] & ~partialShiftMask) | ((buffer[unit] & partialShiftMask) << 1);
for (size_t i = unit+1; i < length; ++i)
{
//we set rightmost bit of previous unit according to last bit of current unit
buffer[i-1] |= buffer[i] >> (BITS_PER_UNIT-1);
// then we shift current unit by one
buffer[i] <<= 1;
}
}
I just tested it on some basic cases, so something may not be exactly correct, but this should put you on the right track.
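Since Update #1 says the input is actually a single uint64_t, the whole operation can be done with two masks and one shift, without any loop. A minimal sketch, assuming bit index 0 means the most significant bit (as in the template above) and that the vacated lowest bit is filled with 0; the name remove_bit is made up:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: removes the bit at `index` (0 = most significant bit)
// from a 64-bit word, shifts the lower bits left by one, and fills the
// vacated least significant bit with 0.
std::uint64_t remove_bit(std::uint64_t x, unsigned index) {
    assert(index < 64);
    const std::uint64_t below = (~std::uint64_t{0} >> index) >> 1;  // bits right of `index`
    const std::uint64_t above = ~(~std::uint64_t{0} >> index);      // bits left of `index`
    return (x & above) | ((x & below) << 1);
}
```

The mask for the lower bits is built with two separate shifts so that index 63 never produces a shift by 64, which would be undefined behaviour.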