Parsing a binary message in C++. Any lib with examples? - c++

I am looking for a library or example for parsing a binary message in C++. Most people ask about reading a binary file, or data received on a socket, but I just have a set of binary messages I need to decode. Somebody mentioned boost::spirit, but I haven't been able to find a suitable example for my needs.
As an example:
9A690C12E077033811FFDFFEF07F042C1CE0B704381E00B1FEFFF78004A92440
where first 8 bits are a preamble, next 6 bits the msg ID (an integer from 0 to 63), next 212 bits are data, and final 24 bits are a CRC24.
So in this case, msg 26, I have to get this data from the 212 data bits:
4 bits integer value
4 bits integer value
A 9 bit float value from 0 to 63.875, where LSB is 0.125
4 bits integer value
And so on.
EDIT: I need to operate at bit level, so a plain memcpy is not a good fit, since it copies whole bytes. To get the first 4-bit integer value I would have to take 2 bits from one byte and another 2 bits from the next byte, shift each pair and combine them. What I am asking for is a more elegant way of extracting the values, because I have about 20 different messages and want a common solution for parsing them all at bit level.
Do you know of any library which can easily achieve this?
I also found other Q/A where static_cast is being used. I googled about it, and for each person recommending this approach, there is another one warning about endianness. Since I already have my message in hand, I don't know whether that warning applies to me or only to socket communications.
EDIT: boost::dynamic_bitset looks promising. Any help using it?

If you can't find a generic library to parse your data, use bit-fields to describe the layout and memcpy() the raw bytes into a variable of that struct type. See the link Bitfields. This will be more tailored to your application.
Don't forget to pack the structure.
Example:
#include "order32.h"
#pragma pack(push, 1)
struct yourfields {
#if O32_HOST_ORDER == O32_BIG_ENDIAN
    unsigned int preamble:8;
    unsigned int msgid:6;
    unsigned     data:212;  /* in practice the 212 data bits would be split into
                               the individual message fields, since a single
                               bit-field cannot usefully be this wide */
    unsigned     crc:24;
#else
    unsigned     crc:24;
    unsigned     data:212;  /* see note above */
    unsigned int msgid:6;
    unsigned int preamble:8;
#endif
} /*__attribute__((packed)) for gcc*/;
#pragma pack(pop)
You can do a little compile-time check to determine whether your machine uses little-endian or big-endian format, and expose the result as a preprocessor symbol:
//order32.h
#ifndef ORDER32_H
#define ORDER32_H
#include <limits.h>
#include <stdint.h>
#if CHAR_BIT != 8
#error "unsupported char size"
#endif
enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN    = 0x00010203ul,
    O32_PDP_ENDIAN    = 0x01000302ul
};
static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };
#define O32_HOST_ORDER (o32_host_order.value)
#endif
Thanks to Christoph for this code (from his answer here).
Example program using bit-fields, with the expected output in the comments:
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>
using namespace std;
struct bitfields {
    unsigned opcode:5;
    unsigned info:3;
} __attribute__((packed));
struct bitfields opcodes;
/* info: 3 bits; opcode: 5 bits */
/* 001 10001 => 0x31 */
/* 010 10010 => 0x52 */
void set_data(unsigned char data)
{
    memcpy(&opcodes, &data, sizeof(data));
}
void print_data()
{
    cout << opcodes.opcode << ' ' << opcodes.info << endl;
}
int main(int argc, char *argv[])
{
    set_data(0x31);
    print_data();            // must print 17 1 on my little-endian machine
    set_data(0x52);
    print_data();            // must print 18 2
    cout << sizeof(opcodes); // must print 1
    return 0;
}

You can manipulate the bits yourself; for example, to parse a 4-bit integer value:
char byte_data[64];
size_t readPos = 3;   // any byte
int bits_to_read = 4;
// take the low 4 bits of that byte
int value = static_cast<unsigned char>(byte_data[readPos]) & ((1 << bits_to_read) - 1);
Floats are usually sent as string data:
std::string temp;
temp.assign(byte_data + readPos, 9);
float value = std::stof(temp);
If your data contains a custom float format, then just extract the bits and do the math yourself:
char byte_data[64];
size_t readPos = 3;    // any byte
size_t bitPos = 0;     // bit position inside the current byte
float value = 0;
int bits_to_read = 9;
while (bits_to_read) {
    if (bitPos > 7) {  // move on to the next byte
        ++readPos;
        bitPos = 0;
    }
    const int bit = (static_cast<unsigned char>(byte_data[readPos]) >> bitPos) & 1;
    // here your code: accumulate `bit` into `value` as your format requires
    ++bitPos;
    --bits_to_read;
}

Here is a good article that describes several solutions to the problem.
It even contains a reference to the ibstream class that the author created specifically for this purpose (the link seems dead, though). The only other mention of this class I could find is in the bit C++ library here; it might be what you need, though it's not popular and it's under GPL.
Anyway, boost::dynamic_bitset might be the best choice, as it's time-tested and community-proven, though I have no personal experience with it.
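If you end up rolling your own instead, the core of it is quite small. Here is a minimal sketch of an MSB-first bit extractor (read_bits and the offsets below are illustrative, not taken from any particular library):
#include <cstdint>
#include <cstddef>
// Read `count` bits (MSB-first) starting at absolute bit offset `pos`
// from a byte buffer; count is assumed to be <= 32.
uint32_t read_bits(const uint8_t* buf, std::size_t pos, unsigned count)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < count; ++i, ++pos) {
        const std::size_t byte = pos / 8;
        const unsigned    bit  = 7 - (pos % 8);   // MSB-first inside each byte
        value = (value << 1) | ((buf[byte] >> bit) & 1u);
    }
    return value;
}
// Usage, following the layout from the question:
//   uint32_t preamble = read_bits(msg, 0, 8);
//   uint32_t msg_id   = read_bits(msg, 8, 6);            // 26 for the example message
//   uint32_t field1   = read_bits(msg, 14, 4);           // first 4-bit integer
//   float    f        = read_bits(msg, 22, 9) * 0.125f;  // 9-bit value, LSB = 0.125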

Related

Why Doesn't A Character Array Give an Unsigned Result

In this project I am supposed to receive a packet, and cast a part of it to an unsigned integer and get both Big-Endian and Little-Endian results. Originally, I wanted to just cast a pointer inside the byte array (packet) to an unsigned integer type that would automatically put the value received in Big-Endian form, like (uint32_be_t*)packet; similar to the way that it's automatically put into Little-Endian form when doing (uint32_t*)packet.
Since I couldn't find a type that automatically did this, I decided to create my own structure called "u32" which has the methods "get," which gets the value in Big-Endian form, and "get_le," which gets the value in Little-Endian form. However, I noticed that when I do this I get a negative result from the Little-Endian result.
struct u32 {
    u8 data[4] = {};   // u8 is assumed to be an alias for uint8_t
    uint32_t get() {
        return ((uint32_t)data[3] << 0)
             | ((uint32_t)data[2] << 8)
             | ((uint32_t)data[1] << 16)
             | ((uint32_t)data[0] << 24);
    }
    uint32_t get_le() {
        return ((uint32_t)data[3] << 24)
             | ((uint32_t)data[2] << 16)
             | ((uint32_t)data[1] << 8)
             | ((uint32_t)data[0] << 0);
    }
};
In order to simulate a packet, I just created a character array and then cast a u32* to it like so:
int main() {
    char ary[] = { 0x00, 0x00, 0x00, (char)0xF4 };
    u32* v = (u32*)ary;
    printf("%d %d\n", v->get(), v->get_le());
    return 0;
}
But then I get the results: 244 -201326592
Why is this happening? The return type to "get_le" is uint32_t, and the first function, "get," which is supposed to return the Big-Endian unsigned integer, is performing correctly.
As a side note, this was just a test that popped into my head, so I went to the library to test it between classes; unfortunately that means I have to use an online compiler (onlinegdb), but I figure it would work the same in Visual Studio. Also, if you have any suggestions as to how I could improve my code, they would be greatly appreciated. I am using Visual Studio 2019 and am allowed to use cstdlib.
Well, I daresay you want to use %u not %d in that printf() format-string!
%d assumes that the value is signed, so if the most-significant bit is 1 you get a minus sign.
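A two-line illustration of the difference, using the value from your own output (a standalone sketch; the %d line deliberately reproduces the format/argument mismatch in question):
#include <cstdio>
int main() {
    unsigned int v = 0xF4000000u;   // what get_le() returns for your packet
    std::printf("%d\n", v);         // prints -201326592: the bit pattern read as signed
    std::printf("%u\n", v);         // prints 4093640704
    return 0;
}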
There is a more elegant way to accomplish the same task. Just use uint32_t instead. You can use std::memcpy to convert between char arrays and uint32_t without invoking undefined behavior. This is what std::bit_cast does too. Reinterpreting a char* as an int* is undefined behavior. It is not the cause of your problem, because MSVC allows for it, but that's not really portable.
std::memcpy conversions or pointer casts will take place with native byte order, which is either little or big endian.
You can convert between byte orders using a builtin function. For MSVC, this would be:
_byteswap_ulong(x); // unsigned long is uint32_t on Windows
See the documentation of _byteswap_ulong. This compiles down to a single x86 bswap instruction, which your series of shifts is unlikely to produce, and can improve performance by a factor of 10x. GCC and Clang have __builtin_bswap32 if you want portable code.
You can detect native endianness using std::endian or if you don't have C++20, __BYTE_ORDER__ macros. Converting to little-endian or big-endian would then just be doing nothing or performing a byte swap depending on your platform endianness.
#include <bit>
#include <cstring>
#include <cstdint>
#include <cstdio>
#include <cstdlib>   // _byteswap_ulong (MSVC)
uint32_t bswap(uint32_t x) {
    return _byteswap_ulong(x);
}
uint32_t to_be(uint32_t x) {
    return std::endian::native == std::endian::big ? x : bswap(x);
}
uint32_t to_le(uint32_t x) {
    return std::endian::native == std::endian::little ? x : bswap(x);
}
int main() {
    char ary[4] = { 0, 0, 0, (char) 0xF4 };
    uint32_t v;
    std::memcpy(&v, &ary, 4);
    printf("%u %u\n", to_be(v), to_le(v));
    return 0;
}
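If you also need this to build with GCC or Clang, a tiny portable wrapper around the intrinsics (a sketch assuming only MSVC, GCC, and Clang) could replace bswap above:
#include <cstdint>
#ifdef _MSC_VER
#include <cstdlib>   // _byteswap_ulong
#endif
// Byte-swap a 32-bit value with whichever intrinsic the compiler provides.
inline uint32_t bswap32(uint32_t x) {
#ifdef _MSC_VER
    return _byteswap_ulong(x);
#else
    return __builtin_bswap32(x);
#endif
}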

Arduino - how to feed a struct from a serial.read()?

I am a beginner, and I am trying to fill a struct with 4 members (typed BIN) through a pointer, then send it out on another serial port, serial2. I am failing to do so.
I receive 4 chars from serial1.read(), for example 'A' '10' '5' '3'.
To decrease the size of the data, I want to use a struct:
struct structTable {
    unsigned int page:1; // (0,1)
    unsigned int cric:4; // 10 choices (4 bits)
    unsigned int crac:3; // 5 choices (3 bits)
    unsigned int croc:2; // 3 choices (2 bits)
};
I declare and set an instance and a pointer:
struct structTable structTable;
struct structTable *PtrstructTable;
PtrstructTable = &structTable;
Then I try to feed like this:
for(int i = 0; i<=4; i++) {
if(i == 1) {
(*PtrProgs).page = Serial.read();
if(i == 2) {
(*PtrProgs).cric = Serial.read();
And so on. But it's not working...
I tried to feed a first char table and tried to cast the result:
(*PtrProgs).page = PtrT[1], BIN;
And now I realize I cannot feed 3 bits in one go! Doh! All this seems very weak, and certainly too long a process for just 4 values. (I wanted to keep this kind of struct table for more instances.)
Please, could you help me to find a simpler way to feed my table?
You can only send full bytes over the serial port. But you can also send raw data directly.
void send(const structTable* table)
{
    Serial.write((const char*)table, sizeof(structTable)); // 2 bytes.
}
bool receive(structTable* table)
{
    return (Serial.readBytes((char*)table, sizeof(structTable)) == sizeof(structTable));
}
You also have to be aware that sizeof(int) is not the same on all CPUs.
A word about endianness: if the program at the other end of the serial link runs on a CPU with a different endianness, the definition of your struct there would become:
struct structTable {
    unsigned short int croc:2; // 3 choices (2 bits)
    unsigned short int crac:3; // 5 choices (3 bits)
    unsigned short int cric:4; // 10 choices (4 bits)
    unsigned short int page:1; // (0,1)
};
Note the use of short int, which you can also use in the Arduino code to be more precise. The reason is that short int is 16 bits on most CPUs, while int may be 16, 32 or even 64 bits.
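If you want to be defensive about this, a compile-time check (a small sketch, assuming the structTable definition above) will catch size mismatches before any data goes over the wire:
// Fails to compile if the bit-fields do not pack into two bytes as expected.
static_assert(sizeof(structTable) == 2, "structTable must pack into 16 bits");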
According to the Arduino reference for Serial.read(), it returns data byte by byte (eight bits at a time). So you should probably just read the data one byte at a time and do your unpacking after the fact.
In fact you might want to use a union (see e.g. this other Stack Overflow post on how to use a union) so that you can get the best of both worlds. Specifically, if you define a union of your bit-field struct and a one- or two-byte integer, you can send the data as bytes and then decode the bits you are interested in.
UPDATE
Here is an attempt at some more details. There are a lot of caveats about unions - they aren't portable, they are compiler dependent, etc. But this might be worth trying.
typedef struct {
    unsigned int page:1; // (0,1)
    unsigned int cric:4; // 10 choices (4 bits)
    unsigned int crac:3; // 5 choices (3 bits)
    unsigned int croc:2; // 3 choices (2 bits)
} structTable;
typedef union {
    structTable a;
    uint16_t    b;
} u_structTable;
uint8_t val1 = Serial.read();  // first received byte
uint8_t val2 = Serial.read();  // second received byte
u_structTable x;
x.b = val1 | (val2 << 8);
printf("page is %d\n", x.a.page);

Malloc/VirtualAlloc prepending FFFFFF after 127 dec

Whenever I load a struct into memory the memory block seems to contain ffffff before certain bytes. After closer inspection I figured this occurs exactly at 0x80 (128 in dec).
#include <Windows.h>
#include <stdio.h>
typedef struct __tagMYSTRUCT {
    BYTE unused[4096];
} MYSTRUCT, *PMYSTRUCT;
int main() {
    MYSTRUCT myStruct;
    for (int i = 0; i < 4094; i++) {
        myStruct.unused[i] = 0x00;
    }
    myStruct.unused[4094] = 0x7F; /* No FFFFFF prepend */
    myStruct.unused[4095] = 0x80; /* FFFFFF prepend */
    MYSTRUCT *p = (MYSTRUCT*)malloc(4096);
    *p = myStruct;
    char *read = (char*)p;
    for (int i = 0; i < 4096; i++) {
        printf("%02x ", read[i]);
    }
    free(p);
    p = NULL;
    read = NULL;
    return 0;
}
Does anyone know why this happens and/or how to 'fix' it? (I assume the bytes should be able to go up to 0xff.) If I write these bytes to a file, as in fwrite(&myStruct, sizeof(myStruct), 1, [filestream]), it doesn't include the ffffff's.
Compiler used: Visual Studio 2015 Community
P.S. As stated in the title, the same occurs when using VirtualAlloc.
This has nothing to do with VirtualAlloc nor with malloc.
Note that the following details depend on your platform and different things might happen on different operating systems or compilers:
char is a signed type (on your platform). It has a range of -128 to 127. When you treat the number 128 as a char it wraps around and is actually stored as -128.
%02x tells printf to print an unsigned int, in hexadecimal, with at least two digits. But you are actually passing a char. The compiler will automatically convert it to an int (with the value -128), which printf will then misinterpret as an unsigned int. On your platform, -128 converted to an unsigned int will give the same value as 0xffffff80.
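A minimal sketch of the two usual fixes, reusing the p and read variables from your program:
/* Read through an unsigned char pointer so 0x80 prints as "80"... */
unsigned char *bytes = (unsigned char*)p;
for (int i = 0; i < 4096; i++) {
    printf("%02x ", bytes[i]);
}
/* ...or cast each value at the call site: */
printf("%02x ", (unsigned char)read[i]);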

How to convert an array of bits to a char

I am trying to edit each byte of a buffer by modifying the LSB(Least Significant Bit) according to some requirements.
I am using the unsigned char type for the bytes, so please let me know IF that is correct/wrong.
unsigned char buffer[MAXBUFFER];
Next, i'm using this function
char *uchartob(char s[9], unsigned char u)
which modifies and returns the first parameter as an array of bits. This function works just fine, as the bits in the array represent the second parameter.
Here's where the hassle begins. I am going to point out what I'm trying to do step by step so you guys can let me know where I'm taking the wrong turn.
I am saving the result of the above function (called for each element of the buffer) in a variable
char binary_byte[9]; // array of bits
I am testing the LSB by simply comparing it to some flag, like this:
if (binary_byte[7]==bit_flag) // i go on and modify it like this
binary_byte[7]=0; // or 1, depending on the case
Next, I'm trying to convert the array of bits binary_byte (it is an array of bits, isn't it?) back into a byte/unsigned char and update the data in the buffer at the same time. I hope I am making myself clear enough, as I am really confused at the moment.
buffer[position_in_buffer]=binary_byte[0]<<7| // actualize the current BYTE in the buffer
binary_byte[1]<<6|
binary_byte[2]<<5|
binary_byte[3]<<4|
binary_byte[4]<<3|
binary_byte[5]<<2|
binary_byte[6]<<1|
binary_byte[7];
Keep in mind that the bit at the position binary_byte[7] may be modified, that's the point of all this.
The solution is not really elegant, but it's working, even though I am really unsure about what I did (I tried to do it with bitwise operators but without success).
The weird thing is that when I try to print the updated character from the buffer, it has the same bits as the previous character, but it's a completely different one.
My final question is: what effect does changing only the LSB in a byte have? What should I expect? As you can see, I'm getting only "new" characters even when I shouldn't.
So I'm still a little unsure what you are trying to accomplish here, but since you are trying to modify individual bits of a byte, I would propose using the following data structure:
union bit_byte
{
    struct {
        unsigned bit0 : 1;
        unsigned bit1 : 1;
        unsigned bit2 : 1;
        unsigned bit3 : 1;
        unsigned bit4 : 1;
        unsigned bit5 : 1;
        unsigned bit6 : 1;
        unsigned bit7 : 1;
    } bits;
    unsigned char all;
};
This will allow you to access each bit of your byte and still get the whole byte representation. Here is some quick sample code:
bit_byte myValue;
myValue.bits.bit0 = 1;       // Set the LSB
// Test the LSB
if (myValue.bits.bit0 == 1) {
    myValue.bits.bit7 = 1;
}
printf("%i", myValue.all);
bitwise:
set bit   => a |= 1 << x;
reset bit => a &= ~(1 << x);
bit check => a & (1 << x);
flip bit  => a ^= (1 << x);
If you cannot manage this, you can always use std::bitset.
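For example, a small self-contained sketch of both approaches (the bit positions and values are arbitrary):
#include <bitset>
#include <cstdio>
int main() {
    unsigned char a = 0;
    a |= 1 << 3;                 // set bit 3
    a ^= 1 << 3;                 // flip bit 3 back to 0
    bool lsb = a & (1 << 0);     // check the LSB
    std::bitset<8> b(0x31);      // or let std::bitset do the bookkeeping
    b.set(0);                    // set the LSB
    b.flip(7);                   // flip the MSB
    std::printf("%d %lu\n", (int)lsb, b.to_ulong());
    return 0;
}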
Helper macros:
#define SET_BIT(where, bit_number) ((where) |= 1 << (bit_number))
#define RESET_BIT(where, bit_number) ((where) &= ~(1 << (bit_number)))
#define FLIP_BIT(where, bit_number) ((where) ^= 1 << (bit_number))
#define GET_BIT_VALUE(where, bit_number) (((where) & (1 << (bit_number))) >> (bit_number)) // this will return 0 or 1
Helper application to print bits:
#include <iostream>
#include <cstdint>
#define GET_BIT_VALUE(where, bit_number) (((where) & (1 << (bit_number))) >> (bit_number))
template<typename T>
void print_bits(T const& value)
{
    for (uint8_t bit_count = 0;
         bit_count < (sizeof(T) << 3);
         ++bit_count)
    {
        std::cout << GET_BIT_VALUE(value, bit_count) << std::endl;
    }
}
int main()
{
    unsigned int f = 8;
    print_bits(f);
}

C++ Compressing eight booleans into a character

I have a large mass of integers that I'm reading from a file. All of them will be either 0 or 1, so I have converted each read integer to a boolean.
What I need to do is take advantage of the space (8 bits) that a character provides by packing every 8 bits/booleans into a single character. How can I do this?
I have experimented with binary operations, but I'm not coming up with what I want.
int count = 7;
unsigned char compressedValue = 0x00;
while (/*Not end of file*/)
{
    ...
    compressedValue |= booleanValue << count;
    count--;
    if (count == 0)
    {
        count = 7;
        //write char to stream
        compressedValue &= 0;
    }
}
Update
I have updated the code to reflect some corrections suggested so far. My next question is, how should I initialize/clear the unsigned char?
Update
Reflected the changes to clear the character bits.
Thanks for the help, everyone.
Several notes (a corrected loop incorporating them is sketched after this list):
while (!in.eof()) is wrong: you have to first try(!) to read something, and only if that succeeded can you use the data.
Use an unsigned char to get an integer of at least eight bits. Alternatively, look into stdint.h and use uint8_t (or uint_least8_t).
The shift operation is in the wrong direction, use uint8_t(1) << count instead.
If you want to do something like that in memory, I'd use a bigger type, like 32 or 64 bits, because reading a byte is still a single RAM access even if much more than a byte could be read at once.
After writing a byte, don't forget to zero the temporary.
As Mooing Duck suggested, you can use a bitset.
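Putting those notes together, a corrected packing loop might look like the following sketch; the file names and the input format (whitespace-separated 0/1 integers in a text file) are assumptions:
#include <cstdint>
#include <fstream>
int main() {
    std::ifstream in("bools.txt");
    std::ofstream out("packed.bin", std::ios::binary);
    int value;
    int count = 7;
    uint8_t packed = 0;
    while (in >> value) {              // try to read first, then use the data
        packed |= uint8_t(value != 0) << count;
        if (count-- == 0) {            // eight bits collected
            out.put(char(packed));
            packed = 0;                // zero the temporary
            count = 7;
        }
    }
    if (count != 7)                    // flush a partially filled last byte
        out.put(char(packed));
    return 0;
}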
The source code below is only a proof of concept; in particular, the file reading still has to be implemented.
#include <bitset>
#include <cstdint>
#include <iostream>
int main() {
    char const fcontent[56] { "\0\001\0\001\0\001\0\001\0\001"
                              "\001\001\001\001\001\001\001\001\001\001\001\001"
                              "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
                              "\0\001\0\001\0\001\0\001" };
    for (int i { 0 }; i < 56; i += 8) {
        std::bitset<8> const bs(fcontent + i, 8, '\0', '\001');
        std::cout << bs.to_ulong() << " ";
    }
    std::cout << std::endl;
    return 0;
}
Output:
85 127 252 0 0 1 84
The standard guarantees that vector<bool> is packed the way you want. Don't reinvent the wheel. More info here.
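A minimal sketch of that approach (the file name and text format are placeholders); note that if you also need the packed bytes back out, you still have to assemble them yourself, since vector<bool> does not expose its internal storage:
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>
int main() {
    std::ifstream in("bools.txt");
    std::vector<bool> bits;                 // bit-packed by the implementation
    int value;
    while (in >> value)
        bits.push_back(value != 0);
    // Repack manually if raw bytes are needed, e.g. for writing to a stream:
    std::vector<uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i])
            bytes[i / 8] |= uint8_t(1u << (7 - i % 8));
    return 0;
}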