Data structures with different sized bit fields - c++

If I have a requirement to create a data structure that has the following fields:
16-bit Size field
3-bit Version field
1-bit CRC field
How would I code this struct? I know the Size field would be an unsigned short type, but what about the other two fields?

First, unsigned short isn't guaranteed to be only 16 bits, just at least 16 bits.
You could do this:
struct Data
{
unsigned short size : 16;
unsigned char version : 3;
unsigned char crc : 1;
};
Assuming you want no padding between the fields, you'll have to issue the appropriate instructions to your compiler. With gcc, you can decorate the structure with __attribute__((packed)):
struct Data
{
// ...
} __attribute__((packed));
In Visual C++, you can use #pragma pack:
#pragma pack(push, 1)
struct Data
{
// ...
};
#pragma pack(pop)
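As a rough check of the packing approach above, the sketch below uses `#pragma pack(push, 1)`, which GCC, Clang, and Visual C++ all accept. The names `PackedData` and `roundTrips` are made up for illustration. Bit-field layout is implementation-defined, so the size check is only a lower bound and says nothing about bit order.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical name; mirrors the Data struct above with 1-byte packing.
#pragma pack(push, 1)
struct PackedData
{
    unsigned short size : 16;
    unsigned char version : 3;
    unsigned char crc : 1;
};
#pragma pack(pop)

// 20 bits of fields cannot fit in fewer than 3 bytes (assuming 8-bit bytes).
// The exact size and the bit order remain implementation-defined.
static_assert(sizeof(PackedData) >= 3, "fields need at least 20 bits");

// Stores a value in every field and reads it back.
inline bool roundTrips(unsigned size, unsigned version, unsigned crc)
{
    assert(size <= 0xFFFFu && version <= 7u && crc <= 1u);
    PackedData d{};
    d.size = static_cast<unsigned short>(size);
    d.version = static_cast<unsigned char>(version);
    d.crc = static_cast<unsigned char>(crc);
    return d.size == size && d.version == version && d.crc == crc;
}
```

If the struct has to match an external format byte for byte, it is worth comparing `sizeof(PackedData)` and a dump of its bytes on every compiler you target.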

The following class implements the fields you are looking for as a kind of bitfields.
struct Identifier
{
unsigned int a; // only bits 0-19 are used
unsigned int getSize() const {
return a & 0xFFFF; // access bits 0-15
}
unsigned int getVersion() const {
return (a >> 16) & 7; // access bits 16-18
}
unsigned int getCrc() const {
return (a >> 19) & 1; // access bit 19
}
void setSize(unsigned int size) {
a = a - (a & 0xFFFF) + (size & 0xFFFF);
}
void setVersion(unsigned int version) {
a = a - (a & (7<<16)) + ((version & 7) << 16);
}
void setCrc(unsigned int crc) {
a = a - (a & (1<<19)) + ((crc & 1) << 19);
}
};
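The same layout can also be written with a clear-then-OR helper on a fixed-width integer. This is a minimal sketch, not the answer's code: `Header`, `setField`, and `getField` are hypothetical names, and the bit positions (0-15 size, 16-18 version, 19 crc) are taken from the class above.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative names; bits 0-15 size, bits 16-18 version, bit 19 crc.
struct Header
{
    std::uint32_t bits = 0;

    // Clear the field selected by (mask << shift), then OR in the new value.
    void setField(std::uint32_t value, std::uint32_t mask, unsigned shift)
    {
        assert(shift < 32);
        bits = (bits & ~(mask << shift)) | ((value & mask) << shift);
    }

    std::uint32_t getField(std::uint32_t mask, unsigned shift) const
    {
        return (bits >> shift) & mask;
    }

    void setSize(std::uint32_t v)    { setField(v, 0xFFFFu, 0); }
    void setVersion(std::uint32_t v) { setField(v, 0x7u, 16); }
    void setCrc(std::uint32_t v)     { setField(v, 0x1u, 19); }

    std::uint32_t size() const    { return getField(0xFFFFu, 0); }
    std::uint32_t version() const { return getField(0x7u, 16); }
    std::uint32_t crc() const     { return getField(0x1u, 19); }
};
```

Keeping the mask and shift in one place per field means a setter and its getter cannot drift apart.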

Related

Standard way of overlay flexible array member

The server sends the data as packed structures, so all I need to do to decode it is overlay a structure pointer on the buffer. However, one of the structures contains a dynamically sized array, and I have learned that a flexible array member is not a standard C++ feature. How can I do this in standard C++ without copying the data, as a vector would?
// on wire format: | field a | length | length of struct b |
// the structs are defined packed
__pragma(pack(1))
struct B {
//...
};
struct Msg {
int32_t a;
uint32_t length;
B *data; // how to declare this?
};
__pragma(pack())
char *buf = readIO();
// overlay, without copy and assignments of each field
const Msg *m = reinterpret_cast<const Msg *>(buf);
// access m->data[i] from 0 to length
The common way to do this in C was to declare data as an array of length one as the last struct member. You then allocate the space needed as if the array was larger.
Seems to work fine in C++ as well. You should perhaps wrap access to the data in a span or equivalent, so the implementation details don't leak outside your class.
#include <cstddef>
#include <span>
struct B {
float x;
float y;
};
struct Msg {
int a;
std::size_t length;
B data[1];
};
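char* readIO()
{
constexpr int numData = 3;
// zero-initialize the buffer, then store the element count so that m->length is valid
char* out = new char[sizeof(Msg) + sizeof(B) * (numData - 1)]();
reinterpret_cast<Msg*>(out)->length = numData;
return out;
}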
int main(){
char *buf = readIO();
// overlay, without copy and assignments of each field
const Msg *m = reinterpret_cast<const Msg *>(buf);
// access m->data[i] from 0 to length
std::span<const B> data(m->data, m->length);
for(auto& b: data)
{
// do something
}
return 0;
}
https://godbolt.org/z/EoMbeE8or
A standard solution is to not represent the array as a member of the message, but rather as a separate object.
struct Msg {
int a;
size_t length;
};
const Msg& m = *reinterpret_cast<const Msg*>(buf);
span<const B> data = {
reinterpret_cast<const B*>(buf + sizeof(Msg)),
m.length,
};
Note that reinterpretation / copying of bytes is not portable between systems with different representations (byte endianness, integer sizes, alignments, subobject packing etc.), and same representation is typically not something that can be assumed in network communication.
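One way to sidestep the representation problem is to decode each field from its bytes explicitly. The sketch below assumes a little-endian wire format (the question does not say which byte order the server uses). `readLe32`, `MsgHeader`, and `decodeHeader` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a 32-bit little-endian value byte by byte. This works on any host
// endianness and has no alignment requirement.
inline std::uint32_t readLe32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical decoded header: field a followed by the element count.
struct MsgHeader
{
    std::int32_t a;
    std::uint32_t length;
};

// Assumes a little-endian wire format and at least 8 readable bytes in buf.
inline MsgHeader decodeHeader(const unsigned char* buf, std::size_t size)
{
    assert(size >= 8);
    MsgHeader h;
    h.a = static_cast<std::int32_t>(readLe32(buf));
    h.length = readLe32(buf + 4);
    return h;
}
```

This does copy the two header fields, but not the array that follows them.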
// on wire format: | field a | length | length of struct b |
You can't overlay the struct, because you can't guarantee that the binary representation of Msg will match the on-wire format. Also, int is at least 16 bits wide but may be wider, and the size of size_t depends on the architecture.
Write actual accessors to the data. Use fixed width integer types. It will only work if the data actually point to a properly aligned region. This method allows you to write assertions and throw exceptions when stuff goes bad (for example, you can throw on out-of-bounds access to the array).
struct Msg {
constexpr static size_t your_required_alignment = alignof(uint32_t);
char *buf;
Msg (char *buf) : buf(buf) {
assert((uintptr_t)buf % your_required_alignment == 0);
}
int32_t& get_a() { return *reinterpret_cast<int32_t*>(buf); }
uint32_t& length() { return *reinterpret_cast<uint32_t *>(buf + sizeof(int32_t)); }
struct Barray {
char *buf;
Barray(char *buf) : buf(buf) {}
int16_t &operator[](size_t idx) {
return *reinterpret_cast<int16_t*>(buf + idx * sizeof(int16_t));
}
};
Barray data() {
return buf + sizeof(int32_t) + sizeof(uint32_t);
}
};
int main() {
Msg msg(readIO());
std::cout << msg.get_a() << msg.length();
msg.data()[1] = 5;
// or maybe even implement straight operator[]:
// msg[1] = 5;
}
If the data do not point to a properly aligned region, you have to copy the data; there is no way to access them through types other than char.
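For the unaligned case, a copy through `std::memcpy` is the usual way to get properly aligned objects. The following is a minimal sketch that assumes the elements are `int16_t` (as in the accessor above) and already in host byte order. `copyElements` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies count int16_t elements out of a possibly unaligned byte buffer.
// Assumes the bytes are already in host byte order.
inline std::vector<std::int16_t> copyElements(const char* buf, std::size_t count)
{
    assert(buf != nullptr || count == 0);
    std::vector<std::int16_t> out(count);
    if (count != 0)
    {
        // memcpy has no alignment requirement on its source.
        std::memcpy(out.data(), buf, count * sizeof(std::int16_t));
    }
    return out;
}
```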

Access bits in memory

I want to assemble a message bit by bit, then handle the message as a vector of unsigned characters ( e.g. to calculate the CRC )
I can assemble the message OK, using either a std::vector<bool> or a std::bitset
I can copy the assembled message to a std::vector<unsigned char> bit by bit. (Note: the message is padded so that its length is a whole number of bytes.)
// assemble message
std::vector<bool> bitMessage;
...
// copy the bits one by one into bytes and add them to the message
std::vector<unsigned char> myMessage;
// loop over bytes
for (int kbyte = 0;
kbyte < bitMessage.size() / 8;
kbyte++)
{
unsigned char byte = 0;
// loop over bits
for (int kbit = 0;
kbit < 8;
kbit++)
{
// add bit to byte
byte += bitMessage[8 * kbyte + kbit] << kbit;
}
// add byte to message
myMessage.push_back(byte);
}
This works.
But it seems awfully slow! I would like to use std::memcpy.
For a 'normal' vector I would do
memcpy(
myMessage.data(),
bitMessage.data(),
bitMessage.size() / 8 );
or
memcpy(
&myMessage[0],
&bitMessage[0],
bitMessage.size() / 8 );
but neither of these methods is possible with either a vector<bool> or bitset
Question: Is there a way to get a pointer to the memory where the bits are stored?
The answer is: not with std::vector<bool> or std::bitset
However, with some hints, especially from @Ayxan Haqverdili, it is possible to write a small class that accepts single bits and builds a well-mannered std::vector<unsigned char> as we go along.
/** Build a message bit by bit, creating an unsigned character vector of integer length
*
* Hides the messy bit twiddling required,
* allowing bits to be added to the end of the message
*
* The message is automatically padded at the end with zeroes
*/
class cTwiddle
{
public:
std::vector<unsigned char> myMessage;
cTwiddle() : myBitLength(0) {}
/** add a bit to end of message
* @param[in] bit
*/
void add(bool bit)
{
// check if message vector is full
if (!(myBitLength % 8))
{
// add byte to end of message
myMessage.push_back(0);
}
// control order bits are added to a byte
int shift = 7 - (myBitLength % 8); // add bits from left to right ( MSB first )
// int shift = (myBitLength % 8); // add bits from right to left ( LSB first )
myMessage.back() += (1 & bit) << shift;
myBitLength++;
}
private:
int myBitLength;
};
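To make the MSB-first layout concrete, here is a small sketch that packs an existing `std::vector<bool>` into the same byte layout that `cTwiddle::add` produces. `packMsbFirst` is a hypothetical helper, not part of the answer's class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs bits MSB first and pads the last byte with zeroes.
inline std::vector<unsigned char> packMsbFirst(const std::vector<bool>& bits)
{
    // Round the bit count up to a whole number of zero-filled bytes.
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
    {
        assert(i / 8 < bytes.size());
        if (bits[i])
        {
            // Bit 0 of the message lands in the top bit of byte 0.
            bytes[i / 8] |= static_cast<unsigned char>(1u << (7 - i % 8));
        }
    }
    return bytes;
}
```

The returned vector exposes `data()` and `size()`, so it can be handed directly to a byte-oriented routine such as a CRC calculation.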
Apparently neither of those classes define the layout. Just write your own class and define the layout you want:
template <int size>
class BitSet final {
private:
unsigned char buffer[size / 8 + (size % 8 != 0)] = {};
public:
constexpr bool get(size_t index) const noexcept {
return (buffer[index / 8] >> (index % 8)) & 1U;
}
constexpr void set(size_t index) noexcept {
buffer[index / 8] |= (1U << (index % 8));
}
constexpr void clear(size_t index) noexcept {
buffer[index / 8] &= ~(1U << (index % 8));
}
};
Memcpy-ing this class is perfectly fine. Otherwise, you might also provide direct access to the byte array.
Alternatively, you can dynamically allocate the buffer:
#include <memory>
class DynBitSet final {
private:
size_t size = 0;
std::unique_ptr<unsigned char[]> buffer;
public:
explicit DynBitSet(size_t bitsize)
: size(bitsize / 8 + (bitsize % 8 != 0)),
buffer(new unsigned char[size]{}) {}
bool get(size_t index) const noexcept {
return (buffer[index / 8] >> (index % 8)) & 1U;
}
void set(size_t index) noexcept { buffer[index / 8] |= (1U << (index % 8)); }
void clear(size_t index) noexcept {
buffer[index / 8] &= ~(1U << (index % 8));
}
auto bitSize() const noexcept { return size * 8; }
auto byteSize() const noexcept { return size; }
auto const* byteBuffer() const noexcept { return buffer.get(); }
};
Is there a way to get a pointer to the memory where the bits are stored [in std::vector<bool>]?
No. The idea is that it should be not possible.
Is there a way
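Fun fact: in libstdc++ (the standard library shipped with GCC), the pointer member of the iterator is public. This is an implementation detail and not portable.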
#include <vector>
#include <iostream>
int main() {
std::vector<bool> vec{1,0,1,0,1,1,1,1};
std::cout << *vec.begin()._M_p << '\n';
}

How do I extract little-endian unsigned short from long pointer?

I have a long pointer value that points to a 20 byte header structure followed by a larger array. Dec(57987104)=Hex(0374D020). All the values are stored little endian. 1400 when swapped is 0014 which in decimal is 20.
The question here is how do I get the first value which is a 2 byte unsigned short. I have a C++ dll to convert this for me. I'm running Windows 10.
GetCellData_API unsigned short __stdcall getUnsignedShort(unsigned long ptr)
{
unsigned long *p = &ptr;
unsigned short ret = *p;
return ret;
}
But when I call this from VBA using Debug.Print getUnsignedShort(57987104) I get 30008 when it should be 20.
I might need to do an endian swap but I'm not sure how to incorporate this from CodeGuru: How do I convert between big-endian and little-endian values?
inline void endian_swap(unsigned short& x)
{
x = (x >> 8) |
(x << 8);
}
How do I extract little endian unsigned short from long pointer?
I think I'd be inclined to write your interface function in terms of a general template function that describes the operation:
#include <cstddef>
#include <cstdint>
#include <type_traits>
// Code for the general case
// you'll be amazed at the compiler's optimiser
template<class Integral>
auto extract_be(const std::uint8_t* buffer)
{
using accumulator_type = std::make_unsigned_t<Integral>;
auto acc = accumulator_type(0);
auto count = sizeof(Integral);
while(count--)
{
acc |= accumulator_type(*buffer++) << (8 * count);
}
return Integral(acc);
}
GetCellData_API unsigned short __stdcall getUnsignedShort(std::uintptr_t ptr)
{
return extract_be<std::uint16_t>(reinterpret_cast<const std::uint8_t*>(ptr));
}
As you can see from the demo on godbolt, the compiler does all the hard work for you.
Note that since we know the size of the data, I have used the sized integer types exported from <cstdint> in case this code needs to be ported to another platform.
EDIT:
Just realised that your data is actually LITTLE ENDIAN :)
template<class Integral>
auto extract_le(const std::uint8_t* buffer)
{
using accumulator_type = std::make_unsigned_t<Integral>;
auto acc = accumulator_type(0);
constexpr auto size = sizeof(Integral);
for(std::size_t count = 0 ; count < size ; ++count)
{
acc |= accumulator_type(*buffer++) << (8 * count);
}
return Integral(acc);
}
GetCellData_API unsigned short __stdcall getUnsignedShort(std::uintptr_t ptr)
{
return extract_le<std::uint16_t>(reinterpret_cast<const std::uint8_t*>(ptr));
}
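For the 16-bit case in the question, the little-endian extraction reduces to two byte reads. A minimal sketch follows. `readLe16` and `readLe16At` are hypothetical names, and the address is taken as `std::uintptr_t` because `unsigned long` is only 32 bits wide on 64-bit Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a 16-bit little-endian value: low byte first, high byte second.
inline std::uint16_t readLe16(const std::uint8_t* p)
{
    assert(p != nullptr);
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Same read, starting from an address passed as an integer (as VBA would).
// The caller is responsible for the address pointing at readable memory.
inline std::uint16_t readLe16At(std::uintptr_t address)
{
    return readLe16(reinterpret_cast<const std::uint8_t*>(address));
}
```

With the header bytes `14 00` from the question, `readLe16` yields 20.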
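Let's say you are pointing with pulong; pulong[6] then refers to the element at index 6 of the table. Copy its bytes in reverse order into a small buffer and read that buffer as unsigned shorts:
unsigned short *psh;
unsigned char *puchar;
unsigned char ptable[4];
ZeroMemory(ptable, 4);
puchar = ptable; // puchar must point at the buffer before it is written to
puchar[3] = ((unsigned char *)(&pulong[6]))[0];
puchar[2] = ((unsigned char *)(&pulong[6]))[1];
puchar[1] = ((unsigned char *)(&pulong[6]))[2];
puchar[0] = ((unsigned char *)(&pulong[6]))[3];
psh = (unsigned short *)puchar;
// first one
psh[0];
// second one
psh[1];
This is what I had in mind, though I may have misunderstood the question.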

Creating constructor for a struct(union) in C++

What is the best way to create a constructor for a struct(which has a union member, does it matter?) to convert uint8_t type into the struct?
Here is my example to clarify more:
struct twoSixByte
{
union {
uint8_t fullByte;
struct
{
uint8_t twoPart : 2;
uint8_t sixPart : 6;
} bits;
};
};
uint32_t extractByte(twoSixByte mixedByte){
return mixedByte.bits.twoPart * mixedByte.bits.sixPart;
}
uint8_t tnum = 182;
print(extractByte(tnum)); // must print 2 * 54 = 108
P.S.
From the comments and answers, I learned that type punning through unions is not allowed in C++.
The solutions given are a little complicated, especially when there are lots of these structures in the code. There are even situations where a byte is divided into more than two bit parts. Giving up unions and using bitsets and bit shifting instead adds a lot of burden to the code.
Instead, I settled on a much simpler solution: I convert the type before passing it to the function. Here is the fixed code:
struct twoSixByte
{
union {
uint8_t fullByte;
struct
{
uint8_t twoPart : 2;
uint8_t sixPart : 6;
} bits;
};
};
uint32_t extractByte(twoSixByte mixedByte){
return mixedByte.bits.twoPart * mixedByte.bits.sixPart;
}
uint8_t tnum = 182;
twoSixByte mixedType;
mixedType.fullByte = tnum;
print(extractByte(mixedType)); // must print 2 * 54 = 108
Unless there is a pressing need for you to use a union, don't use it. Simplify your class to:
struct twoSixByte
{
twoSixByte(uint8_t in) : twoPart((in & 0xC0) >> 6), sixPart(in & 0x3F) {}
uint8_t twoPart : 2;
uint8_t sixPart : 6;
};
If there is a need to get the full byte, you can use:
uint8_t fullByte(twoSixByte mixedByte)
{
return ((mixedByte.twoPart << 6) | mixedByte.sixPart);
}
You could avoid the union and type punning and use a struct with the relevant member function. Note that we don't need a constructor if the struct is regarded as an aggregate to be initialized:
#include <cstdint>
struct twoSixByte {
uint8_t fullByte; // no constructor needed, initializing as an aggregate
uint32_t extractByte(){
return ((fullByte & 0b1100'0000) >> 6) * (fullByte & 0b0011'1111);
}
};
int main()
{
twoSixByte tnum{182};
auto test = tnum.extractByte(); // test == 2 * 54 == 108
}
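The shift-and-mask approach from both answers can be sketched as a pair of free functions. The names `TwoSix`, `splitByte`, and `joinByte` are illustrative only. Because the bit positions are spelled out, the result does not depend on how a compiler lays out bit-fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain struct: the two parts are stored as ordinary members.
struct TwoSix
{
    std::uint8_t twoPart; // upper 2 bits of the byte
    std::uint8_t sixPart; // lower 6 bits of the byte
};

// Splits a byte into its upper 2 bits and lower 6 bits.
inline TwoSix splitByte(std::uint8_t in)
{
    return TwoSix{static_cast<std::uint8_t>(in >> 6),
                  static_cast<std::uint8_t>(in & 0x3F)};
}

// Reassembles the byte from the two parts.
inline std::uint8_t joinByte(TwoSix parts)
{
    assert(parts.twoPart <= 0x03 && parts.sixPart <= 0x3F);
    return static_cast<std::uint8_t>((parts.twoPart << 6) | parts.sixPart);
}
```

For the input 182 (binary 1011 0110) this gives twoPart == 2 and sixPart == 54, matching the 2 * 54 = 108 expected in the question.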

C++ most optimal conversion

Is this the most efficient way (in terms of speed) to convert data on a server?
Could I change it for better performance?
This is used in packet parser to set/get packet data.
void Packet::setChar(char val, unsigned int offset)
{
raw[offset + 8] = val;
}
short Packet::getChar(unsigned int offset)
{
return raw[offset + 8];
}
void Packet::setShort(short val, unsigned int offset)
{
raw[offset + 8] = val & 0xff;
raw[offset + 9] = (val >> 8) & 0xff;
}
short Packet::getShort(unsigned int offset)
{
return (short)((raw[offset + 9]&0xff) << 8) | (raw[offset + 8]&0xff);
}
void Packet::setInt(int val, unsigned int offset)
{
raw[offset + 8] = val & 0xff;
raw[offset + 9] = (val >> 8) & 0xff;
raw[offset + 10] = (val >> 16) & 0xff;
raw[offset + 11] = (val >> 24) & 0xff;
}
int Packet::getInt(unsigned int offset)
{
return (int)((raw[offset + 11]&0xff) << 24) | ((raw[offset + 10]&0xff) << 16) | ((raw[offset + 9]&0xff) << 8) | (raw[offset + 8]&0xff);
}
Class defs :
class Packet
{
public:
Packet(unsigned int length);
Packet(char * raw);
///header
void setChar(char val, unsigned int offset);
short getChar(unsigned int offset);
void setShort(short val, unsigned int offset);
short getShort(unsigned int offset);
void setInt(int val, unsigned int offset);
int getInt(unsigned int offset);
void setLong(long long val, unsigned int offset);
long getLong(unsigned int offset);
char * getRaw();
~Packet();
protected:
private:
char * raw;
};
EDIT: added class definitions.
raw is initialized with the packet (new char).
I do agree with the comments saying "if it's not shown to be a problem, don't change it".
If your hardware is little endian, AND you either know that the offset is always aligned, or the processor supports unaligned accesses (e.g. x86), then you could speed up the setting of the larger data types by simply storing the whole item in one move (and yes, there will probably be people saying "it's undefined" - and it may well be undefined, but I've yet to see a compiler that doesn't do this correctly, because it's a fairly common thing to do in various types of code).
So something like this:
void Packet::setInt(int val, unsigned int offset)
{
// char* to int* needs reinterpret_cast; static_cast does not compile here
int *ptr = reinterpret_cast<int*>(&raw[offset + 8]);
*ptr = val;
}
int Packet::getInt(unsigned int offset)
{
int *ptr = reinterpret_cast<int*>(&raw[offset + 8]);
return *ptr;
}
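If the undefined behaviour of the pointer cast is a concern, `std::memcpy` expresses the same whole-item store without it, and optimising compilers commonly turn a fixed-size `memcpy` like this into a single move. This is a sketch under the same assumption as above (the value is stored in host byte order, which only matches the original shift-based code on a little-endian machine). `storeInt32` and `loadInt32` are hypothetical free functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stores a 32-bit value at raw[offset] in host byte order.
inline void storeInt32(char* raw, std::size_t offset, std::int32_t val)
{
    assert(raw != nullptr);
    std::memcpy(raw + offset, &val, sizeof val);
}

// Loads a 32-bit value from raw[offset]; no alignment requirement.
inline std::int32_t loadInt32(const char* raw, std::size_t offset)
{
    assert(raw != nullptr);
    std::int32_t val;
    std::memcpy(&val, raw + offset, sizeof val);
    return val;
}
```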
Another thing I would DEFINITELY do is to ensure that the functions are present in a header file, so that the compiler has the choice to inline the functions. This will quite likely give you MORE benefit than fiddling with the code inside the functions, because the overhead of calling a function vs. being able to use the function inline will be quite noticeable. So that would be my first step, assuming you think it is a problem in the first place. For most things, stuffing the data into the buffer is not the "slow part" of sending a data packet: it is either the forming of the content, or the bytes passing down the wire to the other machine (which of the two depends on how fast your line is, and what calculations go into preparing the data in the first place).
It looks like your implementation is already almost as efficient as possible. It simply isn't possible to optimize it further without a major overhaul of the application, and even then you'll save only a few CPU cycles.
By the way, make sure that the function definitions are present in the header file, or #included to it. Otherwise, each output operation will need a function call, which is quite expensive for what you're doing.
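A header-only layout along those lines might look like the sketch below. `InlinePacket` and `payloadOffset` are hypothetical names, the storage is a `std::vector<char>` rather than the question's raw pointer, and only the short accessors are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical header-only variant of the Packet class. Member functions
// defined inside the class body are implicitly inline, so every translation
// unit that includes the header can see their bodies and inline them.
class InlinePacket
{
public:
    explicit InlinePacket(std::size_t length) : raw(length + payloadOffset, '\0') {}

    // Writes val as two bytes, low byte first.
    void setShort(short val, std::size_t offset)
    {
        assert(offset + payloadOffset + 1 < raw.size());
        raw[offset + payloadOffset]     = static_cast<char>(val & 0xff);
        raw[offset + payloadOffset + 1] = static_cast<char>((val >> 8) & 0xff);
    }

    // Reads two bytes, low byte first, back into a short.
    short getShort(std::size_t offset) const
    {
        assert(offset + payloadOffset + 1 < raw.size());
        return static_cast<short>(((raw[offset + payloadOffset + 1] & 0xff) << 8)
                                  | (raw[offset + payloadOffset] & 0xff));
    }

private:
    static constexpr std::size_t payloadOffset = 8; // the fixed "+ 8" in the question's code
    std::vector<char> raw;
};
```

Whether inlining actually helps is something to confirm with a profiler on the real workload.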