C++ Struct packing order

I have a union that looks similar to the following:
typedef
union _thing
{
    struct thing_indiv {
        uint16_t one:5;
        uint16_t two:4;
        uint16_t three:5;
        uint16_t four:5;
        uint16_t five:6;
        uint16_t six:6;
        uint16_t seven:6;
        uint16_t eight:7;
        uint16_t nine:4;
        uint16_t ten:5;
        uint16_t eleven:6;
        uint16_t twelve:5;
        uint16_t thirteen:5;
        uint16_t fourteen:4;
        uint16_t fifteen:2;
        uint16_t unused:5;
    } __attribute__((packed)) thing_split;
    uint8_t thing_comb[10];
} thing;
But it doesn't behave how I expect. I want to assign bytes to thing.thing_comb and retrieve the relevant items from thing.thing_split.
For example, if thing_comb = { 0xD6, 0x27, 0xAD, 0xB6, ... }, I would expect thing.thing_split.one to contain 0x1A (the 5 most significant bits of 0xD6), but it does not; it contains 0x16, the 5 least significant bits. I declared each of the fields as uint16_t to keep gcc from complaining about crossing byte boundaries (I see the same behavior with uint8_t).
Is there a way to lay out this struct to obtain this behavior?

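First, type punning with a union in C++ is undefined behaviour.
Second, the compiler has wide latitude with bit-fields: the order in which they are allocated within a storage unit, and whether a field may straddle units, are implementation-defined, so it is not forced to lay them out the way you want. (GCC on little-endian targets allocates the first bit-field in the least significant bits, which is why one ends up holding 0x16.)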
You need to use regular bit-packing with bitshifts to obtain the behaviour you want.
I had a similar question not so long ago:
How to use bitfields that make up a sorting key without falling into UB?

Related

Converting uint8_t* buffer to uint16_t and changing endianness

I'd like to process data provided by an external library.
The lib holds the data and provides access to it like this:
const uint8_t* data;
std::pair<const uint8_t*, const uint8_t*> getvalue() const {
    return std::make_pair(data + offset, data + length);
}
I know that the current data contains two uint16_t numbers, but I need to change their endianness.
So altogether the data is 4 bytes long and contains these numbers:
66 4 0 0
So I'd like to get two uint16_t numbers with 1090 and 0 value respectively.
I can do basic arithmetic and in one place change the endianness:
pair<const uint8_t*, const uint8_t*> dataPtrs = library.value();
vector<uint8_t> data(dataPtrs.first, dataPtrs.second);
uint16_t first = (data[1] << 8) | data[0];   // parentheses needed: + binds tighter than <<
uint16_t second = (data[3] << 8) | data[2];
However I'd like to do something more elegant (the vector is replaceable if there is better way for getting the uint16_ts).
How can I better create uint16_t from uint8_t*? I'd avoid memcpy if possible, and use something more modern/safe.
Boost has a nice header-only endian library that could work, but it needs a uint16_t input.
Going further, Boost also provides endian buffer types, so I could create a struct:
struct datatype {
    big_int16_buf_t data1;
    big_int16_buf_t data2;
};
Is it possible to safely (padding, platform dependency, etc.) cast a valid, 4-byte-long uint8_t* to datatype? Maybe with something like this union?
typedef union {
    uint8_t u8[4];
    datatype correct_data;
} mydata;
Maybe with something like this union?
No. Type punning with unions is not well defined in C++.
This would work, assuming big_int16_buf_t (and therefore datatype) is trivially copyable:
datatype d{};
std::memcpy(&d, data, sizeof d);
uint16_t first = (data[1] << 8) | data[0];
uint16_t second = (data[3] << 8) | data[2];
However I'd like to do something more elegant
This is actually (subjectively, in my opinion) quite an elegant way, because it works the same way on all systems: it reads the data as little endian whether the CPU is little, big or some other endian. This is highly portable.
However I'd like to do something more elegant (the vector is replaceable if there is better way for getting the uint16_ts).
The vector seems entirely pointless. You could just as well use:
const std::uint8_t* data = dataPtrs.first;
How can I better create uint16_t from uint8_t*?
If you are certain that the data behind the uint8_t pointer really is a uint16_t object (and suitably aligned), C++ allows: auto u16 = *reinterpret_cast<uint16_t const*>(data); (a static_cast between these unrelated pointer types does not compile). Otherwise, this is UB.
Given a big-endian value, converting it to host byte order can be done with the ntohs function (declared in <arpa/inet.h> on Linux and other POSIX systems; Windows provides it through Winsock).
But beware, if the pointer you hold points to two individual uint8_t values, you mustn't convert them by pointer-cast. In that case, you have to manually specify which value goes where (conceivably with a function template). This will be the most portable solution, and in all likelihood the compiler will create efficient code out of the shifts and ors.

What purpose to use union in struct?

I'm currently struggling with UART code snippets from an embedded UART program.
While analysing the code, I came across something I can't understand.
Q1. When a "union" is used inside a "struct" like this, what is the benefit and purpose?
#define __IO volatile
typedef struct {
    union {
        __IO uint32_t RR;
        __IO uint32_t TR;
        __IO uint32_t DL;
        __IO uint32_t RR_TR_DL;
    };
    union {
        __IO uint32_t DH;
        __IO uint32_t IR;
        __IO uint32_t DH_IER;
    };
} UART_TypeDef;
Q2. When a "union" is nested inside a "struct" inside a "struct" like this, what is the benefit and purpose?
typedef struct {
    union {
        struct {
            __IO uint32_t CTRLR0;
            __IO uint32_t SSI_COMP_VERSION;
            union {
                __IO uint32_t DR;
                __IO uint32_t DR0;
            };
            __IO uint32_t DR1;
            __IO uint32_t RSVD_2;
        };
        uint8_t RESERVED[0x1000];
    };
} SSI_TypeDef;
The first case is basically "aliasing" of the field names. The UART_TypeDef type consists of two uint32_t fields, the first of which can be referred to as any of RR, TR, DL or RR_TR_DL. Ditto for the second field, which can be DH, IR or DH_IER.
The second case, SSI_TypeDef, uses the same trick in its inner union: the inner struct consists of five consecutive uint32_t fields (CTRLR0, SSI_COMP_VERSION, DR/DR0, DR1 and RSVD_2), and only DR and DR0 are two names for the same field.
But the structure as a whole is sized at 4 KiB (0x1000 bytes), due to the union with uint8_t RESERVED[0x1000].
The aliasing is useful if, for example, the same underlying field can be accessed as either RR or TR, depending on context. For example, a device may have different behaviour depending on whether you read or write the location.
Say, for example, that you write to a given address (a memory mapped I/O operation) to indicate to the other end that you are read-ready (able to receive data). Further assume that reading that exact same location will let you know whether you're able to transmit.
First, let's set up said memory mapped I/O address (say it's at 0xf000):
UART_TypeDef *utd = (UART_TypeDef *)0xf000; // very shifty :-)
Now both of these statements refer to the same memory address:
int transmitReady = utd->TR; // Can I transmit?
utd->RR = 1; // Tell other end it can send.
Being able to use distinct names for the same underlying thing can aid readability.

Enforce struct size of 8 bytes

I have a struct that is supposed to be 8 bytes in size:
struct Slot {
    uint8_t T;
    uint8_t S;
    uint32_t O : 24;
    uint32_t L : 24;
};
However, sizeof(Slot) tells me the size is 12 bytes.
So the compiler seems to pad the data although it shouldn't be necessary (probably because the 24-bit fields cannot be aligned properly).
A hacky solution would be to use 3 one-byte fields instead of a single three-byte field:
struct Slot2 {
    uint8_t T;
    uint8_t S;
    uint8_t O1;
    uint8_t O2;
    uint8_t O3;
    uint8_t L1;
    uint8_t L2;
    uint8_t L3;
}; // sizeof(Slot2) = 8
Is there any other way to achieve this?
This gives a size of 8 bytes on MSVC without a packing pragma:
struct Slot {
    uint32_t O : 24;
    uint32_t T : 8;
    uint32_t L : 24;
    uint32_t S : 8;
};
There is no way anyone can tell what your code will do or how the data will end up in memory, because the behavior of bit-fields is poorly specified by the C standard.
It is implementation-defined whether uint32_t is even allowed as a bit-field type; the C standard only requires _Bool, signed int and unsigned int.
You can't know if there will be padding bits.
You can't know if there will be padding bytes.
You can't know where padding bits or bytes will end up.
You can't know whether 8 bits of the 2nd 24-bit chunk end up immediately after the previous data, or if it is aligned to the next 32-bit segment.
You can't know which bit is the MSB and which is the LSB.
Endianness will cause problems.
The solution is to not use bit fields at all. Use the bitwise operators instead.
Your "hack" solution is exactly the right one. I suspect that the layout is determined by some outside factors, so you won't be able to map this to a struct in any better way. I suspect the order of bytes in your 24 bit numbers is also determined by the outside, and not by your compiler.
To handle that kind of situation, a struct of bytes or just an array of bytes is the easiest and portable solution.
I think what you want (8 bytes) is not something the C standard can guarantee with your first definition.
From the C11 standard, §6.7.2.1:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.
There is a way out, however: if you can rearrange the members so that they fit properly into 32-bit units, then
24 + 8 + 24 + 8 = 64 bits = 8 bytes,
and you can have a structure of size 8 bytes.
With this compiler-dependent solution (works with gcc and MSVC), the struct will be 8 bytes:
#pragma pack(push, 1)
struct Slot {
    uint8_t T;
    uint8_t S;
    uint32_t O : 24;
    uint32_t L : 24;
};
#pragma pack(pop)
This will set the alignment of the struct to 1 byte.
On MSVC the following works and keeps your variables in the same order:
struct Slot {
    uint64_t T : 8;
    uint64_t S : 8;
    uint64_t O : 24;
    uint64_t L : 24;
};
This is not guaranteed across compilers, though. YMMV on other compilers.
Try something like the following:
struct Slot {
    uint32_t O : 24;
    uint8_t T;
    uint32_t L : 24;
    uint8_t S;
};

Inconsistent results when dealing with endianness and arrays in C++

I'm wondering why, in my sample code, when I convert a pointer to the first element of m_arr to a pointer to a larger type, the program reads the memory into m_val in little-endian byte order. By this way of thinking, *(std::uint8_t*)m_arr should point to 0x38, but it doesn't.
My CPU uses little-endian byte order.
#include <cstdint>
#include <iostream>
#include <iomanip>
int main() {
    std::uint8_t m_arr[2] = { 0x5a, 0x38 };
    // explain why m_val is 0x385a and not 0x5a38
    std::uint16_t m_val = *(std::uint16_t*)m_arr;
    std::cout << std::hex << m_val << std::endl;
    return 0;
}
Byte ordering is the order in which bytes are laid out when referenced as their native type. Regardless of whether your machine is big or little endian, a sequence of bytes is always in its natural order.
The situation you describe (where the first byte is 0x38) is what you would observe if you created a uint16_t and got a uint8_t pointer to it. Instead, you have a uint8_t array and you get a uint16_t pointer to it.
Little endian means that the least significant byte is stored first, at the lowest address.
So translate that logic to your array, { 0x5a, 0x38 }. On a little endian system, the 0x5a is least significant and 0x38 is most significant... hence you get 0x385a.

Structures with bitwise data in C++ [duplicate]

Possible Duplicate:
Converting Bit Field to int
I am working on an application, part of which handles 16-bit words that contain a number of 1-bit flags. I am handling the data using a structure similar to the one shown below:
struct mystruct
{
    uint16_t Reserved1 :3;
    uint16_t WordErr :1;
    uint16_t SyncErr :1;
    uint16_t WordCntErr :1;
    uint16_t Reserved2 :10;
};
i.e. the structure contains a single 16-bit variable that is handled as a number of smaller (in some cases, 1-bit flag) pieces.
My question is this, is there a simple way to handle the entire 16-bit word as one value, say, to output it to the console, or a file, or add it to another data structure? I don't know of any way of doing this besides shifting the individual structure elements and adding them to a temporary uint16_t variable. It seems that there is probably a simpler way of extracting the entire word, but I can't find any information on how the compiler handles a structure like this.
EDIT: I suppose this may be obvious but what I am trying to do in a nutshell is be able to access the 1-bit flags individually, as well as use the structure as a single variable of type uint16_t (i.e. unsigned short, 16 bits).
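The usual approach here is to use an anonymous struct inside a union (anonymous unions are standard C++; anonymous structs are a widely supported compiler extension), like this: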
union mystruct
{
    struct
    {
        uint16_t Reserved1 :3;
        uint16_t WordErr :1;
        uint16_t SyncErr :1;
        uint16_t WordCntErr :1;
        uint16_t Reserved2 :10;
    };
    uint16_t word_field;
};
or, if a union is not suitable as the top-level type:
struct mystruct
{
    union
    {
        struct
        {
            uint16_t Reserved1 :3;
            uint16_t WordErr :1;
            uint16_t SyncErr :1;
            uint16_t WordCntErr :1;
            uint16_t Reserved2 :10;
        };
        uint16_t word_field;
    };
};
This definition allows direct access to the inner fields, like:
mystruct s1;
s1.WordCntErr = 1;
Strictly speaking, the compiler gives few guarantees here: every union member starts at the same address, but how the bit-fields are placed within their storage unit is implementation-defined, and in C++ reading a member other than the one last written is undefined behavior. A lot of people here will readily point this out. Nevertheless, looking at this from a practical standpoint, if all fields of the union have the same size you can safely assume that they occupy the same piece of memory. For example, the code
s1.word_field = 0;
will zero out all bit-fields. Tons of code uses this. It is unthinkable that this will ever stop working.
The short answer is you can't do it. The longer answer is that you can do it, but the details depend on your compiler. This particular bit-field layout looks suspiciously like it's supposed to map to a hardware register, in which case you've already got compiler dependencies: the details of how the bit-fields are arranged are implementation-defined. So while you're assuring yourself that the compiler lays them out the way you expect, you can also check whether it supports type puns through a union. Although writing to one field of a union and reading from another formally produces undefined behavior in C++ (C permits it and reinterprets the bytes), most (all?) compilers support it in simple cases like this.
As an alternative to the undefined behavior of the union technique, you could copy the data:
mystruct m;
m.Reserved1 = 0;
m.WordErr = 1;
m.SyncErr = 0;
m.WordCntErr = 0;
m.Reserved2 = 0;
uint16_t value = 0;
memcpy(&value, &m, sizeof(value));
Of course, the output is platform-specific / endian-sensitive, so if you plan on writing it out so you can read it in again then take that into account.
That's what a union is for. I hardly ever need to use one, so my syntax may be rusty, but it looks something like this:
union myunion
{
    struct mystruct
    {
        uint16_t Reserved1 :3;
        uint16_t WordErr :1;
        uint16_t SyncErr :1;
        uint16_t WordCntErr :1;
        uint16_t Reserved2 :10;
    } bits;  // without a member name, the struct would only declare a type
    uint16_t word;
};
Of course, that adds typing whenever you access it, so you might want to just try a typecast if you only need it occasionally.