Structures with bitwise data in C++ [duplicate]

Possible Duplicate:
Converting Bit Field to int
I am working on an application, part of which handles 16-bit words that contain a number of 1-bit flags. I am handling the data using a structure similar to the one shown below:
struct mystruct
{
    uint16_t Reserved1  :3;
    uint16_t WordErr    :1;
    uint16_t SyncErr    :1;
    uint16_t WordCntErr :1;
    uint16_t Reserved2  :10;
};
i.e. the structure contains a single 16-bit variable that is handled as a number of smaller (in some cases, 1-bit flag) pieces.
My question is this: is there a simple way to handle the entire 16-bit word as one value, say, to output it to the console or a file, or to add it to another data structure? I don't know of any way of doing this besides shifting the individual structure elements and adding them to a temporary uint16_t variable. It seems there should be a simpler way of extracting the entire word, but I can't find any information on how the compiler handles a structure like this.
EDIT: I suppose this may be obvious, but what I am trying to do, in a nutshell, is access the 1-bit flags individually while also being able to use the structure as a single variable of type uint16_t (i.e. unsigned short, 16 bits).

The standard approach here is to use anonymous structs/unions, like this:
union mystruct
{
    struct
    {
        uint16_t Reserved1  :3;
        uint16_t WordErr    :1;
        uint16_t SyncErr    :1;
        uint16_t WordCntErr :1;
        uint16_t Reserved2  :10;
    };
    uint16_t word_field;
};
or, if a union is not suitable as the top-level object:
struct mystruct
{
    union
    {
        struct
        {
            uint16_t Reserved1  :3;
            uint16_t WordErr    :1;
            uint16_t SyncErr    :1;
            uint16_t WordCntErr :1;
            uint16_t Reserved2  :10;
        };
        uint16_t word_field;
    };
};
This definition allows direct access to the inner fields, like:
mystruct s1;
s1.WordCntErr = 1;
Strictly speaking, the compiler gives no guarantees about how the different members of the union will overlap each other; it can use different alignments and even shifts, and a lot of people here will readily point this out. Nevertheless, from a practical standpoint, if all fields of the union have the same size you can safely assume that they occupy the same piece of memory. For example, the code
s1.word_field = 0;
will zero out all the bit fields. Tons of code rely on this; it is unthinkable that it will ever stop working.
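Here is a minimal complete sketch of the union in use; the expected output assumes the common LSB-first bit-field allocation, which is implementation-defined:
#include <cstdint>
#include <cstdio>

union mystruct
{
    struct
    {
        uint16_t Reserved1  :3;
        uint16_t WordErr    :1;
        uint16_t SyncErr    :1;
        uint16_t WordCntErr :1;
        uint16_t Reserved2  :10;
    };
    uint16_t word_field;
};

int main()
{
    mystruct s1{};
    s1.word_field = 0;    // clear every flag in one store
    s1.WordErr = 1;
    s1.SyncErr = 1;
    // Reading the whole word back through the other union member is
    // formally UB, but supported by mainstream compilers.
    printf("word = 0x%04x\n", s1.word_field);   // typically prints 0x0018
}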

The short answer is you can't do it. The longer answer is that you can do it, but the details depend on your compiler. This particular bit-field layout looks suspiciously like it's supposed to map to a hardware register, in which case you've already got compiler dependencies: the details of how the bit-fields are arranged are implementation-defined. So while you're assuring yourself that the compiler lays them out the way you expect, you can also check whether it supports type puns through a union. Although writing to one field of a union and reading from another formally produces undefined behavior, both in C and in C++, most (all?) compilers support it in simple cases like this.
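A sketch of that self-check: pin down the size at compile time and probe a bit position once at startup (the expected position again assumes LSB-first allocation):
#include <cassert>
#include <cstdint>
#include <cstring>

struct mystruct
{
    uint16_t Reserved1  :3;
    uint16_t WordErr    :1;
    uint16_t SyncErr    :1;
    uint16_t WordCntErr :1;
    uint16_t Reserved2  :10;
};

static_assert(sizeof(mystruct) == 2, "bit-fields did not pack into one 16-bit word");

void check_layout()
{
    mystruct probe{};
    probe.WordErr = 1;
    uint16_t word;
    std::memcpy(&word, &probe, sizeof word);   // no union pun needed for the check
    // Bit 3, if the compiler allocates from the least significant bit up.
    assert(word == (1u << 3));
}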

As an alternative to the undefined behavior that comes with the union technique, you can copy the data out with memcpy:
#include <cstdint>
#include <cstring>

mystruct m;
m.Reserved1 = 0;
m.WordErr = 1;
m.SyncErr = 0;
m.WordCntErr = 0;
m.Reserved2 = 0;

uint16_t value = 0;
std::memcpy(&value, &m, sizeof(value));
Of course, the output is platform-specific / endian-sensitive, so if you plan on writing it out so you can read it in again then take that into account.
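For instance, a sketch that writes the word with a fixed byte order instead of dumping raw memory (little-endian chosen arbitrarily here; write_u16_le is an illustrative name):
#include <cstdint>
#include <cstdio>

// Serialize a 16-bit word as little-endian bytes, independent of host endianness.
void write_u16_le(FILE* f, uint16_t v)
{
    unsigned char bytes[2] = {
        static_cast<unsigned char>(v & 0xFF),
        static_cast<unsigned char>((v >> 8) & 0xFF),
    };
    fwrite(bytes, 1, sizeof bytes, f);
}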

That's what a union is for. I hardly ever need to use one, so my syntax may be rusty, but it looks something like this:
union myunion
{
    struct mystruct
    {
        uint16_t Reserved1  :3;
        uint16_t WordErr    :1;
        uint16_t SyncErr    :1;
        uint16_t WordCntErr :1;
        uint16_t Reserved2  :10;
    } bits;   // note: the struct needs a member name, or the union has no such field
    uint16_t word;
};
Of course, that adds typing whenever you access it, so you might want to just try a typecast if you only need it occasionally.
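For the record, the "typecast" would be a reinterpret_cast of the struct's address, which carries the same formal aliasing problem as the union; spelled with memcpy, as in another answer here, it says the same thing without the UB. A sketch:
#include <cstdint>
#include <cstring>

struct mystruct
{
    uint16_t Reserved1  :3;
    uint16_t WordErr    :1;
    uint16_t SyncErr    :1;
    uint16_t WordCntErr :1;
    uint16_t Reserved2  :10;
};

uint16_t as_word(const mystruct& s)
{
    // memcpy avoids the aliasing violation of *reinterpret_cast<const uint16_t*>(&s)
    uint16_t w;
    std::memcpy(&w, &s, sizeof w);
    return w;
}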

Related

Why is this struct not the size I expect?

I am taking binary input from a file into a buffer vector, then casting that buffer's pointer to my struct type.
The goal is for the data to populate the struct perfectly.
I know the size of all the various fields and the order they're going to come in.
As a result my struct needs to be tightly packed and be 42 bytes long.
My issue is that it is coming out at 44 bytes long when I test it.
Also, the first value lines up. After that, the data is incorrect.
Here's the struct:
#pragma pack(push, 1)
struct myStruct
{
    uint8_t ID;
    uint32_t size : 24;
    uint16_t value;
    char name[12];
    char description[4];
    char shoppingList[14];
    char otherValue[6];
};
#pragma pack(pop)
uint32_t size: 24;
If you want to guarantee portably that this is three bytes with no padding before the next member, you're going to need to use a byte buffer and do the conversions yourself.
#pragma pack is an extension, and the packing of bit-field members is implementation-defined anyway.
FWIW, both GCC and Clang do seem to do what you want in this case, but unless it's defined by a platform ABI, depending on this is still brittle.
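A sketch of that byte-buffer approach for this record (the little-endian byte order of the size and value fields is an assumption; use whatever the file format actually specifies):
#include <cstdint>
#include <cstring>

struct myStruct
{
    uint8_t  ID;
    uint32_t size;   // the 24-bit field, widened to a plain uint32_t
    uint16_t value;
    char     name[12];
    char     description[4];
    char     shoppingList[14];
    char     otherValue[6];
};

// Parse one 42-byte record out of a raw byte buffer.
myStruct parse(const uint8_t* p)
{
    myStruct s{};
    s.ID    = p[0];
    s.size  = p[1] | (p[2] << 8) | (p[3] << 16);          // 24-bit little-endian
    s.value = static_cast<uint16_t>(p[4] | (p[5] << 8));  // 16-bit little-endian
    std::memcpy(s.name,         p + 6,  sizeof s.name);
    std::memcpy(s.description,  p + 18, sizeof s.description);
    std::memcpy(s.shoppingList, p + 22, sizeof s.shoppingList);
    std::memcpy(s.otherValue,   p + 36, sizeof s.otherValue);
    return s;
}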

C++ Struct packing order

I have a union that looks similar to the following:
typedef union _thing
{
    struct thing_indiv {
        uint16_t one      :5;
        uint16_t two      :4;
        uint16_t three    :5;
        uint16_t four     :5;
        uint16_t five     :6;
        uint16_t six      :6;
        uint16_t seven    :6;
        uint16_t eight    :7;
        uint16_t nine     :4;
        uint16_t ten      :5;
        uint16_t eleven   :6;
        uint16_t twelve   :5;
        uint16_t thirteen :5;
        uint16_t fourteen :4;
        uint16_t fifteen  :2;
        uint16_t unused   :5;
    } __attribute__((packed)) thing_split;
    uint8_t thing_comb[10];
} thing;
But it doesn't behave how I expect. I want to assign bytes to thing.thing_comb and retrieve the relevant items from thing.thing_split.
For example, if thing_comb = { 0xD6, 0x27, 0xAD, 0xB6, ... } I would expect thing.thing_split.one to contain 0x1A (the 5 most significant bits of 0xD6), but it does not; it contains 0x16, the 5 least significant bits. I declared each of the fields as uint16_t to keep gcc from complaining about crossing byte boundaries (I experience the same behavior with uint8_t).
Is there a way to lay out this struct to obtain this behavior?
First, type punning with a union in C++ is undefined behaviour.
Second, the compiler is free to do anything it wants with a bit-field; it is not forced to lay it out the way you want.
You need to use regular bit-packing with bitshifts to obtain the behaviour you want.
I had a similar question not so long ago:
How to use bitfields that make up a sorting key without falling into UB?
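A sketch of that manual bit-packing, extracting fields MSB-first from the byte array (get_bits is an illustrative name):
#include <cstddef>
#include <cstdint>

// Extract `count` bits starting `pos` bits into the buffer, treating
// bit 7 of byte 0 as the first (most significant) bit of the stream.
uint32_t get_bits(const uint8_t* buf, size_t pos, size_t count)
{
    uint32_t result = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t bit = pos + i;
        unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
        result = (result << 1) | b;
    }
    return result;
}

// usage: with {0xD6, 0x27, ...}, get_bits(buf, 0, 5) == 0x1A as desired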

Reserving a bit for discriminating the type of a union in C++

I currently have code that looks like this:
union {
    struct {
        void* buffer;
        uint64_t n    : 63;
        uint64_t flag : 1;
    } a;
    struct {
        unsigned char buffer[15];
        unsigned char n    : 7;
        unsigned char flag : 1;
    } b;
} data;
It is part of an attempted implementation of a data structure that does small-size optimization. Although it works on my machine with the compiler I am using, I am aware that there is no guarantee that the two flag bits from each of the structs actually end up in the same bit. Even if they did, it would still technically be undefined behavior to read it from the struct that wasn't most recently written. I would like to use this bit to discriminate between which of the two types is currently stored.
Is there a safe and portable way to achieve the same thing without increasing the size of the union? For our purpose, it can not be larger than 16 bytes.
If not, could it be achieved by sacrificing an entire byte (of n in the first struct and of buffer in the second), instead of a bit?
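As an illustration of the byte-sacrificing variant (this layout and all names are hypothetical, not from an accepted answer): put the discriminator at the same byte offset, the last byte of the 16-byte block, in both representations, and read it through the object representation, which is valid regardless of which member was written last:
#include <cstring>

union Data {
    struct {
        void*         buffer;   // assumed 8 bytes (64-bit ABI)
        unsigned char n[7];     // 56-bit length, kept as raw bytes
        unsigned char tag;      // byte 15
    } a;
    struct {
        unsigned char buffer[14];
        unsigned char n;        // the 7-bit length now fills a whole byte
        unsigned char tag;      // byte 15 again
    } b;
};
static_assert(sizeof(Data) == 16, "layout assumption violated");

// Reading one byte of the object representation is well-defined
// no matter which union member was last written.
inline unsigned char read_tag(const Data& d)
{
    unsigned char t;
    std::memcpy(&t, reinterpret_cast<const unsigned char*>(&d) + sizeof(Data) - 1, 1);
    return t;
}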

Converting uint8_t* buffer to uint16_t and changing endianness

I'd like to process data provided by an external library.
The lib holds the data and provides access to it like this:
const uint8_t* data;
std::pair<const uint8_t*, const uint8_t*> getvalue() const {
    return std::make_pair(data + offset, data + length);
}
I know that the current data contains two uint16_t numbers, but I need to change their endianness.
So altogether the data is 4 bytes long and contains this numbers:
66 4 0 0
So I'd like to get two uint16_t numbers with 1090 and 0 value respectively.
I can do basic arithmetic and in one place change the endianness:
pair<const uint8_t*, const uint8_t*> dataPtrs = library.value();
vector<uint8_t> data(dataPtrs.first, dataPtrs.second);
uint16_t first  = (data[1] << 8) + data[0];
uint16_t second = (data[3] << 8) + data[2];
However I'd like to do something more elegant (the vector is replaceable if there is better way for getting the uint16_ts).
How can I better create uint16_t from uint8_t*? I'd avoid memcpy if possible, and use something more modern/safe.
Boost has some nice header-only endian library which can work, but it needs an uint16_t input.
For going further, Boost also provides data types for changing endianness, so I could create a struct:
struct datatype {
    big_int16_buf_t data1;
    big_int16_buf_t data2;
};
Is it possible to safely (paddings, platform-dependency, etc) cast a valid, 4 bytes long uint8_t* to datatype? Maybe with something like this union?
typedef union {
    uint8_t u8[4];
    datatype correct_data;
} mydata;
Maybe with something like this union?
No. Type punning with unions is not well defined in C++.
This would work, assuming big_int16_buf_t (and therefore datatype) is trivially copyable:
datatype d{};
std::memcpy(&d, data, sizeof d);
uint16_t first  = (data[1] << 8) + data[0];
uint16_t second = (data[3] << 8) + data[2];
However I'd like to do something more elegant
This is actually (subjectively, in my opinion) quite an elegant way, because it works the same on all systems. It reads the data as little-endian whether the CPU is little, big, or some other endian. This is highly portable.
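Wrapped in a small helper (read_u16_le is an illustrative name), it reads even more clearly at the call sites while staying just as portable:
#include <cstdint>

// Portable little-endian 16-bit read; works on hosts of any endianness.
inline uint16_t read_u16_le(const uint8_t* p)
{
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// usage:
// uint16_t first  = read_u16_le(dataPtrs.first);
// uint16_t second = read_u16_le(dataPtrs.first + 2);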
However I'd like to do something more elegant (the vector is replaceable if there is better way for getting the uint16_ts).
The vector seems entirely pointless. You could just as well use:
const std::uint8_t* data = dataPtrs.first;
How can I better create uint16_t from uint8_t*?
If you are certain that the data sitting behind the uint8_t pointer is truly a uint16_t, C++ allows: auto u16 = *reinterpret_cast<uint16_t const*>(data); (a static_cast between unrelated pointer types won't compile). Otherwise, this is UB.
Given a big-endian value, transforming it into little-endian can be done with the ntohs function (under Linux; other OSes have similar functions).
But beware: if the pointer you hold points to two individual uint8_t values, you mustn't convert them by pointer cast. In that case, you have to manually specify which value goes where (conceivably with a function template). This will be the most portable solution, and in all likelihood the compiler will turn the shifts and ors into efficient code.
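Such a function template might look like this sketch (from_big_endian is an illustrative name); it assembles any unsigned integer type from big-endian bytes using only shifts and ors:
#include <cstddef>
#include <cstdint>

// Build an unsigned integer of type T from sizeof(T) big-endian bytes.
template <typename T>
T from_big_endian(const uint8_t* p)
{
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>((value << 8) | p[i]);
    return value;
}

// usage: auto first = from_big_endian<uint16_t>(data);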

force a bit field read to 32 bits

I am trying to perform a less-than-32-bit read over the PCI bus to a VME bridge chip (Tundra Universe II), which will then go onto the VME bus and be picked up by the target.
The target VME application only accepts D32 (a data width read of 32bits) and will ignore anything else.
If I use a bit field structure mapped over a VME window (mmap'd into main memory) I CAN read bit fields >24 bits, but anything less fails, i.e. :-
struct works {
    unsigned int a:24;
};
struct fails {
    unsigned int a:1;
    unsigned int b:1;
    unsigned int c:1;
};
struct main {
    works work;
    fails fail;
};
volatile struct main *reg = function_that_creates_and_maps_the_vme_windows_returns_address();
This shows that the works struct is read as 32 bits, but a read via the fails struct, e.g. reg->fail.a, is getting factored down to an X-bit read (where X might be 16 or 8?).
So the questions are :
a) Where is this scaled down? Compiler? OS? or the Tundra chip?
b) What is the actual size of the read operation performed?
I basically want to rule out everything but the chip. Documentation on it is on the web, but if it can be proved that the data width requested over the PCI bus is 32 bits, then the problem can be blamed on the Tundra chip!
edit:-
Concrete example, code was:-
struct SVersion
{
    unsigned title        : 8;
    unsigned pecversion   : 8;
    unsigned majorversion : 8;
    unsigned minorversion : 8;
} Version;
So now I have changed it to this :-
union UPECVersion
{
    struct SVersion
    {
        unsigned title        : 8;
        unsigned pecversion   : 8;
        unsigned majorversion : 8;
        unsigned minorversion : 8;
    } Version;
    unsigned int dummy;
};
And the base main struct :-
typedef struct SEPUMap
{
    ...
    ...
    UPECVersion PECVersion;
};
So I still have to change all my baseline code
// perform dummy 32bit read
pEpuMap->PECVersion.dummy;
// get the bits out
x = pEpuMap->PECVersion.Version.minorversion;
And how do I know that the second read won't actually do a real read again, as my original code did (instead of using the already-read bits via the union)?
Your compiler is adjusting the size of your struct to a multiple of its memory alignment setting. Almost all modern compilers do this. On some processors, variables and instructions have to begin on memory addresses that are multiples of some memory alignment value (often 32-bits or 64-bits, but the alignment depends on the processor architecture). Most modern processors don't require memory alignment anymore - but almost all of them see substantial performance benefit from it. So the compilers align your data for you for the performance boost.
However, in many cases (such as yours) this isn't the behavior you want. The size of your structure, for various reasons, can turn out to be extremely important. In those cases, there are various ways around the problem.
One option is to force the compiler to use different alignment settings. The options for doing this vary from compiler to compiler, so you'll have to check your documentation. It's usually a #pragma of some sort. On some compilers (the Microsoft compilers, for instance) it's possible to change the memory alignment for only a very small section of code. For example (in VC++):
#pragma pack(push) // save the current alignment
#pragma pack(1) // set the alignment to one byte
// Define variables that are alignment sensitive
#pragma pack(pop) // restore the alignment
Another option is to define your variables in other ways. Intrinsic types are not resized based on alignment, so instead of your 24-bit bitfield, another approach is to define your variable as an array of bytes.
Finally, you can just let the compilers make the structs whatever size they want and manually record the size that you need to read/write. As long as you're not concatenating structures together, this should work fine. Remember, however, that the compiler is giving you padded structs under the hood, so if you make a larger struct that includes, say, a works and a fails struct, there will be padded bits in between them that could cause you problems.
On most compilers, it's going to be darn near impossible to create a data type smaller than 8 bits. Most architectures just don't think that way. This shouldn't be a huge problem because most hardware devices that use datatypes of smaller than 8-bits end up arranging their packets in such a way that they still come in 8-bit multiples, so you can do the bit manipulations to extract or encode the values on the data stream as it leaves or comes in.
For all of the reasons listed above, a lot of code that works with hardware devices like this work with raw byte arrays and just encode the data within the arrays. Despite losing a lot of the conveniences of modern language constructs, it ends up just being easier.
I am wondering about the value of sizeof(struct fails). Is it 1? In this case, if you perform the read by dereferencing a pointer to a struct fails, it looks correct to issue a D8 read on the VME bus.
You can try to add a field unsigned int unused:29; to your struct fails.
The size of a struct is not equal to the sum of the size of its fields, including bit fields. Compilers are allowed, by the C and C++ language specifications, to insert padding between fields in a struct. Padding is often inserted for alignment purposes.
The common method in embedded systems programming is to read the data as an unsigned integer then use bit masking to retrieve the interesting bits. This is due to the above rule that I stated and the fact that there is no standard compiler parameter for "packing" fields in a structure.
I suggest creating an object (class or struct) for interfacing with the hardware. Let the object read the data, then extract the bits as bool members. This puts the implementation as close to the hardware as possible; the remaining software should not care how the bits are implemented.
When defining bit field positions / named constants, I suggest this format:
#define VALUE (1 << BIT_POSITION)
// OR
const unsigned int VALUE = 1 << BIT_POSITION;
This format is more readable and has the compiler perform the arithmetic. The calculation takes place during compilation and has no impact during run-time.
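A sketch of the suggested interface object (the register address, bit positions, and names are all illustrative): it performs one full-width read and exposes the flags as bools, so the rest of the software never sees the masking:
#include <cstdint>

// Illustrative bit positions; the real ones come from the hardware manual.
const unsigned int WORD_ERR     = 1u << 3;
const unsigned int SYNC_ERR     = 1u << 4;
const unsigned int WORD_CNT_ERR = 1u << 5;

class StatusRegister {
public:
    explicit StatusRegister(volatile const uint32_t* reg)
        : raw_(*reg)   // one full-width read of the hardware register
    {}
    bool word_err() const     { return (raw_ & WORD_ERR) != 0; }
    bool sync_err() const     { return (raw_ & SYNC_ERR) != 0; }
    bool word_cnt_err() const { return (raw_ & WORD_CNT_ERR) != 0; }
private:
    uint32_t raw_;
};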
As an example, the Linux kernel has inline functions that explicitly handle memory-mapped IO reads and writes. In newer kernels it's a big macro wrapper that boils down to an inline assembly movl instruction, but in older kernels it was defined like this:
#define readl(addr) (*(volatile unsigned int *) (addr))
#define writel(b,addr) ((*(volatile unsigned int *) (addr)) = (b))
Ian - if you want to be sure of the size of things you're reading/writing, I'd suggest not using structs like this to do it. It's possible the sizeof of the fails struct is just 1 byte; the compiler is free to decide what it should be based on optimizations etc. I'd suggest reading/writing explicitly using ints, or generally the things whose sizes you need to guarantee, and then converting to a union/struct where you don't have those limitations.
It is the compiler that decides what size read to issue. To force a 32 bit read, you could use a union:
union dev_word {
    struct dev_reg {
        unsigned int a:1;
        unsigned int b:1;
        unsigned int c:1;
    } fail;
    uint32_t dummy;
};
volatile union dev_word *vme_map_window();
If reading the union through a volatile-qualified pointer isn't enough to force a read of the whole union (I would think it would be - but that could be compiler-dependent), then you could use a function to provide the required indirection:
volatile union dev_word *real_reg; /* Initialised with vme_map_window() */

union dev_word * const *reg_func(void)
{
    static union dev_word local_copy;
    static union dev_word * const static_ptr = &local_copy;

    local_copy = *real_reg;
    return &static_ptr;
}

#define reg (*reg_func())
...then (for compatibility with the existing code) your accesses are done as:
reg->fail.a
The method described in another answer here, using the gcc flag -fstrict-volatile-bitfields and defining bit-field variables as volatile u32, works, but the total number of bits defined must be greater than 16.
For example:
typedef union {
    vu32 Word;
    struct {
        vu32 LATENCY :3;
        vu32 HLFCYA  :1;
        vu32 PRFTBE  :1;
        vu32 PRFTBS  :1;
    };
} tFlashACR;
.
tFLASH* const pFLASH = (tFLASH*)FLASH_BASE;
#define FLASH_LATENCY pFLASH->ACR.LATENCY
.
FLASH_LATENCY = Latency;
causes gcc to generate code
.
ldrb r1, [r3, #0]
.
which is a byte read. However, changing the typedef to
typedef union {
    vu32 Word;
    struct {
        vu32 LATENCY :3;
        vu32 HLFCYA  :1;
        vu32 PRFTBE  :1;
        vu32 PRFTBS  :1;
        vu32         :2;
        vu32 DUMMY1  :8;
        vu32 DUMMY2  :8;
    };
} tFlashACR;
changes the resultant code to
.
ldr r3, [r2, #0]
.
I believe the only solution is to:
1) edit/create my main struct as all 32-bit ints (unsigned longs)
2) keep my original bit-field structs
3) for each access I require:
3.1) read the struct member as a 32-bit word and cast it into the bit-field struct,
3.2) read the bit-field element I require (and for writes, set the bit-field and write the word back!)
(1) Which is a shame, because then I lose the intrinsic types that each member of the "main/SEPUMap" struct has.
End solution :-
Instead of :-
printf("FirmwareVersionMinor: 0x%x\n", pEpuMap->PECVersion);
This :-
SPECVersion ver = *(SPECVersion*)&pEpuMap->PECVersion;
printf("FirmwareVersionMinor: 0x%x\n", ver.minorversion);
Only problem I have is writing! (Writes are now read/modify/writes!)
// Read - Get current
_HVPSUControl temp = *(_HVPSUControl*)&pEpuMap->HVPSUControl;
// Modify - set to new value
temp.OperationalRequestPort = true;
// Write
volatile unsigned int *addr = reinterpret_cast<volatile unsigned int*>(&pEpuMap->HVPSUControl);
*addr = *reinterpret_cast<volatile unsigned int*>(&temp);
Just have to tidy that code up into a method!
#define writel(addr, data) ( *(volatile unsigned long*)(&addr) = (*(volatile unsigned long*)(&data)) )
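Tidied into a method, the read-modify-write above might look like this sketch (update_register is an illustrative name; the memcpy overlay sidesteps the cast-based aliasing concerns discussed in this thread):
#include <cstdint>
#include <cstring>

// Read-modify-write a memory-mapped register using single 32-bit accesses.
// Fields is the bit-field overlay type; modify() mutates a temporary copy.
template <typename Fields, typename Fn>
void update_register(volatile uint32_t* reg, Fn modify)
{
    static_assert(sizeof(Fields) == sizeof(uint32_t), "overlay must be 32 bits");
    uint32_t word = *reg;                 // one 32-bit read
    Fields f;
    std::memcpy(&f, &word, sizeof f);
    modify(f);
    std::memcpy(&word, &f, sizeof word);
    *reg = word;                          // one 32-bit write
}

// usage (hypothetical, mirroring the code above):
// update_register<_HVPSUControl>(
//     reinterpret_cast<volatile uint32_t*>(&pEpuMap->HVPSUControl),
//     [](_HVPSUControl& f) { f.OperationalRequestPort = true; });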
I had the same problem on ARM using the GCC compiler, where writes into memory happen only as bytes rather than as a 32-bit word.
The solution is to define the bit-fields using volatile uint32_t (or whatever size you need written):
union {
    volatile uint32_t XY;
    struct {
        volatile uint32_t XY_A : 4;
        volatile uint32_t XY_B : 12;
    };
};
but when compiling you need to pass gcc or g++ this parameter:
-fstrict-volatile-bitfields
More in the GCC documentation.