Struct Bit Packing and LSB/MSB Ambiguity in C++

I had to write C++ code for the following packet header:
(Image: diagram of the packet format; originally posted as a JPEG with a PNG version linked.)
Here is the struct I wrote for the packet format above. I want to know whether the uint8_t and uint16_t bit fields are laid out correctly.
struct TelemetryTransferFramePrimaryHeader
{
    //-- 6 Octets Long --//

    //-- Master Channel ID (2 octets) --//
    uint16_t TransferFrameVersionNumber : 2;
    uint16_t SpacecraftID : 10;
    uint16_t VirtualChannelID : 3;
    uint16_t OCFFlag : 1;
    //----------------------------------//

    uint8_t MasterChannelFrameCount;
    uint8_t VirtualChannelFrameCount;

    //-- Transfer Frame Data Field Status (2 octets) --//
    uint16_t TransferFrameSecondaryHeaderFlag : 1;
    uint16_t SyncFlag : 1;
    uint16_t PacketOrderFlag : 1;
    uint16_t SegmentLengthID : 2;
    uint16_t FirstHeaderPointer : 11;
    //-------------------------------------------------//
};
How do I ensure that the LSB-to-MSB order is preserved in the struct?
I keep getting confused; I've tried reading up on this, but it only confuses me more.
PS: I am using a 32-bit processor.

Exactly how bits are mapped when using bit fields is implementation-specific, so it's very hard to say for sure whether you did it right; we'd need to know the exact CPU and compiler (and compiler version, of course).
In short: don't do this. Bit fields are not very usable for things like this.
Do it manually instead, by declaring the words as needed and setting the bits inside them.
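For example, here is a minimal sketch of that manual approach for the first two octets of the header above. The bit positions (version number in the two most significant bits, OCF flag in the least significant bit) are inferred from the field order in the question's struct and should be checked against the actual spec:

#include <cstdint>

// Pack the Master Channel ID word by hand, explicitly placing every field.
// Assumed layout: version = bits 15-14, spacecraft ID = bits 13-4,
// virtual channel ID = bits 3-1, OCF flag = bit 0. Verify against the spec!
uint16_t PackMasterChannelId(uint16_t version, uint16_t spacecraftId,
                             uint16_t virtualChannelId, uint16_t ocfFlag)
{
    return static_cast<uint16_t>(((version          & 0x3u)   << 14) |
                                 ((spacecraftId     & 0x3FFu) << 4)  |
                                 ((virtualChannelId & 0x7u)   << 1)  |
                                 ( ocfFlag          & 0x1u));
}

// Store the word big-endian (network byte order) so the serialized bytes
// are identical regardless of the host CPU's endianness.
void StoreBigEndian16(uint8_t* out, uint16_t word)
{
    out[0] = static_cast<uint8_t>(word >> 8);
    out[1] = static_cast<uint8_t>(word & 0xFFu);
}

With this approach the bit layout lives entirely in your own shift-and-mask code, so the compiler's implementation-defined bit-field allocation never enters the picture.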

IMHO anyone trying to construct a struct in this way is in a state of sin.
The C99 Standard, for example, says:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
Even if you could predict that your compiler would construct bit-fields in units of (say) uint32_t, and that the fields were arranged first field in the LS bits... you still have endianness to deal with!
So... as unwind says... do it by hand!

I agree that you should not do this. However, STMicroelectronics uses bit fields to access the bits of its Cortex-M3/M4 microcontroller registers, so any compiler vendor that wants its users to be able to use the STMicroelectronics Cortex-M3/M4 libraries needs to support allocating bit fields starting at the least significant bit. In my compiler this is the default, but it is also an option, so I could reverse it if I wanted to.


Unexpected behaviour using bit-fields and unions

I was experimenting with bit-fields and unions and created this:
union REG {
    struct {
        char posX : 7;
        char posY : 7;
        unsigned char dir : 2;
    };
    unsigned short reg;
};
And when I run sizeof(short) I get 2, but when I run sizeof(REG) I get 4. That's weird to me, because when I sum the bits I get 7+7+2 = 16, which is the size in bits of a 2-byte data type.
I'm currently using the Dev-C++ editor with compiler TDM-GCC 9.9.2 64-bit Debug.
This is my first question, so please tell me if you need more information... Thanks in advance!
Edit: Upon further experimentation I realized the size is the same (2 bytes) when I set the sizes of posX and posY to 6 bits. But that still puzzles me, because the sum is then 14 bits, which is less than 2 bytes...
Edit 2: Thanks to AviBerger I realized that replacing the char/unsigned char data types with short/unsigned short turns the result of sizeof(REG) into 2. But I still can't figure out why this happens.
From the spec we have:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
So the actual behavior depends on what size of allocation unit the compiler chooses for the bit fields, and on whether it allows fields to span multiple allocation units. The choice is implementation-defined, but a common implementation is to use the declared type of the bit field as the allocation unit and to not allow crossing allocation-unit boundaries. So when you use (unsigned) char, the allocation unit is 8 bits. This means that no two of the bit fields can be combined into a single byte (7+7 > 8 and 7+2 > 8), so the struct takes 3 allocation units (bytes), which then rounds up to 4 for alignment when combined with a short in the union.
When you change the bitfield size to 6, the second and third bit fields can now share a byte (6+2 = 8), so the struct only takes two allocation units.
When you change the bitfield type to short, the allocation unit is 16 bits, so all 3 bit fields fit in one allocation unit.
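A quick way to see this on your own machine is to compare the three variants side by side. The sizes in the comments are what GCC on x86-64 reports; they are implementation-defined, so treat this as a sketch of the explanation above, not a guarantee:

#include <cstdio>

// Only the declared type of the bit fields changes between the variants.
union RegChar {      // 8-bit allocation units: three bytes, padded to 4
    struct { char x : 7; char y : 7; unsigned char d : 2; } bits;
    unsigned short reg;
};
union RegChar6 {     // y and d now share a byte (6+2 = 8): two bytes
    struct { char x : 6; char y : 6; unsigned char d : 2; } bits;
    unsigned short reg;
};
union RegShort {     // one 16-bit allocation unit holds all three fields
    struct { short x : 7; short y : 7; unsigned short d : 2; } bits;
    unsigned short reg;
};

int main() {
    std::printf("%zu %zu %zu\n",
                sizeof(RegChar), sizeof(RegChar6), sizeof(RegShort));
    // Prints "4 2 2" with GCC on x86-64.
}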
There are several points of finesse when working with struct and union. The most common is that fields may be generously padded so that they align with the CPU's word size.

struct {
    char c1;
    char c2;
} s1;

looks like it should be a two-byte structure, yet on some compilers sizeof(s1) will come out not as 2 but as 4, or even 8. This was the case even in the 1980s with 16-bit machines. It happens because such compilers align structure members to two-byte or four-byte boundaries. I have yet to see structure elements aligned to an 8-byte boundary, but we haven't got 64-bit architectures being that needy, yet.
The solution is to invoke a compilation option to "pack structures". This can be done either on the compiler command line or by including a suitable #pragma option before the structure declaration:

#pragma pack(1) // this particular syntax is valid for many compilers
struct {
    char c11;
    char c12;
} s2;
#pragma pack(4)
Taken from the standard (n4835):
11.4.9 Bit-fields [class.bit]
1 [...] Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. —end note]
As you can see, both the size and the alignment are implementation-defined. So you might get the expected behaviour on other compilers/platforms, but on your compiler/platform you get different results than you expect.

Bitwise structure definition language generating c++ code

Before any question is asked: I am dealing with actual hardware.
I am searching for a meta-language that would allow me to specify data structure contents where fields have different bit lengths (this includes fields 1, 3, 24 or 48 bits long), with respect to endianness, and that would generate C++ code for accessing the data.
The question was put on hold due to being too vague, so I'll try to make it as clear as possible:
I am searching for a language that:
- accepts a simple structure description and generates useful C++ code,
- allows me to precisely specify integers ranging from 1 bit to multiple (up to 8) bytes long, along with data (typically strings),
- isolates me from the need to convert endianness,
- produces exact, predictable output that carries no overhead (unlike protocol buffers).
ASN.1 sounds almost right for the purpose, but it adds its own overhead (meaning I cannot produce a simple structure that has 2 bytes split into 4 nibbles); what I'm looking for is a language that offers an exact representation of the structure.
For example, I would want to abstract this:
struct Command {
    struct Record {
        int8_t  track;
        int8_t  point;
        int8_t  index;
        int16_t start_position;  // big endian, misaligned
        int32_t length;          // big endian, misaligned
    } __attribute__((packed));   // structure length = 11 bytes

    int8_t  current : 1;
    int8_t  command : 7;
    int8_t  reserved;
    int16_t side : 3;            // the entire int16_t needs to be
    int16_t layer : 3;           // converted from big endian, because
    int16_t laser_mark : 3;      // this field spans across bytes
    int16_t laser_power : 3;
    int16_t reserved_pad : 2;
    int16_t laser_tag : 2;
    int32_t mode_number : 8;     // again, the entire 32-bit field needs to be
    int32_t record_count : 24;   // converted from big endian to read this count
    Record records[];
} __attribute__((packed));
The above needs to pack exactly into a structure carrying 8 + record_count * 11 bytes, all formed accurately: no additional data, no additional bits or bytes set.
The above is just an example, made simple so that I don't clog the site with the actual structures, which oftentimes have hundreds of fields. It has been simplified, but it shows many of the features I am hoping to see (the two remaining features are 48- or 64-bit integers and plain data (bytes[])).
If this question is still too vague, please explain in the comments what I should add. Thanks!
A simple table that tracks individual field sizes and is used to spin out offsets of each element into your structure sounds like the easiest solution. This won't scale to deeply nested structures, but could be tuned to support handling of the unassigned bit cases you identify.
Then you can use this to generate constants or even named property accessors to extract and update the individual fields (see the sketch at the end of this answer). Given the size of the individual elements, macros are likely to make life even harder, but any mainstream compiler should inline the code. Your mileage could vary with a template-based implementation.
It would help if you could use a common representation for both sides of the application (host and device), to further reduce the likelihood of transcription errors.
The PLC world has a number of different mechanisms for layout, but these are all hardwired into their respective ecosystems and so would not really help.
Alternatively, if you have the tooling available, you could consider something like ASN.1 structures for the representation. In the extreme, you could even use an open-source generator to come up with an unencoded generator directly from the MIB.
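To make the "named property accessors" idea concrete, here is a minimal sketch of what such a generator might emit for one 16-bit big-endian word. All the names (FieldDesc, kSide, getField, setField) are hypothetical, invented for illustration:

#include <cstdint>

// Hypothetical generator output: each field is described by its bit offset
// (counted from the MSB of the word) and its width in bits.
struct FieldDesc { unsigned offset, width; };

constexpr FieldDesc kSide      {0, 3};
constexpr FieldDesc kLayer     {3, 3};
constexpr FieldDesc kLaserMark {6, 3};

// Read a field out of a big-endian 16-bit word stored in a byte buffer.
inline uint16_t getField(const uint8_t* buf, FieldDesc f)
{
    const uint16_t word  = static_cast<uint16_t>((buf[0] << 8) | buf[1]);
    const unsigned shift = 16 - f.offset - f.width;
    return (word >> shift) & ((1u << f.width) - 1u);
}

// Write a field into the same word, leaving the other bits untouched.
inline void setField(uint8_t* buf, FieldDesc f, uint16_t value)
{
    uint16_t word        = static_cast<uint16_t>((buf[0] << 8) | buf[1]);
    const unsigned shift = 16 - f.offset - f.width;
    const uint16_t mask  = ((1u << f.width) - 1u) << shift;
    word = (word & ~mask) | ((value << shift) & mask);
    buf[0] = static_cast<uint8_t>(word >> 8);
    buf[1] = static_cast<uint8_t>(word & 0xFF);
}

Because the byte order is handled inside the accessors, the same generated table works unchanged on both big- and little-endian hosts.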

Endianness inside a byte

Recently I've been tracking down a bug that appears when the two sides of a network connection have different endianness. One side has already sent a telegram marking lastSegment, while the other side is still endlessly waiting for the last segment.
I read this code:
#ifndef kBigEndian
struct tTelegram
{
    u8 lastSegment : 1;
    u8 reserved    : 7;
    u8 data[1];
};
#else
struct tTelegram
{
    u8 reserved    : 7;
    u8 lastSegment : 1;
    u8 data[1];
};
#endif
I know endianness matters for multi-byte types, e.g. int, long, etc. But why does it matter in the code above? lastSegment and reserved live inside a single byte.
Is that a bug?
You have 16 bits in your struct. On a 32-bit or 64-bit architecture, depending on the endianness, data may come "before" reserved and lastSegment, or it may come "after" them, when you look at the raw binary. That is, if we consider 32 bits, your struct may be packed along 32-bit boundaries. It might look like this:

padbyte1 padbyte2 data lastSegment+reserved

or it may look like this:

lastSegment+reserved data padbyte1 padbyte2

So when you put those 16 bits over the wire and then reinterpret them on the other side, do you know whether you're getting data or lastSegment?
Your problem isn't within the byte, it's where data lies in relation to reserved and lastSegment.
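One portable way out, sketched below, is to not overlay a struct on the buffer at all and instead pick the flag out of a known byte by hand. The assumption that lastSegment is bit 0 of the first byte is mine, for illustration; the real position has to come from the protocol spec:

#include <cstdint>
#include <cstddef>

// Minimal sketch: parse the telegram from raw bytes instead of casting the
// buffer to a struct. Assumes (!) lastSegment is bit 0 of byte 0.
struct Telegram {
    bool           lastSegment;
    const uint8_t* data;
    size_t         dataLen;
};

inline Telegram parseTelegram(const uint8_t* buf, size_t len)
{
    Telegram t{};
    t.lastSegment = (buf[0] & 0x01u) != 0;  // bit 0 of the flag byte
    t.data        = buf + 1;                // payload starts after the flags
    t.dataLen     = len - 1;
    return t;
}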
When it comes to bitfields, ordering is not guaranteed even between different compilers running on the same CPU. You could theoretically even get a change of order just by changing flags with the same compiler (though, in fairness, I have to add that I've never actually seen that happen).

In C++ what is the proper term for splitting an int into bits

I see in some C++ code things like:
// Header
struct SomeStruct {
    uint32_t nibble1 : 4, bitField1 : 1, bitField2 : 1, bitField3 : 1,
             bitField4 : 1, padding : 11, field5Bits : 5, byteField : 8;
};
What is this called? I typically like to google before asking here, but I have no idea what to even type in. I'm hoping to understand this with respect to endianness: is bit order something to consider, or just byte order? Also, what is the type of each field? bitFieldX should be a bool, while field5Bits should be a uint8_t. At least, that's what I would think.
Thanks.
They are called bit fields (MSVC) (GCC).
Endianness usually refers to the order of bytes; however, bit order can be important too, see the links above.
They all behave as an unsigned int (uint32_t in your case).
In general, the term for selecting several bits out of a larger binary integer representation is masking.
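For instance, assuming the compiler allocated the fields above from the least significant bit upward (which is implementation-defined, as discussed throughout this page), field5Bits would sit 19 bits up, and masking it out by hand looks like this sketch:

#include <cstdint>

// Masking sketch: extract and insert field5Bits by hand, assuming a
// low-to-high bit-field layout (implementation-defined!).
// nibble1(4) + four 1-bit flags + padding(11) = 19 bits below field5Bits.
constexpr unsigned kField5Shift = 19;
constexpr uint32_t kField5Mask  = 0x1Fu << kField5Shift;  // 5 bits wide

inline uint32_t getField5(uint32_t raw)
{
    return (raw & kField5Mask) >> kField5Shift;
}

inline uint32_t setField5(uint32_t raw, uint32_t value)
{
    return (raw & ~kField5Mask) | ((value << kField5Shift) & kField5Mask);
}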
What you posted is a packed structure. The elements within the structure are known as bit fields, as others have posted. These are often used to represent communication protocol structures, where the protocol specifies fields that are less than one byte long, or not aligned to the byte, half-word, or word boundary that would normally apply.
Since there is only one type listed, each member of the structure has the same type, uint32_t.
Endianness matters for anything that is part of a data type larger than 1 byte.

C/C++: Force Bit Field Order and Alignment

I read that the order of bit fields within a struct is platform-specific. What about if I use different compiler-specific packing options; will that guarantee the data is stored in the proper order as written? For example:
struct Message
{
    unsigned int version : 3;
    unsigned int type : 1;
    unsigned int id : 5;
    unsigned int data : 6;
} __attribute__ ((__packed__));
On an Intel processor with the GCC compiler, the fields were laid out in memory as shown: Message.version occupied the first 3 bits in the buffer, and Message.type followed. If I find equivalent struct-packing options for the various compilers, will this be cross-platform?
No, it will not be fully portable. Packing options for structs are extensions, and are themselves not fully portable. In addition to that, C99 §6.7.2.1, paragraph 10 says: "The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined."
Even a single compiler might lay the bit field out differently depending on the endianness of the target platform, for example.
Bit fields vary widely from compiler to compiler, sorry.
With GCC, big endian machines lay out the bits big end first and little endian machines lay out the bits little end first.
K&R says "Adjacent [bit-]field members of structures are packed into implementation-dependent storage units in an implementation-dependent direction. When a field following another field will not fit ... it may be split between units or the unit may be padded. An unnamed field of width 0 forces this padding..."
Therefore, if you need machine independent binary layout you must do it yourself.
This last statement also applies to non-bitfields, due to padding; however, all compilers seem to have some way of forcing byte packing of a structure, as I see you already discovered for GCC.
Bitfields should be avoided: they aren't very portable between compilers, even for the same platform. From the C99 standard, 6.7.2.1/10 "Structure and union specifiers" (there's similar wording in the C90 standard):
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
You cannot guarantee whether a bit field will 'span' an int boundary or not, and you can't specify whether a bit field starts at the low end or the high end of the int (this is independent of whether the processor is big-endian or little-endian).
Prefer bitmasks. Use inlines (or even macros) to set, clear and test the bits.
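A sketch of that advice applied to the Message layout above, with version placed in the three least significant bits. The placement is my choice for illustration; the point is that it is now an explicit decision in your code rather than one the compiler makes for you:

#include <cstdint>

// Bitmask sketch for the Message fields, packed into one 16-bit word:
// version = bits 0-2, type = bit 3, id = bits 4-8, data = bits 9-14.
constexpr uint16_t kVersionMask = 0x0007;  // 3 bits
constexpr uint16_t kTypeMask    = 0x0008;  // 1 bit
constexpr uint16_t kIdMask      = 0x01F0;  // 5 bits
constexpr uint16_t kIdShift     = 4;

inline uint16_t version(uint16_t msg)  { return msg & kVersionMask; }
inline bool     typeFlag(uint16_t msg) { return (msg & kTypeMask) != 0; }
inline uint16_t id(uint16_t msg)       { return (msg & kIdMask) >> kIdShift; }

inline uint16_t withId(uint16_t msg, uint16_t newId)
{
    return (msg & ~kIdMask) | ((newId << kIdShift) & kIdMask);
}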
Endianness is about byte order, not bit order. Nowadays it is 99% certain that bit order is fixed. However, when using bit fields, endianness should be taken into account. See the example below.
#include <stdio.h>

typedef struct tagT {
    int a : 4;
    int b : 4;
    int c : 8;
    int d : 16;
} T;

int main()
{
    char data[] = {0x12, 0x34, 0x56, 0x78};
    T *t = (T*)data;
    printf("a =0x%x\n", t->a);
    printf("b =0x%x\n", t->b);
    printf("c =0x%x\n", t->c);
    printf("d =0x%x\n", t->d);
    return 0;
}
// big endian: mips24k-linux-gcc (GCC) 4.2.3
a =0x1
b =0x2
c =0x34
d =0x5678
1 2 3 4 5 6 7 8
\_/ \_/ \_____/ \_____________/
a b c d
// little endian: gcc (Ubuntu 4.3.2-1ubuntu11) 4.3.2
a =0x2
b =0x1
c =0x34
d =0x7856
7 8 5 6 3 4 1 2
\_____________/ \_____/ \_/ \_/
d c b a
Most of the time, probably, but don't bet the farm on it, because if you're wrong, you'll lose big.
If you really, really need identical binary layout, you'll need to replace the bit fields with bitmasks: e.g. use an unsigned short (16 bits) for Message, and then define things like versionMask = 0xE000 to represent the three topmost bits.
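A minimal sketch of exactly that, using the mask from the sentence above (the helper names are mine):

#include <cstdint>

// version lives in the three topmost bits of the 16-bit message word.
constexpr uint16_t versionMask = 0xE000;

inline uint16_t getVersion(uint16_t msg)
{
    return (msg & versionMask) >> 13;   // 13 = 16 bits - 3 version bits
}

inline uint16_t setVersion(uint16_t msg, uint16_t v)
{
    return (msg & ~versionMask) | ((v << 13) & versionMask);
}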
There's a similar problem with alignment within structs. For instance, Sparc, PowerPC, and 680x0 CPUs are all big-endian, and the common default for Sparc and PowerPC compilers is to align struct members on 4-byte boundaries. However, one compiler I used for 680x0 only aligned on 2-byte boundaries - and there was no option to change the alignment!
So for some structs, the sizes on Sparc and PowerPC are identical, but smaller on 680x0, and some of the members are in different memory offsets within the struct.
This was a problem with one project I worked on, because a server process running on Sparc would query a client and find out it was big-endian, and assume it could just squirt binary structs out on the network and the client could cope. And that worked fine on PowerPC clients, and crashed big-time on 680x0 clients. I didn't write the code, and it took quite a while to find the problem. But it was easy to fix once I did.
Thanks @BenVoigt for your very useful comment starting:
"No, they were created to save memory."
The Linux source does use a bit field to match an external structure: /usr/include/linux/ip.h has this code for the first byte of an IP datagram:
struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
    __u8 ihl:4,
         version:4;
#elif defined (__BIG_ENDIAN_BITFIELD)
    __u8 version:4,
         ihl:4;
#else
#error "Please fix <asm/byteorder.h>"
#endif
    /* ... the remaining header fields follow ... */
};
However, in light of your comment I'm giving up trying to get this to work for the multi-byte bit field frag_off.
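For what it's worth, the usual way out for frag_off is the manual approach from the other answers: treat it as a single big-endian 16-bit field and mask it, rather than describing it with bit fields. A sketch (the constant values are the classic ones from BSD's netinet/ip.h; the helper names are mine):

#include <cstdint>
#include <arpa/inet.h>  // ntohs

// frag_off on the wire: 3 flag bits, then a 13-bit fragment offset.
constexpr uint16_t kIpDontFragment = 0x4000;  // DF flag
constexpr uint16_t kIpMoreFragments = 0x2000; // MF flag
constexpr uint16_t kIpOffsetMask   = 0x1FFF;  // fragment offset mask

inline uint16_t fragOffset(uint16_t frag_off_be)
{
    return ntohs(frag_off_be) & kIpOffsetMask;
}

inline bool moreFragments(uint16_t frag_off_be)
{
    return (ntohs(frag_off_be) & kIpMoreFragments) != 0;
}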
Of course, the best answer is to use a class which reads/writes bit fields as a stream. Using the C bit-field structure is simply not guaranteed. Not to mention that it is considered unprofessional/lazy/stupid to use it in real-world coding.
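A sketch of that stream idea (a hypothetical class, not from any particular library): bits are appended MSB-first into successive bytes, so the wire layout is decided entirely by the calls you make, never by the compiler:

#include <cstdint>
#include <vector>

// Hypothetical bit-stream writer: fields are appended MSB-first, so the
// output layout depends only on this code, not on bit-field allocation.
class BitWriter {
public:
    void write(uint32_t value, unsigned bits)  // append 'bits' bits of value
    {
        for (unsigned i = bits; i-- > 0; )
            putBit((value >> i) & 1u);
    }
    const std::vector<uint8_t>& bytes() const { return buf_; }

private:
    void putBit(unsigned bit)
    {
        if (free_ == 0) { buf_.push_back(0); free_ = 8; }
        --free_;
        buf_.back() |= static_cast<uint8_t>(bit << free_);
    }
    std::vector<uint8_t> buf_;
    unsigned free_ = 0;  // unused bits remaining in buf_.back()
};

// Usage, e.g. for the telemetry header's first word from the question at
// the top of this page:
//   BitWriter w;
//   w.write(version, 2); w.write(spacecraftId, 10);
//   w.write(virtualChannelId, 3); w.write(ocfFlag, 1);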