Portable bit fields for Handles - c++

I want to use and store "Handles" to data in an object buffer to reduce allocation overhead. The handle is simply an index into an array with the object. However I need to detect use-after-reallocations, as this could slip in quite easily. The common approach seems to be using bit fields. However this leads to 2 problems:
Bit fields are implementation defined
Bit shifting is not portable across big/little endian machines.
What I need:
Store handle to file (file handler can manage either integer types (byte swapping) or byte arrays)
Store 2 values in the handle with minimum space
What I got:
template<class T_HandleDef, typename T_Storage = uint32_t>
struct Handle
{
typedef T_HandleDef HandleDef;
typedef T_Storage Storage;
Handle(): handle_(0){}
private:
const T_Storage handle_;
};
template<unsigned T_numIndexBits = 16, typename T_Tag = void>
struct HandleDef{
static const unsigned numIndexBits = T_numIndexBits;
};
template<class T_Handle>
struct HandleAccessor{
typedef typename T_Handle::Storage Storage;
typedef typename T_Handle::HandleDef HandleDef;
static const unsigned numIndexBits = HandleDef::numIndexBits;
static const unsigned numMagicBits = sizeof(Storage) * 8 - numIndexBits;
/// "Magic" struct that splits the handle into values
union HandleData{
struct
{
Storage index : numIndexBits;
Storage magic : numMagicBits;
};
T_Handle handle;
};
};
A usage would be for example:
typedef Handle<HandleDef<24> > FooHandle;
FooHandle Create(unsigned idx, unsigned m){
HandleAccessor<FooHandle>::HandleData data;
data.idx = idx;
data.magic = m;
return data.handle;
}
My goal was to keep the handle as opaque as possible, add a bool check but nothing else. Users of the handle should not be able to do anything with it but passing it around.
So problems I run into:
Union is UB -> Replace its T_Handle by Storage and add a ctor to Handle from Storage
How does the compiler layout the bit field? I fill the whole union/type so there should be no padding. So probably the only thing that can be different is which type comes first depending on endianess, correct?
How can I store handle_ to a file and load it from a possible different endianess machine and still have index and magic be correct? I think I can store the containing Storage 'endian-correct' and get correct values, IF both members occupy exactly half the space (2 Shorts in an uint) But I always want more space for the index than for the magic value.
Note: There are already questions about bitfields and unions. Summary:
Bitfields may have unexpected padding (impossible here as whole type occupied)
Order of "members" depend on compiler (only 2 possible ways here, should be save to assume order depends entirely on endianess, so this may or may not actually help here)
Specific binary layout of bits can be achieved by manual shifting (or e.g. wrappers http://blog.codef00.com/2014/12/06/portable-bitfields-using-c11/) -> Is not an answer here. I need also a specific layout of the values IN the bitfield. So I'm not sure what I get, if I e.g. create a handle as handle = (magic << numIndexBits) | index and save/load this as binary (no endianess conversion) Missing a BigEndian machine for testing.
Note: No C++11, but boost is allowed.

Answer is pretty simple (based on another question I forgot the link to and comments by #Jeremy Friesner ):
As "numbers" are already an abstraction in C++ one can be sure to always have the same bit representation when the variable is in a CPU register (when it is used for anything calculation like) Also bit shifts in C++ are defined in an endian-independent way. This means x << 1 is always equal x * 2 (and hence big-endian)
Only time one get endianess problems is when saving to file, send/recv over network or accessing it from memory differently (e.g. via pointers...)
One cannot use C++ bitfields here, as one cannot be 100% sure about the order of the "entries". Bitfield containers might be ok, if they allow access to the data as a "number".
Savest is (still) using bitshifts, which are very simple in this case (only 2 values) During storing/serialization the number must then be stored in an endian-agnostic way.

Related

Why reserve memory in the structure?

I often see structures in the code, at the end of which there is a memory reserve.
struct STAT_10K4
{
int32_t npos; // position number
...
float Plts;
Pxts;
float Plto [NUM];
uint32_t reserv [(NUM * 3)% 2 + 1];
};
Why do they do this?
Why are some of the reserve values dependent on constants?
What can happen if you do not make such reserves? Or make a mistake in their size?
This is a form of manual padding of a class to make its size a multiple of some number. In your case:
uint32_t reserv [(NUM * 3)% 2 + 1];
NUM * 3 % 2 is actually nonsensical, as it would be equivalent to NUM % 2 (not considering overflow). So if the array size is odd, we pad the struct with one additional uint32_t, on top of + 1 additional ones. This padding means that STAT_10K4's size is always a multiple of 8 bytes.
You will have to consult the documentation of your software to see why exactly this is done. Perhaps padding this struct with up to 8 bytes makes some algorithm easier to implement. Or maybe it has some perceived performance benefit. But this is pure speculation.
Typically, the compiler will pad your structs to 64-bit boundaries if you use any 64-bit types, so you don't need to do this manually.
Note: This answer is specific to mainstream compilers and x86. Obviously this does not apply to compiling for TI-calculators with 20-bit char & co.
This would typically be to support variable-length records. A couple of ways this could be used will be:
1 If the maximum number of records is known then a simple structure definition can accomodate all cases.
2 In many protocols there is a "header-data" idiom. The header will be a fixed size but the data variable. The data will be received as a "blob". Thus the structure of the header can be declared and accessed by a pointer to the blob, and the data will follow on from that. For example:
typedef struct
{
uint32_t messageId;
uint32_t dataType;
uint32_t dataLenBytes;
uint8_t data[MAX_PAYLOAD];
}
tsMessageFormat;
The data is received in a blob, so a void* ptr, size_t len.
The buffer pointer is then cast so the message can be read as follows:
tsMessageFormat* pMessage = (psMessageFormat*) ptr;
for (int i = 0; i < pMessage->dataLenBytes; i++)
{
//do something with pMessage->data[i];
}
In some languages the "data" could be specified as being an empty record, but C++ does not allow this. Sometimes you will see the "data" omitted and you have to perform pointer arithmetic to access the data.
The alternative to this would be to use a builder pattern and/or streams.
Windows uses this pattern a lot; many structures have a cbSize field which allows additional data to be conveyed beyond the structure. The structure accomodates most cases, but having cbSize allows additional data to be provided if necessary.

Serializing/deserializing a bitfield structure in c++

I have a
typedef struct {
uint32_t Thread: HTHREAD_BUS_WIDTH;
uint32_t Member: 3;
uint32_t Proxy:3;
// Other members, fill out 32 bits
} MyStruct;
that I must transfer from one system to another as an item of
a buffer comprising 32-bit words.
What is the best way to serialize the struct, and on the other side,
to deserialize it? "best" means here safe casting, and no unneeded copying.
For one direction of casting, I have found (as member function)
int &ToInt() {
return *reinterpret_cast<int *>(this);}
Is there similar valid casting in the other way round, i.e. from integer to MyStruct; the best would be as a member function?
How can I define which bit means which field? (It may even the case,
that the deserialization happens in another program, in another language, in little/big endian systems?
How can I define which bit means which field?
You cannot. You have no control over the layout of bitfields.
"best" means here safe casting, and no unneeded copying.
There is no portable safe cast that could avoid copying.
A portable way to serialise bitfields is to manually shift into an integer, in the desired order. For example:
MyStruct value = something;
uint32_t out = 0;
out |= value.Thread;
out << HTHREAD_BUS_WIDTH;
out |= value.Member;
out << 3;
out |= value.Proxy;
In the shown example, the least significant bits contain the field Proxy while the other fields are adjacent in more significant bits.
Of course, in order to serialise this generated integer correctly, just like serialising any integer, you must take endianness into consideration. Serialisation of an integer can be portably implemented by repeatedly shifting the integer, and copying the bytes in order of significance into an array.
If you need to read from other system which might have different endianess you cannot rely on a portable bitfield. A solution is to "expanse" your structure so that each field is serialyzed as a 32 bit value in the "transport" buffer. A safe implementation could be something like:
typedef struct {
uint32_t Thread: HTHREAD_BUS_WIDTH;
uint32_t Member: 3;
uint32_t Proxy:3;
// Other members, fill out 32 bits
std::vector<uint32_t > to_buffer() const;
} MyStruct;
Implementation of to_buffer():
std::vector<uint32_t > MyStruct::to_buffer() const
{
std::vector<uint32_t> buffer;
buffer.push_back((uint32_t )(Thread);
buffer.push_back((uint32_t )(Member);
buffer.push_back((uint32_t )(Proxy);
// push other members
return buffer;
}
then on the reception side you can do the "buffer" to struct.
If you do not want to expanse the fields that do not use 32 bits you can always implement you own packing function by shifting and masking bits eg:
uint32_t menber_and_procy = (Member << 3) | proxy; // and so one for other members.
It is much more error prone.
From my own experience, if communication bandwith is not an issue, relying on "text like" content is a better choice (no endianess issues and very easy to debug).

C++. Struct padding / alignment on different platforms and atomatic check of layout compatibility

I have embedded device connected to PC
and some big struct S with many fields and arrays of custom defined type FixedPoint_t.
FixedPoint_t is a templated POD class with exactly one data member that vary in size from char to long depending on template params. Anyway it passes static_assert((std::is_pod<FixedPoint_t<0,8,8> >::value == true),"");
It will be good if this big struct has compatible underlaying memory representation on both embedded system and controlling PC. This allows significant simplification of communication protocol to commands like "set word/byte with offset N to value V". Assume endianess is the same on both platforms.
I see 3 solutions here:
Use something like #pragma packed on both sides.
BUT i got warning when i put attribute((packed)) to struct S declaration
warning: ignoring packed attribute because of unpacked non-POD field.
This is because FixedPoint_t is not declared as packed.
I don't want declare it as packed because this type is widely used in whole program and packing can lead to performance drop.
Make correct struct serialization. This is not acceptable because of code bloat, additional RAM usege...Protocol will be more complicated because i need random access to the struct. Now I think this is not an option.
Control padding manually. I can add some field, reorder others...Just to acheive no padding on both platforms. This will satisfy me at the moment. But i need good way to write a test that shows me is padding is there or not.
I can compare sum of sizeof() each field to sizeof(struct).
I can compare offsetof() each struct field on both platforms.
Both variants are ugly enough...
What do you recommend? Especially i am interested in manual padding controling and automaic padding detection in tests.
EDIT: Is it sufficient to compare sizeof(big struct) on two platforms to detect layout compatibility(assume endianess is equal)?? I think size should not match if padding will be different.
EDIT2:
//this struct should have padding on 32bit machine
//and has no padding on 8bit
typedef struct
{
uint8_t f8;
uint32_t f32;
uint8_t arr[5];
} serialize_me_t;
//count of members in struct
#define SERTABLE_LEN 3
//one table entry for each serialize_me_t data member
static const struct {
size_t width;
size_t offset;
// size_t cnt; //why we need cnt?
} ser_des_table[SERTABLE_LEN] =
{
{ sizeof(serialize_me_t::f8), offsetof(serialize_me_t, f8)},
{ sizeof(serialize_me_t::f32), offsetof(serialize_me_t, f32)},
{ sizeof(serialize_me_t::arr), offsetof(serialize_me_t, arr)},
};
void serialize(void* serialize_me_ptr, char* buf)
{
const char* struct_ptr = (const char*)serialize_me_ptr;
for(int i=0; i<SERTABLE_LEN; I++)
{
struct_ptr += ser_des_table[i].offset;
memcpy(buf, struct_ptr, ser_des_table[i].width );
buf += ser_des_table[i].width;
}
}
I strongly recommend to use option 2:
You are save for future changes (new PCD/ABI, compiler, platform, etc.)
Code-bloat can be kept to a minimum if well thought. There is just one function needed per direction.
You can create the required tables/code (semi-)automatically (I use Python for such). This way both sides will stay in sync.
You definitively should add a CRC to the data anyway. As you likely do not want to calculate this in the rx/tx-interrupt, you'll have to provide an array anyway.
Using a struct directly will soon become a maintenance nightmare. Even worse if someone else has to track this code.
Protocols, etc. tend to be reused. If it is a platform with different endianess, the other approach goes bang.
To create the data-structs and ser/des tables, you can use offsetof to get the offset of each type in the struct. If that table is made an include-file, it can be used on both sides. You can even create the struct and table e.g. by a Python script. Adding that to the build-process ensures it is always up-to-date and you avoid additional typeing.
For instance (in C, just to get idea):
// protocol.inc
typedef struct {
uint32_t i;
uint 16_t s[5];
uint32_t j;
} ProtocolType;
static const struct {
size_t width;
size_t offset;
size_t cnt;
} ser_des_table[] = {
{ sizeof(ProtocolType.i), offsetof(ProtocolType.i), 1 },
{ sizeof(ProtocolType.s[0]), offsetof(ProtocolType.s), 5 },
...
};
If not created automatically, I'd use macros to generate the data. Possibly by including the file twice: one to generate the struct definition and one for the table. This is possible by redefining the macros in-between.
You should care about the representation of signed integers and floats (implementation defined, floats are likely IEEE754 as proposed by the standard).
As an alternative to the width field, you can use an "type" code (e.g. a char which maps to an implementation-defined type. This way you can add custom types with same width, but different encoding (e.g. uint32_t and IEEE754-float). This will completely abstract the network protocol encoding from the physical machine (the best solution). Note noting hinders you from using common encodings which do not complicate code a single bit (literally).

Reversibly Combining Two uint32_t's Without Changing Datatype

Here's my issue: I need to pass back two uint32_t's via a single uint32_t (because of how the API is set up...). I can hard code whatever other values I need to reverse the operation, but the parameter passed between functions needs to stay a single uint32_t.
This would be trivial if I could just bit-shift the two 32-bit ints into a single 64-bit int (like what was explained here), but the compiler wouldn't like that. I've also seen mathematical pairing functions, but I'm not sure if that's what I need in this case.
I've thought of setting up a simple cipher: the unint32_t could be the cipher text, and I could just hard code the key. This is an example, but that seems like overkill.
Is this even possible?
It is not possible to store more than 32 bits of information using only 32 bits. This is a basic result of information theory.
If you know that you're only using the low-order 16 bits of each value, you could shift one left 16 bits and combine them that way. But there's absolutely no way to get 64 bits worth of information (or even 33 bits) into 32 bits, period.
Depending on how much trouble this is really worth, you could:
create a global array or vector of std::pair<uint32_t,uint32_t>
pass an index into the function, then your "reverse" function just looks up the result in the array.
write some code to decide which index to use when you have a pair to pass. The index needs to not be in use by anyone else, and since the array is global there may be thread-safety issues. Essentially what you are writing is a simple memory allocator.
As a special case, on a machine with 32 bit data pointers you could allocate the struct and reinterpret_cast the pointer to and from uint32_t. So you don't need any globals.
Beware that you need to know whether or not the function you pass the value into might store the value somewhere to be "decoded" later, in which case you have a more difficult resource-management problem than if the function is certain to have finished using it by the time it returns.
In the easy case, and if the code you're writing doesn't need to be re-entrant at all, then you only need to use one index at a time. That means you don't need an array, just one pair. You could pass 0 to the function regardless of the values, and have the decoder ignore its input and look in the global location.
If both special cases apply (32 bit and no retaining of the value), then you can put the pair on the stack, and use no globals and no dynamic allocation even if your code does need to be re-entrant.
None of this is really recommended, but it could solve the problem you have.
You can use an intermediate global data structure to store the pair of uint32_t on it, using your only uint32_t parameter as the index on the structure:
struct my_pair {
uint32_t a, b;
};
std::map<uint32_t, my_pair> global_pair_map;
uint32_t register_new_pair(uint32_t a, uint32_t b) {
// Add the pair of (a, b) to the map global_pair_map on a new key, and return the
// new key value.
}
void release_pair(uint32_t key) {
// Remove the key from the global_pair_map.
}
void callback(uint32_t user_data) {
my_pair& p = global_pair_map[user_data];
// Use your pair of uint32_t with p.a, and p.b.
}
void main() {
uint32_t key = register_new_pair(number1, number2);
register_callback(callback, key);
}

Pointer conversion to long porting issue in 64 bit env

I'm porting an application from 32 bit to 64 bit.
It is C style coding (legacy product) although it is C++. I have an issue where a combination of union and struct are used to store values. Here a custom datatype called "Any" is used that should hold data of any basic datatype. The implementation of Any is as follows:
typedef struct typedvalue
{
long data; // to hold all other types of 4 bytes or less
short id; // this tells what type "data" is holding
short sign; // this differentiates the double value from the rest
}typedvalue;
typedef union Any
{
double any_any;
double any_double; // to hold double value
typedvalue any_typedvalue;
}Any;
The union is of size 8 bytes. They have used union so that at a given time there will only be one value and they have used struct to differentiate the type. You can store a double, long, string, char, float and int values at any given time. Thats the idea.
If its a double value, the value is stored in any_double. if its any other type, then its stored in "data" and the type of the value is stored in the "id". The "sign" would tell if value "Any" is holding a double or another type.
any_any is used liberally in the code to copy the value in the address space irrespective of the type. (This is our biggest problem since we do not know at a given time what it will hold!)
If its a string or pointer "Any" is suppose to hold, it is stored in "data" (which is of type long). In 64 bit, here is where the problem lies. pointers are 8 bytes. So we will need to change the "long" to an equivalent 8 byte (long long). But then that would increase the size of the union to 16 bytes and the liberal usage of "any_any" will cause problems. There are too many usage of "any_any" and you are never sure what it can hold.
I already tried these steps and it turned unsuccessful:
1. Changed the "long data" to "long long data" in the struct, this will make the size of the union to 16 bytes. - This will not allow the data to be passed as "any_any" (8 bytes).
2. Declared the struct as a pointer inside union. And changed the "long data" to "long long data" inside struct. - the issue encountered here was that, since its a pointer we need to allocate memory for the struct. The liberal use of "any_any" makes it difficult for us to allocate memory. Sometimes we might overwrite the memory and hence erase the value.
3. Create a separate collection that will hold the value for "data" (a key value pair). - This will not work because this implementation is at the core of application, the collection will run into millions of data.
Can anybody help me in this?
"Can anybody help me" this sounds like a cry of desperation, and I totally understand it.
Whoever wrote this code had absolutely no respect for future-proofing, or of portability, and now you're paying the price.
(Let this be a lesson to anyone who says "but our platform is 32bit! we will never use 64bit!")
I know you're going to say "but the codebase is too big", but you are better off rewriting the product. And do it properly this time!
Ignoring that fact that the original design is insane, you could use <stdint.h> (or soon <cstdint> to get a little bit of predictability:
struct typedvalue
{
uint16_t id;
uint16_t sign;
uint32_t data;
};
union any
{
char any_raw[8];
double any_double
typedvalue any_typedvalue;
};
You're still not guaranteed that typedvalue will be tightly packed, since there are no alignment guarantees for non-char members. You could make a struct Foo { char x[8]; }; and type-pun your way around, like *(uint32_t*)(&Foo.x[0]) and *(uint16_t*)(&Foo.x[4]) if you must, but that too would be extremely ugly.
If you are in C++0x, I would definitely throw in a static assertion somewhere for sizeof(typedvalue) == sizeof(double).
If you need to store both an 8 byte pointer and a "type" field then you have no choice but to use at least 9 bytes, and on a 64-bit system alignment will likely pad that out to 16 bytes.
Your data structure should look something like:
typedef struct {
union {
void *any_pointer;
double any_double;
long any_long;
int any_int;
} any;
char my_type;
} any;
If using C++0x consider using a strongly typed enumeration for the my_type field. In earlier versions the storage required for an enum is implementation dependent and likely to be more than one byte.
To save memory you could use (compiler specific) directives to request optimal packing of the data structure, but the resulting mis-aligned memory accesses may cause performance issues.