Deserialize a byte array to a struct - c++

I get a transmission over the network that's an array of chars/bytes. It contains a header and some data. I'd like to map the header onto a struct. Here's an example:
#pragma pack(1)
struct Header
{
unsigned short bodyLength;
int msgID;
unsigned short someOtherValue;
unsigned short protocolVersion;
};
int main()
{
boost::array<char, 128> msgBuffer;
Header header;
for(int x = 0; x < sizeof(Header); x++)
msgBuffer[x] = 0x01; // assign some values
memcpy(&header, msgBuffer.data(), sizeof(Header));
system("PAUSE");
return 0;
}
Will this always work assuming the structure never contains any variable length fields? Is there a platform independent / idiomatic way of doing this?
Note:
I have seen quite a few libraries on the internet that let you serialize/deserialize, but I get the impression that they can only deserialize something if it has been previously serialized with the same library. Well, I have no control over the format of the transmission. I'm definitely going to get a byte/char array where all the values just follow one another.

Just plain copying is very likely to break, at least if the data can come from a different architecture (or even just compiler) than what you are on. This is for reasons of:
Endianness
Structure packing
That second link is GCC-specific, but this applies to all compilers.
I recommend reading the fields byte-by-byte, and assembling larger fields (ints, etc.) from those bytes. This gives you control of endianness and padding.
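The byte-by-byte approach recommended above can be sketched roughly as follows. This is only an illustration under stated assumptions: the wire format is assumed to be big-endian with field widths of 2, 4, 2 and 2 bytes, and the helper names (readU16BE, readU32BE, parseHeader, kWireHeaderSize) are hypothetical, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>

// In-memory form of the question's header. Its layout and padding no longer
// matter, because nothing is copied into it wholesale.
struct Header {
    std::uint16_t bodyLength;
    std::uint32_t msgID;
    std::uint16_t someOtherValue;
    std::uint16_t protocolVersion;
};

// Assumed size of the header on the wire: 2 + 4 + 2 + 2 bytes, no padding.
const std::size_t kWireHeaderSize = 10;

// Assemble a big-endian 16-bit value from two bytes.
inline std::uint16_t readU16BE(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Assemble a big-endian 32-bit value from four bytes.
inline std::uint32_t readU32BE(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Decode the header field by field. Returns false if the buffer is too short.
inline bool parseHeader(const unsigned char* buf, std::size_t len, Header& out) {
    if (len < kWireHeaderSize) return false;
    out.bodyLength      = readU16BE(buf);
    out.msgID           = readU32BE(buf + 2);
    out.someOtherValue  = readU16BE(buf + 6);
    out.protocolVersion = readU16BE(buf + 8);
    return true;
}
```

Because every field is built from individual bytes, the result is the same on big-endian and little-endian hosts, and no packing pragma is needed.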

Some processors require that certain types are properly aligned. They will not accept the specified packing and generate a hardware trap.
And even on common x86 packed structures can cause the code to run more slowly.
Also you will have to take care when working with different endianness platforms.
By the way, if you want a simple and platform-independent communication mechanism with bindings to many programming languages, then have a look at YAMI.

The #pragma pack(1) directive should work on most compilers, but you can check by working out how big your data structure should be (10 in your case, if my maths is correct) and using printf("%zu", sizeof(Header)); to check that the packing is being done.
As others have said you still need to be wary of Endianness if you're going between architectures.
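If a C++11 compiler is available, the same size check can be done at compile time instead of with printf. The sketch below assumes that short is 2 bytes and int is 4 bytes on the target, and that the compiler understands #pragma pack(push, 1), which GCC, Clang and MSVC do.

```cpp
#include <cstddef>

#pragma pack(push, 1)
struct Header {
    unsigned short bodyLength;
    int            msgID;
    unsigned short someOtherValue;
    unsigned short protocolVersion;
};
#pragma pack(pop)

// The build fails here if the packing was not applied or the type sizes differ.
static_assert(sizeof(Header) == 10, "Header is not packed to 10 bytes");
static_assert(offsetof(Header, msgID) == 2, "msgID is not at wire offset 2");
```

This catches layout problems only; it does nothing about endianness.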

I strongly disagree with the idea of reading byte by byte. If you take care of the structure packing in the struct declaration, you can copy into the struct without a problem. As for the endianness problem, reading byte by byte solves it but does not give you a generic solution. That method is very lame. I have done something like this before for a similar job and it worked all right without a glitch.
Think about this. I have a structure, and I also have a corresponding definition of that structure. You may construct this by hand, but I have written a parser for this and used it for other things as well.
For example, the definition of the structure you gave above is "s i s s" (s = short, i = int). Then I give the struct address, this definition and the structure packing option of this struct to a special function that deals with the endianness, and voila, it is done.
SwitchEndianToBig(&header, "s i s s", 4); // 4 = structure packing option

Tell me if I'm wrong, but AFAIK, doing it that way will guarantee you that the data is correct - assuming the types have the same size on your different platforms :
#include <array>
#include <algorithm>
//#pragma pack(1) // not needed
struct Header
{
unsigned short bodyLength;
int msgID;
unsigned short someOtherValue;
unsigned short protocolVersion;
float testFloat;
Header() : bodyLength(42), msgID(34), someOtherValue(66), protocolVersion(69), testFloat( 3.14f ) {}
};
int main()
{
std::array<char, 128> msgBuffer; // std::array matches the <array> header included above
Header header;
const char* rawData = reinterpret_cast< const char* >( &header );
std::copy( rawData, rawData + sizeof(Header), msgBuffer.data()); // assuming msgBuffer is always big enough
system("PAUSE");
return 0;
}
If the types are different on your targeted platforms, you have to use aliases (typedef) for each type to be sure the size of each used type is the same.

I know who I'm communicating with, so I don't really have to worry about endianness. But I like to stay away from compiler specific commands anyway.
So how about this:
const int kHeaderSizeInBytes = 6;
struct Header
{
unsigned short bodyLength;
unsigned short msgID;
unsigned short protocolVersion;
unsigned short convertUnsignedShort(char inputArray[sizeof(unsigned short)])
{return (((unsigned char) (inputArray[0])) << 8) + (unsigned char)(inputArray[1]);}
void operator<<(char inputArray[kHeaderSizeInBytes])
{
bodyLength = convertUnsignedShort(inputArray);
msgID = convertUnsignedShort(inputArray + sizeof(bodyLength));
protocolVersion = convertUnsignedShort(inputArray + sizeof(bodyLength) + sizeof(msgID));
}
};
int main()
{
boost::array<char, 128> msgBuffer;
Header header;
for(int x = 0; x < kHeaderSizeInBytes; x++)
msgBuffer[x] = x;
header << msgBuffer.data();
system("PAUSE");
return 0;
}
Gets rid of the pragma, but it isn't as general purpose as I'd like. Every time you add a field to the header you have to modify the << function. Can you iterate over struct fields somehow, get the type of the field and call the corresponding function?

Related

How to convert a variable size struct to char array

I am trying to serialize a structure for sending as a UDP message. The issue I am having is that the structure contains a variable length array of sub-structures as below:
struct SubStruct
{
short val1;
short val2;
};
struct Message
{
short numSubStructs;
SubStruct* structs;
};
The method I use for sending my fixed length messages is to cast the struct to an unsigned char*. Below, MSG_LENGTH is equal to sizeof(short) + numSubStructs * sizeof(SubStruct)
send(socket, reinterpret_cast<unsigned char*>(&someMessage), MSG_LENGTH);
This works fine for all my fixed length messages but not for the variable length messages. Looking at the data sent out over the socket, I'm pretty sure it is sending the actual address of the structs pointer.
My question is, is there a way of serializing this kind of structure other than looping through the pointer (array) and appending to some buffer?
Thanks
Try something like this:
char *serializedMessage = new char[sizeof(short) + someMessage.numSubStructs * sizeof(SubStruct)];
// Error check here
// Insert the count of structs
memcpy(serializedMessage, &someMessage.numSubStructs, sizeof(short));
// Copy the structs themselves.
memcpy(&serializedMessage[sizeof(short)], someMessage.structs,
someMessage.numSubStructs * sizeof(SubStruct));
// Transmit serializedMessage
delete[] serializedMessage;
NOTE This does not pay attention to the endianness of the data, so it is highly likely to fail if the source and target machines have different endianness.
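For comparison, here is a sketch of a variant that does fix the byte order by writing each member individually. It assumes the receiver expects network (big-endian) order and that the struct array is held in a std::vector rather than a raw pointer; appendI16BE and serializeMessage are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SubStruct {
    std::int16_t val1;
    std::int16_t val2;
};

// Append a 16-bit value in network (big-endian) byte order.
inline void appendI16BE(std::vector<unsigned char>& out, std::int16_t v) {
    const std::uint16_t u = static_cast<std::uint16_t>(v);
    out.push_back(static_cast<unsigned char>(u >> 8));
    out.push_back(static_cast<unsigned char>(u & 0xFF));
}

// Serialize the element count followed by each sub-struct, member by member.
// Assumes the count fits in 16 bits, as in the question's Message struct.
inline std::vector<unsigned char> serializeMessage(const std::vector<SubStruct>& subs) {
    std::vector<unsigned char> out;
    out.reserve(2 + subs.size() * 4);
    appendI16BE(out, static_cast<std::int16_t>(subs.size()));
    for (std::size_t i = 0; i < subs.size(); ++i) {
        appendI16BE(out, subs[i].val1);
        appendI16BE(out, subs[i].val2);
    }
    return out;
}
```

The resulting vector can be passed to send() via its data pointer and size(), and it releases its memory automatically.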
I'm not aware of an elegant way to do this in C++. There are some ugly ways however. Basically, allocate a buffer large enough to hold the entire 'unrolled' structure and sub-structure. If the last member of the struct is the variable sized element then it is not too bad to maintain. If there are multiple nested structures then it gets to be unwieldy.
Here is a C style example.
struct example {
int array_length;
some_struct array[1]; // beware of padding in structure between the fields
};
int number_of_structs = 2;
// the cast is required when compiling as C++
example* ptr = (example*)malloc(sizeof(int) + number_of_structs * sizeof(some_struct));
ptr->array_length = number_of_structs;
ptr->array[0].first_field = 1;
ptr->array[1].first_field = 2;
send(socket, ptr, sizeof(int) + number_of_structs * sizeof(some_struct));
There are also some (nonstandard) ways to do this with zero length arrays.

Union hack for endian testing and byte swapping

For a union, writing to one member and reading from other member (except for char array) is UB.
//snippet 1 (testing for endianness):
union
{
int i;
char c[sizeof(int)];
} x;
x.i = 1; // writing to i
if(x.c[0] == 1) // reading from c[0]
{ printf("little-endian\n");
}
else
{ printf("big-endian\n");
}
//snippet 2(swap bytes using union):
int swapbytes()
{
union // assuming 32bit, sizeof(int)==4
{
int i;
char c[sizeof(int)];
} x;
x.i = 0x12345678; // writing to member i
SWAP(x.c[0],x.c[3]); // writing to char array elements
SWAP(x.c[1],x.c[2]); // writing to char array elements
return x.i; // reading from x.i
}
Snippet 1 is legal C or C++ but not snippet 2. Am I correct? Can someone point to the section of the standard where it says it's OK to write to a member of a union and read from another member which is a char array?
There is a really simple way that gets round the undefined behaviour (well, undefined behaviour that is defined in pretty much every compiler out there ;)).
uint32_t i = 0x12345678;
char ch[4];
memcpy( ch, &i, 4 );
bool bLittleEndian = ch[0] == 0x78;
This has the added bonus that pretty much every compiler out there will see that you are memcpying a constant number of bytes and optimise out the memcpy completely resulting in exactly the same code as your snippet 1 while staying totally within the rules!
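The same memcpy idea extends to the byte swap in snippet 2. The following is a sketch only, assuming fixed-width types from <cstdint> are available; isLittleEndian and swapBytes32 are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Returns true on a little-endian host, using memcpy instead of a union.
inline bool isLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 1;
}

// Reverse the byte order of a 32-bit value without reading an inactive
// union member: copy out, swap the bytes, copy back.
inline std::uint32_t swapBytes32(std::uint32_t v) {
    unsigned char b[4];
    std::memcpy(b, &v, sizeof b);
    std::swap(b[0], b[3]);
    std::swap(b[1], b[2]);
    std::memcpy(&v, b, sizeof b);
    return v;
}
```

Reversing the bytes in memory reverses the value's byte order on any host, so swapBytes32 behaves the same regardless of endianness.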
I believe it (snippet 1) is technically not allowed, but most compilers allow it anyway because people use this kind of code. GCC even documents that it is supported.
You will have problems on some machines where sizeof(int) == 1, and possibly on some that are neither big endian nor little endian.
Either use available functions that change words to the proper order, or set this with a configuration macro. You probably need to recognize compiler and OS anyway.

Best way to initialize a statically initialized per-struct character buffer?

Continuing from Absolute fastest (and hopefully elegant) way to return a certain char buffer given a struct type I want to now initialize once each static character buf per struct individually.
Ie, for:
#pragma pack(push, 1)
struct Header {
int a;
int b;
char c;
};
struct X {
int x;
int y;
};
struct Y {
char someStr[20];
};
struct Msg {
Header hdr;
union {
X x;
Y y;
};
};
#pragma pack(pop)
We have:
template<typename T>
struct Buffer {
static char buffer[sizeof(T)];
};
template<typename T>
char Buffer<T>::buffer[sizeof(T)]; // definition of the static member
template<class T>
inline char* get_buffer() {
return Buffer<T>::buffer;
}
The two things I'm looking for are:
There are exactly 2 buffers: 1 for X and one for Y. They should each be the length of sizeof(Msg.hdr) + sizeof(Msg.x) and sizeof(Msg.hdr) + sizeof(Msg.y), respectively.
Each buffer will be retrieved a lot during the application lifetime and only some fields really (or need to) change.
2a. Msg for X backed by its char buffer should be initialized to m.hdr.a = 1, m.hdr.b = 0; and for Msg Y it should be m.hdr.a = 16; m.hdr.b = 1; as an example.
The app will frequently fetch these buffers as type Msg backed by either X or Y (the app would know which one) and then change x and y or someStr only and then output it to the file for example then repeat.
Just wondering what nice way builds on these great examples by @6502 and @Fred Nurk to elegantly initialize these 2 buffers while being human readable. I'd prefer to keep using structs and to limit the use of reinterpret_cast<>() as much as possible as there may be aliasing issues that might develop.
Please let me know if I'm not clear and I will do my best to answer any questions and/or edit this question description.
Thanks.
*** Update: my usage pattern for these buffers is that I will be copying the char* out to a stream or file, hence I need to get a char* pointer to the underlying data. However, I need to work on the char buffers via their structs for readability and convenience. Also, this char buffer should be decoupled and not necessarily contained in or "attached" to the struct, as the structs are pretty much in separate files and used elsewhere where the buffers are not needed/wanted. Would just doing a simple static X x; static Y y; suffice, or would buffers of length Header + X for X's Msg buffer be better? And then somehow just keep a char* reference to each Msg for X and Y? Will I potentially run into aliasing issues?
If you were writing it in C, you could look into a fairly common C compiler extension called "cast to a union type", but in C++ it is no longer present.
In C++ there is no way around reinterpret_cast<> for what you require, but at least you can do it fairly safely by calculating the member offset on a NULL pointer cast to the union, and then subtracting this offset from your data pointer before casting it to the union. I believe that on most compilers the offset will be 0, but it is better to be on the safe side.
template<class T>
union Aligner {
T t;
char buffer[sizeof(T)];
};
template<class T>
inline char* get_buffer(T* pt) {
return reinterpret_cast<Aligner<T>*>(reinterpret_cast<char*>(pt) - reinterpret_cast<ptrdiff_t>(&reinterpret_cast<Aligner<T>*>(NULL)->t))->buffer;
}

Variable sized packet structs with vectors

Lately I've been diving into network programming, and I'm having some difficulty constructing a packet with a variable "data" property. Several prior questions have helped tremendously, but I'm still lacking some implementation details. I'm trying to avoid using variable sized arrays, and just use a vector. But I can't get it to be transmitted correctly, and I believe it's somewhere during serialization.
Now for some code.
Packet Header
class Packet {
public:
void* Serialize();
bool Deserialize(void *message);
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
};
Packet Impl
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
} Packet;
void* Packet::Serialize(int size) {
Packet* p = (Packet *) malloc(8 + 30);
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->data.assign(size,'&'); //just for testing purposes
}
bool Packet::Deserialize(void *message) {
Packet *s = (Packet*)message;
this->sender_id = ntohl(s->sender_id);
this->sequence_number = ntohl(s->sequence_number);
this->data = s->data;
}
During execution, I simply create a packet, assign its members, and send/receive accordingly. The above methods are only responsible for serialization. Unfortunately, the data never gets transferred.
Couple of things to point out here. I'm guessing the malloc is wrong, but I'm not sure how else to compute it (i.e. what other value it would be). Other than that, I'm unsure of the proper way to use a vector in this fashion, and would love for someone to show me how (code examples please!) :)
Edit: I've awarded the question to the most comprehensive answer regarding the implementation with a vector data property. Appreciate all the responses!
This trick works with a C-style array at the end of the struct, but not with a C++ vector. There is no guarantee that the C++ vector class will (and it most likely won't) put its contained data in the "header object" that is present in the Packet struct. Instead, that object will contain a pointer to somewhere else, where the actual data is stored.
I think you might want to do something like this:
struct PacketHeader
{
unsigned int senderId;
unsigned int sequenceNum;
};
class Packet
{
protected:
PacketHeader header;
std::vector<char> data;
public:
char* serialize(int& packetSize);
void deserialize(const char* data,int dataSize);
};
char* Packet::serialize(int& packetSize)
{
packetSize = this->data.size()+sizeof(PacketHeader);
char* packetData = new char[packetSize];
PacketHeader* packetHeader = (PacketHeader*)packetData;
packetHeader->senderId = htonl(this->header.senderId);
packetHeader->sequenceNum = htonl(this->header.sequenceNum);
char* packetBody = (packetData + sizeof(PacketHeader)); // size of the struct, not of the pointer
for(size_t i=0 ; i<this->data.size() ; i++)
{
packetBody[i] = this->data.at(i);
}
return packetData;
}
void Packet::deserialize(const char* data,int dataSize)
{
const PacketHeader* packetHeader = (const PacketHeader*)data;
this->header.senderId = ntohl(packetHeader->senderId);
this->header.sequenceNum = ntohl(packetHeader->sequenceNum);
this->data.clear();
for(int i=sizeof(PacketHeader) ; i<dataSize ; i++)
{
this->data.push_back(data[i]);
}
}
This code does not include bounds checking and does not free the allocated data; don't forget to delete[] the buffer returned from serialize(). You can also use memcpy instead of a loop to copy the bytes into or out of the std::vector.
Most compilers add padding inside a structure in some cases. This would cause problems if you send the data as-is without disabling the padding; you can do that with #pragma pack(1) if you are using Visual Studio.
Disclaimer: I didn't actually compile this code, so you might want to recheck it.
I think the problem centres around you trying the 'serialise' the vector that way and you're probably assuming that the vector's state information gets transmitted. As you've found, that doesn't really work that way as you're trying to move an object across the network and things like pointers etc don't mean anything on the other machine.
I think the easiest way to handle this would be to change Packet to the following structure:
struct Packet {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int vector_size;
char data[1];
};
The data[1] bit is an old C trick for variable length array - it has to be the last element in the struct as you're essentially writing past the size of the struct. You have to get the allocation for the data structure right for this, otherwise you'll be in a world of hurt.
Your serialisation function then looks something like this:
void* Packet::Serialize(std::vector<char> &data) {
Packet* p = (Packet *) malloc(sizeof(Packet) + data.size());
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->vector_size = htonl(data.size());
::memcpy(p->data, &data[0], data.size());
return p;
}
As you can see, we'll transmit the data size and the contents of the vector, copied into a plain C array which transmits easily. You have to keep in mind that in your network sending routine, you have to calculate the size of the structure properly as you'll have to send sizeof(Packet) + data.size() bytes, otherwise you'll get the vector cut off and are back into nice buffer overflow territory.
Disclaimer - I haven't tested the code above, it's just written from memory so you might have to fix the odd compilation error.
I think you need to work directly with byte arrays returned by the socket functions.
For these purposes it's good to have two distinct parts of a message in your protocol. The first part is a fixed-size "header". This will include the size of the bytes that follow, the "payload", or data in your example.
So, to borrow some of your snippets and expand on them, maybe you'll have something like this:
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int data_length; // this is new
} PacketHeader;
So then when you get a buffer in, you'll treat it as a PacketHeader*, and check data_length to know how many bytes will appear in the byte vector that follows.
I would also add a few points...
Making these fields unsigned int is not wise. The standards for C and C++ don't specify how big int is, and you want something that will be predictable on all compilers. I suggest the C99 type uint32_t defined in <stdint.h>
Note that when you get bytes from the socket... It is in no way guaranteed to be the same size as what the other end wrote to send() or write(). You might get incomplete messages ("packets" in your terminology), or you might get multiple ones in a single read() or recv() call. It's your responsibility to buffer these if they are short of a single request, or loop through them if you get multiple requests in the same pass.
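The buffering described in the last point can be sketched without any socket code. The class below is a hypothetical illustration: it assumes the 12-byte header shown above (sender_id, sequence_number, data_length), with each field sent as a 4-byte big-endian integer. MessageFramer, kHeaderSize and readU32BE are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed wire size of PacketHeader: three 4-byte fields.
const std::size_t kHeaderSize = 12;

// Assemble a big-endian 32-bit value from four bytes.
inline std::uint32_t readU32BE(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Accumulates received bytes and hands back one complete payload at a time.
class MessageFramer {
public:
    // Append whatever recv()/read() returned, however much that was.
    void feed(const unsigned char* bytes, std::size_t n) {
        pending_.insert(pending_.end(), bytes, bytes + n);
    }

    // If a whole message is buffered, copy its payload into `payload`,
    // drop the message from the buffer and return true; otherwise return false.
    bool next(std::vector<unsigned char>& payload) {
        if (pending_.size() < kHeaderSize) return false;
        const std::size_t dataLength = readU32BE(&pending_[8]);
        if (pending_.size() - kHeaderSize < dataLength) return false;
        payload.assign(pending_.begin() + kHeaderSize,
                       pending_.begin() + kHeaderSize + dataLength);
        pending_.erase(pending_.begin(),
                       pending_.begin() + kHeaderSize + dataLength);
        return true;
    }

private:
    std::vector<unsigned char> pending_;
};
```

A receive loop would call feed() after every recv() and then call next() repeatedly until it returns false, which handles both short reads and several messages arriving in one read.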
This cast is very dangerous as you have allocated some raw memory and then treated it as an initialized object of a non-POD class type. This is likely to cause a crash at some point.
Packet* p = (Packet *) malloc(8 + 30);
Looking at your code, I assume that you want to write out a sequence of bytes from the Packet object that the serialize function is called on. In this case you have no need of a second packet object. You can create a vector of bytes of the appropriate size and then copy the data across.
e.g.
void* Packet::Serialize(int size)
{
char* raw_data = new char[sizeof sender_id + sizeof sequence_number + data.size()];
char* p = raw_data;
unsigned int tmp;
tmp = htonl(sender_id);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
tmp = htonl(sequence_number);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
std::copy(data.begin(), data.end(), p);
return raw_data;
}
This may not be exactly what you intended, as I'm not sure what the final purpose of your size parameter is, and your interface is potentially unsafe as you return a pointer to raw data that I assume is supposed to be dynamically allocated. It is much safer to use an object that manages the lifetime of dynamically allocated memory, so that the caller doesn't have to guess whether and how to deallocate the memory.
Also the caller has no way of knowing how much memory was allocated. This may not matter for deallocation but presumably if this buffer is to be copied or streamed then this information is needed.
It may be better to return a std::vector<char> or to take one by reference, or even make the function a template and use an output iterator.
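A sketch of the std::vector<char> variant might look like this. It is not the asker's interface: the explicit length field is borrowed from the data_length suggestion in another answer, the byte order is assumed to be big-endian, and the helpers appendU32BE and readU32BE are hypothetical stand-ins for htonl/ntohl so that the example needs no platform headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in network (big-endian) byte order.
inline void appendU32BE(std::vector<char>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Read a big-endian 32-bit value from four chars.
inline std::uint32_t readU32BE(const char* p) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(p[i]);
    return v;
}

struct Packet {
    std::uint32_t sender_id;
    std::uint32_t sequence_number;
    std::vector<char> data;

    // The returned vector owns its memory and knows its own size.
    std::vector<char> Serialize() const {
        std::vector<char> out;
        out.reserve(12 + data.size());
        appendU32BE(out, sender_id);
        appendU32BE(out, sequence_number);
        appendU32BE(out, static_cast<std::uint32_t>(data.size()));
        out.insert(out.end(), data.begin(), data.end());
        return out;
    }

    // Returns false if the buffer is shorter than its header claims.
    bool Deserialize(const std::vector<char>& bytes) {
        if (bytes.size() < 12) return false;
        const std::size_t length = readU32BE(&bytes[8]);
        if (bytes.size() - 12 < length) return false;
        sender_id = readU32BE(&bytes[0]);
        sequence_number = readU32BE(&bytes[4]);
        data.assign(bytes.begin() + 12, bytes.begin() + 12 + length);
        return true;
    }
};
```

The caller can hand the vector's data pointer and size() straight to send(), and nothing has to be freed by hand.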

Actual total size of struct's members

I must write array of struct Data to hard disk:
struct Data {
char cmember;
/* padding bytes */
int imember;
};
AFAIK, most compilers will add some padding bytes between the cmember and imember members of Data, but I want to save only the actual data to the file (without the padding).
I have the following code for saving an array of Data (into a buffer instead of a file, for simplicity):
bool saveData(Data* data, int dataLen, char* targetBuff, int buffLen)
{
int actualLen = sizeof(char) + sizeof(int); // this code force us to know internal
// representation of Data structure
int actualTotalLen = dataLen * actualLen;
if(actualTotalLen > buffLen) {
return false;
}
for(int i = 0; i < dataLen; i++) {
memcpy(targetBuff, &data[i].cmember, sizeof(char));
targetBuff += sizeof(char);
memcpy(targetBuff, &data[i].imember, sizeof(int));
targetBuff += sizeof(int);
}
return true;
}
As you can see, I calculate actual size of Data struct with the code: int actualLen = sizeof(char) + sizeof(int). Is there any alternative to this ? (something like int actualLen = actualSizeof(Data))
P.S. this is synthetic example, but I think you understand idea of my question...
Just save each member of the struct one at a time. If you overload << to write a variable to a file, you can have
myfile << mystruct.member1 << mystruct.member2;
Then you could even overload << to take an entire struct, and do that inside the struct's operator<<, so in the end you have:
myfile << mystruct;
Resulting in save code that looks like:
myfile << count;
for (int i = 0; i < count; ++i)
myfile << data[i];
IMO all that fiddling about with memory addresses and memcpy is too much of a headache when you could do it this way. This general technique is called serialization - hit google for more, it's a well-developed area.
You will have to pack your structure.
The way to do that changes depending on the compiler you are using.
For visual c++:
#pragma pack(push)
#pragma pack(1)
struct PackedStruct {
/* members */
};
#pragma pack(pop)
This will tell the compiler to not pad members in the structure and restore the pack parameter to its initial value. Be aware that this will affect performance. If this structure is used in critical code, you might want to copy the unpacked structure into a packed structure.
Also, resist temptations to use the command line parameter that totally disable padding, this will greatly affect performance.
IIUC, you are trying to copy the values of the structure members, rather than the structure as a whole, and store them to disk. Your approach looks good to me. I do not agree with those suggesting #pragma pack, since that changes the layout of the structure in memory at runtime, when all you need is a packed representation on disk.
Few notes:
sizeof(char) == 1, always, by definition
use the offsetof() macro
do not try to instantiate a Data object directly from this targetBuff (i.e. via casting) -- this is when you get into alignment issues and trip. Instead, copy the members out as you did while writing the buffer and you should not have issues
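One way to apply these notes is a small table of member offsets and sizes built with offsetof(), which then drives both the size calculation and the copying. This is a sketch under the assumption that Data is a plain standard-layout struct; FieldInfo, kDataFields, packedSize, packData and unpackData are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

struct Data {
    char cmember;
    int  imember;
};

// One entry per member: where it lives inside Data and how many bytes it holds.
struct FieldInfo {
    std::size_t offset;
    std::size_t size;
};

const FieldInfo kDataFields[] = {
    { offsetof(Data, cmember), sizeof(char) },
    { offsetof(Data, imember), sizeof(int)  },
};
const std::size_t kDataFieldCount = sizeof kDataFields / sizeof kDataFields[0];

// Sum of the member sizes, i.e. the size of Data without padding.
inline std::size_t packedSize() {
    std::size_t total = 0;
    for (std::size_t i = 0; i < kDataFieldCount; ++i)
        total += kDataFields[i].size;
    return total;
}

// Copy the members only, skipping padding. Returns the number of bytes written.
inline std::size_t packData(const Data& d, char* out) {
    const char* base = reinterpret_cast<const char*>(&d);
    std::size_t written = 0;
    for (std::size_t i = 0; i < kDataFieldCount; ++i) {
        std::memcpy(out + written, base + kDataFields[i].offset, kDataFields[i].size);
        written += kDataFields[i].size;
    }
    return written;
}

// Copy the members back out of a packed buffer, never casting the buffer to Data*.
inline void unpackData(const char* in, Data& d) {
    char* base = reinterpret_cast<char*>(&d);
    std::size_t consumed = 0;
    for (std::size_t i = 0; i < kDataFieldCount; ++i) {
        std::memcpy(base + kDataFields[i].offset, in + consumed, kDataFields[i].size);
        consumed += kDataFields[i].size;
    }
}
```

The table still has to be kept in sync with the struct by hand, but the size and the copy loops no longer hard-code the member list.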
There is not an easy solution to this problem. You can usually create separate structures and tell the compiler to pack them tightly, something like:
/* GNU has attributes */
struct PackedData {
char cmember;
int imember;
} __attribute__((packed));
or:
/* MSVC has headers and #pragmas */
#include <pshpack1.h>
struct PackedData {
char cmember;
int imember;
};
#include <poppack.h>
Then you have to write code that transforms your unpacked structures into packed structures and vice-versa. If you are using C++, you can create template helper functions that are predicated on the structure type and then specialize them:
template <typename T>
std::ostream& encode_to_stream(std::ostream& os, T const& object) {
return os.write((char const*)&object, sizeof(object));
}
template <typename T>
std::istream& decode_from_stream(std::istream& is, T& object) {
return is.read((char*)&object, sizeof(object));
}
template<>
std::ostream& encode_to_stream<Data>(std::ostream& os, Data const& object) {
encode_to_stream<char>(os, object.cmember);
encode_to_stream<int>(os, object.imember);
return os;
}
template <>
std::istream& decode_from_stream<Data>(std::istream& is, Data& object) {
decode_from_stream<char>(is, object.cmember);
decode_from_stream<int>(is, object.imember);
return is;
}
The bonus is that the defaults will read and write POD objects including the padding. You can specialize as necessary to optimize your storage. However, you probably want to consider endianess, versioning, and other binary storage issues as well. It might be prudent to simply write an archival class that wraps your storage and provides methods for serialization and deserialization of primitives and then an open ended method that you can specialize as needed:
class Archive {
protected:
typedef unsigned char byte;
void writeBytes(byte const* byte_ptr, std::size_t byte_size) {
m_fstream.write((char const*)byte_ptr, byte_size);
}
public:
template <typename T>
void writePOD(T const& pod) {
writeBytes((byte const*)&pod, sizeof(pod));
}
// Users are required to specialize this to use it. If it is used
// for a type that it is not specialized for, a link error will occur.
template <typename T> void serializeObject(T const& obj);
};
template<>
void Archive::serializeObject<Data>(Data const& obj) {
writePOD(obj.cmember);
writePOD(obj.imember);
}
This is the approach that I have always ended up at after a bunch of perturbations in between. It is nicely extensible without requiring inheritance and gives you the flexibility to change your underlying data storage format as needed. You can even specialize writePOD to do different things for different underlying data types like ensuring that multibyte integers are written in network order or whatnot.
Don't know if this will help you, but I'm in the habit of ordering the members of the structs that I intend to write to files (or send over networks) so they have as little padding as possible. This is done by putting the members with the widest datatypes and strictest alignment first:
• pointers first
• double
• long long
• long
• float
• int
• short
• char
• bitfields last
Any padding added by the compiler will come at the end of the struct data.
In other words, you could simplify your problem by eliminating the padding (if possible) by reordering the struct members:
struct Data
{
int imember;
char cmember;
/* padding bytes here */
};
Obviously this won't solve your problem if you can't reorder the struct members (because it's used by a third-party API or because you need the initial members to have specific datatypes).
I would say that you are actually looking for serialization.
There are a number of frameworks for serialization, but I personally prefer Google Protocol Buffers over Boost.Serialization and other approaches.
Protocol Buffers has versioning and binary/human readable output.
If you are concerned about size, you always have the possibility of compressing the data. There are lightning-fast compression algorithms like LZW which offer a good speed/compression ratio, for example.
Look into the #pragma pack directive for your compiler. Some compilers use #pragma options align=packed or something similar.
As you can see, I calculate actual size of Data struct with the code: int actualLen = sizeof(char) + sizeof(int). Is there any alternative to this ?
No, not in standard C++.
Your compiler might provide a compiler-specific option, though. Packed structs as shown by Graeme and Coincoin might do.
If you don't want to use pragma pack, try to manually re-order the variables,
like
struct Data {
int imember;
char cmember;
};
You said to @Coincoin that you cannot pack. If you just need the size for some reason, here is a dirty solution:
#define STRUCT_ELEMENTS char cmember;/* padding bytes */ int imember;
typedef struct
{
STRUCT_ELEMENTS
}paddedData;
#pragma pack(push)
#pragma pack(1)
typedef struct
{
STRUCT_ELEMENTS
}packedData;
#pragma pack(pop)
Now you have the size of both:
sizeof(packedData);
sizeof(paddedData);
The only reason I can think of why you cannot pack is that you are linking this to another program. In that case you will need to pack your structure and then unpack it when working with the external program.
No, there is no way within the language proper to get this information. One way to approach a solution is to define your data classes indirectly, using some feature of the language - it could be as old-fashioned as macros and the preprocessor, or as new-fangled as tuple templates. You need something which lets you iterate over the class members systematically.
Here's a macro based approach:
#undef Data_MEMBERS
#define Data_MEMBERS(Data_OP) \
Data_OP(c, char) \
Data_OP(i, int)
#undef Data_CLASS_DEFINITION
#define Data_CLASS_DEFINITION(name, type) \
type name##member;
struct Data {
Data_MEMBERS(Data_CLASS_DEFINITION)
};
#define Data_SERIAL_SIZER(name, type) \
sizeof(type) +
#define Data_Serial_Size \
(Data_MEMBERS(Data_SERIAL_SIZER) 0)
And so forth.
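To illustrate the "and so forth", the same member list can also generate the copy code. The sketch below repeats the macros above so that it stands alone; Data_SERIAL_WRITER and serializeData are hypothetical additions, and the bytes are written in host byte order.

```cpp
#include <cstddef>
#include <cstring>

// Single source of truth: one line per member, as (prefix, type).
#define Data_MEMBERS(Data_OP) \
    Data_OP(c, char) \
    Data_OP(i, int)

#define Data_CLASS_DEFINITION(name, type) \
    type name##member;

struct Data {
    Data_MEMBERS(Data_CLASS_DEFINITION)
};

#define Data_SERIAL_SIZER(name, type) \
    sizeof(type) +
#define Data_Serial_Size \
    (Data_MEMBERS(Data_SERIAL_SIZER) 0)

// Expands to one memcpy per member; expects a cursor `p` and an object `d`
// to be in scope at the point of use.
#define Data_SERIAL_WRITER(name, type) \
    std::memcpy(p, &d.name##member, sizeof(type)); \
    p += sizeof(type);

// Copies every member listed in Data_MEMBERS, without padding.
// Returns the number of bytes written, which equals Data_Serial_Size.
inline std::size_t serializeData(const Data& d, char* out) {
    char* p = out;
    Data_MEMBERS(Data_SERIAL_WRITER)
    return static_cast<std::size_t>(p - out);
}
```

Adding a member then means adding one line to Data_MEMBERS; the struct definition, the size and the serializer all follow.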
If you can rewrite the struct definition, you could try to use field specifiers to get rid of the holes, like so:
struct Data {
char cmember : 8; // bit-field widths are given in bits
int imember : 32;
};
Sadly, this does not guarantee that it still won't place imember 4 bytes after the start of cmember. But many compilers will get the idea and do it anyway.
Other alternatives:
Reorder your members by size (largest first). This is an old embedded world trick to minimize holes.
Use Ada instead.
The code
type Data is record
cmember : character;
imember : integer;
end record;
for Data use record
cmember at 0 range 0..7;
imember at 1 range 0..31;
end record;
Does exactly what you want.