How to convert a variable size struct to char array

I am trying to serialize a structure for sending as a UDP message. The issue I am having is that the structure contains a variable length array of sub-structures as below:
struct SubStruct
{
short val1;
short val2;
};
struct Message
{
short numSubStructs;
SubStruct* structs;
};
The method I use for sending my fixed length messages is to cast the struct to an unsigned char*. Below, MSG_LENGTH is equal to sizeof(short) + numSubStructs * sizeof(SubStruct):
send(socket, reinterpret_cast<unsigned char*>(&someMessage), MSG_LENGTH);
This works fine for all my fixed length messages but not for the variable length messages. Looking at the data sent out over the socket, I'm pretty sure it is sending the actual address of the structs pointer.
My question is, is there a way of serializing this kind of structure other than looping through the pointer (array) and appending to some buffer?
Thanks

Try something like this:
char *serializedMessage = new char[sizeof(short) + someMessage.numSubStructs * sizeof(SubStruct)];
// Error check here
// Insert the count of structs
memcpy(serializedMessage, &someMessage.numSubStructs, sizeof(short));
// Copy the structs themselves.
memcpy(&serializedMessage[sizeof(short)], someMessage.structs,
someMessage.numSubStructs * sizeof(SubStruct));
// Transmit serializedMessage
delete[] serializedMessage;
NOTE: This does not pay attention to the endianness of the data, so it is highly likely to fail if the source and target machines have different endianness.
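One way to sidestep the endianness caveat is to write each field in a fixed byte order instead of copying the raw structs. The following is only a sketch: serialize_message and put_u16_be are hypothetical helper names, the fields are assumed to be 16 bits wide (std::int16_t stands in for short), and big-endian is assumed as the wire order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SubStruct {
    std::int16_t val1;  // assumption: short is 16 bits on both ends
    std::int16_t val2;
};

// Append a 16-bit value in big-endian (network) byte order.
inline void put_u16_be(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v & 0xFF));
}

// Hypothetical helper: writes the count, then the fields of each sub-structure.
inline std::vector<unsigned char> serialize_message(const SubStruct* structs,
                                                    std::uint16_t count) {
    std::vector<unsigned char> out;
    out.reserve(2 + static_cast<std::size_t>(count) * 4);
    put_u16_be(out, count);
    for (std::uint16_t i = 0; i < count; ++i) {
        put_u16_be(out, static_cast<std::uint16_t>(structs[i].val1));
        put_u16_be(out, static_cast<std::uint16_t>(structs[i].val2));
    }
    return out;
}
```

The returned vector's data() and size() could then be passed to send(). This does loop over the array, which the question hoped to avoid, but it makes the wire format independent of host byte order and struct padding.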

I'm not aware of an elegant way to do this in C++. There are some ugly ways however. Basically, allocate a buffer large enough to hold the entire 'unrolled' structure and sub-structure. If the last member of the struct is the variable sized element then it is not too bad to maintain. If there are multiple nested structures then it gets to be unwieldy.
Here is a C style example.
struct example {
int array_length;
some_struct array[1]; // beware of padding in structure between the fields
};
int number_of_structs = 2;
// The cast is required in C++ (it is optional in C).
example* ptr = (example*)malloc(sizeof(int) + number_of_structs * sizeof(some_struct));
ptr->array_length = number_of_structs;
ptr->array[0].first_field = 1;
ptr->array[1].first_field = 2;
send(socket, ptr, sizeof(int) + number_of_structs * sizeof(some_struct));
There are also some (nonstandard) ways to do this with zero length arrays.
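The padding warning in the example above can be handled by taking the header size from offsetof rather than sizeof(int). The sketch below assumes a stand-in element type Item (hypothetical, chosen so that padding is likely) and builds the flat byte image with memcpy, so nothing is indexed past the declared array bound.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Item {        // hypothetical stand-in for some_struct
    double value;
};

struct Example {
    int array_length;
    Item array[1];   // trailing array; the real length is array_length
};

// Bytes needed for the header plus n items, including padding before `array`.
inline std::size_t example_size(std::size_t n) {
    return offsetof(Example, array) + n * sizeof(Item);
}

// Build the flat byte image of an Example holding all of `items`.
inline std::vector<unsigned char> build_example(const std::vector<Item>& items) {
    std::vector<unsigned char> buf(example_size(items.size()));  // zero-filled
    const int n = static_cast<int>(items.size());
    std::memcpy(buf.data() + offsetof(Example, array_length), &n, sizeof n);
    if (!items.empty()) {
        std::memcpy(buf.data() + offsetof(Example, array), items.data(),
                    items.size() * sizeof(Item));
    }
    return buf;
}
```

On a typical platform where double is 8-byte aligned, offsetof(Example, array) is larger than sizeof(int), which is exactly the gap the sizeof(int) formula would miss.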

Related

Why is this example using memcpy to convert uint8_t* parameter to a structure?

I was using a TCP library that has an incoming data handler with the following signature:
static void handleData(void *arg, AsyncClient *client, void *data, size_t len)
When I tried to cast the data as follows to access the field values of the structure, the board crashed.
MyStructure* m = (MyStructure*)data;
In an example from an unrelated communication library, I had seen memcpy used as follows, so I changed the casting code above to memcpy and then it worked. But why does the example use memcpy instead of a cast?
// Callback when data is received
void OnDataRecv(uint8_t * mac, uint8_t *incomingData, uint8_t len) {
memcpy(&incomingReadings, incomingData, sizeof(incomingReadings));
incomingTemp = incomingReadings.temp;
incomingHum = incomingReadings.hum;
}
The incomingReadings is declared as a global variable, but that variable is only used inside of that function, and only the fields which are copied to other global variables incomingTemp and incomingHum are used elsewhere. What if the example function were like the following, would it crash?
void OnDataRecv(uint8_t * mac, uint8_t *incomingData, uint8_t len) {
struct_message* incoming = (struct_message*)incomingData;
incomingTemp = incoming->temp;
incomingHum = incoming->hum;
}
PS: About the crashing above, I have tested more things to reproduce it with simpler code. It seems that the board does not crash at casting, but at accessing the cast variable.
The structure is as simple as
typedef struct TEST_TYPE
{
unsigned long a;
} TEST_TYPE;
In the client, I created and sent a as follows:
TEST_TYPE *a = new TEST_TYPE();
a->a = 1;
In the server's handleData, I modified the code as below:
static void handleData(void *arg, AsyncClient *client, void *data, size_t len)
{
Serial.printf("Data length = %i\n", len);
uint8_t* x = (uint8_t*)data;
for(int i =0; i<len; i++)
{
Serial.printf("%X, ", x[i]);
}
Serial.println("Casting.");
TEST_TYPE* a = (TEST_TYPE*)data;
Serial.println("Printing.");
Serial.printf("\nType = %i\n", a->a);
}
The output was:
Data length = 4
1, 0, 0, 0, Casting.
Printing.
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
Exception (9):
epc1=0x40201117 epc2=0x00000000 epc3=0x00000000 excvaddr=0x3fff3992 depc=0x00000000
>>>stack>>>
ctx: sys
sp: 3fffec30 end: 3fffffb0 offset: 0190
PS2: Seems like it indeed is an alignment issue. The exception code is 9 above, and according to this page, 9 means:
LoadStoreAlignmentCause Load or store to an unaligned address
I have found an old answer for a similar case. The author suggested some possible solutions
adding __attribute__((aligned(4))) to the buffer: I think this is not applicable in my case, because I did not create the data parameter.
adding __attribute__((packed)) to the structure: I have modified my structure like the following, and it did not crash this time.
typedef struct TEST_TYPE
{
unsigned long a;
} __attribute__((packed)) TEST_TYPE;
reading it one byte at a time and constructing the fields manually: This seems like too much work.

Without the full picture of the lifetimes of all the data, it's hard to say what's going wrong in your particular case. Some thoughts:
uint8_t *bytes;
...
MyStructure* m = (MyStructure*)bytes;
The snippet above uses m to interpret the region of memory pointed to by bytes as a MyStructure. It's important to note that m is only valid as long as bytes is valid. When bytes goes out of scope (or is freed, etc.), m is no longer valid.
uint8_t *bytes;
MyStructure m;
...
memcpy(&m, bytes, sizeof(MyStructure));
This snippet is copying the data referred to by bytes into m. At this point, m's lifetime is separate from bytes. Note that you could do the same thing with this syntax:
uint8_t *bytes;
MyStructure m;
...
m = *((MyStructure*)bytes);
This snippet is saying "treat bytes as a pointer to a MyStructure, then dereference the pointer and make a copy of it".
As @danadam points out in a comment, memcpy() should be used in the case of alignment issues.

Would it crash? Perhaps.
Essentially you're touching alignment and aliasing here.
The rules are here:
https://en.cppreference.com/w/cpp/language/object#Alignment
Your struct most probably has higher alignment requirements than 1 and therefore it depends on where in the memory the converted bytes are located if it will crash or not. As neither you nor the compiler can be sure of that, the cast is undefined behavior.
The only way your cast from void* to MyStructure* wouldn't be UB is if the void* was cast from a MyStructure* in the first place.
uint8 / char etc. have minimal alignment requirements (only 1 byte) and are therefore valid anywhere in a chunk of memory. That can be used to copy the memory into your correctly aligned object.
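The copy described above can be sketched like this. TestType mirrors the question's TEST_TYPE, with std::uint32_t assumed in place of unsigned long, and read_test_type is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct TestType {
    std::uint32_t a;  // assumption: the question's unsigned long is 32 bits
};

// Copy the bytes into a properly aligned object instead of dereferencing a
// cast pointer; `data` may point at any address, aligned or not.
inline bool read_test_type(const void* data, std::size_t len, TestType& out) {
    if (len < sizeof(TestType)) {
        return false;  // not enough bytes for a full structure
    }
    std::memcpy(&out, data, sizeof(TestType));
    return true;
}
```

Inside a handler like handleData, this would be called as read_test_type(data, len, value) with a local TestType value. The bytes are still interpreted in the host's byte order, so this addresses alignment only, not endianness.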

C++ MPI create and send array of structs which has fields char[16] and integer

I know there is a question on the site about this, but that one is implemented in C. I need a C++ version of it.
Problem is the following:
I created a struct which has the following form
struct word
{
char value[WORD_MAX_LENGTH];
int freq;
};
I need to send an array of these structs. How can I serialize this struct and send, say, 500 of them to another process using MPI_Send?
Here is where I got so far:
MPI_Datatype word_type, oldtypes[2];
int blockcounts[2];
MPI_Aint offsets[2], extent;
//Determine the offset and block length of the first element of the struct which is a char array.
offsets[0] = 0;
oldtypes[0] = MPI_CHAR;
blockcounts[0] = WORD_MAX_LENGTH;
//Determine the offset of int freq of the struct.
MPI_Type_extent(MPI_CHAR, &extent);
offsets[1] = 16 * extent;
blockcounts[1] = 1;
// Finally create the type.
MPI_Type_create_struct(2, blockcounts, offsets, oldtypes, &word_type);
MPI_Type_commit(&word_type);
When I do this, it compiles with no errors; however, when I try to run it, it complains:
Fatal error in PMPI_Type_create_struct: Invalid datatype, error stack:
PMPI_Type_create_struct(173): MPI_Type_create_struct(count=2,
array_of_blocklengths=0x7fffbd059f70,
array_of_displacements=0x7fffbd059f80, array_of_types=0x7fffbd059f60,
newtype=0x7fffbd059f14) failed
PMPI_Type_create_struct(142): Invalid datatype
Any help will be appreciated.
Note: I can't use boost libraries.

There are at least two issues:
oldtypes[1] is not initialized
offsets[1] looks odd; you can use the offsetof() macro instead
You might also have to use MPI_Type_create_resized() if your compiler adds some padding at the end of the word struct.
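As a rough illustration of the offsetof() suggestion, the displacements and element extent can be taken from the real struct layout. This sketch deliberately leaves out the MPI calls; WORD_MAX_LENGTH is assumed to be 16 (from the question's offsets[1] = 16 * extent), and the k-prefixed constant names are hypothetical. The missing initialization would presumably be oldtypes[1] = MPI_INT.

```cpp
#include <cstddef>

constexpr std::size_t WORD_MAX_LENGTH = 16;  // assumed value

struct word {
    char value[WORD_MAX_LENGTH];
    int freq;
};

// Displacements for the offsets[] array passed to MPI_Type_create_struct,
// taken from the compiler's layout rather than computed by hand.
constexpr std::size_t kValueOffset = offsetof(word, value);
constexpr std::size_t kFreqOffset = offsetof(word, freq);

// Size of one element in an array of word, trailing padding included. If the
// MPI type's extent differs from this, MPI_Type_create_resized is the fix.
constexpr std::size_t kWordExtent = sizeof(word);
```

With these values, offsets[0] = kValueOffset and offsets[1] = kFreqOffset stay correct even if the compiler inserts padding between value and freq.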

bmiColors field of BITMAPINFO structure

The BITMAPINFO structure has the following declaration
typedef struct tagBITMAPINFO {
BITMAPINFOHEADER bmiHeader;
RGBQUAD bmiColors[1];
} BITMAPINFO;
Why is the RGBQUAD array static? Why is it not a pointer?

It is a standard trick to declare a variable sized struct. The color table never has just one entry: it has at least 2 for a monochrome bitmap, typically 256 for an 8bpp bitmap, etc., as indicated by the bmiHeader.biClrUsed member. So the actual size of the struct depends on the bitmap format.
Since the C language doesn't permit declaring such a data structure, this is the closest match. Creating the structure requires malloc() to allocate sufficient bytes to store the structure, calculated from biClrUsed. Then a simple cast to (BITMAPINFO*) makes it usable.

It doesn't matter whether it is static or not. The thing is, you'd still have to allocate enough memory for the palette. It is an RGBQUAD because it stores only R, G, B, A and nothing more.
Example:
for(i = 0; i < 256; i++)
{
lpbmpinfo->bmiColors[i].rgbRed = some_r;
lpbmpinfo->bmiColors[i].rgbGreen = some_g;
lpbmpinfo->bmiColors[i].rgbBlue = some_b;
lpbmpinfo->bmiColors[i].rgbReserved = 0;
}

There's no static keyword in the declaration. It's a completely normal struct member. It's used to declare a variable-sized struct with a single variable-sized array at the end.
The size of the array is only known at run time, but since arrays of size 0 are forbidden in standard C and C++, array[1] is used instead. See the detailed explanation from MS' Raymond Chen in Why do some structures end with an array of size 1?
On some compilers like GCC, zero-length arrays are allowed as an extension, so Linux and many other platforms usually use array[0] instead of array[1]:
Declaring zero-length arrays is allowed in GNU C as an extension. A zero-length array can be useful as the last element of a structure that is really a header for a variable-length object:
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
Arrays of Length Zero
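In C99 a new feature called the flexible array member was introduced. Since then it's better to use array[] for portability in C (flexible array members are not part of standard C++, although some compilers accept them as an extension):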
struct vectord {
size_t len;
double arr[]; // the flexible array member must be last
};
See also
What is the purpose of a zero length array in a struct?
What's the need of array with zero elements?
Array of zero length
Is empty array in the end of the structure a C standard?
What is the advantage of using zero-length arrays in C?

Variable sized packet structs with vectors

Lately I've been diving into network programming, and I'm having some difficulty constructing a packet with a variable "data" property. Several prior questions have helped tremendously, but I'm still lacking some implementation details. I'm trying to avoid using variable sized arrays, and just use a vector. But I can't get it to be transmitted correctly, and I believe it's somewhere during serialization.
Now for some code.
Packet Header
class Packet {
public:
void* Serialize();
bool Deserialize(void *message);
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
};
Packet Impl
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
} Packet;
void* Packet::Serialize(int size) {
Packet* p = (Packet *) malloc(8 + 30);
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->data.assign(size,'&'); //just for testing purposes
}
bool Packet::Deserialize(void *message) {
Packet *s = (Packet*)message;
this->sender_id = ntohl(s->sender_id);
this->sequence_number = ntohl(s->sequence_number);
this->data = s->data;
}
During execution, I simply create a packet, assign its members, and send/receive accordingly. The above methods are only responsible for serialization. Unfortunately, the data never gets transferred.
Couple of things to point out here. I'm guessing the malloc is wrong, but I'm not sure how else to compute it (i.e. what other value it would be). Other than that, I'm unsure of the proper way to use a vector in this fashion, and would love for someone to show me how (code examples please!) :)
Edit: I've awarded the question to the most comprehensive answer regarding the implementation with a vector data property. Appreciate all the responses!

This trick works with a C-style array at the end of the struct, but not with a C++ vector. There is no guarantee that the C++ vector class will (and it most likely won't) put its contained data in the "header object" that is present in the Packet struct. Instead, that object will contain a pointer to somewhere else, where the actual data is stored.
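A small check can make this visible. The sketch below tests whether the vector's element storage lies inside the bytes of the Packet object; data_is_stored_elsewhere is a hypothetical helper, and std::less is used because ordering unrelated pointers with < is not guaranteed to be meaningful.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct Packet {
    unsigned int sender_id;
    unsigned int sequence_number;
    std::vector<char> data;
};

// True when the vector's elements live outside the Packet object itself,
// i.e. copying sizeof(Packet) bytes would not include them.
inline bool data_is_stored_elsewhere(const Packet& p) {
    const char* begin = reinterpret_cast<const char*>(&p);
    const char* end = begin + sizeof(Packet);
    const char* elems = p.data.data();
    std::less<const char*> before;  // well-defined ordering for any pointers
    return before(elems, begin) || !before(elems, end);
}
```

For a packet holding, say, 1000 bytes of data, this is expected to return true on mainstream standard library implementations, while sizeof(Packet) stays the same no matter how many elements the vector holds.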

I think you might want to do something like this:
struct PacketHeader
{
unsigned int senderId;
unsigned int sequenceNum;
};
class Packet
{
protected:
PacketHeader header;
std::vector<char> data;
public:
char* serialize(int& packetSize);
void deserialize(const char* data,int dataSize);
};
char* Packet::serialize(int& packetSize)
{
packetSize = this->data.size()+sizeof(PacketHeader);
char* packetData = new char[packetSize];
PacketHeader* packetHeader = (PacketHeader*)packetData;
packetHeader->senderId = htonl(this->header.senderId);
packetHeader->sequenceNum = htonl(this->header.sequenceNum);
char* packetBody = (packetData + sizeof(PacketHeader)); // size of the struct, not of the pointer
for(size_t i=0 ; i<this->data.size() ; i++)
{
packetBody[i] = this->data.at(i);
}
return packetData;
}
void Packet::deserialize(const char* data,int dataSize)
{
PacketHeader* packetHeader = (PacketHeader*)data;
this->header.senderId = ntohl(packetHeader->senderId);
this->header.sequenceNum = ntohl(packetHeader->sequenceNum);
this->data.clear();
for(int i=sizeof(PacketHeader) ; i<dataSize ; i++)
{
this->data.push_back(data[i]);
}
}
This code does not include bounds checking and does not free the allocated data; don't forget to delete[] the buffer returned from serialize(). You can also use memcpy instead of a loop to copy the bytes into or out of the std::vector.
Most compilers may add padding inside a structure, which causes problems if you send the struct as-is without disabling the padding. You can disable it with #pragma pack(1) if you are using Visual Studio.
Disclaimer: I haven't actually compiled this code, so you might want to recheck it.

I think the problem centres around you trying to 'serialise' the vector that way; you're probably assuming that the vector's state information gets transmitted. As you've found, it doesn't work that way, because you're trying to move an object across the network and things like pointers don't mean anything on the other machine.
I think the easiest way to handle this would be to change Packet to the following structure:
struct Packet {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int vector_size;
char data[1];
};
The data[1] bit is an old C trick for variable length array - it has to be the last element in the struct as you're essentially writing past the size of the struct. You have to get the allocation for the data structure right for this, otherwise you'll be in a world of hurt.
Your serialisation function then looks something like this:
void* Packet::Serialize(std::vector<char> &data) {
Packet* p = (Packet *) malloc(sizeof(Packet) + data.size());
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->vector_size = htonl(data.size());
// Copy the vector's contents (not the vector object) into the trailing array.
::memcpy(p->data, data.data(), data.size());
return p;
}
As you can see, we'll transmit the data size and the contents of the vector, copied into a plain C array which transmits easily. Keep in mind that your network sending routine has to calculate the size of the structure properly: you have to send sizeof(Packet) + data.size() bytes, otherwise the vector contents get cut off and you are back in buffer overflow territory.
Disclaimer - I haven't tested the code above, it's just written from memory so you might have to fix the odd compilation error.

I think you need to work directly with byte arrays returned by the socket functions.
For these purposes it's good to have two distinct parts of a message in your protocol. The first part is a fixed-size "header". This will include the size of the bytes that follow: the "payload", or data in your example.
So, to borrow some of your snippets and expand on them, maybe you'll have something like this:
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int data_length; // this is new
} PacketHeader;
So then when you get a buffer in, you'll treat it as a PacketHeader*, and check data_length to know how many bytes will appear in the byte vector that follows.
I would also add a few points...
Making these fields unsigned int is not wise. The standards for C and C++ don't specify how big int is, and you want something that will be predictable on all compilers. I suggest the C99 type uint32_t defined in <stdint.h>
Note that when you get bytes from the socket, the amount is in no way guaranteed to be the same size as what the other end passed to send() or write(). You might get incomplete messages ("packets" in your terminology), or you might get multiple ones in a single read() or recv() call. It's your responsibility to buffer these if they are short of a single request, or loop through them if you get multiple requests in the same pass.
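The buffering described above might look roughly like the following. This is a sketch under assumptions: FrameBuffer is a hypothetical class, and the wire format is simplified to a 4-byte big-endian payload length followed by the payload (the sender_id and sequence_number fields are left out to keep it short).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates bytes from successive recv() calls and hands back whole messages.
class FrameBuffer {
public:
    void append(const unsigned char* bytes, std::size_t n) {
        pending_.insert(pending_.end(), bytes, bytes + n);
    }

    // Moves one complete payload into `out`; returns false if more bytes are needed.
    bool next(std::vector<unsigned char>& out) {
        if (pending_.size() < 4) return false;  // length prefix incomplete
        const std::uint32_t len = (std::uint32_t(pending_[0]) << 24) |
                                  (std::uint32_t(pending_[1]) << 16) |
                                  (std::uint32_t(pending_[2]) << 8) |
                                  std::uint32_t(pending_[3]);
        if (pending_.size() - 4 < len) return false;  // payload incomplete
        const auto first = pending_.begin() + 4;
        const auto last = first + static_cast<std::ptrdiff_t>(len);
        out.assign(first, last);
        pending_.erase(pending_.begin(), last);
        return true;
    }

private:
    std::vector<unsigned char> pending_;
};
```

After each recv(), the received bytes would be passed to append(), and next() called in a loop until it returns false. Real code would also want an upper limit on len so that a corrupt or hostile length field cannot make the buffer grow without bound.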

This cast is very dangerous as you have allocated some raw memory and then treated it as an initialized object of a non-POD class type. This is likely to cause a crash at some point.
Packet* p = (Packet *) malloc(8 + 30);
Looking at your code, I assume that you want to write out a sequence of bytes from the Packet object that the serialize function is called on. In this case you have no need of a second packet object. You can create a buffer of bytes of the appropriate size and then copy the data across.
e.g.
void* Packet::Serialize(int size)
{
char* raw_data = new char[sizeof sender_id + sizeof sequence_number + data.size()];
char* p = raw_data;
unsigned int tmp;
tmp = htonl(sender_id);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
tmp = htonl(sequence_number);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
std::copy(data.begin(), data.end(), p);
return raw_data;
}
This may not be exactly what you intended, as I'm not sure what the purpose of your size parameter is. Your interface is also potentially unsafe, because you return a pointer to raw data that I assume is supposed to be dynamically allocated. It is much safer to use an object that manages the lifetime of dynamically allocated memory, so that the caller doesn't have to guess whether and how to deallocate the memory.
Also the caller has no way of knowing how much memory was allocated. This may not matter for deallocation but presumably if this buffer is to be copied or streamed then this information is needed.
It may be better to return a std::vector<char> or to take one by reference, or even make the function a template and use an output iterator.
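A version that returns a std::vector<char> could look like the sketch below. It assumes fixed-width std::uint32_t fields instead of unsigned int and writes the bytes in big-endian order by hand (the same order htonl would produce) so that it needs no platform headers; append_u32_be is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

// Append a 32-bit value in big-endian (network) byte order.
inline void append_u32_be(std::vector<char>& out, std::uint32_t v) {
    out.push_back(static_cast<char>((v >> 24) & 0xFF));
    out.push_back(static_cast<char>((v >> 16) & 0xFF));
    out.push_back(static_cast<char>((v >> 8) & 0xFF));
    out.push_back(static_cast<char>(v & 0xFF));
}

struct Packet {
    std::uint32_t sender_id;
    std::uint32_t sequence_number;
    std::vector<char> data;

    // The returned vector owns its memory and knows its own size, so the
    // caller has nothing to free and nothing to guess.
    std::vector<char> Serialize() const {
        std::vector<char> out;
        out.reserve(8 + data.size());
        append_u32_be(out, sender_id);
        append_u32_be(out, sequence_number);
        out.insert(out.end(), data.begin(), data.end());
        return out;
    }
};
```

The result's data() and size() can be handed straight to send(). Since the payload length is not written here, the receiver would need to learn it some other way, for example from the datagram size.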

Deserialize a byte array to a struct

I get a transmission over the network that's an array of chars/bytes. It contains a header and some data. I'd like to map the header onto a struct. Here's an example:
#pragma pack(1)
struct Header
{
unsigned short bodyLength;
int msgID;
unsigned short someOtherValue;
unsigned short protocolVersion;
};
int main()
{
boost::array<char, 128> msgBuffer;
Header header;
for(int x = 0; x < sizeof(Header); x++)
msgBuffer[x] = 0x01; // assign some values
memcpy(&header, msgBuffer.data(), sizeof(Header));
system("PAUSE");
return 0;
}
Will this always work assuming the structure never contains any variable length fields? Is there a platform independent / idiomatic way of doing this?
Note:
I have seen quite a few libraries on the internet that let you serialize/deserialize, but I get the impression that they can only deserialize something if it has been previously serialized with the same library. Well, I have no control over the format of the transmission. I'm definitely going to get a byte/char array where all the values just follow one another.

Just plain copying is very likely to break, at least if the data can come from a different architecture (or even just compiler) than what you are on. This is for reasons of:
Endianness
Structure packing
That second link is GCC-specific, but this applies to all compilers.
I recommend reading the fields byte-by-byte, and assembling larger field (ints, etc) from those bytes. This gives you control of endianness and padding.
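As a sketch of that approach for the question's Header: the code below assumes the sender writes the fields back to back, in declaration order, in big-endian byte order, and that the field widths are 2, 4, 2 and 2 bytes. parse_header, read_u16_be and read_u32_be are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>

struct Header {
    std::uint16_t bodyLength;
    std::int32_t msgID;
    std::uint16_t someOtherValue;
    std::uint16_t protocolVersion;
};

constexpr std::size_t kHeaderWireSize = 10;  // 2 + 4 + 2 + 2, no padding on the wire

inline std::uint16_t read_u16_be(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t read_u32_be(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Assemble each field from individual bytes; the in-memory layout of Header
// (padding, host byte order) never touches the wire format.
inline bool parse_header(const unsigned char* buf, std::size_t len, Header& out) {
    if (len < kHeaderWireSize) return false;
    out.bodyLength = read_u16_be(buf);
    out.msgID = static_cast<std::int32_t>(read_u32_be(buf + 2));
    out.someOtherValue = read_u16_be(buf + 6);
    out.protocolVersion = read_u16_be(buf + 8);
    return true;
}
```

No #pragma pack is needed with this approach. If the sender is little-endian instead, only the two read helpers would have to change.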

Some processors require that certain types are properly aligned. They will not accept the specified packing and generate a hardware trap.
And even on common x86, packed structures can cause the code to run more slowly.
Also you will have to take care when working with different endianness platforms.
By the way, if you want a simple and platform-independent communication mechanism with bindings to many programming languages, then have a look at YAMI.

The #pragma pack(1) directive should work on most compilers, but you can check by working out how big your data structure should be (10 in your case if my maths is correct) and using printf("%zu", sizeof(Header)); to check that the packing is being done.
As others have said you still need to be wary of Endianness if you're going between architectures.

I strongly disagree with the idea of reading byte by byte. If you take care of the structure packing in the struct declaration, you can copy into the struct without a problem. As for the endianness problem, reading byte by byte does solve it but does not give you a generic solution. I find that method clumsy. I have done something like this before for a similar job and it worked all right without a glitch.
Think about this. I have a structure, and I also have a corresponding definition of that structure. You may construct this by hand, but I wrote a parser for this and used it for other things as well.
For example, the definition of the structure you gave above is "s i s s" (s = short, i = int). Then I give the struct address, this definition and the structure packing option of this struct to a special function that deals with the endianness, and voila, it is done.
SwitchEndianToBig(&header, "s i s s", 4); // 4 = structure packing option

Tell me if I'm wrong, but AFAIK, doing it that way will guarantee you that the data is correct - assuming the types have the same size on your different platforms :
#include <array>
#include <algorithm>
//#pragma pack(1) // not needed
struct Header
{
unsigned short bodyLength;
int msgID;
unsigned short someOtherValue;
unsigned short protocolVersion;
float testFloat;
Header() : bodyLength(42), msgID(34), someOtherValue(66), protocolVersion(69), testFloat( 3.14f ) {}
};
int main()
{
std::tr1::array<char, 128> msgBuffer;
Header header;
const char* rawData = reinterpret_cast< const char* >( &header );
std::copy( rawData, rawData + sizeof(Header), msgBuffer.data()); // assuming msgBuffer is always big enough
system("PAUSE");
return 0;
}
If the types are different on your targeted platforms, you have to use aliases (typedef) for each type to be sure the size of each used type is the same.

I know who I'm communicating with, so I don't really have to worry about endianness. But I like to stay away from compiler specific commands anyway.
So how about this:
const int kHeaderSizeInBytes = 6;
struct Header
{
unsigned short bodyLength;
unsigned short msgID;
unsigned short protocolVersion;
unsigned short convertUnsignedShort(char inputArray[sizeof(unsigned short)])
{return (((unsigned char) (inputArray[0])) << 8) + (unsigned char)(inputArray[1]);}
void operator<<(char inputArray[kHeaderSizeInBytes])
{
bodyLength = convertUnsignedShort(inputArray);
msgID = convertUnsignedShort(inputArray + sizeof(bodyLength));
protocolVersion = convertUnsignedShort(inputArray + sizeof(bodyLength) + sizeof(msgID));
}
};
int main()
{
boost::array<char, 128> msgBuffer;
Header header;
for(int x = 0; x < kHeaderSizeInBytes; x++)
msgBuffer[x] = x;
header << msgBuffer.data();
system("PAUSE");
return 0;
}
Gets rid of the pragma, but it isn't as general purpose as I'd like. Every time you add a field to the header you have to modify the << function. Can you iterate over struct fields somehow, get the type of the field and call the corresponding function?
