Variable sized packet structs with vectors - c++

Lately I've been diving into network programming, and I'm having some difficulty constructing a packet with a variable "data" property. Several prior questions have helped tremendously, but I'm still lacking some implementation details. I'm trying to avoid using variable sized arrays, and just use a vector. But I can't get it to be transmitted correctly, and I believe it's somewhere during serialization.
Now for some code.
Packet Header
class Packet {
public:
void* Serialize();
bool Deserialize(void *message);
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
};
Packet Impl
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
} Packet;
void* Packet::Serialize(int size) {
Packet* p = (Packet *) malloc(8 + 30);
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->data.assign(size,'&'); //just for testing purposes
}
bool Packet::Deserialize(void *message) {
Packet *s = (Packet*)message;
this->sender_id = ntohl(s->sender_id);
this->sequence_number = ntohl(s->sequence_number);
this->data = s->data;
}
During execution, I simply create a packet, assign its members, and send/receive accordingly. The above methods are only responsible for serialization. Unfortunately, the data never gets transferred.
Couple of things to point out here. I'm guessing the malloc is wrong, but I'm not sure how else to compute it (i.e. what other value it would be). Other than that, I'm unsure of the proper way to use a vector in this fashion, and would love for someone to show me how (code examples please!) :)
Edit: I've awarded the question to the most comprehensive answer regarding the implementation with a vector data property. Appreciate all the responses!

This trick works with a C-style array at the end of the struct, but not with a C++ vector. There is no guarantee that the C++ vector class will (and it most likely won't) put its contained data in the "header object" that is present in the Packet struct. Instead, that object will contain a pointer to somewhere else, where the actual data is stored.
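To make the point above concrete, here is a small sketch that checks whether a vector's elements live inside the vector object itself. The helper name stored_inline is made up for illustration; on mainstream standard library implementations it is expected to return false, because the object only holds pointers to a separate heap allocation.

```cpp
#include <cstdint>
#include <vector>

// Returns true if the element storage lies inside the bytes of the vector
// object itself. Hypothetical helper, for illustration only.
bool stored_inline(const std::vector<char>& v) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&v);
    const auto obj_end = obj_begin + sizeof(v);
    const auto elems = reinterpret_cast<std::uintptr_t>(v.data());
    return elems >= obj_begin && elems < obj_end;
}
```

Since sizeof(std::vector<char>) is a compile-time constant, copying the raw bytes of a struct that contains a vector copies those internal pointers, not the characters.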

I think you might want to do something like this:
`
struct PacketHeader
{
unsigned int senderId;
unsigned int sequenceNum;
};
class Packet
{
protected:
PacketHeader header;
std::vector<char> data;
public:
char* serialize(int& packetSize);
void deserialize(const char* data,int dataSize);
};
char* Packet::serialize(int& packetSize)
{
packetSize = this->data.size()+sizeof(PacketHeader);
char* packetData = new char[packetSize];
PacketHeader* packetHeader = (PacketHeader*)packetData;
packetHeader->senderId = htonl(this->header.senderId);
packetHeader->sequenceNum = htonl(this->header.sequenceNum);
char* packetBody = packetData + sizeof(PacketHeader); // skip the header struct, not the size of a pointer
for(size_t i=0 ; i<this->data.size() ; i++)
{
packetBody[i] = this->data.at(i);
}
return packetData;
}
void Packet::deserialize(const char* data,int dataSize)
{
PacketHeader* packetHeader = (PacketHeader*)data;
this->header.senderId = ntohl(packetHeader->senderId);
this->header.sequenceNum = ntohl(packetHeader->sequenceNum);
this->data.clear();
for(int i=sizeof(PacketHeader) ; i<dataSize ; i++)
{
this->data.push_back(data[i]);
}
}
`
This code does not include bounds checking and does not free the allocated data, so don't forget to delete[] the buffer returned from serialize(). You can also use memcpy instead of a loop to copy the bytes into or out of the std::vector.
Compilers may add padding inside a structure, which causes problems if you send the struct as-is. You can disable padding with #pragma pack(1), which Visual Studio supports (GCC and Clang accept it as well).
Disclaimer: I haven't actually compiled this code, so you might want to recheck it.
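The suggestions above (memcpy instead of a loop, bounds checking, no manual delete, and guarding against padding) could be combined roughly as follows. This is a sketch under assumptions, not the answerer's code: it uses free functions instead of class members, uint32_t fields, and the POSIX header <arpa/inet.h> for htonl/ntohl.

```cpp
#include <arpa/inet.h>  // htonl/ntohl (POSIX assumption; Windows uses <winsock2.h>)
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct PacketHeader {
    std::uint32_t senderId;
    std::uint32_t sequenceNum;
};
// Fail the build if the compiler inserted padding we did not expect.
static_assert(sizeof(PacketHeader) == 8, "unexpected padding in PacketHeader");

// Header in network byte order followed by the raw payload bytes.
std::vector<char> serialize(const PacketHeader& h, const std::vector<char>& data) {
    PacketHeader wire{htonl(h.senderId), htonl(h.sequenceNum)};
    std::vector<char> out(sizeof wire + data.size());
    std::memcpy(out.data(), &wire, sizeof wire);
    if (!data.empty())
        std::memcpy(out.data() + sizeof wire, data.data(), data.size());
    return out;
}

// Returns false instead of reading past the end of a short buffer.
bool deserialize(const char* buf, std::size_t size,
                 PacketHeader& h, std::vector<char>& data) {
    if (buf == nullptr || size < sizeof(PacketHeader)) return false;
    PacketHeader wire;
    std::memcpy(&wire, buf, sizeof wire);
    h.senderId = ntohl(wire.senderId);
    h.sequenceNum = ntohl(wire.sequenceNum);
    data.assign(buf + sizeof wire, buf + size);
    return true;
}
```

Returning a std::vector<char> means the caller gets both the bytes and their count, and nothing has to be deleted by hand.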

I think the problem centres around you trying to 'serialise' the vector that way, probably assuming that the vector's state information gets transmitted. As you've found, it doesn't work that way: you're trying to move an object across the network, and things like pointers don't mean anything on the other machine.
I think the easiest way to handle this would be to change Packet to the following structure:
struct Packet {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int vector_size;
char data[1];
};
The data[1] bit is an old C trick for variable length array - it has to be the last element in the struct as you're essentially writing past the size of the struct. You have to get the allocation for the data structure right for this, otherwise you'll be in a world of hurt.
Your serialisation function then looks something like this:
void* Packet::Serialize(std::vector<char> &data) {
Packet* p = (Packet *) malloc(sizeof(Packet) + data.size());
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->vector_size = htonl(data.size());
if (!data.empty())
::memcpy(p->data, &data[0], data.size()); // copy the vector's contents, not the vector object
return p; // the caller must free() this buffer
}
As you can see, we'll transmit the data size and the contents of the vector, copied into a plain C array which transmits easily. Keep in mind that your network sending routine has to calculate the size of the structure properly: you have to send sizeof(Packet) + data.size() bytes, otherwise the vector contents get cut off and you're back in buffer overflow territory.
Disclaimer - I haven't tested the code above, it's just written from memory so you might have to fix the odd compilation error.
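One way to get the size calculation right is to derive it from offsetof rather than sizeof, so that neither the one-byte placeholder nor any tail padding is counted. The sketch below is an illustration only: the names WirePacket, wire_size and build_wire_packet are invented, the fields are assumed to be uint32_t, and byte-order conversion is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct WirePacket {
    std::uint32_t sender_id;
    std::uint32_t sequence_number;
    std::uint32_t vector_size;
    char data[1];  // placeholder for the variable-length tail
};

// Number of bytes to allocate and send for a payload of n bytes.
std::size_t wire_size(std::size_t n) {
    return offsetof(WirePacket, data) + n;
}

// Builds the wire image in a vector: fixed fields first, payload after them.
// Byte-order conversion (htonl) is omitted here to keep the sketch short.
std::vector<char> build_wire_packet(std::uint32_t sender, std::uint32_t seq,
                                    const std::vector<char>& payload) {
    WirePacket head{};
    head.sender_id = sender;
    head.sequence_number = seq;
    head.vector_size = static_cast<std::uint32_t>(payload.size());
    std::vector<char> buf(wire_size(payload.size()));
    std::memcpy(buf.data(), &head, offsetof(WirePacket, data));
    if (!payload.empty())
        std::memcpy(buf.data() + offsetof(WirePacket, data),
                    payload.data(), payload.size());
    return buf;
}
```

The receiver can then read vector_size from the fixed part and knows exactly how many payload bytes follow.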

I think you need to work directly with byte arrays returned by the socket functions.
For these purposes it's good to have two distinct parts of a message in your protocol. The first part is a fixed-size "header". This will include the size in bytes of the part that follows, the "payload" (data in your example).
So, to borrow some of your snippets and expand on them, maybe you'll have something like this:
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int data_length; // this is new
} PacketHeader;
So then when you get a buffer in, you'll treat it as a PacketHeader*, and check data_length to know how many bytes will appear in the byte vector that follows.
I would also add a few points...
Making these fields unsigned int is not wise. The standards for C and C++ don't specify how big int is, and you want something that will be predictable on all compilers. I suggest the C99 type uint32_t defined in <stdint.h>
Note that when you get bytes from the socket... It is in no way guaranteed to be the same size as what the other end wrote to send() or write(). You might get incomplete messages ("packets" in your terminology), or you might get multiple ones in a single read() or recv() call. It's your responsibility to buffer these if they are short of a single request, or loop through them if you get multiple requests in the same pass.
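The buffering described above might look like the following sketch. Everything here is an assumption made for illustration: the FrameBuffer class, the Message struct, and a wire format of three big-endian 32-bit fields (sender_id, sequence_number, data_length) followed by the payload. A real implementation would also reject absurdly large data_length values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kHeaderSize = 12;  // three 32-bit fields (assumed format)

struct Message {
    std::uint32_t sender_id;
    std::uint32_t sequence_number;
    std::vector<char> data;
};

// Decodes a big-endian 32-bit value from four raw bytes.
inline std::uint32_t read_u32(const char* p) {
    const unsigned char* u = reinterpret_cast<const unsigned char*>(p);
    return (std::uint32_t{u[0]} << 24) | (std::uint32_t{u[1]} << 16) |
           (std::uint32_t{u[2]} << 8) | std::uint32_t{u[3]};
}

class FrameBuffer {
public:
    // Append whatever recv() returned, however much or little that was.
    void feed(const char* bytes, std::size_t n) {
        pending_.insert(pending_.end(), bytes, bytes + n);
    }

    // Extracts one complete message if available; otherwise leaves the
    // buffered bytes untouched and returns false.
    bool next(Message& out) {
        if (pending_.size() < kHeaderSize) return false;
        const std::size_t len = read_u32(pending_.data() + 8);
        if (pending_.size() - kHeaderSize < len) return false;
        out.sender_id = read_u32(pending_.data());
        out.sequence_number = read_u32(pending_.data() + 4);
        const auto body = pending_.begin() + kHeaderSize;
        out.data.assign(body, body + len);
        pending_.erase(pending_.begin(), body + len);
        return true;
    }

private:
    std::vector<char> pending_;
};
```

A receive loop would call feed() after every recv() and then call next() repeatedly until it returns false, which covers both short reads and several messages arriving in one read.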

This cast is very dangerous as you have allocated some raw memory and then treated it as an initialized object of a non-POD class type. This is likely to cause a crash at some point.
Packet* p = (Packet *) malloc(8 + 30);
Looking at your code, I assume that you want to write out a sequence of bytes from the Packet object that the serialize function is called on. In this case you have no need of a second packet object. You can create a buffer of bytes of the appropriate size and then copy the data across.
e.g.
void* Packet::Serialize(int size)
{
char* raw_data = new char[sizeof sender_id + sizeof sequence_number + data.size()];
char* p = raw_data;
unsigned int tmp;
tmp = htonl(sender_id);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
tmp = htonl(sequence_number);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
std::copy(data.begin(), data.end(), p);
return raw_data;
}
This may not be exactly what you intended, as I'm not sure what the purpose of your size parameter is. Your interface is also potentially unsafe, as you return a pointer to raw data that I assume is supposed to be dynamically allocated. It is much safer to use an object that manages the lifetime of dynamically allocated memory, so that the caller doesn't have to guess whether and how to deallocate it.
Also the caller has no way of knowing how much memory was allocated. This may not matter for deallocation but presumably if this buffer is to be copied or streamed then this information is needed.
It may be better to return a std::vector<char> or to take one by reference, or even make the function a template and use an output iterator.
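A possible shape for those alternatives is sketched below: a template that writes to any output iterator, plus a convenience overload that returns a std::vector<char>. The helper put_u32 is invented for this example and writes the most significant byte first, which produces the same bytes as htonl followed by memcpy; the fields are assumed to be uint32_t.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

struct Packet {
    std::uint32_t sender_id;
    std::uint32_t sequence_number;
    std::vector<char> data;

    // Writes the packet to any output iterator over char and returns the
    // iterator one past the last byte written.
    template <typename OutputIt>
    OutputIt Serialize(OutputIt out) const {
        out = put_u32(sender_id, out);
        out = put_u32(sequence_number, out);
        return std::copy(data.begin(), data.end(), out);
    }

    // Convenience overload: the returned vector owns the bytes and knows
    // its own size, so the caller has nothing to free or to guess.
    std::vector<char> Serialize() const {
        std::vector<char> bytes;
        bytes.reserve(8 + data.size());
        Serialize(std::back_inserter(bytes));
        return bytes;
    }

private:
    // Emits v most significant byte first (network byte order).
    template <typename OutputIt>
    static OutputIt put_u32(std::uint32_t v, OutputIt out) {
        for (int shift = 24; shift >= 0; shift -= 8)
            *out++ = static_cast<char>((v >> shift) & 0xFF);
        return out;
    }
};
```

The result can be handed to a socket call as bytes.data() and bytes.size().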

Related

Why is this example using memcpy to convert uint8_t* parameter to a structure?

I was using a TCP library that has an incoming data handler with the following signature:
static void handleData(void *arg, AsyncClient *client, void *data, size_t len)
When I tried to cast the data like the following to access the field values of the structure, the board crashed.
MyStructure* m = (MyStructure*)data;
In an example of an unrelated communication library, I had seen it using memcpy like the following, so I changed the casting code above to memcpy then it worked. But why is the example using memcpy instead of casting?
// Callback when data is received
void OnDataRecv(uint8_t * mac, uint8_t *incomingData, uint8_t len) {
memcpy(&incomingReadings, incomingData, sizeof(incomingReadings));
incomingTemp = incomingReadings.temp;
incomingHum = incomingReadings.hum;
}
The incomingReadings is declared as a global variable, but that variable is only used inside of that function, and only the fields which are copied to other global variables incomingTemp and incomingHum are used elsewhere. What if the example function were like the following, would it crash?
void OnDataRecv(uint8_t * mac, uint8_t *incomingData, uint8_t len) {
struct_message* incoming = (struct_message*)incomingData;
incomingTemp = incoming->temp;
incomingHum = incoming->hum;
}
PS: About the crashing above, I have tested more things to reproduce it with simpler code. It seems that the board does not crash at casting, but at accessing the cast variable.
The structure is as simple as
typedef struct TEST_TYPE
{
unsigned long a;
} TEST_TYPE;
and in the client, I sent a, created as follows:
TEST_TYPE *a = new TEST_TYPE();
a->a = 1;
In the server's handleData, I modified the code like below
static void handleData(void *arg, AsyncClient *client, void *data, size_t len)
{
Serial.printf("Data length = %i\n", len);
uint8_t* x = (uint8_t*)data;
for(int i =0; i<len; i++)
{
Serial.printf("%X, ", x[i]);
}
Serial.println("Casting.");
TEST_TYPE* a = (TEST_TYPE*)data;
Serial.println("Printing.");
Serial.printf("\nType = %i\n", a->a);
}
and the output was
Data length = 4
1, 0, 0, 0, Casting.
Printing.
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
Exception (9):
epc1=0x40201117 epc2=0x00000000 epc3=0x00000000 excvaddr=0x3fff3992 depc=0x00000000
>>>stack>>>
ctx: sys
sp: 3fffec30 end: 3fffffb0 offset: 0190
PS2: Seems like it indeed is an alignment issue. The exception code is 9 above, and according to this page, 9 means:
LoadStoreAlignmentCause Load or store to an unaligned address
I have found an old answer for a similar case. The author suggested some possible solutions
adding __attribute__((aligned(4))) to the buffer: I think this is not applicable in my case, because I did not create the data parameter.
adding __attribute__((packed)) to the structure: I have modified my structure like the following, and it did not crash this time.
typedef struct TEST_TYPE
{
unsigned long a;
} __attribute__((packed)) TEST_TYPE;
Read it by each one byte and construct the fields manually: This seems too much work.
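For what it's worth, the byte-by-byte option need not be much work for a single field. The sketch below assumes a 4-byte little-endian value, matching the 1, 0, 0, 0 dump above; the function name read_u32_le is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assembles a 32-bit little-endian value one byte at a time. No load wider
// than a byte is issued, so the alignment of the buffer does not matter.
std::uint32_t read_u32_le(const void* data, std::size_t offset = 0) {
    const std::uint8_t* p = static_cast<const std::uint8_t*>(data) + offset;
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Inside handleData this would be used as read_u32_le(data) instead of dereferencing a cast pointer.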
Without the full picture of the lifetimes of all the data, it's hard to say what's going wrong in your particular case. Some thoughts:
uint8_t *bytes;
...
MyStructure* m = (MyStructure*)bytes;
What the snippet above is doing is using m to interpret the region of memory pointed to by bytes as a MyStructure. It's important to note that m is only valid as long as bytes is valid. When bytes goes out of scope (or freed, etc.), m is no longer valid.
uint8_t *bytes;
MyStructure m;
...
memcpy(&m, bytes, sizeof(MyStructure));
This snippet is copying the data referred to by bytes into m. At this point, m's lifetime is separate from bytes. Note that you could do the same thing with this syntax:
uint8_t *bytes;
MyStructure m;
...
m = *((MyStructure*)bytes);
This snippet is saying "treat bytes as a pointer to a MyStructure, then dereference the pointer and make a copy of it".
As @danadam points out in a comment, memcpy() should be used in the case of alignment issues.
Would it crash? Perhaps.
Essentially you're touching alignment and aliasing here.
The rules are here:
https://en.cppreference.com/w/cpp/language/object#Alignment
Your struct most probably has higher alignment requirements than 1 and therefore it depends on where in the memory the converted bytes are located if it will crash or not. As neither you nor the compiler can be sure of that, the cast is undefined behavior.
The only way your cast from void* to MyStructure* wouldn't be UB is when the void* was cast from a MyStructure* in the first place.
uint8 / char etc. have minimal alignment requirements (only 1 byte) and are therefore valid anywhere in a chunk of memory. That can be used to copy the memory into your correctly aligned object.
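A minimal sketch of that copy, wrapped in a reusable helper. The names read_struct and Reading are invented for illustration, and the helper assumes the sender and receiver agree on the struct's layout and byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical message type standing in for the structure being received.
struct Reading {
    std::uint32_t a;
};

// Copies sizeof(T) bytes from an arbitrarily aligned buffer into a properly
// aligned T. Returns false if the buffer is missing or too short.
template <typename T>
bool read_struct(const void* data, std::size_t len, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    if (data == nullptr || len < sizeof(T)) return false;
    std::memcpy(&out, data, sizeof(T));
    return true;
}
```

Because the destination is an ordinary object declared by the caller, the compiler guarantees its alignment, and the source is only ever accessed as bytes.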

Is there a need to dereference when performing a memcpy using pointer to typedef fixed length array? Why or why not?

Ok - so I'll preface this by saying I'm not entirely sure how to describe the question and my current confusion, so I'll do my best to provide examples.
Question
Which of the two approaches to using the typedef-ed, fixed-length array in a memcpy call (shown below in "Context") is correct? Or are they equivalent?
(I'm starting to think that they are equivalent - some experimentation under "Notes", below).
Context
Consider the following typedef, typedef uint8_t msgdata[150];, and the library interface const msgdata* IRead_GetMsgData (void);.
In my code, I use IRead_GetMsgData and memcpy the result into another uint8_t buffer (contrived example below).
//Included from library:
//typedef uint8_t msgdata[150];
//const msgdata* IRead_GetMsgData (void);
uint8_t mBuff[2048];
void Foo() {
const msgdata* myData = IRead_GetMsgData();
if(myData != nullptr) {
std::memcpy(mBuff, *myData, sizeof(msgdata));
}
}
Now, this works and passes our unit tests fine but it started a discussion between our team about whether we should dereference myData in this case. It turns out, not dereferencing myData also works and passes all our unit tests
std::memcpy(mBuff, myData, sizeof(msgdata)); //This works fine, too
My thought when writing the memcpy call was that, because myData is of type msgdata*, dereferencing it would return the pointed-to msgdata, which is a uint8_t array.
E.g.
typedef uint8_t msgdata[150];
msgdata mData = {0u};
msgdata* pData = &mData;
memcpy(somePtr, pData, size); //Would expect this to fail - pData isn't the buffer mData.
memcpy(somePtr, *pData, size); //Would expect this to work - dereferencing pData returns the buffer mData
memcpy(somePtr, mData, size); //Would expect this to work - mData is the buffer, mData ==&mData[0]
I've tried searching for discussion of similar questions but haven't yet found anything that felt relevant:
Using new with fixed length array typedef - how to use/format a typedef
How to dereference typedef array pointer properly? - how to dereference a typedef-ed array and access its elements
typedef fixed length array - again how to format the typedef.
The last one in that list felt most relevant to me, as the accepted answer nicely states (emphasis mine)
[this form of typedef is] probably a very bad idea
Which, having now tried to understand what's actually going on, I couldn't agree with more! Not least because it hides the type you're actually trying to work with...
Notes
So after we started thinking on this, I did a bit of experimentation:
typedef uint8_t msgdata[150];
msgdata data = {0};
msgdata* pData = &data;
int main() {
printf("%p\n", pData);
printf("%p\n", *pData);
printf("%p\n", &data);
printf("%p\n", data);
return 0;
}
Outputs:
0x6020a0
0x6020a0
0x6020a0
0x6020a0
And if I extend that to include a suitable array, arr and a defined size value, size, I can use various memcpy calls such as
std::memcpy(arr, data, size);
std::memcpy(arr, pData, size);
std::memcpy(arr, *pData, size);
Which all behave the same, leading me to believe they are equivalent.
I understand the first and last versions (data and *pData), but I'm still unsure of what is happening regarding the pData version...
This code is, IMO, plain wrong. I'd also accept the alternative view "the code is very misleading"
//Included from library:
//typedef uint8_t msgdata[150];
//const msgdata* IRead_GetMsgData (void);
uint8_t mBuff[2048];
void Foo() {
const msgdata* myData = IRead_GetMsgData();
if(myData != nullptr) {
std::memcpy(mBuff, *myData, sizeof(msgdata));
}
}
When you dereference *myData, you mislead the reader. Obviously, memcpy requires a pointer to a msgdata, so the dereferencing star is not needed. myData is already a pointer. Introducing an extra dereference would break the code.
But it doesn't... Why?
That's where your specific use case kicks in. With typedef uint8_t msgdata[150];, msgdata is an array type, and an array decays into a pointer. So *myData is the array, and an array is (decays into) a pointer to its beginning, which has the same address as myData itself.
So, you could argue: no big deal, I can leave my extra * in, right ?
No.
Because someday, someone will change the code to:
class msgdata
{
int something_super_useful;
uint8_t msgdata[150];
};
In this case, the compiler will catch it but, in general, an indirection level error might compile to a subtle crash. It would take you hours or days to find the extraneous *.
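The distinction between the two spellings can be checked at compile time. The sketch below uses the typedef from the question with illustrative variable and function names: both expressions yield the same address when converted to const void*, but their types differ, which matters as soon as pointer arithmetic is involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

typedef std::uint8_t msgdata[150];

msgdata msg = {0};
msgdata* pData = &msg;

// pData points at the whole array; *pData is the array itself, which decays
// to a pointer to its first element when passed to memcpy.
static_assert(std::is_same<decltype(pData), std::uint8_t (*)[150]>::value,
              "pData is a pointer to an array of 150 bytes");
static_assert(std::is_same<decltype(*pData), std::uint8_t (&)[150]>::value,
              "*pData is the array");
static_assert(std::is_same<std::decay<decltype(*pData)>::type, std::uint8_t*>::value,
              "the array decays to a pointer to its first byte");

// memcpy takes const void*, so both arguments convert to the same address.
bool same_address() {
    return static_cast<const void*>(pData) == static_cast<const void*>(*pData);
}

// The types differ in step size: pData + 1 skips a whole array.
std::ptrdiff_t step_of_array_pointer() {
    return reinterpret_cast<const char*>(pData + 1) -
           reinterpret_cast<const char*>(pData);
}
```

So memcpy(dst, pData, n) and memcpy(dst, *pData, n) copy the same bytes here, but only because the pointee happens to be an array.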

How to convert a variable size struct to char array

I am trying to serialize a structure for sending as a UDP message. The issue I am having is that the structure contains a variable length array of sub-structures as below:
struct SubStruct
{
short val1;
short val2;
};
struct Message
{
short numSubStructs;
SubStruct* structs;
};
The method I use for sending my fixed length messages is to cast the struct to an unsigned char*. Below MSG_LENGTH is equal to sizeof(short) + numSubStructs * sizeof(SubStruct)
send(socket, reinterpret_cast<unsigned char*>(&someMessage), MSG_LENGTH);
This works fine for all my fixed length messages but not for the variable length messages. Looking at the data sent out over the socket, I'm pretty sure it is sending the actual address of the structs pointer.
My question is, is there a way of serializing this kind of structure other than looping through the pointer (array) and appending to some buffer?
Thanks
Try something like this:
char *serializedMessage = new char[sizeof(short) + someMessage.numSubStructs * sizeof(SubStruct)];
// Error check here
// Insert the count of structs
memcpy(serializedMessage, &someMessage.numSubStructs, sizeof(short));
// Copy the structs themselves.
memcpy(&serializedMessage[sizeof(short)], someMessage.structs,
someMessage.numSubStructs * sizeof(SubStruct));
// Transmit serializedMessage
delete[] serializedMessage;
NOTE: This does not pay attention to the endianness of the data, so it is highly likely to fail if the source and target machines have different endianness.
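If endianness does matter, one option is to write each field explicitly in network byte order rather than copying the structs wholesale. The following is a sketch under assumptions: the fields are taken to be 16-bit, the helper names are invented, and the sub-structures are passed as a std::vector whose size the caller keeps within 16 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SubStruct {
    std::int16_t val1;
    std::int16_t val2;
};

// Appends a 16-bit value most significant byte first (network byte order).
void put_i16(std::vector<unsigned char>& out, std::int16_t v) {
    const std::uint16_t u = static_cast<std::uint16_t>(v);
    out.push_back(static_cast<unsigned char>(u >> 8));
    out.push_back(static_cast<unsigned char>(u & 0xFF));
}

// Layout: count (2 bytes), then val1 and val2 (2 bytes each) per entry.
// The caller is assumed to keep structs.size() small enough for 16 bits.
std::vector<unsigned char> serialize(const std::vector<SubStruct>& structs) {
    std::vector<unsigned char> out;
    out.reserve(2 + structs.size() * 4);
    put_i16(out, static_cast<std::int16_t>(structs.size()));
    for (const SubStruct& s : structs) {
        put_i16(out, s.val1);
        put_i16(out, s.val2);
    }
    return out;
}
```

This does loop over the array, but the loop is short and the output no longer depends on the host's byte order or on struct padding.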
I'm not aware of an elegant way to do this in C++. There are some ugly ways however. Basically, allocate a buffer large enough to hold the entire 'unrolled' structure and sub-structure. If the last member of the struct is the variable sized element then it is not too bad to maintain. If there are multiple nested structures then it gets to be unwieldy.
Here is a C style example.
struct example{
int array_length;
some_struct array[1]; // beware of padding in structure between the fields
};
int number_of_structs = 2;
// offsetof(example, array) accounts for any padding before the array
size_t total = offsetof(example, array) + number_of_structs*sizeof(some_struct);
example* ptr = (example*)malloc(total);
ptr->array_length = number_of_structs;
ptr->array[0].first_field = 1;
ptr->array[1].first_field = 2;
send(socket, ptr, total);
There are also some (nonstandard) ways to do this with zero length arrays.

Deallocate structure using pointer arithmetics and a pointer to an element of that structure

I have the following structure in C++ :
struct wrapper
{
// Param constructor
wrapper(unsigned int _id, const char* _string1, unsigned int _year,
unsigned int _value, unsigned int _usage, const char* _string2)
:
id(_id), year(_year), value(_value), usage(_usage)
{
int len = strlen(_string1);
string1 = new char[len + 1]();
strncpy(string1, _string1, len);
len = strlen(_string2);
string2 = new char[len + 1]();
strncpy(string2, _string2, len);
};
// Destructor
~wrapper()
{
if(string1 != NULL)
delete [] string1;
if(string2 != NULL)
delete [] string2;
}
// Elements
unsigned int id;
unsigned int year;
unsigned int value;
unsigned int usage;
char* string1;
char* string2;
};
In main.cpp let's say I allocate memory for one object of this structure :
wrapper* testObj = new wrapper(125600, "Hello", 2013, 300, 0, "bye bye");
Can I now delete the entire object using pointer arithmetic and a pointer that points to one of the structure elements ?
Something like this :
void* ptr = &(testObj->string2);
ptr -= 0x14;
delete (wrapper*)ptr;
I've tested myself and apparently it works but I'm not 100% sure that is equivalent to delete testObj.
Thanks.
Technically, the code like this would work (ignoring the fact that wrapper testObj should be wrapper* testObj and that the offset is not necessarily 0x14, e.g. debug builds sometimes pad the structures, and maybe some other detail I missed), but it is a horrible, horrible idea. I can't stress hard enough how horrible it is.
Instead of 0x14 you could use offsetof macro.
If you like spending nights in the company of the debugger, sure, feel free to do so.
I will assume that the reason for the question is sheer curiosity about whether it is possible to use pointer arithmetic to navigate from members to parent, and not that you would like to really do it in production code. Please tell me I am right.
Can I now delete the entire object using pointer arithmetic and a pointer that points to one of the structure elements ?
Theoretically, yes.
The pointer that you give to delete needs to have the correct value, and it doesn't really matter whether that value comes from an existing pointer variable, or by "adjusting" one in this manner.
You also need to consider the type of the pointer; if nothing else, you should cast to char* before performing your arithmetic so that you are moving in steps of single bytes. Your current code will not compile because ISO C++ forbids incrementing a pointer of type 'void*' (how big is a void?).
However, I recommend not doing this at all. Your magic number 0x14 is unreliable, given alignment and padding and the potential of your structure to change shape.
Instead, store a pointer to the actual object. Also stop with all the horrid memory mess, and use std::string. At present, your lack of copy constructor is presenting a nasty bug.
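As a sketch of the std::string suggestion: the structure from the question with its raw char* members replaced, keeping the same constructor parameters. With std::string members the compiler-generated copy constructor, assignment operator and destructor already do the right thing, so none need to be written.

```cpp
#include <string>

struct wrapper {
    wrapper(unsigned int _id, const char* _string1, unsigned int _year,
            unsigned int _value, unsigned int _usage, const char* _string2)
        : id(_id), year(_year), value(_value), usage(_usage),
          string1(_string1), string2(_string2) {}

    unsigned int id;
    unsigned int year;
    unsigned int value;
    unsigned int usage;
    std::string string1;  // owns its own copy of the characters
    std::string string2;
};
```

Copying a wrapper now duplicates the strings instead of sharing pointers that two destructors would both try to delete.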
You can do this sort of thing with pointer arithmetic. Whether you should is an entirely different story. Consider this macro (I know... I know...) that will give you the base address of a structure given its type, the name of a structure member and a pointer to that member:
#define ADDRESS_FROM_MEMBER(T, member, ptr) reinterpret_cast<T*>( \
reinterpret_cast<unsigned char *>(ptr) - (ptrdiff_t)(&(reinterpret_cast<T*>(0))->member))
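The same idea can be written with the standard offsetof macro instead of a null-pointer cast. The sketch below is for illustration only: record and record_from_string2 are invented names, the struct is kept standard-layout so that offsetof is well defined, and stepping from a member back to its parent remains the kind of trick the answers above advise against.

```cpp
#include <cstddef>

struct record {
    unsigned int id;
    unsigned int year;
    char* string1;
    char* string2;
};

// Steps back from a pointer to record::string2 to the enclosing record.
// Only meaningful if member really is the string2 field of a live record.
inline record* record_from_string2(char** member) {
    return reinterpret_cast<record*>(
        reinterpret_cast<unsigned char*>(member) - offsetof(record, string2));
}
```

Unlike a hard-coded 0x14, the offset here follows the struct if fields are added or the padding changes.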

Testing constructor initialization list

I am working on a test which checks if all class attributes are initialized in a constructor.
My current solution works for non pointer attributes:
void CSplitVectorTest::TestConstructorInitialization()
{
const size_t memorySize = sizeof(CSplitVector);
char* pBuffer1 = (char*) malloc(memorySize);
char* pBuffer2 = (char*) malloc(memorySize);
memset(pBuffer1,'?',memorySize);
memset(pBuffer2,'-',memorySize);
new(pBuffer1) CSplitVector;
new(pBuffer2) CSplitVector;
const bool bObjectsAreEqual = memcmp(pBuffer1,pBuffer2,memorySize)==0;
if (!TEST(bObjectsAreEqual))
{
COMMENT("Constructor initialization list not complete!");
}
free(pBuffer1);
free(pBuffer2);
}
Do you have an idea how could it be improved to test if pointers are initialized?
Your test checks whether every byte of the object has been written over by the constructor. As a straight memory check it looks OK, although if the class contains other objects which don't necessarily initialise themselves fully, you may be in trouble.
That said, my main question would be: Is it really an effective test? For example, is it critical that every attribute in the CSplitVector class is initialised by the initialisation list? Do you perhaps have some which may not need to be initialised at this point? Also, how about checking whether the attributes are set to values that you'd expect?
Instead of comparing byte by byte, you probably should use the right padding or word size, and test if any byte of each word got initialized. That way you will probably get around compiler using padding and constructor leaving uninitialized bytes between padded shorter-than-word fields.
To test the real padding size, shooting from the hip, following code should do it pretty reliably:
struct PaddingTest {
volatile char c; // volatile probably not needed, but will not hurt either
volatile int i;
static int getCharPadding() {
PaddingTest *t = new PaddingTest;
// subtract as byte pointers; casting a pointer to int truncates on 64-bit targets
int diff = (int)((volatile char*)&(t->i) - (volatile char*)&(t->c));
delete t;
return diff;
}
};
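The same measurement can be made at compile time with offsetof, without allocating anything. This is a sketch with invented names (PaddingProbe, padding_after_char); note that it returns only the padding bytes, whereas the function above returns the full distance between the two fields.

```cpp
#include <cstddef>

struct PaddingProbe {
    char c;
    int i;
};

// Bytes the compiler inserted between c and i.
constexpr std::size_t padding_after_char() {
    return offsetof(PaddingProbe, i) - (offsetof(PaddingProbe, c) + sizeof(char));
}
```

The distance between the fields is then offsetof(PaddingProbe, i), which on common ABIs equals alignof(int).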
Edit: You still need the two objects, but you no longer compare them to each other. You just compare each initialized object's data to its memset value, and if either object has any change, it means the word got touched (on the other one too; it's just chance that it got initialized to the same value you memset).
I found a solution for mentioned problems, tested it with initialized/not initialized pointers and with different length types.
In test header I added #pragma pack(1) (I am working on gcc)
#pragma pack(1)
#include <CSplitVector>
Test got a little bit complicated:
void CSplitVectorTest::TestConstructorInitialization()
{
const size_t memorySize = sizeof(CSplitVector);
char* pBuffer = (char*) malloc(memorySize);
memset(pBuffer,'?',memorySize);
CSplitVector* pSplitVector = new(pBuffer) CSplitVector;
// find pointers for all '?'
QList<char*> aFound;
char* pFoundChar = (char*) memchr(pBuffer,'?',memorySize);
while (pFoundChar)
{
aFound.append(pFoundChar);
char* pStartFrom = pFoundChar+1;
pFoundChar = (char*) memchr(pStartFrom,'?',memorySize-(int)(pStartFrom-pBuffer));
}
// if there are any '?'....
if (aFound.count())
{
// allocate the same area with '-'...
pSplitVector->~CSplitVector();
memset(pBuffer,'-',memorySize);
pSplitVector = new(pBuffer) CSplitVector;
// and check if places found before contain '-'
while (aFound.count())
{
pFoundChar = aFound.takeFirst();
if (*pFoundChar=='-')
{
// if yes then class has uninitialized attribute
TEST_FAILED("Constructor initialization list not complete!");
pSplitVector->~CSplitVector();
free(pBuffer);
return;
}
}
}
// if no then all attributes are initialized
pSplitVector->~CSplitVector();
free(pBuffer);
TEST(true);
}
Feel free to point out any flaws in this solution.