I would like to take the memory generated from my struct and push it into a byte array (char array) as well as the other way around (push the byte array back into a struct). It would be even better if I could skip the string generation step and go directly to writing memory into the EEPROM. (Do not worry about the eeprom bit, I can handle that by reading & writing individual bytes)
// These are just example structs (I will be using B)
typedef struct {int a,b,c;} A;
typedef struct {A q,w,e;} B;
#define OFFSET 0 // For now
void write(B input)
{
for (int i=0;i<sizeof(B);i++)
{
eepromWrite(i+OFFSET,memof(input,i));
}
}
B read()
{
B temp;
for (int i=0;i<sizeof(B);i++)
{
setmemof(temp,i,eepromRead(i+OFFSET));
}
return temp;
}
This example I wrote is not supposed to compile, it was meant to explain my ideas in a platform independent environment.
PLEASE NOTE: memof and setmemof do not exist. They are what I am asking for through my question. An alternative answer would be to use a char array as an intermediate step.
Assuming your structures contain objects and not pointers, you can do this with a simple cast:
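void save_b(B b) {
    unsigned char b_data[sizeof(B)];
    /* Copy the raw bytes of b, padding included, into the byte array. */
    memcpy(b_data, (unsigned char *) &b, sizeof(B));
    /* save_bytes stands for whatever routine stores the bytes. */
    save_bytes(b_data, sizeof(B));
}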
Actually, you shouldn't need to copy from the structure into a char array. I was just hoping to make the idea clear.
Be sure to look into #pragma pack, which determines how the elements in the structures are aligned. Any alignment greater than one byte may increase the size unnecessarily.
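As a rough sketch of the idea without the intermediate char array: the bytes of a struct can be read and written through an unsigned char pointer. The names eepromWrite and eepromRead come from the question; here they are stand-ins backed by a plain array (fake_eeprom), and write_b and read_b are made-up names, so treat all of this as an assumption about your platform rather than a real EEPROM API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int a, b, c; } A;
typedef struct { A q, w, e; } B;

#define OFFSET 0

/* Hypothetical stand-in for the EEPROM: a plain array in RAM. */
static unsigned char fake_eeprom[256];
static_assert(OFFSET + sizeof(B) <= sizeof fake_eeprom,
              "B must fit in the fake EEPROM");

static void eepromWrite(size_t addr, unsigned char value)
{
    fake_eeprom[addr] = value;
}

static unsigned char eepromRead(size_t addr)
{
    return fake_eeprom[addr];
}

/* Write every byte of *input, padding included, starting at OFFSET. */
void write_b(const B *input)
{
    const unsigned char *bytes = (const unsigned char *)input;
    for (size_t i = 0; i < sizeof(B); i++) {
        eepromWrite(i + OFFSET, bytes[i]);
    }
}

/* Rebuild a B from the bytes stored at OFFSET. */
B read_b(void)
{
    B temp;
    unsigned char *bytes = (unsigned char *)&temp;
    for (size_t i = 0; i < sizeof(B); i++) {
        bytes[i] = eepromRead(i + OFFSET);
    }
    return temp;
}
```

Calling write_b(&value) and later read_b() should give back the same field values, as long as the same compiler and packing settings are used for both.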
How to copy to flexible array inside struct in c?
#include <stdio.h>
#include <string.h>
typedef struct
{
int val;
char buf[];
} foo;
int main()
{
foo f;
f.val = 4;
// f.buf = "asd"; -> invalid use of flexible array member
memcpy(f.buf, "asd\0", 4);
printf("%s\n", f.buf);
}
output:
asd
*** stack smashing detected ***: terminated
Aborted (core dumped)
Also, if the struct was declared as:
typedef struct
{
char buf[];
} foo
vscode editor gives error:
incomplete type is not allowed
and gcc gives error:
error: flexible array member in a struct with no named members
6 | char buf[];
Why is an array in a struct not allowed, but a pointer is (char *buf)?
Also, If a struct has a flexible array, what is its sizeof(struct containsFlexArray)? How can I dynamically resolve its array, when it has no dimension?
EDIT:
If the above works in C++ because the incomplete array "decays" to a pointer of known size (8 bytes on x64), why is this not also the case in C? If I look at the assembly, I see that the program does not allocate enough stack for the struct: it allocates space only for the foo.val member, not for the foo.buf member. The program then tries to overwrite the foo.val member (by using its address instead of foo.buf's), which causes the "stack smashing detected" error. But why is it implemented this way? (I would also like to know the rationale behind introducing flexible arrays.)
You may want to read about flexible array members here.
It seems that when a struct uses a flexible array, there must be at least one other data member, and the flexible array member must be last.
You may also find some clarification on the usage of flexible array members in C here.
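To make the rule concrete, here is a minimal sketch of how a struct with a flexible array member is usually given storage: allocate sizeof(foo) plus the number of bytes the array needs. The helper name foo_create is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int val;
    char buf[];   /* flexible array member: must be last, adds no size */
} foo;

/* Allocate a foo with room for a copy of text (including its '\0').
   Returns NULL on allocation failure; the caller frees the result. */
foo *foo_create(int val, const char *text)
{
    assert(text != NULL);

    size_t len = strlen(text) + 1;
    foo *f = malloc(sizeof(foo) + len);
    if (f == NULL) {
        return NULL;
    }
    f->val = val;
    memcpy(f->buf, text, len);
    return f;
}
```

With this, foo_create(4, "asd") gives an object whose buf holds "asd", and a single free() releases both the header and the array. Declaring foo f; on the stack, as in the question, reserves no room for buf at all.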
Let's assume a 64-bit Intel/AMD target such as x86-64 Linux, where char is 1 byte, int is 4 bytes, and long is 8 bytes.
Struct alignment is the issue here. When you declare a struct in C, the compiler lays out each member at an offset that is a multiple of that member's alignment, and pads the struct so that its total size is a multiple of its strictest alignment. So if you have a struct like this:
struct a {
    long l;
    char c1;
    char c2;
};
the compiler places l at offset 0, c1 at offset 8, and c2 at offset 9, then adds 6 bytes of trailing padding so that the size is a multiple of 8. You end up with a struct that is 16 bytes long, of which only 10 are used. Whereas if you use this:
struct b {
    char c1;
    long l;
    char c2;
};
c1 sits at offset 0, followed by 7 bytes of padding so that l can start at offset 8, and c2 at offset 16 is followed by 7 more bytes of padding. You end up with 24 bytes, of which 10 are used. Whereas if you have this:
struct c {
    char c1;
    char c2;
    long l;
};
c1 and c2 occupy offsets 0 and 1, and 6 bytes of padding place l at offset 8. That is 16 bytes in total, again with 10 used, so the order of the members affects how much space is wasted.
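If you want to check the layout on your own compiler instead of reasoning about it, sizeof and offsetof report it directly. This is only a small sketch; the third struct is named c here, and print_layouts and PADDING are names invented for the example. The exact numbers depend on the platform, so none are hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct a { long l; char c1; char c2; };
struct b { char c1; long l; char c2; };
struct c { char c1; char c2; long l; };

/* Padding bytes in a struct whose members are one long and two chars. */
#define PADDING(type) (sizeof(type) - sizeof(long) - 2 * sizeof(char))

/* Print the size, member offsets, and padding of each struct. */
void print_layouts(void)
{
    printf("struct a: size %zu, l@%zu c1@%zu c2@%zu, padding %zu\n",
           sizeof(struct a), offsetof(struct a, l),
           offsetof(struct a, c1), offsetof(struct a, c2),
           PADDING(struct a));
    printf("struct b: size %zu, c1@%zu l@%zu c2@%zu, padding %zu\n",
           sizeof(struct b), offsetof(struct b, c1),
           offsetof(struct b, l), offsetof(struct b, c2),
           PADDING(struct b));
    printf("struct c: size %zu, c1@%zu c2@%zu l@%zu, padding %zu\n",
           sizeof(struct c), offsetof(struct c, c1),
           offsetof(struct c, c2), offsetof(struct c, l),
           PADDING(struct c));
}
```

On x86-64 Linux with GCC this typically reports sizes of 16, 24, and 16 bytes, but other targets (for example ones where long is 4 bytes) will give different numbers.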
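So what does this have to do with the array? If you have:
struct aa {
    char a;
    long b[];
};
the flexible array member b gets no storage of its own. sizeof(struct aa) covers only a plus the padding needed to align b (typically 8 bytes here), and you have to allocate the extra room for b's elements yourself, for example with malloc. Whereas when you do not have char a,
struct aa {
    long b[];
};
the struct would have no size at all, which is why C requires at least one other named member before a flexible array member and the compiler rejects this declaration.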
EDIT:
I left my PC and in the meantime another answer popped up. The other answer is very good! But I hope this helps you understand what is going on.
You did not give the buf[] array a size when you declared it, so declaring an instance in main() reserves no storage for it. The "stack smashing detected" message comes from the compiler's stack protector, which aborts the program when it detects that memory past the end of a stack object has been overwritten. Add a size to the array declaration in the typedef struct.
Like this:
#include <stdio.h>
#include <string.h>
typedef struct
{
    int val;
    char buf[5];
} foo;
int main()
{
    foo f;
    f.val = 4;
    // f.buf = "asd"; is still invalid: arrays cannot be assigned to
    memcpy(f.buf, "asd\0", 4);
    printf("%s\n", f.buf);
}
Let's say I have the following
struct MyType { long a, b, c; char buffer[remainder] }
I wanted to do something like
char buffer[4096 - offsetof(MyType, buffer)]
But it appears that it's illegal
You can do:
struct ABC { long a, b, c; };
struct MyType : ABC { char buffer[4096 - sizeof(ABC)]; };
static_assert(sizeof(MyType) == 4096, "!");
Your problem stems from trying to use the not-yet-fully-defined MyType type while defining it. You could do this with a union:
#include <iostream>
struct MyType {
union {
struct { long a, b, c; } data;
char buffer[4096];
};
};
static_assert(sizeof(MyType) == 4096, "MyType size should be exactly 4K");
int main() {
MyType x;
x.data.a = 42;
std::cout << sizeof(x) << " " << x.data.a << "\n";
return 0;
}
The output (on my system):
4096 42
Because it's a union, the type actually holds the a/b/c tuple and buffer area in an overlapped region of memory, big enough to hold the larger of the two. So, unless your long variables are really wide, that will be the 4K buffer area :-)
In any case, that size requirement is checked by the static_assert.
That may be less than ideal as buffer takes up the entire 4K. If instead you want to ensure that buffer is only the rest of the structure (after the long variables), you can use the following:
struct MyType {
long a, b, c;
char buffer[4096 - 3 * sizeof(long)];
};
and ensure that you use x.something rather than x.data.something when accessing the a, b, or c variables.
This solves your problem by using the size of three longs (these are fully defined) instead of the size of something not yet defined. It's still a good idea to keep the static_assert to ensure overall size is what you wanted.
Technically, the compiler has total control over padding and layout. A union/struct combo combined with a static_assert sanity check might be enough for government work, but std::aligned_storage (deprecated as of C++23) is also there to give you memory blocks that are safe to put objects in.
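#include <new>          // placement new
#include <type_traits>  // std::aligned_storage, std::alignment_of
struct MyType {
    long a, b, c;
};
// 4096 bytes of raw storage, aligned suitably for MyType
using MyTypeStorage = std::aligned_storage<4096, std::alignment_of<MyType>::value>::type;
/* ... */
MyTypeStorage myTypeStorage;
// construct a MyType at the start of the storage block
MyType* x = new (&myTypeStorage) MyType {};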
https://godbolt.org/z/87e7Tc
For a union, writing to one member and reading from other member (except for char array) is UB.
//snippet 1 (testing for endianness):
union
{
int i;
char c[sizeof(int)];
} x;
x.i = 1; // writing to i
if(x.c[0] == 1) // reading from c[0]
{ printf("little-endian\n");
}
else
{ printf("big-endian\n");
}
//snippet 2(swap bytes using union):
int swapbytes()
{
union // assuming 32bit, sizeof(int)==4
{
int i;
char c[sizeof(int)];
} x;
x.i = 0x12345678; // writing to member i
SWAP(x.c[0],x.c[3]); // writing to char array elements
SWAP(x.c[1],x.c[2]); // writing to char array elements
return x.i; // reading from x.i
}
Snippet 1 is legal C or C++ but snippet 2 is not. Am I correct? Can someone point to the section of the standard where it says it's OK to write to one member of a union and read from another member that is a char array?
There is a really simple way to get around the undefined behaviour (well, undefined behaviour that is defined in pretty much every compiler out there ;)).
uint32_t i = 0x12345678;
char ch[4];
memcpy( ch, &i, 4 );
bool bLittleEndian = ch[0] == 0x78;
This has the added bonus that pretty much every compiler out there will see that you are memcpying a constant number of bytes and optimise out the memcpy completely resulting in exactly the same code as your snippet 1 while staying totally within the rules!
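The same memcpy approach can cover both snippets from the question. The following is a sketch rather than a drop-in replacement: is_little_endian and swap_bytes are names chosen for this example, and it assumes 8-bit bytes so that uint32_t occupies exactly four of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint32_t) == 4, "swap_bytes assumes 8-bit bytes");

/* Returns 1 when the least significant byte is stored first. */
int is_little_endian(void)
{
    uint32_t value = 1;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);
    return bytes[0] == 1;
}

/* Reverses the byte order of a 32-bit value by way of a byte array. */
uint32_t swap_bytes(uint32_t value)
{
    unsigned char bytes[sizeof value];
    unsigned char tmp;

    memcpy(bytes, &value, sizeof value);
    tmp = bytes[0]; bytes[0] = bytes[3]; bytes[3] = tmp;
    tmp = bytes[1]; bytes[1] = bytes[2]; bytes[2] = tmp;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

swap_bytes(0x12345678) returns 0x78563412 regardless of the host byte order, because the bytes are reversed in memory and then read back the same way they were written.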
I believe it (snippet 1) is technically not allowed, but most compilers allow it anyway because people use this kind of code. GCC even documents that it is supported.
You will have problems on some machines where sizeof(int) == 1, and possibly on some that are neither big endian nor little endian.
Either use available functions that change words to the proper order, or set this with a configuration macro. You probably need to recognize compiler and OS anyway.
Continuing from Absolute fastest (and hopefully elegant) way to return a certain char buffer given a struct type I want to now initialize once each static character buf per struct individually.
Ie, for:
#pragma pack(push, 1)
struct Header {
int a;
int b;
char c;
};
struct X {
int x;
int y;
};
struct Y {
char someStr[20];
};
struct Msg {
Header hdr;
union {
X x;
Y y;
};
};
#pragma pack(pop)
We have:
template<typename T>
struct Buffer {
    static char buffer[sizeof(T)];
};
template<class T>
inline char* get_buffer() {
return Buffer<T>::buffer;
}
The two things I'm looking for are:
1. There are exactly 2 buffers: one for X and one for Y. They should each be the length of sizeof(Msg.hdr) + sizeof(Msg.x) and sizeof(Msg.hdr) + sizeof(Msg.y), respectively.
2. Each buffer will be retrieved a lot during the application lifetime, and only some fields really change (or need to).
2a. The Msg for X backed by its char buffer should be initialized to m.hdr.a = 1; m.hdr.b = 0; and for Msg Y it should be m.hdr.a = 16; m.hdr.b = 1; as an example.
The app will frequently fetch these buffers as type Msg backed by either X or Y (the app would know which one) and then change x and y or someStr only and then output it to the file for example then repeat.
I'm just wondering what nice way builds on these great examples by @6502 and @Fred Nurk to elegantly initialize these 2 buffers while staying human readable. I'd prefer to keep using structs and to limit the use of reinterpret_cast<>() as much as possible, as aliasing issues might develop.
Please let me know if I'm not clear and I will do my best to answer any questions and/or edit this question description.
Thanks.
*** Update: my usage pattern for these buffers is that I will be sending or copying the char* out to a stream or file. Hence I need to get a char* pointer to the underlying data. However, I need to work on the char buffers via their structs for readability and convenience. Also, this char buffer should be decoupled and not necessarily contained in or "attached" to the struct, as the structs are pretty much in separate files and used elsewhere where the buffers are not needed or wanted. Would a simple static X x; static Y y; suffice, or would buffers of length Header + X for X's Msg buffer be better? And then somehow just keep a char* reference to each Msg for X and Y? Will I potentially run into aliasing issues?
If you would be writing it in C, you could look into a fairly common C compiler extension called "cast to a union type", but in C++ it is no longer present.
In C++ there is no way around reinterpret_cast<> for what you require, but at least you can do it fairly safely by calculating the member offset on a NULL pointer cast to the union, and then subtracting this offset from your data pointer before casting it to the union. I believe that on most compilers the offset will be 0, but it is better to be on the safe side.
template<class T>
union Aligner {
T t;
char buffer[sizeof(T)];
};
template<class T>
inline char* get_buffer(T* pt) {
return reinterpret_cast<Aligner<T>*>(reinterpret_cast<char*>(pt) - reinterpret_cast<ptrdiff_t>(&reinterpret_cast<Aligner<T>*>(NULL)->t))->buffer;
}
Lately I've been diving into network programming, and I'm having some difficulty constructing a packet with a variable "data" property. Several prior questions have helped tremendously, but I'm still lacking some implementation details. I'm trying to avoid using variable sized arrays, and just use a vector. But I can't get it to be transmitted correctly, and I believe it's somewhere during serialization.
Now for some code.
Packet Header
class Packet {
public:
void* Serialize();
bool Deserialize(void *message);
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
};
Packet Impl
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
} Packet;
void* Packet::Serialize(int size) {
Packet* p = (Packet *) malloc(8 + 30);
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->data.assign(size,'&'); //just for testing purposes
}
bool Packet::Deserialize(void *message) {
Packet *s = (Packet*)message;
this->sender_id = ntohl(s->sender_id);
this->sequence_number = ntohl(s->sequence_number);
this->data = s->data;
}
During execution, I simply create a packet, assign its members, and send/receive accordingly. The above methods are only responsible for serialization. Unfortunately, the data never gets transferred.
Couple of things to point out here. I'm guessing the malloc is wrong, but I'm not sure how else to compute it (i.e. what other value it would be). Other than that, I'm unsure of the proper way to use a vector in this fashion, and would love for someone to show me how (code examples please!) :)
Edit: I've awarded the question to the most comprehensive answer regarding the implementation with a vector data property. Appreciate all the responses!
This trick works with a C-style array at the end of the struct, but not with a C++ vector. There is no guarantee that the C++ vector class will (and it most likely won't) put its contained data in the "header object" that is present in the Packet struct. Instead, that object will contain a pointer to somewhere else, where the actual data is stored.
I think you might want to do something like this:
struct PacketHeader
{
    unsigned int senderId;
    unsigned int sequenceNum;
};
class Packet
{
protected:
    PacketHeader header;
    std::vector<char> data;
public:
    char* serialize(int& packetSize);
    void deserialize(const char* data, int dataSize);
};
char* Packet::serialize(int& packetSize)
{
    packetSize = this->data.size() + sizeof(PacketHeader);
    char* packetData = new char[packetSize]; // caller must delete[] this
    PacketHeader* packetHeader = (PacketHeader*)packetData;
    packetHeader->senderId = htonl(this->header.senderId);
    packetHeader->sequenceNum = htonl(this->header.sequenceNum);
    // The body starts after the header struct, not after a pointer to it.
    char* packetBody = packetData + sizeof(PacketHeader);
    for (size_t i = 0; i < this->data.size(); i++)
    {
        packetBody[i] = this->data.at(i);
    }
    return packetData;
}
void Packet::deserialize(const char* data, int dataSize)
{
    const PacketHeader* packetHeader = (const PacketHeader*)data;
    this->header.senderId = ntohl(packetHeader->senderId);
    this->header.sequenceNum = ntohl(packetHeader->sequenceNum);
    this->data.clear();
    for (int i = sizeof(PacketHeader); i < dataSize; i++)
    {
        this->data.push_back(data[i]);
    }
}
This code does not include bounds checking and does not free the allocated data, so don't forget to delete[] the buffer returned from serialize(). You can also use memcpy instead of a loop to copy the bytes into or out of the std::vector.
Most compilers add padding inside a structure in some cases, which causes problems if you send the data as-is without disabling the padding. You can disable it with #pragma pack(1) if you are using Visual Studio.
Disclaimer: I haven't actually compiled this code, so you might want to recheck it.
I think the problem centres around you trying to 'serialise' the vector that way, and you're probably assuming that the vector's state information gets transmitted. As you've found, it doesn't really work that way, as you're trying to move an object across the network and things like pointers don't mean anything on the other machine.
I think the easiest way to handle this would be to change Packet to the following structure:
struct Packet {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int vector_size;
char data[1];
};
The data[1] bit is an old C trick for variable length array - it has to be the last element in the struct as you're essentially writing past the size of the struct. You have to get the allocation for the data structure right for this, otherwise you'll be in a world of hurt.
Your serialisation function then looks something like this:
void* Packet::Serialize(std::vector<char> &data) {
    Packet* p = (Packet *) malloc(sizeof(Packet) + data.size());
    p->sender_id = htonl(this->sender_id);
    p->sequence_number = htonl(this->sequence_number);
    p->vector_size = htonl(data.size());
    if (!data.empty()) {
        ::memcpy(p->data, &data[0], data.size()); // copy the vector's contents
    }
    return p; // caller must free() this
}
As you can see, we'll transmit the data size and the contents of the vector, copied into a plain C array which transmits easily. You have to keep in mind that in your network sending routine, you have to calculate the size of the structure properly, as you'll have to send sizeof(Packet) + data.size() bytes; otherwise you'll get the vector cut off and are back in nice buffer overflow territory.
Disclaimer - I haven't tested the code above, it's just written from memory so you might have to fix the odd compilation error.
I think you need to work directly with byte arrays returned by the socket functions.
For these purposes it's good to have two distinct parts of a message in your protocol. The first part is a fixed-size "header". This will include the size in bytes of the part that follows, the "payload" (data in your example).
So, to borrow some of your snippets and expand on them, maybe you'll have something like this:
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int data_length; // this is new
} PacketHeader;
So then when you get a buffer in, you'll treat it as a PacketHeader*, and check data_length to know how many bytes will appear in the byte vector that follows.
I would also add a few points...
Making these fields unsigned int is not wise. The standards for C and C++ don't specify how big int is, and you want something that will be predictable on all compilers. I suggest the C99 type uint32_t defined in <stdint.h>
Note that when you get bytes from the socket... It is in no way guaranteed to be the same size as what the other end wrote to send() or write(). You might get incomplete messages ("packets" in your terminology), or you might get multiple ones in a single read() or recv() call. It's your responsibility to buffer these if they are short of a single request, or loop through them if you get multiple requests in the same pass.
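Here is one way the header-plus-payload layout could be written out and read back, as a rough sketch only. It uses uint32_t fields as suggested above and encodes them byte by byte in network (big-endian) order instead of casting the buffer to a struct, which avoids padding and alignment questions. HEADER_SIZE, put_u32, get_u32, packet_serialize, and packet_parse are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 12u   /* three 32-bit fields on the wire */

typedef struct {
    uint32_t sender_id;
    uint32_t sequence_number;
    uint32_t data_length;   /* number of payload bytes that follow */
} PacketHeader;

/* Store v at out[0..3] in network (big-endian) byte order. */
static void put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value from in[0..3]. */
static uint32_t get_u32(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Returns a malloc'd buffer holding header + payload and stores its size
   in *out_size; returns NULL on allocation failure. The caller frees it. */
unsigned char *packet_serialize(const PacketHeader *h, const void *payload,
                                size_t *out_size)
{
    assert(h != NULL && out_size != NULL);

    size_t total = HEADER_SIZE + h->data_length;
    unsigned char *buf = malloc(total);
    if (buf == NULL) {
        return NULL;
    }
    put_u32(buf, h->sender_id);
    put_u32(buf + 4, h->sequence_number);
    put_u32(buf + 8, h->data_length);
    if (h->data_length > 0) {
        memcpy(buf + HEADER_SIZE, payload, h->data_length);
    }
    *out_size = total;
    return buf;
}

/* Parses the header into *h. Returns a pointer to the payload inside buf,
   or NULL if buf is too short for the header or the declared payload. */
const unsigned char *packet_parse(const unsigned char *buf, size_t size,
                                  PacketHeader *h)
{
    if (size < HEADER_SIZE) {
        return NULL;
    }
    h->sender_id = get_u32(buf);
    h->sequence_number = get_u32(buf + 4);
    h->data_length = get_u32(buf + 8);
    if (size - HEADER_SIZE < h->data_length) {
        return NULL;
    }
    return buf + HEADER_SIZE;
}
```

A NULL result from packet_parse is the cue to keep buffering: it means the bytes received so far do not yet contain a complete message.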
This cast is very dangerous as you have allocated some raw memory and then treated it as an initialized object of a non-POD class type. This is likely to cause a crash at some point.
Packet* p = (Packet *) malloc(8 + 30);
Looking at your code, I assume that you want to write out a sequence of bytes from the Packet object that the serialize function is called on. In this case you have no need of a second packet object. You can create a buffer of bytes of the appropriate size and then copy the data across.
e.g.
void* Packet::Serialize(int size)
{
char* raw_data = new char[sizeof sender_id + sizeof sequence_number + data.size()];
char* p = raw_data;
unsigned int tmp;
tmp = htonl(sender_id);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
tmp = htonl(sequence_number);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
std::copy(data.begin(), data.end(), p);
return raw_data;
}
This may not be exactly what you intended, as I'm not sure what the purpose of your size parameter is, and your interface is potentially unsafe because you return a pointer to raw data that I assume is supposed to be dynamically allocated. It is much safer to use an object that manages the lifetime of dynamically allocated memory, so that the caller doesn't have to guess whether and how to deallocate the memory.
Also the caller has no way of knowing how much memory was allocated. This may not matter for deallocation but presumably if this buffer is to be copied or streamed then this information is needed.
It may be better to return a std::vector<char> or to take one by reference, or even make the function a template and use an output iterator.