Difference between memcpy and copy by assignment - c++

struct A
{
uint16_t len;
uint8_t cnt;
uint8_t unit;
uint32_t seq;
};
This struct A is serialized into a char * buf. If I want to deserialize the individual values, e.g.:
uint16_t len = 0;
memcpy(&len, buf, sizeof(len));
or I can just do
uint16_t len = (uint16_t) buf;
Which one is better or are both the same?
Also to deserialize the whole struct, if I just do
A tmp;
memcpy(&tmp, buf, sizeof(A));
Would this work fine or should I be worried about padding etc from the compiler?

When the data is copied into a char[] buffer, it may not be properly aligned in memory for access as multi-byte types. Copying the data back into a struct restores proper alignment.
If I want to deserialize the individual values, e.g.:
uint16_t len = 0;
memcpy(&len, buf, sizeof(len));
Assuming that you have copied the struct into buf, this is perfectly valid, because the language guarantees that the initial member is aligned with the beginning of the structure. However, casting buf to uint16_t* is invalid, because the buffer may not be properly aligned in memory to be addressed as uint16_t.
Note that getting elements of the struct other than the initial one requires computing the proper offset:
uint32_t seq;
memcpy(&seq, buf+offsetof(struct A, seq), sizeof(seq));
Also to deserialize the whole struct, if I just do
A tmp;
memcpy(&tmp, buf, sizeof(A));
Would this work fine or should I be worried about padding etc from the compiler?
This would work fine, provided the buffer was filled by copying a struct A built with the same compiler and settings. Any padding embedded in the struct when you copied it into buf would come back into tmp, along with the actual data.
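The round trip described above can be sketched as follows. This is a minimal illustration, assuming the buffer is written and read by the same program; the helper names serialize, read_seq, and deserialize are hypothetical and not part of the original post.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>
#include <cstring>   // std::memcpy

struct A {
    std::uint16_t len;
    std::uint8_t  cnt;
    std::uint8_t  unit;
    std::uint32_t seq;
};

// Hypothetical helper: copy the object representation of `a` into `buf`.
// `buf` must have room for at least sizeof(A) bytes.
void serialize(const A& a, char* buf) {
    std::memcpy(buf, &a, sizeof(A));
}

// Hypothetical helper: read only `seq` from the buffer.
// memcpy works regardless of how `buf` is aligned.
std::uint32_t read_seq(const char* buf) {
    std::uint32_t seq = 0;
    std::memcpy(&seq, buf + offsetof(A, seq), sizeof(seq));
    return seq;
}

// Hypothetical helper: rebuild the whole struct, padding included.
A deserialize(const char* buf) {
    A tmp;
    std::memcpy(&tmp, buf, sizeof(A));
    return tmp;
}
```

Because every read goes through memcpy, the buffer may start at any address, including an odd one.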

Related

What is the safe approach to convert incoming network `char*` data to `uint8_t` and back

This question on SO deals with the char <-> uint8_t issue mainly from the perspective of the Strict Aliasing Rule. Roughly speaking, it clarifies that as long as uint8_t is implemented as either char or unsigned char, we're fine.
I'm interested in understanding whether or not the possible incompatibility of the signed/unsignedness of uint8_t with char matters when using reinterpret_cast.
When I need to deal directly with bytes, I prefer using uint8_t. However, the Winsock API deals with char*s.
I would like to understand how to handle these conversions correctly, in order to not run into Undefined Behavior or other phenomena that damage the portability of the app.
The following function takes a std::array<uint8_t, 4> and converts it to a uint32_t - i.e., takes 4 bytes and converts them to an integer.
uint32_t bytes_to_u32(const std::array<uint8_t, 4>& bytes) {
return (bytes[0] << 24) + (bytes[1] << 16) + (bytes[2] << 8) + bytes[3];
}
However, the data incoming from the socket (using the recv function) comes in char* form.
One approach is the following:
std::array<uint8_t, 4> length_buffer;
int bytes_received = 0;
while (bytes_received < 4) {
bytes_received += recv(sock, reinterpret_cast<char*>(length_buffer.data()) + bytes_received, 4 - bytes_received, 0);
}
It seems to work on my machine. However - is this safe? If I'm not mistaken, on a different machine or compiler, a char may be signed, meaning the length_buffer will hold wrong values after the conversion. Am I wrong?
I know that reinterpret_cast does not change the bit pattern at all - it leaves the binary data the same. Knowing this, it still doesn't fully register in my brain whether or not this technique is the right way to go.
Please explain how to approach this problem.
EDIT: Also noting, after converting the char* to uint8_t*, I need to be able to convert the uint8_t* to a valid numeric value, or sometimes test the numeric values of individual bytes in the buffer. In order to interpret the "commands" I was sent over the network, and send some back to the other side.
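If I understood your question correctly, you can solve this problem using a union. Note that reading a union member other than the one last written is undefined behavior in standard C++, although many compilers support it as an extension, and that the result depends on the byte order of the machine: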
//Union is template so you can use this for any given type
template<typename T>
union ConvertBytes
{
T value;
char byte[sizeof(T)];
};
void process()
{
recv(socket, buffer, bufferLength, 0); //Receive data
ConvertBytes<uint32_t> converter;
for (std::size_t i = 0; i < sizeof(uint32_t); i++) //Assuming that you receive only that one uint32
{
converter.byte[i] = buffer[i]; //Assign all bytes into union
}
uint32_t result = converter.value; //Get uint32_t value from union
}
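A sketch that does not rely on a union: copy the received chars into a uint8_t array with std::memcpy, then combine the bytes with shifts. It assumes the sender transmits the integer in big-endian (network) order, as the shifts in the question imply. The helper name first_four_bytes is hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>   // std::memcpy

// Combine four bytes in big-endian order into a uint32_t.
// Each byte is widened to uint32_t first, so no shift is done on a signed int.
std::uint32_t bytes_to_u32(const std::array<std::uint8_t, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}

// Hypothetical helper: copy the first four chars of a receive buffer.
// memcpy copies bit patterns, so the signedness of char does not matter.
std::array<std::uint8_t, 4> first_four_bytes(const char* buffer) {
    std::array<std::uint8_t, 4> out{};
    std::memcpy(out.data(), buffer, out.size());
    return out;
}
```

The result does not depend on the byte order of the receiving machine, because the value is assembled arithmetically rather than reinterpreted in place.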

Convert Char array to uint8_t vector

For a project I need to send encoded messages, but I can only pass a vector of uint8_t to be sent. I have a char array (containing numbers and strings that I converted to hexadecimal) and a pointer to that array. I encode the message, which is an object, into the buffer; then I have to send it, decode it, and so on.
char buffer[1024];
char *p = buffer;
size_t bufferSize = sizeof(buffer);
Encode(msg, p, bufferSize);
std::vector<uint8_t> encodedmsg; //here I need to put my message from the buffer
Send(encodedmsg.data(), encodedmsg.size()); //only accepts uint8_t data
Here is the prototype of Send:
uint32_t Send(const uint8_t * buffer, const std::size_t bufferSize)
I have already looked at some questions, but none of them needed to put the data in a vector or convert it to uint8_t.
I thought about memcpy, reinterpret_cast, or maybe a for loop, but I don't really know how to do it without any loss.
Thanks,
Actually, your code suggests that the Send() function takes a pointer to uint8_t, not a std::vector<uint8_t>.
And since char and uint8_t have the same size, you could just do:
Send(reinterpret_cast<uint8_t*>(p), bufferSize);
But if you want to do everything "right" you could do this:
encodedmsg.resize(bufferSize);
std::transform(p, p + bufferSize, encodedmsg.begin(), [](char v) {return static_cast<uint8_t>(v);});
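A shorter variant of the std::transform approach, sketched here as one option: the iterator-range constructor of std::vector performs the same per-element conversion. The helper name to_bytes is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build a std::vector<uint8_t> from `size` chars at `data`.
// The range constructor converts each char to uint8_t element by element,
// which preserves the bit pattern whether char is signed or unsigned.
std::vector<std::uint8_t> to_bytes(const char* data, std::size_t size) {
    return std::vector<std::uint8_t>(data, data + size);
}
```

If Encode reports how many bytes it actually wrote (an assumption, since its signature is not shown), passing that count instead of the full buffer size avoids sending unused trailing bytes.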

Copying struct with bitfields & dynamic data into a Char array buffer

I have a struct like the following
struct Struct {
int length; //dynamicTest length
unsigned int b: 1;
unsigned int a: 1;
unsigned int padding: 10;
int* dynamicTest;
int flag;
};
I want to copy this into a char array buffer (to send over a socket). I'm curious how I would do that.
To be precise, you do this with memcpy, e.g.:
#include <string.h>
/* ... */
Struct s = /*... */;
char buf[1024];
memcpy(buf, &s, sizeof(s));
/* now [buf, buf + sizeof(s)) holds the needed data */
Alternatively, you can avoid copying altogether and view an instance of the struct as an array of char (since everything in computer memory is a sequence of bytes, this approach works).
Struct s = /* ... */;
const char* buf = (char*)(&s);
/* now [buf, buf + sizeof(s)) holds the needed data */
If you are going to send it over the network, you need to take care of byte order, int size, and many other details.
Copying bit fields presents no problem, but for dynamic fields, such as your int*, this naive approach won't work. The more general solution, which works with any type, is serialization.
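A minimal sketch of what such serialization could look like for the struct in the question. The wire format (field order, one byte for the two bit flags, 32-bit big-endian integers) and the helper names put_u32 and serialize are assumptions made for illustration, not something the original post specifies.

```cpp
#include <cstdint>
#include <vector>

struct Struct {
    int length;                 // number of ints pointed to by dynamicTest
    unsigned int b : 1;
    unsigned int a : 1;
    unsigned int padding : 10;
    int* dynamicTest;
    int flag;
};

// Append a 32-bit value in big-endian (network) byte order.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Hypothetical wire format: length, one byte of bit flags (b in bit 0,
// a in bit 1), flag, then `length` ints. The pointed-to ints are written,
// not the pointer value itself.
std::vector<std::uint8_t> serialize(const Struct& s) {
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(s.length));
    out.push_back(static_cast<std::uint8_t>(s.b | (s.a << 1)));
    put_u32(out, static_cast<std::uint32_t>(s.flag));
    for (int i = 0; i < s.length; ++i) {
        put_u32(out, static_cast<std::uint32_t>(s.dynamicTest[i]));
    }
    return out;
}
```

The receiver would read the fields back in the same order and allocate its own storage for the dynamic part.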

Nice representation of byte array and its size

How would you nicely represent a byte array and its size? I'd like to store (in main memory or in a file) raw byte arrays (unsigned chars) in which the first 2 or 4 bytes represent the size. But operations on such an array do not look good:
void func(unsigned char *bytearray)
{
int size;
memcpy(&size, bytearray, sizeof(int));
//rest of operation when we know bytearray size
}
How can I avoid that? I think about a simple structure:
struct bytearray
{
int size;
unsigned char *data;
};
bytearray *b = reinterpret_cast<bytearray*>(new unsigned char[10]);
b->data = reinterpret_cast<unsigned char*>(&(b->size) + 1);
Now I have access to both the size and the data part of the bytearray. But it still looks ugly. Could you recommend another approach?
Unless you have some overwhelming reason to do otherwise, just do the idiomatic thing and use std::vector<unsigned char>.
You're effectively re-inventing the "Pascal string". However
b->data = reinterpret_cast<unsigned char*>(&(b->size) + 1);
won't work at all, because the pointer points to itself, and the pointer will get overwritten.
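You should be able to use an array of unspecified size (a flexible array member) as the last element of a structure. This is a C99 feature that is not part of standard C++, but many compilers accept it as an extension: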
struct bytearray
{
int size;
unsigned char data[];
};
bytearray *b = reinterpret_cast<bytearray*>(::operator new(sizeof (bytearray) + 10));
b->size = 10;
//...
::operator delete(b);
Unlike std::vector, this actually stores the size and data together, so you can, for example, write it to a file in one operation. And memory locality is better.
Still, the fact that std::vector is already tested and many useful algorithms are implemented for you makes it very attractive.
I would use std::vector<unsigned char> to manage the memory, and write a conversion function to create some iovec like structure for you at the time that you need such a thing.
iovec make_iovec (std::vector<unsigned char> &v) {
iovec iv = { &v[0], v.size() };
return iv;
}
Using iovec, if you need to write both the length and data in a single system call, you can use the writev call to accomplish it.
ssize_t write_vector(int fd, std::vector<unsigned char> &v) {
uint32_t len = htonl(v.size());
iovec iv[2] = { { &len, sizeof(uint32_t) }, make_iovec(v) };
return writev(fd, iv, 2);
}
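For storing the size and the data together without any system-specific calls, a length-prefixed buffer can also be built on top of std::vector. The sketch below assumes a 4-byte big-endian size prefix; the function names encode and decode are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Prefix `payload` with its size as a 4-byte big-endian integer.
std::vector<unsigned char> encode(const std::vector<unsigned char>& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::vector<unsigned char> out = {
        static_cast<unsigned char>(n >> 24), static_cast<unsigned char>(n >> 16),
        static_cast<unsigned char>(n >> 8),  static_cast<unsigned char>(n)};
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Read the size prefix and return the payload that follows it.
// Returns an empty vector if the input is shorter than the prefix claims.
std::vector<unsigned char> decode(const std::vector<unsigned char>& framed) {
    if (framed.size() < 4) return {};
    const std::uint32_t n = (std::uint32_t{framed[0]} << 24) |
                            (std::uint32_t{framed[1]} << 16) |
                            (std::uint32_t{framed[2]} << 8)  |
                             std::uint32_t{framed[3]};
    if (framed.size() - 4 < n) return {};
    return std::vector<unsigned char>(framed.begin() + 4, framed.begin() + 4 + n);
}
```

The encoded vector is contiguous, so it can be written to a file or socket in one operation via its data() and size().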

Convert a string to an unsigned char []

I currently have a Packet set up like so:
struct Packet {
unsigned short sequenceNumber;
unsigned short length;
unsigned char control;
unsigned char ack;
unsigned short crc;
unsigned char data[];
Packet copy(const Packet& aPacket) {
sequenceNumber = aPacket.sequenceNumber;
length = aPacket.length;
control= aPacket.control;
ack = aPacket.ack;
crc = aPacket.crc;
memcpy (data, aPacket.data, aPacket.length);
}
};
This packet gets converted into a string for encryption and then needs to be taken from its decrypted string form back to a Packet. I am able to do this fine for all of the variables except for the unsigned char data[]. I have tried the following with no success:
string data = thePack.substr(pos, thePack.length()-pos);
unsigned char * cData = new unsigned char[data.length()];
strcpy((char *)cData, data.c_str());
memcpy(p.data, cData, data.length());
where data is the string representation of the data to be copied into the unsigned char [] and p is the Packet.
This gives the following from valgrind:
==16851== Invalid write of size 1
==16851== at 0x4A082E7: strcpy (mc_replace_strmem.c:303)
Even though it cites strcpy as the source, it compiles and runs fine with just the memcpy line commented out.
I have also tried replacing memcpy with strcpy with the same result. Any ideas? I feel that it might be due to the fact that data may not have been initialized and therefore does not have any memory allocated to it, but I thought memcpy would take care of this.
You haven't specified the size of the data array.
unsigned char data[];
This is legal, but rather difficult to use. The data array will follow the rest of the Packet structure in memory, but the compiler doesn't know how much space to allocate for it. So you have to allocate the extra space yourself:
size_t datalen = thePack.length()-pos;
void* pbuffer = malloc( sizeof (Packet) + datalen + 1 ); // room for the header, the data, and a terminator
Packet* p = new (pbuffer) Packet; // placement new; requires #include <new>
memcpy(p->data, &thePack[pos], datalen);
p->data[datalen] = 0;
What won't work is letting the compiler decide how big a Packet should be, either using new Packet or a local variable Packet p;. That will end up with no space reserved for data. And no, memcpy doesn't allocate memory.
A much cleaner solution would be to use a std::vector for your variable-sized data array.
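A sketch of the std::vector variant, assuming the rest of the Packet layout stays as in the question. The helper name set_data is hypothetical, and pos must not exceed the length of the string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Packet {
    unsigned short sequenceNumber = 0;
    unsigned short length = 0;
    unsigned char control = 0;
    unsigned char ack = 0;
    unsigned short crc = 0;
    std::vector<unsigned char> data;  // owns its storage and is copied with the struct
};

// Hypothetical helper: fill p.data from the tail of the decrypted string.
// assign() allocates as needed and copies embedded zero bytes as well,
// unlike strcpy, which stops at the first one.
void set_data(Packet& p, const std::string& thePack, std::size_t pos) {
    p.data.assign(thePack.begin() + pos, thePack.end());
    p.length = static_cast<unsigned short>(p.data.size());
}
```

With this layout the compiler-generated copy constructor and assignment operator already copy the data, so a hand-written copy function is no longer needed.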
The char[] you're allocating is one character too small: you must leave room for the terminating null character at the end:
unsigned char * cData = new unsigned char[data.length() + 1];
Use the strcpy version to copy the string, so the null terminator gets copied correctly. Although it might run OK without that +1, there's no guarantee, and sometimes it might crash.