So I have a struct called Packet
struct Packet {
unsigned int packet_type;
wchar_t packet_length[128];
wchar_t file_name[256];
wchar_t template_name[256];
wchar_t file_name_list[1024];
wchar_t file_data[1024];
void serialize(char * dat) {
memcpy(dat, this, sizeof(Packet));
}
void deserialize(const char * dat) {
memcpy(this, dat, sizeof(Packet));
}
};
I'm trying to deserialize from this data
{byte[2692]}
[0] 0 unsigned int packet_type; (4 bytes)
[1] 0
[2] 0
[3] 0
[4] 50 '2' wchar_t packet_length[128]; (128 bytes)
[3] 0
[5] 54 '6'
[3] 0
[6] 57 '9'
[3] 0
[7] 50 '2'
[8] 0
[...] 0
[132] 112 'p' wchar_t file_name[256]; (256 bytes)
[133] 0
[134] 104 'h'
[...] 0
But the memcpy in deserialize isn't giving me the file_name, although it does give me the packet_length. What's up with this? Thanks!
EDIT:
So it's clear to me now that wchar_t is taking up more space than I once thought; however, I'm being told not to use memcpy at all?
I've written this deserialize method and it grabs the data correctly. Will this still cause a security leak?
void deserialize(const char * dat) {
memcpy(&(packet_type), dat, 4);
memcpy(&(packet_length[0]), dat + 4, 128);
memcpy(&(file_name[0]), dat + 132, 256);
memcpy(&(template_name[0]), dat + 388, 256);
memcpy(&(file_name_list[0]), dat + 644, 1024);
memcpy(&(file_data[0]), dat + 1668, 1024);
}
Please do not use this method for serializing structures. It's utterly non-portable.
Moreover, the compiler is free to pad and align members differently depending on the target architecture and the compiler settings, and the byte order of each member depends on the endianness of the machine.
A much more elegant way would be to use Boost.Serialization, which takes care of low-level details in a portable way.
If, on the other hand, you just want to inspect your structures, then a debugger comes in handy...
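To see how much of a struct's size comes from padding rather than data, a small check along these lines can help. This is only a sketch: the struct Sample and the helpers packed_size and padding_bytes are made up for illustration and are not part of the question's code.

```cpp
#include <cstddef>   // offsetof, std::size_t

// Hypothetical struct used only to illustrate padding.
struct Sample {
    char         tag;      // 1 byte
    unsigned int value;    // usually 4 bytes, usually 4-byte aligned
    wchar_t      text[4];  // 2 bytes per element on Windows, 4 on most Unix systems
};

// Sum of the member sizes, ignoring any padding the compiler inserts.
constexpr std::size_t packed_size() {
    return sizeof(char) + sizeof(unsigned int) + 4 * sizeof(wchar_t);
}

// Number of padding bytes the compiler added to Sample on this platform.
constexpr std::size_t padding_bytes() {
    return sizeof(Sample) - packed_size();
}
```

Comparing sizeof(Sample), offsetof(Sample, value) and padding_bytes() across two compilers or targets shows why a raw memcpy of the whole struct cannot be relied on as a wire format.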
The layout of your char array assumes that the size of wchar_t is two bytes; it is not - here is an example of a system where the size of wchar_t is 4, so the size of Packet is 10756, not 2692 bytes: (link to a demo).
That is why your memcpy trick from the edit presents a problem: it assumes that the layout of data in the char[] array matches the layout of wchar_t[] arrays, which it may or may not match. If you know that your data array has two-byte elements stored in little endian format (LSB first), you can write your own function that converts the data from the source to the destination, and call it for portions of your serialized data, like this:
// length is the number of wchar_t elements to produce; 2*length bytes are read from src
void bytes_to_wchar(wchar_t *dest, const unsigned char* src, size_t length) {
    for (size_t i = 0 ; i != length ; i++) {
        dest[i] = src[2*i] | (src[2*i+1] << 8);
    }
}
Now you can use this function to copy data into wchar_t arrays independently of the wchar_t size on the target system, or the endianness of the target system:
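void deserialize(const char * dat) {
    // Read raw bytes as unsigned so that values above 127 are not sign-extended.
    const unsigned char * src = reinterpret_cast<const unsigned char *>(dat);
    // packet_type is a 4-byte little-endian integer, not a wchar_t array.
    packet_type = src[0] | (src[1] << 8) | (src[2] << 16) | (static_cast<unsigned int>(src[3]) << 24);
    // The last argument counts wchar_t elements; each one takes two bytes in the
    // buffer, so a 128-byte field in the question's layout holds 64 elements.
    bytes_to_wchar(packet_length, src + 4, 64);
    bytes_to_wchar(file_name, src + 132, 128);
    bytes_to_wchar(template_name, src + 388, 128);
    bytes_to_wchar(file_name_list, src + 644, 512);
    bytes_to_wchar(file_data, src + 1668, 512);
}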
The shortcut of saving the data from memory and writing it back may work when you do it on the same hardware, using the same compiler. Even then it remains sensitive to small adjustments in the headers that you use and in the settings of the compiler.
If the character array that you need to copy into the struct has a fixed layout, you need to write a function to process that layout, converting two-byte groups into wchar_ts, four-byte groups into unsigned ints, and so on.
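As a rough illustration of such layout-processing functions, the sketch below reads two-byte and four-byte little-endian groups from a raw buffer. The names read_u16_le and read_u32_le are hypothetical, and little-endian order is an assumption taken from the byte dump in the question.

```cpp
#include <cstdint>

// Read a 16-bit little-endian value from two raw bytes (low byte first).
inline std::uint16_t read_u16_le(const unsigned char *p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Read a 32-bit little-endian value from four raw bytes (low byte first).
inline std::uint32_t read_u32_le(const unsigned char *p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

A field such as packet_type would then be read with read_u32_le(buffer), and each character with static_cast<wchar_t>(read_u16_le(buffer + offset)), regardless of how wide wchar_t or unsigned int is on the receiving machine.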
I am trying to receive a message from a TCP socket and store it in a uint8_t array.
The buffer I receive is 8 bytes long and contains 4 distinct values.
Byte 1: value 1 which is a uint8_t, Byte 2-3: value 2 which is a uint16_t, Byte 4: value 3 which is a uint8_t, Byte 5-8: value 4 which is an unsigned long.
The byte order is big-endian.
int numBytes = 0;
uint8_t buff [8];
if ((numBytes = recv(sockfd, buff, 8, 0)) == -1)
{
perror("recv");
exit(1);
}
uint8_t *pt = buff;
printf("buff[0] = %u\n", *pt);
++pt;
printf("buff[1] = %u\n", *(uint16_t*)pt);
But the second printf prints out an unexpected value. Have I done something incorrectly to extract the two bytes or is something wrong with my print function?
You have 2 issues to take care of once your data has arrived in the buffer.
The first is obeying aliasing rules which is achieved by only casting non-char type pointers to char* because char can alias anything. You should never cast char* to non-char type pointers.
The second is obeying network byte ordering protocol whereby integers transmitted over a network are converted to network order before transfer and converted from network order after receipt. For this we generally use htons, htonl, ntohs and ntohl.
Something like this:
// declare receive buffer to be char, not uint8_t
char buff[8];
// receive chars in buff here ...
// now transfer and convert data
uint8_t a;
uint16_t b;
uint8_t c;
uint32_t d;
a = static_cast<uint8_t>(buff[0]);
// always cast the receiving type* to char*
// never cast char* to receiving type*
std::copy(buff + 1, buff + 3, (char*)&b);
// convert from network byte order to host order
b = ntohs(b); // short version (uint16_t)
c = static_cast<uint8_t>(buff[3]);
std::copy(buff + 4, buff + 8, (char*)&d);
d = ntohl(d); // long version (uint32_t)
Perhaps like this (big-endian)
uint8_t buff [8];
// ...
uint8_t val1 = buff[0];
uint16_t val2 = buff[1] * 256 + buff[2];
uint8_t val3 = buff[3];
// cast before multiplying: buff[4] * 16777216 could overflow a signed int
unsigned long val4 = (unsigned long)buff[4] * 16777216 + buff[5] * 65536 + buff[6] * 256 + buff[7];
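The same idea can be wrapped in a function using shifts instead of multiplications. This is a minimal sketch: the Message struct and decode_message are hypothetical names, and value 4 is assumed to be a 32-bit quantity because the question places it in bytes 5-8.

```cpp
#include <cstdint>

// Hypothetical holder for the four values described in the question.
struct Message {
    std::uint8_t  v1;
    std::uint16_t v2;
    std::uint8_t  v3;
    std::uint32_t v4;  // assumption: the 4-byte "unsigned long" on the wire
};

// Decode an 8-byte big-endian buffer without pointer casts.
inline Message decode_message(const std::uint8_t (&buff)[8]) {
    Message m;
    m.v1 = buff[0];
    m.v2 = static_cast<std::uint16_t>((buff[1] << 8) | buff[2]);
    m.v3 = buff[3];
    m.v4 = (static_cast<std::uint32_t>(buff[4]) << 24)
         | (static_cast<std::uint32_t>(buff[5]) << 16)
         | (static_cast<std::uint32_t>(buff[6]) << 8)
         |  static_cast<std::uint32_t>(buff[7]);
    return m;
}
```

Because every byte is widened before it is shifted, the result does not depend on the host's endianness or on how wide int happens to be.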
I am trying to convert the following struct to a char array so that I can send it via the serial port.
struct foo
{
uint16_t voltage;
char ID ;
char TempByte;
char RTCday[2];
char RTCmonth[2];
char RTCyear[2];
char RTChour[2];
char RTCmin[2];
char Sepbyte;
}dvar = { 500, 'X' , '>' , "18" , "10" , "15" , "20" , "15" , '#'};
I then convert it to a char array using the following:
char b[sizeof(struct foo)];
memcpy(b, &dvar, sizeof(struct foo));
However for some reason I get these trailing values in the char array
0x0A 0xFF
I initially thought it was getting these values because casting it to a char array effectively turned it into a string, so I thought this was the terminating NUL '\0'.
Any help will be appreciated.
Thanks
On many platforms, sizeof(struct foo) is rounded up to a multiple of the structure's alignment. Your structure holds 13 chars plus one 16-bit integer (15 bytes of data), so the compiler may append padding to the end of the structure, and memcpy copies that padding along with the data.
Since you're doing communication over a serial line and know exactly what you're sending, you might as well specify the exact number of bytes you want to send over the serial line: 2 (one 16-bit integer) + 13 (thirteen chars) = 15 bytes.
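One way to send exactly the data bytes is to write the fields into the output buffer one at a time. The following is a sketch only: pack_foo and FOO_WIRE_SIZE are hypothetical names, and sending voltage low byte first is an assumption that has to match whatever the receiver expects.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct foo {
    std::uint16_t voltage;
    char ID;
    char TempByte;
    char RTCday[2];
    char RTCmonth[2];
    char RTCyear[2];
    char RTChour[2];
    char RTCmin[2];
    char Sepbyte;
};

// Exactly the data bytes: 2 for voltage plus 13 chars, no padding.
const std::size_t FOO_WIRE_SIZE = 15;

// Writes the fields one by one and returns the number of bytes written.
// Assumption: voltage is sent low byte first.
inline std::size_t pack_foo(const foo &f, unsigned char *out) {
    std::size_t n = 0;
    out[n++] = static_cast<unsigned char>(f.voltage & 0xFF);
    out[n++] = static_cast<unsigned char>(f.voltage >> 8);
    out[n++] = static_cast<unsigned char>(f.ID);
    out[n++] = static_cast<unsigned char>(f.TempByte);
    std::memcpy(out + n, f.RTCday, 2);   n += 2;
    std::memcpy(out + n, f.RTCmonth, 2); n += 2;
    std::memcpy(out + n, f.RTCyear, 2);  n += 2;
    std::memcpy(out + n, f.RTChour, 2);  n += 2;
    std::memcpy(out + n, f.RTCmin, 2);   n += 2;
    out[n++] = static_cast<unsigned char>(f.Sepbyte);
    return n;
}
```

The output buffer is then declared as unsigned char b[FOO_WIRE_SIZE], so its length no longer depends on how the compiler pads the struct.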
I have a sneaky suspicion you are using an 8bit microcontroller!
You can debug by printing b[sizeof(foo)], and b[sizeof(foo)+1]
These will be your two characters.
Note that you should not be referencing these: they are outside the bounds of your char array, since an n-element array only has indices [0..(n-1)] (copied from your struct).
If you add an unused element to your struct (or increase the size of your final member), the char array can be terminated with '\0'; the compiler probably wants to do this.
Or do a pointer assignment as @Melebius has shown.
I'm trying to write a program which will query a URL using curl and retrieve a string of bytes. The returned data then needs to be interpreted as various data types: an int followed by a sequence of structures.
The curl write callback function must have a prototype of:
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata);
I've seen various examples where the returned data is stored in a buffer either as characters directly in memory or as a string object.
If I have a character array, then I know that I can interpret a portion of it as a structure with code like this:
struct mystruct {
//define struct
};
char *buffer;
//push some data into the buffer
char *read_position;
read_position = buffer + 5;
test = (mystruct *)buffer;
I have two related questions. Firstly, is there a better way of using curl to retrieve binary data and pushing it into structures, rather than reading it directly into memory as characters? Secondly, if reading into memory as a character buffer is the way to go, is my code above a sensible way to interpret the chunks of memory as different data types?
Things you need to consider when interpreting raw structures, especially over network:
The size of your data types;
The endianness of your data types;
Struct padding.
You should only use data types in your structure that are the correct size regardless of what compiler is used. That means for integers, you should use types from <cstdint>.
As for the endianness, you need to know if the data will arrive as big-endian or little-endian. I like to be explicit about it:
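// uint32_t comes from <cstdint>
template< class T >
const char * ReadLittleEndian32( const char *buf, T & val )
{
    static_assert( sizeof(T) == 4, "T must be a 32-bit type" );
    // Go through unsigned char so that bytes above 0x7F are not sign-extended.
    const unsigned char *p = reinterpret_cast<const unsigned char *>(buf);
    val = static_cast<T>( uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24 );
    return buf + sizeof(T);
}
template< class T >
const char * ReadBigEndian32( const char *buf, T & val )
{
    static_assert( sizeof(T) == 4, "T must be a 32-bit type" );
    const unsigned char *p = reinterpret_cast<const unsigned char *>(buf);
    val = static_cast<T>( uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 | uint32_t(p[2]) << 8 | uint32_t(p[3]) );
    return buf + sizeof(T);
}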
//etc...
Finally, dealing with potential padding differences... I've already been naturally tending towards a 'deserialise' approach where each value is read and translated explicitly. The structure is no different:
struct Foo
{
uint16_t a;
int16_t b;
int32_t c;
const char * Read( const char * buf );
};
const char * Foo::Read( const char * buf )
{
buf = ReadLittleEndian16( buf, a );
buf = ReadLittleEndian16( buf, b );
buf = ReadLittleEndian32( buf, c );
return buf;
}
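Foo::Read relies on a ReadLittleEndian16 helper that is not shown above. A possible 16-bit counterpart, written as a sketch in the same shape as the 32-bit readers and not necessarily what the author had in mind, could look like this:

```cpp
#include <cstdint>

// Sketch of a 16-bit little-endian reader; returns the position after the value.
template< class T >
const char * ReadLittleEndian16( const char *buf, T & val )
{
    static_assert( sizeof(T) == 2, "T must be a 16-bit type" );
    // View the bytes as unsigned so that values above 0x7F are not sign-extended.
    const unsigned char *p = reinterpret_cast<const unsigned char *>(buf);
    val = static_cast<T>( std::uint16_t( p[0] | (p[1] << 8) ) );
    return buf + sizeof(T);
}
```

Since each call returns the advanced pointer, calls can be chained exactly as Foo::Read does, one field after another.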
Notice the templating handles sign and other things in the data type, so that all we care about in the end is size. Floating-point types such as float and double cannot be assembled with shifts like this; provided the sender uses the same floating-point format and byte order as your machine, they can be read verbatim:
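const char * ReadDouble( const char * buf, double & val )
{
    // memcpy (from <cstring>) avoids the misaligned, aliasing-violating read of *(double*)buf
    std::memcpy( &val, buf, sizeof(double) );
    return buf + sizeof(double);
}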
I am reading through a buffer (char *) and I have a cursor that tracks my current position in the buffer. Is there a way to copy characters 7-64 out of the buffer, or is my best bet to just loop over the buffer from position x to position y?
The size of the destination buffer is the result of another function dynamically computed
Initializing this returns
variable-sized object 'version' may not be initialized
Relevant code parts:
int32_t size = this->getObjectSizeForMarker(cursor, length, buffer);
cursor = cursor + 8; //advance cursor past marker and size
char version[size] = this->getObjectForSizeAndCursor(size, cursor, buffer);
-
char* FileReader::getObjectForSizeAndCursor(int32_t size, int cursor, char *buffer) {
char destination[size];
memcpy(destination, buffer + cursor, size);
}
-
int32_t FileReader::getObjectSizeForMarker(int cursor, int eof, char * buffer) {
//skip the marker and read next 4 bytes
cursor = cursor + 4; //skip marker and read 4
unsigned char *ptr = (unsigned char *)buffer + cursor;
int32_t objSize = (ptr[0] << 24) | (ptr[1] << 16) | (ptr[2] << 8) | ptr[3];
return objSize;
}
Move the pointer to buffer six units ahead (to get to the seventh character), and then memcpy 64-7 (57) bytes (use 58 if character 64 itself should be included), e.g.:
const char *buffer = "foo bar baz...";
char destination[SOME_MAX_LENGTH];
memcpy(destination, buffer + 6, 64-7);
You may want to terminate the destination array so that you can work with it using standard C string functions. Note that we're adding the null character at index 57, i.e. as the 58th byte, after the 57 bytes that were copied over:
/* terminate the destination string at the 58th byte, if desired */
destination[64-7] = '\0';
If you need to work with a dynamically sized destination, use a pointer instead of an array:
const char *buffer = "foo bar baz...";
char *destination = NULL;
/* note we do not multiply by sizeof(char), which is unnecessary */
/* we should cast the result, if we're in C++ */
destination = (char *) malloc(58);
/* error checking */
if (!destination) {
fprintf(stderr, "ERROR: Could not allocate space for destination\n");
return EXIT_FAILURE;
}
/* copy bytes and terminate */
memcpy(destination, buffer + 6, 57);
*(destination + 57) = '\0';
...
/* don't forget to free malloc'ed variables at the end of your program, to prevent memory leaks */
free(destination);
Honestly, if you're in C++, you should really probably be using the C++ strings library and std::string class. Then you can call the substr substring method on your string instance to get the 57-character substring of interest. It would involve fewer headaches and less re-inventing the wheel.
But the above code should be useful for both C and C++ applications.
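For the C++ route, a sketch of the std::string approach might look like the following. The function name copy_range is hypothetical, and the caller is assumed to have checked that cursor + size stays inside the buffer.

```cpp
#include <cstddef>
#include <string>

// Copy `size` bytes starting at `cursor` out of `buffer` into an owning string.
// std::string may hold embedded '\0' bytes, so this also works for binary data.
inline std::string copy_range(const char *buffer, std::size_t cursor, std::size_t size) {
    return std::string(buffer + cursor, size);
}
```

Returning a std::string by value sidesteps both the variable-sized array error and the need to free the result by hand; for data that is already in a std::string, s.substr(cursor, size) gives the same bytes.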
I'm trying to use istringstream to recreate an encoded wstring from some memory. The memory is laid out as follows:
1 byte to indicate the start of the wstring encoding. Arbitrarily this is '!'.
n bytes to store the character length of the string in text format, e.g. 0x31, 0x32, 0x33 would be "123", i.e. a 123-character string
1 byte separator (the space character)
n bytes which are the wchars which make up the string, where wchar_t's are 2-bytes each.
For example, the byte sequence:
21 36 20 66 00 6f 00 6f 00
is "!6 f.o.o." (using dots to represent char 0)
All I've got is a char* pointer (let's call it pData) to the start of the memory block with this encoded data in it. What's the 'best' way to consume the data to reconstruct the wstring ("foo"), and also move the pointer to the next byte past the end of the encoded data?
I was toying with using an istringstream to allow me to consume the prefix byte, the length of the string, and the separator. After that I can calculate how many bytes to read and use the stream's read() function to insert into a suitably-resized wstring. The problem is, how do I get this memory into the istringstream in the first place? I could try constructing a string first and then pass that into the istringstream, e.g.
std::string s((const char*)pData);
but that doesn't work because the string is truncated at the first null byte. Or, I could use the string's other constructor to explicitly state how many bytes to use:
std::string s((const char*)pData, len);
which works, but only if I know what len is beforehand. That's tricky given that the data is variable length.
This seems like a really solvable problem. Does my rookie status with strings and streams mean I'm overlooking an easy solution? Or am I barking up the wrong tree with the whole string approach?
Try setting your stringstream's rdbuf:
char* buffer = something;
std::stringstream ss;
// write the raw bytes, embedded nulls included, into the stream's buffer
std::stringbuf *pbuf = ss.rdbuf();
pbuf->sputn(buffer, bufferlength);
// use your ss
Edit: I see that this solution will have a similar problem to your string(char*, len) situation. Can you tell us more about your buffer object? If you don't know the length, and it isn't null terminated, it's going to be very hard to deal with.
Is it possible to modify how you encode the length, and make that a fixed size?
unsigned long size = 6; // known string length
char* buffer = new char[1 + sizeof(unsigned long) + 1 + size];
buffer[0] = '!';
memcpy(buffer+1, &size, sizeof(unsigned long));
buffer should hold the start indicator (1 byte), the actual size (size of unsigned long), the delimiter (1 byte) and the text itself (size).
This way, you could get the size "pretty" easy, then set the pointer to point beyond the overhead, and then use the len variable in the string constructor.
unsigned long len;
memcpy(&len, pData+1, sizeof(unsigned long)); // +1 to avoid the start indicator
// len now contains 6
char* actualData = pData + 1 + sizeof(unsigned long) + 1;
std::string s(actualData, len);
It's low level and error prone :) (for instance if you read anything that isn't encoded the way that you expect it to be, the len can get pretty big), but you avoid dynamically reading the length of the string.
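To guard against an oversized len, the length can be checked against the number of bytes actually available before the string is constructed. The sketch below assumes the caller knows that number; read_prefixed and its available parameter are hypothetical and not part of the code above.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Reads "!<unsigned long length><1-byte delimiter><text>" from a buffer holding
// `available` bytes. Returns false instead of reading past the end.
inline bool read_prefixed(const char *pData, std::size_t available, std::string &out) {
    const std::size_t header = 1 + sizeof(unsigned long) + 1;
    if (available < header || pData[0] != '!') return false;
    unsigned long len;
    std::memcpy(&len, pData + 1, sizeof len);    // same host-order encoding as above
    if (len > available - header) return false;  // corrupt or truncated input
    out.assign(pData + header, len);
    return true;
}
```

Note that the length is still stored in host byte order and host width, so this only works when writer and reader run on compatible machines.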
It seems like something on this order should work:
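std::wstring make_string(char const *input) {
    if (*input != '!')
        return std::wstring();
    // Assumes a single-digit length that counts bytes, as in "!6 f.o.o.".
    int length = *++input - '0';
    input += 2; // skip the digit and the space separator
    std::wstring result;
    for (int i = 0; i + 1 < length; i += 2) // two little-endian bytes per character
        result += static_cast<wchar_t>((unsigned char)input[i] | ((unsigned char)input[i + 1] << 8));
    return result;
}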
The difficult part is dealing with the variable length of the size. Without something to specify the length it's hard to guess when to stop treating the data as specifying the length of the string.
As for moving the pointer, if you're going to do it inside a function, you'll need to pass a reference to the pointer, but otherwise it's a simple matter of adding the size you found to the pointer you received.
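A version that takes the pointer by reference and copes with a multi-digit length might look like the sketch below. The name parse_wstring is hypothetical; the length is assumed to count bytes (as in the "!6 f.o.o." example), the characters are assumed to be two-byte little-endian units, and the buffer is trusted to contain as many bytes as the length claims.

```cpp
#include <cstddef>
#include <string>

// Parses "!<decimal byte count> <2-byte little-endian units>" and advances `p`
// past the encoded data. Returns false (leaving `p` untouched) on malformed input.
inline bool parse_wstring(const char *&p, std::wstring &out) {
    const char *q = p;
    if (*q++ != '!') return false;
    if (*q < '0' || *q > '9') return false;       // at least one digit required
    std::size_t bytes = 0;
    while (*q >= '0' && *q <= '9')
        bytes = bytes * 10 + static_cast<std::size_t>(*q++ - '0');
    if (*q++ != ' ') return false;                // separator
    const unsigned char *u = reinterpret_cast<const unsigned char *>(q);
    out.clear();
    for (std::size_t i = 0; i + 1 < bytes; i += 2)
        out += static_cast<wchar_t>(u[i] | (u[i + 1] << 8));
    p = q + bytes;                                // first byte after the payload
    return true;
}
```

Because the characters are assembled byte by byte, this does not depend on sizeof(wchar_t) or on the alignment of the payload inside the buffer.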
It's tempting to (ab)use the (deprecated but nevertheless standard) std::istrstream here:
// Maximum size to read is
// 1 for the exclamation mark
// Digits for the character count (digits10 + 1)
// 1 for the space
const std::streamsize max_size = 3 + std::numeric_limits<std::size_t>::digits10;
std::istrstream s(buf, max_size);
if (std::istream::traits_type::to_char_type(s.get()) != '!'){
throw "missing exclamation";
}
std::size_t size;
s >> size;
if (std::istream::traits_type::to_char_type(s.get()) != ' '){
throw "missing space";
}
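// The payload starts right after the bytes the stream has consumed so far.
// Assumes sizeof(wchar_t) == 2 and suitable alignment, as in the question.
const char * payload = buf + static_cast<std::streamoff>(s.tellg());
std::wstring result(reinterpret_cast<const wchar_t*>(payload), size/sizeof(wchar_t));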