Python's struct.pack/unpack equivalence in C++ - c++

I used struct.pack in Python to transform a data into serialized byte stream.
>>> import struct
>>> struct.pack('i', 1234)
'\xd2\x04\x00\x00'
What is the equivalence in C++?

You'll probably be better off in the long run using a third party library (e.g. Google Protocol Buffers), but if you insist on rolling your own, the C++ version of your example might be something like this:
#include <stdint.h>
#include <string.h>
int32_t myValueToPack = 1234; // or whatever
uint8_t myByteArray[sizeof(myValueToPack)];
int32_t bigEndianValue = htonl(myValueToPack); // convert the value to big-endian for cross-platform compatibility
memcpy(&myByteArray[0], &bigEndianValue, sizeof(bigEndianValue));
// At this point, myByteArray contains the "packed" data in network-endian (aka big-endian) format
The corresponding 'unpack' code would look like this:
// Assume at this point we have the packed array myByteArray, from before
int32_t bigEndianValue;
memcpy(&bigEndianValue, &myByteArray[0], sizeof(bigEndianValue));
int32_t theUnpackedValue = ntohl(bigEndianValue);
In real life you'd probably be packing more than one value, which is easy enough to do (by making the array size larger and calling htonl() and memcpy() in a loop -- don't forget to increase memcpy()'s first argument as you go, so that your second value doesn't overwrite the first value's location in the array, and so on).
You'd also probably want to pack (aka serialize) different data types as well. uint8_t's (aka chars) and booleans are simple enough as no endian-handling is necesary for them -- you can just copy each of them into the array verbatim as a single byte. uint16_t's you can convert to big-endian via htons(), and convert back to native-endian via ntohs(). Floating point values are a bit tricky, since there is no built-in htonf(), but you can roll your own that will work on IEEE754-compliant machines:
uint32_t htonf(float f)
{
uint32_t x;
memcpy(&x, &f, sizeof(float));
return htonl(x);
}
.... and the corresponding ntohf() to unpack them:
float ntohf(uint32_t nf)
{
float x;
nf = ntohl(nf);
memcpy(&x, &nf, sizeof(float));
return x;
}
Lastly for strings you can just add the bytes of the string to the buffer (including the NUL terminator) via memcpy:
const char * s = "hello";
int slen = strlen(s);
memcpy(myByteArray, s, slen+1); // +1 for the NUL byte

There isn't one. C++ doesn't have built-in serialization.
You would have to write individual objects to a byte array/vector, and being careful about endianness (if you want your code to be portable).

https://github.com/karkason/cppystruct
#include "cppystruct.h"
// icmp_header can be any type that supports std::size and std::data and holds bytes
auto [type, code, checksum, p_id, sequence] = pystruct::unpack(PY_STRING("bbHHh"), icmp_header);
int leet = 1337;
auto runtimePacked = pystruct::pack(PY_STRING(">2i10s"), leet, 20, "String!");
// runtimePacked is an std::array filled with "\x00\x00\x059\x00\x00\x00\x10String!\x00\x00\x00"
// The format is "compiled" and has zero overhead in runtime
constexpr auto packed = pystruct::pack(PY_STRING("<2i10s"), 10, 20, "String!");
// packed is an std::array filled with "\x00\x01\x00\x00\x10\x00\x00\x00String!\x00\x00\x00"

You could check out Boost.Serialization, but I doubt you can get it to use the same format as Python's pack.

I was also looking for the same thing. Luckily I found https://github.com/mpapierski/struct
with a few additions you can add missing types into struct.hpp, I think it's the best so far.
To use it, just define you params like this
DEFINE_STRUCT(test,
((2, TYPE_UNSIGNED_INT))
((20, TYPE_CHAR))
((20, TYPE_CHAR))
)
The just call this function which will be generated at compilation
pack(unsigned int p1, unsigned int p2, const char * p3, const char * p4)
The number and type of parameters will depend on what you defined above.
The return type is a char* which contains your packed data.
There is also another unpack() function which you can use to read the buffer

You can use union to get different view into the same memory.
For example:
union Pack{
int i;
char c[sizeof(int)];
};
Pack p = {};
p.i = 1234;
std::string packed(p.c, sizeof(int)); // "\xd2\x04\x00\0"
As mentioned in the other answers, you have to notice the endianness.

Related

Convert integer to char array and then convert it back

I am a little confused on how casts work in C++.
I have a 4 bytes integer which I need to convert to a char[32] and then convert it back in some other function.
I am doing the following :
uint32_t v = 100;
char ch[32]; // This is 32 bytes reserved memory
memcpy(ch,&v,4);
uint32_t w = *(reinterpret_cast<int*>(ch)); // w should be equal to v
I am getting the correct results on my compiler, but I want to make sure if this is a correct way to do it.
Technically, no. You are at risk of falling foul of your CPU's alignment rules, if it has any.
You may alias an object byte-by-byte using char*, but you can't take an actual char array (no matter where its values came from) and pretend it's some other object.
You will see that reinterpret_cast<int*> method a lot, and on many systems it will appear to work. However, the "proper" method (if you really need to do this at all) is:
const auto INT_SIZE = sizeof(int);
char ch[INT_SIZE] = {};
// Convert to char array
const int x = 100;
std::copy(
reinterpret_cast<const char*>(&x),
reinterpret_cast<const char*>(&x) + INT_SIZE,
&ch[0]
);
// Convert back again
int y = 0;
std::copy(
&ch[0],
&ch[0] + INT_SIZE,
reinterpret_cast<char*>(&y)
);
(live demo)
Notice that I only ever pretend an int is a bunch of chars, never the other way around.
Notice also that I have also swapped your memcpy for type-safe std::copy (although since we're nuking the types anyway, that's sort of by-the-by).

how to parse unsigned char array to numerical data

The setup of my question is as follows:
I have a source sending a UDP Packet to my receiving computer
Receiving computer takes the UDP packet and receives it into unsigned char *message.
I can print the packet out byte-wise using
for(int i = 0; i < sizeof(message); i++) {
printf("0x%02 \n", message[i];
}
And this is where I am! Now I'd like to start parsing these bytes I recieved into the network as shorts, ints, longs, and strings.
I've written a series of functions like:
short unsignedShortToInt(char[] c) {
short i = 0;
i |= c[1] & 0xff;
i <<= 8;
i |= c[0] & 0xff;
return i;
}
to parse the bytes and shift them into ints, longs, and shorts. I can use sprintf() to create strings from byte arrays.
My question is -- what's the best way to get the substrings from my massive UDP packet? The packet is over 100 character in lengths, so I'd like an easy way to pass in message[0:6] or message[20:22] to these variation utility functions.
Possible options:
I can use strcpy() to create a temporary array for each function call, but that seems a bit messy.
I can turn the entire packet into a string and use std::string::substr. This seems nice, but I'm concerned that converting the unsigned chars into signed chars (part of the string conversion process) might cause some errors (maybe this concern is unwarranted?).
Maybe another way?
So I ask you, stackoverflow, to recommend a clean, concise way to do this task!
thanks!
Why not use proper serialization ?
i.e. MsgPack
You'll need a scheme how to differentiate messages. You could for example make them self-describing, something like:
struct my_message {
string protocol;
string data;
};
and dispatch decoding based on the protocol.
You'll most probably be better off using a tested serialization library than finding out that your system is vulnerable to buffer overflow attacks and malfunction.
I think you have two problems to solve here. First you need to make sure the integer data are properly aligned in memory after extracting them from the character buffer. next you need to ensure the correct byte order of the integer data after their extraction.
The alignment problem can be solved with a union containing the integral data type super-imposed upon a character array of the correct size. The network byte order problem can be solved using the standard ntohs() and ntohl() functions. This will only work if the sending software also used the standard byte-order produced by the inverse of these functions.
See: http://www.beej.us/guide/bgnet/output/html/multipage/htonsman.html
Here are a couple of UNTESTED functions you may find useful. I think they should just about do what you are after.
#include <netinet/in.h>
/**
* General routing to extract aligned integral types
* from the UDP packet.
*
* #param data Pointer into the UDP packet data
* #param type Integral type to extract
*
* #return data pointer advanced to next position after extracted integral.
*/
template<typename Type>
unsigned char const* extract(unsigned char const* data, Type& type)
{
// This union will ensure the integral data type is correctly aligned
union tx_t
{
unsigned char cdata[sizeof(Type)];
Type tdata;
} tx;
for(size_t i(0); i < sizeof(Type); ++i)
tx.cdata[i] = data[i];
type = tx.tdata;
return data + sizeof(Type);
}
/**
* If strings are null terminated in the buffer then this could be used to extract them.
*
* #param data Pointer into the UDP packet data
* #param s std::string type to extract
*
* #return data pointer advanced to next position after extracted std::string.
*/
unsigned char const* extract(unsigned char const* data, std::string& s)
{
s.assign((char const*)data, std::strlen((char const*)data));
return data + s.size();
}
/**
* Function to parse entire UDP packet
*
* #param data The entire UDP packet data
*/
void read_data(unsigned char const* const data)
{
uint16_t i1;
std::string s1;
uint32_t i2;
std::string s2;
unsigned char const* p = data;
p = extract(p, i1); // p contains next position to read
i1 = ntohs(i1);
p = extract(p, s1);
p = extract(p, i2);
i2 = ntohl(i2);
p = extract(p, s2);
}
Hope that helps.
EDIT:
I have edited the example to include strings. It very much depends on how the strings are stored in the stream. This example assumes the strings are null-terminated c-strings.
EDIT2:
Whoopse, changed code to accept unsigned chars as per question.
If the array is only 100 characters in length just create a char buffer[100] and a queue of them so you don't miss processing any of the messages.
Next you could just index that buffer as you described and if you know the struct of the message then you know the index points.
next you can union the types i.e
union myType{
char buf[4];
int x;
}
giving you the value as an int from a char if thats what you need

Interpret strings as packed binary data in C++

I have question about interpreting strings as packed binary data in C++. In python, I can use struct module. Is there a module or a way in C++ to interpret strings as packed binary data without embedding Python?
As already mentioned, it is better to consider this an array of bytes (chars, or unsigned chars), possibly held in a std::vector, rather than a string. A string is null terminated, so what happens if a byte of the binary data had the value zero?
You can either cast a pointer within the array to a pointer to your struct, or copy the data over a struct:
#include <memory>
#pragma pack ( push )
#pragma pack( 1 );
struct myData
{
int data1;
int data2;
// and whatever
};
#pragma pack ( pop )
char* dataStream = GetTheStreamSomehow();
//cast the whole array
myData* ptr = reinterpret_cast<myData*>( dataStream );
//cast from a known position within the array
myData* ptr2 = reinterpret_cast<myData*>( &(dataStream[index]) );
//copy the array into a struct
myData data;
memcpy( &data, dataStream, sizeof(myData) );
If you were to have the data stream in a vector, the [] operator would still work. The pragma pack declarations ensure the struct is single byte aligned - researching this is left as an exercise for the reader. :-)
Basically, you don't need to interpret anything. In C++, strings are
packed binary data; you can interpret them as text, but you're not
required to. Just be aware that the underlying type of a string, in
C++, is char, which can be either signed (range [-128,127] on all
machines I've heard of) or unsigned (usually [0,255], but I'm aware of
machines where it is [0,511]).
To pass the raw data in a string to a C program, use
std::string::data() and std::string::size(). Otherwise, you can
access it using iterators or indexation much as you would with
std::vector<char> (which may express the intent better).
A string in C++ has a method called c_str ( http://www.cplusplus.com/reference/string/string/c_str/ ).
c_str returns the relevant binary data in a string in form of an array of characters. You can cast these chars to anything you wish and read them as an array of numbers.
Eventhough it might be closer to pickling in python, boost serialization may be closest to what you want to achieve.
Otherwise you might want to do it by hand. It is not that hard to make reader/writer classes to convert primitives/classes to packed binary format. I would do it by shifting bytes to avoid host endianess issues.

Reinterpret float vector as unsigned char array and back

I've searched and searched stackoverflow for the answer, but have not found what I needed.
I have a routine that takes an unsigned char array as a parameter in order to encode it as Base64. I would like to encode an STL float vector (vector) in Base64, and therefore would need to reinterpret the bytes in the float vector as an array of unsigned characters in order to pass it to the encode routine. I have tried a number of things from reinterpret and static casts, to mem copies, etc, but none of them seem to work (at least not the way I implemented them).
Likewise, I'll need to do the exact opposite when decoding the encoded data back to a float array. The decode routine will provide the decoded data as an unsigned char array, and I will need to reinterpret that array of bytes, converting it to a float vector again.
Here is a stripped down version of my C++ code to do the encoding:
std::string
EncodeBase64FloatVector( const vector<float>& p_vector )
{
unsigned char* sourceArray;
// SOMEHOW FILL THE sourceArray WITH THE FLOAT VECTOR DATA BITS!!
char* target;
size_t targetSize = p_vector.size() * sizeof(float);
target = new char[ targetSize ];
int result = EncodeBase64( sourceArray, floatArraySizeInUChars, target, targetSize );
string returnResult;
if( result != -1 )
{
returnResult = target;
}
delete target;
delete sourceArray;
return returnResult;
}
Any help would be greatly appreciated. Thanks.
Raymond.
std::vector guarantees the data will be contiguous, and you can get a pointer to the first element in the vector by taking the address of the first element (assuming it's not empty).
typedef unsigned char byte;
std::vector<float> original_data;
...
if (!original_data.empty()) {
const float *p_floats = &(original_data[0]); // parens for clarity
Now, to treat that as an array of unsigned char, you use a reinterpret_cast:
const byte *p_bytes = reinterpret_cast<const byte *>(p_floats);
// pass p_bytes to your base-64 encoder
}
You might want to encode the length of the vector before the rest of the data, in order to make it easier to decode them.
CAUTION: You still have to worry about endianness and representation details. This will only work if you read back on the same platform (or a compatible one) that you wrote with.
sourceArray = reinterpret_cast<const unsigned char *>(&(p_vector[0]))
I would highly recommend checking out Google's protobuf to solve your problem. Floats and doubles can vary in size and layout between platforms and that package has solved all those problems for you. Additionally, it can easily handle your data structure should it ever become more complicated than a simple array of floats.
If you do use that, you will have to do your own base64 encoding still as protobuf encodes data assuming you have an 8-bit clean channel to work with. But that's fairly trivial.

How to read in specific sizes and store data of an unknown type in c++?

I'm trying to read data in from a binary file and then store in a data structure for later use. The issue is I don't want to have to identify exactly what type it is when I'm just reading it in and storing it. I just want to store the information regarding what type of data it is and how much data of this certain type there is (information easily obtained in the first couple bytes of this data)
But how can I read in just a certain amount of data, disregarding what type it is and still easily be able to cast (or something similar) that data into a readable form later?
My first idea would be to use characters, since all the data I will be looking at will be in byte units.
But if I did something like this:
ifstream fileStream;
fileStream.open("fileName.tiff", ios::binary);
//if I had to read in 4 bytes of data
char memory[4];
fileStream.read((char *)&memory, 4);
But how could I cast these 4 bytes if I later I wanted to read this and knew it was a double?
What's the best way to read in data of an unknown type but know size for later use?
fireStream.
I think a reinterpret_cast will give you what you need. If you have a char * to the bytes you can do the following:
double * x = reinterpret_cast<double *>(dataPtr);
Check out Type Casting on cplusplus.com for a more detailed description of reinterpret_cast.
You could copy it to the known data structure which makes life easier later on:
double x;
memcpy (&x,memory,sizeof(double));
or you could just refer to it as a cast value:
if (*((double*)(memory)) == 4.0) {
// blah blah blah
}
I believe a char* is the best way to read it in, since the size of a char is guaranteed to be 1 unit (not necessarily a byte, but all other data types are defined in terms of that unit, so that, if sizeof(double) == 27, you know that it will fit into a char[27]). So, if you have a known size, that's the easiest way to do it.
You could store the data in a class that provides functions to cast it to the possible result types, like this:
enum data_type {
TYPE_DOUBLE,
TYPE_INT
};
class data {
public:
data_type type;
size_t len;
char *buffer;
data(data_type a_type, char *a_buffer, size_t a_len)
: type(a_type), buffer(NULL), len(a_len) {
buffer = new char[a_len];
memcpy(buffer, a_buffer, a_len);
}
~data() {
delete[] buffer;
}
double as_double() {
assert(TYPE_DOUBLE == type);
assert(len >= sizeof(double));
return *reinterpret_cast<double*>(buffer);
}
int as_int() {...}
};
Later you would do something like this:
data d = ...;
switch (d.type) {
case TYPE_DOUBLE:
something(d.as_double());
break;
case TYPE_INT:
something_else(d.as_int());
break;
...
}
That's at least how I'm doing these kind of things :)
You can use structures and anonymous unions:
struct Variant
{
size_t size;
enum
{
TYPE_DOUBLE,
TYPE_INT,
} type;
union
{
char raw[0]; // Copy to here. *
double asDouble;
int asInt;
};
};
Optional: Create a table of type => size, so you can find the size given the type at runtime. This is only needed when reading.
static unsigned char typeSizes[2] =
{
sizeof(double),
sizeof(int),
};
Usage:
Variant v;
v.type = Variant::TYPE_DOUBLE;
v.size = Variant::typeSizes[v.type];
fileStream.read(v.raw, v.size);
printf("%f\n", v.asDouble);
You will probably receive warnings about type punning. Read: Doing this is not portable and against the standard! Then again, so is reinterpret_cast, C-style casting, etc.
Note: First edit, I did not read your original question. I only had the union, not the size or type part.
*This is a neat trick I learned a long time ago. Basically, raw doesn't take up any bytes (thus doesn't increase the size of the union), but provides a pointer to a position in the union (in this case, the beginning). It's very useful when describing file structures:
struct Bitmap
{
// Header stuff.
uint32_t dataSize;
RGBPixel data[0];
};
Then you can just fread the data into a Bitmap. =]
Be careful. In most environments I'm aware of, doubles are 8 bytes, not 4; reinterpret_casting memory to a double will result in junk, based on what the four bytes following memory contain. If you want a 32-bit floating point value, you probably want a float (though I should note that the C++ standard does not require that float and double be represented in any way and in particular need not be IEEE-754 compliant).
Also, your code will not be portable unless you take endianness into account in your code. I see that the TIFF format has an endianness marker in its first two bytes that should tell you whether you're reading in big-endian or little-endian values.
So I would write a function with the following prototype:
template<typename VALUE_TYPE> VALUE_TYPE convert(char* input);
If you want full portability, specialize the template and have it actually interpret the bits in input. Otherwise, you can probably get away with e.g.
template<VALUE_TYPE> VALUE_TYPE convert(char* input) {
return reinterpret_cast<double>(input);
}