Let's say I have the following:
#pragma pack(push,1)
struct HDR {
unsigned short msgType;
unsigned short msgLen;
};
struct Msg1 {
unsigned short msgType;
unsigned short msgLen;
char text[20];
};
struct Msg2 {
unsigned short msgType;
unsigned short msgLen;
uint32_t c1;
uint32_t c2;
};
.
.
.
I want to be able to reuse the HDR struct so I don't have to keep defining the two members: msgType and msgLen. I don't want to involve vtables for performance reasons but I do want to override operator<< for each of the structs. Based on this last requirement, I don't see how I could possibly use a union since the sizes are also different.
Any ideas on how this can best be handled for pure performance?
Composition seems most appropriate here:
struct Msg1
{
HDR hdr;
char text[20];
};
Whilst you could use C++ inheritance, it doesn't really make sense semantically in this case; a Msg1 is not a HDR.
Alternatively (and possibly preferably), you could define an abstract Msg base type:
struct Msg
{
HDR hdr;
protected:
Msg() {}
};
and have all your concrete message classes derive from that.
What is wrong with normal C++ inheritance?
struct HDR { ... };
struct Msg1: HDR { ... };
Just don't declare any virtual member functions and you'll be all set.
Depending on what you're planning on doing with these structs, you can do this without incurring any vtable overhead like this:
struct HDR {
unsigned short msgType;
unsigned short msgLen;
};
struct Msg1: HDR {
char text[20];
friend ostream& operator<< (ostream& out, const Msg1& msg);
};
struct Msg2: HDR {
uint32_t c1;
uint32_t c2;
friend ostream& operator<< (ostream& out, const Msg2& msg);
};
Since the base class does not have any virtual functions in it, you won't get a vtable for these objects. However, you should be aware that this means that if you have a HDR pointer pointing at an arbitrary subclass, you won't be able to print it out, since it won't be clear which operator<< function to call.
I think, though, that there could be a more fundamental issue here. If you're trying to treat all of these objects uniformly through a base pointer but want to be able to print them all out, then you're going to have to take a memory hit to tag each object. You can either tag them implicitly with a vtable, or explicitly by adding your own type information. There really isn't a good way of getting around this.
If, on the other hand, you just want to simplify your logic by factoring the data members into the base class, then this approach should work.
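The trade-off described above can be sketched as follows. This is only a minimal sketch: the tag values MSG1_TYPE and MSG2_TYPE and the helper print_any are hypothetical names that do not appear in the question. It treats msgType as the explicit type tag, so a message can be printed through an HDR reference without any vtable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <type_traits>

#pragma pack(push, 1)
struct HDR {
    unsigned short msgType;
    unsigned short msgLen;
};
struct Msg1 : HDR {
    char text[20];
};
struct Msg2 : HDR {
    std::uint32_t c1;
    std::uint32_t c2;
};
#pragma pack(pop)

// Hypothetical tag values; a real protocol would define its own.
constexpr unsigned short MSG1_TYPE = 1;
constexpr unsigned short MSG2_TYPE = 2;

// No virtual functions anywhere, so no vtable pointer is stored.
static_assert(!std::is_polymorphic<Msg1>::value, "Msg1 must not have a vtable");
static_assert(!std::is_polymorphic<Msg2>::value, "Msg2 must not have a vtable");

std::ostream& operator<<(std::ostream& out, const HDR& h) {
    return out << "type=" << h.msgType << " len=" << h.msgLen;
}

std::ostream& operator<<(std::ostream& out, const Msg1& m) {
    out << static_cast<const HDR&>(m) << " text=";
    // text need not be NUL-terminated, so stop at the array bound.
    std::size_t n = 0;
    while (n < sizeof m.text && m.text[n] != '\0') ++n;
    return out.write(m.text, static_cast<std::streamsize>(n));
}

std::ostream& operator<<(std::ostream& out, const Msg2& m) {
    return out << static_cast<const HDR&>(m) << " c1=" << m.c1 << " c2=" << m.c2;
}

// Dispatches on the tag stored in the header instead of on a vtable.
// Precondition: h is the HDR base of an object of the type its tag names.
std::ostream& print_any(std::ostream& out, const HDR& h) {
    switch (h.msgType) {
    case MSG1_TYPE: return out << static_cast<const Msg1&>(h);
    case MSG2_TYPE: return out << static_cast<const Msg2&>(h);
    default:        return out << h << " (unknown type)";
    }
}
```

The price of this design is that every new message type needs a new case in print_any, which is the bookkeeping a vtable would otherwise do for you.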
A common pattern for dealing with binary network protocols is to define a struct that contains a union:
struct Message {
Header hdr;
union {
Body1 msg1;
Body2 msg2;
Body3 msg3;
};
};
Semantically you are stating that a Message is composed of a Header and a body that can be one-of Body1, Body2... Now, provide insertion and extraction operators for the header and each body separately. Then implement the same operators for the Message by calling it on the Header, and depending on the message type, the body of the message that makes sense.
Note that the elements of the union do not need to have the same size. The union will be at least as large as its largest member. This approach allows for a compact binary representation that can be read from or written to the network. Your read/write buffer will be the Message: you read just the header first, and then the appropriate body.
// Define operators:
std::ostream& operator<<( std::ostream&, Header const & );
std::ostream& operator<<( std::ostream&, Body1 const & ); // and the rest
// Message operator in terms of the others
std::ostream& operator<<( std::ostream& o, Message const & m )
{
o << m.hdr;
switch ( m.hdr.type ) {
case TYPE1: o << m.msg1; break;
//...
}
return o;
}
// read and dump the contents to stdout
Message message;
read( socket, &message, sizeof message.hdr ); // swap the endianness, check size...
read( socket, &message.msg1, message.hdr.size ); // ...
std::cout << message << std::endl;
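For reference, here is a self-contained sketch of the same idea that parses from a memory buffer instead of a socket. Everything beyond the Message layout is an assumption made for illustration: the Header fields type and size, the tag values TYPE1 and TYPE2, the body layouts, and the helper parse_message. It also assumes that size holds the body length in bytes and that the buffer is already in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct Header {
    std::uint16_t type;
    std::uint16_t size;   // assumed: length of the body in bytes
};
struct Body1 { char text[20]; };
struct Body2 { std::uint32_t c1; std::uint32_t c2; };
struct Message {
    Header hdr;
    union {
        Body1 msg1;
        Body2 msg2;
    };
};
#pragma pack(pop)

// Hypothetical message tags.
constexpr std::uint16_t TYPE1 = 1;
constexpr std::uint16_t TYPE2 = 2;

// The bodies differ in size; the union is as large as the largest one.
static_assert(sizeof(Message) == sizeof(Header) + sizeof(Body1),
              "packed Message should be header plus largest body");

// Copies one message out of buf into out. Returns the number of bytes
// consumed, or 0 if the buffer is too short or the header is inconsistent.
std::size_t parse_message(const unsigned char* buf, std::size_t len, Message& out) {
    if (len < sizeof(Header)) return 0;
    Header hdr;
    std::memcpy(&hdr, buf, sizeof hdr);
    // Byte-order conversion of hdr.type and hdr.size would go here.
    if (len - sizeof hdr < hdr.size) return 0;
    const unsigned char* body = buf + sizeof hdr;
    switch (hdr.type) {
    case TYPE1: {
        if (hdr.size != sizeof(Body1)) return 0;
        Body1 b;
        std::memcpy(&b, body, sizeof b);
        out.msg1 = b;   // assignment makes msg1 the active union member
        break;
    }
    case TYPE2: {
        if (hdr.size != sizeof(Body2)) return 0;
        Body2 b;
        std::memcpy(&b, body, sizeof b);
        out.msg2 = b;   // assignment makes msg2 the active union member
        break;
    }
    default:
        return 0;
    }
    out.hdr = hdr;
    return sizeof hdr + hdr.size;
}
```

Checking the declared size against the expected body size before copying keeps a malformed header from overrunning the union.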
Let's say I have an application that keeps receiving the byte stream from the socket. I have the documentation that describes what the packet looks like. For example, the total header size, and total payload size, with the data type corresponding to different byte offsets. I want to parse it as a struct. The approach I can think of is that I will declare a struct and disable the padding by using some compiler macro, probably something like:
struct Payload
{
char field1;
uint32 field2;
uint32 field3;
char field5;
} __attribute__((packed));
and then I can declare a buffer, memcpy the bytes into it, and reinterpret_cast it to my structure. Another way I can think of is to process the bytes one by one and fill the data into the struct. I think either one should work, but it is kind of old school and probably not safe.
The reinterpret_cast approach mentioned above would be something like:
void receive(const char*data, std::size_t data_size)
{
if(data_size == sizeof(Payload))
{
const Payload* payload = reinterpret_cast<const Payload*>(data);
// ... further processing ...
}
}
I'm wondering are there any better approaches (more modern C++ style? more elegant?) for this kind of use case? I feel like using metaprogramming should help but I don't have an idea how to use it.
Can anyone share some thoughts, or point me to related references, resources, or relevant open source code, so that I can learn how to solve this kind of problem in a more elegant way?
There are many different ways of approaching this. Here's one:
Keeping in mind that reading a struct from a network stream is semantically the same thing as reading a single value, the operation should look the same in either case.
Note that from what you posted, I am inferring that you will not be dealing with types with non-trivial default constructors. If that were the case, I would approach things a bit differently.
In this approach, we:
Define a read_into(src&, dst&) function that takes in a source of raw bytes, as well as an object to populate.
Provide a general implementation for all arithmetic types, converting from network byte order when appropriate.
Overload the function for our struct, calling read_into() on each field in the order expected on the wire.
#include <cstddef>
#include <cstdint>
#include <bit>
#include <concepts>
#include <array>
#include <algorithm>
#include <type_traits>
// Use std::byteswap when available. In the meantime, just lift the implementation from
// https://en.cppreference.com/w/cpp/numeric/byteswap
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
static_assert(std::has_unique_object_representations_v<T>, "T may not have padding bits");
auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
std::ranges::reverse(value_representation);
return std::bit_cast<T>(value_representation);
}
template<typename T>
concept DataSource = requires(T& x, char* dst, std::size_t size ) {
{x.read(dst, size)};
};
// General read implementation for all arithmetic types
template<std::endian network_order = std::endian::big>
void read_into(DataSource auto& src, std::integral auto& dst) {
src.read(reinterpret_cast<char*>(&dst), sizeof(dst));
if constexpr (sizeof(dst) > 1 && std::endian::native != network_order) {
dst = byteswap(dst);
}
}
struct Payload
{
char field1;
std::uint32_t field2;
std::uint32_t field3;
char field5;
};
// Read implementation specific to Payload
void read_into(DataSource auto& src, Payload& dst) {
read_into(src, dst.field1);
read_into<std::endian::little>(src, dst.field2);
read_into(src, dst.field3);
read_into(src, dst.field5);
}
// mind you, nothing stops you from just reading directly into the struct, but beware of endianness issues:
// struct Payload
// {
// char field1;
// std::uint32_t field2;
// std::uint32_t field3;
// char field5;
// } __attribute__((packed));
// void read_into(DataSource auto& src, Payload& dst) {
// src.read(reinterpret_cast<char*>(&dst), sizeof(Payload));
// }
// Example
struct some_data_source {
std::size_t read(char*, std::size_t size);
};
void foo() {
some_data_source data;
Payload p;
read_into(data, p);
}
An alternative API could have been dst.field2 = read<std::uint32_t>(src), which has the drawback of requiring you to be explicit about the type, but is more appropriate if you have to deal with non-trivial constructors.
See it in action on Godbolt: https://gcc.godbolt.org/z/77rvYE1qn
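The value-returning read<T>(src) alternative could look roughly like the sketch below. It is written in C++17 style without concepts, and it makes assumptions of its own: buffer_source and read_payload are hypothetical names, and every multi-byte field is taken to be big-endian on the wire (the code above reads field2 as little-endian instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical in-memory data source with the read(char*, size) shape used above.
struct buffer_source {
    const unsigned char* data;
    std::size_t size;
    std::size_t pos = 0;

    std::size_t read(char* dst, std::size_t n) {
        assert(pos + n <= size && "read past end of buffer");
        std::memcpy(dst, data + pos, n);
        pos += n;
        return n;
    }
};

// Reads sizeof(T) bytes and assembles them as a big-endian integer.
// Building the value with shifts gives the same result on any host byte order.
template <typename T, typename Source>
T read(Source& src) {
    static_assert(std::is_integral<T>::value, "integral types only");
    using U = typename std::make_unsigned<T>::type;
    unsigned char bytes[sizeof(T)];
    src.read(reinterpret_cast<char*>(bytes), sizeof bytes);
    U value = 0;
    for (unsigned char b : bytes) {
        value = static_cast<U>((value << 8) | b);
    }
    return static_cast<T>(value);
}

struct Payload {
    char field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char field5;
};

// Reads the fields in wire order; no packing attribute is needed because
// the struct is never overlaid on the raw bytes.
template <typename Source>
Payload read_payload(Source& src) {
    Payload p;
    p.field1 = read<char>(src);
    p.field2 = read<std::uint32_t>(src);
    p.field3 = read<std::uint32_t>(src);
    p.field5 = read<char>(src);
    return p;
}
```

Because each field is returned by value, this style also works for members that can only be set through a constructor.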
I have some structs as:
struct dHeader
{
uint8_t blockID;
uint32_t blockLen;
uint32_t bodyNum;
};
struct dBody
{
char namestr[10];
uint8_t blk_version;
uint32_t reserved1;
};
and I have a stringstream as:
std::stringstream Buffer(std::iostream::in | std::iostream::out);
I want to write a dHeader and multiple dBody structs into Buffer with
Buffer << Hdr1;
Buffer << Body1;
Buffer << Body1;
I get the error:
error: no match for 'operator<<' in 'Buffer << Hdr1'
If I try it with:
Buffer.write(reinterpret_cast<char*>(&Hdr1), sizeof(Hdr1));
Buffer.write(reinterpret_cast<char*>(&Body1), sizeof(Body1));
Buffer.write(reinterpret_cast<char*>(&Body2), sizeof(Body2));
I get confused about the packing and memory alignment.
What is the best way to write a struct into a stringstream? And read the stringstream into a regular string?
For each of your structures, you need to define something similar to this:
struct dHeader
{
uint8_t blockID;
uint32_t blockLen;
uint32_t bodyNum;
};
std::ostream& operator<<(std::ostream& out, const dHeader& h)
{
// cast blockID: a uint8_t would otherwise be written as a single character
return out << static_cast<unsigned>(h.blockID) << " " << h.blockLen << " " << h.bodyNum;
}
std::istream& operator>>(std::istream& in, dHeader& h) // non-const h
{
dHeader values; // use extra instance, for setting result transactionally
unsigned blockID = 0; // read as a number, not as a single character
bool read_ok = static_cast<bool>(in >> blockID >> values.blockLen >> values.bodyNum);
values.blockID = static_cast<uint8_t>(blockID);
if(read_ok /* todo: add here any validation of data in values */)
h = std::move(values);
/* note: this part is only necessary if you add extra validation above
else
in.setstate(std::ios_base::failbit); */
return in;
}
(similar for the other structures).
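For dBody, a comparable pair of operators might look like the sketch below. It assumes a text format of three space-separated fields and a name that is non-empty and contains no whitespace; those are assumptions made for this example, not requirements stated in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct dBody {
    char namestr[10];
    std::uint8_t blk_version;
    std::uint32_t reserved1;
};

std::ostream& operator<<(std::ostream& out, const dBody& b)
{
    // namestr may use all 10 bytes without a terminator, so bound the length.
    const void* nul = std::memchr(b.namestr, '\0', sizeof b.namestr);
    const std::size_t n = nul
        ? static_cast<std::size_t>(static_cast<const char*>(nul) - b.namestr)
        : sizeof b.namestr;
    out.write(b.namestr, static_cast<std::streamsize>(n));
    // Cast blk_version so it is written as a number, not as a character.
    return out << ' ' << static_cast<unsigned>(b.blk_version) << ' ' << b.reserved1;
}

std::istream& operator>>(std::istream& in, dBody& b)
{
    std::string name;
    unsigned version = 0;
    std::uint32_t reserved = 0;
    if (in >> name >> version >> reserved) {
        if (name.size() > sizeof b.namestr || version > 255) {
            in.setstate(std::ios_base::failbit);   // value does not fit the struct
        } else {
            dBody values{};                        // zero-filled, so short names are NUL-padded
            std::memcpy(values.namestr, name.data(), name.size());
            values.blk_version = static_cast<std::uint8_t>(version);
            values.reserved1 = reserved;
            b = values;                            // only assigned after a full, valid read
        }
    }
    return in;
}
```

Once the structs have been written, Buffer.str() returns the accumulated contents as a regular std::string. When writing several structs in a row, put a space or newline between them so they can be read back separately.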
Edit: A raw binary read/write implementation (one based on istream::read and ostream::write) has the following drawbacks:
It is unformatted. This may not be an issue for a small utility application if you control where it is compiled and run, but if you take the serialized data and compile or run the app on a different architecture, you will have issues with endianness; you will also need to ensure the types you use are not architecture-dependent (i.e. keep using the uintXX_t types).
It is brittle. The implementation depends on the structures containing only POD types. If you add a char* to your structure later, your code will compile just the same, but it will now exhibit undefined behavior.
It is obscure. Clients of your code would expect either to see an interface defined for I/O or to assume that your structures support no serialization. Normally, nobody thinks "maybe I can serialize, but using raw binary I/O", at least not as the client of a custom struct or class implementation.
These issues can be ameliorated by adding I/O stream operators implemented in terms of unformatted reads and writes.
Example code for the operators above:
std::ostream& operator<<(std::ostream& out, const dHeader& h)
{
out.write(reinterpret_cast<const char*>(&h), sizeof(dHeader));
return out;
}
std::istream& operator>>(std::istream& in, dHeader& h) // non-const h
{
dHeader values; // use extra instance, for setting result transactionally
bool read_ok = static_cast<bool>( in.read( reinterpret_cast<char*>(&values), sizeof(dHeader) ) );
if(read_ok /* todo: add here any validation of data in values */)
h = std::move(values);
/* note: this part is only necessary if you add extra validation above
else
in.setstate(std::ios_base::failbit); */
return in;
}
This centralizes the code behind an interface (i.e. if your class no longer supports raw binary writes, you will have to change code in one place only), and makes your intent obvious (implement serialization for your structure). It is still brittle, but less so.
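One way to reduce the brittleness a little further is to route the raw copy through helpers that refuse to compile for unsuitable types. The sketch below uses hypothetical helper names write_raw and read_raw. The std::is_trivially_copyable check rejects structs that contain members such as std::string or std::vector, but it does not catch raw pointers like char*, so it is a partial safeguard only. The bytes written also include any padding and use the host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

struct dHeader {
    std::uint8_t blockID;
    std::uint32_t blockLen;
    std::uint32_t bodyNum;
};

// Writes the object representation of value; compiles only for types
// whose bytes can be copied safely.
template <typename T>
std::ostream& write_raw(std::ostream& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw write needs a trivially copyable type");
    return out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Reads into a temporary first, so value changes only if the read succeeded.
template <typename T>
std::istream& read_raw(std::istream& in, T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw read needs a trivially copyable type");
    T tmp;
    if (in.read(reinterpret_cast<char*>(&tmp), sizeof(T))) {
        value = tmp;
    }
    return in;
}

std::ostream& operator<<(std::ostream& out, const dHeader& h) { return write_raw(out, h); }
std::istream& operator>>(std::istream& in, dHeader& h) { return read_raw(in, h); }
```

Adding a std::string member to dHeader would now produce a compile error at the static_assert instead of silently writing garbage.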
You can provide non-member overloads of operator<< for std::ostream, like
std::ostream& operator<<(std::ostream&, const dHeader&);
std::ostream& operator<<(std::ostream&, const dBody&);
For more information see this stackoverflow question.
I'm trying to keep objects including vectors of objects in a binary file.
Here's a bit of the load from file code:
template <class T> void read(T* obj,std::ifstream * file) {
file->read((char*)(obj),sizeof(*obj));
file->seekg(int(file->tellg())+sizeof(*obj));
}
void read_db(DB* obj,std::ifstream * file) {
read<DB>(obj,file);
for(int index = 0;index < obj->Arrays.size();index++) {
std::cin.get(); //debugging
obj->Arrays[0].Name = "hi"; //debugging
std::cin.get(); //debugging
std::cout << obj->Arrays[0].Name;
read<DB_ARRAY>(&obj->Arrays[index],file);
for(int row_index = 0;row_index < obj->Arrays[index].Rows.size();row_index++) {
read<DB_ROW>(&obj->Arrays[index].Rows[row_index],file);
for(int int_index = 0;int_index < obj->Arrays[index].Rows[row_index].i_Values.size();int_index++) {
read<DB_VALUE<int>>(&obj->Arrays[index].Rows[row_index].i_Values[int_index],file);
}
}
}
}
And here's the DB/DB_ARRAY classes
class DB {
public:
std::string Name;
std::vector<DB_ARRAY> Arrays;
DB_ARRAY * operator[](std::string);
DB_ARRAY * Create(std::string);
};
class DB_ARRAY {
public:
DB* Parent;
std::string Name;
std::vector<DB_ROW> Rows;
DB_ROW * operator[](int);
DB_ROW * Create();
DB_ARRAY(DB*,std::string);
DB_ARRAY();
};
So now the first argument to the read_db function would have correct values, and the vector Arrays on the object has the correct size, However if I index any value of any object from obj->Arrays it's going to throw the access violation exception.
std::cout << obj->Arrays[0].Name; // error
std::cout << &obj->Arrays[0]; // no error
The latter always prints the same address, so when I save an object cast to char*, does it save its address too?
As various commenters pointed out, you cannot simply serialize a (non-POD) object by saving / restoring its memory.
The usual way to implement serialization is to implement a serialization interface on the classes. Something like this:
struct ISerializable {
virtual std::ostream& save(std::ostream& os) const = 0;
virtual std::istream& load(std::istream& is) = 0;
};
You then implement this interface in your serializable classes, recursively calling save and load on any members referencing other serializable classes, and writing out any POD members. E.g.:
class DB_ARRAY : public ISerializable {
public:
DB* Parent;
std::string Name;
std::vector<DB_ROW> Rows;
DB_ROW * operator[](int);
DB_ROW * Create();
DB_ARRAY(DB*,std::string);
DB_ARRAY();
virtual std::ostream& save(std::ostream& os) const
{
// serialize out members
return os;
}
virtual std::istream& load(std::istream& is)
{
// unserialize members
return is;
}
};
As count0 pointed out, boost::serialization is also a great starting point.
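To make the member-by-member idea concrete, here is a minimal sketch with a simplified stand-in for the classes above. The Table struct, the helper names, and the file format (32-bit little-endian length prefixes) are all assumptions chosen for the example. The point is that the characters of a std::string and the elements of a std::vector are written explicitly, rather than the bytes of the container objects themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes a 32-bit value as four bytes, least significant byte first.
void write_u32(std::ostream& os, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        os.put(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
}

// Reads four bytes written by write_u32. Returns 0 and leaves the stream
// in a failed state if the input ends early.
std::uint32_t read_u32(std::istream& is) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        const int c = is.get();
        if (c == std::char_traits<char>::eof()) return 0;
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << (8 * i);
    }
    return v;
}

// A string is stored as its length followed by its characters.
void save_string(std::ostream& os, const std::string& s) {
    write_u32(os, static_cast<std::uint32_t>(s.size()));
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

std::string load_string(std::istream& is) {
    const std::uint32_t n = read_u32(is);   // real code should sanity-check n
    std::string s(n, '\0');
    if (n > 0) is.read(&s[0], static_cast<std::streamsize>(n));
    return s;
}

// Simplified stand-in for DB_ARRAY: a name plus a list of integers.
struct Table {
    std::string name;
    std::vector<std::int32_t> values;

    std::ostream& save(std::ostream& os) const {
        save_string(os, name);
        write_u32(os, static_cast<std::uint32_t>(values.size()));
        for (std::int32_t v : values) write_u32(os, static_cast<std::uint32_t>(v));
        return os;
    }

    std::istream& load(std::istream& is) {
        name = load_string(is);
        const std::uint32_t count = read_u32(is);
        values.clear();
        for (std::uint32_t i = 0; i < count && is; ++i) {
            values.push_back(static_cast<std::int32_t>(read_u32(is)));
        }
        return is;
    }
};
```

A class that holds other serializable objects would call their save and load in the same way, one member after another.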
What is the format of the binary data in the file? Until you specify
that, we can't tell you how to write it. Basically, you have to specify
a format for all of your data types (except char), then write the code
to write out that format, byte by byte (or generate it into a buffer);
and on the other side, to read it in byte by byte, and reconstruct it.
The C++ standard says nothing (or very little) about the size and
representation of the data types, except that sizeof(char) must be
1, and that unsigned char must be a pure binary representation over
all of the bits. And on the machines I have access to today (Sun Sparc and
PCs), only the character types have a common representation. As for
the more complex types, the memory used in the value representation
might not even be contiguous: the bitwise representation of an
std::vector, for example, is usually three pointers, with the actual
values in the vector being found somewhere else entirely.
The functions istream::read and ostream::write are
designed for reading data into a buffer for manual parsing, and writing
a pre-formatted buffer. The fact that you need to use a
reinterpret_cast to use them otherwise should be a good indication
that it won't work.
I have a packet struct which has a variable-length string. For example:
BYTE StringLen;
String MyString; //String is not a real type, just trying to represent a string of unknown size
My question is how I can implement this packet as a struct without knowing the size of its members (in this case, strings). Here is an example of how I want it to "look":
void ProcessPacket (PacketStruct* packet)
{
pointer = &packet.MyString;
}
I think it's not possible, since the compiler doesn't know the size of the string until run time. So how can I make it look high-level and comprehensible?
The reason I need structs is to document every packet without the user actually having to look at any of the functions that analyze the packet.
So I can sum up the question as: is there a way to declare a struct with members of undefined size, or something close to a struct?
I would recommend a shell class that just interprets the packet data.
struct StringPacket {
char *data_;
StringPacket (char *data) : data_(data) {}
unsigned char len () const { return *data_; }
std::string str () const { return std::string(data_+1, len()); }
};
As mentioned in comments, you wanted a way to treat a variable-sized packet like a struct. The old C way to do that was to create a struct that looked like this:
struct StringPacketC {
unsigned char len_;
char str_[1]; /* Modern C allows char str_[]; but C++ doesn't */
};
And then, cast the data (remember, this is C code):
struct StringPacketC *strpack = (struct StringPacketC *)packet;
However, this enters undefined behavior: to access the full range of data in strpack, you have to read beyond the 1-byte array bound declared in the struct. Still, it is a commonly used technique in C.
In C++, however, you don't have to resort to such a hack, because you can define accessor methods that treat the variable-length data appropriately.
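A slightly more defensive version of such an accessor class is sketched below. The name StringPacketView and its members are hypothetical. It assumes the layout from the question (one length byte followed by that many characters) and additionally takes the size of the received buffer, so it can check that the announced length actually fits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Non-owning view over a packet laid out as: 1 length byte, then that many characters.
class StringPacketView {
public:
    StringPacketView(const unsigned char* data, std::size_t size)
        : data_(data), size_(size) {}

    // True if the buffer holds the length byte and all the characters it announces.
    bool valid() const { return size_ >= 1 && size_ >= 1u + data_[0]; }

    std::size_t len() const {
        assert(valid());
        return data_[0];
    }

    std::string str() const {
        assert(valid());
        return std::string(reinterpret_cast<const char*>(data_ + 1), len());
    }

    // Total bytes this packet occupies, useful for locating the next packet.
    std::size_t wire_size() const { return 1 + len(); }

private:
    const unsigned char* data_;
    std::size_t size_;
};
```

The class documents the packet layout in one place, which is what the struct was meant to do, while keeping the variable-length part behind accessors.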
You can copy the string into a high-level std::string (at least, if my guess that String is a typedef for const char* is correct):
void ProcessPacket( const PacketStruct& packet )
{
std::string highLevelString( packet.MyString,
static_cast< size_t >( packet.StringLen ) );
...
}
A simple variant according to your posting would be:
struct PacketStruct {
std::string MyString;
size_t length () const { return MyString.length(); }
const char* operator & () const { return MyString.c_str(); }
};
This can be used (almost) as you desired above:
void ProcessPacket (const PacketStruct& packet)
{
const char * pointer = &packet;
size_t length = packet.length();
std::cout << pointer << '\t' << length << std::endl;
}
and should be invoked like:
int main()
{
PacketStruct p;
p.MyString ="Hello";
ProcessPacket(p);
}
I have a class with a member of type uint8, and when I try to output it to an ostream it is displayed as its char representation. I would prefer its int representation, so I need to static_cast<int>(myStruct.member) each time, which is a bit cumbersome and potentially error-prone. Any ideas?
Implement operator<< on your class and define the cast there. Seems to me like you are violating encapsulation.
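A minimal sketch of that suggestion follows. It assumes uint8 is equivalent to std::uint8_t, and the class name X and member name a_ are placeholders. The cast lives in one place, inside the class's own operator<<, so callers never repeat it.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>

class X {
public:
    explicit X(std::uint8_t a) : a_(a) {}

    friend std::ostream& operator<<(std::ostream& out, const X& x) {
        // Widen to int so the stream prints a number instead of a character.
        return out << static_cast<int>(x.a_);
    }

private:
    std::uint8_t a_;
};
```

With this in place, streaming an X holding the value 65 prints 65 instead of the character A.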
class X {
uint8 a;
public:
int get_int () const { return static_cast<int>(a); }
};
Use a wrapper method which encapsulates the cast.
Usage:
cout << obj.get_int();