Reading struct/union members from a character buffer - c++

I need to process data that is given to me as a char buffer where the actual structure of the data depends on the values of some of its fields.
More specifically, consider the following header file:
struct IncomingMsgStruct
{
MsgHdrStruct msgHdr;
char msgData[MSG_DATA_MAX_SIZE]; // Can hold any of several structures
};
struct RelevantMessageData
{
DateTimeStruct dateTime;
CommonDataStruct commonData;
MsgBodyUnion msgBody;
};
struct DateTimeStruct { /* ... */ };
struct CommonDataStruct
{
char name[NAME_MAX_SIZE + 1];
MsgTypeEnum msgType;
// more elements here
};
union MsgBodyUnion
{
MsgBodyType1Struct msgBodyType1;
MsgBodyType2Struct msgBodyType2;
// ...
MsgBodyTypeNStruct msgBodyTypeN;
};
struct MsgBodyType1Struct { /* ... */ };
struct MsgBodyType2Struct { /* ... */ };
// ...
struct MsgBodyTypeNStruct { /* ... */ };
The structures contain data members (some of which are also structures) and member functions for initialization, conversion to string, etc. There are no constructors, destructors, virtual functions, or inheritance.
Please note that this is in the context of legacy code that I have no control over. The header and the definitions in it are used by other components, and some of them can change with time.
The data is made available to me as a buffer of characters, so my processing function will look like:
ResultType processRelevantMessage(char const* inBuffer);
It is guaranteed that inBuffer contains a MsgStruct structure, and that its msgData member holds a RelevantMessageData structure. Correct alignment and endianness are also guaranteed as the data originated from the corresponding structures on the same platform.
For simplicity, let's assume that I am only interested in the case where msgType equals a specific value, so only the members of, say, MsgBodyType2Struct will need to be accessed (and an error returned otherwise). I can generalize it to handle several types later.
My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules.
My question is:
How can I do it in standard-compliant C++ without invoking undefined behaviour, without changing or duplicating the definitions, and without extra copying or allocations?
Or, if that is not possible, how can I do it in GCC (possibly using flags such as -fno-strict-aliasing etc.)?
EDIT:
Since the data comes from the same platform, there should be no endianness concerns.
As mentioned above, I prefer to avoid copying.
Upon further reading, it seems to me that placement-new should be safe. So is the following implementation compliant?
ResultType processRelevantMessageType2(char const* in)
{
MsgStruct const* pMsgStruct = new (in) MsgStruct;
RelevantMessageData const* pRelevantMessageData = new (pMsgStruct->msgData) RelevantMessageData;
// Assume we're only interested in the MsgBodyType2Struct case
if (pRelevantMessageData->commonData.msgType == MSG_TYPE_2) {
MsgBodyType2Struct const& msgBodyType2Struct = pRelevantMessageData->msgBody.msgBodyType2;
// Can access the fields of msgBodyType2Struct here?
// ...
}
// ...
}

My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules.
Indeed. Also, consider that an array of bytes might start at an arbitrary address in memory, whereas a struct typically has some alignment restrictions that need to be satisfied. The safest way to deal with this is to create a new object of the desired type, and use std::memcpy() to copy the bytes from the buffer into the object:
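#include <cstddef> // offsetof
#include <cstring> // std::memcpy
ResultType processRelevantMessage(char const* inBuffer) {
MsgHdrStruct hdr;
std::memcpy(&hdr, inBuffer, sizeof hdr);
// ... inspect hdr ...
RelevantMessageData data;
// msgData may be preceded by padding, so locate it with offsetof rather than sizeof hdr
std::memcpy(&data, inBuffer + offsetof(IncomingMsgStruct, msgData), sizeof data);
// ... use data ...
}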
The above is well-defined C++ code; you can use hdr and data afterwards without problems, as long as those are trivially copyable (POD) types. Pointer members are copied verbatim, so they are only meaningful if the data was produced by the same process.
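The same idiom can be wrapped in a small generic helper so each field or sub-struct is read with one call. This is a sketch; `read_as` is a hypothetical name, and it assumes the target type is trivially copyable and default constructible.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy sizeof(T) bytes starting at `src` into a fresh T.
// std::memcpy makes this well defined even if `src` is not suitably aligned for T.
template <class T>
T read_as(const char* src)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "read_as requires a trivially copyable type");
    T value; // requires T to be default constructible
    std::memcpy(&value, src, sizeof value);
    return value;
}
```

Usage would look like `MsgHdrStruct hdr = read_as<MsgHdrStruct>(inBuffer);`. Modern compilers typically optimize such a fixed-size memcpy into plain loads, so the copy is often cheaper than it looks.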

I suggest using a serialization library or writing operator<< and operator>> overloads for those structs. You could use the functions htonl and ntohl, which are available on some platforms, or write a support class to stream numeric values yourself.
Such a class could look like this:
#include <bit>
#include <algorithm>
#include <cstring>
#include <iostream>
#include <iterator>
#include <limits>
#include <type_traits>
template<class T>
struct tfnet { // to/from net (or file)
static_assert(std::endian::native == std::endian::little ||
std::endian::native == std::endian::big); // endianness must be known
static_assert(std::numeric_limits<double>::is_iec559); // only support IEEE754
static_assert(std::is_arithmetic_v<T>); // only for arithmetic types
tfnet(T& v) : val(&v) {} // store a pointer to the value to be streamed
// write a value to a stream
friend std::ostream& operator<<(std::ostream& os, const tfnet& n) {
if constexpr(std::endian::native == std::endian::little) {
// reverse byte order to be in network byte order
char buf[sizeof(T)];
std::memcpy(buf, n.val, sizeof buf);
std::reverse(std::begin(buf), std::end(buf));
os.write(buf, sizeof buf);
} else {
// already in network byte order
os.write(reinterpret_cast<const char*>(n.val), sizeof(T));
}
return os;
}
// read a value from a stream
friend std::istream& operator>>(std::istream& is, const tfnet& n) {
char buf[sizeof(T)];
if(is.read(buf, sizeof buf)) {
if constexpr(std::endian::native == std::endian::little) {
// convert from network byte order (big endian) to native byte order
std::reverse(std::begin(buf), std::end(buf));
}
std::memcpy(n.val, buf, sizeof buf);
}
return is;
}
T* val;
};
Now, if you have a set of structs:
#include <cstdint>
struct data {
std::uint16_t x = 10;
std::uint32_t y = 20;
std::uint64_t z = 30;
};
struct compound {
data x;
int y = 40;
};
You can add the streaming operators for them:
std::ostream& operator<<(std::ostream& os, const data& d) {
return os << tfnet{d.x} << tfnet{d.y} << tfnet{d.z};
}
std::istream& operator>>(std::istream& is, data& d) {
return is >> tfnet{d.x} >> tfnet{d.y} >> tfnet{d.z};
}
std::ostream& operator<<(std::ostream& os, const compound& d) {
return os << d.x << tfnet{d.y}; // using data's operator<< for d.x
}
std::istream& operator>>(std::istream& is, compound& d) {
return is >> d.x >> tfnet{d.y}; // using data's operator>> for d.x
}
And reading/writing the structs:
#include <sstream>
int main() {
std::stringstream ss;
compound x;
compound y{{0,0,0},0};
ss << x; // write to stream
ss >> y; // read from stream
}
If you can't use the streaming operators directly on the source streams, you can put the char buffer you do get in an istringstream and extract the data from that using the added operators.

Related

Dynamically allocate memory to arrays in a union

I'm using union to fill some message fields in a char type message buffer. If the length of the message is constant, it works correctly. See the simplified code sample below.
The problem is that my message can have variable length. Specifically, the const N will be decided at runtime. Is there a way to keep using unions by dynamically allocating memory for buf?
I'm exploring smart pointers but haven't had any luck so far.
#include <array>
#include <cstdint>
const int N = 4;
struct repeating_group_t {
uint8_t field1;
uint8_t field2;
}rpt_group;
struct message_t
{
union
{
char buf[2 + 2*N];
struct {
uint8_t header;
uint8_t block_len;
std::array<repeating_group_t, N> group;
};
};
};
int main()
{
message_t msg;
msg.header = 0x32;
msg.block_len = 8;
for (auto i = 0; i < N; i++)
{
msg.group[i].field1 = i;
msg.group[i].field2 = 10*i;
}
// msg.buf is correctly filled
return 0;
}
As said in the comments, use std::vector.
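#include <cstddef>
#include <cstdint>
#include <vector>
int main() {
int N = 4; // may now be a value known only at runtime
// before C++17, use std::vector<char> instead of std::byte
std::vector<std::byte> v;
v.push_back(std::byte{0x32});
v.push_back(std::byte{8});
for (int i = 0; i < N; i++) {
v.push_back(static_cast<std::byte>(i));
const std::uint16_t a = 10 * i;
// store uint16_t in big endian
v.push_back(static_cast<std::byte>(a >> 8));
v.push_back(static_cast<std::byte>(a & 0xff));
}
}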
For custom data types, you could provide your own stream-like or container-like class and overload operator<< (or another function of your choice) for your data types.
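#include <cstddef>
#include <cstdint>
#include <vector>
struct Repeating_group {
std::uint8_t field1;
std::uint8_t field2;
};
struct Message {
std::vector<std::byte> v;
Message& push8(std::uint8_t t) {
v.push_back(static_cast<std::byte>(t));
return *this;
}
// push 16 bits little endian
Message& push16le(std::uint16_t t) {
return push8(t & 0xff).push8(t >> 8);
}
// push 16 bits big endian
Message& push16be(std::uint16_t t) {
return push8(t >> 8).push8(t & 0xff);
}
// etc.
Message& push(const Repeating_group& t) {
return push8(t.field1).push8(t.field2);
}
};
int main() {
int N = 4; // may be known only at runtime
Message m;
m.push8(0x32).push8(8);
for (int i = 0; i < N; i++) {
m.push(Repeating_group{static_cast<std::uint8_t>(i), static_cast<std::uint8_t>(i * 10)});
}
}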
You can't have N evaluated at runtime, because both a C array (your buf) and std::array carry their size as part of their type.
Also, using a union for (de)serialization is not good practice: the size of your structure will depend on the alignment required on the machine it is compiled for, and so on. You could add the packed attribute to work around that, but you would still have plenty of platform-dependency problems.
Regarding variable length: you'd need to write a custom (de)serializer that stores and reads that size information, so it can recreate the container on the other end.
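A minimal sketch of such a (de)serializer, assuming a hypothetical wire format of `[header][group count][field1 field2]...`. Note that this sends the number of groups rather than the question's block_len in bytes; the functions `serialize` and `deserialize` are illustrative names, not an existing API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Same fields as the question's repeating_group_t
struct repeating_group_t {
    std::uint8_t field1;
    std::uint8_t field2;
};

// Writes the header, then the group count, then each group's fields.
// The count travels with the data, so the reader can rebuild a container of any size.
std::vector<std::uint8_t> serialize(std::uint8_t header,
                                    const std::vector<repeating_group_t>& groups)
{
    if (groups.size() > 255)
        throw std::length_error("too many groups for a one-byte count");
    std::vector<std::uint8_t> out;
    out.push_back(header);
    out.push_back(static_cast<std::uint8_t>(groups.size()));
    for (const auto& g : groups) {
        out.push_back(g.field1);
        out.push_back(g.field2);
    }
    return out;
}

// Reads the count first, then exactly that many groups; throws on truncated input.
std::vector<repeating_group_t> deserialize(const std::vector<std::uint8_t>& in,
                                           std::uint8_t& header)
{
    if (in.size() < 2)
        throw std::runtime_error("message too short");
    header = in[0];
    const std::size_t count = in[1];
    if (in.size() < 2 + 2 * count)
        throw std::runtime_error("truncated message");
    std::vector<repeating_group_t> groups;
    groups.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        groups.push_back({in[2 + 2 * i], in[3 + 2 * i]});
    return groups;
}
```

Because every field is a single byte here, no endianness handling is needed; wider fields would need an explicit byte order.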
Where do you want to pass these messages?

Using an already-defined struct as an anonymous member of a union

Let's say I have a 32-bit hardware register Reg that I want to be able to access either as a 32-bit value (e.g. Reg = 0x12345678) or as bitfields (e.g. Reg.lsw = 0xABCD). I can achieve this by declaring a union with anonymous struct member, and declaring assignment and conversion operators to/from uint32_t. In a little-endian environment, the code might look like this:
#include <cstdint>
#include <cstdio>
typedef union
{
uint32_t val ;
struct
{
uint32_t lsw : 16 ;
uint32_t msw : 16 ;
} ;
void operator = (uint32_t n) { val = n ; }
operator uint32_t() const { return val ; }
} HWR ;
int main()
{
HWR Reg ;
Reg = 0x12345678 ;
Reg.lsw = 0xABCD ;
printf ("%X\n", uint32_t(Reg)) ;
}
But now let's say I have a whole bunch of these registers, each with its own bitfield layout, and I have a header file FieldDefs.h that declares these bitfield layouts as named structures. How can I use these named structures in the above code, so that I can access the 32-bit value and also the individual bitfields? I could do it like this:
#include "FieldDefs.h" // Defines struct MyHWR
typedef union
{
uint32_t val ;
struct MyHWR field ;
void operator = (uint32_t n) { val = n ; }
operator uint32_t() const { return val ; }
} MyHWRUnion ;
But now instead of Reg.lsw =..., I need to type Reg.field.lsw =...
Is there any way (in C++17) to declare an already defined struct as an anonymous member of a union? I am using g++ version 7.3.0 if it matters.
union
{
// ...
struct
{
// ...
};
This is an anonymous struct. Anonymous structs are ill-formed in C++; only unions may be anonymous. This differs from C, where anonymous structs are allowed (since C11). GCC accepts them in C++ only as an extension.
Is there any way (in C++17) to declare an already defined struct as an anonymous member of a union?
No. Unnamed members cannot have a named type.
You'll need to make a choice between the unnamed member and the pre-declared class. Given that the anonymous struct is non-standard in the first place, I recommend going with the named member and pre-defined class. Maybe give it a short name to minimise verbosity.
I suppose nobody will like this answer: neither the OP (since it requires g++ 9.1) nor the C++ gurus (UB smells?), but I am still a little proud of tinkering with it.
There is a [[no_unique_address]] attribute coming in C++20, and g++ 9.1 already supports it (even without the -std=c++2a flag).
How can it be utilized here?
By trial and error, it seems that if we create a proxy member val marked with it, it will take the address of the object.¹
Thus we can create a proxy class with operator=(uint32_t) and operator uint32_t that treat this as a uint32_t. The proxy object shares its address with the enclosing object and does not increase the size of the struct that uses it.
Bitfield names have to be added by inheritance, which is wrapped in a simple template named HWR for consistency.
Voilà: we have an HWR<bitfield> object that can be assigned a uint32_t directly or through its val member, and that gives access to the bitfield names.
https://godbolt.org/z/N2xEmz
#include <cstddef>
#include <cstdint>
#include <cstdio>
// Example bitfields; I assumed you have such in "FieldDefs.h"
struct bitfield {
uint32_t lsw : 16;
uint32_t msw : 16;
};
struct ThisProxy {
uint32_t& operator=(uint32_t n) {
auto& uint = *reinterpret_cast<uint32_t*>(this);
uint = n;
return uint;
}
operator uint32_t() const { return *reinterpret_cast<const uint32_t*>(this); }
};
template <typename Bitfield>
struct HWR : Bitfield {
static_assert(sizeof(Bitfield) == 4, "Bad things would happen");
HWR& operator=(uint32_t n) {
this->val = n;
return *this;
}
operator uint32_t() const { return this->val; }
[[no_unique_address]] ThisProxy val;
};
int main() {
HWR<bitfield> Reg;
// Sanity check that proxy points at &Reg and does not increase size
static_assert(offsetof(HWR<bitfield>, val) == 0, "");
static_assert(sizeof(HWR<bitfield>) == 4, "");
Reg = 0x12345678;
Reg.val = 0x8765432A;
Reg.lsw = 0xABCA;
printf("%X\n%zu\n", uint32_t(Reg), sizeof(Reg));
return 0;
}
Edit:
As it turned out that access via Reg.val is not mandatory, the inheritance + reinterpret_cast trick can be reused in pre-C++20 code:
template <typename Bitfield> struct HWR : Bitfield {
static_assert(sizeof(Bitfield) == 4, "Bad things would happen");
HWR &operator=(uint32_t n) {
*reinterpret_cast<uint32_t *>(this) = n;
return *this;
}
operator uint32_t() const {
return *reinterpret_cast<const uint32_t *>(this);
}
};
There is still the smell of reinterpret_cast, and I need to find out one thing before I can fully recommend this code: whether the bitfield can be interpreted through its underlying type uint32_t.
¹ I am not sure whether an offset of 0 is guaranteed by P0840R2.
PS. g++ complains with warning: offsetof within non-standard-layout type ‘HWR<bitfield>’ is conditionally-supported [-Winvalid-offsetof], but I didn't try to find a workaround for it.
PPS. No anonymous structs!
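One way to sidestep the reinterpret_cast concern while keeping the inheritance idea is to produce the 32-bit view with std::memcpy. This is a sketch under the assumption that the bitfield struct is exactly 32 bits and trivially copyable; which bits lsw and msw occupy is still implementation-defined, so this removes the aliasing question but not the layout question.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Example layout, standing in for a struct from "FieldDefs.h"
struct bitfield {
    std::uint32_t lsw : 16;
    std::uint32_t msw : 16;
};

// Same inheritance trick as above, but the 32-bit value is copied in and out of the
// object's bytes with std::memcpy instead of being accessed through a reinterpret_cast.
template <typename Bitfield>
struct HWR : Bitfield {
    HWR& operator=(std::uint32_t n) {
        static_assert(sizeof(HWR) == sizeof n, "HWR must be exactly 32 bits");
        static_assert(std::is_trivially_copyable<HWR>::value, "HWR must be trivially copyable");
        std::memcpy(this, &n, sizeof n);
        return *this;
    }
    operator std::uint32_t() const {
        std::uint32_t n;
        std::memcpy(&n, this, sizeof n);
        return n;
    }
};
```

Usage is the same as before: `HWR<bitfield> Reg; Reg = 0x12345678; Reg.lsw = 0xABCD;`. At typical optimization levels the memcpy calls compile down to a single load or store.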

Writing/Reading a std::map to a binary file require operator

I want to write a std::map to a file and read it back. I'm looking for a rather simple and minimalist way to do it, without Boost. I found that it is doable with a vector, as shown here: Reading and writing a std::vector into a file correctly with iterators
I found this question as it relates to what I want to do, except I'm looking for the binary alternative.
reading a file of key-value pairs in to a std::map
For types with no dynamic memory (actually, pointers) involved
#include <cstddef>
#include <cstring>
#include <istream>
#include <map>
#include <ostream>
#include <type_traits>
// Raw byte image of an object; uses unformatted write/read so no whitespace is skipped
template<std::size_t N>
struct Serial
{
char bin[N];
friend std::ostream& operator<<(std::ostream& os, const Serial& s)
{
return os.write(s.bin, N);
}
friend std::istream& operator>>(std::istream& is, Serial& s)
{
return is.read(s.bin, N);
}
};
// Copy the bytes of a trivially copyable object into a Serial
template<class T>
Serial<sizeof(T)> serialize(const T& t)
{
static_assert(std::is_trivially_copyable<T>::value, "no pointers or dynamic memory");
Serial<sizeof(T)> s;
std::memcpy(s.bin, &t, sizeof(T));
return s;
}
// Rebuild an object from its bytes (T must be default constructible)
template<class T>
T deserialize(const Serial<sizeof(T)>& s)
{
T t;
std::memcpy(&t, s.bin, sizeof(T));
return t;
}
template<class Key, class Value>
void write(std::ostream& os, const std::map<Key, Value>& m)
{
for(const auto& p : m)
os << serialize(p.first) << serialize(p.second);
}
template<class Key, class Value>
void read(std::istream& is, std::map<Key, Value>& m)
{
Serial<sizeof(Key)> k;
Serial<sizeof(Value)> v;
while(is >> k >> v)
m[deserialize<Key>(k)] = deserialize<Value>(v);
}
For types with dynamic memory (pointers) involved, the solution will be then entirely dependent on how they work, no magical solution can be provided.
Have you considered JSON?
Welcome to the messy, confusing, inconsistent world of serialization. Hope you enjoy the ride!!
This is an age-old problem: how to write a modestly complex data structure to some text or binary format, and then be able to later read it back. There are a couple of different ways to do this. However, you said you wanted to serialize to a binary format, so I would recommend using MessagePack.
There's a C++11 library for working with the MessagePack format called msgpack11 that's also rather lightweight, which would seem to fit your requirements. Here's an example:
std::map<A, B> my_map;
// To save my_map:
msgpack11::MsgPack msgpack{my_map};
std::string binary_data = msgpack.dump();
// Now you can save binary_data to a file.
// To get the map back:
std::string error_string;
auto parsed = msgpack11::MsgPack::parse(binary_data, error_string);
std::map<A, B> restored_map;
// Now you need to manually read the data from parsed back into restored_map.
For binary writing, you should use the write method of ostream:
ostream& ostream::write (const char* s, streamsize n);
See documentation here: http://en.cppreference.com/w/cpp/io/basic_ostream
You can't write a map to a file directly; you should write a representation of it that you design yourself. You would need to write each key/value pair individually, or buffer them in a data block and write that into the file. This really isn't much more complicated than a for loop, though. If the map contains classes that aren't trivially constructed and destroyed, you should implement a method that serializes each class's binary data.
Binary implementations will necessarily be non-portable (for the resulting file). If that is not a concern, consider defining a custom allocator that uses a memory-mapped file. You would then declare your std::map using that allocator as one of the template arguments. You could use that map directly, or use range insertion to save an existing map to a file. If the key or value requires allocators (e.g. strings), you would have to declare versions of those types that use the memory-mapped allocator in their template declarations, and define assignment operators from the original key/value types to the new types.
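The "for loop" approach can be sketched like this, assuming both key and value are trivially copyable (no pointers, no std::string). `write_map` and `read_map` are hypothetical names; the element count is written first so the reader knows how many pairs follow, and the result is only readable on platforms with the same type sizes and byte order.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <type_traits>

template <class K, class V>
void write_map(std::ostream& os, const std::map<K, V>& m)
{
    static_assert(std::is_trivially_copyable<K>::value && std::is_trivially_copyable<V>::value,
                  "key and value must be trivially copyable");
    const std::uint64_t count = m.size();
    os.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const auto& kv : m) {
        // raw bytes of each key and value, in host byte order
        os.write(reinterpret_cast<const char*>(&kv.first), sizeof(K));
        os.write(reinterpret_cast<const char*>(&kv.second), sizeof(V));
    }
}

// Returns false if the stream ends before all announced pairs were read.
template <class K, class V>
bool read_map(std::istream& is, std::map<K, V>& m)
{
    std::uint64_t count = 0;
    if (!is.read(reinterpret_cast<char*>(&count), sizeof count))
        return false;
    m.clear();
    for (std::uint64_t i = 0; i < count; ++i) {
        K key;
        V value;
        if (!is.read(reinterpret_cast<char*>(&key), sizeof key) ||
            !is.read(reinterpret_cast<char*>(&value), sizeof value))
            return false;
        m.emplace(key, value);
    }
    return true;
}
```

With a real file, open it as `std::ofstream out("map.bin", std::ios::binary)` (and the matching `std::ifstream`) so no newline translation happens; a `std::stringstream` works the same way for testing.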
You can find some allocator implementations and further discussion by searching for "memory mapped file stl allocator". Also see: Memory mapped file storage in stl vector
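#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <string>
#include <utility>
// Integers are written in host byte order (not portable across endianness)
void BinSerialize(std::ostream &out, std::int32_t x) { out.write(reinterpret_cast<const char*>(&x), sizeof x); }
void BinSerialize(std::ostream &out, std::int16_t x) { out.write(reinterpret_cast<const char*>(&x), sizeof x); }
void BinSerialize(std::ostream &out, std::int8_t x) { out.write(reinterpret_cast<const char*>(&x), sizeof x); }
void BinSerialize(std::ostream &out, const std::string &s)
{
// length prefix, then the characters
BinSerialize(out, static_cast<std::int32_t>(s.size()));
out.write(s.data(), s.size());
}
template<class KeyT, class ValueT>
void BinSerialize(std::ostream &out, const std::map<KeyT, ValueT> &m)
{
BinSerialize(out, static_cast<std::int32_t>(m.size()));
for (auto& item : m)
{
BinSerialize(out, item.first);
BinSerialize(out, item.second);
}
}
void BinDeserialize(std::istream &input, std::int32_t& x) { input.read(reinterpret_cast<char*>(&x), sizeof x); }
void BinDeserialize(std::istream &input, std::int16_t& x) { input.read(reinterpret_cast<char*>(&x), sizeof x); }
void BinDeserialize(std::istream &input, std::int8_t& x) { input.read(reinterpret_cast<char*>(&x), sizeof x); }
void BinDeserialize(std::istream &input, std::string &s)
{
std::int32_t size = 0;
BinDeserialize(input, size);
s.resize(size);
input.read(&s[0], size);
}
template<class KeyT, class ValueT>
void BinDeserialize(std::istream &input, std::map<KeyT, ValueT> &m)
{
std::int32_t size = 0;
m.clear();
BinDeserialize(input, size);
for (std::int32_t i = 0; i < size; ++i)
{
std::pair<KeyT, ValueT> item;
BinDeserialize(input, item.first);
BinDeserialize(input, item.second);
m.insert(std::move(item));
}
}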
This was written quickly. It is possible to improve it with templates to cover all basic types and all STL containers.
It would also be wise to keep endianness in mind.
It is better to avoid overloaded operators in this case. But if you want to use them, it is best to define a class that wraps the STL stream and has its own set of overloaded << and >> operators. Take a look at Qt's QDataStream.

Why no default hash for C++ POD structs?

I want to use a POD struct as a hash key in a map, e.g.
struct A { int x; int y; };
std::unordered_map<A, int> my_map;
but I can't do this, since no hash function is auto-generatable for such structs.
Why does the C++ standard not require a default hash for a POD struct?
Why do compilers (specifically, GCC 4.x / 5.x) offer such a hash, even if the standard doesn't mandate one?
How can I generate a hash function, using a template, in a portable way, for all of my POD structures (I'm willing to make semantic assumptions if necessary)?
Following the documentation, a possible implementation in your case would be:
#include<functional>
#include<unordered_map>
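struct A { int x; int y; };
// unordered_map also needs equality comparison for its key type
bool operator==(const A& l, const A& r) { return l.x == r.x && l.y == r.y; }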
namespace std
{
template<> struct hash<A>
{
using argument_type = A;
using result_type = std::size_t;
result_type operator()(argument_type const& a) const
{
result_type const h1 ( std::hash<int>()(a.x) );
result_type const h2 ( std::hash<int>()(a.y) );
return h1 ^ (h2 << 1);
}
};
}
int main() {
std::unordered_map<A, int> my_map;
}
The compiler is not allowed to generate such a specialization, because the standard does not define anything like that (as already mentioned in the comments).
There is a way to generate a hash for a POD in good old C style. It only works for true PODs with no data linked from outside the struct (no pointers). Nothing in the code checks these requirements, so use it only when you know and can guarantee them. All fields must be initialized (for example by a default constructor such as A(), B(), etc.).
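For the "template for all of my POD structures" part of the question, one portable option is to let each struct expose its fields as a tuple of references and hash that tuple generically. This is a sketch under that assumption: the `tie()` member and `TieHash` are hypothetical names, and the mixing step follows the well-known boost::hash_combine formula rather than anything the standard provides.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Mixing step in the style of boost::hash_combine
inline void hash_combine(std::size_t& seed, std::size_t h)
{
    seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes every element of a tuple (of references) and combines the results
template <class Tuple, std::size_t... I>
std::size_t hash_tuple(const Tuple& t, std::index_sequence<I...>)
{
    std::size_t seed = 0;
    (hash_combine(seed, std::hash<std::decay_t<std::tuple_element_t<I, Tuple>>>{}(std::get<I>(t))), ...);
    return seed;
}

// Works for any type with a const tie() member returning std::tie(...) of its fields
struct TieHash {
    template <class T>
    std::size_t operator()(const T& v) const
    {
        auto t = v.tie();
        return hash_tuple(t, std::make_index_sequence<std::tuple_size<decltype(t)>::value>{});
    }
};

struct A {
    int x;
    int y;
    auto tie() const { return std::tie(x, y); }
    bool operator==(const A& o) const { return tie() == o.tie(); }
};
```

With this, `std::unordered_map<A, int, TieHash>` works, and adding another struct only requires writing its one-line `tie()`. Unlike hashing raw bytes, this approach is unaffected by padding.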
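#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1) /* set alignment to 1 byte boundary */
struct A { int x; int y; };
struct B { int x; char ch[8]; };
#pragma pack(pop) /* restore original alignment from stack */
struct C { int x __attribute__((packed)); };
// unordered_map also needs equality; compare the same bytes that are hashed
bool operator==(const A &l, const A &r) { return std::memcmp(&l, &r, sizeof(A)) == 0; }
bool operator==(const B &l, const B &r) { return std::memcmp(&l, &r, sizeof(B)) == 0; }
// Generic byte-wise hash; only valid for padding-free types with no pointers
template<class T>
class PodHash {
public:
std::size_t operator()(const T &t) const
{
// it is possible to write the hash function char by char without using std::string
const std::string str(reinterpret_cast<const char*>(&t), sizeof(T));
return std::hash<std::string>()(str);
}
};
std::unordered_map< A, int, PodHash<A> > m_mapMyMapA;
std::unordered_map< B, int, PodHash<B> > m_mapMyMapB;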
UPD:
The data structure must be defined inside a packing section with a value of one byte, or with the packed attribute, to prevent padding bytes.
UPD:
However, be warned that changing the default packing will make loading/storing some fields from/to memory slightly slower. To prevent this, arrange the structure's data fields with a granularity that matches your (or the most popular) architecture.
I suggest adding unused fields yourself, not for use but to align the other fields in your data structure for the best performance of memory loads and stores. Example:
struct A
{
char x; // 1 byte
char padding1[3]; // 3 byte for the following 'int'
int y; // 4 bytes - largest structure member
short z; // 2 byte
char padding2[2]; // 2 bytes to make total size of the structure 12 bytes
};
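Since the byte-wise hash silently breaks if the compiler inserts padding of its own, the manual layout can be checked at compile time. This sketch assumes a 4-byte int and a 2-byte short, as the comments in the example do; `std::has_unique_object_representations` (C++17) is true only if the type has no padding bytes.

```cpp
#include <cstddef>
#include <type_traits>

struct A
{
    char x;           // 1 byte
    char padding1[3]; // 3 bytes so that 'y' starts on a 4-byte boundary
    int y;            // 4 bytes - largest structure member
    short z;          // 2 bytes
    char padding2[2]; // 2 bytes to make the total size 12 bytes
};

// Compilation fails if the layout differs from what the comments above assume
static_assert(offsetof(A, padding1) == 1, "unexpected padding after x");
static_assert(offsetof(A, y) == 4, "unexpected padding before y");
static_assert(offsetof(A, z) == 8, "unexpected padding before z");
static_assert(offsetof(A, padding2) == 10, "unexpected padding after z");
static_assert(sizeof(A) == 12, "unexpected trailing padding");
// No hidden padding bytes, so hashing or comparing raw bytes is meaningful
static_assert(std::has_unique_object_representations_v<A>, "A contains padding");
```

The explicit padding members still have to be initialized (for example to zero) before hashing, just like the real fields.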
#pragma pack is supported by, at least:
Microsoft compiler
GNU compiler
clang-llvm compiler
Embarcadero (Borland) compiler
Sun WorkShop Compiler
Intel compiler is compatible with GCC, CLANG and Microsoft compiler
A more flexible way is to declare a hash class and use it as a template parameter of std::unordered_map.
#include <cstddef>
#include <functional>
#include <unordered_map>
struct A { int x; int y; };
bool operator==(const A& l, const A& r) { return l.x == r.x && l.y == r.y; }
template<class T> class MyHash;
template<>
class MyHash<A> {
public:
std::size_t operator()(const A &a) const
{
std::size_t const h1 ( std::hash<int>()(a.x) );
std::size_t const h2 ( std::hash<int>()(a.y) );
return h1 ^ (h2 << 1);
}
};
std::unordered_map<A, int, MyHash<A>> m_mapMyMap;
You may want a different hash for the same objects; that flexibility shows up in code like this:
std::unordered_map<A, int, MyAnotherHash<A>> m_mapMyMap;

c++ boost::asio::buffer and structures

Is there any way to BASICALLY do the following:
#include <boost/asio.hpp>
struct testStruct{
int x;
int y;
};
int main(){
struct testStruct t;
boost::asio::buffer b;
b = boost::asio::buffer(t);
return 0;
}
Where it seems to fail is passing 't' into the buffer, 'b'.
Use a gather-write of more than a single buffer:
#include <boost/asio.hpp>
#include <vector>
struct testStruct{
int x;
int y;
};
int
main()
{
struct testStruct t;
t.x = 5;
t.y = 7;
std::vector<boost::asio::const_buffer> buffers;
buffers.push_back( boost::asio::buffer(&t.x, sizeof(t.x) ) );
buffers.push_back( boost::asio::buffer(&t.y, sizeof(t.y) ) );
boost::asio::io_service io_service;
boost::asio::ip::tcp::socket socket( io_service ); // note not connected!
std::size_t length = boost::asio::write( socket, buffers );
return 0;
}
Note that you'll need a corresponding scatter-read on the receiving side. This gets very tedious with anything more than the contrived example you have presented, which is why I suggested using a more robust serialization mechanism in your previous question.
Just Use Boost.Serialization
You can get a demo from the http://www.boost.org/doc/libs/1_47_0/doc/html/boost_asio/examples.html
When you want to send an object, it is better to serialize it first.
There are a few things you need to be careful of.
1. Padding
The layout of your struct is implementation-specific. It's entirely possible for there to be placeholder bytes between the x and y members of your struct on the server, and none on the client.
To work around this, you should serialize your structures member by member into a character buffer, and deserialize them on the client in the same manner.
You could write some utility code to help you with this, here's a starting point:
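#include <cstdint>
#include <type_traits>
#include <vector>
class packet_writer
{
public:
template <typename iter> void write(iter begin, iter end)
{
buffer_.insert(buffer_.end(), begin, end);
}
template <typename T> void write(T data)
{
static_assert(std::is_trivially_copyable<T>::value, "only trivially copyable types can be written as raw bytes");
// char (unlike int8_t, usually signed char) may legally alias any object
const char* begin = reinterpret_cast<const char*>(&data);
write(begin, begin + sizeof(data));
}
const std::vector<std::int8_t>& buffer() const
{
return buffer_;
}
private:
std::vector<std::int8_t> buffer_;
};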
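On the client, the bytes can be read back "in the same manner". The following is a hypothetical counterpart to packet_writer (`packet_reader` is not part of Boost.Asio); it assumes both sides agree on the order of fields and on type sizes, and it copies with std::memcpy so unaligned offsets are not a problem.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Reads values back in the order they were written; the buffer must outlive the reader.
class packet_reader
{
public:
    explicit packet_reader(const std::vector<std::int8_t>& buffer) : buffer_(buffer) {}

    template <typename T> T read()
    {
        static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
        if (buffer_.size() - pos_ < sizeof(T))
            throw std::out_of_range("packet too short");
        T value;
        std::memcpy(&value, buffer_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return value;
    }

private:
    const std::vector<std::int8_t>& buffer_;
    std::size_t pos_ = 0;
};
```

For the question's struct, the receiving side would do `t.x = reader.read<int>(); t.y = reader.read<int>();`, mirroring the two `write` calls on the sender.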
2. Endianness
Depending on architecture, or in some cases even depending on the current CPU mode (some POWER CPUs support endianness switching), the bytes of your members may be reversed. You have to detect the endianness of the host architecture, and swap the bytes to a predefined order for use in your protocol.
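An alternative that avoids detecting the host's endianness altogether is to build the wire bytes with shifts and masks. Arithmetic on a value does not depend on how the host lays it out in memory, so this sketch emits and parses big-endian ("network order") bytes on any platform; `put_u32_be` and `get_u32_be` are illustrative names.

```cpp
#include <cstdint>
#include <vector>

// Appends v as 4 big-endian bytes, regardless of the host byte order
inline void put_u32_be(std::vector<std::uint8_t>& out, std::uint32_t v)
{
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Reassembles a value from 4 big-endian bytes
inline std::uint32_t get_u32_be(const std::uint8_t* p)
{
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}
```

The same pattern extends to 16- and 64-bit fields, and, combined with member-by-member writing, also takes care of the padding issue above.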