Let's say I have an application that keeps receiving a byte stream from a socket. I have documentation that describes what each packet looks like: the total header size, the total payload size, and the data type at each byte offset. I want to parse it into a struct. The approach I can think of is to declare a struct and disable padding with a compiler attribute, probably something like:
struct Payload
{
    char field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char field5;
} __attribute__((packed));
and then I can declare a buffer, memcpy the bytes into it, and reinterpret_cast it to my structure. Another way I can think of is to process the bytes one by one and fill the data into the struct. I think either one should work, but it feels kind of old school and probably not safe.
The reinterpret_cast approach mentioned should be something like:
void receive(const char* data, std::size_t data_size)
{
    if (data_size == sizeof(Payload))
    {
        const Payload* payload = reinterpret_cast<const Payload*>(data);
        // ... further processing ...
    }
}
I'm wondering whether there are any better approaches (more modern C++ style? more elegant?) for this kind of use case. I feel like metaprogramming should help, but I don't have an idea of how to apply it here.
Can anyone share some thoughts, or point me to related references, resources, or even relevant open-source code, so that I can learn more about how to solve this kind of problem in a more elegant way?
There are many different ways of approaching this. Here's one:
Keeping in mind that reading a struct from a network stream is semantically the same thing as reading a single value, the operation should look the same in either case.
Note that from what you posted, I am inferring that you will not be dealing with types with non-trivial default constructors. If that were the case, I would approach things a bit differently.
In this approach, we:
Define a read_into(src&, dst&) function that takes in a source of raw bytes, as well as an object to populate.
Provide a general implementation for all integral types, converting from network byte order when appropriate.
Overload the function for our struct, calling read_into() on each field in the order expected on the wire.
#include <cstdint>
#include <cstddef>
#include <bit>
#include <concepts>
#include <array>
#include <algorithm>

// Use std::byteswap when available. In the meantime, just lift the implementation from
// https://en.cppreference.com/w/cpp/numeric/byteswap
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
    static_assert(std::has_unique_object_representations_v<T>, "T may not have padding bits");
    auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
    std::ranges::reverse(value_representation);
    return std::bit_cast<T>(value_representation);
}

template<typename T>
concept DataSource = requires(T& x, char* dst, std::size_t size) {
    { x.read(dst, size) };
};

// General read implementation for all integral types
template<std::endian network_order = std::endian::big>
void read_into(DataSource auto& src, std::integral auto& dst) {
    src.read(reinterpret_cast<char*>(&dst), sizeof(dst));
    if constexpr (sizeof(dst) > 1 && std::endian::native != network_order) {
        dst = byteswap(dst);
    }
}
struct Payload
{
    char field1;
    std::uint32_t field2;
    std::uint32_t field3;
    char field5;
};

// Read implementation specific to Payload
void read_into(DataSource auto& src, Payload& dst) {
    read_into(src, dst.field1);
    read_into<std::endian::little>(src, dst.field2);
    read_into(src, dst.field3);
    read_into(src, dst.field5);
}
// mind you, nothing stops you from just reading directly into the struct, but beware of endianness issues:
// struct Payload
// {
//     char field1;
//     std::uint32_t field2;
//     std::uint32_t field3;
//     char field5;
// } __attribute__((packed));
//
// void read_into(DataSource auto& src, Payload& dst) {
//     src.read(reinterpret_cast<char*>(&dst), sizeof(Payload));
// }

// Example
struct some_data_source {
    std::size_t read(char*, std::size_t size);
};

void foo() {
    some_data_source data;
    Payload p;
    read_into(data, p);
}
An alternative API could have been dst.field2 = read<std::uint32_t>(src), which has the drawback of requiring you to be explicit about the type, but is more appropriate if you have to deal with non-trivial constructors.
see it in action on godbolt: https://gcc.godbolt.org/z/77rvYE1qn
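For reference, a minimal sketch of what that alternative read<T>() API could look like, layered on top of the read_into() above (the naming comes from the sentence above, but this implementation is my own and restricted to integral types):

// Sketch of the read<T>() style API; value-returning wrapper around read_into().
template<std::integral T, std::endian network_order = std::endian::big>
T read(DataSource auto& src) {
    T value{};
    read_into<network_order>(src, value);
    return value;
}

// Usage, assuming the Payload and some_data_source from the example above:
// Payload p;
// p.field2 = read<std::uint32_t, std::endian::little>(src);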
Related
So, I'd want to implement simple serialization for some int variables in C++ and I really don't know how...
My goal is the following:
I essentially want to be able to convert any integer to binary, preferably with a simple function call.
// Here's some dummy code of what I essentially want to do
int TestVariable = 25;
std::string FilePath = "D:\\dev\\Test.txt";
Serialize(TestVariable, FilePath);
// [...]
// at some later point in the code, when I want to access the file
Deserialize(&TestVariable, FilePath);
I have already heard of libraries like Boost, but I think that'd be a bit overkill when I just want to serialize simple variables.
Thank you in advance for your answers. :D
First of all, there is a little "inconsistency": you're asking for binary serialization into something that looks like a text file. I will assume you really want binary output.
The only thing to take care of when serializing integers is the endianness of the machine (even though most machines are little-endian).
In C++17 or earlier, the easiest way is a runtime check like:
inline bool littleEndian()
{
    static const uint32_t test = 0x01020304;
    return *((uint8_t *)&test) == 0x04;
}
C++20 introduces a compile-time check so you can rewrite the previous as
constexpr bool littleEndian()
{
    return std::endian::native == std::endian::little;
}
At this point, what you want is to write all integers in a standard way. Usually big-endian (network byte order) is the standard.
template <typename T>
inline static T revert(T num)
{
    T res;
    for (std::size_t i = 0; i < sizeof(T); i++)
        ((uint8_t *)&res)[i] = ((uint8_t *)&num)[sizeof(T) - 1 - i];
    return res;
}
At this point your serializer would be:
template <typename T>
void serialize(T TestVariable, std::string& FilePath)
{
    static_assert(std::is_integral<T>::value);   // check that T is of {char, int, ...} type
    static_assert(!std::is_reference<T>::value); // check that T is not a reference

    std::ofstream o(FilePath, std::ios::binary);
    if (littleEndian())
        TestVariable = revert(TestVariable);
    o.write((char *)&TestVariable, sizeof(T));
}
And your deserializer would be:
template <typename T>
void deserialize(T *TestVariable, std::string FilePath)
{
    static_assert(std::is_integral<T>::value);

    std::ifstream i(FilePath, std::ios::binary);
    i.read((char *)TestVariable, sizeof(T));
    if (littleEndian())
        *TestVariable = revert(*TestVariable);
}
Notice: this code is just an example that works with your interface; you just have to include <cstdint>, <fstream>, <string>, and <type_traits> (plus <bit> if you're using the C++20 version).
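For completeness, a quick round trip with the two templates above might look like this (the file name and value are just for illustration):

int main()
{
    std::string path = "D:\\dev\\Test.bin";
    int testVariable = 25;

    serialize(testVariable, path);   // writes sizeof(int) bytes in big-endian order

    int restored = 0;
    deserialize(&restored, path);    // reads them back, reverting if the machine is little-endian
    // restored == 25 on any machine
}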
First, let me lay down the reasons not to do this:
It will not be safe to reuse the files on a different machine
Speed could be much slower than with a library
Complex types like pointers, maps, or structures are very difficult to implement correctly
But if you really want to do something custom made, you can simply use streams. Here is an example using stringstream (I always use stringstream in my unit tests because I want them to be quick), but you can simply modify it to use a file stream (see the sketch after this example).
Please note, the type must be default constructible to be used by the deserialize template function. That can be a very stringent requirement for complex classes.
#include <sstream>
#include <iostream>
#include <string>

template<typename T>
void serialize(std::ostream& os, const T& value)
{
    os << value;
}

template<typename T>
T deserialize(std::istream& is)
{
    T value;
    is >> value;
    return value;
}

int main()
{
    std::stringstream ss;
    serialize(ss, 1353);
    serialize(ss, std::string("foobar"));

    std::cout << deserialize<int>(ss) << " " << deserialize<std::string>(ss) << std::endl;
    return 0;
}
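The file-stream variant mentioned above is a mechanical change; a minimal sketch reusing the same serialize()/deserialize() templates (file name made up, error handling omitted):

#include <fstream>

int main()
{
    {
        std::ofstream out("values.txt");
        serialize(out, 1353);
        out << ' ';                        // whitespace separator: needed if two numbers are written back to back
        serialize(out, std::string("foobar"));
    }   // out closes here

    std::ifstream in("values.txt");
    std::cout << deserialize<int>(in) << " " << deserialize<std::string>(in) << std::endl;
}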
I have a simple POD data class like
struct hash {
    char buffer[16];
};
I need to have a vector of many instances of it; it will surely not fit into RAM (20 PB). It is conceptually grouped into a vector (tree). I want a pointer-like thing that would hide RAM, the filesystem, and cold storage behind a simple array/pointer-like interface (making filesystem operations invisible after initialisation, yet allowing me to give it multiple places to put data in: RAM, fast SSD, SSD, HDD, tape, cloud drive locations).
How can I do such a thing in C++?
There is no support for this at the language level.
One solution would be to use a memory-mapped file; for example, see:
Creating a File Mapping Using Large Pages
If you need a more platform-independent solution, Boost also has support for memory-mapped files (for example in Boost.Interprocess or Boost.Iostreams).
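A rough sketch of the memory-mapped route with Boost.Interprocess, under the assumption that "data.bin" already exists and is large enough (error handling omitted):

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>

struct hash { char buffer[16]; };

int main()
{
    namespace bip = boost::interprocess;

    bip::file_mapping file("data.bin", bip::read_write);

    // Map only a window of the file; which window is mapped can be changed at runtime.
    bip::mapped_region region(file, bip::read_write, 0, 4096 * sizeof(hash));

    hash* items = static_cast<hash*>(region.get_address());
    items[42].buffer[0] = 'x'; // reads/writes go through the OS page cache
}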
Besides that, you can always make a pointer-like facade object to manage the underlying logic (à la smart pointers).
template<class T>
struct MyMappedPointerType {
    T& operator*(); // dereference - may throw..
    //implement rest of semantics
};
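A slightly fuller (still declaration-only) sketch of that facade idea, with made-up backend hooks standing in for whatever storage tier actually holds the bytes:

#include <cstddef>

template<class T>
class MappedPtr {
public:
    explicit MappedPtr(std::size_t index) : index_(index) {}

    // Dereferencing fetches the object from whichever tier currently holds it.
    T operator*() const { return backendLoad(index_); }

    // Write-through store.
    void write(const T& value) const { backendStore(index_, value); }

private:
    std::size_t index_;

    // Hypothetical backend hooks: RAM cache, SSD, HDD, tape, cloud, ...
    static T backendLoad(std::size_t index);
    static void backendStore(std::size_t index, const T& value);
};

Returning T by value rather than T& is deliberate in this sketch: between accesses the object may not live in addressable memory at all.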
I think the usual approach would be to use some kind of handle. When you want to access the object, you pass the handle to a function which loads the data into memory and gives you the address, and when you are done you close the handle. In C++ you would use RAII for that.
#include <string>
#include <cstdio>
#include <cstdint>

template <class T>
class Access
{
private:
    FILE* f = nullptr;

public:
    Access(const std::string& filename)
    {
        f = fopen(filename.data(), "r+b"); // read/update in binary mode
    }
    ~Access()
    {
        if (f)
            fclose(f);
    }

    class WriteAccess
    {
        T buffer{};
        bool dirty = false;
        FILE* f;
        int64_t elementNumber;

    public:
        WriteAccess(FILE* f, int64_t elementNumber)
            : f(f)
            , elementNumber(elementNumber)
        {
            if (f) {
                fseek(f, elementNumber * sizeof(buffer), SEEK_SET);
                fread(&buffer, sizeof(buffer), 1, f);
            }
        }
        T& get() { dirty = true; return buffer; }
        const T& get() const { return buffer; }
        ~WriteAccess()
        {
            if (dirty && f) {
                fseek(f, elementNumber * sizeof(buffer), SEEK_SET);
                fwrite(&buffer, sizeof(buffer), 1, f);
            }
        }
    };

    WriteAccess operator[](int64_t elementNumber)
    {
        return WriteAccess(f, elementNumber);
    }
};

struct SomeData
{
    int a = 0;
    int b = 0;
    int c = 0;
};

int main()
{
    Access<SomeData> myfile("thedata.bin");
    myfile[0].get().a = 1;

    auto pos1 = myfile[1];
    pos1.get().a = 10;
    pos1.get().b = 10;
}
Of course, you would provide separate read access and write access, probably not using fopen but C++ file streams; you should check for errors, and maybe you could get rid of the get() function in favour of a conversion operator to T.
You should also note that you could use some reference counting; in my simple example the Access object must outlive the WriteAccess objects.
Also, you should lock if this is going to be used by more than one thread, and I assumed that you would not hold two accesses to the same element at the same time.
Or you could also use memory-mapped file access, as the other answer suggests.
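To make the reference-counting note concrete, a minimal sketch where the FILE* is held in a shared_ptr with a custom deleter (names made up), so both Access and every WriteAccess can keep the handle alive:

#include <cstdio>
#include <memory>
#include <string>

std::shared_ptr<FILE> openShared(const std::string& filename)
{
    return std::shared_ptr<FILE>(std::fopen(filename.c_str(), "r+b"),
                                 [](FILE* f) { if (f) std::fclose(f); });
}
// Access would store this shared_ptr and hand a copy to every WriteAccess it creates,
// so the file stays open until the last accessor is destroyed.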
The C++ core guidelines state many times that using void* as an argument is at best confusing and at worst error-prone. My favorite mention is at the end:
C++ Core Guidelines: To-do: Unclassified proto-rules
Anyone writing a public interface which takes or returns void* should have their toes set on fire.
That one has been a personal favorite of mine for a number of years. :)
That said:
What should this function signature be changed to in order to comply with this suggestion? Currently it works with anything that can be reinterpreted as a const char*:
bool writeBufferToFile(void* buffer, std::size_t size, const std::string& filePath) const
{
    namespace FS = std::filesystem;

    FS::path p(filePath);
    p.make_preferred();

    bool not_valid_path = FS::is_directory(p);
    bool invalid = not_valid_path;
    if (invalid) { return false; }

    std::ofstream ofs;
    ofs.open(p.string(), std::ios_base::binary);
    if (ofs)
    {
        ofs.write(reinterpret_cast<const char*>(buffer), size);
        return true;
    }
    return false;
}
What you want is not two parameters, a pointer and a size, but one parameter that represents a binary chunk of data. In an ideal world, you might use something like std::vector<std::uint8_t> const&, but the problem with this is that it forces callers to do an allocate/copy/free if they happen to store the data some other way.
So what you really want is a class that represents a binary chunk of data but doesn't own that data and can be constructed, copied, and destroyed very cheaply regardless of how the underlying data is stored. This avoids the need to have two parameters to express one concept.
A class that encapsulates this is often called a Slice. So I would suggest:
class Slice
{
private:
    std::uint8_t const* data_;
    std::size_t size_;
    ...
};
bool writeBufferToFile (const Slice& data, const std::string& filePath) const
Your Slice class can easily be constructed from a std::vector<std::uint8_t> or pretty much any other sensible way of holding a range of bytes.
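A minimal sketch of how such a Slice might be fleshed out (the accessor names are my own; in C++20, std::span<const std::byte> plays the same role):

#include <cstddef>
#include <cstdint>
#include <vector>

class Slice
{
public:
    Slice(std::uint8_t const* data, std::size_t size) : data_(data), size_(size) {}
    Slice(std::vector<std::uint8_t> const& v) : Slice(v.data(), v.size()) {}

    std::uint8_t const* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    std::uint8_t const* data_;
    std::size_t size_;
};

// writeBufferToFile(const Slice& data, const std::string& filePath) would then do:
//   ofs.write(reinterpret_cast<const char*>(data.data()), data.size());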
I went with std::any. It can be used as a type-safe replacement for void*.
Motivating articles:
std::any: How, when, and why
std::any - comparison with void* and motivating examples
bool WriteBufferToFile(const std::any& buffer, std::size_t size, std::filesystem::path filepath) noexcept {
    namespace FS = std::filesystem;

    filepath = FS::absolute(filepath);
    filepath.make_preferred();

    const auto not_valid_path = FS::is_directory(filepath);
    const auto invalid = not_valid_path;
    if (invalid) {
        return false;
    }

    if (std::ofstream ofs{filepath, std::ios_base::binary}; ofs.write(reinterpret_cast<const char*>(&buffer), size)) {
        return true;
    }
    return false;
}
I am using a third-party library (MAVLink) that defines a number of structs that are all tagged with __attribute__((packed)) so they can be transmitted efficiently across a serial connection (it is written in C and I am using it in a C++ application). When I receive and reconstruct them, I would like to add a timestamp field to them. I think the simplest way is to create a new struct that inherits from the existing struct, i.e. in the MAVLink library this struct is defined:
MAVPACKED(
typedef struct __mavlink_heartbeat_t {
    uint32_t custom_mode;
    uint8_t type;
    uint8_t autopilot;
    uint8_t base_mode;
    uint8_t system_status;
    uint8_t mavlink_version;
}) mavlink_heartbeat_t;
where MAVPACKED is a macro that applies __attribute__((packed)). sizeof(mavlink_heartbeat_t) returns 9. If I define
struct new_heartbeat_t : mavlink_heartbeat_t
{
    uint64_t timestamp;
};
sizeof(new_heartbeat_t) returns 24, so it looks like 7 padding bytes are added (I would assume at the end of mavlink_heartbeat_t, so that timestamp starts at byte 16).
Are there any gotchas or things to be aware of when doing this or is there a better way?
Inheritance is an "is-a" kind of relationship.
Is the local representation of a heartbeat really a kind of wire message? I doubt it.
But it might reasonably contain one.
I would encapsulate it something like this:
#include <cstdint>
#include <cstddef>

typedef struct __attribute__((packed)) __mavlink_heartbeat_t {
    uint32_t custom_mode;
    uint8_t type;
    uint8_t autopilot;
    uint8_t base_mode;
    uint8_t system_status;
    uint8_t mavlink_version;
} mavlink_heartbeat_t;

extern std::uint64_t now();
void sync_fetch_data(mavlink_heartbeat_t&);
void foo(uint8_t);

struct local_heartbeat
{
    mavlink_heartbeat_t const& get_io_buffer() const {
        return io_buffer_;
    }

    mavlink_heartbeat_t& prepare_receive() {
        request_timestamp_ = now();
        return io_buffer_;
    }

    void complete_receive() {
        response_timestamp_ = now();
    }

    std::uint64_t get_response_timestamp() const {
        return response_timestamp_;
    }

private:
    // stuff in here might have suspect alignment
    mavlink_heartbeat_t io_buffer_;

    // but these will be aligned for optimal performance
    std::uint64_t request_timestamp_;
    std::uint64_t response_timestamp_;
};
int main()
{
    // create my local representation
    local_heartbeat hb;

    // prepare it to receive data
    auto& buffer = hb.prepare_receive();

    // somehow populate the buffer
    sync_fetch_data(buffer); // could be async, etc

    // notify the object that reception is complete
    hb.complete_receive();

    // now use the local representation
    foo(hb.get_io_buffer().system_status);
}
I am not sure, but I think I once saw a method signature that looked like this (in the constructors):
class Buffer {
    Buffer(char_with_size *data) { ... };
    Buffer(char *data, size_t len) { ... };
};
In the first constructor, an array/pointer parameter can be passed whose size the compiler automatically knows, so I would always know the size of the char buffer passed.
Does anyone know whether this really exists in C++?
Only a templated version can possibly make sense:
Buffer(char * data, std::size_t len) { /* ... */ }
template <std::size_t N> Buffer(char (&data)[N]) : Buffer(data, N) { }
(Note that delegating constructors are new and not very widely supported yet. I just use one here for example's sake.)
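A quick self-contained usage sketch of that pattern (the member names here are placeholders):

#include <cstddef>

struct Buffer {
    Buffer(char* data, std::size_t len) : data_(data), len_(len) {}
    template <std::size_t N>
    Buffer(char (&data)[N]) : Buffer(data, N) {}

    char* data_;
    std::size_t len_;
};

int main()
{
    char raw[16] = {};
    Buffer fromArray(raw);          // array overload: N deduced as 16
    Buffer fromPointer(raw + 0, 8); // decayed pointer: length must be passed explicitly
}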
Sure, Buffer(std::vector<char> data). (IOW, don't use char*.)