We are rewriting our legacy C code in C++. At the core of our system we have a TCP client connected to a master, which streams messages continuously. Each socket read yields some number N of messages of the format {type, size, data[0]}.
We don't copy these messages into individual buffers; we just pass a pointer to the beginning of each message, its length, and a shared_ptr to the underlying buffer to the workers.
The legacy C version was single-threaded and would do an in-place ntoh conversion like this:
struct Message {
    uint32_t something1;
    uint16_t something2;
};

void process(char *message)
{
    Message *m = reinterpret_cast<Message *>(message);
    m->something1 = ntohl(m->something1);
    m->something2 = ntohs(m->something2);
}
And then use the Message.
There are a couple of issues with following this approach in the new code:
Since we dispatch the messages to different workers, each worker doing its own ntoh conversion will cause cache problems, as the messages are not cache-line aligned, i.e. there is no padding between messages.
The same message can be handled by different workers; this is the case where a message needs to be processed locally and also relayed to another process. The relay worker needs the message in the original network order while the local worker needs it converted to host order. Obviously, as the message is not duplicated, both cannot be satisfied.
The solutions that come to mind are:
1. Duplicate the message and send one copy to all relay workers, if any. Do the ntoh conversion of all messages belonging to the same buffer in the dispatcher itself, before dispatching, say by calling a handler->ntoh(message), so that the cache issue is solved.
2. Send each worker the original copy. Each worker copies the message to a local buffer and then does the ntoh conversion and uses it. Here each worker can use a thread-specific (thread_local) static buffer as a scratch pad to copy the message into.
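A minimal sketch of option 2, assuming the two-field Message above, POSIX ntohl/ntohs, and a hypothetical kMaxMessageSize bound (not from the original code):

```cpp
#include <arpa/inet.h>  // ntohl/ntohs (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Message {
    uint32_t something1;
    uint16_t something2;
};

// Hypothetical upper bound on message size; adjust to your protocol.
constexpr std::size_t kMaxMessageSize = 8 * 1024;

// Copy the wire bytes into a suitably aligned thread-local scratch
// buffer, then convert in place. alignas(std::max_align_t) guarantees
// the copy can be reinterpreted as any message struct.
const Message *to_host_order(const char *wire, std::size_t len)
{
    alignas(std::max_align_t) thread_local char scratch[kMaxMessageSize];
    std::memcpy(scratch, wire, len);
    Message *m = reinterpret_cast<Message *>(scratch);
    m->something1 = ntohl(m->something1);
    m->something2 = ntohs(m->something2);
    return m;
}
```

The returned pointer is only valid until the same worker thread processes its next message, which matches the scratch-pad usage described above.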
Now my questions are:
Is option 1's way of doing the ntoh conversion C++-ish? I mean, the alignment requirement of the structure will be different from that of the char buffer (we haven't had any issues with this yet). Using scheme 2 should be fine in this respect, as the scratch buffer can have max_align_t alignment and hence be cast to any structure, but it incurs copying the entire message, which can be quite big (a few KB).
Is there a better way to handle the situation?
Your primary issue seems to be how to handle messages that come in misaligned. That is, if each message structure doesn't have enough padding on the end of it so that the following message is properly aligned, you can trigger misaligned reads by reinterpreting a pointer to the beginning of a message as an object.
We can get around this a number of ways, perhaps the simplest would be to ntoh based on a single-byte pointer, which is effectively always aligned.
We can hide the nasty details behind wrapper classes, which will take a pointer to the start of a message and have accessors that will ntoh the appropriate field.
As indicated in the comments, it's a requirement that offsets be determined by a C++ struct, since that's how the message is initially created, and it may not be packed.
First, our ntoh implementation, templated so we can select one by type:
template <typename R>
struct ntoh_impl;

template <>
struct ntoh_impl<uint16_t>
{
    static uint16_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint16_t>(d[0]) << 8) |
               d[1];
    }
};

template <>
struct ntoh_impl<uint32_t>
{
    static uint32_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint32_t>(d[0]) << 24) |
               (static_cast<uint32_t>(d[1]) << 16) |
               (static_cast<uint32_t>(d[2]) << 8)  |
               d[3];
    }
};

template <>
struct ntoh_impl<uint64_t>
{
    static uint64_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint64_t>(d[0]) << 56) |
               (static_cast<uint64_t>(d[1]) << 48) |
               (static_cast<uint64_t>(d[2]) << 40) |
               (static_cast<uint64_t>(d[3]) << 32) |
               (static_cast<uint64_t>(d[4]) << 24) |
               (static_cast<uint64_t>(d[5]) << 16) |
               (static_cast<uint64_t>(d[6]) << 8)  |
               d[7];
    }
};
Now we'll define a set of nasty macros that will automatically implement accessors for a given name by looking up the member with the matching name in the struct proto (a private struct to each class):
#define MEMBER_TYPE(MEMBER) typename std::decay<decltype(std::declval<proto>().MEMBER)>::type
#define IMPL_GETTER(MEMBER) MEMBER_TYPE(MEMBER) MEMBER() const { return ntoh_impl<MEMBER_TYPE(MEMBER)>::ntoh(data + offsetof(proto, MEMBER)); }
Finally, we have an example implementation of the message structure you have given:
class Message
{
private:
    struct proto
    {
        uint32_t something1;
        uint16_t something2;
    };

public:
    explicit Message(uint8_t const *p) : data(p) {}
    explicit Message(char const *p) : data(reinterpret_cast<uint8_t const *>(p)) {}

    IMPL_GETTER(something1)
    IMPL_GETTER(something2)

private:
    uint8_t const *data;
};
Now Message::something1() and Message::something2() are implemented and will read from the data pointer at the same offsets they wind up being in Message::proto.
Providing the implementation in the header (effectively inline) has the potential to inline the entire ntoh sequence at the call site of each accessor!
This class does not own the data allocation it is constructed from. Presumably you could write a base class if there's ownership-maintaining details here.
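Assembled into one translation unit, a usage sketch of the wrapper (16- and 32-bit specializations only; the wire bytes are invented for the example) looks like this:

```cpp
#include <cstddef>      // offsetof
#include <cstdint>
#include <type_traits>  // std::decay
#include <utility>      // std::declval

// Byte-wise big-endian readers, always safe regardless of alignment.
template <typename R> struct ntoh_impl;

template <> struct ntoh_impl<uint16_t> {
    static uint16_t ntoh(uint8_t const *d) {
        return static_cast<uint16_t>((d[0] << 8) | d[1]);
    }
};

template <> struct ntoh_impl<uint32_t> {
    static uint32_t ntoh(uint8_t const *d) {
        return (static_cast<uint32_t>(d[0]) << 24) |
               (static_cast<uint32_t>(d[1]) << 16) |
               (static_cast<uint32_t>(d[2]) << 8)  |
               d[3];
    }
};

// Accessor generator: field offsets come from the proto struct layout.
#define MEMBER_TYPE(MEMBER) typename std::decay<decltype(std::declval<proto>().MEMBER)>::type
#define IMPL_GETTER(MEMBER) MEMBER_TYPE(MEMBER) MEMBER() const { return ntoh_impl<MEMBER_TYPE(MEMBER)>::ntoh(data + offsetof(proto, MEMBER)); }

class Message {
    struct proto {
        uint32_t something1;
        uint16_t something2;
    };
public:
    explicit Message(uint8_t const *p) : data(p) {}
    IMPL_GETTER(something1)
    IMPL_GETTER(something2)
private:
    uint8_t const *data;
};
```

Given a buffer laid out like proto, `Message m(buf); m.something1();` reads the big-endian field directly out of the bytes with no in-place mutation, so relay workers can still forward the untouched buffer.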
Related
I am receiving messages from a socket.
Each message is wrapped in a header (that is basically the size of the message) and a footer that is a CRC (a code to check that the message is not corrupted).
So, the layout is something like :
size (2 bytes) | message (240 bytes) | crc (4 byte)
I wrote an operator>>, as follows:
std::istream &operator>>(std::istream &stream, Message &msg) {
    std::int16_t size;
    stream >> size;
    stream.read(reinterpret_cast<char*>(&msg), size);
    // Not enough data received
    if (stream.rdbuf()->in_avail() < size + 4) {
        stream.setstate(std::ios_base::failbit);
        return stream;
    }
    std::int16_t gotCrc;
    stream >> gotCrc;
    // Data not received correctly
    if (gotCrc != computeCrc(msg)) {
        stream.setstate(std::ios_base::failbit);
    }
    return stream;
}
A message can arrive byte by byte, or all at once. We can even receive several messages in one go.
Basically, what I did is something like this :
struct MessageReceiver {
    std::string totalDataReceived;

    void messageArrived(std::string data) {
        // We add the data to totalDataReceived
        totalDataReceived += data;
        std::stringbuf buf(totalDataReceived);
        std::istream stream(&buf);
        std::vector<Message> messages(
            std::istream_iterator<Message>(stream),
            std::istream_iterator<Message>{});
        std::for_each(begin(messages), end(messages), processMessage);
        // +4 for the crc and +2 for the size field
        auto sizeToRemove = [](auto init, auto message) { return init + message.size + 4 + 2; };
        // remove the processed messages
        totalDataReceived.erase(0, std::accumulate(begin(messages), end(messages), 0, sizeToRemove));
    }
};
So basically, we receive data and append it to a buffer of everything received so far. We stream it, and if we got at least one message, we remove it from the totalDataReceived buffer.
However, I am not sure this is the right way to go. Indeed, this code does not work when I get a bad CRC (the message is not created, so we don't iterate over it), so each time I will retry reading the message with the bad CRC...
How can I do this? I cannot keep all the data in totalDataReceived, because I can receive a lot of messages over the execution lifetime.
Should I implement my own streambuf?
It sounds like what you want to create is a class which acts like a std::istream. You can of course write your own class from scratch, but I prefer to implement a std::streambuf, for a couple of reasons.
First, people using your class will already be accustomed to it, since it acts the same as a std::istream if you inherit from and implement std::streambuf and std::istream.
Second, you don't need to create extra methods or override operators; they're already available at the std::istream level.
All you have to do to implement a std::streambuf is inherit from it, override underflow(), and set the get pointers using setg().
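As a rough sketch (class and method names are mine, not a drop-in implementation), such a streambuf over an appendable receive buffer might look like:

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// A streambuf over an owned, appendable byte buffer. underflow()
// reports EOF until more data arrives via append(), so a reader can
// simply retry after the next socket read.
class MessageBuf : public std::streambuf {
public:
    void append(const char *p, std::size_t n) {
        std::size_t consumed = gptr() - eback();  // drop already-read bytes
        buf_.erase(0, consumed);
        buf_.append(p, n);
        // Re-point the get area at the (possibly reallocated) buffer.
        setg(&buf_[0], &buf_[0], &buf_[0] + buf_.size());
    }

protected:
    int_type underflow() override {
        // Nothing buffered: signal EOF; append() will reset the get area.
        return traits_type::eof();
    }

private:
    std::string buf_;
};
```

You would then construct a std::istream over it, e.g. `MessageBuf mb; std::istream in(&mb);`, feed each socket chunk to `mb.append(chunk, n)`, and your existing operator>> keeps working unchanged.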
My team has been having this issue for a few weeks now, and we're a bit stumped. Kindness and knowledge would be gratefully received!
Working with an embedded system, we are attempting to serialize an object, send it through a Linux socket, receive it in another process, and deserialize it back into the original object. We have the following deserialization function:
/*! Takes a byte array and populates the object's data members */
std::shared_ptr<Foo> Foo::unmarshal(uint8_t *serialized, uint32_t size)
{
    auto msg = reinterpret_cast<Foo *>(serialized);
    return std::shared_ptr<ChildOfFoo>(
        reinterpret_cast<ChildOfFoo *>(serialized));
}
The object is successfully deserialized and can be read from. However, when the destructor of the returned std::shared_ptr<Foo> is called, the program segfaults. Valgrind gives the following output:
==1664== Process terminating with default action of signal 11 (SIGSEGV)
==1664== Bad permissions for mapped region at address 0xFFFF603800003C88
==1664== at 0xFFFF603800003C88: ???
==1664== by 0x42C7C3: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:149)
==1664== by 0x42BC00: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:666)
==1664== by 0x435999: std::__shared_ptr<ChildOfFoo, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:914)
==1664== by 0x4359B3: std::shared_ptr<ChildOfFoo>::~shared_ptr() (shared_ptr.h:93)
We're open to any suggestions at all! Thank you for your time :)
In general, this won't work:
auto msg = reinterpret_cast<Foo *>(serialized);
You can't just take an arbitrary array of bytes and pretend it's a valid C++ object (even if reinterpret_cast<> allows you to compile code that attempts to do so). For one thing, any C++ object that contains at least one virtual method will contain a vtable pointer, which points to the virtual-methods table for that object's class, and is used whenever a virtual method is called. But if you serialize that pointer on computer A, then send it across the network and deserialize and then try to use the reconstituted object on computer B, you'll invoke undefined behavior because there is no guarantee that that class's vtable will exist at the same memory location on computer B that it did on computer A. Also, any class that does any kind of dynamic memory allocation (e.g. any string class or container class) will contain pointers to other objects that it allocated, and that will lead you to the same invalid-pointer problem.
But let's say you've limited your serializations to only POD (plain old data) objects that contain no pointers. Will it work then? The answer is: possibly, in very specific cases, but it will be very fragile. The reason is that the compiler is free to lay out the class's member variables in memory in different ways, and it will insert padding differently on different hardware (or even with different optimization settings, sometimes), leading to a situation where the bytes that represent a particular Foo object on computer A are different from the bytes that would represent that same object on computer B. On top of that you have to worry about different word lengths on different computers (e.g. long is 32-bit on some architectures and 64-bit on others), and different endianness (e.g. Intel CPUs represent values in little-endian form while PowerPC CPUs typically use big-endian). Any one of these differences will cause your receiving computer to misinterpret the bytes it received and thereby corrupt your data badly.
So the remaining part of the question is, what is the proper way to serialize/deserialize a C++ object? And the answer is: you have to do it the hard way, by writing a routine for each class that does the serialization member-variable by member-variable, taking the class's particular semantics into account. For example, here are some methods that you might have your serializable classes define:
// Serialize this object's state out into (buffer)
// (buffer) must point to at least FlattenedSize() bytes of writeable space
void Flatten(uint8_t *buffer) const;
// Return the number of bytes this object will require to serialize
size_t FlattenedSize() const;
// Set this object's state from the bytes in (buffer)
// Returns true on success, or false on failure
bool Unflatten(const uint8_t *buffer, size_t size);
... and here's an example of a simple x/y point class that implements the methods:
class Point
{
public:
    Point() : m_x(0), m_y(0) {/* empty */}
    Point(int32_t x, int32_t y) : m_x(x), m_y(y) {/* empty */}

    void Flatten(uint8_t *buffer) const
    {
        const int32_t beX = htonl(m_x);
        memcpy(buffer, &beX, sizeof(beX));
        buffer += sizeof(beX);
        const int32_t beY = htonl(m_y);
        memcpy(buffer, &beY, sizeof(beY));
    }

    size_t FlattenedSize() const {return sizeof(m_x) + sizeof(m_y);}

    bool Unflatten(const uint8_t *buffer, size_t size)
    {
        if (size < FlattenedSize()) return false;
        int32_t beX;
        memcpy(&beX, buffer, sizeof(beX));
        m_x = ntohl(beX);
        buffer += sizeof(beX);
        int32_t beY;
        memcpy(&beY, buffer, sizeof(beY));
        m_y = ntohl(beY);
        return true;
    }

    int32_t m_x;
    int32_t m_y;
};
... then your unmarshal function could look like this (note I've made it templated so that it will work for any class that implements the above methods):
/*! Takes a byte array and populates the object's data members */
template<class T> std::shared_ptr<T> unmarshal(const uint8_t *serialized, size_t size)
{
    auto sp = std::make_shared<T>();
    if (sp->Unflatten(serialized, size) == true) return sp;
    // Oops, Unflatten() failed! handle the error somehow here
    [...]
}
If this seems like a lot of work compared to just grabbing the raw memory bytes of your class object and sending them verbatim across the wire, you're right -- it is. But this is what you have to do if you want serialization to work reliably and not break every time you upgrade your compiler, change your optimization flags, or want to communicate between computers with different CPU architectures. If you'd rather not do this sort of thing by hand, there are pre-packaged libraries to assist by (partially) automating the process, such as Google's Protocol Buffers library, or even good old XML.
The segfault during destruction occurs because you are creating a shared_ptr by reinterpret-casting a pointer into a uint8_t buffer. When the returned shared_ptr is destroyed, it deletes the uint8_t buffer as if it were a heap-allocated Foo, and hence the segfault occurs.
Update your unmarshal as given below and try it.
std::shared_ptr<Foo> Foo::unmarshal(uint8_t *&serialized, uint32_t size)
{
    ChildOfFoo* ptrChildOfFoo = new ChildOfFoo();
    memcpy(ptrChildOfFoo, serialized, size);
    return std::shared_ptr<ChildOfFoo>(ptrChildOfFoo);
}
Here the ownership of the ChildOfFoo object created by the statement ChildOfFoo* ptrChildOfFoo = new ChildOfFoo(); is transferred to the shared_ptr returned by the unmarshal function. So when the returned shared_ptr's destructor is called, the object is properly deallocated and no segfault occurs.
Hope this helps!
I would like to serialize/deserialize some structured data in order to send it over the network via a char* buffer.
More precisely, suppose I have a message of type struct Message.
struct Message {
    Header header;
    Address address;
    size_t size; // size of the data part
    char* data;
} message;
In C, I would use something such as:
size = sizeof(Header) + sizeof(Address) + sizeof(size_t) + message.size;
memcpy(buffer, (char *) message, size);
to serialize, and
Message m = (Message) buffer;
to deserialize.
What would be the "right" way to do it in C++? Is it better to define a class rather than a struct? Should I overload some operators? Are there alignment issues to consider?
EDIT: thanks for pointing the "char *" problem. The provided C version is incorrect. The data section pointed to by the data field should be copied separately.
Actually there are many flavors:
You can let Boost do it for you: http://www.boost.org/doc/libs/1_52_0/libs/serialization/doc/tutorial.html
Overloading the stream operators << for serialization and >> for deserialization works well with file and string streams
You could specify a constructor Message (const char*) for constructing from a char*.
I am a fan of static methods for deserialization like:
struct Message {
    ...
    static bool deserialize(Message& dest, char* source);
};
since you can catch errors directly when deserializing.
And the version you proposed is OK, once the modifications mentioned in the comments are applied.
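A minimal sketch of that static-method flavor, assuming a hypothetical two-field message and big-endian wire order (field and constant names are illustrative, not from the question):

```cpp
#include <arpa/inet.h>  // ntohl/ntohs (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size message; a real Message would also need to
// deserialize its header, address, and variable-length data part.
struct Message {
    uint32_t something1;
    uint16_t something2;

    // Returns false (caught directly by the caller) instead of leaving
    // a half-initialized object when the buffer is too short.
    static bool deserialize(Message &dest, const char *source, std::size_t len) {
        if (len < sizeof(uint32_t) + sizeof(uint16_t))
            return false;
        uint32_t s1;
        uint16_t s2;
        std::memcpy(&s1, source, sizeof s1);              // memcpy avoids alignment issues
        std::memcpy(&s2, source + sizeof s1, sizeof s2);
        dest.something1 = ntohl(s1);                      // wire order -> host order
        dest.something2 = ntohs(s2);
        return true;
    }
};
```

The caller can then write `Message m; if (!Message::deserialize(m, buffer, n)) { /* handle error */ }` without exceptions or partially filled objects.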
Why not insert a virtual 'NetworkSerializable' class into your inheritance tree? A 'void NetSend(fd socket)' method would send the data (without exposing any private members), and an 'int(bufferClass buffer)' method could return -1 if no complete, valid message was deserialized, or, once a valid message has been assembled, the number of unused chars in 'buffer'.
That encapsulates all the assembly/disassembly protocol state vars and other gunge inside the class, where it belongs. It also allows message/s to be assembled from multiple stream input buffers.
I'm not a fan of static methods. Protocol state data associated with deserialization should be per-instance, (thread-safety).
I have a set of files with binary data. Each file is composed of blocks and each block has a header and then a set of events. Each event has a header and then a sequence of fields. My problem is with the sequence of fields.
These fields contain different lengths of ordered/structured data, but the fields do not come in any particular order. For example, one event might have 3 fields looking as follows:
Event Header (12 bytes, always, made of things like number of fields, size, etc)
Field Header (2 bytes, always, field type in the top 4 bits, size in the bottom 12)
Field Data (VDC data, signals from various wires in a vertical drift chamber)
Field Header ( '' )
Field Data (ADC-LAS data, Signals from various photo multiplier tubes)
Field Header ( '' )
Field Data (FERA data, Signals from a fast encoding readout adc system)
In another event I might have the same fields plus a few more, or a field removed and another added in, etc. It all depends on which pieces of the DAQ hardware had data to be recorded when the readout system triggered.
I have thought about a few possible solutions and honestly, none of them seem palatable to me.
Method 1:
Make an abstract base class Field and then for each field type (there are only 13) inherit from that.
Pros: Reading the data in from the file is easy, simply get the region id, allocate the appropriate type of field, read the data, and store a Field*. Also, this method appeals to my sense of a place for everything and everything in its place.
Cons: When I process the fields in an event to convert the data to the information that the analysis actually uses I am continuously needing to dynamic_cast<>() everything to the derived class. This is a bit tedious and ugly and I remember reading somewhere (a while ago) that if you are having to use dynamic_cast<>() then you are using polymorphism 'wrong'. Also this makes having object pools for the fields tricky as I would need a pool for every subclass of Field. Finally, if more field types are added later then in addition to modifying the processing code, additional subclasses of field need to be created.
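To make that con concrete, here is a minimal sketch of Method 1 with two hypothetical field types (names are invented), showing the dynamic_cast<>() the processing code ends up needing:

```cpp
#include <memory>
#include <vector>

// Abstract base class per Method 1; one subclass per field type.
struct Field {
    virtual ~Field() = default;
    virtual int type() const = 0;
};

struct VdcField : Field {
    int type() const override { return 0; }
    // ... vertical drift chamber wire signals would live here ...
};

struct AdcField : Field {
    int type() const override { return 1; }
    // ... photomultiplier tube signals would live here ...
};

// Processing code has to probe every Field* for its concrete type.
int count_vdc(const std::vector<std::unique_ptr<Field>> &event) {
    int n = 0;
    for (const auto &f : event)
        if (dynamic_cast<const VdcField *>(f.get()))  // the tedious part
            ++n;
    return n;
}
```

Every new field type adds both a subclass and another cast branch in the processing code, which is exactly the maintenance burden described above.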
Method 2:
Simply have a big array of bytes to store all the field headers and data. Then leave it up to the processing code to extract the structure as well as process the data.
Pros: This means that if data formats change in the future then the only changes that need to occur are in the event processing code. It's a pretty simple solution. It's more flexible.
Cons: Jobs in the processing/reading code are less compartmentalized. Feels less elegant.
I recognize that there is probably not a solution that is going to be 'elegant' in every way, and from the standpoint of KISS I am leaning towards method 2. Should I choose Method 1, Method 2, or is there some Method 3 that I have not thought of?
You are essentially trying to choose between a struct, a tuple, or an MSRA-style safe protocol handler.
// Example program
#include <iostream>
#include <string>
#include <tuple>
#include <cstdint>

// start ABI protocol signature
const int EVENT_HEADER_SZ = 12;
const int FIELD_HEADER_SZ = 2;
const int FIELD_DATA_SIZE = 1 << 12;  // 4096 (note: 2^12 in C++ is XOR, not a power)
// end ABI protocol

#ifdef WINDOWS
#define __NOVT __declspec(novtable)
#else
#define __NOVT
#endif

struct Protocole_Header __NOVT {
    union {
        char pbody[EVENT_HEADER_SZ + 1];
        unsigned ptype : 32;
        unsigned psize : 32;
        unsigned pmisc : 32;
    };
};

struct Field_Header __NOVT {
    union {
        char fbody[FIELD_HEADER_SZ + 1];
        unsigned ftype : 4;  // type of data 0...15
        unsigned fsize : 12; // size of field data to follow, 0..4096 max
    };
};

struct Field_Data {
    std::string _content;
};

typedef std::tuple<uint_fast32_t, int_fast32_t, uint_fast32_t> Protocole_Header2;

enum PHR {
    TPL_TYPE,
    TPL_SIZE,
    TPL_ETC
};

std::istream &operator>>(std::istream &is, std::tuple<uint_fast32_t, int_fast32_t, uint_fast32_t> &tpl)
{
    is >> std::get<TPL_TYPE>(tpl) >> std::get<TPL_SIZE>(tpl) >> std::get<TPL_ETC>(tpl);
    return is;
}

union Field_Header2 {
    char fbody[FIELD_HEADER_SZ];
    unsigned ftype : 4;  // type of data 0...15
    unsigned fsize : 12; // size of field data to follow, 0..4096 max
};

int main()
{
    Protocole_Header ph;
    Field_Header fh;
    Field_Data fd;
    long i;
    char protocole_buffer[FIELD_DATA_SIZE + 1];

    std::cin.get(ph.pbody, EVENT_HEADER_SZ);
    std::cin.get(fh.fbody, FIELD_HEADER_SZ);
    for (i = 0; i < ph.psize; ++i)
    {
        std::cin.get(protocole_buffer, fh.fsize);
        fd._content = protocole_buffer; // push somewhere else
        std::cin.get(fh.fbody, FIELD_HEADER_SZ);
    }
    // ...
    // ...
    Protocole_Header2 ph2;
    Field_Header2 fh2;
    std::cin >> ph2;
    std::cin.get(fh2.fbody, FIELD_HEADER_SZ);
    for (i = 0; i < std::get<TPL_SIZE>(ph2); ++i)
    {
        std::cin.get(protocole_buffer, fh.fsize);
        fd._content = protocole_buffer; // push somewhere else
        std::cin.get(fh2.fbody, FIELD_HEADER_SZ);
    }
}
Here you have both of your answers...
Note that using a metastructure over the structure is as much of a burden as finding the code and recompiling it whenever the protocol changes.
Usually you do not define an ABI for a protocol structure, and that is why Boost.Spirit was made.
A parser should be used to handle a protocol (always, because a protocol is a grammar in its own right; define an EBNF and your code will run for decades without anyone having to recompile it...).
The only exception, where you would not use a parser, is when you need to pass MSRA, health-care, or other regulated-sector certification. The rest of the time, don't bind external data to an ABI structure in C or C++; it's a sure source of bugs.
I am totally new to parallel computing and the Boost library, but in my current project I need to send/recv a vector containing serialized class objects, whose size will be decided at run time. After reading the boost::mpi and boost::serialization documentation, I found the code below while searching Google, and compiled it using VS 2008 with no errors.
#include <boost/mpi.hpp>
#include <iostream>
#include <vector>

namespace mpi = boost::mpi;

class gps_position
{
private:
    friend class boost::serialization::access;

    template<class Archive>
    void serialize(Archive &ar, const unsigned int version)
    {
        ar & degrees;
        ar & minutes;
        ar & seconds;
    }

public:
    int degrees;
    int minutes;
    float seconds;

    gps_position() {};
    gps_position(int d, int m, float s) :
        degrees(d), minutes(m), seconds(s)
    {}
};

int main(int argc, char *argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        std::vector<gps_position> positions;
        positions.push_back(gps_position(1, 2, 3.));
        positions.push_back(gps_position(5, 6, 10.0));
        std::cout << "Sent GPS positions:" << positions.size() << std::endl;
        world.send(1, 0, positions);
    }
    else {
        std::vector<gps_position> positions;
        world.recv(0, 0, positions);
        std::cout << "Received GPS positions: " << positions.size() << std::endl;
        for (unsigned int i = 0; i < positions.size(); i++) {
            std::cout << positions[i].degrees << "\t"
                      << positions[i].minutes << "\t"
                      << positions[i].seconds << std::endl;
        }
    }
    return 0;
}
However, the program is not working properly. It looks like process 1 can never receive the vector of gps_position objects from process 0. The output is:
c:\mpi3>mpiexec -n 2 mpitest
Sent GPS positions:2
Received GPS positions: 0
I modified the code to pass a single element instead of the whole vector, and it works perfectly. So I have no idea what's wrong with the original code. Is boost::mpi capable of passing this type of vector at all? Any suggestions are greatly appreciated.
Thank you all in advance
Zac
Boost says that it can handle vectors, and it can obviously handle your element type too, so naively I would expect it to work.
Check out the following from the Boost documentation:
Send data to another process.
This routine executes a potentially blocking send with tag tag to the
process with rank dest. It can be received by the destination process
with a matching recv call.
The given value must be suitable for transmission over MPI. There are
several classes of types that meet these requirements:
Types with mappings to MPI data types: If is_mpi_datatype<T> is convertible to mpl::true_, then value will be transmitted using the
MPI data type get_mpi_datatype(). All primitive C++ data types that
have MPI equivalents, e.g., int, float, char, double, etc., have
built-in mappings to MPI data types. You may turn a Serializable type
with fixed structure into an MPI data type by specializing
is_mpi_datatype for your type.
Serializable types: Any type that provides the serialize() functionality required by the Boost.Serialization library can be
transmitted and received.
Packed archives and skeletons: Data that has been packed into an mpi::packed_oarchive or the skeletons of data that have been backed
into an mpi::packed_skeleton_oarchive can be transmitted, but will be
received as mpi::packed_iarchive and mpi::packed_skeleton_iarchive,
respectively, to allow the values (or skeletons) to be extracted by
the destination process.
Content: Content associated with a previously-transmitted skeleton can be transmitted by send and received by recv. The receiving process
may only receive content into the content of a value that has been
constructed with the matching skeleton.
For types that have mappings to an MPI data type (including the
content of a type), an invocation of this routine will result in a
single MPI_Send call. For variable-length data, e.g., serialized types
and packed archives, two messages will be sent via MPI_Send: one
containing the length of the data and the second containing the data
itself. Note that the transmission mode for variable-length data is an
implementation detail that is subject to change.
Thank you all for your help.
I finally solved this problem by recompiling it under VS 2010. I'm not sure of the root cause, though; I guess some mismatch in header files and libs?