I am receiving messages from a socket.
Each message is wrapped in a header (which is basically the size of the message) and a footer that is a CRC (a checksum used to detect whether the message was corrupted).
So the layout is something like:
size (2 bytes) | message (240 bytes) | crc (4 bytes)
I wrote an operator>> for this. It looks like the following:
std::istream &operator>>(std::istream &stream, Message &msg) {
    std::int16_t size;
    stream.read(reinterpret_cast<char*>(&size), sizeof(size));

    // Not enough data received yet for the message body plus the 4-byte CRC
    if (stream.rdbuf()->in_avail() < size + 4) {
        stream.setstate(std::ios_base::failbit);
        return stream;
    }

    stream.read(reinterpret_cast<char*>(&msg), size);

    std::int32_t gotCrc;
    stream.read(reinterpret_cast<char*>(&gotCrc), sizeof(gotCrc));

    // Data not received correctly
    if (gotCrc != computeCrc(msg)) {
        stream.setstate(std::ios_base::failbit);
    }
    return stream;
}
A message can arrive byte by byte, or all at once. We can even receive several messages at once.
Basically, what I did is something like this:
struct MessageReceiver {
    std::string totalDataReceived;

    void messageArrived(std::string data) {
        // Append the new data to totalDataReceived
        totalDataReceived += data;

        std::stringbuf buf(totalDataReceived);
        std::istream stream(&buf);

        std::vector<Message> messages(
            std::istream_iterator<Message>(stream),
            std::istream_iterator<Message>{});

        std::for_each(begin(messages), end(messages), processMessage);

        // +2 for the size field and +4 for the crc
        auto sizeToRemove = [](auto init, auto message) { return init + message.size + 2 + 4; };

        // Remove the processed messages from the front of the buffer
        totalDataReceived.erase(
            0, std::accumulate(begin(messages), end(messages), std::size_t{0}, sizeToRemove));
    }
};
So basically, we receive data and append it to a buffer of everything received so far. We stream it, and if we got at least one message, we remove it from the buffer totalDataReceived.
However, I am not sure this is the right way to go. Indeed, this code does not work when I compute a bad CRC... (the message is not created, so we don't iterate over it). So every time, I am going to try to re-read the message with the bad CRC...
How can I do this? I cannot keep all the data in totalDataReceived, because I can receive a lot of messages during the program's lifetime.
Should I implement my own streambuf?
It sounds like what you want to create is a class which acts like a std::istream. Of course you can write your own class from scratch, but I prefer to implement a std::streambuf, for a couple of reasons.
First, people using your class will already know how to use it, since it acts the same as any std::istream if you inherit from and implement std::streambuf and pair it with std::istream.
Second, you don't need to create extra methods or override operators; they are already available at the std::istream level.
All you have to do to implement a std::streambuf is inherit from it, override underflow(), and set the get-area pointers using setg().
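For illustration, here is a minimal sketch of that idea (my own example, not tested against your exact protocol; the name receive_buf is made up): a streambuf that the socket callback feeds and that exposes the accumulated bytes through the get area.
#include <cstddef>
#include <streambuf>
#include <string>

// Sketch: a read-only streambuf over a growing receive buffer.
// append() is what your socket callback would call; underflow() is invoked
// by the istream machinery when the get area runs dry.
class receive_buf : public std::streambuf {
public:
    void append(const char* data, std::size_t n) {
        const std::size_t consumed = gptr() - eback(); // bytes already read
        buffer_.erase(0, consumed);                    // drop them
        buffer_.append(data, n);
        char* base = &buffer_[0];
        setg(base, base, base + buffer_.size());       // reset the get area
    }

protected:
    int_type underflow() override {
        // Nothing more to give until the next append(): signal EOF for now.
        return gptr() < egptr() ? traits_type::to_int_type(*gptr())
                                : traits_type::eof();
    }

private:
    std::string buffer_;
};
You can then construct a std::istream on top of it and run your operator>> against that stream; after appending more data you will typically need to clear() the stream's eof/fail state before retrying an extraction.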
I'm working with Cap'n Proto and my understanding is that there is no need to do the serialization myself, as it is already being done. So my question is: how would I access the serialized data and get its size, so that I can pass it as a byte array to another library?
# person.capnp
struct Person {
  name @0 :Text;
  age @1 :Int16;
}
// ...
::capnp::MallocMessageBuilder message;
Person::Builder person = message.initRoot<Person>();
person.setName("me");
person.setAge(20);
// at this point, how do I get some sort of handle to
// the serialized data of 'person' as well as its size?
I've seen the writePackedMessageToFd(fd, message); call, but I didn't quite understand what was being passed and couldn't find any API docs for it. I also wasn't trying to write to a file descriptor, as I need the serialized data returned as a const void*.
Looking in Cap'n Proto's message.h, there is this function in the base class of MallocMessageBuilder, which says it gets the raw data making up the message:
kj::ArrayPtr<const kj::ArrayPtr<const word>> getSegmentsForOutput();
// Get the raw data that makes up the message.
But even then, I'm not sure how to get it as a const void*.
Thoughts?
::capnp::MallocMessageBuilder message;
is your binary message, and its size is
message.sizeInWords()
(size in bytes divided by 8).
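If all you need is the byte count, a small conversion along those lines (an untested sketch, assuming the MallocMessageBuilder above) would be:
// capnp::word is 8 bytes, so the size in bytes of the builder's content is:
size_t sizeInBytes = message.sizeInWords() * sizeof(capnp::word);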
This appears to be what's needed.
// ...
::capnp::MallocMessageBuilder message;
Person::Builder person = message.initRoot<Person>();
person.setName("me");
person.setAge(20);
// Flatten the message into a single contiguous array of words
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
// View the same memory as bytes
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
// Copy into a std::string so the bytes outlive dataArr if needed
std::string data(bytes.begin(), bytes.end());
const void* dataPtr = data.c_str();
At this point, I have a const void* dataPtr and can get its size with data.size().
I cannot figure out: is it possible to use std::basic_ifstream and std::basic_ofstream with a custom implementation of std::basic_filebuf?
How complicated would an implementation of an input file stream be that reads the file in blocks of 64 KB and internally checks a hash value for each block? If the hash is not valid, it throws a corruption_exception, for example. The output file stream writes each block followed by its hash value.
I found some examples that create a std::ifstream and then create another stream that reads from it and does additional processing:
std::ifstream infile("test.img");
decompress_stream in(infile, 288);
char data[144 * 128];
in.read(data, 144 * 128);
infile.close();
But at first I expected it to be something like this (without an additional stream):
std::ifstream in;
in.setbuffer(new MyBuffer()); // pseudocode: attach a custom buffer
in.read(...);

MyBuffer::underflow()
{
    // read from the original buffer
    if (hash != calculated_hash) throw corruption_exception();
    // return the data with the hash omitted
}
Is this possible?
The file stream objects are effectively a combination of a std::basic_filebuf and a std::basic_[io]stream. The stream interface allows access to the std::basic_streambuf via the rdbuf() methods. Thus, you can replace the file stream's stream buffer with another one. However, that replacement wouldn't have anything to do with the original file buffer.
As the stream buffer you have is a filtering stream buffer, it may be reasonable to construct it with a stream and have the constructor inject the filter, i.e., something like this (I'm omitting the templates as they are irrelevant to this discussion but can easily be added):
class filterbuf
    : public std::streambuf {
    std::istream* istream = nullptr;
    std::ostream* ostream = nullptr;
    std::streambuf* sbuf;
    // override virtual functions as needed
public:
    explicit filterbuf(std::istream& in)
        : istream(&in)
        , sbuf(istream->rdbuf(this)) {
    }
    explicit filterbuf(std::ostream& out)
        : ostream(&out)
        , sbuf(ostream->rdbuf(this)) {
    }
    explicit filterbuf(std::iostream& inout)
        : istream(&inout)
        , sbuf(istream->rdbuf(this)) {
    }
    ~filterbuf() {
        if (istream) istream->rdbuf(sbuf);
        if (ostream) ostream->rdbuf(sbuf);
    }
};
The point of restoring the stream buffer in the destructor is that the std::ostream destructor calls flush() on the object and the custom stream buffer is gone by that time.
The filter would be used like this:
std::ifstream fin("whatever");
filterbuf buf(fin);
if (fin >> whatever) {
    ...
}
If you want to customise the behaviour of iostreams, the easiest way is to use Boost.Iostreams. Your use case could probably be implemented as an InputFilter and an OutputFilter, and you can use basic_file_source and basic_file_sink to read from and write to files.
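For example, a skeleton of such an InputFilter might look like this (only a sketch: the hash check itself is left as a comment, and the hash_check_filter/test.img names are made up):
#include <boost/iostreams/concepts.hpp>          // multichar_input_filter
#include <boost/iostreams/operations.hpp>        // boost::iostreams::read
#include <boost/iostreams/filtering_stream.hpp>  // filtering_istream
#include <boost/iostreams/device/file.hpp>       // file_source

namespace io = boost::iostreams;

// Skeleton InputFilter: currently it just forwards reads. A real version would
// pull one 64 KB block plus its stored hash from the source, verify it, and
// hand out only the payload (throwing corruption_exception on mismatch).
class hash_check_filter : public io::multichar_input_filter {
public:
    template <typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n) {
        return io::read(src, s, n);
    }
};

int main() {
    io::filtering_istream in;
    in.push(hash_check_filter());
    in.push(io::file_source("test.img"));

    char data[144 * 128];
    in.read(data, sizeof data);
}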
We are rewriting our legacy C code in C++. At the core of our system we have a TCP client, which is connected to a master. The master streams messages continuously, and each socket read can yield N messages of the format {type, size, data[0]}.
Now, we don't copy these messages into individual buffers; we just pass a pointer to the beginning of the message, the length, and a shared_ptr to the underlying buffer to the workers.
The legacy C version was single threaded and would do an in-place NTOH conversion like below:
struct Message {
    uint32_t something1;
    uint16_t something2;
};

void process(char *message) {
    Message *m = (Message *)message;
    m->something1 = ntohl(m->something1);
    m->something2 = ntohs(m->something2);
}
And then use the Message.
There are a couple of issues with following the same logic in the new code.
Since we are dispatching the messages to different workers, each worker doing its own ntoh conversion will cause cache miss issues, as the messages are not cache aligned - i.e. there is no padding between the messages.
The same message can be handled by different workers. This is the case where a message needs to be processed locally and also relayed to another process. The relay worker needs the message in the original network order, while the local worker needs it converted to host order. Obviously, as the message is not duplicated, both cannot be satisfied.
The solutions that come to my mind are:
Duplicate the message and send one copy to all relay workers, if any. Do the ntoh conversion of all messages belonging to the same buffer in the dispatcher itself before dispatching - say by calling handler->ntoh(message); - so that the cache miss issue is solved.
Send each worker the original copy. Each worker copies the message to a local buffer and then does the ntoh conversion and uses it. Here each worker can use a thread-specific (thread_local) static buffer as a scratch pad to copy the message into (see the sketch after this list).
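A rough sketch of what I mean by option 2, assuming the Message struct above and a made-up upper bound on the message size:
#include <cstddef>
#include <cstring>
#include <arpa/inet.h>   // ntohl, ntohs

constexpr std::size_t MAX_MSG_SIZE = 4096;   // assumed upper bound, name is made up

void handleLocally(const char* wire, std::size_t len) {
    if (len > MAX_MSG_SIZE) return;          // or report an error

    // Per-thread scratch pad, aligned for any structure type.
    alignas(std::max_align_t) thread_local char scratch[MAX_MSG_SIZE];
    std::memcpy(scratch, wire, len);

    Message* m = reinterpret_cast<Message*>(scratch);   // usual strict-aliasing caveats apply
    m->something1 = ntohl(m->something1);
    m->something2 = ntohs(m->something2);
    // ... use *m; the original buffer stays in network order for the relay workers
}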
Now my questions are:
Is the option 1 way of doing the ntoh conversion C++-ish? I mean, the alignment requirement of the structure will be different from that of the char buffer (we haven't had any issue with this yet). Using scheme 2 should be fine in this respect, as the scratch buffer can have max_align_t alignment and hence should be castable to any structure. But this incurs copying the entire message, which can be quite big (say a few KB).
Is there a better way to handle the situation?
Your primary issue seems to be how to handle messages that come in misaligned. That is, if each message structure doesn't have enough padding on the end of it so that the following message is properly aligned, you can trigger misaligned reads by reinterpreting a pointer to the beginning of a message as an object.
We can get around this a number of ways, perhaps the simplest would be to ntoh based on a single-byte pointer, which is effectively always aligned.
We can hide the nasty details behind wrapper classes, which will take a pointer to the start of a message and have accessors that will ntoh the appropriate field.
As indicated in the comments, it's a requirement that offsets be determined by a C++ struct, since that's how the message is initially created, and it may not be packed.
First, our ntoh implementation, templated so we can select one by type:
template <typename R>
struct ntoh_impl;

template <>
struct ntoh_impl<uint16_t>
{
    static uint16_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint16_t>(d[0]) << 8) |
                d[1];
    }
};

template <>
struct ntoh_impl<uint32_t>
{
    static uint32_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint32_t>(d[0]) << 24) |
               (static_cast<uint32_t>(d[1]) << 16) |
               (static_cast<uint32_t>(d[2]) <<  8) |
                d[3];
    }
};

template <>
struct ntoh_impl<uint64_t>
{
    static uint64_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint64_t>(d[0]) << 56) |
               (static_cast<uint64_t>(d[1]) << 48) |
               (static_cast<uint64_t>(d[2]) << 40) |
               (static_cast<uint64_t>(d[3]) << 32) |
               (static_cast<uint64_t>(d[4]) << 24) |
               (static_cast<uint64_t>(d[5]) << 16) |
               (static_cast<uint64_t>(d[6]) <<  8) |
                d[7];
    }
};
Now we'll define a set of nasty macros that will automatically implement accessors for a given name by looking up the member with the matching name in the struct proto (a private struct to each class):
#define MEMBER_TYPE(MEMBER) typename std::decay<decltype(std::declval<proto>().MEMBER)>::type
#define IMPL_GETTER(MEMBER) MEMBER_TYPE(MEMBER) MEMBER() const { return ntoh_impl<MEMBER_TYPE(MEMBER)>::ntoh(data + offsetof(proto, MEMBER)); }
Finally, we have an example implementation of the message structure you have given:
class Message
{
private:
    struct proto
    {
        uint32_t something1;
        uint16_t something2;
    };

public:
    explicit Message(uint8_t const *p) : data(p) {}
    explicit Message(char const *p) : data(reinterpret_cast<uint8_t const *>(p)) {}

    IMPL_GETTER(something1)
    IMPL_GETTER(something2)

private:
    uint8_t const *data;
};
Now Message::something1() and Message::something2() are implemented and will read from the data pointer at the same offsets they wind up being in Message::proto.
Providing the implementation in the header (effectively inline) has the potential to inline the entire ntoh sequence at the call site of each accessor!
This class does not own the data allocation it is constructed from. Presumably you could write a base class if there are ownership-maintaining details to handle here.
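For completeness, a quick usage sketch of the wrapper (the byte values are invented just to show the accessors in action, and it assumes the Message class above is in scope):
#include <cstdint>
#include <iostream>

int main() {
    // 6 bytes in network (big-endian) order:
    //   something1 = 0x00000102 (258), something2 = 0x0304 (772)
    uint8_t const raw[] = { 0x00, 0x00, 0x01, 0x02, 0x03, 0x04 };

    Message msg(raw);
    std::cout << msg.something1() << " " << msg.something2() << "\n"; // prints "258 772"
}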
I would like to serialize/deserialize some structured data in order to send it over the network via a char* buffer.
More precisely, suppose I have a message of type struct Message.
struct Message {
    Header  header;
    Address address;
    size_t  size;   // size of the data part
    char*   data;
} message;
In C, I would use something such as:
size = sizeof(Header) + sizeof(Address) + sizeof(size_t) + message.size;
memcpy(buffer, (char *) &message, size);
to serialize, and
Message *m = (Message *) buffer;
to deserialize.
What would be the "right" way to do it in C++? Is it better to define a class rather than a struct? Should I overload some operators? Are there alignment issues to consider?
EDIT: thanks for pointing out the "char *" problem. The provided C version is incorrect; the data section pointed to by the data field should be copied separately.
Actually there are many flavors:
You can let Boost do it for you: http://www.boost.org/doc/libs/1_52_0/libs/serialization/doc/tutorial.html
Overloading the stream operators << for serialization and >> for deserialization works well with file and string streams.
You could provide a constructor Message(const char*) for constructing a Message from a char*.
I am a fan of static methods for deserialization like:
struct Message {
    // ...
    static bool deserialize(Message& dest, char* source);
};
since you could catch errors directly when deserializing.
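For illustration, here is a rough sketch of that idea applied to the Message from the question, assuming fixed-size Header/Address placeholder types and an explicit length parameter (my addition, so the function can reject truncated input):
#include <cstddef>
#include <cstring>

struct Header  { /* ... */ };   // placeholder
struct Address { /* ... */ };   // placeholder

struct Message {
    Header      header;
    Address     address;
    size_t      size = 0;          // size of the data part
    const char* data = nullptr;    // points into the source buffer, not owned

    // Returns false if `source` does not hold a complete message within `len` bytes.
    static bool deserialize(Message& dest, const char* source, size_t len) {
        const size_t fixed = sizeof(Header) + sizeof(Address) + sizeof(size_t);
        if (len < fixed)
            return false;                          // fixed part incomplete
        std::memcpy(&dest.header,  source, sizeof(Header));
        std::memcpy(&dest.address, source + sizeof(Header), sizeof(Address));
        std::memcpy(&dest.size,    source + sizeof(Header) + sizeof(Address), sizeof(size_t));
        if (len < fixed + dest.size)
            return false;                          // data part incomplete
        dest.data = source + fixed;                // no copy; caller keeps the buffer alive
        return true;
    }
};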
And the version you proposed is OK, provided the modifications mentioned in the comments are applied.
Why not insert a virtual 'NetworkSerializable' class into your inheritance tree? A 'void NetSend(fd socket)' method would send the data (without exposing any private members), and an 'int(bufferClass buffer)' method could return -1 if no complete, valid message was deserialized, or, if a valid message has been assembled, the number of unused chars left in 'buffer'.
That encapsulates all the assembly/disassembly protocol state variables and other gunge inside the class, where it belongs. It also allows messages to be assembled from multiple stream input buffers.
I'm not a fan of static methods. Protocol state data associated with deserialization should be per-instance (for thread safety).
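A bare-bones sketch of such an interface (the method names and exact signatures below are my guesses at what is described above):
class NetworkSerializable {
public:
    virtual ~NetworkSerializable() = default;

    // Serialize and send this object over the socket, without exposing
    // any private data to the caller.
    virtual void NetSend(int socketFd) const = 0;

    // Feed received bytes in. Returns -1 if no complete, valid message has
    // been assembled yet, otherwise the number of unused chars left over.
    virtual int NetReceive(const char* buffer, int length) = 0;
};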
I'm trying to serialize objects to send over the network through a socket using only the STL. I haven't found a way to preserve the objects' structure so that it can be deserialized on the other host. I tried converting to string and to char*, I've spent a long time searching for tutorials on the internet, and so far I have found nothing.
Is there a way to do it only with STL?
Are there any good tutorials?
I am almost ready to try Boost, but if there is a way to do it with the STL, I'd like to learn it.
You can serialize with anything. All serialization means is that you are converting the object to bytes so that you can send it over one stream (like a std::ostream) and read it with another (like a std::istream). Just overload operator<<(std::ostream&, const T&) and operator>>(std::istream&, T&), where T is each of your types, and all the types contained in your types.
However, you should probably just use an already existing library (Boost is pretty nice). There are tons of things that a library like Boost does for you, like byte ordering, handling common objects (arrays and all the stuff from the standard library), providing a consistent means of performing serialization, and tons of other stuff.
My first question will be: do you want serialization or messaging?
It might seem a stupid distinction at first, since you asked for serialization, but I have always distinguished the two terms.
Serialization is about taking a snapshot of your memory and restoring it later on. Each object is represented as a separate entity (though they might be composed)
Messaging is about sending information from one point to another. The message usually has its own grammar and may not reflect the organization of your Business Model.
Too often I've seen people using serialization where messaging should have been used. It does not mean that serialization is useless, but it does mean that you should think ahead of time. It's quite difficult to alter the BOM once you have decided to serialize it, especially if you decide to relocate some piece of information (move it from one object to another)... because how then are you going to decode the "old" serialized version?
Now that that's been cleared up...
... I will recommend Google's Protocol Buffer.
You could perfectly well rewrite your own using the STL, but you would end up redoing work that has already been done, and unless you wish to learn from the exercise, it's quite pointless.
One great thing about protobuf is that it's language agnostic, in a way: you can generate the encoder/decoder of a given message for C++, Java or Python. The use of Python is nice for message injection (testing) or message decoding (to check the output of a logged message). That's not something that would come easily if you were to use the STL.
This is 6 years late, but I recently had this problem, and this was one of the threads that I came across in my search for how to serialize an object through a network socket in C++. This solution uses just 2 or 3 lines of code. Of the many answers I found that work, the easiest was to use reinterpret_cast<obj*>(target) to convert the class or structure to and from an array of characters and feed it through the socket. Here's an example.
Class to be serialized:
/* myclass.h */
#ifndef MYCLASS_H
#define MYCLASS_H

class MyClass
{
public:
    int A;
    int B;

    MyClass() { A = 1; B = 2; }
    ~MyClass() {}
};

#endif
Server Program:
/* server.cpp */
#include <iostream>
#include <string>
#include <unistd.h>   // read()
#include "myclass.h"

int main(int argc, char** argv)
{
    // Open socket connection.
    // ...

    // Loop continuously until terminated.
    while (1)
    {
        // Read serialized data from socket.
        char buf[sizeof(MyClass)];
        read(newsockfd, buf, sizeof(MyClass));
        MyClass *msg = reinterpret_cast<MyClass*>(buf);
        std::cout << "A = " << std::to_string(msg->A) << std::endl;
        std::cout << "B = " << std::to_string(msg->B) << std::endl;
    }

    // Close socket connection.
    // ...
    return 0;
}
Client Program:
/* client.cpp */
#include <cstdio>
#include <strings.h>  // bzero()
#include <unistd.h>   // write()
#include "myclass.h"

int main(int argc, char *argv[])
{
    // Open socket connection.
    // ...

    while (1)
    {
        printf("Please enter the message: ");
        bzero(buffer, 256);
        fgets(buffer, 255, stdin);

        MyClass msg;
        msg.A = 1;
        msg.B = 2;

        // Write serialized data to socket.
        char* tmp = reinterpret_cast<char*>(&msg);
        write(sockfd, tmp, sizeof(MyClass));
    }

    // Close socket connection.
    // ...
    return 0;
}
Compile both server.cpp and client.cpp with g++, passing -std=c++11 as an option. You can then open two terminals and run both programs; however, start the server program before the client so that the client has something to connect to.
Hope this helps.
I got it!
I used stringstream to serialize the objects, and I sent the result as a message using stringstream's str() method and then string's c_str().
Look.
class Object {
public:
    int a;
    string b;

    void methodSample1();
    void methodSample2();

    friend ostream& operator<<(ostream& out, Object& object) {
        out << object.a << " " << object.b; // The space (" ") is necessary to separate the elements
        return out;
    }

    friend istream& operator>>(istream& in, Object& object) {
        in >> object.a;
        in >> object.b;
        return in;
    }
};
/* Server side */
int main() {
    Object o;
    stringstream ss;

    o.a = 1;
    o.b = "2";

    ss << o;                              // serialize
    write(socket, ss.str().c_str(), 20);  // send - the buffer size must be adjusted, this is just a sample
}

/* Client side */
int main() {
    Object o2;
    stringstream ss2;
    char buffer[20];
    string temp;

    read(socket, buffer, 20);             // receive
    temp.assign(buffer);
    ss2 << temp;
    ss2 >> o2;                            // deserialize
}
I'm not sure whether it is necessary to convert to a string before serializing (ss << o); maybe it is possible to work directly from the char buffer.
I think you should use Google Protocol Buffers in your project. For network transport, protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:
are simpler
are 3 to 10 times smaller
are 20 to 100 times faster
are less ambiguous
generate data access classes that are easier to use programmatically
and so on. I think you should read https://developers.google.com/protocol-buffers/docs/overview to learn more about protobuf.