Boost serialization: ensure data safety over socket transmission - c++

I'm using Boost 1.53 and Boost.Serialization to transfer an array of 520 floats over TCP/IP. I added a debug printout to see how much data is sent: it's about 5 K. That's no problem for me, but the value depends on the actual data being serialized. It could be 5400, 5500 and so on.
The question is: what is the right way to receive such a data block? For the moment I use the read_some() call. But as far as I can tell, it doesn't guarantee that the whole serialized block of data will be read. Am I wrong?
How can I ensure that there is a complete archive on the RX side? Is an exception thrown when a chunk of data cannot be deserialized?

A TCP/IP message can arrive split into several smaller pieces, so I'd recommend adding some framing data to what you send.
Something like this:
serialize your data to a stream
get the size of the stream
send the size of the stream first, followed by the data from the stream
the receiver reads the size and then reads the rest of the packet
after you have received the full packet, call deserialization
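The steps above can be sketched as follows. This is a minimal illustration, not part of the original answer: the 4-byte big-endian length prefix and the names frame_message and try_unframe are assumptions chosen for the example, and the payload stands for whatever bytes the serialization archive produced (for instance the contents of a std::ostringstream).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Prepend a 4-byte big-endian length to the payload (assumed wire format).
std::string frame_message(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.reserve(4 + payload.size());
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// If `received` starts with one complete frame, copy its payload into
// `payload`, erase the frame from `received`, and return true.
// Otherwise leave both untouched and return false (more bytes are needed).
bool try_unframe(std::string& received, std::string& payload) {
    if (received.size() < 4) return false;           // length prefix incomplete
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(received[i]);
    if (received.size() - 4 < n) return false;       // body incomplete
    payload = received.substr(4, n);
    received.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

On the receiving side, the idea would be to append whatever each read returns to `received` and call try_unframe until it returns true; only then is the payload handed to deserialization.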

Yes. read_some is potentially a no-op on conforming implementations[1].
Instead do a loop using read() and gcount(), like:
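std::istream& is = gotten_from_somewhere_or_a_parameter();
std::vector<char> v(256); // istream::read() expects a char buffer
std::streamsize bytes_read;
do
{
    is.read(v.data(), static_cast<std::streamsize>(v.size()));
    bytes_read = is.gcount(); // number of bytes the last read() actually stored
    // do something with the bytes read
} while (bytes_read);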
[1] Notably, gcc's standard library implementation seems to always return something for std::filebuf but on MSVC, the first call will simply always return 0 bytes read :)

Related

boost asio find beginning of message in tcp based protocol

I want to implement a client for a sensor that sends data over tcp and uses the following protocol:
the message-header starts with the byte-sequence 0xAFFEC0C2 of type uint32
the header in total is 24 Bytes long (including the start sequence) and contains the size in bytes of the message-body as a uint32
the message-body is sent directly after the header and not terminated by a delimiter
Currently, I got the following code (assume a connected socket exists)
typedef unsigned char byte;
boost::system::error_code error;
boost::asio::streambuf buf;
std::string magic_word_s = {static_cast<char>(0xAF), static_cast<char>(0xFE),
static_cast<char>(0xC0), static_cast<char>(0xC2)};
ssize_t n = boost::asio::read_until(socket_, buf, magic_word_s, error);
if(error)
std::cerr << boost::system::system_error(error).what() << std::endl;
buf.consume(n);
n = boost::asio::read(socket_, buf, boost::asio::transfer_exactly(20));
const byte * p = boost::asio::buffer_cast<const byte*>(buf.data());
uint32_t size_of_body = *((byte*)p);
Unfortunately, the documentation for read_until remarks:
After a successful read_until operation, the streambuf may contain additional data beyond the delimiter. An application will typically leave that data in the streambuf for a subsequent read_until operation to examine.
which means that I lose synchronization with the described protocol.
Is there an elegant way to solve this?
Well... as it says... you just "leave" it in the object, or temporarily store it in another one, and handle the whole message (called a 'packet' below) once it is complete.
I have a similar approach in one of my projects. I'll explain a little how I did it, which should give you a rough idea of how you can handle the packets correctly.
In my read handler (callback) I keep checking whether the packet is complete. The meta-data (the header, in your case) is temporarily stored in a map associated with the remote partner (map<RemoteAddress, InfoStructure>).
For example it can look like this:
4 byte identifier
4 byte message-length
n byte message
Handle incoming data, check if identifier + message-length are received already, continue to check if message-data is completed with received data.
Leave rest of the packet in the temporary buffer, erase old data.
Continue with handling when next packet arrives or check if received data completes next packet already...
This approach may sound a little slow, but I get 10 MB/s+ even with SSL on a slow machine.
Without SSL much higher transfer-rates are possible.
With this approach, you may also take a look into read_some or its asynchronous version.
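A rough sketch of that buffering approach, using the layout from the example above (4-byte identifier, 4-byte message length, n-byte message). Everything here is illustrative: the class name PacketParser, the big-endian length field, and the resynchronisation strategy are assumptions, and the real sensor protocol (24-byte header, unspecified offset of the size field) would need its own constants. There is also no upper bound on the length field, and identifier bytes that happen to occur inside corrupted data could cause a false match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

class PacketParser {
public:
    explicit PacketParser(const std::string& magic) : magic_(magic) {
        assert(!magic_.empty());
    }

    // Append newly received bytes and return every message body that is
    // now complete. Incomplete trailing data stays buffered.
    std::vector<std::string> feed(const char* data, std::size_t size) {
        buffer_.append(data, size);
        std::vector<std::string> bodies;
        for (;;) {
            // Resynchronise: drop everything before the next identifier.
            const std::size_t pos = buffer_.find(magic_);
            if (pos == std::string::npos) {
                // Keep a tail that could be the start of a split identifier.
                if (buffer_.size() >= magic_.size())
                    buffer_.erase(0, buffer_.size() - (magic_.size() - 1));
                break;
            }
            buffer_.erase(0, pos);
            const std::size_t header = magic_.size() + 4;
            if (buffer_.size() < header) break;        // length not here yet
            std::uint32_t len = 0;
            for (std::size_t i = 0; i < 4; ++i)
                len = (len << 8) |
                      static_cast<unsigned char>(buffer_[magic_.size() + i]);
            if (buffer_.size() - header < len) break;  // body incomplete
            bodies.push_back(buffer_.substr(header, len));
            buffer_.erase(0, header + len);            // leftover stays buffered
        }
        return bodies;
    }

private:
    std::string magic_;   // identifier that starts every packet
    std::string buffer_;  // bytes received but not yet consumed
};
```

A read handler would call feed() with whatever bytes it was given and process the returned bodies; data belonging to the next packet simply waits in the parser until more bytes arrive.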

Reading data from socket using read function

I am trying to read data using the following code from a socket:
n = read(fd, buffer, 50000);
The question is: when the data from the web server is larger than the TCP packet size, it will be split into multiple packets. In this case, will the read function read just one packet from fd, or will it read all of them?
Note that read function is called only once.
Because you are using TCP, your socket is of type SOCK_STREAM. A SOCK_STREAM socket is a byte stream and does not maintain packet boundaries, so the call to read() or recv() will read data that came from multiple packets if multiple packets of data have been received and there is sufficient space in your buffer. It may also return data from only a portion of a packet if your buffer is not large enough to hold all of the data. The next read() will continue reading from the next byte.
The function read receives at most the specified count of bytes, in your example 50000.
When the function read returns, you need to check the return value. The actual number of bytes written to buffer is in your variable n.
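Since a single read() may return fewer bytes than requested, callers that need a fixed amount usually loop on the return value. A minimal sketch, assuming a blocking POSIX file descriptor; the helper name read_fully is made up for this example:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Keep calling read() until `count` bytes have arrived, end-of-stream is
// reached, or an error occurs. Returns the number of bytes stored in
// `buffer` (less than `count` only at end-of-stream), or -1 on error.
ssize_t read_fully(int fd, char* buffer, std::size_t count) {
    assert(buffer != nullptr || count == 0);
    std::size_t total = 0;
    while (total < count) {
        const ssize_t n = read(fd, buffer + total, count - total);
        if (n > 0) {
            total += static_cast<std::size_t>(n);  // partial read: keep going
        } else if (n == 0) {
            break;                                 // peer closed the connection
        } else if (errno != EINTR) {
            return -1;                             // real error; errno is set
        }                                          // EINTR: just retry
    }
    return static_cast<ssize_t>(total);
}
```

If the total size is not known in advance, as with the web server response in the question, the same loop can be run until read() returns 0.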

boost::asio async_read guarantee all bytes are read

I have a server that receives a zlib-compressed string from a client. I was using async_receive from the boost::asio library to receive it, but it turns out there is no guarantee that all bytes will be received, so I now have to switch to async_read. The problem is that the number of bytes received varies, so I am not sure how to use async_read without knowing how many bytes to expect. With async_receive I just have a boost::array<char, 1024>, but this buffer is not necessarily filled completely.
I wondered if anyone can suggest a solution where I can use async_read even though I do not know the number of bytes to be received in advance?
void tcp_connection::start(boost::shared_ptr<ResolverQueueHandler> queue_handler)
{
if (!_queue_handler.get())
_queue_handler = queue_handler;
std::fill(buff.begin(), buff.end(), 0);
//socket_.async_receive(boost::asio::buffer(buff), boost::bind(&tcp_connection::handle_read, shared_from_this(), boost::asio::placeholders::error));
boost::asio::async_read(socket_, boost::asio::buffer(buff), boost::bind(&tcp_connection::handle_read, shared_from_this(), boost::asio::placeholders::error));
}
buff is a boost::array<char, 1024>
How were you expecting to do this using any other method?
There are a few general methods for sending data of variable size in an async manner:
By message - meaning that you have a header that defines the length of the expected message, followed by a body which contains data of the specified length.
By stream - meaning that you have some marker (and this is very broad) that tells you when you've gotten a complete packet.
By connection - each complete packet of data is sent over a single connection, which is closed once the data is complete.
So can your data be parsed, can a length be sent, etc.?
Use async_read_until and create your own match condition, or change your protocol to send a header including the number of bytes to expect in the compressed string.
A single IP packet is limited to an MTU size of ~1500 bytes, and yet still you can download gigabyte-large files from your favourite website, and watch megabyte-sized videos on YouTube.
You need to send a header indicating the actual size of the raw data, and then receive the data by pieces on smaller chunks until you finish receiving all the bytes.
For example, when you download a large file over HTTP, there is a field on the header indicating the size of the file: Content-Length:.
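One way to picture the "header plus match condition" idea: Boost.Asio's read_until family has overloads that accept a match condition, a callable taking a begin/end iterator pair and returning a std::pair of an iterator and a bool. The sketch below writes a function of that shape over generic iterators so it can be tried on a plain std::string. The 4-byte big-endian length header and the name match_length_prefixed are assumptions for illustration, and wiring it into Asio (the concrete iterator type, the is_match_condition trait) is not shown and should be checked against the Asio documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <utility>

// Returns {one past the first complete message, true} when [begin, end)
// holds a 4-byte big-endian length plus that many body bytes.
// Otherwise returns {begin, false}: nothing consumed, more data is needed.
template <typename Iterator>
std::pair<Iterator, bool> match_length_prefixed(Iterator begin, Iterator end) {
    typedef typename std::iterator_traits<Iterator>::difference_type diff_t;
    const diff_t available = std::distance(begin, end);
    assert(available >= 0);  // end must be reachable from begin
    if (available < 4) return std::make_pair(begin, false);

    // Decode the assumed length header.
    std::uint32_t len = 0;
    Iterator it = begin;
    for (int i = 0; i < 4; ++i, ++it)
        len = (len << 8) | static_cast<unsigned char>(*it);

    if (static_cast<std::uint64_t>(available - 4) < len)
        return std::make_pair(begin, false);  // body not fully received

    std::advance(it, static_cast<diff_t>(len));
    return std::make_pair(it, true);
}
```

The sender would have to emit the same 4-byte header before the compressed string for this to work; bytes after the returned iterator belong to the next message.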

synchronizing between send/recv in sockets

I have a server that's sending out data records as strings of varying length (e.g. 79, 80, 81, 82).
I want to be able to receive exactly one record at a time. I've delimited records with a (r), but because I don't know how many bytes I have to receive, it sometimes merges records, which makes them difficult for me to process.
I have two ideas for you:
Use XML for the protocol. This way you know exactly when each message ends.
Send in the header of each "packet" the packet size, this way you know how much to read from the socket for this specific packet.
Edit:
Look at this dummy code for (2)
int buffer_size;
char* buffer;
read( socket, &buffer_size, sizeof(buffer_size));
buffer = (char*) malloc(buffer_size);
read( socket, buffer, buffer_size );
// do something
free( buffer) ;
EDIT:
I recommend looking at the comments here, as they note that the content might not all be available from a single "read()"; you need to keep "read()"ing until you have received the full buffer size.
Also - you might not need to read the size. Basically you need to look for the closing top-level tag of the XML. This can be done by parsing the whole XML, or by partly parsing the XML you get from the stream until you have 0 nodes "open".
You should delimit with null byte. Show us your code, and we may be able to help you.
Stream sockets do not natively support an idea of a "record" - the abstraction they provide is that of a continuous stream.
You must implement a layer on top of them to provide "records". It sounds like you are already part way there, with the end-of-record delimiter. The pseudo-code to complete it is:
create empty buffer;
forever {
recv data and append to buffer;
while (buffer contains end-of-record marker) {
remove first record from buffer and process it;
move remaining data to beginning of buffer;
}
}
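The pseudo-code above could look roughly like this in C++. It is only a sketch: the function name extract_records and the choice to pass the delimiter as a parameter are assumptions, and the caller is expected to supply each chunk that recv() returned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Append a freshly received chunk to `buffer`, then move every complete
// record (the text up to, but not including, `delimiter`) into the result.
// Bytes after the last delimiter stay in `buffer` for the next call.
std::vector<std::string> extract_records(std::string& buffer,
                                         const char* chunk, std::size_t size,
                                         char delimiter) {
    assert(chunk != nullptr || size == 0);
    if (size > 0) buffer.append(chunk, size);

    std::vector<std::string> records;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = buffer.find(delimiter, start);
        if (end == std::string::npos) break;   // no further complete record
        records.push_back(buffer.substr(start, end - start));
        start = end + 1;                       // skip past the delimiter
    }
    buffer.erase(0, start);                    // keep only the partial record
    return records;
}
```

Two records that arrive merged in one chunk come back as two entries, and a record split across chunks is returned only once its delimiter has been seen.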
Are you sending your data as a stream?
You can send it as a structure which is easier to parse and retrieve the data from.
struct Message
{
int dataSize;
char data[256];
};

Using Boost.Asio to get "the whole packet"

I have a TCP client connecting to my server which is sending raw data packets. How, using Boost.Asio, can I get the "whole" packet every time (asynchronously, of course)? Assume these packets can be any size up to the full size of my memory.
Basically, I want to avoid creating a statically sized buffer.
Typically, when you build a custom protocol on top of TCP/IP, you use a simple message format where the first 4 bytes are an unsigned integer containing the message length and the rest is the message data. If you have such a protocol, then the reception loop is as simple as below (I'm not sure what the ASIO notation is, so it's just an idea):
for(;;) {
    uint32_t len = 0u;
    read(socket, &len, 4); // may need multiple reads in non-blocking mode
    len = ntohl(len); // length is sent in network byte order
    assert (len < my_max_len);
    char* buf = new char[len];
    read(socket, buf, len); // may need multiple reads in non-blocking mode
    ...
    delete[] buf; // release the buffer before reading the next message
}
Typically, when you do async IO, your protocol should support it.
One easy way is to prefix a byte array with its length at the logical level, and have the reading code buffer data until it has a full message ready for parsing.
If you don't do it, you will end up with this logic scattered all over the place (think about reading a null-terminated string, and what it means if you just get a part of it every time select/poll returns).
TCP doesn't operate with packets. It provides you one contiguous stream. You can ask for the next N bytes, or for all the data received so far, but there is no "packet" boundary, no way to distinguish what is or is not a packet.