Using Boost.Asio to get "the whole packet" - c++

I have a TCP client connecting to my server which is sending raw data packets. How, using Boost.Asio, can I get the "whole" packet every time (asynchronously, of course)? Assume these packets can be any size up to the full size of my memory.
Basically, I want to avoid creating a statically sized buffer.

Typically when you build a custom protocol on the top of TCP/IP you use a simple message format where first 4 bytes is an unsigned integer containing the message length and the rest is the message data. If you have such a protocol then the reception loop is as simple as below (not sure what is ASIO notation, so it's just an idea)
for(;;) {
uint_32_t len = 0u;
read(socket, &len, 4); // may need multiple reads in non-blocking mode
len = ntohl(len);
assert (len < my_max_len);
char* buf = new char[len];
read(socket, buf, len); // may need multiple reads in non-blocking mode
...
}

typically, when you do async IO, your protocol should support it.
one easy way is to prefix a byte array with it's length at the logical level, and have the reading code buffer up until it has a full buffer ready for parsing.
if you don't do it, you will end up with this logic scattered all over the place (think about reading a null terminated string, and what it means if you just get a part of it every time select/poll returns).

TCP doesn't operate with packets. It provides you one contiguous stream. You can ask for the next N bytes, or for all the data received so far, but there is no "packet" boundary, no way to distinguish what is or is not a packet.

Related

boost asio find beginning of message in tcp based protocol

I want to implement a client for a sensor that sends data over tcp and uses the following protocol:
the message-header starts with the byte-sequence 0xAFFEC0CC2 of type uint32
the header in total is 24 Bytes long (including the start sequence) and contains the size in bytes of the message-body as a uint32
the message-body is sent directly after the header and not terminated by a demimiter
Currently, I got the following code (assume a connected socket exists)
typedef unsigned char byte;
boost::system::error_code error;
boost::asio::streambuf buf;
std::string magic_word_s = {static_cast<char>(0xAF), static_cast<char>(0xFE),
static_cast<char>(0xC0), static_cast<char>(0xC2)};
ssize_t n = boost::asio::read_until(socket_, buf, magic_word_s, error);
if(error)
std::cerr << boost::system::system_error(error).what() << std::endl;
buf.consume(n);
n = boost::asio::read(socket_, buf, boost::asio::transfer_exactly(20);
const byte * p = boost::asio::buffer_cast<const byte>(buf.data());
uint32_t size_of_body = *((byte*)p);
unfortunately the documentation for read_until remarks:
After a successful read_until operation, the streambuf may contain additional data beyond the delimiter. An application will typically leave that data in the streambuf for a subsequent read_until operation to examine.
which means that I loose synchronization with the described protocol.
Is there an elegant way to solve this?
Well... as it says... you just "leave" it in the object, or temporary store it in another, and handle the whole message (below called 'packet') if it is complete.
I have a similar approach in one of my projects. I'll explain a little how I did it, that should give you a rough idea how you can handle the packets correctly.
In my Read-Handler (-callback) I keep checking if the packet is complete. The meta-data information (header for you) is temporary stored in a map associated with the remote-partner (map<RemoteAddress, InfoStructure>).
For example it can look like this:
4 byte identifier
4 byte message-length
n byte message
Handle incoming data, check if identifier + message-length are received already, continue to check if message-data is completed with received data.
Leave rest of the packet in the temporary buffer, erase old data.
Continue with handling when next packet arrives or check if received data completes next packet already...
This approach may sound a little slow, but I get even with SSL 10MB/s+ on a slow machine.
Without SSL much higher transfer-rates are possible.
With this approach, you may also take a look into read_some or its asynchronous version.

TCP Socket - read most recent data from input queue [duplicate]

I've been reading through Beej's Guide to Network Programming to get a handle on TCP connections. In one of the samples the client code for a simple TCP stream client looks like:
if ((numbytes = recv(sockfd, buf, MAXDATASIZE-1, 0)) == -1) {
perror("recv");
exit(1);
}
buf[numbytes] = '\0';
printf("Client: received '%s'\n", buf);
close(sockfd);
I've set the buffer to be smaller than the total number of bytes that I'm sending. I'm not quite sure how I can get the other bytes. Do I have to loop over recv() until I receive '\0'?
*Note on the server side I'm also implementing his sendall() function, so it should actually be sending everything to the client.
See also 6.1. A Simple Stream Server in the guide.
Yes, you will need multiple recv() calls, until you have all data.
To know when that is, using the return status from recv() is no good - it only tells you how many bytes you have received, not how many bytes are available, as some may still be in transit.
It is better if the data you receive somehow encodes the length of the total data. Read as many data until you know what the length is, then read until you have received length data. To do that, various approaches are possible; the common one is to make a buffer large enough to hold all data once you know what the length is.
Another approach is to use fixed-size buffers, and always try to receive min(missing, bufsize), decreasing missing after each recv().
The first thing you need to learn when doing TCP/IP programming: 1 write/send call might take
several recv calls to receive, and several write/send calls might need just 1 recv call to receive. And anything in-between.
You'll need to loop until you have all data. The return value of recv() tells you how much data you received. If you simply want to receive all data on the TCP connection, you can loop until recv() returns 0 - provided that the other end closes the TCP connection when it is done sending.
If you're sending records/lines/packets/commands or something similar, you need to make your own protocol over TCP, which might be as simple as "commands are delimited with \n".
The simple way to read/parse such a command would be to read 1 byte at a time, building up a buffer with the received bytes and check for a \n byte every time. Reading 1 byte is extremely inefficient, so you should read larger chunks at a time.
Since TCP is stream oriented and does not provide record/message boundaries it becomes a bit more tricky - you'd
have to recv a piece of bytes, check in the received buffer for a \n byte, if it's there - append the bytes to previously received bytes and output that message. Then check the remainder of the buffer after the \n - which might contain another whole message or just the start of another message.
Yes, you have to loop over recv() until you receive '\0' or an
error happen (negative value from recv) or 0 from recv().
For the first option: only if this zero is part of your
protocol (the server sends it). However from your code it seems that
the zero is just to be able to use the buffer content as a
C-string (on the client side).
The check for a return value of 0 from recv:
this means that the connection was closed (it could be part
of your protocol that this happens.)

Winsock2 tcp/ip - some data packets are ignored probably due to null terminator from the previous packet

I wrote a simple client-server program. Network.h is a header file which uses Winsock2.h (TCP/IP mode) to create socket, accept/connect in blocking mode, send/recv in non-blocking mode. I made it so that the function string TNetwork::Recv(int size) will return the string "Nothing" if it gets WSAWOULDBLOCK error (no data is received yet)
Here is my main function:
int main(){
string Ans;
TNetwork::StartUp(); //WSA start up, etc
cin >> Ans;
if (Ans == "0"){ // 0 --> server
TNetwork::SetupAsServer(); //accept connection (in blocking mode!)
while (true){
TNetwork::Send("\nAss" + '\0'); //without null terminator, the client may read extra bytes, causing undefined behavior (?)
TNetwork::Send("embly" + '\0');
cin >> Ans;
}
}
else{ // others --> regard Ans as IP address. e.g. I can type "127.0.0.1"
TNetwork::SetupAsClient(Ans);
string Rec;
while (true){
Rec = TNetwork::Recv(1000);
if (Rec != "Nothing"){
cout << Rec;
}
}
}
system("PAUSE");
}
Supposedly, the client would print "Assembly" when connected, and when the server enters anything to its console window. Sometimes, though, the client would only print out "\nAss" in the console without the "embly.
To my understanding, TCP/IP ensures all data to be sent and in the correct order, so I guess what happens is that both packets arrive at the same time, which happen quite often over the unstable internet. And due to this null terminator, the client would ignore the "embly", since the Recv() function stopped reading when it hits a null terminator.
So, how can I ensure that the client will always read all data packets correctly?
Yes, the network stack will send the data in the correct order and doesn't care what termination type you use. This has to do with how you're receiving and processing the data stream (note: not packets, stream). If you receive all 11 bytes and print it to the screen, the print function will stop when it reaches the zero, but the rest of the data is still there.
Note: since it's a stream, what happens if you received only 10 bytes of data from the stream? You need to scan what you receive for the zero to know if you've received a full "zero-terminated string" if that's how you want to communicate your data.
EDIT: Also, I don't think "\nAss" + '\0' is doing what you think it is. Instead of adding a 0 character to the end of the string (which already has one, by the way), it's adding 0 to your string pointer.
As #mark points out, TCP is all about streams, not packets. TCP takes care of ensuring that data is reliably transmitted from A to B and that the data is delivered to the consumer in the order in which it was transmitted. Yes, the data is packetized on the wire, but the TCP stack on the system takes those packets and builds the stream which it makes available to you through the recv() function. The TCP stack handles out-of-order data, missing data, and duplicated data such that by the time your application sees it, the stream is a mirror-copy of when the sender sent.
To properly receive TCP data, you will typically need some kind of loop that reads data from the socket when it becomes available. The way I normally do this is to have a thread that is dedicated to servicing the socket. In the thread function is a loop that reads data from the socket when it becomes available and is idle otherwise. This loop reads data into a buffer of, say, 1 KB. Once the data is received from the socket into this buffer, the buffer is copied to another thread for processing. In the thread function for the processing thread is a loop that receives the 1 KB buffers from the socket thread and adds them to the back end of a master buffer of, say, 1 MB. The processing thread then processes the messages out of this master buffer and makes them available to the application.
For a simple demo application, two threads may be overkill. The two threads I've described could be certainly be combined into one, but for my application, it is more efficient to have two threads and take advantage of the multiple cores on my system. The point is, if you're going to have a front-end UI, there's not going to be a way around using at least one thread and still have the UI be responsive.
One other thing. There are two commonly-used mechanisms for protocol design. You're using one, namely, a marker (e.g., a null terminator, etc.) to signal the begin/end of a message. I don't prefer this mechanism mainly because the marker may actually need to be part of the message at some point. The other mechanism is to have a header on each message that tells, at a minimum, how long the message is. I prefer this mechanism and include in my headers a sync word and the message type as well. For example,
struct Header
{
__int16 _sync; // a hex pattern, e.g., 0xABCD
__int16 _type;
__int32 _length;
}
That's a total of 8 bytes. So when processing from the master buffer, I read the first 8 bytes, verify the sync word, and get the length. I determine if there are 'length' bytes available in the master buffer. If not, I have to wait until the socket thread provides me more data before checking again. If so, I extract 'length' bytes from the master buffer and pass that to an object created according to the specified type, which knows how to interpret that particular message. Then repeat.
As I mentioned, I use a master buffer of 1 MB or so. As messages are processed, it is important to remove them from the master buffer so there is additional space available for new data on the back end. This involves simply copying the unprocessed data, if any, to the beginning of the buffer. In cases where data comes in faster than you can process it, the master buffer may need the ability to resize itself to accommodate the additional data.
I hope that's not overwhelming. Start simple and add as you go.

Socket Commuication with High frequency

I need to send data to another process every 0.02s.
The Server code:
//set socket, bind, listen
while(1){
sleep(0.02);
echo(newsockfd);
}
void echo (int sock)
{
int n;
char buffer[256]="abc";
n=send(sock,buffer,strlen(buffer),0);
if (n < 0) error("ERROR Sending");
}
The Client code:
//connect
while(1)
{
bzero(buffer,256);
n = read(sock,buffer,255);
printf("Recieved data:%s\n",buffer);
if (n < 0)
error("ERROR reading from socket");
}
The problem is that:
The client shows something like this:
Recieved data:abc
Recieved data:abcabcabc
Recieved data:abcabc
....
How does it happen? When I set sleep time:
...
sleep(2)
...
It would be ok:
Recieved data:abc
Recieved data:abc
Recieved data:abc
...
TCP sockets do not guarantee framing. When you send bytes over a TCP socket, those bytes will be received on the other end in the same order, but they will not necessarily be grouped the same way — they may be split up, or grouped together, or regrouped, in any way the operating system sees fit.
If you need framing, you will need to send some sort of packet header to indicate where each chunk of data starts and ends. This may take the form of either a delimiter (e.g, a \n or \0 to indicate where each chunk ends), or a length value (e.g, a number at the head of each chunk to denote how long it is).
Also, as other respondents have noted, sleep() takes an integer, so you're effectively not sleeping at all here.
sleep takes unsigned int as argument, so sleep(0.02) is actually sleep(0).
unsigned int sleep(unsigned int seconds);
Use usleep(20) instead. It will sleep in microseconds:
int usleep(useconds_t usec);
The OS is at liberty to buffer data (i.e. why not just send a full packet instead of multiple packets)
Besides sleep takes a unsigned integer.
The reason is that the OS is buffering data to be sent. It will buffer based on either size or time. In this case, you're not sending enough data, but you're sending it fast enough the OS is choosing to bulk it up before putting it on the wire.
When you add the sleep(2), that is long enough that the OS chooses to send a single "abc" before the next one comes in.
You need to understand that TCP is simply a byte stream. It has no concept of messages or sizes. You simply put bytes on the wire on one end and take them off on the other. If you want to do specific things, then you need to interpret the data special ways when you read it. Because of this, the correct solution is to create an actual protocol for this. That protocol could be as simple as "each 3 bytes is one message", or more complicated where you send a size prefix.
UDP may also be a good solution for you, depending on your other requirements.
sleep(0.02)
is effectively
sleep(0)
because argument is unsigned int, so implicit conversion does it for you. So you have no sleep at all here. You can use sleep(2) to sleep for 2 microseconds.Next, even if you had, there is no guarantee that your messages will be sent in a different frames. If you need this, you should apply some sort of delimiter, I have seen
'\0'
character in some implementation.
TCPIP stacks buffer up data until there's a decent amount of data, or until they decide that there's no more coming from the application and send what they've got anyway.
There are two things you will need to do. First, turn off Nagle's algorithm. Second, sort out some sort of framing mechanism.
Turning off Nagle's algorithm will cause the stack to "send data immediately", rather than waiting on the off chance that you'll be wanting to send more. It actually leads to less network efficiency because you're not filling up Ethernet frames, something to bare in mind on Gigabit where jumbo frames are required to get best throughput. But in your case timeliness is more important than throughput.
You can do your own framing by very simple means, eg by send an integer first that says how long the rest if the message will be. At the reader end you would read the integer, and then read that number of bytes. For the next message you'd send another integer saying how long that message is, etc.
That sort of thing is ok but not hugely robust. You could look at something like ASN.1 or Google Protocol buffers.
I've used Objective System's ASN.1 libraries and tools (they're not free) and they do a good job of looking after message integrity, framing, etc. They're good because they don't read data from a network connection one byte at a time so the efficiency and speed isn't too bad. Any extra data read is retained and included in the next message decode.
I've not used Google Protocol Buffers myself but it's possible that they have similar characteristics, and there maybe other similar serialisation mechanisms out there. I'd recommend avoiding XML serialisation for speed/efficiency reasons.

Handling partial return from recv() TCP in C

I've been reading through Beej's Guide to Network Programming to get a handle on TCP connections. In one of the samples the client code for a simple TCP stream client looks like:
if ((numbytes = recv(sockfd, buf, MAXDATASIZE-1, 0)) == -1) {
perror("recv");
exit(1);
}
buf[numbytes] = '\0';
printf("Client: received '%s'\n", buf);
close(sockfd);
I've set the buffer to be smaller than the total number of bytes that I'm sending. I'm not quite sure how I can get the other bytes. Do I have to loop over recv() until I receive '\0'?
*Note on the server side I'm also implementing his sendall() function, so it should actually be sending everything to the client.
See also 6.1. A Simple Stream Server in the guide.
Yes, you will need multiple recv() calls, until you have all data.
To know when that is, using the return status from recv() is no good - it only tells you how many bytes you have received, not how many bytes are available, as some may still be in transit.
It is better if the data you receive somehow encodes the length of the total data. Read as many data until you know what the length is, then read until you have received length data. To do that, various approaches are possible; the common one is to make a buffer large enough to hold all data once you know what the length is.
Another approach is to use fixed-size buffers, and always try to receive min(missing, bufsize), decreasing missing after each recv().
The first thing you need to learn when doing TCP/IP programming: 1 write/send call might take
several recv calls to receive, and several write/send calls might need just 1 recv call to receive. And anything in-between.
You'll need to loop until you have all data. The return value of recv() tells you how much data you received. If you simply want to receive all data on the TCP connection, you can loop until recv() returns 0 - provided that the other end closes the TCP connection when it is done sending.
If you're sending records/lines/packets/commands or something similar, you need to make your own protocol over TCP, which might be as simple as "commands are delimited with \n".
The simple way to read/parse such a command would be to read 1 byte at a time, building up a buffer with the received bytes and check for a \n byte every time. Reading 1 byte is extremely inefficient, so you should read larger chunks at a time.
Since TCP is stream oriented and does not provide record/message boundaries it becomes a bit more tricky - you'd
have to recv a piece of bytes, check in the received buffer for a \n byte, if it's there - append the bytes to previously received bytes and output that message. Then check the remainder of the buffer after the \n - which might contain another whole message or just the start of another message.
Yes, you have to loop over recv() until you receive '\0' or an
error happen (negative value from recv) or 0 from recv().
For the first option: only if this zero is part of your
protocol (the server sends it). However from your code it seems that
the zero is just to be able to use the buffer content as a
C-string (on the client side).
The check for a return value of 0 from recv:
this means that the connection was closed (it could be part
of your protocol that this happens.)