boost::asio async_read_some example code not reading all data in the socket - c++

I'm using the TCP echo example from Boost 1.62 (the version currently shipping in the main Ubuntu package).
https://www.boost.org/doc/libs/1_62_0/doc/html/boost_asio/example/cpp11/echo/async_tcp_echo_server.cpp
It works great for small messages; as you can see, it has a 1024-byte buffer and uses async_read_some.
But when I send it the Python string ("A"*4096)+("B"*4096)+("C"*4096), I see 4 calls to the read handler of 1024 bytes each, i.e. it prints all the As but never any Bs or Cs.
Expected behavior: If there are 4096*3 bytes of data in the socket, subsequent calls to async_read_some should pull all the data out 1024 bytes at a time, shouldn't they?
One cannot use async_read in such an echo protocol, because variable data is passed over the wire. The problem is that async_read_some ignores or deletes data that is still waiting to be read from the socket.
How to fix the example code?

I took that sample and ran it with your alleged client code:
#!/usr/bin/env python
import socket

TCP_IP = '127.0.0.1'
TCP_PORT = 6767
BUFFER_SIZE = 1024
MESSAGE = b"A" * 4096 + b"B" * 4096 + b"C" * 4096

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.sendall(MESSAGE)  # send() may transmit only part of the data
received = b""
while len(received) < len(MESSAGE):
    data = s.recv(BUFFER_SIZE)
    if not data:  # server closed the connection early
        break
    print("received data: %d bytes ending in ...%s" % (len(data), data[-10:].decode()))
    received += data
s.close()
It correctly runs and prints
$ ./sotest 6767 &
$ python ./test.py
received data: 1024 bytes ending in ...AAAAAAAAAA
received data: 1024 bytes ending in ...AAAAAAAAAA
received data: 1024 bytes ending in ...AAAAAAAAAA
received data: 1024 bytes ending in ...AAAAAAAAAA
received data: 1024 bytes ending in ...BBBBBBBBBB
received data: 1024 bytes ending in ...BBBBBBBBBB
received data: 1024 bytes ending in ...BBBBBBBBBB
received data: 1024 bytes ending in ...BBBBBBBBBB
received data: 1024 bytes ending in ...CCCCCCCCCC
received data: 1024 bytes ending in ...CCCCCCCCCC
received data: 1024 bytes ending in ...CCCCCCCCCC
received data: 1024 bytes ending in ...CCCCCCCCCC
So you're doing something wrong.
Expected behavior: If there are 4096*3 bytes of data in the socket, subsequent calls to async_read_some should pull all the data out 1024 bytes at a time, shouldn't they?
Yes. That is exactly what happens. Mind you, you should not assume the data "arrives" in 1024-byte blocks. It could arrive in smaller chunks depending on the buffering in intermediate OS/network layers. In other words: TCP is a stream protocol, and packetization is an implementation detail you should not usually depend on¹
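For readers more at home with plain BSD sockets, the read-some/write-all cycle of the echo session can be sketched as a blocking POSIX C analogue (an illustration, not the Boost code): recv() with a 1024-byte buffer plays the role of async_read_some, returning whatever has arrived so far, and send_all() plays the role of async_write, which sends the whole chunk. `echo_session` and `send_all` are hypothetical names. Note that nothing in the loop assumes a particular chunk size.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send every byte of buf; a single send() may transmit only part of it.
   Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Echo loop: read whatever is available (at most 1024 bytes), write all of
   it back, repeat until the peer shuts down its sending side.
   Returns the total number of bytes echoed, or -1 on error. */
static long echo_session(int fd)
{
    char data[1024];
    long total = 0;
    for (;;) {
        ssize_t n = recv(fd, data, sizeof data, 0);
        if (n == 0)
            return total;               /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (send_all(fd, data, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

With the 3 * 4096-byte message from the question, the loop simply runs as many times as needed; whether each recv() returns 1024 bytes or fewer does not change the result.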
One cannot use async_read in such an echo protocol, because variable data is passed over the wire.
Data is always variable (otherwise there would be no reason to send it). async_read can always be used where read can be, because it's merely the asynchronous IO version of the same function.
¹ Using various advanced techniques and flags you can control these effects to some extent, but they are partly platform-dependent and nearly always leave the OS/network layers some leeway to optimize network performance.
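In plain C terms, the difference between async_read_some and async_read is roughly the difference between a single recv() call and a loop that keeps calling recv() until the requested number of bytes has arrived (asio's read/async_read by default keep going until the buffer is full). A minimal POSIX sketch of such a loop, using the hypothetical name `recv_exact`:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly len bytes into buf, calling recv() as many times as needed.
   Returns 0 on success, 1 if the peer closed the connection before len
   bytes arrived, -1 on a socket error (errno is set). */
static int recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return 1;                   /* connection closed too early */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: just retry */
            return -1;
        }
        p += n;                         /* recv may return fewer than len */
        len -= (size_t)n;
    }
    return 0;
}
```

This is why variable-length data is not an obstacle: once the protocol tells you how many bytes to expect (for example via a length header), a loop like this, or async_read, reads exactly that many.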

Related

C++ nonblocking sockets - wait for all recv data

I wasn't running into this problem on my local system (of course), but now that I am setting up a virtual server, I am having some issues with a part of my code.
In order to receive all data from a nonblocking TCP recv(), I have this function
ssize_t Server::recvAll(int sockfd, void *buf, size_t len, int flags) {
    // just showing here that they are non-blocking sockets
    u_long iMode = 1;
    ioctlsocket(sockfd, FIONBIO, &iMode);

    ssize_t result = 0;
    char *pbuf = (char *)buf;
    while (len > 0) {
        result = recv(sockfd, pbuf, len, flags);
        printf("\tRES: %lld", (long long)result);
        if (result <= 0) break;   // stop on a closed connection or any error
        pbuf += result;
        len -= result;
    }
    return result;  // result of the last recv() call, not the total
}
I noticed that recvAll will usually print RES: 1024 (1024 being the amount of bytes I'm sending) and it works great. But less frequently, there is data loss and it prints only RES: 400 (where 400 is some number greater than 0 and less than 1024) and my code does not work, as it expects all 1024 bytes.
I tried also printing WSAGetLastError() and also running in debug, but it looks like the program runs slow enough due to the print/debug that I don't come across this issue.
I assume this function works great for blocking sockets, but not non-blocking sockets.
Any suggestions on measures I can take to make sure that I receive all 1024 bytes without data loss on non-blocking sockets?
In non-blocking mode, recv() returns only the data that has already arrived at the system. Once you have read all of it, recv() returns an error (-1), and the reason code depends on the system:
EAGAIN or EWOULDBLOCK (on POSIX systems)
WSAEWOULDBLOCK (with Windows sockets, reported via WSAGetLastError())
When you get this error, you need to wait for more data to arrive. You can do that in several ways:
Wait with a dedicated function such as select/poll/epoll
Sleep for a while and call recv again (user-space polling)
If you need low latency, select/poll/epoll is preferable. Sleeping is much simpler to implement.
Also keep in mind that TCP is a stream protocol and does NOT preserve message boundaries. This means that you can, for example, send 256 bytes and then another 256 bytes but receive all 512 bytes at once. The opposite is also true: you may send 512 bytes at once and receive 256 bytes with the first read and the other 256 bytes with the next.

TCP Socket - read most recent data from input queue [duplicate]

I've been reading through Beej's Guide to Network Programming to get a handle on TCP connections. In one of the samples the client code for a simple TCP stream client looks like:
if ((numbytes = recv(sockfd, buf, MAXDATASIZE-1, 0)) == -1) {
    perror("recv");
    exit(1);
}
buf[numbytes] = '\0';
printf("Client: received '%s'\n", buf);
close(sockfd);
I've set the buffer to be smaller than the total number of bytes that I'm sending. I'm not quite sure how I can get the other bytes. Do I have to loop over recv() until I receive '\0'?
Note: on the server side I'm also implementing his sendall() function, so it should actually be sending everything to the client.
See also 6.1. A Simple Stream Server in the guide.
Yes, you will need multiple recv() calls, until you have all data.
To know when that is, using the return status from recv() is no good - it only tells you how many bytes you have received, not how many bytes are available, as some may still be in transit.
It is better if the data you receive somehow encodes the length of the total data. Read until you have enough bytes to know the length, then keep reading until you have received that many bytes. Various approaches are possible; the common one is to allocate a buffer large enough to hold all the data once you know the length.
Another approach is to use fixed-size buffers, and always try to receive min(missing, bufsize), decreasing missing after each recv().
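A minimal POSIX sketch of both suggestions, assuming a hypothetical framing in which every message starts with a 4-byte big-endian length: first read the length, then allocate a buffer of that size, and fill it while never asking recv() for more than min(missing, CHUNK) bytes. The framing, the CHUNK size, the MAX_MSG limit and the function names are all illustrative assumptions.

```c
#include <arpa/inet.h>                  /* ntohl */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHUNK   1024u                   /* fixed per-call receive size */
#define MAX_MSG (16u * 1024u * 1024u)   /* hypothetical sanity limit */

/* Receive exactly len bytes, asking recv() for min(missing, CHUNK) each time.
   Returns 0 on success, -1 on error or if the peer closed mid-message. */
static int recv_chunked(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        size_t missing = len - got;
        size_t want = missing < CHUNK ? missing : CHUNK;
        ssize_t n = recv(fd, buf + got, want, 0);
        if (n == 0)
            return -1;                  /* connection closed too early */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;               /* recv may return fewer than want */
    }
    return 0;
}

/* Returns a malloc'd payload (caller frees) with its length in *out_len,
   or NULL on error, early close, or an implausible length. */
static char *recv_message(int fd, uint32_t *out_len)
{
    uint32_t net;
    if (recv_chunked(fd, (char *)&net, sizeof net) != 0)
        return NULL;                    /* the header can arrive in pieces too */
    uint32_t len = ntohl(net);
    if (len > MAX_MSG)
        return NULL;
    char *msg = malloc(len > 0 ? len : 1);   /* avoid malloc(0) */
    if (msg == NULL)
        return NULL;
    if (recv_chunked(fd, msg, len) != 0) {
        free(msg);
        return NULL;
    }
    *out_len = len;
    return msg;
}
```

The MAX_MSG check matters because the length comes from the network; without it, a corrupt or hostile header could trigger a huge allocation.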
The first thing you need to learn when doing TCP/IP programming: 1 write/send call might take several recv calls to receive, and several write/send calls might need just 1 recv call to receive. And anything in between.
You'll need to loop until you have all data. The return value of recv() tells you how much data you received. If you simply want to receive all data on the TCP connection, you can loop until recv() returns 0 - provided that the other end closes the TCP connection when it is done sending.
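If the protocol is simply "send everything, then close", the loop can be sketched like this in POSIX C, growing a heap buffer until recv() returns 0. `recv_until_eof` is a hypothetical name, and doubling the buffer is just one reasonable growth strategy.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read everything the peer sends until it closes the connection
   (recv() returns 0). Returns a malloc'd buffer (caller frees) and sets
   *out_len, or NULL on error. */
static char *recv_until_eof(int fd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (len == cap) {                       /* buffer full: grow it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = recv(fd, buf + len, cap - len, 0);
        if (n == 0)
            break;                              /* orderly shutdown: all data read */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(buf);
            return NULL;
        }
        len += (size_t)n;                       /* count only what recv returned */
    }
    *out_len = len;
    return buf;
}
```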
If you're sending records/lines/packets/commands or something similar, you need to make your own protocol over TCP, which might be as simple as "commands are delimited with \n".
The simple way to read/parse such a command would be to read 1 byte at a time, building up a buffer with the received bytes and checking for a \n byte each time. Reading 1 byte at a time is extremely inefficient, though, so you should read larger chunks.
Since TCP is stream-oriented and does not provide record/message boundaries, that becomes a bit trickier: you have to recv a chunk of bytes and check it for a \n byte. If one is there, append the bytes up to it to the previously received bytes and output that message. Then check the remainder of the buffer after the \n, which might contain another whole message or just the start of another message. If there is no \n, append the whole chunk and recv again.
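A sketch of that chunked approach for '\n'-delimited messages in POSIX C: read up to 4096 bytes at a time, hand out one line per call, and keep any bytes after the '\n' for the next call. The struct, the function name and the 4096-byte buffer size are hypothetical; for simplicity, a line that does not fit in the buffer is treated as an error here.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Buffered reader for '\n'-delimited messages over a TCP stream.
   Initialise with fd set and len = 0. */
struct line_reader {
    int fd;
    size_t len;                 /* bytes currently buffered */
    char buf[4096];
};

/* Copy the next line (without '\n', NUL-terminated) into out.
   Returns the line length, or -1 on EOF, error, or an oversized line. */
static ssize_t read_line(struct line_reader *r, char *out, size_t outsz)
{
    for (;;) {
        char *nl = memchr(r->buf, '\n', r->len);
        if (nl != NULL) {
            size_t linelen = (size_t)(nl - r->buf);
            if (linelen + 1 > outsz)
                return -1;
            memcpy(out, r->buf, linelen);
            out[linelen] = '\0';
            /* Keep what follows the '\n': it may be (the start of) the next message. */
            size_t consumed = linelen + 1;
            memmove(r->buf, r->buf + consumed, r->len - consumed);
            r->len -= consumed;
            return (ssize_t)linelen;
        }
        if (r->len == sizeof r->buf)
            return -1;                          /* full buffer, still no '\n' */
        ssize_t n = recv(r->fd, r->buf + r->len, sizeof r->buf - r->len, 0);
        if (n == 0)
            return -1;                          /* peer closed; partial line dropped */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        r->len += (size_t)n;
    }
}
```

Because leftover bytes stay in the struct between calls, it does not matter whether a line arrives split across several recv() calls or several lines arrive in one.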
Yes, you have to loop over recv() until you receive '\0', an error occurs (recv returns a negative value), or recv returns 0.
The '\0' option only applies if that zero byte is part of your protocol (i.e. the server sends it). However, from your code it seems the zero is only there so you can use the buffer content as a C string on the client side.
A return value of 0 from recv means that the connection was closed (which could be part of your protocol).

TCP memcpy buffer returns rubbish data using C++

I'm doing something similar to Stack Overflow question Handling partial return from recv() TCP in C.
The data received is bigger than the buffer initialised for it (for example, 1000 bytes). Therefore a larger temporary buffer (for example, 10000 bytes) is used. The problem is that when the data arrives in multiple pieces, the result contains rubbish. I've already checked the offset passed to memcpy for the temporary buffer, but I keep getting rubbish data.
This sample shows what I do:
First message received:
memcpy(tmpBuffer, dataRecv, 1000);
offSet = offSet + 1000;
Second msg onwards:
memcpy(tmpBuffer + offSet, dataRecv, 1000);
Is there something I should check?
I've checked the hex dump of the TCP data that was sent. Apparently, the sender is sending an incomplete message. My program works like this: when the sender sends a message, it packs (message header + actual message). The message header contains some metadata, one item of which is the message length.
When the receiver receives the packet, it gets the message header using the message header offset and message header length. It extracts the message length, checks whether the current packet size is greater than or equal to the message length, and returns the correct message size to the user. If part of a message is left in the packet, it stores it in a temporary buffer and waits for the next packet. When it receives the next packet, it checks the message header for the message length and does the same thing.
Suppose the sender packs three messages in a packet, each with its own message header indicating the message length, and all three messages are 300 bytes long. Also assume that the second message sent is incomplete and turns out to be only 100 bytes.
When the receiver receives the three messages in a packet, it returns the first message correctly. Since the second message is incomplete and my program has no way of knowing that, it returns 100 bytes from the second message and 200 bytes from the third message, because the message header indicates a total size of 300 bytes. Thus the second message returned contains some rubbish data.
As for the third message, my program tries to get the message length from its message header. Since its first 200 bytes have already been returned, the message header is invalid, so the message length my program gets is rubbish as well. Is there a way to check for a complete message?
Suppose you are expecting 7000 bytes over the TCP connection. In this case it is very likely that your data will be split into TCP packets with an actual payload size of, let's say, 1400 bytes (so 5 packets).
In this case it is perfectly possible consecutive recv calls with a target buffer of 1000 bytes will behave as follows:
recv -> reads 1000 bytes (packet 1)
recv -> reads 400 bytes (packet 1)
recv -> reads 1000 bytes (packet 2)
recv -> reads 400 bytes (packet 2)
...
Now, in this case, when recv returns only 400 bytes, you still copy the full 1000 bytes into your larger buffer, effectively pasting 600 bytes of rubbish in between. You should only memcpy the number of bytes actually received, which is the return value of recv itself. Of course, you should also check whether this value is 0 (socket closed) or less than zero (socket error).
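A minimal sketch of that fix, combined with the "is the message complete?" check the question asks about: append only the n bytes recv() returned, and hand out a message only once its header and its whole payload are buffered. The 4-byte big-endian length header, the 10000-byte capacity and all names are assumptions for illustration, not the asker's actual format.

```c
#include <arpa/inet.h>          /* ntohl */
#include <stdint.h>
#include <string.h>

#define HDR_LEN 4               /* hypothetical header: 4-byte big-endian payload length */
#define ACC_CAP 10000           /* size of the temporary buffer in the question */

struct accumulator {
    size_t len;                 /* bytes currently buffered */
    unsigned char data[ACC_CAP];
};

/* Append exactly the n bytes that recv() reported, never the full size
   of the receive buffer. Returns 0, or -1 if the data would not fit. */
static int acc_append(struct accumulator *a, const void *chunk, size_t n)
{
    if (n > sizeof a->data - a->len)
        return -1;
    memcpy(a->data + a->len, chunk, n);
    a->len += n;
    return 0;
}

/* Extract one complete message if its header and full payload are buffered.
   Returns 1 and copies the payload to out (length in *plen_out),
   0 if more bytes are needed, -1 if the payload can never fit. */
static int acc_take_message(struct accumulator *a, void *out, size_t outsz,
                            size_t *plen_out)
{
    if (a->len < HDR_LEN)
        return 0;                               /* header not complete yet */
    uint32_t net;
    memcpy(&net, a->data, HDR_LEN);
    size_t plen = ntohl(net);
    if (plen > outsz || plen > ACC_CAP - HDR_LEN)
        return -1;
    if (a->len - HDR_LEN < plen)
        return 0;                               /* payload not complete yet */
    memcpy(out, a->data + HDR_LEN, plen);
    size_t used = HDR_LEN + plen;
    memmove(a->data, a->data + used, a->len - used);   /* keep the remainder */
    a->len -= used;
    *plen_out = plen;
    return 1;
}
```

In a receive loop this would be used as follows: recv() into a local chunk, call acc_append(&acc, chunk, n) only when n > 0, then call acc_take_message() repeatedly until it returns 0. A message that is still incomplete simply stays buffered until the rest of it arrives, instead of being padded with bytes from the next one.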

Reading data from socket using read function

I am trying to read data using the following code from a socket:
n = read(fd, buffer, 50000);
The question is: when the data from the web server is larger than the TCP packet size, it will be split into multiple packets. In this case, will read() read just one packet from fd, or will it read all the packets?
Note that read function is called only once.
Because you are using TCP, your socket is of type SOCK_STREAM. A SOCK_STREAM socket is a byte stream and does not maintain packet boundaries, so the call to read() or recv() will read data that came from multiple packets if multiple packets of data have been received and there is sufficient space in your buffer. It may also return data from a portion of a packet if your buffer is not large enough to hold all of the data. The next read() will continue reading from the next byte.
read() receives at most the specified number of bytes, 50000 in your example.
When the function read returns, you need to check the return value. The actual number of bytes written to buffer is in your variable n.
