How to beat delays in UDP client - c++

I'm trying to write a UDP client App, which receives some control packets(length 52-104 bytes) from a server fragmented to datagrams of size 1-4 bytes each (Why this is not a big packet and is fragmented instead? That's a mystery to me...).
I created a thread, and in this thread I used a typical recvfrom example from MS. The received data from the small buffer I append to string to recreate the packet (If the packet is too big, the string would be cleared).
My problem is the latency:
The inbound packets are changed, but the data from the buffer and the string hasn't changed during the minute or more. I tried to use a circular buffer instead of a string, but it has no effect on the latency.
So, what am I doing wrong and how do I receive a fragmented UDP packet in a proper way?
I don't have the original sender code, so i'm attaching a part of my sender emulator. As you can see, the original data string (mSendString) is fragmented to some four-bytes packets and sent to the net. When the data string has changed on sender side, the data on receiver side hasn't changed in aceptable time, it changed a few minutes later.
UdpClient mSendClient = new UdpClient();
string mSendString = "head,data,data,data,data,data,data,data,chksumm\n";//Control string
public static void SendCallback(IAsyncResult ar)
{
UdpClient u = (UdpClient)ar.AsyncState;
mMsgSent = true;
}
public void Send()
{
while (!mThreadStop)
{
if (!mSendStop)
{
for (int i = 0; i < mSendString.Length; i+=4)
{
Byte[] sendBytes = new Byte[4];
Encoding.ASCII.GetBytes(mSendString,i,4,sendBytes,0);
mSendClient.BeginSend(sendBytes, 1, mEndPoint, new AsyncCallback(SendCallback), mSendClient);
}
}
Thread.Sleep(100);
}
}

I was wrong when I asked this question in some points:
First,the wrong terms - the string was chopped/sliced/divided into
four bytes packets, not fragmented.
Second, I was thought, that too
much small UDP packets are the cause of latency in my app, but when I
ran my UDP receive code separately from other app code, I found this
UDP receive code is working without latency.
Seems like there are threading problems, not UDP sockets.

Related

What means blocking for boost::asio::write?

I'm using boost::asio::write() to write data from a buffer to a com-Port. It's a serial port with a baud rate 115200 which means (as far as my understanding goes) that I can write effectively 11520 byte/s or 11,52KB/s data to the socket.
Now I'm having a quite big chunk of data (10015 bytes) which i want to write. I think that this should take little less than a second to really write on the port. But boost::asio::write() returns already 300 microseconds after the call with the transferred bytes 10015. I think this is impossible with that baud rate?
So my question is what is it actually doing? Really writing it to the port, or just some other kind of buffer maybe, which later writes it to the port.
I'd like the write() to only return after all the bytes have really been written to the port.
EDIT with code example:
The problem is that i always run into the timeout for the future/promise because it takes alone more than 100ms to send the message, but I think the timer should only start after the last byte is sent. Because write() is supposed to block?
void serial::write(std::vector<uint8_t> message) {
//create new promise for the request
promise = new boost::promise<deque<uint8_t>>;
boost::unique_future<deque<uint8_t>> future = promise->get_future();
// --- Write message to serial port --- //
boost::asio::write(serial_,boost::asio::buffer(message));
//wait for data or timeout
if (future.wait_for(boost::chrono::milliseconds(100))==boost::future_status::timeout) {
cout << "ACK timeout!" << endl;
//delete pointer and set it to 0
delete promise;
promise=nullptr;
}
//delete pointer and set it to 0 after getting a message
delete promise;
promise=nullptr;
}
How can I achieve this?
Thanks!
In short, boost::asio::write() blocks until all data has been written to the stream; it does not block until all data has been transmitted. To wait until data has been transmitted, consider using tcdrain().
Each serial port has both a receive and transmit buffer within kernel space. This allows the kernel to buffer received data if a process cannot immediately read it from the serial port, and allows data written to a serial port to be buffered if the device cannot immediately transmit it. To block until the data has been transmitted, one could use tcdrain(serial_.native_handle()).
These kernel buffers allow for the write and read rates to exceed that of the transmit and receive rates. However, while the application may write data at a faster rate than the serial port can transmit, the kernel will transmit at the appropriate rates.

How to receive more than 40Kb in C++ socket using read()

I am developing a client-server application (TCP) in Linux using C++. This application is in charge of testing the network performance.
The connection between client and server is established only once, and then data are transmitted/received using write()/read() with an own-defined protocol.
When data exceeds 40Kb I receive just a part of the data only once. (i.e. I receive about 48KB)
Please find down the relevant part of the code:
while (1) {
servMtx.lock();
...
serv_bytes = (byte *) malloc(size_bytes);
n = read(newsockfd, serv_bytes,size_bytes);
if (n != (int)size_bytes ) {
std::cerr << "No enough data available for msg. Received just: " << n << std::endl;
continue;
}
receivedBytes += n + size_header_bytes + sizeof(ssize_t);
....
}
I increased the kernel buffer size to become 1MB using:
int buffsize = 1024*1024;
setsockopt(newsockfd, SOL_SOCKET, SO_RCVBUF, &buffsize, sizeof(buffsize));
and modified sysctl variables too:
sysctl -w net.core.rmem_max=8388608;
sysctl -w net.core.wmem_max=8388608;
as mentioned on this How to recive more than 65000 bytes in C++ socket using recv() but nothing was changed. Also, I tried to change the package size to no avail.
You should read or recv in several chunks (in general; if you are unlucky, the "several" becomes "one"). So you need to manage your buffering and keep (and use) the count of received bytes.
So at some point, you'll code
int nbrecv = recv(s, buffer + off, bufsize, 0);
if (nbrec>0) { off += nbrecv; bufsize -= nbrecv; }
and you probably should do that in your event loop (often around poll(2)...). And it does happen that nbrec is a lot less than bufsize and you should be handling that common case.
TCP does not guarantee that you'll get all the bytes in the same recv! It could depend on external factors (routing, network hardware, ...); it is a stream-oriented protocol, not a message-packet one. If your application wants messages it should buffer the input and chunk that input into messages according to the content. Look at HTTP or SMTP: their message have a well defined boundary given by header information (Content-Length: in HTTP) or by ending convention (line with a single . in SMTP).
Please read carefully read(2), recv(2), socket(7), tcp(7), some sockets tutorial, Advanced Linux Programming.

Server programming in C++

I'd like to make a chatting program using win socket in c/c++. (I am totally newbie.)
The first question is about how to check if the client receives packets from server.
For instance, a server sends "aaaa" to a client.
And if the client doesn't receive packet "aaaa", the server should re-send the packet again.(I think). However, I don't know how to check it out.
Here is my thought blow.
First case.
Server --- "aaaa" ---> Client.
Server will be checking a sort of time waiting confirm msg from the client.
Client --- "I received it" ---> Server.
Server won't re-send the packet.
The other case.
Server --- "aaaa" ---> Client.
Server is waiting for client msg until time out
Server --- "aaaa" ---> Client again.
But these are probably inappropriate.
Look at second case. Server is waiting a msg from client for a while.
And if time's out, server will re-send a packet again.
In this case, client might receive the packet twice.
Second question is how to send unlimited size packet.
A book says packet should have a type, size, and msg.
Following it, I can only send msg with the certain size.
But i want to send msg like 1Mbytes or more.(unlimited)
How to do that?
Anyone have any good link or explain correct logic to me as easy as possible.
Thanks.
Use TCP. Think "messages" at the application level, not packets.
TCP already handles network-level packet data, error checking & resending lost packets. It presents this to the application as a "stream" of bytes, but without necessarily guaranteed delivery (since either end can be forcibly disconnected).
So at the application level, you need to handle Message Receipts & buffering -- with a re-connecting client able to request previous messages, which they hadn't (yet) correctly received.
Here are some data structures:
class or struct Message {
int type; // const MESSAGE.
int messageNumber; // sequentially incrementing.
int size; // 4 bytes, probably signed; allows up to 2GB data.
byte[] data;
}
class or struct Receipt {
int type; // const RECEIPT.
int messageNumber; // last #, successfully received.
}
You may also want a Connect/ Hello and perhaps a Disconnect/ Goodbye handshake.
class Connect {
int type; // const CONNECT.
int lastReceivedMsgNo; // last #, successfully received.
// plus, who they are?
short nameLen;
char[] name;
}
etc.
If you can be really simple & don't need to buffer/ re-send messages to re-connecting clients, it's even simpler.
You could also adopt a "uniform message structure" which had TYPE and SIZE (4-byte int) as the first two fields of every message or handshake. This might help standardize your routines for handling these, at the expense of some redundancy (eg in 'name' field-sizes).
For first part, have a look over TCP.
It provides a ordered and reliable packet transfer. Plus you can have lot of customizations in it by implementing it yourself using UDP.
Broadly, what it does is,
Server:
1. Numbers each packet and sends it
2. Waits for acknowledge of a specific packet number. And then re-transmits the lost packets.
Client:
1. Receives a packet and maintains a buffer (sliding window)
2. It keeps on collecting packets in buffer until the buffer overflows or a wrong sequenced packet arrives. As soon as it happens, the packets with right sequence are 'delivered', and the sequence number of last correct packet is send with acknowledgement.
For second part:
I would use HTTP for it.
With some modifications. Like you should have some very unique indicator to tell client that transmission is complete now, etc

Server's NonBlocking TCP socket taking time to stream content

Problem
- I am working on a Streaming server & created a nonblocking socket using:
flag=fcntl(m_fd,F_GETFL);
flag|=O_NONBLOCK;
fcntl(m_fd,F_SETFL,flag);
Server then sends the Media file contents using code:
bool SendData(const char *pData,long nSize)
{
int fd=m_pSock->get_fd();
fd_set write_flag;
while(1)
{
FD_ZERO(&write_flag);
FD_SET(fd,&write_flag);
struct timeval tout;
tout.tv_sec=0;
tout.tv_usec=500000;
int res=select(fd+1,0,&write_flag,0,&tout);
if(-1==res)
{
print("select() failure\n");
return false;
}
if(1==res)
{
unsigned long sndLen=0;
if(!m_pSock->send(pData,nSize,&sndLen))
{
print(socket send() failure\n");
return false;
}
nSize-=sndLen;
if(!nSize)
return true; //everything is sent
}
}
}
Using above code, I am streaming a say 200sec audio file, which I expect that Server should stream it in 2-3secs using full n/w available bandwidth(Throttle off), but the problem is that Server is taking 199~200secs to stream full contents.
While debugging, I commented the
m_pSock->send()
section & tried to dump the file locally. It takes 1~2secs to dump the file.
Questions
- If I am using a NonBlocking TCP socket, why does send() taking so much time?
Since the data is always available, select() will return immediately (as we have seen while dumping the file). Does that mean send() is affected by the recv() on the client side?
Any inputs on this would be helpul. Client behavior is not in our scope.
Your client is probably doing some buffering to avoid network jitter, but it is likely still playing the audio file in real time. So, the file transfer rate is matched to the rate that the client is consuming the data. Since it is a 200 second audio file, it will take about 200 seconds to complete the transfer.
Because TCP output and input buffers are propably much smaller than the audio file, reading speed of the receiving application can slow down the sending speed.
When both the TCP output buffer of sender and the input buffer of receiver are both full, TCP stack of the sender is not able to receive any data from the sender. So sending will be blocked, until there is space.
If the receiver reads the TCP stream same speed as data is needed for playing. Then the transfer takes about 200 seconds. Or little bit less.
This can be avoided by using application layer buffering in the receiving end.
The problem could be that if the client side is using blocking TCP, plus is processing all the data on a single thread with no no buffer/queue etc right through to the "player" of the file, then your side being non-blocking will only speed things until you reach the point where the TCP/IP protocol stack buffers, NIC buffers etc are full. Then you will ultimately still only be able to send data as fast as the client side is consuming it. Remember TCP is a reliable, point-to-point protocol.
Where does your client code come from in your testing? Is it some sort of simple test client someone has written?

Calculating socket upload speed

I'm wondering if anyone knows how to calculate the upload speed of a Berkeley socket in C++. My send call isn't blocking and takes 0.001 seconds to send 5 megabytes of data, but takes a while to recv the response (so I know it's uploading).
This is a TCP socket to a HTTP server and I need to asynchronously check how many bytes of data have been uploaded / are remaining. However, I can't find any API functions for this in Winsock, so I'm stumped.
Any help would be greatly appreciated.
EDIT: I've found the solution, and will be posting as an answer as soon as possible!
EDIT 2: Proper solution added as answer, will be added as solution in 4 hours.
I solved my issue thanks to bdolan suggesting to reduce SO_SNDBUF. However, to use this code you must note that your code uses Winsock 2 (for overlapped sockets and WSASend). In addition to this, your SOCKET handle must have been created similarily to:
SOCKET sock = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);
Note the WSA_FLAG_OVERLAPPED flag as the final parameter.
In this answer I will go through the stages of uploading data to a TCP server, and tracking each upload chunk and it's completion status. This concept requires splitting your upload buffer into chunks (minimal existing code modification required) and uploading it piece by piece, then tracking each chunk.
My code flow
Global variables
Your code document must have the following global variables:
#define UPLOAD_CHUNK_SIZE 4096
int g_nUploadChunks = 0;
int g_nChunksCompleted = 0;
WSAOVERLAPPED *g_pSendOverlapped = NULL;
int g_nBytesSent = 0;
float g_flLastUploadTimeReset = 0.0f;
Note: in my tests, decreasing UPLOAD_CHUNK_SIZE results in increased upload speed accuracy, but decreases overall upload speed. Increasing UPLOAD_CHUNK_SIZE results in decreased upload speed accuracy, but increases overall upload speed. 4 kilobytes (4096 bytes) was a good comprimise for a file ~500kB in size.
Callback function
This function increments the bytes sent and chunks completed variables (called after a chunk has been completely uploaded to the server)
void CALLBACK SendCompletionCallback(DWORD dwError, DWORD cbTransferred, LPWSAOVERLAPPED lpOverlapped, DWORD dwFlags)
{
g_nChunksCompleted++;
g_nBytesSent += cbTransferred;
}
Prepare socket
Initially, the socket must be prepared by reducing SO_SNDBUF to 0.
Note: In my tests, any value greater than 0 will result in undesirable behaviour.
int nSndBuf = 0;
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char*)&nSndBuf, sizeof(nSndBuf));
Create WSAOVERLAPPED array
An array of WSAOVERLAPPED structures must be created to hold the overlapped status of all of our upload chunks. To do this I simply:
// Calculate the amount of upload chunks we will have to create.
// nDataBytes is the size of data you wish to upload
g_nUploadChunks = ceil(nDataBytes / float(UPLOAD_CHUNK_SIZE));
// Overlapped array, should be delete'd after all uploads have completed
g_pSendOverlapped = new WSAOVERLAPPED[g_nUploadChunks];
memset(g_pSendOverlapped, 0, sizeof(WSAOVERLAPPED) * g_nUploadChunks);
Upload data
All of the data that needs to be send, for example purposes, is held in a variable called pszData. Then, using WSASend, the data is sent in blocks defined by the constant, UPLOAD_CHUNK_SIZE.
WSABUF dataBuf;
DWORD dwBytesSent = 0;
int err;
int i, j;
for(i = 0, j = 0; i < nDataBytes; i += UPLOAD_CHUNK_SIZE, j++)
{
int nTransferBytes = min(nDataBytes - i, UPLOAD_CHUNK_SIZE);
dataBuf.buf = &pszData[i];
dataBuf.len = nTransferBytes;
// Now upload the data
int rc = WSASend(sock, &dataBuf, 1, &dwBytesSent, 0, &g_pSendOverlapped[j], SendCompletionCallback);
if ((rc == SOCKET_ERROR) && (WSA_IO_PENDING != (err = WSAGetLastError())))
{
fprintf(stderr, "WSASend failed: %d\n", err);
exit(EXIT_FAILURE);
}
}
The waiting game
Now we can do whatever we wish while all of the chunks upload.
Note: the thread which called WSASend must be regularily put into an alertable state, so that our 'transfer completed' callback (SendCompletionCallback) is dequeued out of the APC (Asynchronous Procedure Call) list.
In my code, I continuously looped until g_nUploadChunks == g_nChunksCompleted. This is to show the end-user upload progress and speed (can be modified to show estimated completion time, elapsed time, etc.)
Note 2: this code uses Plat_FloatTime as a second counter, replace this with whatever second timer your code uses (or adjust accordingly)
g_flLastUploadTimeReset = Plat_FloatTime();
// Clear the line on the screen with some default data
printf("(0 chunks of %d) Upload speed: ???? KiB/sec", g_nUploadChunks);
// Keep looping until ALL upload chunks have completed
while(g_nChunksCompleted < g_nUploadChunks)
{
// Wait for 10ms so then we aren't repeatedly updating the screen
SleepEx(10, TRUE);
// Updata chunk count
printf("\r(%d chunks of %d) ", g_nChunksCompleted, g_nUploadChunks);
// Not enough time passed?
if(g_flLastUploadTimeReset + 1 > Plat_FloatTime())
continue;
// Reset timer
g_flLastUploadTimeReset = Plat_FloatTime();
// Calculate how many kibibytes have been transmitted in the last second
float flByteRate = g_nBytesSent/1024.0f;
printf("Upload speed: %.2f KiB/sec", flByteRate);
// Reset byte count
g_nBytesSent = 0;
}
// Delete overlapped data (not used anymore)
delete [] g_pSendOverlapped;
// Note that the transfer has completed
Msg("\nTransfer completed successfully!\n");
Conclusion
I really hope this has helped somebody in the future who has wished to calculate upload speed on their TCP sockets without any server-side modifications. I have no idea how performance detrimental SO_SNDBUF = 0 is, although I'm sure a socket guru will point that out.
You can get a lower bound on the amount of data received and acknowledged by subtracting the value of the SO_SNDBUF socket option from the number of bytes you have written to the socket. This buffer may be adjusted using setsockopt, although in some cases the OS may choose a length smaller or larger than you specify, so you must re-check after setting it.
To get more precise than that, however, you must have the remote side inform you of progress, as winsock does not expose an API to retrieve the amount of data currently pending in the send buffer.
Alternately, you could implement your own transport protocol on UDP, but implementing rate control for such a protocol can be quite complex.
Since you don't have control over the remote side, and you want to do it in the code, I'd suggest doing very simple approximation. I assume a long living program/connection. One-shot uploads would be too skewed by ARP, DNS lookups, socket buffering, TCP slow start, etc. etc.
Have two counters - length of the outstanding queue in bytes (OB), and number of bytes sent (SB):
increment OB by number of bytes to be sent every time you enqueue a chunk for upload,
decrement OB and increment SB by the number returned from send(2) (modulo -1 cases),
on a timer sample both OB and SB - either store them, log them, or compute running average,
compute outstanding bytes a second/minute/whatever, same for sent bytes.
Network stack does buffering and TCP does retransmission and flow control, but that doesn't really matter. These two counters will tell you the rate your app produces data with, and the rate it is able to push it to the network. It's not the method to find out the real link speed, but a way to keep useful indicators about how good the app is doing.
If data production rate is bellow the network output rate - everything is fine. If it's the other way around and the network cannot keep up with the app - there's a problem - you need either faster network, slower app, or different design.
For one-time experiments just take periodic snapshots of netstat -sp tcp output (or whatever that is on Windows) and calculate the send-rate manually.
Hope this helps.
If your app uses packet headers like
0001234DT
where 000123 is the packet length for a single packet, you can consider using MSG_PEEK + recv() to get the length of the packet before you actually read it with recv().
The problem is send() is NOT doing what you think - it is buffered by the kernel.
getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &flag, &sz));
fprintf(STDOUT, "%s: listener socket send buffer = %d\n", now(), flag);
sz=sizeof(int);
ERR_CHK(getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &flag, &sz));
fprintf(STDOUT, "%s: listener socket recv buffer = %d\n", now(), flag);
See what these show for you.
When you recv on a NON-blocking socket that has data, it normally does not have MB of data parked in the buufer ready to recv. Most of what I have experienced is that the socket has ~1500 bytes of data per recv. Since you are probably reading on a blocking socket it takes a while for the recv() to complete.
Socket buffer size is the probably single best predictor of socket throughput. setsockopt() lets you alter socket buffer size, up to a point. Note: these buffers are shared among sockets in a lot of OSes like Solaris. You can kill performance by twiddling these settings too much.
Also, I don't think you are measuring what you think you are measuring. The real efficiency of send() is the measure of throughput on the recv() end. Not the send() end.
IMO.