I'm trying to write a UDP client app which receives control packets (52-104 bytes long) from a server. Each packet arrives split into datagrams of 1-4 bytes each (why it isn't sent as one larger packet and is split up instead is a mystery to me...).
I created a thread, and in that thread I used a typical recvfrom example from MS. I append the received data from the small buffer to a string to recreate the packet (if the reassembled packet grows too big, the string is cleared).
My problem is the latency:
The inbound packets change, but the data in the buffer and in the string does not change for a minute or more. I tried using a circular buffer instead of a string, but it had no effect on the latency.
So, what am I doing wrong and how do I receive a fragmented UDP packet in a proper way?
I don't have the original sender code, so I'm attaching part of my sender emulator. As you can see, the original data string (mSendString) is chopped into four-byte packets and sent to the network. When the data string changes on the sender side, the data on the receiver side does not change within an acceptable time; it changes a few minutes later.
UdpClient mSendClient = new UdpClient();
string mSendString = "head,data,data,data,data,data,data,data,chksumm\n"; // Control string

public static void SendCallback(IAsyncResult ar)
{
    UdpClient u = (UdpClient)ar.AsyncState;
    u.EndSend(ar);   // complete the asynchronous send
    mMsgSent = true;
}

public void Send()
{
    while (!mThreadStop)
    {
        if (!mSendStop)
        {
            // Chop the control string into four-byte pieces and send each
            // piece as a separate datagram.
            for (int i = 0; i < mSendString.Length; i += 4)
            {
                Byte[] sendBytes = new Byte[4];
                Encoding.ASCII.GetBytes(mSendString, i, 4, sendBytes, 0);
                // Send all four bytes of the chunk.
                mSendClient.BeginSend(sendBytes, sendBytes.Length, mEndPoint,
                                      new AsyncCallback(SendCallback), mSendClient);
            }
        }
        Thread.Sleep(100);
    }
}
I was wrong on a few points when I asked this question:
First, the terminology was wrong - the string was chopped/sliced/divided into four-byte packets, not fragmented.
Second, I thought that too many small UDP packets were the cause of the latency in my app, but when I ran my UDP receive code separately from the rest of the app, I found that the UDP receive code works without latency.
So it seems the problem is threading, not UDP sockets.
I wrote a simple TCP/IP connection between a client and a server on localhost in C++. The client sends an array of unsigned char over TCP/IP. The array is allocated like this:
unsigned char *bytes = (unsigned char*)malloc(sizeof(unsigned char)*96000000);
//array is filled
However when I write on the socket
n = write(sockfd, bytes, 96000000);
if (n < 0) {
    cout << "error writing" << endl;
    exit(1);
} else {
    cout << "bytes written " << n << endl;
}
the number of bytes written (the n variable) printed to standard output is 5196978, not 96000000 as I expected. Why? Is there a limit on the number of bytes that I can write on a TCP/IP connection? How can I solve this problem?
Is there a limit on the number of bytes that I can write on a TCP/IP connection? How can I solve this problem?
Yes - your TCP stack (likely part of your Operating System) won't simply let your application enqueue an arbitrary amount of data, potentially taking up absurd amounts of buffer memory outside your app. Instead, it has a limited buffer size and after that's full you're expected to loop around and enqueue more data in the buffer - by further calls to write - after some has actually been sent over the network. So - loop and resend from where the previous send stopped: if your socket's not been set non-blocking, the call will block until more buffer space is available.
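A minimal sketch of that loop, assuming the questioner's blocking POSIX socket; the write_all helper name is just illustrative:
#include <unistd.h>
#include <cstddef>
// Keep calling write() until the kernel has accepted every byte.
bool write_all(int sockfd, const unsigned char* data, size_t total)
{
    size_t sent = 0;
    while (sent < total) {
        ssize_t n = write(sockfd, data + sent, total - sent);
        if (n < 0)
            return false;                    // check errno; you may want to retry on EINTR
        sent += static_cast<size_t>(n);      // resume from where the previous write stopped
    }
    return true;
}
// Usage: if (!write_all(sockfd, bytes, 96000000)) { /* handle error */ }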
Why?
There could be several reasons. There might be some sort of physical limitation (hardware). The client buffer could be full. Some sort of implementation limit could have been reached. Some sort of signal could have been received.
is there a limit in the number of bytes that I can write in a TCP/IP connection?
The limit is around 2^32 bytes.
how can I solve this problem?
Keep track of how much is sent with each write and keep writing until everything in the buffer has been written.
I did not add a sample as the first question could have been answered by checking the documentation for write(), the second could have been answered with a quick search and the third question has a lot of samples out there already.
I'm writing an app which transmits video and obviously uses the UDP protocol for this purpose.
So I am wondering how I can increase the size of the send/receive buffer, because currently the maximum amount of data I can send is 65000 bytes.
I already tried to do it in following way:
int option = 262144;
if (setsockopt(m_SocketHandle, SOL_SOCKET, SO_RCVBUF, (char*)&option, sizeof(option)) < 0)
{
    printf("setsockopt failed\n");
}
But it did not work. So how can I do it?
How can I do it?
You can't. The maximum size of an IPv4 UDP datagram is 65535-20-8=65507 bytes. Increasing the buffer size cannot change that. Datagrams larger than the path MTU (< 1500 bytes) will be fragmented, and fragmented datagrams are more likely to be lost, statistically, so using datagram sizes up around 64k is contra-indicated anyway.
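The usual alternative is to split the video payload into datagrams that fit within the path MTU and reassemble them in the application. A minimal sketch, where the 1400-byte chunk size and the function name are assumptions chosen to stay under a typical Ethernet MTU:
#include <sys/socket.h>
#include <cstddef>
const size_t CHUNK = 1400;   // leaves room for the IP and UDP headers
void send_in_datagrams(int sock, const char* data, size_t len,
                       const sockaddr* dest, socklen_t destlen)
{
    for (size_t off = 0; off < len; off += CHUNK) {
        size_t n = (len - off < CHUNK) ? (len - off) : CHUNK;
        // Each sendto() call becomes one datagram; the receiver has to
        // reassemble them, typically using an application-level sequence number.
        sendto(sock, data + off, n, 0, dest, destlen);
    }
}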
I'm wondering if anyone knows how to calculate the upload speed of a Berkeley socket in C++. My send call isn't blocking and takes 0.001 seconds to send 5 megabytes of data, but takes a while to recv the response (so I know it's uploading).
This is a TCP socket to a HTTP server and I need to asynchronously check how many bytes of data have been uploaded / are remaining. However, I can't find any API functions for this in Winsock, so I'm stumped.
Any help would be greatly appreciated.
EDIT: I've found the solution and have posted it as an answer below.
I solved my issue thanks to bdolan's suggestion to reduce SO_SNDBUF. Note that this code requires Winsock 2 (for overlapped sockets and WSASend). In addition, your SOCKET handle must have been created similarly to:
SOCKET sock = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);
Note the WSA_FLAG_OVERLAPPED flag as the final parameter.
In this answer I will go through the stages of uploading data to a TCP server, tracking each upload chunk and its completion status. This approach requires splitting your upload buffer into chunks (minimal modification of existing code required), uploading it piece by piece, and then tracking each chunk.
My code flow
Global variables
Your code must declare the following global variables:
#define UPLOAD_CHUNK_SIZE 4096
int g_nUploadChunks = 0;
int g_nChunksCompleted = 0;
WSAOVERLAPPED *g_pSendOverlapped = NULL;
int g_nBytesSent = 0;
float g_flLastUploadTimeReset = 0.0f;
Note: in my tests, decreasing UPLOAD_CHUNK_SIZE results in increased upload speed accuracy, but decreases overall upload speed. Increasing UPLOAD_CHUNK_SIZE results in decreased upload speed accuracy, but increases overall upload speed. 4 kilobytes (4096 bytes) was a good compromise for a file ~500kB in size.
Callback function
This function increments the bytes-sent and chunks-completed variables; it is called after a chunk has been completely uploaded to the server.
void CALLBACK SendCompletionCallback(DWORD dwError, DWORD cbTransferred, LPWSAOVERLAPPED lpOverlapped, DWORD dwFlags)
{
    g_nChunksCompleted++;
    g_nBytesSent += cbTransferred;
}
Prepare socket
Initially, the socket must be prepared by reducing SO_SNDBUF to 0.
Note: In my tests, any value greater than 0 will result in undesirable behaviour.
int nSndBuf = 0;
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char*)&nSndBuf, sizeof(nSndBuf));
Create WSAOVERLAPPED array
An array of WSAOVERLAPPED structures must be created to hold the overlapped status of all of our upload chunks. To do this I simply:
// Calculate the amount of upload chunks we will have to create.
// nDataBytes is the size of data you wish to upload
g_nUploadChunks = ceil(nDataBytes / float(UPLOAD_CHUNK_SIZE));
// Overlapped array, should be delete'd after all uploads have completed
g_pSendOverlapped = new WSAOVERLAPPED[g_nUploadChunks];
memset(g_pSendOverlapped, 0, sizeof(WSAOVERLAPPED) * g_nUploadChunks);
Upload data
All of the data that needs to be sent is, for example purposes, held in a variable called pszData. Then, using WSASend, the data is sent in blocks of UPLOAD_CHUNK_SIZE bytes.
WSABUF dataBuf;
DWORD dwBytesSent = 0;
int err;
int i, j;
for(i = 0, j = 0; i < nDataBytes; i += UPLOAD_CHUNK_SIZE, j++)
{
    int nTransferBytes = min(nDataBytes - i, UPLOAD_CHUNK_SIZE);
    dataBuf.buf = &pszData[i];
    dataBuf.len = nTransferBytes;
    // Now upload the data
    int rc = WSASend(sock, &dataBuf, 1, &dwBytesSent, 0, &g_pSendOverlapped[j], SendCompletionCallback);
    if ((rc == SOCKET_ERROR) && (WSA_IO_PENDING != (err = WSAGetLastError())))
    {
        fprintf(stderr, "WSASend failed: %d\n", err);
        exit(EXIT_FAILURE);
    }
}
The waiting game
Now we can do whatever we wish while all of the chunks upload.
Note: the thread which called WSASend must be regularly put into an alertable state, so that our 'transfer completed' callback (SendCompletionCallback) is dequeued out of the APC (Asynchronous Procedure Call) list.
In my code, I continuously looped until g_nUploadChunks == g_nChunksCompleted. This is to show the end-user upload progress and speed (can be modified to show estimated completion time, elapsed time, etc.)
Note 2: this code uses Plat_FloatTime as a second counter, replace this with whatever second timer your code uses (or adjust accordingly)
g_flLastUploadTimeReset = Plat_FloatTime();
// Clear the line on the screen with some default data
printf("(0 chunks of %d) Upload speed: ???? KiB/sec", g_nUploadChunks);
// Keep looping until ALL upload chunks have completed
while(g_nChunksCompleted < g_nUploadChunks)
{
    // Wait for 10ms so that we aren't repeatedly updating the screen
    SleepEx(10, TRUE);
    // Update chunk count
    printf("\r(%d chunks of %d) ", g_nChunksCompleted, g_nUploadChunks);
    // Not enough time passed?
    if(g_flLastUploadTimeReset + 1 > Plat_FloatTime())
        continue;
    // Reset timer
    g_flLastUploadTimeReset = Plat_FloatTime();
    // Calculate how many kibibytes have been transmitted in the last second
    float flByteRate = g_nBytesSent/1024.0f;
    printf("Upload speed: %.2f KiB/sec", flByteRate);
    // Reset byte count
    g_nBytesSent = 0;
}
// Delete overlapped data (not used anymore)
delete [] g_pSendOverlapped;
// Note that the transfer has completed
Msg("\nTransfer completed successfully!\n");
Conclusion
I really hope this helps somebody in the future who wishes to calculate upload speed on their TCP sockets without any server-side modifications. I have no idea how detrimental SO_SNDBUF = 0 is to performance, although I'm sure a socket guru will point that out.
You can get a lower bound on the amount of data received and acknowledged by subtracting the value of the SO_SNDBUF socket option from the number of bytes you have written to the socket. This buffer may be adjusted using setsockopt, although in some cases the OS may choose a length smaller or larger than you specify, so you must re-check after setting it.
To get more precise than that, however, you must have the remote side inform you of progress, as winsock does not expose an API to retrieve the amount of data currently pending in the send buffer.
Alternately, you could implement your own transport protocol on UDP, but implementing rate control for such a protocol can be quite complex.
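A minimal sketch of the set-then-recheck step mentioned above, assuming an already-connected Winsock SOCKET named sock; the requested size and function name are illustrative:
#include <winsock2.h>
// Returns the send-buffer size the OS actually applied, which is the value to
// subtract from bytes-written when estimating acknowledged data.
int shrink_and_check_sndbuf(SOCKET sock, int requested)
{
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char*)&requested, sizeof(requested));
    int actual = 0;
    int optlen = sizeof(actual);
    // The OS may round or clamp the value, so read back what it really used.
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char*)&actual, &optlen);
    return actual;
}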
Since you don't have control over the remote side, and you want to do it in code, I'd suggest a very simple approximation. I assume a long-lived program/connection; one-shot uploads would be too skewed by ARP, DNS lookups, socket buffering, TCP slow start, etc.
Keep two counters - the length of the outstanding queue in bytes (OB), and the number of bytes sent (SB) - and update them as follows (a minimal sketch appears after this list):
increment OB by number of bytes to be sent every time you enqueue a chunk for upload,
decrement OB and increment SB by the number returned from send(2) (modulo -1 cases),
on a timer sample both OB and SB - either store them, log them, or compute running average,
compute outstanding bytes a second/minute/whatever, same for sent bytes.
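A minimal sketch of that bookkeeping; the counter and function names are illustrative, not from any existing library:
#include <sys/socket.h>
#include <cstddef>
static size_t g_outstandingBytes = 0;   // OB: enqueued but not yet accepted by send()
static size_t g_sentBytes        = 0;   // SB: bytes accepted by send()
void enqueue_chunk(size_t len)
{
    g_outstandingBytes += len;          // the app produced `len` more bytes to upload
}
void push_to_network(int sock, const char* buf, size_t len)
{
    ssize_t n = send(sock, buf, len, 0);
    if (n > 0) {                        // modulo the -1 / error cases
        g_outstandingBytes -= static_cast<size_t>(n);
        g_sentBytes        += static_cast<size_t>(n);
    }
}
// On a timer: sample both counters and compute bytes per interval for each; the
// first is the rate the app produces data, the second the rate it reaches the network stack.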
The network stack does buffering, and TCP does retransmission and flow control, but that doesn't really matter here. These two counters will tell you the rate at which your app produces data and the rate at which it is able to push it to the network. It's not a method for finding the real link speed, but a way to keep useful indicators of how well the app is doing.
If the data production rate is below the network output rate, everything is fine. If it's the other way around and the network cannot keep up with the app, there's a problem: you need either a faster network, a slower app, or a different design.
For one-time experiments just take periodic snapshots of netstat -sp tcp output (or whatever that is on Windows) and calculate the send-rate manually.
Hope this helps.
If your app uses packet headers like
0001234DT
where 000123 is the packet length for a single packet, you can consider using MSG_PEEK + recv() to get the length of the packet before you actually read it with recv().
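A sketch of that MSG_PEEK approach, assuming a blocking socket and the nine-character header from the example above, with the first six characters holding the total packet length; the exact header layout here is an assumption:
#include <sys/socket.h>
#include <cstdlib>
#include <string>
#include <vector>
std::vector<char> read_one_packet(int sockfd)
{
    char hdr[10] = {0};
    // Look at the header without removing it from the socket buffer.
    if (recv(sockfd, hdr, 9, MSG_PEEK) != 9)
        return {};
    long pktLen = std::strtol(std::string(hdr, 6).c_str(), nullptr, 10);
    if (pktLen <= 0)
        return {};
    std::vector<char> packet(pktLen);
    // Now consume the whole packet, header included, in one blocking call.
    recv(sockfd, packet.data(), pktLen, MSG_WAITALL);
    return packet;
}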
The problem is send() is NOT doing what you think - it is buffered by the kernel.
int flag = 0;
socklen_t sz = sizeof(int);
ERR_CHK(getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &flag, &sz));
fprintf(stdout, "%s: listener socket send buffer = %d\n", now(), flag);
sz = sizeof(int);
ERR_CHK(getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &flag, &sz));
fprintf(stdout, "%s: listener socket recv buffer = %d\n", now(), flag);
See what these show for you.
When you recv on a NON-blocking socket that has data, it normally does not have megabytes of data parked in the buffer ready to recv. In my experience, the socket typically has ~1500 bytes of data per recv. Since you are probably reading on a blocking socket, it takes a while for the recv() to complete.
Socket buffer size is probably the single best predictor of socket throughput. setsockopt() lets you alter the socket buffer size, up to a point. Note: these buffers are shared among sockets in a lot of OSes, like Solaris. You can kill performance by twiddling these settings too much.
Also, I don't think you are measuring what you think you are measuring. The real efficiency of send() is the measure of throughput on the recv() end. Not the send() end.
IMO.
I have a TCP client connecting to my server which is sending raw data packets. How, using Boost.Asio, can I get the "whole" packet every time (asynchronously, of course)? Assume these packets can be any size up to the full size of my memory.
Basically, I want to avoid creating a statically sized buffer.
Typically, when you build a custom protocol on top of TCP/IP, you use a simple message format where the first 4 bytes are an unsigned integer containing the message length and the rest is the message data. If you have such a protocol, then the reception loop is as simple as the one below (I'm not sure of the ASIO notation, so this is just the idea):
for (;;) {
    uint32_t len = 0u;
    read(socket, &len, 4);   // may need multiple reads in non-blocking mode
    len = ntohl(len);
    assert(len < my_max_len);
    char* buf = new char[len];
    read(socket, buf, len);  // may need multiple reads in non-blocking mode
    ...
}
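Since the loop above isn't in ASIO notation, a rough Boost.Asio sketch of the same length-prefixed read might look like the following; it assumes a connected boost::asio::ip::tcp::socket that outlives the operation, and the function and handler names are illustrative:
#include <boost/asio.hpp>
#include <array>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>
void async_read_message(boost::asio::ip::tcp::socket& sock,
                        std::function<void(std::vector<char>)> on_message)
{
    auto hdr = std::make_shared<std::array<unsigned char, 4>>();
    // First read exactly the 4-byte length prefix.
    boost::asio::async_read(sock, boost::asio::buffer(*hdr),
        [&sock, hdr, on_message](const boost::system::error_code& ec, std::size_t) {
            if (ec) return;
            // Decode the big-endian length, then read exactly that many body bytes.
            std::uint32_t len = (std::uint32_t((*hdr)[0]) << 24) |
                                (std::uint32_t((*hdr)[1]) << 16) |
                                (std::uint32_t((*hdr)[2]) << 8)  |
                                 std::uint32_t((*hdr)[3]);
            auto body = std::make_shared<std::vector<char>>(len);
            boost::asio::async_read(sock, boost::asio::buffer(*body),
                [body, on_message](const boost::system::error_code& ec, std::size_t) {
                    if (!ec) on_message(std::move(*body));
                });
        });
}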
Typically, when you do async IO, your protocol should support it.
One easy way is to prefix a byte array with its length at the logical level, and have the reading code buffer up until it has a full message ready for parsing.
If you don't do this, you will end up with that logic scattered all over the place (think about reading a null-terminated string, and what it means if you only get part of it every time select/poll returns).
TCP doesn't operate with packets. It provides you one contiguous stream. You can ask for the next N bytes, or for all the data received so far, but there is no "packet" boundary, no way to distinguish what is or is not a packet.