C++ non blocking socket select send too slow? - c++

I have a program that maintains a list of "streaming" sockets. These sockets are configured to be non-blocking sockets.
Currently, I have used a list to store these streaming sockets. I have some data that I need to send to all these streaming sockets hence I used the iterator to loop through this list of streaming sockets and calling the send_TCP_NB function below:
The issue is that my own program buffer that stores the data before sending to this send_TCP_NB function slowly decreases in free size indicating that the send is slower than the rate at which data is put into the program buffer. The rate at which the program buffer is about 1000 data per second. Each data is quite small, about 100 bytes.
Hence, i am not sure if my send_TCP_NB function is working efficiently or correct?
int send_TCP_NB(int cs, char data[], int data_length) {
bool sent = false;
FD_ZERO(&write_flags); // initialize the writer socket set
FD_SET(cs, &write_flags); // set the write notification for the socket based on the current state of the buffer
int status;
int err;
struct timeval waitd; // set the time limit for waiting
waitd.tv_sec = 0;
waitd.tv_usec = 1000;
err = select(cs+1, NULL, &write_flags, NULL, &waitd);
if(err==0)
{
// time limit expired
printf("Time limit expired!\n");
return 0; // send failed
}
else
{
while(!sent)
{
if(FD_ISSET(cs, &write_flags))
{
FD_CLR(cs, &write_flags);
status = send(cs, data, data_length, 0);
sent = true;
}
}
int nError = WSAGetLastError();
if(nError != WSAEWOULDBLOCK && nError != 0)
{
printf("Error sending non blocking data\n");
return 0;
}
else
{
if(nError == WSAEWOULDBLOCK)
{
printf("%d\n", nError);
}
return 1;
}
}
}

One thing that would help is if you thought out exactly what this function is supposed to do. What it actually does is probably not what you wanted, and has some bad features.
The major features of what it does that I've noticed are:
Modify some global state
Wait (up to 1 millisecond) for the write buffer to have some empty space
Abort if the buffer is still full
Send 1 or more bytes on the socket (ignoring how much was sent)
If there was an error (including the send decided it would have blocked despite the earlier check), obtain its value. Otherwise, obtain a random error value
Possibly print something to screen, depending on the value obtained
Return 0 or 1, depending on the error value.
Comments on these points:
Why is write_flags global?
Did you really intend to block in this function?
This is probably fine
Surely you care how much of the data was sent?
I do not see anything in the documentation that suggests that this will be zero if send succeeds
If you cleared up what the actual intent of this function was, it would probably be much easier to ensure that this function actually fulfills that intent.
That said
I have some data that I need to send to all these streaming sockets
What precisely is your need?
If your need is that the data must be sent before proceeding, then using a non-blocking write is inappropriate*, since you're going to have to wait until you can write the data anyways.
If your need is that the data must be sent sometime in the future, then your solution is missing a very critical piece: you need to create a buffer for each socket which holds the data that needs to be sent, and then you periodically need to invoke a function that checks the sockets to try writing whatever it can. If you spawn a new thread for this latter purpose, this is the sort of thing select is very useful for, since you can make that new thread block until it is able to write something. However, if you don't spawn a new thread and just periodically invoke a function from the main thread to check, then you don't need to bother. (just write what you can to everything, even if it's zero bytes)
*: At least, it is a very premature optimization. There are some edge cases where you could get slightly more performance by using the non-blocking writes intelligently, but if you don't understand what those edge cases are and how the non-blocking writes would help, then guessing at it is unlikely to get good results.
EDIT: as another answer implied, this is something the operating system is good at anyways. Rather than try to write your own code to manage this, if you find your socket buffers filling up, then make the system buffers larger. And if they're still filling up, you should really give serious thought to the idea that your program needs to block anyways, so that it stops sending data faster than the other end can handle it. i.e. just use ordinary blocking sends for all of your data.

Some general advice:
Keep in mind you are multiplying data. So if you get 1 MB/s in, you output N MB/s with N clients. Are you sure your network card can take it ? It gets worse with smaller packets, you get more general overhead. You may want to consider broadcasting.
You are using non blocking sockets, but you block while they are not free. If you want to be non blocking, better discard the packet immediately if the socket is not ready.
What would be better is to "select" more than one socket at once. Do everything that you are doing but for all the sockets that are available. You'll write to each "ready" socket, then repeat again while there are sockets that are not ready. This way, you'll proceed with the sockets that are available first, and then with some chance, the busy sockets will become themselves available.
the while (!sent) loop is useless and probably buggy. Since you are checking only one socket FD_ISSET will always be true. It is wrong to check again FD_ISSET after a FD_CLR
Keep in mind that your OS has some internal buffers for the sockets and that there are way to extend them (not easy on Linux, though, to get large values you need to do some config as root).
There are some socket libraries that will probably work better than what you can implement in a reasonable time (boost::asio and zmq for the ones I know).
If you need to implement it yourself, (i.e. because for instance zmq has its own packet format), consider using a threadpool library.
EDIT:
Sleeping 1 millisecond is probably a bad idea. Your thread will probably get descheduled and it will take much more than that before you get some CPU time again.

This is just a horrible way to do things. The select serves no purpose but to waste time. If the send is non-blocking, it can mangle data on a partial send. If it's blocking, you still waste arbitrarily much time waiting for one receiver.
You need to pick a sensible I/O strategy. Here is one: Set all sockets non-blocking. When you need to send data to a socket, just call write. If all the data writes, lovely. If not, save the portion of data that wasn't sent for later and add the socket to your write set. When you have nothing else to do, call select. If you get a hit on any socket in your write set, write as many bytes as you can from what you saved. If you write all of them, remove that socket from the write set.
(If you need to write to a data that's already in your write set, just add the data to the saved data to be sent. You may need to close the connection if too much data gets buffered.)
A better idea might be to use a library that already does all these things. Boost::asio is a good one.

You are calling select() before calling send(). Do it the other way around. Call select() only if send() reports WSAEWOULDBLOCK, eg:
int send_TCP_NB(int cs, char data[], int data_length)
{
int status;
int err;
struct timeval waitd;
char *data_ptr = data;
while (data_length > 0)
{
status = send(cs, data_ptr, data_length, 0);
if (status > 0)
{
data_ptr += status;
data_length -= status;
continue;
}
err = WSAGetLastError();
if (err != WSAEWOULDBLOCK)
{
printf("Error sending non blocking data\n");
return 0; // send failed
}
FD_ZERO(&write_flags);
FD_SET(cs, &write_flags); // set the write notification for the socket based on the current state of the buffer
waitd.tv_sec = 0;
waitd.tv_usec = 1000;
status = select(cs+1, NULL, &write_flags, NULL, &waitd);
if (status > 0)
continue;
if (status == 0)
printf("Time limit expired!\n");
else
printf("Error waiting for time limit!\n");
return 0; // send failed
}
return 1;
}

Related

epoll_wait return EPOLLOUT even with EPOLLET flag

I am using linux epoll in edge trigger mode.
Each time a new connection is incoming, I add the file descriptor to epoll with EPOLLIN|EPOLLOUT|EPOLLET flag. My first question is: What's the right way to check which kind of event(s) occur for each ready file descriptor after the epoll_wait returns? I mean, I see some example code e.g from https://github.com/yedf/handy/blob/master/raw-examples/epoll-et.cc line 124 do it like this:
for (int i = 0; i < n; i++) {
//...
if (events & (EPOLLIN | EPOLLERR)) {
if (fd == lfd) {
handleAccept(efd, fd);
} else {
handleRead(efd, fd);
}
} else if (events & EPOLLOUT) {
if (output_log)
printf("handling epollout\n");
handleWrite(efd, fd);
} else {
exit_if(1, "unknown event");
}
}
What caught my attention is: it uses "if and else if and else" to check which event occurs, which means if it handleRead, then it can't handleWrite at the same time. And I think this may cause loss of event in the following condition: Both socket read and write operation have meet EAGAIN and then the remote end both read and send some data, thus the epoll wait may set both EPOLLIN and EPOLLOUT, but it can only handleRead, and the data remaining in output buffer can't be sent since handleWrite is not being called.
So is the above usage wrong?
According man 7 epoll QA:
If more than one event occurs between epoll_wait(2) calls, are
they combined or reported separately?
They will be combined.
If i got it right, several events can occur on a single file descriptor between epoll_wait calls. So I think I should use multiple "if if and if" to check on by one whether readable/writable/error events occur instead of using "if and else if". I went to see how nginx epoll module do, from https://github.com/nginx/nginx/blob/953f53921505a884f3912f2d8db5217a71c0479a/src/event/modules/ngx_epoll_module.c#L867 I see the following code:
if (revents & (EPOLLERR|EPOLLHUP)) {
//...
}
if ((revents & EPOLLIN) && rev->active) {
//....
rev->handler(rev);
}
if ((revents & EPOLLOUT) && wev->active) {
//....
wev->handler(wev);
}
It seems to adhere to my thoughts of checking all EPOLLERR..,EPOLLIN,EPOLLOUT events one after another.
Then I do the same kind of thing as nginx do in my application. But What I realized after experiment is: if I add the file descriptor to epoll with EPOLLIN|EPOLLOUT|EPOLLET flag, and I didn't fill up the output buffer, I will always get EPOLLOUT flag set after epoll_wait returns due to some data arrives and this fd becomes readable, therefore redundant write_handler would be called, which is not what I expect.
I did some search and found that this situation indeed exists and not caused by any bug in my application. According to the top voted answer at epoll with edge triggered event says:
On a somewhat related note: if you register for EPOLLIN and EPOLLOUT events and assuming you never fill up the send buffer, you still get the EPOLLOUT flag set in the event returned by epoll_wait each time EPOLLIN is triggered - see https://lkml.org/lkml/2011/11/17/234 for a more detailed explanation.
And the link in this answer says:
It's doesn't mean there's an EPOLLOUT "event", it just means a message
is triggered (by the socket becoming readable) so you get a status
update. In theory the program doesn't need to be told about EPOLLOUT
here (it should be assuming the socket is writable already), but it
doesn't do any harm.
So far What I understand about epoll edge trigger mode is:
the epoll_wait return when the state of any fd being monitored has changed, e.g from nothing to read -> readable or buffer is full-> buffer can write
the epoll_wait may return one or several event(flags) for each fd in the ready list.
the flags in sturct epoll_event.events field indicate the current state of this fd. Even if we don't fill out the output buffer, the EPOLLOUT flag would be set when epoll_wait return due to readable, because the current state of the fd is just writable.
Please correct me if I am wrong.
Then my question would be: Should I maintain a flag in each connection to indicate whether EAGAIN occurs when write to output buffer, if it is not set, don't call write_handler/handleWrite in "if (events & EPOLLOUT)" branch, so that my upper layer program would not be told about EPOLLOUT here?
What a great question (since I had pretty much the same question)! I'll just summarize what I think I know now wrt to your informative question/description and your helpful links and hopefully smarter folk will correct any mistakes.
Yes, the if/else handling of event flags is definitely bogus. For sure at least two can events can arrive at effectively the same time. E.g., both the read and write sides might have become unblocked since last you called epoll_wait(). And, of course, as soon as you accept() the connection, both reading and writing suddenly become possible, so you get an "event" of EPOLLIN|EPOLLOUT.
I really didn't grok that epoll_wait() is always delivering the entire current state, rather than only the parts of the state that changed -- thanks for clearing that up. To be perhaps clearer, epoll_wait() won't return an fd unless something changed on that socket, but if something did change, it returns all the flags representing the current state. So, I found myself staring at a stream of EPOLLIN|EPOLLOUT events wondering why it was claiming there was an "output" event, even though I hadn't written anything yet. Your answer being correct: it's just telling me the output side is still writeable.
"Should I maintain a flag..." Yes, but I would imagine that in all but the most trivial situations you were probably going to end up maintaining at least one bit of "am I currently blocked" state for your readers/writers anyway. For example, if you ever want to process data in an order different than how it arrives (e.g., prioritize responses over requests to make your server more resistant to overload) you instantly have to give up the simplicity of just having the arrival of I/O drive everything. In the particular case of writing, epoll simply doesn't have enough information to notify you at the "right" time. As soon as you accept a connection, there's an event that says "you can write now"--but you probably have nothing to write if you're a server who couldn't possibly have already gotten a request from the client. epoll just can't know whether you have something to write or not, so you were always going to have to either suffer essentially "extraneous" events, or maintain your own state.
In all but the simplest cases, the socket file descriptor ends up being insufficient information for handling I/O events, so you invariably have to associate some data structure with it, or object if you prefer. So, my C++ looks something like:
nAwake = epoll_wait(epollFd, events, 100, milliseconds);
if(nAwake < 0)
{
perror("epoll_wait failed");
assert(false);
}
for(int iSocket=0; iSocket < nAwake; ++iSocket)
{
auto This = static_cast<Eventable*>(events[iSocket].data.ptr);
auto eventFlags = events[iSocket].events;
fprintf(stderr, "%s event on socket [%d] -> %s\n",
This->ClassName(), This->fd, DumpEvent(eventFlags));
This->Event(eventFlags);
}
Where Eventable is a C++ class (or derivative thereof) that has all the state needed to decide how to handle the flags epoll delivers. (Of course, this is letting the kernel store a pointer to a C++ object, requiring a design that is very clear about pointer ownership/lifetimes.)
And since you're writing low-level code on Linux, you may also care about EPOLLRDHUP. This not-highly-portable flag lets you save one call to read(). If the client (curl seems pretty good at evoking this behavior) closes its write side of the connection (sends a FIN), you normally discover that when epoll tells you EPOLLIN, but read() returns zero bytes. However, Linux maintains an extra bit to indicate your client's write side (your read side) has been closed. So, if you tell epoll you want the EPOLLRDHUP event you can use it to avoid doing a read() whose sole purpose will turn out to be telling you the writer closed their side.
Note that EPOLLIN will still be turned on whenever EPOLLRDHUP is, AFAIK. Even after you do a shutdown(fd, SHUT_RD). Another example of how you will usually be driven to maintain your own idea of the state of the connection. You care more about clients who are kind enough to do half-shutdowns if you are implementing HTTP.
When used as an edge-triggered interface, for performance reasons,
it
is possible to add the file descriptor inside the epoll interface
(EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT).
This allows you
to avoid continuously switching between EPOLLIN and EPOLLOUT calling
epoll_ctl(2) with EPOLL_CTL_MOD.

UDP real time sending and receiving on Linux on command from control computer

I am currently working on a project written in C++ involving UDP real time connection. I receive UDP packets from a control computer containing commands to start/stop an infinite while loop that reads data from an IMU and sends that data to the control computer.
My problem is the following: First I implemented an exit condition from the loop using recvfrom() and read(), but the control computer sends a UDP packet every second, which was delaying the whole loop and made sending the data in the desired time interval of 5ms impossible.
I tried to fix this problem by usingfcntl(fd, F_SETFL, O_NONBLOCK);and using only read(), which actually works fine, but I am unsure whether this is a wise idea or not, since I am not checking for errors anymore. Is there any elegant way how to solve this problem? I thought about using Pthreads or something like that, however I have never worked with threads or parallel programming so I would have to spend some time learning that.
I appreciate any advice on that problem you could give me.
Here is a code example:
//include
...
int main() {
RNet cmd; //RNet: struct that contains all the information of the UDP header and the command
RNet* pCmd = &cmd;
ssize_t b;
int fd2;
struct sockaddr_in snd; // sender is control computer
socklen_t length;
// further declaration of variables, connecting to socket, etc...
...
fcntl(fd2, F_SETFL, O_NONBLOCK);
while (1)
{
// read messages from control computer
if ((b = read(fd2, pCmd, 19)) > 0) {
memcpy(&cmd, pCmd, b);
}
// transmission
while (cmd.CLout.MotionCommand == 1) // MotionCommand: 1 - send messages; 0 - do nothing
{
if(time_elapsed >= 5) // elapsed time in ms
{
// update sensor values
...
//sendto ()
...
// update control time, timestamp, etc.
...
}
if (recvfrom(fd2, pCmd, (int)sizeof(pCmd), 0, (struct sockaddr*) &snd, &length) < 0) {
perror("error receiving data");
return 0;
}
// checking Control Model Command
if ((b = read(fd2, pCmd, 19)) > 0) {
memcpy(&cmd, pCmd, b);
}
}
}
}
I really like the "blocking calls on multiple threads" design. It enables you to have distinct independent tasks, and you don't have to worry about how each task can disturb another. It can have some drawbacks but it is usually a good fit for many needs.
To do that, just use pthread_create to create a new thread for each task (you may keep the main thread for one task). In your case, you should have a thread to receive commands, and another one to send your data. You also need for the receiving thread to notify the sending thread of the commands. To do that, you can use some synchronization tool, like a mutex.
Overall, you should have your receiving thread blocking on recvfrom, and the sending thread waiting for a signal from the mutex (wait for the mutex to be freed, technically). When the receiving thread receive a start command, it signals the mutex and go back to recvfrom (optionally you can set a variable to provide more information to the other thread).
As a comment, remember that UDP are 1-to-many, thus your code here will react to any packet sent to you (even from some random or malicious host). You may want to filter with the remote sockaddr after recvfrom, or use connect + recv. It depends on what you want.

C/C++ recv() with timeout

Hoping someone can help me out. I'm wanting to implement some sort of timeout if my socket is not able to receive data in a certain amount of time... I've looked up ways online but the examples I've looked at doesn't have their recv() in a while loop like mine, they typically just receive the whole buffer that is waiting. Maybe mine just isn't very efficient and someone could point me in a better direction in receiving all the data.
The string that is to be received is not a fixed length which is why I receive 1 at a time because I don't know how big the string might be. As you can see my recv() will receive data until it finds the End of text character (). The examples with select() I found would use the select before calling receive, but should I be doing that for each go around of my while loop? or maybe call select() before I even enter the while loop?
Anyways, any help is appreciated.
string recv_data(int socket){
bool endfound = false;
char temp[1];
string recvstring ;
while(endfound == false)//receives 1 character at a time until ETX(\x03)character is found
{
if(recv(socket,temp,sizeof(temp),0)<0)
{
perror("error in recv data");
}
if(memchr(temp,'\x03',1) != NULL)
{
endfound = true;
}
recvstring += temp;
temp[0] = 0;
}
return formatting(recvstring);
}
You must put your socket into non-blocking mode, and use poll().
poll() technically works on regular blocking sockets too; however it's been my experience that there are various subtle differences in semantics and race conditions, when using poll() with blocking sockets, and for best portability I always used non-blocking mode sockets, together with poll(), and careful inspection of the returning value from recv() or read().
Some Google food for you: fcntl(), F_SETFL, O_NONBLOCK.

send and recv on same socket from different threads not working

I read that it should be safe from different threads concurrently, but my program has some weird behaviour and I don't know what's wrong.
I have concurrent threads communicating with a client socket
one doing send to a socket
one doing select and then recv from the same socket
As I'm still sending, the client has already received the data and closed the socket.
At the same time, I'm doing a select and recv on that socket, which returns 0 (since it is closed) so I close this socket. However, the send has not returned yet...and since I call close on this socket the send call fails with EBADF.
I know the client has received the data correctly since I output it after I close the socket and it is right. However, on my end, my send call is still returning an error (EBADF), so I want to fix it so it doesn't fail.
This doesn't always happen. It happens maybe 40% of the time. I don't use sleep anywhere. Am I supposed to have pauses between sends or recvs or anything?
Here's some code:
Sending:
while(true)
{
// keep sending until send returns 0
n = send(_sfd, bytesPtr, sentSize, 0);
if (n == 0)
{
break;
}
else if(n<0)
{
cerr << "ERROR: send returned an error "<<errno<< endl; // this case is triggered
return n;
}
sentSize -= n;
bytesPtr += n;
}
Receiving:
while(true)
{
memset(bufferPointer,0,sizeLeft);
n = recv(_sfd,bufferPointer,sizeLeft, 0);
if (debug) cerr << "Receiving..."<<sizeLeft<<endl;
if(n == 0)
{
cerr << "Connection closed"<<endl; // this case is triggered
return n;
}
else if (n < 0)
{
cerr << "ERROR reading from socket"<<endl;
return n;
}
bufferPointer += n;
sizeLeft -= n;
if(sizeLeft <= 0) break;
}
On the client, I use the same receive code, then I call close() on the socket.
Then on my side, I get 0 from the receive call and also call close() on the socket
Then my send fails. It still hasn't finished?! But my client already got the data!
I must admit I'm surprised you see this problem as often as you do, but it's always a possibility when you're dealing with threads. When you call send() you'll end up going into the kernel to append the data to the socket buffer in there, and it's therefore quite likely that there'll be a context switch, maybe to another process in the system. Meanwhile the kernel has probably buffered and transmitted the packet quite quickly. I'm guessing you're testing on a local network, so the other end receives the data and closes the connection and sends the appropriate FIN back to your end very quickly. This could all happen while the sending machine is still running other threads or processes because the latency on a local ethernet network is so low.
Now the FIN arrives - your receive thread hasn't done a lot lately since it's been waiting for input. Many scheduling systems will therefore raise its priority quite a bit and there's a good chance it'll be run next (you don't specify which OS you're using but this is likely to happen on at least Linux, for example). This thread closes the socket due to its zero read. At some point shortly after this the sending thread will be re-awoken, but presumably the kernel notices that the socket is closed before it returns from the blocked send() and returns EBADF.
Now this is just speculation as to the exact cause - among other things it heavily depends on your platform. But you can see how this could happen.
The easiest solution is probably to use poll() in the sending thread as well, but wait for the socket to become write-ready instead of read-ready. Obviously you also need to wait until there's any buffered data to send - how you do that depends on which thread buffers the data. The poll() call will let you detect when the connection has been closed by flagging it with POLLHUP, which you can detect before you try your send().
As a general rule you shouldn't close a socket until you're certain that the send buffer has been fully flushed - you can only be sure of this once the send() call has returned and indicates that all the remaining data has gone out. I've handled this in the past by checking the send buffer when I get a zero read and if it's not empty I set a "closing" flag. In your case the sending thread would then use this as a hint to do the close once everything is flushed. This matters because if the remote end does a half-close with shutdown() then you'll get a zero read even if it might still be reading. You might not care about half closes, however, in which case your strategy above is OK.
Finally, I personally would avoid the hassle of sending and receiving threads and just have a single thread which does both - that's more or less the point of select() and poll(), to allow a single thread of execution to deal with one or more filehandles without worrying about performing an operation which blocks and starves the other connections.
Found the problem. It's with my loop. Notice that it's an infinite loop. When I don't have anymore left to send, my sentSize is 0, but I'll still loop to try to send more. At this time, the other thread has already closed this thread and so my send call for 0 bytes returns with an error.
I fixed it by changing the loop to stop looping when sentSize is 0 and it fixed the problem!

Calculating socket upload speed

I'm wondering if anyone knows how to calculate the upload speed of a Berkeley socket in C++. My send call isn't blocking and takes 0.001 seconds to send 5 megabytes of data, but takes a while to recv the response (so I know it's uploading).
This is a TCP socket to a HTTP server and I need to asynchronously check how many bytes of data have been uploaded / are remaining. However, I can't find any API functions for this in Winsock, so I'm stumped.
Any help would be greatly appreciated.
EDIT: I've found the solution, and will be posting as an answer as soon as possible!
EDIT 2: Proper solution added as answer, will be added as solution in 4 hours.
I solved my issue thanks to bdolan suggesting to reduce SO_SNDBUF. However, to use this code you must note that your code uses Winsock 2 (for overlapped sockets and WSASend). In addition to this, your SOCKET handle must have been created similarily to:
SOCKET sock = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);
Note the WSA_FLAG_OVERLAPPED flag as the final parameter.
In this answer I will go through the stages of uploading data to a TCP server, and tracking each upload chunk and it's completion status. This concept requires splitting your upload buffer into chunks (minimal existing code modification required) and uploading it piece by piece, then tracking each chunk.
My code flow
Global variables
Your code document must have the following global variables:
#define UPLOAD_CHUNK_SIZE 4096
int g_nUploadChunks = 0;
int g_nChunksCompleted = 0;
WSAOVERLAPPED *g_pSendOverlapped = NULL;
int g_nBytesSent = 0;
float g_flLastUploadTimeReset = 0.0f;
Note: in my tests, decreasing UPLOAD_CHUNK_SIZE results in increased upload speed accuracy, but decreases overall upload speed. Increasing UPLOAD_CHUNK_SIZE results in decreased upload speed accuracy, but increases overall upload speed. 4 kilobytes (4096 bytes) was a good comprimise for a file ~500kB in size.
Callback function
This function increments the bytes sent and chunks completed variables (called after a chunk has been completely uploaded to the server)
void CALLBACK SendCompletionCallback(DWORD dwError, DWORD cbTransferred, LPWSAOVERLAPPED lpOverlapped, DWORD dwFlags)
{
g_nChunksCompleted++;
g_nBytesSent += cbTransferred;
}
Prepare socket
Initially, the socket must be prepared by reducing SO_SNDBUF to 0.
Note: In my tests, any value greater than 0 will result in undesirable behaviour.
int nSndBuf = 0;
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char*)&nSndBuf, sizeof(nSndBuf));
Create WSAOVERLAPPED array
An array of WSAOVERLAPPED structures must be created to hold the overlapped status of all of our upload chunks. To do this I simply:
// Calculate the amount of upload chunks we will have to create.
// nDataBytes is the size of data you wish to upload
g_nUploadChunks = ceil(nDataBytes / float(UPLOAD_CHUNK_SIZE));
// Overlapped array, should be delete'd after all uploads have completed
g_pSendOverlapped = new WSAOVERLAPPED[g_nUploadChunks];
memset(g_pSendOverlapped, 0, sizeof(WSAOVERLAPPED) * g_nUploadChunks);
Upload data
All of the data that needs to be send, for example purposes, is held in a variable called pszData. Then, using WSASend, the data is sent in blocks defined by the constant, UPLOAD_CHUNK_SIZE.
WSABUF dataBuf;
DWORD dwBytesSent = 0;
int err;
int i, j;
for(i = 0, j = 0; i < nDataBytes; i += UPLOAD_CHUNK_SIZE, j++)
{
int nTransferBytes = min(nDataBytes - i, UPLOAD_CHUNK_SIZE);
dataBuf.buf = &pszData[i];
dataBuf.len = nTransferBytes;
// Now upload the data
int rc = WSASend(sock, &dataBuf, 1, &dwBytesSent, 0, &g_pSendOverlapped[j], SendCompletionCallback);
if ((rc == SOCKET_ERROR) && (WSA_IO_PENDING != (err = WSAGetLastError())))
{
fprintf(stderr, "WSASend failed: %d\n", err);
exit(EXIT_FAILURE);
}
}
The waiting game
Now we can do whatever we wish while all of the chunks upload.
Note: the thread which called WSASend must be regularily put into an alertable state, so that our 'transfer completed' callback (SendCompletionCallback) is dequeued out of the APC (Asynchronous Procedure Call) list.
In my code, I continuously looped until g_nUploadChunks == g_nChunksCompleted. This is to show the end-user upload progress and speed (can be modified to show estimated completion time, elapsed time, etc.)
Note 2: this code uses Plat_FloatTime as a second counter, replace this with whatever second timer your code uses (or adjust accordingly)
g_flLastUploadTimeReset = Plat_FloatTime();
// Clear the line on the screen with some default data
printf("(0 chunks of %d) Upload speed: ???? KiB/sec", g_nUploadChunks);
// Keep looping until ALL upload chunks have completed
while(g_nChunksCompleted < g_nUploadChunks)
{
// Wait for 10ms so then we aren't repeatedly updating the screen
SleepEx(10, TRUE);
// Updata chunk count
printf("\r(%d chunks of %d) ", g_nChunksCompleted, g_nUploadChunks);
// Not enough time passed?
if(g_flLastUploadTimeReset + 1 > Plat_FloatTime())
continue;
// Reset timer
g_flLastUploadTimeReset = Plat_FloatTime();
// Calculate how many kibibytes have been transmitted in the last second
float flByteRate = g_nBytesSent/1024.0f;
printf("Upload speed: %.2f KiB/sec", flByteRate);
// Reset byte count
g_nBytesSent = 0;
}
// Delete overlapped data (not used anymore)
delete [] g_pSendOverlapped;
// Note that the transfer has completed
Msg("\nTransfer completed successfully!\n");
Conclusion
I really hope this has helped somebody in the future who has wished to calculate upload speed on their TCP sockets without any server-side modifications. I have no idea how performance detrimental SO_SNDBUF = 0 is, although I'm sure a socket guru will point that out.
You can get a lower bound on the amount of data received and acknowledged by subtracting the value of the SO_SNDBUF socket option from the number of bytes you have written to the socket. This buffer may be adjusted using setsockopt, although in some cases the OS may choose a length smaller or larger than you specify, so you must re-check after setting it.
To get more precise than that, however, you must have the remote side inform you of progress, as winsock does not expose an API to retrieve the amount of data currently pending in the send buffer.
Alternately, you could implement your own transport protocol on UDP, but implementing rate control for such a protocol can be quite complex.
Since you don't have control over the remote side, and you want to do it in the code, I'd suggest doing very simple approximation. I assume a long living program/connection. One-shot uploads would be too skewed by ARP, DNS lookups, socket buffering, TCP slow start, etc. etc.
Have two counters - length of the outstanding queue in bytes (OB), and number of bytes sent (SB):
increment OB by number of bytes to be sent every time you enqueue a chunk for upload,
decrement OB and increment SB by the number returned from send(2) (modulo -1 cases),
on a timer sample both OB and SB - either store them, log them, or compute running average,
compute outstanding bytes a second/minute/whatever, same for sent bytes.
Network stack does buffering and TCP does retransmission and flow control, but that doesn't really matter. These two counters will tell you the rate your app produces data with, and the rate it is able to push it to the network. It's not the method to find out the real link speed, but a way to keep useful indicators about how good the app is doing.
If data production rate is bellow the network output rate - everything is fine. If it's the other way around and the network cannot keep up with the app - there's a problem - you need either faster network, slower app, or different design.
For one-time experiments just take periodic snapshots of netstat -sp tcp output (or whatever that is on Windows) and calculate the send-rate manually.
Hope this helps.
If your app uses packet headers like
0001234DT
where 000123 is the packet length for a single packet, you can consider using MSG_PEEK + recv() to get the length of the packet before you actually read it with recv().
The problem is send() is NOT doing what you think - it is buffered by the kernel.
getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &flag, &sz));
fprintf(STDOUT, "%s: listener socket send buffer = %d\n", now(), flag);
sz=sizeof(int);
ERR_CHK(getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &flag, &sz));
fprintf(STDOUT, "%s: listener socket recv buffer = %d\n", now(), flag);
See what these show for you.
When you recv on a NON-blocking socket that has data, it normally does not have MB of data parked in the buufer ready to recv. Most of what I have experienced is that the socket has ~1500 bytes of data per recv. Since you are probably reading on a blocking socket it takes a while for the recv() to complete.
Socket buffer size is the probably single best predictor of socket throughput. setsockopt() lets you alter socket buffer size, up to a point. Note: these buffers are shared among sockets in a lot of OSes like Solaris. You can kill performance by twiddling these settings too much.
Also, I don't think you are measuring what you think you are measuring. The real efficiency of send() is the measure of throughput on the recv() end. Not the send() end.
IMO.