User with slow data rate, slows down all other users on the server

User with slow data rate, slows down all other users on the server - c++

I am developing an echo server that accepts multiple clients using the poll technique. The source code is here.
The problem I am facing is that when a client sends data to the server with a slow rate, it will slow down all other clients, waiting for the slow user to finish sending all the data first.
To overcome this problem I though about creating worker threads on the server and send each socket read there, but this would make things more complicated because I will have to use some kind of synchronisation so that not two threads will read from the same socket descriptor at the same time.
What method are you using to overcome this problem with the slow clients? Is there any design pattern for this?
Edit 1: Some code to describe the question better
Edit 2: Added a check for POLLIN before recv
//simplified code
//poll loop
do {
rc = poll(fds, nfds, -1);
//when some socket descriptors are ready for read, read them all
for (i = 0; i < nfds; i++)
{
if (fds[i].revents & POLLIN) {
//here if the fds[i] socket descriptor has a slow data rate, it will slow down all the other
//clients waiting for the server to complete the recv with the slow client.
int bytes_recv = recv(fds[i], buffer, 1024, 0);
}
}
} while (end_server == false);

Related

Can I create a chat program with IOCP?

I've just entered socket programming.
But I was given homework to implement a simple chat program with IOCP.
WSASend(), which is used in IOCP, forwards data to only one client connected to the socket, and how do I change this to transfer data to all connected clients?
After many attempts, the server was still transferring data only to the client that sent it.
When one client sends data to the server, I want the server to send it to all connected clients.
After many attempts, the server was still transferring data only to the client that sent it.
When one client sends data to the server, I want the server to send it to all connected clients.
This is a function that, in the IOCP program that I am currently dealing with, transfers data received to the client through a connected socket back to the client. I'm thinking about the problem here.
Of course there may be problems elsewhere. I'll show you if you want to see other parts of the code.
bool Session::SendMsg(int size) { // size : size of send data
ZeroMemory(&_sio.over, sizeof(_sio.over));
_sio.wbuf.len = size;
_sio.wbuf.buf = _sio.mbuf;
_sio.type = Iotype::Send;
DWORD trans;
// It sends data to only one client....
int ret = WSASend(_sock, &_sio.wbuf, 1, &trans, 0, &_sio.over, NULL);
if (ret == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
return false;
return true;
}

UDP real time sending and receiving on Linux on command from control computer

I am currently working on a project written in C++ involving UDP real time connection. I receive UDP packets from a control computer containing commands to start/stop an infinite while loop that reads data from an IMU and sends that data to the control computer.
My problem is the following: First I implemented an exit condition from the loop using recvfrom() and read(), but the control computer sends a UDP packet every second, which was delaying the whole loop and made sending the data in the desired time interval of 5ms impossible.
I tried to fix this problem by usingfcntl(fd, F_SETFL, O_NONBLOCK);and using only read(), which actually works fine, but I am unsure whether this is a wise idea or not, since I am not checking for errors anymore. Is there any elegant way how to solve this problem? I thought about using Pthreads or something like that, however I have never worked with threads or parallel programming so I would have to spend some time learning that.
I appreciate any advice on that problem you could give me.
Here is a code example:
//include
...
int main() {
RNet cmd; //RNet: struct that contains all the information of the UDP header and the command
RNet* pCmd = &cmd;
ssize_t b;
int fd2;
struct sockaddr_in snd; // sender is control computer
socklen_t length;
// further declaration of variables, connecting to socket, etc...
...
fcntl(fd2, F_SETFL, O_NONBLOCK);
while (1)
{
// read messages from control computer
if ((b = read(fd2, pCmd, 19)) > 0) {
memcpy(&cmd, pCmd, b);
}
// transmission
while (cmd.CLout.MotionCommand == 1) // MotionCommand: 1 - send messages; 0 - do nothing
{
if(time_elapsed >= 5) // elapsed time in ms
{
// update sensor values
...
//sendto ()
...
// update control time, timestamp, etc.
...
}
if (recvfrom(fd2, pCmd, (int)sizeof(pCmd), 0, (struct sockaddr*) &snd, &length) < 0) {
perror("error receiving data");
return 0;
}
// checking Control Model Command
if ((b = read(fd2, pCmd, 19)) > 0) {
memcpy(&cmd, pCmd, b);
}
}
}
}

I really like the "blocking calls on multiple threads" design. It enables you to have distinct independent tasks, and you don't have to worry about how each task can disturb another. It can have some drawbacks but it is usually a good fit for many needs.
To do that, just use pthread_create to create a new thread for each task (you may keep the main thread for one task). In your case, you should have a thread to receive commands, and another one to send your data. You also need for the receiving thread to notify the sending thread of the commands. To do that, you can use some synchronization tool, like a mutex.
Overall, you should have your receiving thread blocking on recvfrom, and the sending thread waiting for a signal from the mutex (wait for the mutex to be freed, technically). When the receiving thread receive a start command, it signals the mutex and go back to recvfrom (optionally you can set a variable to provide more information to the other thread).
As a comment, remember that UDP are 1-to-many, thus your code here will react to any packet sent to you (even from some random or malicious host). You may want to filter with the remote sockaddr after recvfrom, or use connect + recv. It depends on what you want.

UDP recvfrom thread use too much CPU resources

I am writing a Windows 7 visual c++ server application, which should receive UDP datagrams with 3.6 MB/s.
I have a main thread where the recvfrom() receives the data. The socket is a non-blocking socket and has 64kB receive buffer. If no data has been received on the socket the thread executes a sleep(1).
My problem is that the thread uses up almost 50% of my dual-core processor and I have no idea how could I decrease it. Wireshark use only 20% of it, so my main goal is to achieve a similar percentage.
Do you have any ideas?

Rather than polling you could use a select-like approach to wait for either data to arrive at your socket or the client to decide to shutdown:
First make your socket non-blocking:
u_long nonBlocking = 0;
WSAEventSelect(sock, NULL, 0);
ioctlsocket(sock, FIONBIO, &nonBlocking);
then use WSAWaitForMultipleEvents to wait until either data arrives or you want to cancel the recv:
int32_t MyRecv(THandle sock, WSAEVENT* recvCancelEvt,
uint8_t* buffer, uint32_t bufferBytes)
{
int32_t bytesReceived;
WSAEVENT evt;
DWORD ret;
HANDLE handles[2];
event = WSACreateEvent();
if (NULL == evt) {
return -1;
}
if (0 != WSAEventSelect(handle->iSocket, evt, FD_READ|FD_CLOSE)) {
WSACloseEvent(evt);
return -1;
}
bytesReceived = recv(sock, (char*)buffer, bufferBytes, 0);
if (SOCKET_ERROR==received && WSAEWOULDBLOCK==WSAGetLastError()) {
handles[0] = evt;
handles[1] = *recvCancelEvt;
ret = WSAWaitForMultipleEvents(2, handles, FALSE, INFINITE, FALSE);
if (WAIT_OBJECT_0 == ret) {
bytesReceived = recv(handle->iSocket, (char*)buffer, bufferBytes, 0);
}
}
WSACloseEvent(evt);
return bytesReceived;
}
Client code would call WSASetEvent on recvCancelEvt if it wanted to cancel a recv.

While solutions based on Select or blocking sockets are the correct approach, the reason you're running one core at 100% is due to the behaviour of Sleep:-
Look at the docs for WinAPI sleep():
This function causes a thread to relinquish the remainder of its time
slice and become unrunnable for an interval based on the value of
dwMilliseconds. The system clock "ticks" at a constant rate. If
dwMilliseconds is less than the resolution of the system clock, the
thread may sleep for less than the specified length of time.
So, if you're polling, you either need to use a much larger sleep time (maybe 20Ms, which is typically a bit greater than the Windows tick rate), or use a more accurate multimedia timer.

I would recommend using a boost::asio::io_service. We receive about 200MB/s of UDP multicast traffic while maxing out a modern CPU. This includes a full reliability protocol and data dispatch to the application. The bottleneck in profiling is the processing, not the boost::asio receive. Code here

It looks like most of the times your call to recvfrom does not return data. Sleeping 1 ms is not much. You should think about increasing the sleep time (cheap, but not best solution) or, the better solution, think about using an event driven approach. Use select() or Windows API to block until the socket is signalled or some other event you are interested in occurs and then call recvfrom. You might need to redesign your program's main loop for that.

How to avoid DOS attack in this code?

I have a code written in C/C++ that look like this:
while(1)
{
//Accept
struct sockaddr_in client_addr;
int client_fd = this->w_accept(&client_addr);
char client_ip[64];
int client_port = ntohs(client_addr.sin_port);
inet_ntop(AF_INET, &client_addr.sin_addr, client_ip, sizeof(client_ip));
//Listen first string
char firststring[512];
memset(firststring,0,512);
if(this->recvtimeout(client_fd,firststring,sizeof(firststring),u->timeoutlogin) < 0){
close(client_fd);
}
if(strcmp(firststring,"firststr")!=0)
{
cout << "Disconnected!" << endl;
close(client_fd);
continue;
}
//Send OK first string
send(client_fd, "OK", 2, 0);
//Listen second string
char secondstring[512];
memset(secondstring,0,512);
if(this->recvtimeout(client_fd,secondstring,sizeof(secondstring),u->timeoutlogin) < 0){
close(client_fd);
}
if(strcmp(secondstring,"secondstr")!=0)
{
cout << "Disconnected!!!" << endl;
close(client_fd);
continue;
}
//Send OK second string
send(client_fd, "OK", 2, 0);
}
}
So, it's dossable.
I've write a very simple dos script in perl that takedown the server.
#Evildos.pl
use strict;
use Socket;
use IO::Handle;
sub dosfunction
{
my $host = shift || '192.168.4.21';
my $port = 1234;
my $firststr = 'firststr';
my $secondstr = 'secondstr';
my $protocol = getprotobyname('tcp');
$host = inet_aton($host) or die "$host: unknown host";
socket(SOCK, AF_INET, SOCK_STREAM, $protocol) or die "socket() failed: $!";
my $dest_addr = sockaddr_in($port,$host);
connect(SOCK,$dest_addr) or die "connect() failed: $!";
SOCK->autoflush(1);
print SOCK $firststr;
#sleep(1);
print SOCK $secondstr;
#sleep(1);
close SOCK;
}
my $i;
for($i=0; $i<30;$i++)
{
&dosfunction;
}
With a loop of 30 times, the server goes down.
The question is: is there a method, a system, a solution that can avoid this type of attack?
EDIT: recvtimeout
int recvtimeout(int s, char *buf, int len, int timeout)
{
fd_set fds;
int n;
struct timeval tv;
// set up the file descriptor set
FD_ZERO(&fds);
FD_SET(s, &fds);
// set up the struct timeval for the timeout
tv.tv_sec = timeout;
tv.tv_usec = 0;
// wait until timeout or data received
n = select(s+1, &fds, NULL, NULL, &tv);
if (n == 0){
return -2; // timeout!
}
if (n == -1){
return -1; // error
}
// data must be here, so do a normal recv()
return recv(s, buf, len, 0);
}

I don't think there is any 100% effective software solution to DOS attacks in general; no matter what you do, someone could always throw more packets at your network interface than it can handle.
In this particular case, though, it looks like your program can only handle one connection at a time -- that is, incoming connection #2 won't be processed until connection #1 has completed its transaction (or timed out). So that's an obvious choke point -- all an attacker has to do is connect to your server and then do nothing, and your server is effectively disabled for (however long your timeout period is).
To avoid that you would need to rewrite the server code to handle multiple TCP connections at once. You could either do that by switching to non-blocking I/O (by passing O_NONBLOCK flag to fcntl()), and using select() or poll() or etc to wait for I/O on multiple sockets at once, or by spawning multiple threads or sub-processes to handle incoming connections in parallel, or by using async I/O. (I personally prefer the first solution, but all can work to varying degrees). In the first approach it is also practical to do things like forcibly closing any existing sockets from a given IP address before accepting a new socket from that IP address, which means that any given attacking computer could only tie up a maximum of one socket on your server at a time, which would make it harder for that person to DOS your machine unless he had access to a number of client machines.
You might read this article for more discussion about handling many TCP connections at the same time.

The main issue with DOS and DDOS attacks is that they play on your weakness: namely the fact that there is a limited memory / number of ports / processing resources that you can use to provide the service. Even if you have infinite scalability (or close) using something like the Amazon farms, you'll probably want to limit it to avoid the bill going through the roof.
At the server level, your main worry should be to avoid a crash, by imposing self-preservation limits. You can for example set a maximum number of connections that you know you can handle and simply refuse any other.
Full strategies will include specialized materials, like firewalls, but there is always a way to play them and you will have to live with that.
For example of nasty attacks, read about Slow Loris on wikipedia.
Slowloris tries to keep many connections to the target web server open and hold them open as long as possible. It accomplishes this by opening connections to the target web server and sending a partial request. Periodically, it will send subsequent HTTP headers, adding to—but never completing—the request. Affected servers will keep these connections open, filling their maximum concurrent connection pool, eventually denying additional connection attempts from clients.
There are many variants of DOS attacks, so a specific answer is quite difficult.

Your code leaks a filehandle when it succeeds, this will eventually make you run out of fds to allocate, making accept() fail.
close() the socket when you're done with it.
Also, to directly answer your question, there is no solution for DOS caused by faulty code other than correcting it.

This isn't a cure-all for DOS attacks, but using non-blocking sockets will definitely help for scalability. And if you can scale-up, you can mitigate many DOS attacks. This design changes includes setting both the listen socket used in accept calls and the client connection sockets to non-blocking.
Then instead of blocking on a recv(), send(), or an accept() call, you block on either a poll, epoll, or select call - then handle that event for that connection as much as you are able to. Use a reasonable timeout (e.g. 30 seconds) such that you can wake up from polling call to sweep and close any connections that don't seem to be progressing through your protocol chain.
This basically requires every socket to have it's own "connection" struct that keeps track of the state of that connection with respect to the protocol you implement. It likely also means keeping a (hash) table of all sockets so they can be mapped to their connection structure instance. It also means "sends" are non-blocking as well. Send and recv can return partial data amounts anyway.
You can look at an example of a non-blocking socket server on my project code here. (Look around line 360 for the start of the main loop in Run method).
An example of setting a socket into non-blocking state:
int SetNonBlocking(int sock)
{
int result = -1;
int flags = 0;
flags = ::fcntl(sock, F_GETFL, 0);
if (flags != -1)
{
flags |= O_NONBLOCK;
result = fcntl(sock , F_SETFL , flags);
}
return result;
}

I would use boost::asio::async_connector from boost::asio functionality to create multiple connection handlers (works both on single and multi-threaded environment). In the single threaded case, you just need to run from time to time boost::asio::io_service::run in order to make sure communications have time to be processed
The reason why you want to use asio is because its very good at handling asynchronous communication logic, so it won't block (as in your case) if a connection gets blocked. You can even arrange how much processing you want to devote to opening new connections, while keep serving existing ones

Calculating socket upload speed

I'm wondering if anyone knows how to calculate the upload speed of a Berkeley socket in C++. My send call isn't blocking and takes 0.001 seconds to send 5 megabytes of data, but takes a while to recv the response (so I know it's uploading).
This is a TCP socket to a HTTP server and I need to asynchronously check how many bytes of data have been uploaded / are remaining. However, I can't find any API functions for this in Winsock, so I'm stumped.
Any help would be greatly appreciated.
EDIT: I've found the solution, and will be posting as an answer as soon as possible!
EDIT 2: Proper solution added as answer, will be added as solution in 4 hours.

I solved my issue thanks to bdolan suggesting to reduce SO_SNDBUF. However, to use this code you must note that your code uses Winsock 2 (for overlapped sockets and WSASend). In addition to this, your SOCKET handle must have been created similarily to:
SOCKET sock = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);
Note the WSA_FLAG_OVERLAPPED flag as the final parameter.
In this answer I will go through the stages of uploading data to a TCP server, and tracking each upload chunk and it's completion status. This concept requires splitting your upload buffer into chunks (minimal existing code modification required) and uploading it piece by piece, then tracking each chunk.
My code flow
Global variables
Your code document must have the following global variables:
#define UPLOAD_CHUNK_SIZE 4096
int g_nUploadChunks = 0;
int g_nChunksCompleted = 0;
WSAOVERLAPPED *g_pSendOverlapped = NULL;
int g_nBytesSent = 0;
float g_flLastUploadTimeReset = 0.0f;
Note: in my tests, decreasing UPLOAD_CHUNK_SIZE results in increased upload speed accuracy, but decreases overall upload speed. Increasing UPLOAD_CHUNK_SIZE results in decreased upload speed accuracy, but increases overall upload speed. 4 kilobytes (4096 bytes) was a good comprimise for a file ~500kB in size.
Callback function
This function increments the bytes sent and chunks completed variables (called after a chunk has been completely uploaded to the server)
void CALLBACK SendCompletionCallback(DWORD dwError, DWORD cbTransferred, LPWSAOVERLAPPED lpOverlapped, DWORD dwFlags)
{
g_nChunksCompleted++;
g_nBytesSent += cbTransferred;
}
Prepare socket
Initially, the socket must be prepared by reducing SO_SNDBUF to 0.
Note: In my tests, any value greater than 0 will result in undesirable behaviour.
int nSndBuf = 0;
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char*)&nSndBuf, sizeof(nSndBuf));
Create WSAOVERLAPPED array
An array of WSAOVERLAPPED structures must be created to hold the overlapped status of all of our upload chunks. To do this I simply:
// Calculate the amount of upload chunks we will have to create.
// nDataBytes is the size of data you wish to upload
g_nUploadChunks = ceil(nDataBytes / float(UPLOAD_CHUNK_SIZE));
// Overlapped array, should be delete'd after all uploads have completed
g_pSendOverlapped = new WSAOVERLAPPED[g_nUploadChunks];
memset(g_pSendOverlapped, 0, sizeof(WSAOVERLAPPED) * g_nUploadChunks);
Upload data
All of the data that needs to be send, for example purposes, is held in a variable called pszData. Then, using WSASend, the data is sent in blocks defined by the constant, UPLOAD_CHUNK_SIZE.
WSABUF dataBuf;
DWORD dwBytesSent = 0;
int err;
int i, j;
for(i = 0, j = 0; i < nDataBytes; i += UPLOAD_CHUNK_SIZE, j++)
{
int nTransferBytes = min(nDataBytes - i, UPLOAD_CHUNK_SIZE);
dataBuf.buf = &pszData[i];
dataBuf.len = nTransferBytes;
// Now upload the data
int rc = WSASend(sock, &dataBuf, 1, &dwBytesSent, 0, &g_pSendOverlapped[j], SendCompletionCallback);
if ((rc == SOCKET_ERROR) && (WSA_IO_PENDING != (err = WSAGetLastError())))
{
fprintf(stderr, "WSASend failed: %d\n", err);
exit(EXIT_FAILURE);
}
}
The waiting game
Now we can do whatever we wish while all of the chunks upload.
Note: the thread which called WSASend must be regularily put into an alertable state, so that our 'transfer completed' callback (SendCompletionCallback) is dequeued out of the APC (Asynchronous Procedure Call) list.
In my code, I continuously looped until g_nUploadChunks == g_nChunksCompleted. This is to show the end-user upload progress and speed (can be modified to show estimated completion time, elapsed time, etc.)
Note 2: this code uses Plat_FloatTime as a second counter, replace this with whatever second timer your code uses (or adjust accordingly)
g_flLastUploadTimeReset = Plat_FloatTime();
// Clear the line on the screen with some default data
printf("(0 chunks of %d) Upload speed: ???? KiB/sec", g_nUploadChunks);
// Keep looping until ALL upload chunks have completed
while(g_nChunksCompleted < g_nUploadChunks)
{
// Wait for 10ms so then we aren't repeatedly updating the screen
SleepEx(10, TRUE);
// Updata chunk count
printf("\r(%d chunks of %d) ", g_nChunksCompleted, g_nUploadChunks);
// Not enough time passed?
if(g_flLastUploadTimeReset + 1 > Plat_FloatTime())
continue;
// Reset timer
g_flLastUploadTimeReset = Plat_FloatTime();
// Calculate how many kibibytes have been transmitted in the last second
float flByteRate = g_nBytesSent/1024.0f;
printf("Upload speed: %.2f KiB/sec", flByteRate);
// Reset byte count
g_nBytesSent = 0;
}
// Delete overlapped data (not used anymore)
delete [] g_pSendOverlapped;
// Note that the transfer has completed
Msg("\nTransfer completed successfully!\n");
Conclusion
I really hope this has helped somebody in the future who has wished to calculate upload speed on their TCP sockets without any server-side modifications. I have no idea how performance detrimental SO_SNDBUF = 0 is, although I'm sure a socket guru will point that out.

You can get a lower bound on the amount of data received and acknowledged by subtracting the value of the SO_SNDBUF socket option from the number of bytes you have written to the socket. This buffer may be adjusted using setsockopt, although in some cases the OS may choose a length smaller or larger than you specify, so you must re-check after setting it.
To get more precise than that, however, you must have the remote side inform you of progress, as winsock does not expose an API to retrieve the amount of data currently pending in the send buffer.
Alternately, you could implement your own transport protocol on UDP, but implementing rate control for such a protocol can be quite complex.

Since you don't have control over the remote side, and you want to do it in the code, I'd suggest doing very simple approximation. I assume a long living program/connection. One-shot uploads would be too skewed by ARP, DNS lookups, socket buffering, TCP slow start, etc. etc.
Have two counters - length of the outstanding queue in bytes (OB), and number of bytes sent (SB):
increment OB by number of bytes to be sent every time you enqueue a chunk for upload,
decrement OB and increment SB by the number returned from send(2) (modulo -1 cases),
on a timer sample both OB and SB - either store them, log them, or compute running average,
compute outstanding bytes a second/minute/whatever, same for sent bytes.
Network stack does buffering and TCP does retransmission and flow control, but that doesn't really matter. These two counters will tell you the rate your app produces data with, and the rate it is able to push it to the network. It's not the method to find out the real link speed, but a way to keep useful indicators about how good the app is doing.
If data production rate is bellow the network output rate - everything is fine. If it's the other way around and the network cannot keep up with the app - there's a problem - you need either faster network, slower app, or different design.
For one-time experiments just take periodic snapshots of netstat -sp tcp output (or whatever that is on Windows) and calculate the send-rate manually.
Hope this helps.

If your app uses packet headers like
0001234DT
where 000123 is the packet length for a single packet, you can consider using MSG_PEEK + recv() to get the length of the packet before you actually read it with recv().
The problem is send() is NOT doing what you think - it is buffered by the kernel.
getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &flag, &sz));
fprintf(STDOUT, "%s: listener socket send buffer = %d\n", now(), flag);
sz=sizeof(int);
ERR_CHK(getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &flag, &sz));
fprintf(STDOUT, "%s: listener socket recv buffer = %d\n", now(), flag);
See what these show for you.
When you recv on a NON-blocking socket that has data, it normally does not have MB of data parked in the buufer ready to recv. Most of what I have experienced is that the socket has ~1500 bytes of data per recv. Since you are probably reading on a blocking socket it takes a while for the recv() to complete.
Socket buffer size is the probably single best predictor of socket throughput. setsockopt() lets you alter socket buffer size, up to a point. Note: these buffers are shared among sockets in a lot of OSes like Solaris. You can kill performance by twiddling these settings too much.
Also, I don't think you are measuring what you think you are measuring. The real efficiency of send() is the measure of throughput on the recv() end. Not the send() end.
IMO.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js