I am currently working on a server application in C++. My main inspirations are these examples:
Windows SDK IOCP Example
The I/O Completion Port IPv4/IPv6 Server Program Example
My app is closely modeled on these (socketobj, packageobj, ...).
In general, my app runs without issues. The only thing that still causes me trouble is half-open connections.
My strategy for this is: I check every connected client periodically and increment an "idle counter". Whenever a completion occurs, I reset this counter. If the idle counter gets too high, I set a boolean to prevent other threads from posting operations, and then call closesocket().
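Roughly, the idle sweep looks like this (the SocketObj fields and the threshold below are illustrative placeholders, not my actual code):

#include <winsock2.h>
#include <vector>

// Hypothetical per-connection bookkeeping; the real socketobj differs.
struct SocketObj {
    SOCKET socket;
    int idleCount = 0;
    bool closePending = false;
};

static const int kMaxIdlePeriods = 3;   // e.g. 3 sweep intervals without a completion

// Periodic sweep over all connected clients (locking omitted).
void SweepIdleClients(std::vector<SocketObj*>& clients)
{
    for (SocketObj* s : clients)
    {
        if (s->closePending || ++s->idleCount < kMaxIdlePeriods)
            continue;

        // Stop other threads from posting new WSARecv/WSASend calls,
        // then close the socket; the expectation is that pending
        // overlapped operations will now complete (with an error).
        s->closePending = true;
        shutdown(s->socket, SD_BOTH);
        closesocket(s->socket);
    }
}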
My assumption was that once the socket is closed, the pending operations will complete (maybe not instantly, but after some time). This is also the behavior the MSDN documentation describes (hints, second paragraph). I need this because only after all operations have completed can I free the resources.
Long story short: this is not the case for me. I did some tests with my test client app and some cout and breakpoint debugging, and discovered that pending operations for closed sockets are not completing (even after waiting 10 minutes). I also tried a shutdown() call before the closesocket(), and both returned no error.
What am I doing wrong? Does this happen to anyone else? Is the MSDN documentation wrong? What are the alternatives?
I am currently thinking of the 'linger' functionality, or of cancelling every operation explicitly with the CancelIoEx() function.
Edit: (thank you for your responses)
Yesterday evening I added a linked list to every socketobj to hold the per-I/O objects of the pending operations. With this I tried the CancelIoEx() function. The function returned 0 and GetLastError() returned ERROR_NOT_FOUND for most of the operations.
Is it then safe to just free the per-I/O object in this case?
I also discovered that this happens more often when I run my server app and the client app on the same machine. From time to time the server is then not able to complete write operations. I suspect this happens because the client-side receive buffer gets too full. (The client side does not stop receiving data!)
Code snippet follows as soon as possible.
The 'linger' setting can be used to reset the connection, but that way you will (a) lose data and (b) deliver a reset to the peer, which may terrify it.
If you're thinking of a positive linger timeout, it doesn't really help.
Shutdown for read should terminate read operations, but shutdown for write only gets queued after pending writes so it doesn't help at all.
If pending writes are the problem and are not completing, they will have to be cancelled.
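For example, something along these lines could cancel a specific outstanding operation (the PerIoObj layout is illustrative; ERROR_NOT_FOUND just means the operation was no longer pending):

#include <winsock2.h>
#include <windows.h>

// Hypothetical per-I/O object: an OVERLAPPED plus whatever the app tracks.
struct PerIoObj {
    OVERLAPPED overlapped{};
    // ... buffers, operation type, etc.
};

// Cancel one specific outstanding operation on a socket.
void CancelOneOperation(SOCKET s, PerIoObj* io)
{
    if (!CancelIoEx(reinterpret_cast<HANDLE>(s), &io->overlapped))
    {
        DWORD err = GetLastError();
        if (err == ERROR_NOT_FOUND)
        {
            // Nothing to cancel: the operation already completed (or was
            // never pending), so its completion packet has been / will be
            // delivered to the IOCP as usual.
        }
    }
    // Otherwise the operation completes shortly with ERROR_OPERATION_ABORTED
    // on a completion thread; free the PerIoObj there, not here.
}

// Passing nullptr instead of an OVERLAPPED cancels every outstanding
// operation on the handle:  CancelIoEx(reinterpret_cast<HANDLE>(s), nullptr);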
I've recently begun to implement a UDP socket receiver with Registered I/O on Win32. I've stumbled upon the following issue: I don't see any way to cancel pending RIOReceive()/RIOReceiveEx() operations without closing the socket.
To summarize the basic situation:
During normal operation I want to queue quite a few RIOReceive()/RIOReceiveEx() operations in the request queue to ensure that I get the best possible performance when it comes to receiving the UDP packets.
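For reference, the queueing step looks roughly like this (assuming the RIO function table, request queue and registered buffer were set up earlier; the slice layout is illustrative):

#include <winsock2.h>
#include <mswsock.h>
#include <vector>

// Queue one receive per slice of a buffer registered earlier with
// RIORegisterBuffer(). `slices` must stay alive until the requests complete.
bool QueueReceives(RIO_EXTENSION_FUNCTION_TABLE& rio, RIO_RQ rq,
                   RIO_BUFFERID bufferId, ULONG sliceSize,
                   std::vector<RIO_BUF>& slices)
{
    for (std::size_t i = 0; i < slices.size(); ++i)
    {
        slices[i].BufferId = bufferId;
        slices[i].Offset   = static_cast<ULONG>(i) * sliceSize;
        slices[i].Length   = sliceSize;

        // The request context (here the slice index) is echoed back in the
        // RIORESULT when this receive completes.
        if (!rio.RIOReceive(rq, &slices[i], 1, 0,
                            reinterpret_cast<PVOID>(static_cast<ULONG_PTR>(i))))
            return false;   // WSAGetLastError() has the reason
    }
    return true;
}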
However, at some point I may want to stop what I'm doing. If UDP packets are still arriving at that point, fine, I can just wait until all pending requests have been processed. Unfortunately, if the sender has also stopped sending UDP packets, I still have the pending receive operations.
That in and by itself is not a problem, because I can just keep going once operations start again.
However, if I want to reconfigure the buffers in between, I run into an issue. The documentation states that it's an error to deregister a buffer with RIO while it's still in use, and as long as receive operations are still pending, the buffers are still officially in use, so I can't do that.
What I've tried so far related to cancellation of these requests:
CancelIo() on the socket (no effect)
CancelSynchronousIo(GetCurrentThread()) (no effect)
shutdown(s, SD_RECEIVE) (success, but no effect, the socket even receives packets afterwards -- though shutdown probably wouldn't have been helpful anyway)
WSAIoctl(s, SIO_FLUSH, ...) because the docs of RIOReceiveEx() mentioned it, but that just gives me WSAEOPNOTSUPP on UDP sockets (probably only useful for TCP and probably also only useful for sending, not receiving)
Just for fun I tried to set setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, ...) with 1ms as the timeout -- and that doesn't appear to have any effect on RIO regardless of whether I call it before or after I queue the RIOReceive()/RIOReceiveEx() calls
Only closing the socket will successfully cancel the I/O.
I've also thought about doing RIOCloseCompletionQueue(), but there I wouldn't even know how to proceed afterwards, since there's no way of reassigning a completion queue to a request queue, as far as I can see, and you can only ever create a single request queue for a socket. (If there was something like RIOCloseRequestQueue() and that did cancel the pending requests, I'd be happy, but the docs only mention that closesocket() will free resources associated with the request queue.)
So what I'm left with is the following:
Either I have to write my logic so that the buffers that are being used are always fixed once the socket is opened, because I can't really ever change them in practice due to requests that could still be pending.
Or I have to close the socket and reopen it every time I want to change something here. But that is a race condition, because I'd have to bind the socket again, and I'd really like to avoid that if possible.
I've tested sending UDP packets to my own socket from a newly created different socket until all of the requests have been 'eaten up' -- and while that works in principle, I really don't like it, because if any kind of firewall rule decides to not allow this, the code would deadlock instantly.
On Linux io_uring I can just cancel existing operations, or even exit the uring, and once that's done, I'm sure that there are no receive operations still active, but the socket is still there and accessible. (And on Linux it's nice that the socket still behaves like a normal socket, on Windows if I create the socket with the WSA_FLAG_REGISTERED_IO flag, I can't use it outside of RIO except for operations such as bind().)
Am I missing something here or is this simply not possible with Registered I/O?
I'm using PostgreSQL 8.3, and writing a program in C++ that uses the libpq API. I execute commands asynchronously with the PQsendQuery() function. I'm trying to implement a timeout processing feature. I implemented it by calling PQcancel() when the timeout expires. I tested it with a query that returns 100 000 rows (it lasts about 0.5 s) with a timeout of 1 ms, and found that instead of cancelling the command, PQcancel() blocks until the server finishes execution, then returns with a successful query.
I understand that the documentation says that even with a successful cancel request the query may still be executed. My problem is that PQcancel() blocks my thread of execution, which is not acceptable because I use asynchronous processing (using the Boost Asio framework) so my program, which may have other tasks to do other than executing the SQL query, runs only on one thread.
Is it normal that PQcancel() blocks? Is there any way to make a non-blocking cancel request?
I looked at the implementation of PQcancel. It creates a separate TCP connection to the server, which is why it blocks. This code path is exactly the same in the newest version of PostgreSQL too. So I concluded that there is no way to make it non-blocking other than starting the cancel in a separate thread. This is also the preferred way of using this feature, as the cancel object is completely independent of the connection object, so it is completely thread-safe to use.
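A minimal sketch of that threaded approach (connection setup and error handling omitted):

#include <libpq-fe.h>
#include <thread>
#include <cstdio>

// Issue the cancel request from a separate thread so the blocking
// TCP connection made by PQcancel() does not stall the main thread.
void CancelAsync(PGconn* conn)
{
    PGcancel* cancel = PQgetCancel(conn);   // snapshot of host/port/backend PID
    if (!cancel)
        return;

    std::thread([cancel]() {
        char errbuf[256];
        if (!PQcancel(cancel, errbuf, sizeof errbuf))
            std::fprintf(stderr, "cancel failed: %s\n", errbuf);
        PQfreeCancel(cancel);
    }).detach();
}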
It sounds like you are doing this on a blocking connection. Check the documentation for PQsetnonblocking, set the connection to non-blocking, and you should be able to get PQcancel to return immediately. But it will also make all operations on the connection non-blocking.
In every tutorial and example I have seen on the internet for Linux/Unix socket programming, the server-side code always involves an infinite loop that checks for client connections.
Example:
http://www.thegeekstuff.com/2011/12/c-socket-programming/
http://tldp.org/LDP/LG/issue74/tougher.html#3.2
Is there a more efficient way to structure the server-side code so that it does not involve an infinite loop, or to code the infinite loop in a way that takes up fewer system resources?
The infinite loop in those examples is already efficient. The call to accept() is a blocking call: the function does not return until there is a client connecting to the server. Execution of the thread that called accept() is halted, and it does not consume any processing power.
Think of accept() as a call to join() or as a wait on a mutex/lock/semaphore.
Of course, there are many other ways to handle incoming connections, but those other ways deal with the blocking nature of accept(). This function is difficult to cancel, so there exist non-blocking alternatives that allow the server to perform other actions while waiting for an incoming connection. One such alternative is using select(). Other alternatives are less portable, as they involve low-level operating system calls to signal the connection through a callback function, an event, or some other asynchronous mechanism handled by the operating system...
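For example, a minimal sketch of the select() variant, which waits on the listening socket with a timeout so the loop can do other work between checks:

#include <sys/select.h>
#include <sys/socket.h>

// Returns an accepted client fd, or -1 if nothing arrived within `seconds`.
int accept_with_timeout(int listen_fd, int seconds)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);

    struct timeval tv;
    tv.tv_sec  = seconds;
    tv.tv_usec = 0;

    // select() blocks until the listening socket is readable (a connection
    // is pending) or the timeout expires; no CPU is burned while waiting.
    int ready = select(listen_fd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return -1;              // timeout or error

    return accept(listen_fd, NULL, NULL);
}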
For C++ you could look into boost.asio. You could also look into e.g. asynchronous I/O functions. There is also SIGIO.
Of course, even when using these asynchronous methods, your main program still needs to sit in a loop, or the program will exit.
The infinite loop is there to maintain the server's running state, so when a client connection is accepted, the server won't quit immediately afterwards, instead it'll go back to listening for another client connection.
The accept() call is a blocking one - that is to say, it waits until a client connects. It does this in an extremely efficient way, using essentially no system resources (until a connection is made, of course), by making use of the operating system's network drivers, which trigger an event (or hardware interrupt) that wakes the listening thread up.
Here's a good overview of what techniques are available - The C10K problem.
When you are implementing a server that listens for possibly infinite connections, there is IMO no way around some sort of infinite loop. Usually this is not a problem at all, because when your socket is not marked as non-blocking, the call to accept() will block until a new connection arrives. Due to this blocking, no system resources are wasted.
Other libraries that provide something like an event-based system are ultimately implemented in the way described above.
In addition to what has already been posted, it's fairly easy to see what is going on with a debugger. You will be able to single-step through until you execute the accept() line, upon which the single-step highlight will disappear and the app will run on - the next line is not reached. If you put a breakpoint on the next line, it will not fire until a client connects.
We need to follow best practices for writing client-server programs. The best guide I can recommend at this time is The C10K Problem. There are specific approaches we can follow in this case: select, poll, or epoll. Each has its own advantages and disadvantages.
If you are running your code on a recent kernel version, then I would recommend going for epoll; see the sketch below.
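A minimal sketch of the epoll pattern for a listening socket (error handling trimmed; the sizes and loop structure are illustrative):

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void epoll_accept_loop(int listen_fd)
{
    int epfd = epoll_create1(0);

    struct epoll_event ev = {};
    ev.events  = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[64];
    for (;;)   // the loop is still "infinite", but it sleeps in epoll_wait()
    {
        int n = epoll_wait(epfd, events, 64, -1);   // block until an event
        for (int i = 0; i < n; ++i)
        {
            if (events[i].data.fd == listen_fd)
            {
                int client = accept(listen_fd, NULL, NULL);
                // register `client` with epoll, read from it, etc.
                (void)client;
            }
        }
    }
    close(epfd);
}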
If you use select, poll, or epoll, you will block until you get an event/trigger, so your server will not spin in an infinite loop consuming CPU time.
In my personal experience, epoll is the best way to go: the load on my server machine with 80k ACTIVE connections was far lower than with select and poll. The load average of my server machine was just 3.2 with 80k active connections :)
Testing with poll, I found my server's load average went up to 7.8 at 30k active client connections :(
I am currently testing my network application in very low bandwidth environments. I currently have code that attempts to ensure that the connection is good by making sure I am still receiving information.
Traditionally I have done this by recording the timestamp in my ReadHandler function so that each time it gets called I know I have received data on the socket. With very low bandwidths this isn't sufficient because my ReadHandler is not getting called frequently enough.
I was toying around with the idea of writing my own completion condition function (right now I am using transfer_at_least(1)), thinking it would get called more frequently and I could record my timestamp there, but I was wondering if there isn't some other more standard way to go about this.
We had a similar issue in production: some of our connections may be idle for days, but we must detect if the remote is dead ASAP.
We solved it by enabling the TCP_KEEPALIVE option:
boost::asio::socket_base::keep_alive option(true);
mSocketTCP.set_option(option);
which had to be accompanied by a new startup script that writes sensible values to /proc/sys/net/ipv4/tcp_keepalive_*, which have very long timeouts by default (on Linux).
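An alternative to the system-wide /proc values is tuning keepalive per socket on the native handle; a sketch assuming Linux and a reasonably recent Boost.Asio (older versions expose native() instead of native_handle()):

#include <boost/asio.hpp>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

void enable_fast_keepalive(boost::asio::ip::tcp::socket& sock)
{
    boost::asio::socket_base::keep_alive option(true);
    sock.set_option(option);

    int fd = sock.native_handle();
    int idle = 30;      // seconds of idle before the first probe
    int interval = 5;   // seconds between probes
    int count = 3;      // probes before the connection is declared dead
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,     sizeof idle);
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval);
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &count,    sizeof count);
}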
You can use the read_some method to get partial reads and deal with the bookkeeping yourself. This is more efficient than transfer_at_least(1), but you still have to keep track of what is going on.
However, a cleaner approach is just to use a concurrent deadline_timer. If the timer goes off before you are finished, the operation is taking too long and you cancel whatever is going on. If not, just stop the timer and continue. Something like:
boost::asio::deadline_timer t(io_service);   // same io_service as the socket
t.expires_from_now(boost::posix_time::seconds(20));
t.async_wait(boost::bind(&Class::timed_out, this, boost::asio::placeholders::error));
// Do stuff.
// cancel() returns the number of handlers cancelled;
// zero means the timer has already fired.
if (!t.cancel()) {
    // Timer went off, abort
}
// And the timeout method
void Class::timed_out(error_code const& error)
{
    if (error == boost::asio::error::operation_aborted) return;   // cancelled, not a timeout
    // Deal with the timeout, close the socket, etc.
}
I don't know of a way to handle network latency from within the application. Can you even be sure whether it's network latency, or whether the peer server or peer application is busy and reacting slowly? Does it matter whether the network, the server, or the application is at fault?
Even if you can detect network latency and find it's large, what are you going to do?
You cannot improve the situation.
Consider another critical case, which is a subset of what you're trying to handle - the network is down (e.g. you disconnect the cable from your machine). Since it's a subset of your problem, you want to handle it too.
Let's examine the effect of a network outage on an active TCP connection. How can you discover whether your active TCP connection is still alive? Calling send() will succeed, but it merely means that the message was queued in the TCP outgoing queue in the kernel. The TCP stack will try to send it, but since no TCP ACK comes back, the TCP stack on your side will retry again and again. You can see your message in the netstat output (Send-Q column).
I'm aware of the following ways to deal with it:
One standard way is TCP keepalive, as proposed by @Cubby.
Another way is to implement an application-level keep-alive mechanism: send a keep-alive request message, and the peer is obligated to send back a keep-alive ack message (see the sketch after this list).
If you don't receive an ack message within a predefined timeout, try to send the keep-alive request N more times (e.g. N=2). If there is still no success, close the socket and open it again. If the peer server is not available, you won't be able to open a connection, since the TCP 3-way handshake requires the peer to respond.
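A rough sketch of that request/ack logic over a plain blocking socket (the "PING"/"PONG" messages, framing and timeouts are purely illustrative, and it assumes the connection is otherwise idle):

#include <sys/socket.h>
#include <sys/time.h>
#include <string.h>

// Returns true if the peer answered a keep-alive request within `timeout_sec`,
// retrying up to `retries` times. On false, close and reconnect the socket.
bool peer_alive(int fd, int timeout_sec, int retries)
{
    for (int attempt = 0; attempt <= retries; ++attempt)
    {
        const char req[] = "PING\n";
        if (send(fd, req, sizeof req - 1, 0) < 0)
            return false;

        struct timeval tv = { timeout_sec, 0 };
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        char ack[8] = {};
        ssize_t n = recv(fd, ack, sizeof ack - 1, 0);
        if (n > 0 && strncmp(ack, "PONG", 4) == 0)
            return true;            // peer responded in time
        // timeout or wrong reply: fall through and retry
    }
    return false;
}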
For THIS reason, I want to try something new - close the socket using some system call.
In short: I can't set a query timeout in the MySQL library (the C API, refer to the link for more info), so I want to try closing the socket to see how the library reacts. This is probably not a good idea, but I still want to try it.
Here's what I've done: there's another thread running as a timer. After a specific timeout (let's say 10 seconds), if there's no response, I want to close the socket. The MYSQL struct has a member net, which is also a struct and holds the fd. But when I try to do this:
shutdown( m_pOwner->m_ptrDBConnection->m_mysql.net.fd, SHUT_RDWR );
close( m_pOwner->m_ptrDBConnection->m_mysql.net.fd );
nothing happens. The return values from shutdown and close are 0, but the socket is still open (after waiting 60 seconds there's a result returned from the DB, which means the MySQL client was still waiting for a response from the DB).
Any ideas?
Thanks
EDIT - Yes, there is a running transaction while I'm trying to close the socket. But this is the actual problem - I cannot terminate the query, nor close the connection, nothing, and I don't want to wait out the whole timeout, which is 20 minutes and 30 seconds or something like that. That's why I'm looking for a brute-force approach.. :/
Just a shot in the dark, but make sure you cancel/terminate any running transactions. I'm not familiar with the MySQL C API, but I would imagine there is a way to check whether there are any active connections/queries. You may not be able to close the socket simply because there are still things running, and they need to be brought to some "resolved" state, either committed or rolled back. I would begin there and see what happens. You really don't want to shut down the socket "brute force" style if you have anything pending anyway, because your data would not be in a reliable state afterwards - you would not know which transactions succeeded and which did not, although I would imagine that MySQL would roll back any pending transactions if the connection failed abruptly.
EDIT:
From what I have found via Googling "MySQL stopping runaway query", the consensus seems to be to ask MySQL to terminate the thread of the runaway/long-running query using
KILL thread-id
I would imagine that the thread ID is available to you in the MySQL data structure that contains the socket. You may want to try this, although IIRC doing so requires superuser privileges.
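One way to wire that up, sketched under the assumption that a second connection is acceptable, is to grab the thread ID with mysql_thread_id() up front and issue the KILL from that separate connection when the timer fires (KILL QUERY terminates just the statement; plain KILL drops the whole connection):

#include <mysql/mysql.h>
#include <cstdio>

// Ask the server to kill the query running on `victim_thread_id`.
// `admin` is a second, already-connected MYSQL handle with enough privileges.
void kill_runaway_query(MYSQL* admin, unsigned long victim_thread_id)
{
    char query[64];
    std::snprintf(query, sizeof query, "KILL QUERY %lu", victim_thread_id);
    if (mysql_query(admin, query) != 0)
        std::fprintf(stderr, "KILL failed: %s\n", mysql_error(admin));
}

// Earlier, on the connection that runs the long query:
//   unsigned long id = mysql_thread_id(&m_mysql);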
EDIT #2:
Apparently MySQL provides a fail-safe mechanism that will reopen a closed connection, so forcefully shutting down the socket will not actually terminate the query. Once you close it, MySQL will open another connection and attempt to complete the query. Turning this off will allow you to close the socket and cause the query to terminate.
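If the fail-safe in question is the client library's auto-reconnect option, a sketch of turning it off (assuming a client version where MYSQL_OPT_RECONNECT and my_bool are still available):

#include <mysql/mysql.h>

// Disable automatic reconnection so a dropped connection stays dropped
// instead of being silently reopened behind the application's back.
void disable_auto_reconnect(MYSQL* conn)
{
    my_bool reconnect = 0;
    mysql_options(conn, MYSQL_OPT_RECONNECT, &reconnect);
}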
The comments below show how the answer was found, and the thought process involved therein.
It looks like you are running into an issue with the TCP TIME_WAIT timer, meaning it will close eventually. Long story short, it is sort of unavoidable. There was another discussion on this:
close vs shutdown socket?
As far as I know, if shutdown() and close() both return 0, there's no doubt you have successfully closed a socket. The catch is that you could have closed the wrong fd. Or the server might not react properly to a correct shutdown (if so, this could be considered a bug in the server: there is no reason to keep waiting for incoming data). I'd keep looking for a supported way to do this.