The following is a programming problem I am trying to solve...
I am creating a C++11 server application that manages operations being performed on other servers. Thus far, I have designed it to do the following:
Start a thread for each remote server (with operations on it that are being managed) that polls the server to make sure it's still there/working.
Start a thread for each operation that it is managing, which polls the remote server to get info about the current state of the operation.
Where I am running into issues is this: the calls to the remote server to verify that it is there/working, and the calls to get the status of the operations being managed on the remote server, are "waited" (blocking) operations. If something stops working (either a remote server stops responding, or I am not able to get the state of an operation running on a remote server), I was attempting to just "kill" the thread that monitors that server or operation, and flag the server or operation as dead/non-responsive. Unfortunately, there does not seem to be any clean way to kill a thread.
Note that I was trying to "kill" threads because many of the calls that I am performing are "waited" operations, and I cannot have the application hang because it is waiting for completion of an operation (that in some cases may never complete). I have done some research on ways to "kill" an active thread (stop it from a manager thread). One of the only C++ thread management libraries I could find that even supports a method to "kill" a thread is the "ZThread" library, but using its kill method doesn't seem to work consistently.
I need to find a way to start executing a segment of code (i.e. a thread) that will be performing some waited operation, and to kill execution of that code when needed (not wait for the operation to complete).
Suggestions?
I am currently working on a server application in C++. My main inspirations are these examples:
Windows SDK IOCP Example
The I/O Completion Port IPv4/IPv6 Server Program Example
My app is strongly similar to these (socketobj, packageobj, ...).
In general, my app is running without issues. The only thing that still causes me trouble is half-open connections.
My strategy for this is: I check every connected client at a fixed interval and increment an "idle counter". If a completion occurs, I reset this counter. If the idle counter goes too high, I set a boolean to prevent other threads from posting operations, and then call closesocket().
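The idle-counter bookkeeping described above could be sketched roughly as follows. This is only an illustration in portable C++11: the class name IdleTracker, its methods, and the limit parameter are hypothetical and not taken from the SDK samples, and the actual closesocket() call is left to the caller.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-connection idle tracker; all names are illustrative.
class IdleTracker {
public:
    explicit IdleTracker(int limit) : limit_(limit) { assert(limit > 0); }

    // Called from a completion handler: the connection showed activity.
    void on_completion() { idle_ticks_.store(0); }

    // Called by the periodic sweep. Returns true exactly once, when the
    // limit is first reached, so only one thread initiates the close.
    bool tick() {
        if (closing_.load()) return false;
        if (idle_ticks_.fetch_add(1) + 1 < limit_) return false;
        return !closing_.exchange(true);
    }

    // Worker threads check this before posting new operations.
    bool may_post() const { return !closing_.load(); }

private:
    const int limit_;
    std::atomic<int> idle_ticks_{0};
    std::atomic<bool> closing_{false};
};
```

The sweep thread would call tick() for each client and close the socket only when it returns true; the closing flag plays the role of the boolean mentioned above.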
My assumption was that now the socket is closed, the pending operations will complete (maybe not instantly but after a time). This is also the behavior the MSDN documentation is describing (hints, second paragraph). I need this because only after all operations are completed can I free the resources.
Long story short: this is not the case for me. I did some tests with my testclient app and some cout and breakpoint debugging, and discovered that pending operations for closed sockets are not completing (even after waiting 10 min). I also already tried with a shutdown() call before the closesocket(), and both returned no error.
What am I doing wrong? Does this happen to anyone else? Is the MSDN documentation wrong? What are the alternatives?
I am currently thinking of using the "linger" functionality, or cancelling every operation explicitly with the CancelIoEx() function.
Edit: (thank you for your responses)
Yesterday evening I added a linked list to every socketobj to hold the per-I/O objects of its pending operations. With this I tried the CancelIoEx() function. The function returned 0 and GetLastError() returned ERROR_NOT_FOUND for most of the operations.
Is it then safe to just free the per-I/O object in this case?
I also discovered that this happens more often when I run my server app and the client app on the same machine. From time to time, the server is then not able to complete write operations. I thought that this was happening because the client-side receive buffer gets too full. (The client side does not stop receiving data!)
A code snippet will follow as soon as possible.
The 'linger' setting can be used to reset the connection, but that way you will (a) lose data and (b) deliver a reset to the peer, which may terrify it.
If you're thinking of a positive linger timeout, it doesn't really help.
Shutdown for read should terminate read operations, but shutdown for write only gets queued after pending writes so it doesn't help at all.
If pending writes are the problem, and not completing, they will have to be cancelled.
I have an application that uses pthreads and was written before C++11. We have several worker threads assigned to several purposes, and tasks get distributed in a producer-consumer fashion through a shared circular pool of task data. POSIX semaphores are used for inter-thread synchronization in wait/notify mode, and mutex locks protect the shared data to ensure mutual exclusion.
Recently I noticed a strange problem with large volumes of data: the program seems to hang, with signal 1 received. Signal 1 is SIGHUP ("hang-up"); this signal is usually used to report that the user's terminal has been disconnected, perhaps because a network or telephone connection was broken.
Can this be caused by the parent terminal timing out? If so, can nohup help?
This occurs only with large volumes of data (I didn't notice it with smaller volumes), and the application is being run from the command line in a Solaris terminal (telnet session).
Thoughts welcome.
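For context on the nohup question: nohup works by starting the program with SIGHUP set to be ignored, and a process can set up the same disposition itself. Below is a minimal sketch, assuming POSIX sigaction() is available; the function and variable names are made up for illustration. Recording the signal first can help confirm that a terminal disconnect really is what is being delivered.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Incremented by the handler. volatile sig_atomic_t is the type that is
// safe to write from a signal handler.
volatile std::sig_atomic_t g_hangups = 0;

extern "C" void on_hangup(int) { g_hangups = g_hangups + 1; }

// Record SIGHUP deliveries instead of letting the default action
// (process termination) run. Returns true on success.
bool record_sighup() {
    struct sigaction sa{};
    sa.sa_handler = on_hangup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Ignore SIGHUP entirely, which is the disposition nohup arranges.
bool ignore_sighup() {
    struct sigaction sa{};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}
```

Either function would be called once, early in main(), before the worker threads are created.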
In every tutorial and example I have seen on the internet for Linux/Unix socket programming, the server-side code always involves an infinite loop that checks for a client connection on every iteration.
Example:
http://www.thegeekstuff.com/2011/12/c-socket-programming/
http://tldp.org/LDP/LG/issue74/tougher.html#3.2
Is there a more efficient way to structure the server-side code so that it does not involve an infinite loop, or to code the infinite loop in a way that takes up fewer system resources?
The infinite loop in those examples is already efficient. The call to accept() is a blocking call: the function does not return until a client connects to the server. Execution of the thread that called accept() is halted, and it does not consume any processing power.
Think of accept() as a call to join(), or as a wait on a mutex/lock/semaphore.
Of course, there are many other ways to handle incoming connections, but those other ways deal with the blocking nature of accept(). This function is difficult to cancel, so there exist non-blocking alternatives that allow the server to perform other actions while waiting for an incoming connection. One such alternative is select(). Other alternatives are less portable, as they involve low-level operating system calls that signal the connection through a callback function, an event, or some other asynchronous mechanism handled by the operating system.
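The select() alternative could look roughly like this. It is a minimal sketch for POSIX sockets only; the helper names make_listener and accept_with_timeout are invented for the example, and error handling is kept to the bare minimum.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Create a TCP socket listening on the loopback interface.
// Port 0 asks the kernel to pick a free port. Returns -1 on failure.
int make_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Wait up to timeout_ms for a pending connection.
// Returns the accepted client descriptor, or -1 on timeout or error.
int accept_with_timeout(int listen_fd, int timeout_ms) {
    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE);
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(listen_fd, &readable);
    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int ready = select(listen_fd + 1, &readable, nullptr, nullptr, &tv);
    if (ready <= 0) return -1;  // 0 = timed out, negative = error
    return accept(listen_fd, nullptr, nullptr);
}
```

A server loop would call accept_with_timeout() repeatedly and use the timeouts to check a shutdown flag or do other housekeeping. In a production server the listening socket would normally also be made non-blocking, since a connection can disappear between select() and accept().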
For C++ you could look into boost.asio. You could also look into e.g. asynchronous I/O functions. There is also SIGIO.
Of course, even when using these asynchronous methods, your main program still needs to sit in a loop, or the program will exit.
The infinite loop is there to maintain the server's running state, so that when a client connection is accepted, the server won't quit immediately afterwards; instead it goes back to waiting for another client connection.
The accept() call is a blocking one: that is to say, it waits until a connection arrives. It does this in an extremely efficient way, using essentially no CPU time (until a connection is made, of course), by making use of the operating system's network drivers, which trigger an event (or hardware interrupt) that wakes the waiting thread up.
Here's a good overview of what techniques are available - The C10K problem.
When you are implementing a server that listens for a possibly unbounded number of connections, there is imo no way around some sort of infinite loop. Usually this is not a problem at all, because when your socket is not marked as non-blocking, the call to accept() will block until a new connection arrives. Because of this blocking, no system resources are wasted.
Other libraries that provide something like an event-based system are ultimately implemented in the way described above.
In addition to what has already been posted, it's fairly easy to see what is going on with a debugger. You will be able to single-step through until you execute the accept() line, upon which the 'single-step' highlight will disappear and the app will run on; the next line is not reached. If you put a breakpoint on the next line, it will not fire until a client connects.
We need to follow best practices when writing client-server programs. The best guide I can recommend at this time is The C10K Problem. There are specific techniques to follow in this case: we can use select, poll, or epoll. Each has its own advantages and disadvantages.
If you are running your code on a recent kernel version, then I would recommend going for epoll. Click to see sample program to understand epoll.
If you are using select, poll, or epoll, then the call blocks until you get an event/trigger, so your server will not spin in the infinite loop consuming CPU time.
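To illustrate the blocking behaviour just described, here is a small Linux-only epoll sketch. The helper names watch_readable and wait_ready are invented for the example; a real server would register its listening socket and client sockets rather than arbitrary descriptors.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Register fd for readability notifications on an epoll instance.
bool watch_readable(int epfd, int fd) {
    assert(epfd >= 0 && fd >= 0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Block (up to timeout_ms; -1 means wait forever) until at least one
// watched descriptor is readable. Returns the ready descriptors; the
// result is empty on timeout or error.
std::vector<int> wait_ready(int epfd, int timeout_ms) {
    epoll_event events[64];
    int n = epoll_wait(epfd, events, 64, timeout_ms);
    std::vector<int> ready;
    for (int i = 0; i < n; ++i) ready.push_back(events[i].data.fd);
    return ready;
}
```

The server's main loop would create the instance with epoll_create1(0), register the listening socket, and then call wait_ready() in a loop, accepting or reading only on the descriptors returned. While nothing is ready, the thread sleeps inside epoll_wait().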
In my personal experience, epoll is the best way to go: the load on my server machine with 80k ACTIVE connections was much lower than with select and poll. The load average of my server machine was just 3.2 with 80k active connections :)
When testing with poll, my server's load average went up to 7.8 on reaching 30k active client connections :(.
Is there some portable way to check the number of parallel instances of my app?
I have a C++ app (Win32) where I need to know how many times it has been started. The problem is that several users can start it in parallel (terminal server), so I cannot search the "running process" list because I'm not able to access the lists of other users.
I tried it with a semaphore (boost & Win32 CreateSemaphore).
It worked, but now I have the problem that if the app crashes (an assertion, or the process is just killed), the counter is not changed (rebooting helps).
Manually removing/resetting the semaphore counter in my code is also not possible, because I don't know whether somebody else is running my application.
Edited to add:
Suppose you have a license that lets you run 20 full-functionality copies of your program. Then you could have 20 mutexes, named MyProgMutex1 through MyProgMutex20. At startup, your program can loop through the mutexes. If it finds a spare mutex that it can take, it stops looping and enters full-functionality mode. If it loops through all the mutexes without being able to take any of them, then it enters reduced-functionality mode.
Original answer:
I assume you want to make sure that only one copy of your process runs at once. (Or, for Terminal Server, one copy of your process per login session).
Your named semaphore solution is close. The right way to do this is a named mutex. Use CreateMutex to make the mutex, then call WaitForSingleObject with a timeout of zero. If WaitForSingleObject returns WAIT_TIMEOUT, another copy of the process is running. If it returns WAIT_OBJECT_0 or WAIT_ABANDONED, then you are the only copy of the process. You need to keep the mutex handle open while your program runs: either call CloseHandle when your process is about to exit, or just deliberately leak the handle and rely on Windows' built-in cleanup to close the handle for you when your process exits. Windows will automatically release the mutex when your process exits, even after a crash; the next process to wait on it then sees WAIT_ABANDONED.
The only thing I can think of that mitigates the problem of crashed processes is a kind of “dead man’s switch”: each process needs to update its status in regular intervals. If a process fails to do this, it’s automatically discarded from the list of active processes.
This technique requires that one of the processes act as a server that keeps tabs on whether the other processes have updated recently. If the server dies, then another process can take over. This, in turn, requires that each process test whether there is still a server alive.
Alternatively, each process can be its own server and keep track locally. This may be easier to implement than server-switching.
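The bookkeeping side of such a dead man's switch could be sketched like this in portable C++11. The class HeartbeatRegistry and its methods are hypothetical, and the sketch deliberately leaves open how heartbeats travel between processes (shared memory, a socket, a file); it only shows how stale entries get discarded and live instances counted.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical registry kept by whichever process acts as the server.
class HeartbeatRegistry {
public:
    explicit HeartbeatRegistry(std::chrono::milliseconds timeout)
        : timeout_(timeout) { assert(timeout.count() > 0); }

    // Record that instance `id` reported in at time `now`.
    void heartbeat(const std::string& id, Clock::time_point now) {
        last_seen_[id] = now;
    }

    // Drop instances that have been silent for longer than the timeout,
    // then return how many instances are still considered alive.
    std::size_t live_count(Clock::time_point now) {
        for (auto it = last_seen_.begin(); it != last_seen_.end();) {
            if (now - it->second > timeout_) it = last_seen_.erase(it);
            else ++it;
        }
        return last_seen_.size();
    }

private:
    std::chrono::milliseconds timeout_;
    std::map<std::string, Clock::time_point> last_seen_;
};
```

A crashed instance simply stops sending heartbeats and drops out of the count after one timeout, which is the property the semaphore approach lacks.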
You can broadcast a message, and the other instances of your application should send a response. Count the responses and you have the number of instances.
I'm designing a networking framework which uses WSAEventSelect for asynchronous operations. I spawn one thread for every 64 sockets because of the limit of 64 events per thread, and everything works as expected except for one thing:
Threads keep getting spawned uncontrollably by Winsock during connect and disconnect, threads that won't go away.
With the current design of the framework, two threads should be running when only a few sockets are active. And as expected, two threads are running in total. However, when I connect with a few sockets (1-5 sockets), an additional 3 threads are spawned, which persist until I close the application. Also, when I lose the connection on any of the sockets, 2 more threads are spawned (also persisting until closure). That's 7 threads in total, 5 of which I have no idea what they are there for.
If they were required by Winsock for connecting or whatever and then disappeared, that would be fine. But it bothers me that they persist until I close my application.
Is there anyone who could shed some light on this? Possibly a solution to avoid these threads or force them to close when no connections are active?
(Application is written in C++ with Win32 and Winsock 2.2)
Information from Process Explorer:
Expected threads:
MyApp.exe!WinMainCRTStartup
MyApp.exe!Netfw::NetworkThread::ThreadProc
Unexpected threads:
ntdll.dll!RtlpUnWaitCriticalSection+0x2dc
mswsock.dll+0x7426
ntdll.dll!RtlGetCurrentPeb+0x155
ntdll.dll!RtlGetCurrentPeb+0x155
ntdll.dll!RtlGetCurrentPeb+0x155
All of the unexpected threads have call stacks with calls to functions such as ntkrnlpa.exe!IoSetCompletionRoutineEx+0x46e which probably means it is a part of the notification mechanism.
Download the sysinternals tool process explorer. Install the appropriate debugging tools for windows. In process explorer, set Options -> Symbols path to:
SRV*C:\Websymbols*http://msdl.microsoft.com/download/symbols
Where C:\Websymbols is just a place to store the symbol cache (I'd create a new empty directory for it.)
Now, you can inspect your program with process explorer. Double click the process, go to the threads tab, and it will show you where the threads started, how busy they are, and what their current callstack is.
That usually gives you a very good idea of what the threads are. If they're Winsock internal threads, I wouldn't worry about them, even if there are hundreds.
One direction to look in (just a guess): If these are TCP connections, these may be background threads to handle internal TCP-related timers. I don't know why they would use one thread per connection, but something has to do the background work there.