I have an application that uses pthreads and was written before C++11. Several worker threads are assigned to different purposes, and tasks are distributed producer-consumer style through a shared circular pool of task data. POSIX semaphores handle the inter-thread synchronization, both in wait/notify mode and as mutex locks protecting shared data to ensure mutual exclusion.
Recently I have noticed a strange problem with large volumes of data: the program seems to hang after receiving signal 1. Signal 1 is SIGHUP ("hang-up"), which is normally used to report that the user's terminal has been disconnected, perhaps because a network or telephone connection was broken.
Could this be caused by the parent terminal timing out? If so, would nohup help?
This occurs only with large volumes of data (I haven't noticed it with smaller volumes), and the application is run from the command line in a Solaris terminal (telnet session).
Thoughts, welcome.
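If a terminal hang-up is the suspect, one way to rule it out (besides wrapping the run in nohup) is to have the program ignore SIGHUP itself. A minimal pre-C++11 sketch, with a helper name of my own choosing:

```cpp
#include <signal.h>
#include <string.h>

// Ignore terminal hang-ups from inside the program. This has the same
// effect as running under nohup: a lost telnet session no longer
// delivers a fatal SIGHUP to the process.
void ignore_sighup() {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;       // discard SIGHUP instead of terminating
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, NULL);
}
```

If the hangs stop once SIGHUP is ignored, the terminal disconnect was the trigger; if they persist, the signal was only a symptom.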
The following is a programming problem I am trying to solve...
I am creating a (C++11) server application that manages operations being performed on other servers. Thus far, I have designed it to do the following:
start a thread for each remote server (with operations on it that are being managed), that polls the server to make sure it's still there/working.
start a thread for each operation that it is managing, that polls the remote server to get info about the current state of the operation.
Where I am running into issues is this: the calls to the remote server to verify that it is there/working, and the calls to get the status of the operations being managed on it, are "waited" operations. If something stops working (either a remote server stops responding, or I am not able to get the state of an operation running on it), I was attempting to just "kill" the thread that monitors that server or operation, and flag it (the server or operation) as dead/non-responsive. Unfortunately, there does not seem to be any clean way to kill a thread.
Note that I was trying to "kill" threads because many of the calls I am performing are "waited" operations, and I cannot have the application hang because it is waiting for completion of an operation (one that in some cases may never complete). I have done some research on ways to "kill" an active thread (stop it from a manager thread). One of the only C++ thread management libraries I could find that even supports a C++ method to "kill" a thread is the "ZThread" library, but using its kill method doesn't seem to work consistently.
I need to find a way to start executing a segment of code (i.e. a thread) that will be performing some waited operation, and kill execution of that code when needed (not wait for the operation to complete).
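A common alternative to killing the thread is cooperative cancellation: bound every wait with a timeout and have the thread check a stop flag each time it wakes, so an unresponsive server can never block shutdown indefinitely. A minimal C++11 sketch (the Monitor name and the 50 ms period are illustrative, not from the question):

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch of cooperative cancellation: the monitoring thread never blocks
// forever. It waits with a timeout, so it regularly wakes up and notices
// the stop flag even if the remote server never answers.
struct Monitor {
    std::atomic<bool> stop{false};
    std::mutex m;
    std::condition_variable cv;
    std::thread t;

    void start() {
        t = std::thread([this] {
            std::unique_lock<std::mutex> lk(m);
            while (!stop) {
                // A real monitor would poll the remote server here, then
                // sleep; wait_for bounds the sleep so cancellation is
                // noticed within one period.
                cv.wait_for(lk, std::chrono::milliseconds(50));
            }
        });
    }

    void shutdown() {        // called by the manager thread
        stop = true;
        cv.notify_all();     // wake the thread immediately
        t.join();            // joins promptly; no kill needed
    }
};
```

The server/operation is then flagged as dead by the monitor itself when a bounded poll times out, rather than by killing the monitor from outside.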
Suggestions???
I have a problem with my multithreaded networking server program.
I have a main thread that listens for new client connections. I use Linux epoll to get I/O event notifications. For each incoming event, I create a thread that accept()s the new connection and is assigned its fd. Under heavy load, the same fd can end up assigned twice, causing my program to crash.
My question is: how can the system re-assign a fd that is still used by another thread?
Thanks,
Presumably there is a race condition here - but without seeing your code it's hard to diagnose.
You would be better to accept on the Main thread and then pass the accepted socket to the new thread.
If you pass your listening socket to a new thread to then perform the accept - you're going to hit a race condition.
For further information you can look here: https://stackoverflow.com/a/4687952/516138
And this is a good background on networking efficiency (although perhaps a bit out of date).
You should call accept() on the same thread that you are calling epoll() on. Otherwise you are inviting race conditions.
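The suggested structure can be sketched as follows. The loopback client below exists purely to drive the demo; the point is that accept() runs on the listening/epoll thread, and only the already-accepted connection fd is handed to a worker:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <thread>

// Accept on the listening thread (the one that would call epoll_wait()),
// then dispatch only the connected fd to a worker thread. Returns true
// if the round-trip echo through the worker succeeds.
bool accept_then_dispatch() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                           // let the kernel pick a port
    if (bind(listener, (sockaddr*)&addr, sizeof addr) != 0) return false;
    listen(listener, 16);
    socklen_t len = sizeof addr;
    getsockname(listener, (sockaddr*)&addr, &len);

    int client = socket(AF_INET, SOCK_STREAM, 0);
    connect(client, (sockaddr*)&addr, sizeof addr);

    // accept() happens here, on the listening thread -- the listener fd
    // itself is never shared with workers, so no race on accept().
    int conn = accept(listener, nullptr, nullptr);
    std::thread worker([conn] {
        char buf[4];
        read(conn, buf, sizeof buf);             // worker owns conn from here
        write(conn, buf, sizeof buf);            // echo it back
        close(conn);                             // only the worker closes it
    });
    write(client, "ping", 4);
    char reply[4] = {0};
    read(client, reply, sizeof reply);
    worker.join();
    close(client);
    close(listener);
    return std::memcmp(reply, "ping", 4) == 0;
}
```

Because each connected fd has exactly one owner (the worker), no thread can close an fd that another thread still believes it holds, which removes the double-assignment crash described in the question.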
File descriptors are managed on a per-process basis: they are unique within each process, and all threads of a process share the same file descriptor table.
Having an accept syscall return the same file descriptor twice inside the same process is a very strong indication that one of your threads is closing the previous "version" of the repeated file descriptor.
Issues like this one can be difficult to debug in complex software. One way to identify the culprit on Linux is the strace command. Run strace -f -e trace=close,accept4,accept,pipe,open <your program>; it will print each of the listed syscalls along with the thread that is making the call.
I'm using C++ and I need the equivalent of SIGCHLD for a process I'm aware of (i.e. I know it's pid), but did not spawn.
Is there a well established design pattern to listen/watch/monitor another process's lifespan when it is not your child or in your group or session?
EDIT: I am specifically trying to be aware of abnormal terminations (i.e. seg faults, signals, etc...). I would like to eavesdrop on signals the process in question receives.
I don't know if it follows a specific pattern, per se, but one technique is to have the process establish a connection to the watcher. The watcher monitors the connection, and when it becomes closed, it knows the process has shutdown.
If the watcher wants to know if the watched process is responsive or not, you can use the connection to monitor heartbeat messages that the process is obliged to provide.
If the watcher wants to know whether the watched process is making progress, the heartbeat message could provide state information that would allow the watcher to monitor that.
Different operating systems may provide different ways to achieve the same objective. For example, on Linux, the watcher could use inotify to monitor the /proc entry for that process to determine if the process is up or down. The BSD kqueue has a similar capability. The process could export its heartbeat/state into shared memory, and the watcher could use a timed wait on a semaphore to see if the data is being updated.
If the process is a third-party program, and source is not available, then you would have to resort to some method similar to inotify/kqueue, or as a last resort, poll the kernel state (similar to the way the top utility works).
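As a last-resort polling check (when neither a connection nor inotify/kqueue is an option), kill() with signal 0 performs the existence and permission checks without delivering anything. Note this only answers "up or down", not how the process died; the helper name is my own:

```cpp
#include <errno.h>
#include <signal.h>
#include <unistd.h>

// Poll-style liveness check for an arbitrary pid we did not spawn.
// Signal 0 delivers nothing; it only validates the target.
bool process_alive(pid_t pid) {
    if (kill(pid, 0) == 0) return true;
    // ESRCH: no such process. Any other error (e.g. EPERM, meaning the
    // process exists but we may not signal it) still counts as alive.
    return errno != ESRCH;
}
```

A watcher could call this on a timer; to learn *why* the process died (segfault vs. clean exit), it would still need one of the connection- or /proc-based techniques above.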
Given: multithreaded (~20 threads) C++ application under RHEL 5.3.
When testing under load, top shows that CPU usage jumps around in the 10-40% range every second.
The design is mostly pretty simple: most of the threads implement the active-object design pattern. Each thread has a thread-safe queue; other threads push requests onto it, while the owning thread polls the queue and processes incoming requests. A processed request usually causes a new request to be pushed to the next processing thread.
The process has several TCP/UDP connections, over each of which data is received/sent under high load.
I know I have not provided sufficient data. This is a pretty big application, and I'm not familiar with all of its parts. It was recently ported from Windows to Linux on top of the ACE library (used for the networking part).
Supposing the problem is in the application and not external, what techniques/tools/approaches can be used to discover it? For example, I suspect it may be caused by mutex contention.
I faced a similar problem some time back, and here are the steps that helped me.
1) Start with strace to see where the application is spending its time executing system calls.
2) Use OProfile to profile both the application and the kernel.
3) If you are using an SMP system, look at the NUMA settings; in my case those caused havoc. /proc/appPID/numa_maps gives a quick view of how memory is being accessed, and NUMA misses can cause the jumps you describe.
4) You have mentioned TCP connections in your app. Check that the MTU size is set to the right value, and enable or disable Nagle's algorithm (Nagle's delay) as appropriate for the type of data being transferred.
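Toggling Nagle's algorithm is a per-socket setsockopt call; disabling it (TCP_NODELAY = 1) suits small latency-sensitive messages, while leaving it enabled suits bulk throughput. A minimal sketch, with a helper name of my own:

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Enable or disable Nagle's algorithm on one TCP socket.
// on == true disables Nagle (sets TCP_NODELAY), so small writes are
// sent immediately instead of being coalesced.
bool set_nodelay(int fd, bool on) {
    int flag = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
                      &flag, sizeof flag) == 0;
}
```

For a request/response workload with frequent small messages (like the active-object queues described in the question), disabling Nagle often smooths out latency spikes; measure both settings under your real load.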
I'm designing a networking framework which uses WSAEventSelect for asynchronous operations. I spawn one thread for every 64th socket due to the max 64 events per thread limitation, and everything works as expected except for one thing:
Threads keep getting spawned uncontrollably by Winsock during connect and disconnect, threads that won't go away.
With the current design of the framework, two threads should be running when only a few sockets are active, and as expected, two threads are running in total at first. However, when I connect with a few sockets (1-5), an additional 3 threads are spawned, and they persist until I close the application. Also, when I lose the connection on any of the sockets, 2 more threads are spawned (also persisting until closure). That's 7 threads in total, 5 of which I have no idea what they are there for.
If they are required by Winsock for connecting or whatever and then disappeared, that would be fine. But it bothers me that they persist until I close my application.
Is there anyone who could shed some light on this? Possibly a solution to avoid these threads or force them to close when no connections are active?
(Application is written in C++ with Win32 and Winsock 2.2)
Information from Process Explorer:
Expected threads:
MyApp.exe!WinMainCRTStartup
MyApp.exe!Netfw::NetworkThread::ThreadProc
Unexpected threads:
ntdll.dll!RtlpUnWaitCriticalSection+0x2dc
mswsock.dll+0x7426
ntdll.dll!RtlGetCurrentPeb+0x155
ntdll.dll!RtlGetCurrentPeb+0x155
ntdll.dll!RtlGetCurrentPeb+0x155
All of the unexpected threads have call stacks with calls to functions such as ntkrnlpa.exe!IoSetCompletionRoutineEx+0x46e which probably means it is a part of the notification mechanism.
Download the Sysinternals tool Process Explorer. Install the appropriate Debugging Tools for Windows. In Process Explorer, set Options -> Symbols path to:
SRV*C:\Websymbols*http://msdl.microsoft.com/download/symbols
Where C:\Websymbols is just a place to store the symbol cache (I'd create a new empty directory for it.)
Now, you can inspect your program with process explorer. Double click the process, go to the threads tab, and it will show you where the threads started, how busy they are, and what their current callstack is.
That usually gives you a very good idea of what the threads are. If they're Winsock internal threads, I wouldn't worry about them, even if there are hundreds.
One direction to look in (just a guess): If these are TCP connections, these may be background threads to handle internal TCP-related timers. I don't know why they would use one thread per connection, but something has to do the background work there.