Break from hanging poll with gdb

A process (I have its source) is stuck in an invalid state because its poll call is waiting for an invalid fd to become writable.
As this will never happen, the poll blocks forever.
Is it somehow possible to force this poll operation to quit?
Gdb is attached, and I would like to see how the app continues after the poll returns.
#0 0xb673e120 in poll () at ../sysdeps/unix/syscall-template.S:84
Is this something I can achieve without restarting the app?

Is it somehow possible to force this poll operation to quit?
Yes, you can force the current stack frame to return prematurely with the return command. You can also choose any appropriate value as the function’s return value. See the documentation here: https://sourceware.org/gdb/current/onlinedocs/gdb/Returning.html#Returning.

Related

How to wakeup Select call without timeout period from another thread

I am looking for a way to wake up a select call in C++. Because of application requirements I can't set a timeout, since multiple threads use the select system call.
Please see the scenario below.
I want to wake up a select system call that is waiting on another thread. I tried writing data to that thread from the main thread, but it still does not wake up.
I want to close the thread and its socket if there is no data for this thread.
select does wake up if the socket connection is closed by another process, but not when I try this from a thread.
Does anyone have an idea about this?
On a recent Linux you can use eventfd; more generally, a pipe works everywhere. Usage: register the read end of the pipe in the selector for readability along with the actual socket(s); to wake the selector up, just write one byte to the other end of the pipe. Alternatively (if your libc has it) you can use pselect with a sigmask to catch the ALRM signal and raise that signal whenever you need to wake the selector up. Be very careful with the signal approach in a multithreaded application (I would not use it): if not done right, the signal may be delivered to a random thread.
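The pipe approach described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the helper names wait_socket_or_wakeup and wake_waiter are hypothetical, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until `sock` is readable or a wake-up byte
   arrives on `wake_rd` (the read end of a pipe).
   Returns 1 for socket activity, 0 for a wake-up, -1 on error. */
int wait_socket_or_wakeup(int sock, int wake_rd) {
    fd_set rfds;
    int maxfd = sock > wake_rd ? sock : wake_rd;

    assert(maxfd < FD_SETSIZE);      /* select() cannot handle larger fds */
    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);
    FD_SET(wake_rd, &rfds);

    /* NULL timeout: sleep until one of the descriptors becomes readable. */
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    if (FD_ISSET(wake_rd, &rfds)) {
        char byte;
        if (read(wake_rd, &byte, 1) < 0)   /* consume the wake-up byte */
            return -1;
        return 0;
    }
    return 1;
}

/* Hypothetical helper: called from any other thread to wake the waiter.
   Returns 0 on success, -1 on failure. */
int wake_waiter(int wake_wr) {
    return write(wake_wr, "x", 1) == 1 ? 0 : -1;
}
```

The worker thread would call wait_socket_or_wakeup() in its loop, and the main thread would call wake_waiter() on the write end of the same pipe when it wants the worker to stop waiting.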
Thanks all for the valuable suggestions. I was able to resolve the issue by calling shutdown() on the socket FD, following the answer referenced at this link; this wakes up the select that is waiting for activity. The socket should be closed only after the select call returns, otherwise select will not be woken up.
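A rough sketch of the shutdown() approach, assuming a connected stream socket on Linux. The helper names request_stop and wait_for_eof_or_data are made up for illustration. After shutdown(), select() reports the descriptor readable and a read sees end-of-file, while the descriptor itself stays open until close() is called.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper for the waiting thread: block in select() on `fd`.
   Returns 1 if the socket reports end-of-file (for example after shutdown),
   0 if real data is available, -1 on error. */
int wait_for_eof_or_data(int fd) {
    fd_set rfds;
    char c;
    ssize_t n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, NULL) <= 0)
        return -1;

    /* MSG_PEEK: look at the socket state without consuming any data. */
    n = recv(fd, &c, 1, MSG_PEEK);
    if (n < 0)
        return -1;
    return n == 0 ? 1 : 0;
}

/* Hypothetical helper for the controlling thread: shut the socket down but
   leave the descriptor open; close() it only after the waiter has returned. */
int request_stop(int fd) {
    return shutdown(fd, SHUT_RDWR);
}
```

Keeping close() until after select() has returned matches the ordering described in the comment above.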

signal handling when process is waiting for another process to terminate

I am just trying to understand the concept of signal handling from the perspective of kernel and user mode for a running process.
PROCESS-1 --------------------> PROCESS-3
(parent process) <-------------------
^ process-3 sending signals(SIGPIPE-for communication) or
|| SIGSTOP or SIGKILL to process-1
||
||
||process-1 waiting for child process-2
|| using waitpid command.
||
v
PROCESS-2(waiting for resource, page fault happened, etc)
(child process)
I want to know how the kernel delivers the signal from process-3 to process-1, given that process-1 is waiting for process-2 to finish. I would like to know more about the user/kernel communication during this signal handling scenario (PCB, resources, open file descriptors, etc.). Please explain in relation to this context.
Any help is appreciated!
The kernel doesn't really care that process-1 is "waiting for process-2 to finish" (in particular it's not interested in "why" it's in the state it is, merely that it is in some state: in this case, idling in the kernel waiting for some event). For typical[1] caught signals, the signal-sender essentially just sets some bit(s) in the signal-receiver's process/thread state, and then if appropriate, schedules that process/thread to run so that it can see those bits. If the receiver is idling in the kernel waiting for some event, that's one of the "schedule to run" cases. (Other common situations include: the receiver is in STOP state, where it stays stopped except for SIGCONT signals; or, the receiver is running in user mode, where it is set up to transition to kernel mode so as to notice the pending signals.)
Neither SIGKILL nor SIGSTOP can be caught or ignored, so, no, you cannot provide a handler for these. (Normally processes are put into stop state via SIGTSTP, SIGTTIN, or SIGTTOU, all of which can be caught or ignored.)
If system calls are set to restart after a user signal handler returns (via the SA_RESTART flag of sigaction()), this is achieved by setting up the "return address" for the sigreturn() operation to, in fact, make the system call over again. That is, if process-1 is in waitpid(), the sequence of operations (from process-1's point of view) from the point of the initial waitpid(), through receiving a caught signal s, and back to more waiting, is:
1. system call: waitpid()
2. put self to sleep waiting for an event
3. awakened: check for awakening event
4. event is signal and signal is caught, so:
5. set new signal mask per sigaction() settings (see sigaction())
6. push signal frame on a stack (see SA_ONSTACK and sigaltstack())
7. set up for user code (program counter) to enter at "signal trampoline"
8. return to user code (into trampoline)
(At this point process-1 is back in user mode. The remaining steps are not numbered because I can't make SO start at 9. :-) )
- call user handler routine (still on stack chosen above)
- when user routine returns, execute sigreturn() system call, using the frame stored at setup time, possibly modified by user routine
(At this point the process enters kernel mode, to execute sigreturn() system call)
- system call: sigreturn(): set signal mask specified by sigreturn() argument
- set other registers, including stack pointer(s) and program counter, as specified by sigreturn() arguments
- return to user code
(the program is now back in user mode, with registers set up to enter waitpid)
- system call: waitpid()
At this point the process returns to the same state it had before it received the caught signal: waitpid puts it to sleep waiting for an event (step 2). Once awakened (step 3), either the event it was waiting for has occurred (e.g., the process being waitpid()-ed is done) and it can return normally, or another caught signal has occurred and it should repeat this sequence, or it is being killed and it should clean up, or whatever.
This sequence is why some system calls (such as some read()-like system calls) will "return early" if interrupted by a signal: they've done something irreversible between the "first" entry into the kernel and the time the signal handler is to be run. In this case, the signal frame pushed at step 6 must not have a program-counter value that causes the entire system call to restart. If it did, the irreversible work done before the process went to sleep would be lost. So, it is instead set up to return to the instruction that detects a successful system call, with the register values set up to return the short read() count, or whatever.
When system calls are set up not to restart (SA_RESTART is not set), the signal frame pushed in step 6 is also different. Instead of returning to the instruction that executes the system call, it returns to the instruction that detects a failed system call, with the register values set up to indicate an EINTR error.
(Often, but not always, these are the same instruction, e.g., a conditional branch to test for success/fail. In my original SPARC port, I made them different instructions in most cases. Since leaf routines return to %o7+8 with no register or stack manipulation, I just set a bit indicating that a successful return should return to the leaf routine's return address. So most system calls were just "put syscall number and ret-on-success flag into %g1, then trap to kernel, then jump-to-error-handling because the system call must have failed if we got here.")
[1] Versus queued signals.

Executing new task based on sigchld() from previous task

I'm currently in the process of building a small shell within C++.
A user may enter a job at the prompt such as exe1 && exe2 &. Similar to the BASH shell, I will only execute exe2 if exe1 exits successfully. In addition, the entire job must be performed in the background (as specified by the trailing & operator).
Right now, I have a jobManager which handles execution of jobs and a job structure which contains the job's executable and their individual arguments / conditions. A job is started by calling fork() and then calling execvp() with the proper arguments. When a job ends, I have a signal handler for SIGCHLD, in which I perform wait() to determine which process has just ended. When exe1 ends, I observe its exit code and make a determination as to whether I should proceed to launch exe2.
My concern is how do I launch exe2. I am concerned that if I use my jobManager start function from the context of my SIGCHLD handler, I could end up with too many SIGCHLD handler functions hanging out on the stack (if there were 10 conditional executions, for instance). In addition, it just doesn't seem like a good idea to be starting the next execution from the signal handler, even if it is occurring indirectly. (I tried doing something similar 1.5 years ago when I was just learning about signal handling -- I seem to recall it failing on me).
All of the above needs to be able to occur in the background and I want to avoid having the jobManager sitting in a busy wait just waiting for exe1 to return. I would also prefer to not have a separate thread sitting around just waiting to start the execution of another process. However, instructing my jobManager to begin execution of the next process from the SIGCHLD handler seems like poor code.
Any feedback is appreciated.
I see two ways:
1) Replace your signal handler with a loop that calls "sigwait" (see man 3 sigwait)
then in loop
2) Before starting, create a pipe, and in the main loop of your program use "select" on the pipe handle to wait for
events. In the signal handler write to the pipe, and handle the situation in the main loop.
Hmmm that's a good one.
What about forking twice, once per process? The first one runs, and the second one stops. In the parent SIGCHLD handler, send a SIGCONT to the second child, if appropriate, which then goes off and runs the job. Naturally, you SIGKILL the second one if it shouldn't run, which should be safe because you won't really have set anything up.
How does that sound? You'll have a process sitting around doing nothing, but it shouldn't be for very long.

Is it possible to detect 'end process' externally?

Is there some way to detect that a program was ended by windows task manager's "end process"?
I know that it's kind of impossible to do that from within the application being ended (other than to build your app as a driver and hook ZwTerminateProcess), but I wonder if there is a way to notice it from outside.
I don't want to stop the program from terminating, just to know that it was ended by "end process" (and not by any other way).
There might be a better way - but how about using a simple flag?
Naturally, you'd have to persist this flag somewhere outside of the process/program's memory, such as the registry, a database, or the file system. Essentially, when the app starts up, you set the flag to 'True'; when the app shuts down through the normal means, you set the flag to 'False'.
Each time the application starts you can check the flag to see if it was not shut down correctly the previous time it was executed.
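A minimal sketch of the flag idea, assuming the flag is stored as a marker file (the registry or a database would work the same way). The function names begin_run and end_run_cleanly and the marker path are hypothetical. Note that this only detects that the previous run did not shut down normally; it cannot tell "end process" apart from a crash or a power loss.

```c
#include <stdio.h>

/* Hypothetical: call at startup. Returns 1 if the marker from a previous
   run is still present (that run did not shut down cleanly), 0 if it is
   absent, -1 on error. Leaves a fresh marker in place for the current run. */
int begin_run(const char *marker_path) {
    int unclean = 0;
    FILE *f = fopen(marker_path, "r");

    if (f != NULL) {                /* marker survived: no clean shutdown */
        unclean = 1;
        fclose(f);
    }
    f = fopen(marker_path, "w");    /* set the flag for this run */
    if (f == NULL)
        return -1;
    fputs("running\n", f);
    fclose(f);
    return unclean;
}

/* Hypothetical: call on every normal shutdown path to clear the flag.
   Returns 0 on success. */
int end_run_cleanly(const char *marker_path) {
    return remove(marker_path);
}
```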
Open up a handle to the process with OpenProcess, and then wait on that handle using one of the wait functions such as WaitForSingleObject. You can get the exit status of the process using GetExitCodeProcess. If you need your program to remain responsive to user input while waiting, then make sure to wait on a separate thread (or you can periodically poll using a timeout of zero, but remember the performance consequences of polling -- not recommended).
When you're done, don't forget to call CloseHandle. The process object won't be fully deleted from the OS until all of its handles are closed, so you'll leak resources if you forget to call CloseHandle.
Note that there's no way to distinguish between a process exiting normally or being terminated forcefully. Even if you have a convention that your program only ever exits with a status of 0 (success) or 1 (failure) normally, some other process could call TerminateProcess(YourProcess, 1), and that would be indistinguishable from your ordinary failure mode.
According to the documentation, ExitProcess calls the entry point of all loaded DLLs with DLL_PROCESS_DETACH, whereas TerminateProcess does not. (Exiting the main function results in a call to ExitProcess, as do most unhandled exceptions.)
You might also want to look into Application Recovery and Restart.
One option might be to create a "watchdog" application (installed as a service, perhaps) that monitors WMI events for stopping a process via the ManagementEventWatcher class (in the System.Management namespace).
You could query for the death of your process on an interval or come up with some event driven way to alert of your process's demise.
Here's sort of an example (it's in C# though) that could get you started.

SIGALRM Timeout -- How does it affect existing operations?

I am currently using select() to act as a timer. I have network receive operations which occur within a loop, and every few seconds a processing operation needs to occur.
As a result, the select() statement's timeout constantly changes -- decreasing to zero over time and then restarting at 3.
rv = select(selectmonitor+1, &readnet, NULL, NULL, &helper.timeout());
As things come in on the network, the statement is repeated and the value passed to it by helper.timeout() decreases. Eventually, the value will either be equal to zero or the system will timeout, which will result in the processing function executing. However, I've noticed that this is quite resource intensive -- the value for helper.timeout() must be constantly calculated. When I am trying to receive a few thousand packets a second, the time it takes for this operation to be done results in packet loss.
My new idea is using SIGALRM to resolve this. This would allow me to set a timer once and then react when it is set off. However, I'm confused as to how it will affect my program. When SIGALRM 'goes off', it will run a function which I specify. However, how will it interrupt my existing code? And once the function is done, how will my existing code (within the while statement) resume?
Also, it would appear it's impossible to set the SIGALRM signal to call a function within a class? Is this correct? I imagine I can call a plain function, which can in turn call a function within a class.
Thanks for any assistance ahead of time.
Use your SIGALRM handler to set a flag variable.
Use sigaction instead of signal to set your signal handler. Do not set the SA_RESTART flag. With SA_RESTART not set your select statement will be interrupted by the signal. Your select will return -1 and errno will be EINTR.
Since the signal might happen while your other code is executing you will want to check the flag variable too, probably right before going into the select.
I was just reminded that this pattern can result in missing the signal, if it happens just after checking the flag and just before entering the select.
To avoid that, you need to use sigprocmask to block the SIGALRM signal before entering the while loop. Then you use pselect instead of select. By giving pselect a signal mask with SIGALRM unmasked, the signal will end up always interrupting during the select instead of happening at any other time.