I am trying to make a watchdog for a single-threaded program. The problem is that we run some foreign .so/DLL files (the source code is available), which means that we pass control to them.
The idea is to recompile them with a callback into some kind of cancellation routine.
Is it possible to make GCC insert calls to a callback function between C statements, or between assembly instructions, in this compiled foreign code?
What I'm about to suggest does not involve the compiler, but this sounds like a problem you can solve at runtime with POSIX signals or ptrace.
With a signal you can interrupt the current context, similar to what happens in kernel mode with an IRQ. You will have to worry about being async-signal-safe (for example, your handler can't call malloc, because it might have interrupted malloc itself while its data structures were in an inconsistent state).
With ptrace you can step through instructions in another process as if in a debugger.
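A minimal Linux-only sketch of ptrace single-stepping: the parent forks a child, the child asks to be traced with PTRACE_TRACEME, and the parent resumes it one instruction at a time, killing it once a step limit is reached, roughly what a ptrace-based watchdog would do. The function name and the trivial child body are hypothetical, and ptrace may be restricted by security settings (Yama, container seccomp profiles).

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child and single-steps it with ptrace. Returns the number of
   instructions stepped before the child exited, max_steps if the limit
   was hit (the child is then killed), or -1 on error. Linux-specific. */
long count_child_steps(long max_steps)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: become traceable, then stop so the parent takes control. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);  /* stand-in for the "foreign code" being stepped */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;  /* child could not be traced */

    long steps = 0;
    while (steps < max_steps) {
        /* Resume for exactly one instruction; the child stops with SIGTRAP. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0 ||
            waitpid(pid, &status, 0) < 0) {
            steps = -1;
            break;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return steps;  /* child finished on its own */
        steps++;           /* child stopped again after one instruction */
    }
    /* Limit reached (or error): this is where a watchdog would intervene. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return steps;
}
```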
Tread carefully, as these are difficult mechanisms to use correctly and it's very easy to shoot yourself in the foot.
I use a third-party library in my C++ program which, under certain circumstances, raises SIGABRT. I know that freeing an uninitialized pointer, or something similar, can cause this signal. Nevertheless, I want my program to keep running after the signal is raised, so that it can show a message and let the user change the settings to cope with the problem.
(I use Qt for development.)
How can I do that?
I use a third library in my c++ program which under certain circumstances emits SIGABRT signal
If you have the source code of that library, you need to correct the bug (and the bug could be in your code).
By the way, SIGABRT probably happens because abort(3) gets called indirectly (perhaps because you violated some convention or invariant of that library, which might use assert(3), which in turn calls abort). I guess that in Caffe the various CHECK* macros could indirectly call abort. I leave that for you to investigate.
If you don't have the source code, or don't have the capacity or time to fix that bug in the third-party library, you should give up on that library and use something else.
In many cases, you should trust external libraries more than your own code. Probably you are abusing or misusing that library. Read its documentation carefully and make sure that your code uses the library correctly and respects its invariants and conventions. Most likely the bug is in your own code, somewhere else.
I want to keep running my program
This is impossible (or so unreliable that it is unreasonable). I guess that your program has some undefined behavior (UB). Be very scared, and work hard to avoid UB.
You need to improve your debugging skills. Learn how to use the gdb debugger, valgrind, and the GCC sanitizers (instrumentation options such as -fsanitize=address, -fsanitize=undefined and others).
You should not try to handle SIGABRT, even though in principle you could (if you do, carefully read signal(7), signal-safety(7) and the hints about handling Unix signals in Qt). I strongly recommend not even trying to catch SIGABRT.
Unfortunately, you can't.
The SIGABRT signal is raised by abort() itself, and abort() never returns to its caller.
Ref:
https://stackoverflow.com/a/3413215/9332965
You can handle SIGABRT, but you probably shouldn't.
The "can" is straightforward: just trap it in the usual way, using signal(). You don't want to return from this signal handler: you probably got here from abort() (possibly originally from assert()), and that function will terminate the process after raising the signal if the handler returns. You could, however, longjmp() back to a state you set up earlier (strictly, siglongjmp() to a point saved with sigsetjmp(), so that the signal mask is restored).
The "shouldn't" is because once SIGABRT has been raised, your data structures (including those of Qt and any other libraries) are likely in an inconsistent state and actually using any of your program's state is likely to be unpredictable at best. Apart from exiting immediately, there's not much you can do other than exec() a replacement program to take over in a sane initial state.
If you just want to show a friendly message, then perhaps you could exec() a small program to do that (or just use xmessage), but make sure that program exits with a failure status; otherwise you lose the indication that SIGABRT happened.
Unfortunately, there isn't much you can do to prevent SIGABRT from terminating your program, not without modifying code that was, hopefully, written by you.
You would either need to change the code so that it doesn't call abort, or spawn a new process that runs the code instead of running it in the current process. I don't suggest using a child process to solve this problem: the abort is most likely caused by misuse of an API or of computer resources (such as running low on memory).
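For completeness, a sketch of the child-process option, assuming POSIX fork()/waitpid(): the risky call runs in a forked child and the parent inspects how it ended. The names are hypothetical, and two caveats apply: anything computed in the child has to be sent back explicitly (for example over a pipe), and in a multithreaded program (a Qt GUI usually is one) the child may only call async-signal-safe functions after fork(), so a real program would exec() a separate helper rather than run library code directly.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs fn() in a forked child. Returns 0 if it exited with status 0,
   the signal number if it was killed by a signal (e.g. SIGABRT),
   or -1 on any other failure. */
int run_isolated(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);  /* skip atexit handlers and stdio flushing in the child */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Hypothetical stand-ins for the third-party library. */
static void risky_library_call(void) { abort(); }
static void safe_library_call(void) { }
```

The parent stays intact after a SIGABRT in the child, so it could show a message and let the user change settings, at the cost of running the library out of process.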
I am implementing a library function that takes a while (up to a minute): it initializes a device. Generally, any long-running function should run in its own thread and report to the main thread when it completes, but I am not sure that applies here, since this function is in a library.
My dilemma is this: even if I implement it in a separate thread, another thread in the application has to wait on it. If so, why not let the application run this function in that thread anyway?
I could pass a queue or mailbox to the library function, but I would prefer a simpler mechanism, so that the library can be used from VB, VC, C# or other Windows platforms.
Alternatively, I could pass the HWND of a window, and the library function could post a message to it when it completes instead of signaling an event. That seems like the most practical approach if I have to implement the function in its own thread. Is this reasonable?
Currently my function prototype is:
void InitDevice(HANDLE hWait);
When initialization is complete, I signal hWait. This works fine, but I am not convinced I should use a thread at all when another, secondary thread will have to wait on InitDevice anyway. Should I pass an HWND instead? That way the message would be posted to the primary thread, which makes more sense with multithreading.
In general, when I write library code, I normally try to stay away from creating threads unless it's really necessary. By creating a thread, you're forcing a particular threading model on the application. Perhaps they wish to use it from a very simplistic command-line tool where a single thread is fine. Or they could use it from a GUI tool where things must be multi-threaded.
So instead, give the library user a clear statement that the function is a long-running blocking call, a callback mechanism to monitor progress, and a way to halt the operation immediately, which a multithreaded application can use.
What you do want is to be thread-safe. Use mutexes to protect shared data if there are other functions the caller can use to affect the operation of the blocking function.
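A minimal C sketch of that interface shape, with hypothetical names (init_device, init_request_cancel, progress_fn) and a placeholder loop standing in for the real device initialization: the call blocks on whatever thread the application chooses, reports progress through a callback, and polls a cancellation flag that another thread may set. A single atomic flag is enough here; a mutex becomes necessary once the shared state grows beyond that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library API; names and the 10-step loop are illustrative. */
typedef enum { INIT_OK = 0, INIT_CANCELLED = 1 } init_result;

/* Progress callback, invoked on the thread that called init_device(). */
typedef void (*progress_fn)(int percent, void *user);

typedef struct {
    atomic_bool cancel_requested;  /* may be set from any thread */
} init_ctx;

void init_ctx_setup(init_ctx *ctx)
{
    atomic_init(&ctx->cancel_requested, false);
}

/* Thread-safe: another thread may call this while init_device() blocks. */
void init_request_cancel(init_ctx *ctx)
{
    atomic_store(&ctx->cancel_requested, true);
}

/* Long-running blocking call: the application decides which thread runs it. */
init_result init_device(init_ctx *ctx, progress_fn progress, void *user)
{
    for (int step = 1; step <= 10; step++) {
        if (atomic_load(&ctx->cancel_requested))
            return INIT_CANCELLED;
        /* ... one slice of the real device initialization goes here ... */
        if (progress != NULL)
            progress(step * 10, user);
    }
    return INIT_OK;
}
```

A GUI application can call init_device() from its own worker thread and post the result to a window (for example with PostMessage on Windows), while a command-line tool can simply call it directly.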
I wrote a tool in C++ using wxWidgets for the GUI and IBM ILOG Cplex to solve an optimization problem.
In one of the functions called by the wx event handler, I invoke the IBM ILOG Cplex Optimizer which is itself multi-threaded code.
I have noticed that this causes nondeterministic bugs with nonsensical memory contents.
Since I have no experience writing multithreaded code and would like to get away without spending three weeks learning how to do it, I would like to know:
Is there some safe, possibly inelegant, way to avoid problems here? (Preferably more elegant than writing a file to disk, running a separate task through the OS and reading the output back in.)
Is it a bad idea to launch Cplex threads from a wx thread?
Is it generally a bad idea to use two libraries that might use different threading libraries internally? (I don't know of anything besides pthreads, and I don't know what Cplex or wx uses.)
Any help and background information is appreciated.
Based on my experience, the rule is:
every wxWidgets function call that changes the display must be made in the wxWidgets (GUI) thread
I don't know much about Cplex, but if you say it's multithreaded, chances are you are calling an asynchronous function and handling the results in a callback. That callback is almost certainly not called within the wxWidgets thread. If you then try to display the results within the callback, you are breaking the rule stated above. That's when you get nice little bugs, which in my case usually show up as heap corruption.
To fix that, you must pass the results of your callback to the wxWidgets thread and display them in that thread. There are many ways to do it, but the general mechanism is to post a custom wxWidgets event, which is then delivered to the GUI thread.
Check this link: http://wiki.wxwidgets.org/Custom_Events. You need to use:
wxEvtHandler::AddPendingEvent(const wxEvent& event)
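The same pattern sketched in plain C with pthreads, as an analogue of posting an event rather than actual wxWidgets code: the solver thread's callback only stores its result in a mutex-protected mailbox, and the GUI thread picks it up and is the only one touching widgets. The names (on_solver_done, poll_result, solver_thread, start_solver) and the value 42.0 are hypothetical. A real framework also wakes the GUI event loop when something is posted; for posting from worker threads in wxWidgets 2.9 and later, the documentation points to wxQueueEvent() (or QueueEvent()).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One-slot mailbox standing in for the framework's pending-event queue. */
typedef struct {
    pthread_mutex_t lock;
    bool has_result;
    double result;  /* hypothetical: e.g. the optimizer's objective value */
} mailbox;

static mailbox box = { PTHREAD_MUTEX_INITIALIZER, false, 0.0 };

/* Runs on the solver's thread: store the result, never touch GUI objects. */
static void on_solver_done(double objective)
{
    pthread_mutex_lock(&box.lock);
    box.result = objective;
    box.has_result = true;
    pthread_mutex_unlock(&box.lock);
}

/* Runs on the GUI thread (e.g. from an idle or timer handler). Returns true
   and fills *out if a result was waiting; widgets may be updated afterwards. */
bool poll_result(double *out)
{
    pthread_mutex_lock(&box.lock);
    bool ready = box.has_result;
    if (ready) {
        *out = box.result;
        box.has_result = false;
    }
    pthread_mutex_unlock(&box.lock);
    return ready;
}

/* Hypothetical stand-in for a multithreaded solver reporting via callback. */
static void *solver_thread(void *arg)
{
    (void)arg;
    on_solver_done(42.0);
    return NULL;
}

int start_solver(pthread_t *t)
{
    return pthread_create(t, NULL, solver_thread, NULL);
}
```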
This question is more for my personal curiosity than anything important. I'm trying to keep all my code compatible with at least Windows and Mac. So far I've learned that I should base my code on POSIX and that's just great but...
Windows doesn't have a sigaction function, so should signal be used instead? According to
"What is the difference between sigaction and signal?", there are some problems with signal:
The signal() function does not block other signals from arriving while the current handler is executing; sigaction() can block other signals until the current handler returns.
The signal() function resets the signal action back to SIG_DFL (default) for almost all signals. This means that the signal() handler must reinstall itself as its first action. It also opens up a window of vulnerability between the time when the signal is detected and the handler is reinstalled during which if a second instance of the signal arrives, the default behaviour (usually terminate, sometimes with prejudice - aka core dump) occurs.
If two SIGINTs arrive in quick succession, the application will terminate with the default behavior. Is there any way to fix this? What other implications do these two issues have for a process that, for instance, wants to block SIGINT? Are there any other issues I'm likely to run into while using signal? How do I fix them?
You really don't want to deal with signal() at all.
You want "events".
Ideally, you'll find a framework that's portable to all the main environments you wish to target - that would determine your choice of "event" implementation.
Here's an interesting thread that might help:
Game Objects Talking To Each Other
PS:
The main difference between signal() and sigaction() is that sigaction() is "signal() on steroids": more options, support for SA_RESTART, and so on. I'd discourage using either one unless you really, really need to.
I'm creating a concurrent memory reclamation algorithm in C++. Periodically, the stacks of executing mutator threads need to be inspected, so that I can see what references the threads are currently holding. In the process of doing this, I need to also check the registers of the mutator thread to check any references that might be in there.
Clearly many JVMs and C# VMs have no problem doing this as part of their garbage collection cycles. However, I haven't been able to find a definitive solution to this problem.
I can't quite tease apart what the Boehm garbage collector does to inspect the root set; if you can (or know how it's done), I'd really like to know.
Ideally, I would be able to interrupt the mutator thread and have it execute a piece of handler code that reports its PC, flushes any register-based references to the stack, and then perhaps helps finish the collection cycle. I believe that on most systems the registers are automatically saved when interrupt or signal handlers are called, but I'm not clear on the specifics or on how to access that data. It seems that separate stacks might be used for interrupt and signal handlers. Additionally, I can't find any information about how to target a particular thread, or how to send it a signal. Windows does not seem to support this form of signaling anyway, and I would like my system to run on both Linux and Windows on x86-64 processors.
Edit: SuspendThread() is used in some situations, although safepoints seem to be preferred. Any ideas on why? Is there any way to deal with long-lasting I/O waits or other waits for kernel code to return?
I thought this was a very interesting question, so I dug into it a bit. It turns out that the HotSpot JVM uses a mechanism called "safepoints", which causes all of the JVM's threads to stop themselves cooperatively so that the GC can begin. In other words, the thread initiating GC doesn't forcibly stop the other threads; the other threads voluntarily suspend themselves through various clever mechanisms.
I don't believe the JVM scans registers, because a safepoint is defined such that all roots are known (I presume this means in memory).
For more information see:
HotSpot Glossary -- which defines safepoints
safepoint.cpp -- the source in HotSpot that implements safepoints
A slide deck that describes safepoints in some detail (look 10 slides or so in)
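To make the cooperative idea concrete, a minimal pthreads sketch (not HotSpot's implementation, which uses much cheaper polling, historically loads from a memory page the VM can protect): the collector raises a flag, each mutator checks it at points where its references are known to be in memory, parks, and waits until the collection is finished. The names and the spinning demo mutators are hypothetical. A thread blocked in I/O never reaches a poll, which is why, as far as I know, VMs treat threads in native or blocking code as already at a safepoint, provided they check the flag before touching managed objects again.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool safepoint_requested = false;
static pthread_mutex_t sp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sp_cond = PTHREAD_COND_INITIALIZER;
static int parked = 0;  /* mutators currently stopped at a safepoint */

/* Mutators call this where all their references are in memory
   (loop back-edges, allocation sites, calls). */
void safepoint_poll(void)
{
    if (!atomic_load(&safepoint_requested))
        return;  /* fast path: a single load */
    pthread_mutex_lock(&sp_lock);
    parked++;
    pthread_cond_broadcast(&sp_cond);  /* wake the collector */
    while (atomic_load(&safepoint_requested))
        pthread_cond_wait(&sp_cond, &sp_lock);
    parked--;
    pthread_mutex_unlock(&sp_lock);
}

/* Collector: wait until all n_mutators are parked, run gc(), resume them. */
void run_at_safepoint(int n_mutators, void (*gc)(void))
{
    pthread_mutex_lock(&sp_lock);
    atomic_store(&safepoint_requested, true);
    while (parked < n_mutators)
        pthread_cond_wait(&sp_cond, &sp_lock);
    gc();  /* the world is stopped: scan stacks, move objects, ... */
    atomic_store(&safepoint_requested, false);
    pthread_cond_broadcast(&sp_cond);
    pthread_mutex_unlock(&sp_lock);
}

/* Hypothetical demo: mutators that only poll, and a GC that records state. */
static atomic_bool demo_stop = false;
static int gc_runs = 0, parked_at_gc = 0;

static void *demo_mutator(void *arg)
{
    (void)arg;
    while (!atomic_load(&demo_stop))
        safepoint_poll();
    return NULL;
}

static void demo_gc(void)
{
    gc_runs++;
    parked_at_gc = parked;
}
```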
Regarding your desire to "interrupt" all threads: according to the slide deck referenced above, thread suspension is "unreliable on Solaris and Linux, e.g., spurious signals." I'm not sure what mechanism for thread suspension the slides are referring to.
On Windows you should be able to do this using SuspendThread (and ResumeThread) together with GetThreadContext (as Hans mentioned). All of these functions take a handle to the specific thread you intend to target.
To get a list of all threads in the current process, see this (the Toolhelp32 API, e.g. CreateToolhelp32Snapshot, works on x64 despite its name).
As a point of interest, one way to flush registers to the stack on 32-bit x86 is the PUSHAD assembly instruction. Note that PUSHAD is not valid in 64-bit mode, so it does not help on x86-64.
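On x86-64, a more portable option is to call setjmp(), which writes the callee-saved registers into a jmp_buf; if that jmp_buf lives in the current stack frame, a conservative scan from there up to the top of the stack sees them (caller-saved registers that are live across the call have already been spilled by the compiler). The Boehm collector does something along these lines. A minimal sketch, assuming Linux/glibc (pthread_getattr_np() is a GNU extension) and a downward-growing stack; stack_contains is a hypothetical name. Caveat: glibc obfuscates some of the registers it saves in a jmp_buf (the frame pointer on x86-64, for example), which is one reason a real collector may prefer a compiler-specific helper such as GCC's __builtin_unwind_init.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <setjmp.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Highest address of the calling thread's stack, or NULL on failure.
   pthread_getattr_np() is a GNU extension (Linux/glibc assumption). */
static const char *stack_top(void)
{
    pthread_attr_t attr;
    void *base = NULL;
    size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return NULL;
    pthread_attr_getstack(&attr, &base, &size);
    pthread_attr_destroy(&attr);
    return (const char *)base + size;
}

/* Conservatively checks whether 'target' appears as a word on the current
   stack, including the callee-saved registers that setjmp() spills. */
int stack_contains(const void *target)
{
    jmp_buf regs;
    setjmp(regs);  /* copies callee-saved registers into 'regs' on this frame */
    const char *hi = stack_top();
    if (hi == NULL)
        return 0;
    /* Start at the (word-aligned) jmp_buf and walk up: the stack grows down. */
    uintptr_t p = ((uintptr_t)&regs + sizeof(uintptr_t) - 1)
                  & ~(uintptr_t)(sizeof(uintptr_t) - 1);
    for (; p + sizeof(uintptr_t) <= (uintptr_t)hi; p += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, (const void *)p, sizeof word);  /* read one stack word */
        if (word == (uintptr_t)target)
            return 1;
    }
    return 0;
}
```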