How to log segmentation faults and runtime errors that crash the program, through a remote logging library? - c++

What is the technique for logging segmentation faults and runtime errors that crash the program through a remote logging library?
The language is C++.

Here is a solution for printing a backtrace when you get a segfault, as an example of what you can do when such an error happens.
That leaves you with the problem of getting the error to the remote logging library. I would suggest keeping the signal handler as simple as possible and logging to a local file, because you cannot assume that a previously initialized logging library still works correctly once a segmentation fault has occurred.
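As a minimal sketch of that advice (the file name, the message and the deliberate crash in main are placeholders for illustration), a handler restricted to async-signal-safe calls such as open(), write() and raise() might look like this:

// Minimal sketch: log a fixed message to a local file from a SIGSEGV handler,
// using only async-signal-safe calls, then re-raise so the process still dies.
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>

extern "C" void segv_handler(int sig)
{
    // Only async-signal-safe calls here: open(), write(), close(), signal(), raise().
    const char msg[] = "caught SIGSEGV, dying\n";
    int fd = ::open("crash.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd != -1) {
        ::write(fd, msg, sizeof(msg) - 1);
        ::close(fd);
    }
    ::signal(sig, SIG_DFL);   // restore the default action...
    ::raise(sig);             // ...so the process terminates / dumps core as usual
}

int main()
{
    struct sigaction sa {};
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);

    volatile int* p = nullptr;
    *p = 42;                  // deliberately crash for the demonstration
}

Anything fancier (heap allocation, iostreams, the remote logging library itself) is not async-signal-safe and is better avoided inside the handler; an external process can then pick up crash.log and forward it.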

From my experience, trying to log debugging messages (remotely or into a file) while the program is crashing is not very reliable, especially if the application takes the system down along with it:
With a TCP connection you might lose the last several messages while the system is crashing. (TCP maintains packet order and retransmits lost data, but if the app just quits, data still buffered in the socket can be lost before it is transmitted.)
With a UDP connection you might lose messages or receive them out of order, because of the nature of UDP.
If you're writing into a file, the OS might discard the most recent changes (buffers not flushed, a journaled filesystem reverting to an earlier state of the file).
Flushing buffers after every write, or sending messages via TCP/UDP, can impose a noticeable performance penalty on a program that produces thousands of messages per second.
So as far as I know, a good approach is to maintain an in-memory plaintext log and write a core dump once the program has crashed. That way you'll find the contents of the log inside the core dump. Writing into an in-memory log is also significantly faster than writing to a file or sending messages over the network. Alternatively, you could use some kind of "dual logging": write every debug message immediately into the in-memory log, and then send it asynchronously (in another thread) to a log file or over the network.
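A minimal sketch of such an in-memory log (sizes and names are illustrative, and it is not thread-safe): a fixed-size global ring buffer that ends up inside the core dump, where a debugger can inspect it:

// Sketch of an in-memory ring buffer log. Because the buffer is a global
// object, its contents are included in the core dump and can be examined
// there (e.g. with gdb: x/s log_buffer).
#include <cstddef>
#include <cstring>

constexpr std::size_t kLogSize = 1 << 20;   // 1 MiB of log history
char        log_buffer[kLogSize];           // lives in the core dump
std::size_t log_pos = 0;

void mem_log(const char* msg)
{
    std::size_t len = std::strlen(msg);
    for (std::size_t i = 0; i < len; ++i) {
        log_buffer[log_pos] = msg[i];
        log_pos = (log_pos + 1) % kLogSize;  // wrap around, oldest data overwritten
    }
    log_buffer[log_pos] = '\n';
    log_pos = (log_pos + 1) % kLogSize;
}

A "dual logging" variant would have a background thread drain this buffer to a file or over the network, while crashes are still covered by whatever is left in the buffer at the time of the core dump.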
Handling of exceptions:
Platform-specific. On the Windows platform you can use _set_se_translator to translate structured (SEH) exceptions into C++ exceptions, or install an unhandled-exception filter (SetUnhandledExceptionFilter) to generate a backtrace.
On Linux you should be able to install a handler for the SIGSEGV signal.
While catching the segfault sounds like a decent idea, instead of trying to handle it from within the program it often makes more sense to generate a core dump and bail out. On Windows you can use MiniDumpWriteDump from within the program, and on Linux the system can be configured to produce core dumps from the shell (ulimit -c).
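For the Linux side, here is a small sketch of doing the equivalent of "ulimit -c unlimited" from inside the program itself with setrlimit, assuming the hard limit permits it (where the core files land is governed separately by /proc/sys/kernel/core_pattern):

// Sketch: allow the process to write core dumps of unlimited size.
#include <sys/resource.h>
#include <cstdio>

void enable_core_dumps()
{
    rlimit rl{};
    rl.rlim_cur = RLIM_INFINITY;   // soft limit
    rl.rlim_max = RLIM_INFINITY;   // hard limit (raising it may require privileges)
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        std::perror("setrlimit(RLIMIT_CORE)");
}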

I'd like to suggest a couple of solutions:
Use core dumps, and run a daemon that monitors for new core dumps, collects them and sends them to your host.
Use GDB (with gdbserver): you can debug remotely and see the backtrace after a crash.

To catch the segfault signal and send a log accordingly, read this post:
Is there a point to trapping "segfault"?
If it turns out that you won't be able to send the log from a signal handler (maybe the crash occurred before the logger was initialized), then you may need to write the info to a file and have an external entity send it remotely.
EDIT: Putting back some original info, to be able to send the core file remotely too.
To be able to send the core file remotely, you'll need an external entity (a different process than the one that crashed) that will "wait" for core files and send them remotely as they appear (possibly using scp). Additionally, the crashing process could catch the segfault signal and notify the monitoring process that a crash has occurred and that a core file will be available soon.
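As a rough sketch of such an external entity (the /var/crash directory, the "core" file name prefix and the loghost destination are all assumptions for illustration), one could watch the dump directory with inotify and ship each new core file with scp:

// Sketch of a "core file collector": watch a directory with inotify and
// ship every newly written core file to a remote host with scp.
#include <sys/inotify.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <string>

int main()
{
    int fd = inotify_init1(0);
    if (fd < 0) { std::perror("inotify_init1"); return 1; }

    // Watch the dump directory for files that have been fully written.
    inotify_add_watch(fd, "/var/crash", IN_CLOSE_WRITE);

    alignas(inotify_event) char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n <= 0)
            break;
        for (ssize_t off = 0; off < n; ) {
            auto* ev = reinterpret_cast<inotify_event*>(buf + off);
            if (ev->len > 0 && std::strncmp(ev->name, "core", 4) == 0) {
                // Ship the new core file to the (hypothetical) log host.
                std::string cmd = "scp /var/crash/" + std::string(ev->name)
                                + " loghost:/incoming/";
                std::system(cmd.c_str());
            }
            off += static_cast<ssize_t>(sizeof(inotify_event) + ev->len);
        }
    }
}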

Related

Kernel module to intercept system calls causes issues in execution of userspace programs

I've been trying to write a kernel module (using SystemTap) that would intercept system calls, capture their information and add it to a system call buffer region that is kmalloc'd. I have implemented an mmap file operation so that a user space process can access this kmalloc'd region and read from it.
Note: For now, the module only intercepts the memfd_create system call. To test this out I have compiled a test application that calls memfd_create twice.
SystemTap script/kernel module code
In addition to the kernel module, I also wrote a user space application that periodically reads the system calls from this buffer, determines whether each system call is legit or malicious, and then adds a response to a response buffer region (also part of the kmalloc'd region and accessible via mmap) indicating whether to let the system call proceed or to terminate the calling process.
User space application code
The kernel module also has a timer that fires every few milliseconds to check the response buffer for responses added by user space. Depending on the response, the kernel module either terminates the calling process or lets it proceed.
The issue I am facing is that after intercepting a few system calls (I keep executing the test application) and processing them properly, I start running into problems executing normal commands in user space. For example, a simple command like ls:
[virsec#redhat7 stap-test]$ ls
Segmentation fault
[virsec#redhat7 stap-test]$ strace ls
execve("/usr/bin/ls", ["ls"], [/* 30 vars */]) = -1 EFAULT (Bad address)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
Segmentation fault
This happens with every terminal command that I run. The dmesg output shows nothing but the printk debug output of the kernel module. There is no kernel panic; the kernel module and the user space application are still running and waiting for the next system call to intercept. What do you think could be the issue? Let me know if you need any more information. I was not able to post any code snippets because I think the code makes much more sense as a whole.

How does Windows close a program when shutting down the computer?

My application throws some strange errors if you shut down the computer while my application is running.
Sometimes the message says that memory at some address could not be "read", sometimes that it could not be "written".
Shutting down the application in the normal way doesn't generate such messages.
How can I simulate the "Windows shutdown" so that I can debug my application? How can I find out what the application is trying to do that it cannot?
When Windows wants to shut down, it sends a series of messages to the application, such as WM_QUERYENDSESSION and WM_ENDSESSION. You can process these in the message handler you are using; in general the application needs to respond appropriately and quickly to these messages, or the OS will terminate it anyway. I'm not sure what default processing wxWidgets offers in this regard. Hooking into these would help in diagnosing the application error itself.
There are a few things you could attempt:
The shutdown sequence will not be easy to simulate (if at all): a lot happens during shutdown, and the exact state and situation are difficult to reproduce in their entirety.
In terms of diagnosing the state of the application just before shutdown, you could try to process WM_QUERYENDSESSION and respond with FALSE to prevent the shutdown (on newer versions of Windows you can no longer reliably block the shutdown, so this may not work depending on the platform you are on).
You could also test the application's immediate response to the WM_ENDSESSION message by sending it yourself (e.g. via PostMessage) with the appropriate data as detailed on MSDN; a small sketch of handling both messages follows after this list.
For terminal-based applications:
You can also hook the relevant console signals if required. See this Microsoft reference for more detail. You can also use the SetConsoleCtrlHandler hook. But since you are using a toolkit, it is better to use the messages already sent to the application.
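A minimal sketch of handling those two messages in a plain Win32 window procedure (a wxWidgets application would hook the corresponding events instead; the comments mark where your own save/flush logic would go):

// Sketch: respond to the session-end messages in a Win32 window procedure.
#include <windows.h>

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg) {
    case WM_QUERYENDSESSION:
        // Returning TRUE agrees to end the session; returning FALSE asks to
        // block it (newer Windows versions may shut down regardless).
        return TRUE;
    case WM_ENDSESSION:
        if (wParam) {
            // The session is really ending: flush logs / save state quickly here.
        }
        return 0;
    default:
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }
}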

Catch all errors and exceptions related to my program

I am currently working on a C++ daemon program that listens on a port for incoming requests.
I would like to catch all the errors related to the program. For that I implemented a logger in my program and caught some of the possible errors, but other errors remain uncatchable with those methods, for example a segfault, or the program being stopped because of a memory shortage.
I had the idea of using 'dmesg', which contains logs from different processes, and then taking what I need from there. The problem with this approach is that the logs from 'dmesg' don't contain human-readable information; on top of that, the logs are not dated. So I used 'gdb' on my program; now the logs are more elaborate and contain better information, but I can't capture the messages from 'gdb'.
my questions are:
Is my approach to this problem correct? If yes, how can I continue from where I am now?
Is there a better way to listen for errors than this?
I will need something similar in a C program; do you have a suggestion?
EDIT
After some research I think I will use another daemon to check every 5 minutes or so whether my other daemon is running, in order to re-launch it if it's down. With this settled, I now need to record the error; this is where I am stuck.
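One hedged sketch of such a supervising daemon (the binary path and log file below are placeholders): fork/exec the real daemon and use waitpid to learn how it died, which is also the point where the error can be recorded, for example the signal number if it was killed by a segfault or the OOM killer:

// Sketch of a watchdog: launch the daemon, wait for it to die, record why,
// then relaunch it after a short delay.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {
            execl("/usr/local/bin/mydaemon", "mydaemon", (char*)nullptr);
            _exit(127);                        // exec failed
        }

        int status = 0;
        waitpid(pid, &status, 0);              // block until the daemon dies

        std::FILE* log = std::fopen("/var/log/mydaemon-watchdog.log", "a");
        if (log) {
            if (WIFSIGNALED(status))
                std::fprintf(log, "daemon killed by signal %d\n", WTERMSIG(status));
            else if (WIFEXITED(status))
                std::fprintf(log, "daemon exited with code %d\n", WEXITSTATUS(status));
            std::fclose(log);
        }
        sleep(5);                              // back off a little, then relaunch
    }
}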

How to report correctly the abrupt end of another process in Linux?

I'm working on an embedded solution where two apps are running: one is the user interface and the other runs in the background providing data for the UI.
Recently I came across a memory leak or similar error that is making Linux kill the secondary process, leaving the UI stopped without telling the user anything about what is going on. I found the problem by reading Linux's message log file and the software's output on the terminal ("Kill -myapp").
My question is: how could I notice such an event (and other similar ones) coming from the secondary software, so I could properly report it to the user and log it? It's easy to look at the process 'tree' from time to time to see whether the secondary app is still running and, if it isn't, report "some event happened" in the UI. It's also plausible to have an error handler inside the secondary app that writes to a log file what just happened and have the UI read that file for new entries from time to time. But how could the UI app learn in more detail what is going on in such abrupt events (in this case "Linux killed the process", but it could be a segmentation fault, a broken pipe or anything else)? And if there is a better solution than "constantly read a log file produced by the secondary app", I'd also like to know it.
Notes: the UI is written in C++/Qt and the secondary app is in C. Although a solution using the Qt library would be welcome, I think it would be better for the entire programming community if a more general solution were given.
You can create a signal handler in the backend process for POSIX signals such as SIGSEGV or SIGTERM (SIGKILL itself cannot be caught) and notify the UI, for example with another signal sent via sigqueue. Any IPC mechanism should work, as long as it's async-signal-safe. Read more about signals: tutorial and manual.
It may still be a good idea to check periodically from the UI side, because the handler might not succeed, and a SIGKILL from the OOM killer never reaches a handler at all.
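A small sketch of the backend side (how the UI's pid is obtained is left as a placeholder): notify the UI with sigqueue(), which is async-signal-safe, and then fall through to the default action:

// Sketch: on a fatal signal, tell the UI process which signal hit us, then die.
#include <signal.h>
#include <unistd.h>

static pid_t ui_pid = 0;              // set at startup, e.g. read from a pid file

extern "C" void fatal_handler(int sig)
{
    sigval v;
    v.sival_int = sig;                // carry the signal number to the UI
    sigqueue(ui_pid, SIGUSR1, v);     // sigqueue() is async-signal-safe
    signal(sig, SIG_DFL);
    raise(sig);                       // continue with the default action
}

int main()
{
    // ui_pid = ...;                  // obtain the UI's pid somehow (placeholder)
    struct sigaction sa {};
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGTERM, &sa, nullptr);
    // ... backend main loop ...
    pause();
}

On the UI side, a handler installed with SA_SIGINFO can read the signal number back from si_value.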
As for a better way to check whether the process is alive, compared to reading the log file:
Check if process exists given its pid
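The usual trick behind that linked question is kill() with signal 0, which checks for existence and permission without sending anything; a short sketch:

// Sketch: periodic liveness check from the UI side.
#include <sys/types.h>
#include <signal.h>
#include <cerrno>

bool process_alive(pid_t pid)
{
    if (::kill(pid, 0) == 0)
        return true;           // process exists and we may signal it
    return errno == EPERM;     // EPERM: exists but not ours; ESRCH: gone
}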

What happens to a named pipe if server crashes?

I know little about pipes but have used one to connect two processes in my code in Visual C++. The pipe is working well, but I need to add error handling to it, so I wanted to know what will happen to the pipe if the server that created it crashes, and how I can recognize that from the client process.
Also, what will happen if the client process tries to access the same pipe after the server crash, if no error handling is put in place?
Edit:
What impact will there be on memory if I keep creating new pipes (say, by using the system time as the pipe name) while the previous ones were broken because of a server crash? Will these broken pipes be removed from memory?
IIRC the ReadFile or WriteFile call will return FALSE, and GetLastError() will return a pipe error such as ERROR_BROKEN_PIPE or ERROR_PIPE_NOT_CONNECTED.
I guess this kind of handling is already implemented in your code; if not, you had better add it ;-)
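A short sketch of what that check can look like on the client side (the error codes named here are the usual broken-pipe ones; the clean-up logic is a placeholder):

// Sketch: detect that the server end of the pipe has gone away after a read.
#include <windows.h>
#include <cstdio>

bool read_from_pipe(HANDLE pipe, char* buf, DWORD size)
{
    DWORD bytesRead = 0;
    if (ReadFile(pipe, buf, size, &bytesRead, nullptr))
        return true;

    DWORD err = GetLastError();
    if (err == ERROR_BROKEN_PIPE || err == ERROR_PIPE_NOT_CONNECTED) {
        std::fprintf(stderr, "server end of the pipe is gone (error %lu)\n", err);
        // reconnect / clean-up logic goes here
    }
    return false;
}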
I just want to throw this out there.
If you want a survivable method for transferring data between two applications, you might consider using MSMQ or even bringing in BizTalk or another message platform.
There are several things to consider:
What happens if the server is rebooted or loses power?
What happens if the server application becomes unresponsive?
What happens if the server application is killed or goes away completely?
What is the appropriate response of a client application in each of the above?
Each of those scenarios represents a potential loss of data. If data loss is unacceptable then named pipes are not the mechanism you should be using. Instead you need to persist the messages somehow.
MSMQ, storing to a database, or even leveraging BizTalk can take care of the survivability of the message itself.
If 1 or 3 happens, the named pipe goes away and must be recreated by a new instance of your server application. If 2 happens, the pipe won't go away until someone either reboots the server or kills the server app and starts it again.
Regardless, the client application needs to handle the above issues. They boil down to "connection failed" problems. Depending on what the client does, you might have it move into a wait state and let it ping the server every so often to see whether it has come back, as sketched below.
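A rough sketch of that wait-and-retry behaviour (the pipe name and the timeouts are placeholders):

// Sketch: keep trying to reopen the named pipe until the server comes back.
#include <windows.h>

HANDLE reconnect(const char* pipeName)     // e.g. "\\\\.\\pipe\\mypipe"
{
    for (;;) {
        HANDLE h = CreateFileA(pipeName, GENERIC_READ | GENERIC_WRITE,
                               0, nullptr, OPEN_EXISTING, 0, nullptr);
        if (h != INVALID_HANDLE_VALUE)
            return h;                      // server is back, pipe reopened

        // Wait up to 5 seconds for a pipe instance to become available,
        // then try again; sleep a bit if the pipe does not exist at all.
        if (!WaitNamedPipeA(pipeName, 5000))
            Sleep(2000);
    }
}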
Without knowing the nature of the data and the communication processes involved, it's hard to recommend a specific approach.