Break loop when SIGSEGV received C/C++

I have to use C++ classes that are not properly written: there is no way to tell whether a function called in a loop executed properly or not.
If it did not, I get a segmentation fault and lose everything that was calculated so far. I would like to turn the SIGSEGV signal into a break out of the loop. Is that possible?
Using signal handlers from #include <csignal> doesn't help.

A segmentation fault may happen in two ways:
1) Uncontrolled segfaults, where a process accesses addresses for which this access is not well defined.
#JSB this is the case you're dealing with, and there's little you can do about it other than getting the offending code fixed.
When an uncontrolled segfault happens, which is the case for buggy code 99.999% of the time, the only reasonable thing to do is to cut your losses (you may write to already opened files from a SIGSEGV handler) and terminate the process.
#JSB the following does not apply to you! This is just for completeness!
2) "Controlled" segfaults, where a process accesses addresses that it has allocated itself, but for which read/write/execute access is disabled.
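A controlled segfault may be induced in the following way:
size_t const sz_p = sysconf(_SC_PAGESIZE); // page size; needs <unistd.h>
char *p = (char *)mmap(NULL, sz_p, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); // needs <sys/mman.h>
strcpy(p, "sigsegv"); // faults: the page is mapped but not accessible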
So why is this a controlled segfault? Because you can actually react to it in a sensible way. In the SIGSEGV handler you can change the memory protection of the page whose access caused the segfault, so that the access is allowed:
void sigsegv_handler(int, siginfo_t *info, void *)
{
    if( ((char*)info->si_addr - p) < sz_p
     && ((char*)info->si_addr - p) >= 0 ) {
        mprotect(p, sz_p, PROT_READ | PROT_WRITE);
    }
}
It is important to understand that this kind of SIGSEGV handler is well behaved and well defined only if the segfault was caused by access to an actually allocated memory object and if the signal handler only sets memory protection flags on memory objects owned by the process. You can't use it to make broken code magically work!
So why would one actually do this? One example is the client-side implementation of APIs that allow network distribution and also allow objects to be mapped into memory, like OpenGL, which has the API functions glMapBuffer / glUnmapBuffer. To avoid unnecessary round trips and transfers, you'd want to transfer only those parts of the buffer that were actually read from and/or modified. For this you have to somehow detect which pages a program touches. Some OSs (like Windows) have a dedicated API for this, but on *nix systems you have to work with mmap + mprotect + SIGSEGV handler tricks to implement this.
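The page-tracking trick described above can be sketched roughly as follows. This is only a minimal illustration under assumptions: Linux or a similar POSIX system, a single thread, and made-up names (track_touched_pages, kPages, on_segv) that do not come from any real library. Calling mprotect from a signal handler works in practice on Linux but is not on the official list of async-signal-safe functions.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace {
constexpr std::size_t kPages = 4;          // hypothetical size of the tracked region
char*       g_base = nullptr;              // start of the tracked mapping
std::size_t g_page = 0;                    // page size in bytes
volatile sig_atomic_t g_touched[kPages];   // 1 if page i has been accessed

void on_segv(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (g_base == nullptr || addr < g_base || addr >= g_base + kPages * g_page) {
        // Not one of our pages: a genuine bug. Restore the default action;
        // the faulting instruction is retried and terminates the process.
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    std::size_t idx = static_cast<std::size_t>(addr - g_base) / g_page;
    g_touched[idx] = 1;
    // Open up only the touched page so the retried access succeeds.
    mprotect(g_base + idx * g_page, g_page, PROT_READ | PROT_WRITE);
}
}  // namespace

// Writes one byte to every page whose bit is set in touch_mask and returns
// the bitmask of pages the handler saw being touched (~0u on mmap failure).
unsigned track_touched_pages(unsigned touch_mask) {
    g_page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* m = mmap(nullptr, kPages * g_page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return ~0u;
    g_base = static_cast<char*>(m);
    for (std::size_t i = 0; i < kPages; ++i) g_touched[i] = 0;

    struct sigaction sa = {}, old = {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    for (std::size_t i = 0; i < kPages; ++i) {
        if (touch_mask & (1u << i)) {
            // Faults once, the handler unprotects the page, the write is retried.
            *reinterpret_cast<volatile char*>(g_base + i * g_page) = 'x';
        }
    }

    sigaction(SIGSEGV, &old, nullptr);
    unsigned seen = 0;
    for (std::size_t i = 0; i < kPages; ++i) {
        if (g_touched[i]) seen |= 1u << i;
    }
    munmap(g_base, kPages * g_page);
    g_base = nullptr;
    return seen;
}
```

Calling track_touched_pages(0xA) writes to pages 1 and 3 only, and the returned mask should have exactly those two bits set. A real implementation would transfer just the flagged pages.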

Related

How to trap memory reads and writes using sigsegv?

How do I trick linux into thinking a memory read/write was successful? I am writing a C++ library such that all reads/writes are redirected and handled transparently to the end user. Anytime a variable is written or read from, the library will need to catch that request and shoot it off to a hardware simulation which will handle the data from there.
Note that my library is platform dependent on:
Linux ubuntu 3.16.0-39-generic #53~14.04.1-Ubuntu SMP x86_64 GNU/Linux
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Current Approach: catch SIGSEGV and increment REG_RIP
My current approach involves getting a memory region using mmap() and shutting off access using mprotect(). I have a SIGSEGV handler to get the info containing the memory address, export the read/write elsewhere, then increment context REG_RIP.
void handle_sigsegv(int code, siginfo_t *info, void *ctx)
{
    void *addr = info->si_addr;
    ucontext_t *u = (ucontext_t *)ctx;
    int err = u->uc_mcontext.gregs[REG_ERR];
    bool is_write = (err & 0x2);
    // send data read/write to simulation...
    // then continue execution of program by incrementing RIP
    u->uc_mcontext.gregs[REG_RIP] += 6;
}
This works for very simple cases, such as:
int *num_ptr = (int *)nullptr;
*num_ptr = 10; // write segfault
But for anything even slightly more complex, I receive a SIGABRT:
30729 Illegal instruction (core dumped) ./$target
Using mprotect() within SIGSEGV handler
If I were to not increment REG_RIP, handle_sigsegv() will be called over and over again by the kernel until the memory region becomes available for reading or writing. I could run mprotect() for that specific address, but that has multiple caveats:
Subsequent memory access will not trigger a SIGSEGV due to the memory region now having PROT_WRITE ability. I have tried to create a thread that continuously marks the region as PROT_NONE, but that does not elude the next point:
mprotect() will, at the end of the day, perform the read or write into memory, invalidating the use case of my library.
Writing a device driver
I have also attempted to write a device module such that the library can call mmap() on the char device, where the driver will handle the reads and writes from there. This makes sense in theory, but I have not been able to (or do not have the knowledge to) catch every load/store the processor issues to the device. I have attempted to overwrite the mapped vm_operations_struct and/or the inode's address_space_operations struct, but those callbacks are only invoked when a page is faulted in or flushed to the backing store.
Perhaps I could use mmap() and mprotect(), like explained above, on the device that writes data nowhere (similar to /dev/null), then have a process that recognizes the reads/writes and routes the data from there (?).
Utilize syscall() and provide a restorer assembly function
The following was pulled from the segvcatch project [1], which converts segfaults into exceptions.
#define RESTORE(name, syscall) RESTORE2(name, syscall)
#define RESTORE2(name, syscall)\
asm(\
".text\n"\
".byte 0\n"\
".align 16\n"\
"__" #name ":\n"\
" movq $" #syscall ", %rax\n"\
" syscall\n"\
);
RESTORE(restore_rt, __NR_rt_sigreturn)
void restore_rt(void) asm("__restore_rt") __attribute__
((visibility("hidden")));
extern "C" {
    struct kernel_sigaction {
        void (*k_sa_sigaction)(int, siginfo_t *, void *);
        unsigned long k_sa_flags;
        void (*k_sa_restorer)(void);
        sigset_t k_sa_mask;
    };
}
// then within main ...
struct kernel_sigaction act;
act.k_sa_sigaction = handle_sigsegv; // the handler defined above
sigemptyset(&act.k_sa_mask);
act.k_sa_flags = SA_SIGINFO|0x4000000; // 0x4000000 is SA_RESTORER
act.k_sa_restorer = restore_rt;
syscall(SYS_rt_sigaction, SIGSEGV, &act, NULL, _NSIG / 8);
But this ends up functioning no different than a regular sigaction() configuration. If I do not set the restorer function the signal handler is not called more than once, even when the memory region is still not available. Perhaps there is some other trickery I could do with the kernel signal here.
Again, the entire objective of the library is to transparently handle reads and writes to memory. Perhaps there is a much better way of doing things, maybe with ptrace() or even updating the kernel code that generates the segfault signal, but the important part is that the end-user's code does not require changes. I have seen examples using setjmp() and longjmp() to continue after a segfault, but that would require adding those calls to every memory access. The same goes for converting a segfault to a try/catch.
[1] segvcatch project

You can use mprotect and avoid the first problem you note by also having the SIGSEGV handler set the T flag in the flags register. Then, you add a SIGTRAP handler that restores the mprotected memory and clears the T flag.
The T flag causes the processor to single step, so when the SEGV handler returns it will execute that single instruction, and then immediately TRAP.
This still leaves you with your second problem -- the read/write instruction will actually occur. You may be able to get around that problem by carefully modifying the memory before and/or after the instruction in the two signal handlers...
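A rough sketch of the T-flag approach, assuming Linux on x86-64 with glibc, where the saved flags register is reachable as gregs[REG_EFL] and the trap flag is bit 0x100. All function and variable names are made up for illustration, and on other platforms the function simply returns -1. The sketch only counts accesses; forwarding them elsewhere is left out.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for REG_EFL in <ucontext.h>
#endif
#include <cassert>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

namespace {
char*       g_page = nullptr;          // the guarded page
std::size_t g_size = 0;
volatile sig_atomic_t g_faults = 0;    // number of trapped accesses

#if defined(__linux__) && defined(__x86_64__)
constexpr long long kTrapFlag = 0x100; // TF bit in RFLAGS

void on_segv(int, siginfo_t* info, void* ctx) {
    char* addr = static_cast<char*>(info->si_addr);
    if (g_page == nullptr || addr < g_page || addr >= g_page + g_size) {
        signal(SIGSEGV, SIG_DFL);      // genuine crash: let it terminate
        return;
    }
    g_faults = g_faults + 1;
    mprotect(g_page, g_size, PROT_READ | PROT_WRITE);
    // Single-step: the faulting instruction runs once, then SIGTRAP fires.
    static_cast<ucontext_t*>(ctx)->uc_mcontext.gregs[REG_EFL] |= kTrapFlag;
}

void on_trap(int, siginfo_t*, void* ctx) {
    mprotect(g_page, g_size, PROT_NONE);   // re-arm the guard
    static_cast<ucontext_t*>(ctx)->uc_mcontext.gregs[REG_EFL] &= ~kTrapFlag;
}
#endif
}  // namespace

// Performs three accesses to a guarded page and returns how many faults were
// observed; -1 if the platform is unsupported, -2 on any other failure.
int count_faults_with_single_step() {
#if defined(__linux__) && defined(__x86_64__)
    g_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* m = mmap(nullptr, g_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return -2;
    g_page = static_cast<char*>(m);
    g_faults = 0;

    struct sigaction segv = {}, trap = {}, old_segv = {}, old_trap = {};
    segv.sa_sigaction = on_segv;
    segv.sa_flags = SA_SIGINFO;
    sigemptyset(&segv.sa_mask);
    trap.sa_sigaction = on_trap;
    trap.sa_flags = SA_SIGINFO;
    sigemptyset(&trap.sa_mask);
    sigaction(SIGSEGV, &segv, &old_segv);
    sigaction(SIGTRAP, &trap, &old_trap);

    volatile char* p = g_page;
    p[0] = 1;          // fault 1
    p[1] = 2;          // fault 2: the page was re-protected by the SIGTRAP handler
    char c = p[0];     // fault 3: reads are trapped as well

    sigaction(SIGSEGV, &old_segv, nullptr);
    sigaction(SIGTRAP, &old_trap, nullptr);
    munmap(g_page, g_size);
    g_page = nullptr;
    return c == 1 ? static_cast<int>(g_faults) : -2;
#else
    return -1;
#endif
}
```

Because the page is re-protected after every single instruction, each of the three accesses is expected to fault separately. As noted above, the accesses still reach real memory in this sketch.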

mprotect: how to get the instruction which causes protection violation?

I am using mprotect to set some memory pages as write protected. When any write is attempted in that memory region, the program gets a SIGSEGV signal. From the signal handler I know which memory address the write was attempted on, but I don't know how to find out which instruction caused the write protection violation. So inside the signal handler I am thinking of reading the program counter (PC) register to get the faulting instruction. Is there an easy way to do this?

If you install your signal handler using sigaction with the SA_SIGINFO flag, the third argument to the signal handler has type void * but points to a structure of type ucontext_t, which in turn contains a structure of type mcontext_t. The contents of mcontext_t are implementation-defined and generally cpu-architecture-specific, but this is where you will find the saved program counter.
It's also possible that the compiler's builtins (__builtin_return_address with a nonzero argument, I think) along with unwinding tables may be able to trace across the signal handler. While this is in some ways more general (it's not visibly cpu-arch-specific), I think it's also more fragile, and whether it actually works may be cpu-arch- and ABI-specific.
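As a hedged illustration of the ucontext_t route: on Linux with glibc the saved program counter is, as far as I know, gregs[REG_RIP] on x86-64 and the pc field on AArch64. Other systems lay out mcontext_t differently, so the helper below returns 0 there. The names saved_pc, FaultInfo and provoke_write_fault are invented for this sketch.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for REG_RIP in <ucontext.h>
#endif
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

namespace {
char*       g_page = nullptr;              // write-protected page
std::size_t g_size = 0;
volatile std::uintptr_t g_fault_pc = 0;    // program counter at the fault
volatile std::uintptr_t g_fault_addr = 0;  // address that was written to

// Extracts the saved program counter; the field name is architecture-specific.
std::uintptr_t saved_pc(const ucontext_t* uc) {
#if defined(__linux__) && defined(__x86_64__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return 0;  // unknown layout on this platform
#endif
}

void on_write_fault(int, siginfo_t* info, void* ctx) {
    char* addr = static_cast<char*>(info->si_addr);
    if (g_page == nullptr || addr < g_page || addr >= g_page + g_size) {
        signal(SIGSEGV, SIG_DFL);          // unrelated crash: let it terminate
        return;
    }
    g_fault_addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    g_fault_pc = saved_pc(static_cast<const ucontext_t*>(ctx));
    // Allow the write so the faulting instruction can be retried.
    mprotect(g_page, g_size, PROT_READ | PROT_WRITE);
}
}  // namespace

struct FaultInfo {
    std::uintptr_t pc;             // where the faulting instruction lives
    std::uintptr_t addr;           // address reported in si_addr
    std::uintptr_t expected_addr;  // address the demo actually wrote to
};

// Writes to a read-only page and reports what the handler recorded.
FaultInfo provoke_write_fault() {
    FaultInfo out = {0, 0, 0};
    g_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* m = mmap(nullptr, g_size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return out;
    g_page = static_cast<char*>(m);

    struct sigaction sa = {}, old = {};
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    volatile char* target = g_page + 16;
    *target = 'x';                          // triggers the protection fault

    sigaction(SIGSEGV, &old, nullptr);
    out.pc = g_fault_pc;
    out.addr = g_fault_addr;
    out.expected_addr = reinterpret_cast<std::uintptr_t>(g_page + 16);
    munmap(g_page, g_size);
    g_page = nullptr;
    return out;
}
```

The recorded pc is the address of the store instruction inside provoke_write_fault. Mapping it back to a source line would need a tool such as addr2line or a debugger.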

Best practices for recovering from a segmentation fault

I am working on a multithreaded process written in C++, and am considering modifying SIGSEGV handling using google-coredumper to keep the process alive when a segmentation fault occurs.
However, this use of google-coredumper seems rife with opportunities to get stuck in an infinite loop of core dumps unless I somehow reinitialize the thread and the object that may have caused the core dump.
What best practices should I keep in mind when trying to keep a process alive through a core dump? What other 'gotchas' should I be aware of?
Thanks!

It is actually possible in C. You can achieve it in quite a complicated way:
1) Override signal handler
2) Use setjmp() and longjmp() to set the place to jump back to, and to actually jump back there.
Check out this code I wrote (idea taken from "Expert C Programming: Deep C Secrets" by Peter Van Der Linden):
#include <signal.h>
#include <stdio.h>
#include <setjmp.h>
// Declaring global jmp_buf variable to be used by both main and signal handler
jmp_buf buf;
void magic_handler(int s)
{
    switch(s)
    {
    case SIGSEGV:
        printf("\nSegmentation fault signal caught! Attempting recovery..");
        longjmp(buf, 1);
        break;
    }
    printf("\nAfter switch. Won't be reached");
}
int main(void)
{
    int *p = NULL;
    signal(SIGSEGV, magic_handler);
    if(!setjmp(buf))
    {
        // Trying to dereference a null pointer will cause a segmentation fault,
        // which is handled by our magic_handler now.
        *p=0xdead;
    }
    else
    {
        printf("\nSuccessfully recovered! Welcome back in main!!\n\n");
    }
    return 0;
}

The best practice is to fix the original issue causing the core dump, recompile and then relaunch the application.
To catch these errors before deploying in the wild, do plenty of peer review and write lots of tests.

Steve's answer is actually a very useful formula. I've used something similar in a piece of complicated embedded software where there was at least one SIGSEGV error in the code that we could not track down by ship time. As long as you can reset your code to have no ill effects (memory or resource leaks) and the error is not something that causes an endless loop, it can be a lifesaver (even though it's better to fix the bug). FYI, in our case it was single-threaded.
But what is left out is that once you recover from your signal handler, it will not work again unless you unmask the signal. Here is a chunk of code to do that:
sigset_t signal_set;
...
setjmp(buf);
sigemptyset(&signal_set);
sigaddset(&signal_set, SIGSEGV);
sigprocmask(SIG_UNBLOCK, &signal_set, NULL);
// Initialize all Variables...
Be sure to free up your memory, sockets and other resources or you could leak memory when this happens.
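One possible variation, sketched here under assumptions: POSIX sigsetjmp(buf, 1) saves the signal mask and siglongjmp restores it, so the manual sigprocmask step should not be needed. The names run_until_segv and flaky_step are made up, and the crash is simulated with raise(SIGSEGV) rather than a real invalid access. Jumping out of a handler after a genuine fault leaves the program in an undefined state, and jumping over C++ objects with destructors is not allowed either.

```cpp
#include <cassert>
#include <cstddef>
#include <setjmp.h>
#include <signal.h>

namespace {
sigjmp_buf g_recover;                // jump target shared with the handler
volatile sig_atomic_t g_armed = 0;   // 1 while the loop below is running

void on_segv(int) {
    if (g_armed) {
        siglongjmp(g_recover, 1);    // also restores the saved signal mask
    }
    signal(SIGSEGV, SIG_DFL);        // fault outside the guarded loop: give up
}
}  // namespace

// Calls step(i) for i in [0, n) and returns how many calls completed before
// a SIGSEGV arrived (n if none did). Results of completed steps stay intact.
std::size_t run_until_segv(void (*step)(std::size_t), std::size_t n) {
    struct sigaction sa = {}, old = {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    // volatile: modified after sigsetjmp and read after siglongjmp.
    volatile std::size_t done = 0;
    if (sigsetjmp(g_recover, 1) == 0) {   // second argument 1: save the mask
        g_armed = 1;
        for (std::size_t i = 0; i < n; ++i) {
            step(i);
            done = i + 1;
        }
    }
    // Reached both after a clean finish and after the jump: the loop is "broken".
    g_armed = 0;
    sigaction(SIGSEGV, &old, nullptr);
    return done;
}

// Stand-in for the buggy call: simulates a crash on its fourth invocation.
void flaky_step(std::size_t i) {
    if (i == 3) {
        raise(SIGSEGV);
    }
}
```

run_until_segv(flaky_step, 10) should return 3, and it can be called again afterwards because the mask saved by sigsetjmp leaves SIGSEGV unblocked.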

My experience with segmentation faults is that it's very hard to catch them portably, and to do it portably in a multithreaded context is next to impossible.
This is for good reason: Do you really expect the memory (which your threads share) to be intact after a SIGSEGV? After all, you've just proven that some addressing is broken, so the assumption that the rest of the memory space is clean is pretty optimistic.
Think about a different concurrency model, e.g. one based on processes. Processes don't share their memory, or share only a well-defined part of it (shared memory), and one process can reasonably carry on working after another process has died. When you have a critical part of the program (e.g. the core temperature control), putting it in a separate process protects it from memory corruption and segmentation faults caused by other processes.
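A minimal sketch of the process-based idea, assuming a POSIX system: the risky call runs in a forked child, its result comes back through a pipe, and a crash only kills the child. The names run_isolated, square and crash are hypothetical, and a real program would need a richer result format than a single int.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs fn(arg) in a child process. Returns true and stores the result in *out
// if the child finished normally; returns false if it crashed or failed.
bool run_isolated(int (*fn)(int), int arg, int* out) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: compute, send the result to the parent, exit without cleanup.
        close(fds[0]);
        int r = fn(arg);
        ssize_t w = write(fds[1], &r, sizeof r);
        _exit(w == static_cast<ssize_t>(sizeof r) ? 0 : 1);
    }

    // Parent: read the result (if any), then collect the exit status.
    close(fds[1]);
    int r = 0;
    ssize_t got = read(fds[0], &r, sizeof r);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return false;  // e.g. killed by SIGSEGV
    if (got != static_cast<ssize_t>(sizeof r)) return false;
    *out = r;
    return true;
}

// Two stand-ins for third-party code: one well behaved, one that "crashes".
int square(int x) { return x * x; }
int crash(int) { raise(SIGSEGV); return 0; }
```

With these helpers, run_isolated(square, 7, &v) yields 49, while run_isolated(crash, 0, &v) reports failure and the calling process keeps everything it has computed so far.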

If a segmentation fault occurs, you're better off just ditching the process. How can you know that any of your process's memory is usable after this? If something in your program is messing with memory it shouldn't, why do you believe it didn't mess with some other part of memory that your process actually can access without segfaulting?
I think that doing this will mostly benefit attackers.

From the description of coredumper, its purpose does not seem to be what you intend: it just allows you to take snapshots of the process memory.
Personally, I wouldn't keep a process alive after it has triggered a core dump, since there are so many ways it could be broken. I would instead employ some persistence to allow data recovery after the process is restarted.
And yes, as parapura has suggested, better yet, find out what is causing the SIGSEGV and fix it.

C++/Windows: How to report an out-of-memory exception (bad_alloc)?

I'm currently working on an exception-based error reporting system for Windows MSVC++ (9.0) apps (i.e. exception structures & types / inheritance, call stack, error reporting & logging and so on).
My question now is: how to correctly report & log an out-of-memory error?
When this error occurs, e.g. as a bad_alloc thrown by the new operator, many "features" may be unavailable, mostly those that need further memory allocation. Normally, I'd pass the exception to the application if it has been thrown in a lib, and then use message boxes and error log files to report and log it. Another way (mostly for services) is to use the Windows Event Log.
The main problem I have is to assemble an error message.
To provide some error information, I'd like to define a static error message (may be a string literal, better an entry in a message file, then using FormatMessage) and include some run-time info such as a call stack.
The functions / methods necessary for this use either
STL (std::string, std::stringstream, std::ofstream)
CRT (swprintf_s, fwrite)
or Win32 API (StackWalk64, MessageBox, FormatMessage, ReportEvent, WriteFile)
Besides being documented on the MSDN, all of them are more (Win32) or less (STL) closed source in Windows, so I don't really know how they behave under low-memory conditions.
Just to prove there might be problems, I wrote a trivial small app provoking a bad_alloc:
int main()
{
    InitErrorReporter();
    try
    {
        for(int i = 0; i < 0xFFFFFFFF; i++)
        {
            for(int j = 0; j < 0xFFFFFFFF; j++)
            {
                char* p = new char;
            }
        }
    }catch(bad_alloc& e_b)
    {
        ReportError(e_b);
    }
    DeinitErrorReporter();
    return 0;
}
Ran two instances w/o debugger attached (in Release config, VS 2008), but "nothing happened", i.e. no error codes from the ReportEvent or WriteFile I used internally in the error reporting. Then, launched one instance with and one w/o debugger and let them try to report their errors one after the other by using a breakpoint on the ReportError line. That worked fine for the instance with the debugger attached (correctly reported & logged the error, even using LocalAlloc w/o problems)! But taskman showed a strange behaviour, where there's a lot of memory freed before the app exits, I suppose when the exception is thrown.
Please consider there may be more than one process [edit] and more than one thread [/edit] consuming much memory, so freeing pre-allocated heap space is not a safe solution to avoid a low memory environment for the process which wants to report the error.
Thank you in advance!

"Freeing pre-allocated heap space...". This was exactly that I thought reading your question. But I think you can try it. Every process has its own virtual memory space. With another processes consuming a lot of memory, this still may work if the whole computer is working.

pre-allocate the buffer(s) you need
link statically and use _beginthreadex instead of CreateThread (otherwise, CRT functions may fail) -- OR -- implement the string concat / i2a yourself
Use MessageBox (MB_SYSTEMMODAL | MB_OK). MSDN mentions this for reporting OOM conditions (and some MS blogger described this behavior as intended: the message box will not allocate memory).
Logging is harder; at the very least, the log file needs to be open already.
Probably best with FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH, to avoid any buffering attempts. The first one requires that writes and your memory buffers are sector aligned (i.e. you need to query GetDiskFreeSpace, align your buffer by that, write only to file offsets that are multiples of the sector size, and write in blocks that are multiples of the sector size). I am not sure if this is necessary, or helps, but a system-wide OOM where every allocation fails is hard to simulate.

Please consider there may be more than one process consuming much memory, so freeing pre-allocated heap space is not a safe solution to avoid a low memory environment for the process which wants to report the error.
Under Windows (and other modern operating systems), each process has its own address space (aka memory) separate from every other running process. And all of that is separate from the literal RAM in the machine. The operating system has virtualized the process address space away from the physical RAM.
This is how Windows is able to push memory used by processes into the page file on the hard disk without those processes having any knowledge of what happened.
This is also how a single process can allocate more memory than the machine has physical RAM and yet still run. For instance, a program running on a machine with 512MB of RAM could still allocate 1GB of memory. Windows just couldn't keep all of it in RAM at the same time, and some of it would be in the page file. But the program wouldn't know.
So consequently, if one process allocates memory, it does not cause another process to have less memory to work with. Each process is separate.
Each process only needs to worry about itself. And so the idea of freeing a pre-allocated chunk of memory is actually very viable.

You can't use CRT or MessageBox functions to handle OOM since they might need memory, as you describe. The only truly safe thing you can do is alloc a chunk of memory at startup you can write information into and open a handle to a file or a pipe, then WriteFile to it when you OOM out.
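The "reserve everything at startup" idea might look roughly like this. Note that it is a portable standard C++ stand-in, not the Win32 version: the fixed member buffer plays the role of the pre-allocated chunk, and an already open, unbuffered FILE* plays the role of the file handle that WriteFile would use. OomReporter and its members are hypothetical names, and whether snprintf and fwrite really avoid allocation is an assumption that depends on the C runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <new>

// Reports out-of-memory errors using only storage reserved up front.
class OomReporter {
public:
    // The log must already be open; it is switched to unbuffered mode so that
    // stdio does not try to allocate a buffer on the first write.
    explicit OomReporter(std::FILE* log) : log_(log) {
        if (log_) {
            std::setvbuf(log_, nullptr, _IONBF, 0);
        }
    }

    // Formats the message into the fixed buffer and writes it out.
    // Returns true if the whole message reached the log.
    bool report(const std::bad_alloc& e, const char* where) {
        if (!log_) return false;
        int n = std::snprintf(buf_, sizeof buf_, "out of memory in %s: %s\n",
                              where, e.what());
        if (n < 0) return false;
        std::size_t len = std::strlen(buf_);   // snprintf always terminates buf_
        return std::fwrite(buf_, 1, len, log_) == len;
    }

    // The most recently formatted message (possibly truncated to the buffer).
    const char* last_message() const { return buf_; }

private:
    std::FILE* log_;
    char buf_[256] = {};   // reserved at construction, never reallocated
};
```

A typical use would construct one OomReporter right after the log is opened and call report from the catch (const std::bad_alloc&) block. Dynamic strings and streams are deliberately avoided on that path.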

Most efficient replacement for IsBadReadPtr?

I have some Visual C++ code that receives a pointer to a buffer with data that needs to be processed by my code and the length of that buffer. Due to a bug outside my control, sometimes this pointer comes into my code uninitialized or otherwise unsuitable for reading (i.e. it causes a crash when I try to access the data in the buffer.)
So, I need to verify this pointer before I use it. I don't want to use IsBadReadPtr or IsBadWritePtr because everyone agrees that they're buggy. (Google them for examples.) They're also not thread-safe -- that's probably not a concern in this case, though a thread-safe solution would be nice.
I've seen suggestions on the net of accomplishing this by using VirtualQuery, or by just doing a memcpy inside an exception handler. However, the code where this check needs to be done is time sensitive so I need the most efficient check possible that is also 100% effective. Any ideas would be appreciated.
Just to be clear: I know that the best practice would be to just read the bad pointer, let it cause an exception, then trace that back to the source and fix the actual problem. However, in this case the bad pointers are coming from Microsoft code that I don't have control over so I have to verify them.
Note also that I don't care if the data pointed at is valid. My code is looking for specific data patterns and will ignore the data if it doesn't find them. I'm just trying to prevent the crash that occurs when running memcpy on this data, and handling the exception at the point memcpy is attempted would require changing a dozen places in legacy code (but if I had something like IsBadReadPtr to call I would only have to change code in one place).

bool IsBadReadPtr(void* p)
{
    MEMORY_BASIC_INFORMATION mbi = {0};
    if (::VirtualQuery(p, &mbi, sizeof(mbi)))
    {
        DWORD mask = (PAGE_READONLY|PAGE_READWRITE|PAGE_WRITECOPY|PAGE_EXECUTE_READ|PAGE_EXECUTE_READWRITE|PAGE_EXECUTE_WRITECOPY);
        bool b = !(mbi.Protect & mask);
        // check the page is not a guard page
        if (mbi.Protect & (PAGE_GUARD|PAGE_NOACCESS)) b = true;
        return b;
    }
    return true;
}

a thread-safe solution would be nice
I'm guessing it's only IsBadWritePtr that isn't thread-safe.
just doing a memcpy inside an exception handler
This is effectively what IsBadReadPtr is doing ... and if you did it in your code, then your code would have the same bug as the IsBadReadPtr implementation: http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx
--Edit:--
The only problem with IsBadReadPtr that I've read about is that the bad pointer might be pointing to (and so you might accidentally touch) a stack's guard page. Perhaps you could avoid this problem (and therefore use IsBadReadPtr safely), by:
Know what threads are running in your process
Know where the threads' stacks are, and how big they are
Walk down each stack, deliberately touching each page of the stack at least once, before you begin to call IsBadReadPtr
Also, some of the comments associated with the URL above suggest using VirtualQuery.

The reason these functions are bad to use is that the problem can't be solved reliably.
What if the function you're calling returns a pointer to memory that is allocated, so it looks valid, but it's pointing to other, unrelated data, and will corrupt your application if you use it?
Most likely, the function you're calling actually behaves correctly, and you are misusing it. (Not guaranteed, but that is often the case.)
Which function is it?

The fastest solution I can think of is to consult the virtual memory manager using VirtualQuery to see if there is a readable page at the given address, and cache the results (however any caching will reduce the accuracy of the check).
Example (without caching):
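BOOL CanRead(LPVOID p, SIZE_T len) // len: length of the buffer in bytes
{
    MEMORY_BASIC_INFORMATION mbi;
    mbi.Protect = 0;
    // Only the page holding the last byte of the buffer is queried.
    ::VirtualQuery(((LPCSTR)p) + len - 1, &mbi, sizeof(mbi));
    // 0xE6 = PAGE_READONLY | PAGE_READWRITE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY
    return ((mbi.Protect & 0xE6) != 0 && (mbi.Protect & PAGE_GUARD) == 0);
}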

Why can't you call the API AfxIsValidAddress?
AfxIsValidAddress(p, sizeof(type), FALSE);

If the variable is uninitialized you are hosed. Sooner or later it's going to be an address for something you don't want to play with (like your own stack).
If you think you need this, and (uintptr_t)var < 65536 does not suffice (Windows does not allow allocating the bottom 64k), there is no real solution. VirtualQuery, etc. appear to "work" but sooner or later will burn you.

I am afraid you are out of luck - there is no way to reliably check the validity of a pointer. What Microsoft code is giving you bad pointers?

Any implementation of checking the validity of memory is subject to the same constraints that make IsBadReadPtr fail. Can you post an example call stack of where you want to check the validity of a pointer passed to you from Windows? That might help other people (including me) diagnose why you need to do this in the first place.

If you have to resort to checking patterns in data, here are a few tips:
Since you mention using IsBadReadPtr, you are probably developing for Windows x86 or x64.
You may be able to range check the pointer. Pointers to objects will be word aligned. In 32-bit windows, user-space pointers are in the range of 0x00401000-0x7FFFFFFF, or for large-address-aware applications, 0x00401000-0xBFFFFFFF instead (edit: 0x00401000-0xFFFF0000 for a 32-bit program on 64-bit windows). The upper 2GB/1GB is reserved for kernel-space pointers.
The object itself will live in Read/Write memory which is not executable. It may live in the heap, or it may be a global variable. If it is a global variable, you can validate that it lives in the correct module.
If your object has a VTable, and you are not using other classes, compare its VTable pointer with another VTable pointer from a known good object.
Range check the variables to see if they are possibly valid. For example, bools can only be 1 or 0, so if you see one with a value of 242, that's obviously wrong. Pointers can also be range checked and checked for alignment as well.
If there are objects contained within, check their VTables and data as well.
If there are pointers to other objects, you can check that the object lives in memory that is Read/Write and not executable, check the VTable if applicable, and range check the data as well.
If you do not have a good object with a known VTable address, you can use these rules to check if a VTable is valid:
While the object lives in Read/Write memory, and the VTable pointer is part of the object, the VTable itself will live in memory that is Read Only and not executable, and will be aligned to a word boundary. It will also belong to the module.
The entries of the VTable are pointers to code, which will be Read Only and Executable, and not writable. There are no alignment restrictions for code addresses. Code will belong to the module.

Here is what I use. It just replaces the official Microsoft functions by using #defines, so you can keep calling the Microsoft names and not worry about them failing you.
// Check memory address access
const DWORD dwForbiddenArea = PAGE_GUARD | PAGE_NOACCESS;
const DWORD dwReadRights = PAGE_READONLY | PAGE_READWRITE | PAGE_WRITECOPY | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY;
const DWORD dwWriteRights = PAGE_READWRITE | PAGE_WRITECOPY | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY;
template<DWORD dwAccessRights>
bool CheckAccess(void* pAddress, size_t nSize)
{
    if (!pAddress || !nSize)
    {
        return false;
    }
    MEMORY_BASIC_INFORMATION sMBI;
    bool bRet = false;
    UINT_PTR pCurrentAddress = UINT_PTR(pAddress);
    UINT_PTR pEndAdress = pCurrentAddress + (nSize - 1);
    do
    {
        ZeroMemory(&sMBI, sizeof(sMBI));
        VirtualQuery(LPCVOID(pCurrentAddress), &sMBI, sizeof(sMBI));
        bRet = (sMBI.State & MEM_COMMIT) // memory allocated and
            && !(sMBI.Protect & dwForbiddenArea) // access to page allowed and
            && (sMBI.Protect & dwAccessRights); // the required rights
        pCurrentAddress = (UINT_PTR(sMBI.BaseAddress) + sMBI.RegionSize);
    } while (bRet && pCurrentAddress <= pEndAdress);
    return bRet;
}
#define IsBadWritePtr(p,n) (!CheckAccess<dwWriteRights>(p,n))
#define IsBadReadPtr(p,n) (!CheckAccess<dwReadRights>(p,n))
#define IsBadStringPtrW(p,n) (!CheckAccess<dwReadRights>(p,(n)*2))
This approach is based on my understanding of Raymond Chen's blog post, If I'm not supposed to call IsBadXxxPtr, how can I check if a pointer is bad?

This is an old question but this part:
the code where this check needs to be done is time sensitive so I need
the most efficient check possible that is also 100% effective
VirtualQuery() takes a kernel call, so the simple memcpy() in an exception handler will be faster for the case where the memory is okay to read most of the time.
__try
{
    memcpy(dest, src, size);
}__except(1){}
Everything stays in user mode when there is no exception. It may be a bit slower for the use case where the memory is bad to read more often than it is good (since it fires off an exception, which is a round trip through the kernel and back).
You could also extend it with a custom memcpy loop and *size so you could return exactly how many bytes were actually read.

If you are using VC++, then I suggest using the Microsoft-specific keywords __try and __except to catch hardware exceptions.
