When does a process get SIGABRT (signal 6)? - c++

What are the scenarios where a process gets a SIGABRT in C++? Does this signal always come from within the process or can this signal be sent from one process to another?
Is there a way to identify which process is sending this signal?

abort() sends the calling process the SIGABRT signal; that is essentially all abort() does.
abort() is usually called by library functions that detect an internal error or a seriously broken constraint. For example, malloc() will call abort() if its internal structures have been damaged by a heap overflow.

SIGABRT is commonly used by libc and other libraries to abort the program on critical errors. For example, glibc raises SIGABRT when it detects a double-free or other heap corruption.
Also, most assert implementations make use of SIGABRT when an assert fails.
Furthermore, SIGABRT can be sent from any other process, like any other signal. Of course, the sending process needs to run as the same user or as root.

You can send any signal to any process using the kill(1) command, which is built on the kill(2) system call:
kill -SIGABRT 30823
30823 was a dash process I started, so I could easily find the process I wanted to kill.
$ /bin/dash
$ Aborted
The Aborted output is apparently how dash reports a SIGABRT.
It can be sent directly to any process using kill(2), or a process can send the signal to itself via assert(3), abort(3), or raise(3).

It usually happens when there is a problem with memory allocation. It happened to me when my program was trying to allocate an array with negative size.

There's another simple cause in the case of C++: the destructor of std::thread calls std::terminate() (and thus abort()) if the thread is still joinable:
std::thread::~thread() {
    if (joinable())
        std::terminate();
}
i.e. the thread went out of scope but you forgot to call either
thread::join();
or
thread::detach();

The GNU libc will print out information to /dev/tty regarding some fatal conditions before it calls abort() (which then triggers SIGABRT), but if you are running your program as a service or otherwise not in a real terminal window, these messages can get lost, because there is no tty to display them.
See my post on redirecting libc to write to stderr instead of /dev/tty:
Catching libc error messages, redirecting from /dev/tty

A case when a process gets SIGABRT from itself:
Hrvoje mentioned a buried pure virtual call from a ctor generating an abort; I recreated an example for this.
Here, when d is to be constructed, it first calls its base class A's ctor and passes a pointer to itself.
The A ctor calls the pure virtual method before the vtable has been filled with a valid pointer, because d is not constructed yet.
#include <iostream>
using namespace std;

class A {
public:
    A(A *pa) { pa->f(); }
    virtual void f() = 0;
};

class D : public A {
public:
    D() : A(this) {}
    virtual void f() { cout << "D::f\n"; }
};

int main() {
    D d;
    A *pa = &d;
    pa->f();
    return 0;
}
compile: g++ -o aa aa.cpp
ulimit -c unlimited
run: ./aa
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
Now let's quickly look at the core file and validate that SIGABRT was indeed raised:
gdb aa core
see regs:
i r
rdx 0x6 6
rsi 0x69a 1690
rdi 0x69a 1690
rip 0x7feae3170c37
check code:
disas 0x7feae3170c37
mov $0xea,%eax    ; 0xea = 234 <- the tgkill syscall number; it sends a signal to a thread
syscall <-----
http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
234 sys_tgkill pid_t tgid pid_t pid int sig = 6 = SIGABRT
:)

In my case, it was due to reading input into an array at an index equal to the length of the array.
string x[5];
for (int i = 1; i <= 5; i++) {
    cin >> x[i];
}
x[5] is being accessed, which does not exist; the valid indices are 0 through 4.

I will give my answer from a competitive programming (CP) perspective, but it applies to other domains as well.
Many times while doing CP, constraints are quite large.
For example: I had a question with variables N, M, Q such that 1 ≤ N, M, Q < 10^5.
The mistake I made was declaring a 2D integer array of size 10000 x 10000 in C++, and I struggled with the SIGABRT error at Codechef for almost 2 days.
Now, if we calculate:
Typical size of an integer: 4 bytes
No. of cells in our array: 10000 x 10000
Total size (in bytes): 400000000 bytes = 4*10^8 ≈ 400 MB
Your solution to such a question may work on your PC (not always), since it can afford this size.
But the memory at coding sites (online judges) is limited, typically to 256 MB or less.
Hence the SIGABRT error and other such errors.
Conclusion:
In such questions we ought not to declare an array, vector, or other data structure of this size; our task is to make our algorithm efficient enough that it works without them, or with less memory.
PS : There might be other reasons for this error; above was one of them.

As sarnold aptly pointed out, any process can send a signal to any other process. Hence one process can send SIGABRT to another, and in that case the receiving process cannot distinguish whether the signal comes from its own misuse of memory or was sent by someone else.
In one of the systems I worked on, there is a deadlock detector which detects whether a process is making progress on its tasks by checking for a heartbeat. If the heartbeat stops, it declares the process to be in a deadlock state and sends SIGABRT to it.
I just wanted to share this perspective with reference to the question asked.

Regarding the first question: What are the scenarios where a process gets a SIGABRT in C++?
I can think of two special cases where a C++ program is aborted automatically -- not by directly calling std::abort() or std::terminate():
One: An exception thrown while another exception is being handled, i.e. during stack unwinding. A throwing destructor is the classic case:
struct Thrower {
    ~Thrower() noexcept(false) { throw "def"; } // second exception in flight -> abort here
};
try {
    Thrower t;
    throw "abc"; // unwinding destroys t, whose destructor throws
}
catch (...) {
}
Two: An uncaught exception that attempts to propagate outside main().
int main(int argc, char** argv)
{
    throw "abc"; // abort here
}
C++ experts could probably name more special cases.
There is also a lot of good info on these reference pages:
https://en.cppreference.com/w/cpp/utility/program/abort
https://en.cppreference.com/w/cpp/error/terminate

For Android native code, here are some reasons abort is called according to https://source.android.com/devices/tech/debug/native-crash :
Aborts are interesting because they are deliberate. There are many different ways to abort (including calling abort(3), failing an assert(3), using one of the Android-specific fatal logging types), but all involve calling abort.

The error "munmap_chunk(): invalid pointer" also causes a SIGABRT, and in my case it was very hard to debug since I was not using pointers at all. It turned out that it was related to std::sort().
std::sort() requires a compare function that creates a strict weak ordering! That means both comparator(a, b) and comparator(b, a) must return false when a == b holds. (See https://en.cppreference.com/w/cpp/named_req/Compare.) In my case I defined operator< in my struct like below:
bool operator<(const MyStruct& o) const {
    return value <= o.value; // Note the equality sign
}
and this was causing the SIGABRT because the function does not create a strict weak ordering. Removing the = solved the problem.

Related

Is signal.h a reliable way to catch null pointers?

I am currently writing a small VM in C/C++. Obviously I can't let the whole VM crash if the user dereferences a null pointer, so I have to check every access, which is becoming cumbersome as the VM grows and more systems are implemented.
So I had an idea: write a signal handler for SIGSEGV and let the OS do its thing, but instead of closing the program, call the VM exception handler.
It seems to work (with my very simple test cases), but I didn't find anything guaranteeing that a SIGSEGV is raised on null derefs, nor that the handler is called for OS-generated signals.
So my question is:
Can I count on signal.h on modern desktop OSes? (I don't really care if it's not standard or doesn't work on something other than Linux/Windows: it's a pet project.) Are there any non-trivial things I should be aware of (obscure limitations of signal(...) or longjmp(...))?
Thank you!
Here is the pseudo implementation:
/* ... */
jmp_buf env;
/* ... */
void handler(int) {
    longjmp(env, VM_NULLPTR);
}
/* ... */
if (setjmp(env)) {
    return vm_throw("NullPtrException");
}
switch (opcode) {
/* instructions */
case INVOKE:
    // no null check here: if stack_top or stack_top->obj is null,
    // handler() will be called and a "NullPtrException" will be thrown
    *stack_top = vm_call(stack_top->obj);
    break;
/* more instructions */
}
/* ... */
Note: I only need to check nulls; garbage (dangling) pointers are handled by the GC and should not happen.
Calling longjmp() from a signal handler is only safe if the signal handler is never invoked from async-signal-unsafe code. So, for example, if you might receive SIGSEGV by passing a bad pointer to any of the printf() family of functions, you cannot longjmp() from your signal handler.
Can I count on signal.h on modern desktop OSes?
You can count on it in the sense that the header and the functions within will be available on all standards-compliant systems. However, exactly which signals are raised and when is not consistent across OSes.
On Windows, you may need to compile your program with Cygwin or a similar environment to get the system to raise a segmentation fault. Programs compiled with Visual Studio use "structured exceptions" to handle invalid storage access.
Is signal.h a reliable way to catch null pointers?
There are situations where a null pointer dereference does not result in the raising of a segmentation fault signal, even on a POSIX system.
One case might be that the compiler has optimized the operation away, which is typical for example in the case of dereferencing a null pointer to call a member function that does not access any data members. When there is no invalid memory access, there is no signal either. Of course, in that case there won't be a crash either.
Another case might be that address 0 is in fact valid. That's the case on AIX, which you don't care about. It is also the case on Linux, which you do care about, but not by default and you might choose to not care about the case where it is. See this answer for more details.
Then there is your implementation of the signal handler. longjmp is not async signal safe, so if the signal was raised while another non-safe operation was being performed, the interrupted operation may have left your program in an inconsistent state. See the answer by John Zwinck and the libc documentation for details.

What does SEGV_ACCERR mean?

I am examining a few crashes that all have the signal SIGSEGV with the reason SEGV_ACCERR. After searching for SEGV_ACCERR, the closest thing I have found to a human readable explanation is: Invalid Permissions for object
What does this mean in a more general sense? When would a SEGV_ACCERR arise? Is there more specific documentation on this reason?
This is an error that I have mostly seen on 64 bit iOS devices and can happen if multiple threads read and change a variable under ARC. For example, I fixed a crash today where multiple background threads were reading and using a static NSDate and NSString variable and updating them without doing any kind of locking or queueing.
Using core data objects on multiple threads can also cause this crash, as I have seen many times in my crash logs.
I also use Crittercism, and this particular crash was a SEGV_ACCERR that only affected 64 bit devices.
As stated in the man page of sigaction, SEGV_ACCERR is a signal code for SIGSEGV that means "invalid permissions for mapped object". In contrast to SEGV_MAPERR, which means the address is not mapped to a valid object, SEGV_ACCERR means the address is mapped, but the process does not have permission to access it in that way.
I've seen this in cases where code tries to execute from places other than "text".
For example, if your pointer points to a function on the heap or the stack and you try to execute that code (from the heap or stack), the CPU throws this exception.
It's possible to get a SEGV_ACCERR because of a stack overflow. Specifically, this happened to me on Android ARM64 with the following:
VeryLargeStruct s;
s = {}; // SEGV_ACCERR
It seems that the zero-initialization created a temporary that caused a stack overflow. This only happened with -O0; presumably the temporary was optimized away at higher optimization levels.
On Android ARM64, if stack.cpp contains:
struct VeryLargeStruct {
    int array[4096*4096];
};

int main() {
    struct VeryLargeStruct s;
    s = {};
}
and typing:
aarch64-linux-android26-clang++ -std=c++20 -g -DANDROID_STL=c++_static -static-libstdc++ stack.cpp -o stack
adb push stack /data/local/tmp
adb shell /data/local/tmp/stack
/data/tombstones/tombstone_01 contains a SEGV_MAPERR, not SEGV_ACCERR:
pid: 11363, tid: 11363, name: stack >>> /data/local/tmp/stack <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7ff2d02dd8
I get SEGV_ACCERR, when const.c contains:
int main() {
    char *str = "hello";
    str[0] = 'H';
}
Then /data/tombstones/tombstone_00 contains:
pid: 9844, tid: 9844, name: consts >>> /data/local/tmp/consts <<<
Signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x55d10674e8

C++ Creating a SIGSEGV for debug purposes

I am working on a lock-free shared variable class, and I want to be able to generate a SIGSEGV fault to see if my implementation works as I planned. I've tried creating a function that modifies a pointer and read it 100 times. I then call this function in both threads and have the threads run infinitely within my program. This doesn't generate the error I want. How should I go about doing this?
edit
I don't handle segfaults at all, but they are generated in my program if I remove locks. I want to use a lock-less design, therefore I created a shared variable class that uses CAS to remain lockless. Is there a way that I can have a piece of code that will generate segfaults, so that I can use my class to test that it fixes the problem?
#include <signal.h>
raise(SIGSEGV);
Will cause an appropriate signal to be raised.
malloc + mprotect + dereference pointer
This mprotect man page has an example.
Dereferencing a pointer to unallocated memory (at least on my system):
int *a;
*a = 0;

Printing stack trace from a signal handler

I need to print a stack trace from a signal handler of a 64-bit multi-threaded C++ application running on Linux. Although I found several code examples, none of them compiles. My blocking point is getting the caller's address (the point where the signal was generated) from the ucontext_t structure. All of the information I could find points to the EIP register as either ucontext.gregs[REG_EIP] or ucontext.eip. It looks like both of them are x86-specific. I need 64-bit-compliant code for both Intel and AMD CPUs. Can anybody help?
There is a glibc function backtrace. The man page lists an example of the call:
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 100

void myfunc3(void) {
    int j, nptrs;
    void *buffer[SIZE];
    char **strings;

    nptrs = backtrace(buffer, SIZE);
    printf("backtrace() returned %d addresses\n", nptrs);

    /* The call backtrace_symbols_fd(buffer, nptrs, STDOUT_FILENO)
       would produce similar output to the following: */
    strings = backtrace_symbols(buffer, nptrs);
    if (strings == NULL) {
        perror("backtrace_symbols");
        exit(EXIT_FAILURE);
    }
    for (j = 0; j < nptrs; j++)
        printf("%s\n", strings[j]);
    free(strings);
}
See the man page for more context.
It's difficult to tell if this really is guaranteed to work from a signal handler, since POSIX lists only a few reentrant functions that are guaranteed to work. Remember: a signal handler may be called while the rest of your process is right in the middle of a malloc call.
My guess is, that this usually works, but it may fail from time to time. For debugging this may be good enough.
The usual way of getting a stack trace is to take the address of a local variable, then add some magic number to it, depending on how the compiler generates code (which may depend on the optimization options used to compile the code), and work back from there. All very system dependent, but doable if you know what you're doing.
Whether this works in a signal handler is another question. I don't know about the platform you describe, but a lot of systems install a separate stack for signal handlers, with no link back to the interrupted stack in user-accessible memory.

g++ misunderstands c++ semantics

There are two possible explanations for the problem: either I don't understand the C++ semantics, or g++ doesn't.
I am programming a simple network game now. I have been building a library the game uses to communicate over the network. There is a class designated to handle the connection between the apps. Another class implements server functionality, so it possesses a method accept(). The method is to return a Connection object.
There are a few ways to return it. I have tried these three:
Connection accept() {
...
return Connection(...);
}
Connection* accept() {
...
return new Connection(...);
}
Connection& accept() {
...
Connection *temp = new Connection(...);
return *temp;
}
All three were accepted by g++. The problem is that the third is somewhat faulty. When you use internal information of the object of type Connection, you will fail. I don't know what is wrong, because all fields within the object look initialized. My problem is that when I use any function from the protocol buffers library, my program is terminated by a segmentation fault. The function below fails every time it calls the protobuf library.
Annoucement Connection::receive() throw(EmptySocket) {
    if (raw_input->GetErrno() != 0) throw EmptySocket();

    CodedInputStream coded_input(raw_input);
    google::protobuf::uint32 n;
    coded_input.ReadVarint32(&n);

    char *b;
    int m;
    coded_input.GetDirectBufferPointer((const void**)&b, &m);

    Annoucement ann;
    ann.ParseFromArray(b, n);
    coded_input.Skip(n);

    return ann;
}
I get this every time:
Program received signal SIGSEGV,
Segmentation fault. 0x08062106 in
google::protobuf::io::FileInputStream::CopyingFileInputStream::GetErrno
(this=0x20) at
/usr/include/google/protobuf/io/zero_copy_stream_impl.h:104
When I changed accept() to the second version, it finally worked (the first is good too, but I modified the design in the meanwhile).
Have you come across any problem similar to this one? Why is the third version of accept() wrong? How should I debug the program to find such a horrible bug (I thought protobuf needed some fix, whereas the problem was not there)?
First, returning by reference something allocated on the heap is a sure recipe for a memory leak so I would never suggest actually doing that.
The second case can still result in a leak unless the ownership semantics are very well specified. Have you considered using a smart pointer instead of a raw pointer?
As for why it doesn't work, it probably has to do with ownership semantics and not because you're returning by reference, but I can't see a problem in the posted code.
"How should I debug the program to find such a horrible bug?"
If you are on Linux try running under valgrind - that should pick up any memory scribbling going on.
You overlooked raw_input = 0x20 (the "this=0x20" in the debugger output), which is obviously an invalid pointer. It is right there in the helpful message you got in the debugger after the segfault.
For general problems of this type, learn to use Valgrind’s memcheck, which gives you messages about where your program abused memory.
Meanwhile I suggest you make sure you understand pass by value vs pass by reference (both pointer and C++ reference) and know when constructors, copy constructors and destructors are called.