GCC: Random builds causing Segmentation Fault during execution

I'm currently struggling to understand a behavior that occurs randomly after a software build. The software consists of multiple modules, and some builds exit with a segmentation fault during execution.
I've identified two different points during execution where this happens:
1) The software exits with a seg fault while executing a certain task.
2) The software exits with a seg fault after completion of the task.
In some cases, no seg fault occurs at all. The problem is that a debug build only shows me the cause of the second case (a call into a third-party library), and I'm not able to identify why this is happening.
It's a 32 bit build using gcc (SUSE Linux) 7.5.0.

You could use valgrind to check your program for memory issues.

Related

Valgrind reporting invalid read on one system but not another

I need to run a rather large software package on a new machine for work. The application is written in C and C++ and I am running on CentOS 6.5.
The program builds fine, but segfaults when I go to run it. Using valgrind, I see the following error reported at the location of the segfault:
==23843== Invalid read of size 4
[stack trace here]
==23843== Address 0x642e7464 is not stack'd, malloc'd or (recently) free'd
So for some reason we are reading from memory we aren't supposed to and are invoking undefined behaviour. When I tar up my source files, take them to another CentOS 6.5 machine (w/ same kernel) and compile them (with same makefiles and same GCC version) the program seems to run fine.
I ran valgrind on that machine as well and expected to see the invalid read again. My thought was that the invalid read would always be present, yet because the behaviour is undefined things just happened to work correctly on one machine and not on the other.
What I found, however, was that valgrind reports no read errors on the second machine. How could this be possible?

Valgrind makes the running environment more deterministic, but it does not eliminate all randomness. The other machine may have slightly different versions of libraries installed, or something external the program uses (files, network, ...) may differ, so the code execution does not have to be exactly the same.
You should look at the stack trace and analyze the code where the error happens. If it is not obvious from the stack trace alone, you can start valgrind with the --vgdb-error=0 parameter (optionally together with --vgdb=full). It will pause execution when an error is reported and print instructions on how to attach gdb. Or you can just run the program under a debugger directly, since you wrote that it crashes even without valgrind.

Different library versions are the best guess, judging from the sparse information you gave. Things to try:
1) Bring both machines up to date via package manager and try again
2) Run ldd [binary] to see all libraries used by the program in question. Run something like md5sum on them on both machines to find out if there are differences.
In my experience, valgrind is poor at detecting invalid memory accesses on the stack, so this might be a hidden root cause. If all else fails, you might want to try clang with AddressSanitizer (-fsanitize=address). It might find things valgrind doesn't catch, and vice versa.
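As a complement to step 2, the shared objects a process actually loaded can also be listed from inside the program itself. The following is only a minimal sketch, assuming Linux with glibc (where dl_iterate_phdr is available); the names collect_object_name and loaded_objects are hypothetical helpers, not something from the original question.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (glibc/Linux-specific)

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per object mapped into the process.
// `data` points at the std::vector<std::string> that collects the names.
static int collect_object_name(dl_phdr_info* info, std::size_t /*size*/, void* data) {
    assert(data != nullptr);
    auto* names = static_cast<std::vector<std::string>*>(data);
    const char* name = info->dlpi_name;
    // Some entries (typically the main executable) are reported without a name.
    names->push_back((name != nullptr && name[0] != '\0') ? name : "[unnamed object]");
    return 0;  // returning 0 continues the iteration
}

// Hypothetical helper: names of every object currently loaded in this process.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object_name, &names);
    return names;
}
```

Printing the returned paths on both machines and comparing them (or their checksums) may reveal a library that differs between the two.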

This could be caused by using different versions of Valgrind.
Some common false-positive errors get removed in newer versions, which would explain why one machine complains about it (older version) and another one doesn't (newer version).

Application segmentation fault, only when compiling on Windows with MinGW

I'm trying to compile one of my games on Windows, but unfortunately, no matter what, I'm getting this segmentation fault every time I run the program.
Compilation is successful, and without any warning.
Program received signal SIGSEGV, Segmentation fault.
__chkstk_ms () at ../../../../../src/gcc-4.8.1/libgcc/config/i386/cygwin.S:172
172 ../../../../../src/gcc-4.8.1/libgcc/config/i386/cygwin.S: No such file or directory.
I've tried:
Compiling on a Windows x86 machine
Compiling on a Windows x64 machine
nuwen.net's MinGW distro
TDM MinGW 4.8.1 SJLJ
MinGW builds x86 SJLJ
MinGW builds x64 SJLJ
MinGW builds x86 DW2
I've built all dependencies from source multiple times, tried linking both statically and dynamically.
Debugging doesn't help either - GDB gives me that error message just upon entering main(). I've used -g3 and -O0 flags.
How can I figure out what's happening?

On Windows, the default stack size is smaller than on Linux. __chkstk_ms appears to be a function that crashes if you overflow your stack.
You may try to figure out where in your code you are creating huge stack variables or doing very deep recursion, and fix that.
Alternately, you may be able to add a compile flag to increase the stack size. See http://trac.sagemath.org/ticket/13960.
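To illustrate the first suggestion, here is a minimal sketch of moving a large buffer off the stack. The size kBufferBytes and the function sum_scratch_buffer are hypothetical; the real offender in the game would have to be found by looking for large local arrays or deep recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical size: 16 MiB is larger than the default main-thread stack
// on common Windows and Linux setups.
constexpr std::size_t kBufferBytes = 16u * 1024u * 1024u;

// Risky pattern, left as a comment on purpose because it may overflow the stack:
//     std::uint8_t scratch[kBufferBytes];   // local array, lives on the stack

// Safer pattern: std::vector keeps its elements on the heap, so only a small
// amount of bookkeeping occupies stack space.
std::uint64_t sum_scratch_buffer() {
    std::vector<std::uint8_t> scratch(kBufferBytes, 1);
    assert(scratch.size() == kBufferBytes);
    return std::accumulate(scratch.begin(), scratch.end(), std::uint64_t{0});
}
```

The same idea applies to large arrays of structs: replacing a local array with a std::vector (or a std::unique_ptr to an array) usually needs no other code changes.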

Try to increase the stack size. Don't ask me how, I don't know.
The failing call (__chkstk_ms) looks like an internal routine which checks whether there is enough stack space for the function about to be executed.

c++ program terminating when one thread has access violation - how to catch this in linux - for win32 I get stacktraces in vs2010

c++ program terminated with no exceptions or stacktrace
I have a multi-threaded application.
If one of my threads has an access violation from reading out of bounds on an array (or any seg fault condition), my entire application immediately terminates.
If this happens on my Windows counterpart using Visual Studio, I get a nice stack trace showing where the error was and what the issue was.
I desperately need this type of debugging environment to be able to succeed at my project. I have too many threads and too many developers running different parts of the project to let one person's improperly handled exception destroy the entire project.
I am running Fedora Core 14.
I am compiling with gcc 4.5.1.
gdb is Fedora 7.2-16.fc14.
My IDE is Eclipse Juno, and I am using the CDT builder.
My toolchain is the Cross GCC and my builder is the CDT Internal Builder.
Is there ANY setting for gdb, gcc or Eclipse that will help me detect this type of situation?

That's what's supposed to happen. Under Unix, you get a full core dump (which you can examine in the debugger), provided you've authorized them. (See ulimit -c; traditionally, they were authorized by default, but Linux seems to have changed this.)
Of course, to get any useful information from the core dump, you'll need to have compiled the code with symbol information, and not stripped it later. (On the other hand, you can copy the core dump from your user's machine onto your development machine, and see what happened there.)
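The limit can also be raised from inside the process instead of from the shell. This is only a sketch, assuming a POSIX system that provides getrlimit and setrlimit; the function names raise_core_limit and core_dumps_allowed are hypothetical.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE (POSIX)

#include <cassert>

// Hypothetical helper: raise the soft core-file limit to the hard limit,
// roughly what `ulimit -c` does from the shell.
// Returns true if the limit could be read and updated.
bool raise_core_limit() {
    struct rlimit limit{};
    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return false;
    }
    assert(limit.rlim_cur <= limit.rlim_max);  // the soft limit never exceeds the hard limit
    limit.rlim_cur = limit.rlim_max;           // an unprivileged process cannot go above the hard limit
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}

// True when the current soft limit permits a core file of non-zero size.
bool core_dumps_allowed() {
    struct rlimit limit{};
    return getrlimit(RLIMIT_CORE, &limit) == 0 && limit.rlim_cur != 0;
}
```

Calling raise_core_limit() early in main() would apply to all threads, since the limit is per process. If the hard limit itself is 0, core_dumps_allowed() still returns false and the limit has to be changed outside the program.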

You're definitely looking for core dumps, as James Kanze wrote.
I would just add that core dumps will show you where the program crashed, which is not necessarily the same place as where the problem (memory corruption etc.) occurred. And of course some out-of-bounds reads/writes may not exhibit themselves by crashing immediately.
You can narrow the search by enabling checks for incorrect memory allocations/deallocations in glibc. The simplest way is to set the environment variable MALLOC_CHECK_. Set it to 2 and glibc will check for heap corruption on every memory allocation/deallocation and abort the program when it finds any problem (producing a core dump, if enabled). This often helps to get closer to the real problem.
http://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html

What could be the reason that gdb running my program and just bash running my program show different outputs?

I debug my C++ program with gdb on Linux. I compile with -g, and I do see a lot of information in the debugger, but it keeps telling me that my program exits normally and doesn't show any errors.
When I just run my program, though, it doesn't finish and shows that not everything is all right (one assertion in malloc.c failed).
I have also had cases where gdb and a plain run of the program showed different error messages. The errors are always related to wrong pointers or memory accesses.
The same actually goes for valgrind. Is it possible that valgrind cannot be used here, in particular if there are different processes and a shared library involved?
Running it with valgrind via: valgrind --trace-children=yes prog1 gives me no errors (which cannot be true). If I enable the suppressed errors via: valgrind -v --trace-children=yes prog1, I get warnings about redirection conflicts (which don't seem like errors either).

The problem with buggy programs is that their behavior is undefined. They work sometimes, and crash at other times unpredictably.
Both Valgrind and GDB affect the program timing, and may hide race conditions (which could happen for both multithreaded and multiprocess programs).
In addition, GDB disables address-space randomization, making addresses in the program repeatable from run to run. That's usually what you want while debugging, but your crash may only manifest itself for particular random layout of shared libraries, and that layout may never happen under GDB.
Your best bet is to enable generation of core dumps (ulimit -c unlimited), run the program outside GDB and have it abort (failing assert calls abort). Once you have a core, debug it with GDB: gdb /path/to/your/executable core.
For the problems you've described, Valgrind is usually a better tool. If multiple processes are involved, you'll want to run valgrind with --trace-children=yes flag.

This could be a multithreading problem and gdb slows it down enough that your threads don't conflict. Also maybe the program is being run optimized? Does valgrind say everything is ok?

I would enable all warnings in the compilation with -Wall, so that gcc will warn you about uninitialized variables, and then run it inside valgrind. One of them should tell you the problem.

c++ what is a good debugger for segmentation errors?

Does anyone know a good debugger for C++ segmentation errors on the Linux environment? It would be good enough if the debugger could trace which function is causing the error.

Also consider some techniques that do not require any code changes from you:
Run your app via the valgrind memcheck tool. It can catch errors when you access a wrong address (e.g. a freed or uninitialized pointer).
If you use STL/Boost extensively, consider compiling with -D_GLIBCXX_DEBUG and -D_GLIBCXX_DEBUG_PEDANTIC. This can catch errors such as using an invalidated iterator, accessing an incorrect index in a vector, etc.
tcmalloc (from Google perftools). When linking with its debug-enabled version, it may find memory-related problems.
Even more ...
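As a small illustration of the kind of error the libstdc++ debug mode targets, here is a sketch of bounds-checked vector access. The helper checked_get is hypothetical. std::vector::at() throws std::out_of_range in every build mode, whereas operator[] is only checked when the debug mode is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies values[index] into `out` and returns true,
// or returns false (leaving `out` untouched) when the index is out of range.
bool checked_get(const std::vector<int>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);  // at() throws std::out_of_range for a bad index
    } catch (const std::out_of_range&) {
        return false;
    }
    assert(index < values.size());  // at() succeeded, so the index is in range
    return true;
}
```

With plain values[index] and no debug mode, the same out-of-range access would be undefined behavior and might crash much later, or not at all.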

GDB! What else is available on Linux?
Check out a GDB tutorial to get started; a good one is nice, concise and easy to understand.

GDB is indeed about the only choice. There are some GUIs, but they are almost all wrappers for gdb. Finding a segfault is easy. Make sure you compile with -g -O0, then start gdb with your program as the argument.
In gdb, type run to start your program. gdb will stop as soon as the program hits a segfault and report which line it was on. If you need a full backtrace, just type bt. To get out of gdb, enter quit.
BTW, gdb has built-in help; just type help.
