Corrupt stack problem in C/C++ program

I am running a C/C++ program on Linux servers to serve videos. The program's core functionality (call it Plugin) is to convert videos, and we fork a separate Plugin process for each video request. But I am having a weird problem: sometimes the server load average gets unexpectedly high. What I see from the top command at that point is that some processes have been running for a long time and are consuming huge amounts of CPU.
When I debug one of these running processes with gdb and backtrace the stack, what I find is a corrupt stack: "Previous frame inner to this frame (corrupt stack?)". I have searched the net and found that this occurs if the program gets a segmentation fault.
But as far as I know, if a program gets a segmentation fault it should crash and exit at that point. Surprisingly, this program is still running after the segmentation fault.
What can be the causes of this? I know there must be some big problems in the program, but I just can't work out where to start fixing them. It would be great if any of you could shed some light on this.
Thanks in advance

Attaching a debugger changes the behavior of the process, so you most probably won't get reliable investigation results. A corrupted-stack message from the debugger can also simply mean that the particular debugger does not understand the debug info in the binary.
I would recommend running pstack on the problematic process several times in succession (this is known as "Monte Carlo performance profiling") and also attaching strace or truss to it to see which system calls the process is making while it is consuming CPU.
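For illustration, one round of this might look like the following, assuming the problematic process has PID 1234 (on some distributions pstack is installed as gstack):

pstack 1234         # repeat several times; frames that keep recurring are the hot spots
strace -c -p 1234   # attach and count system calls; press Ctrl-C for the summary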

Run your program under Valgrind and fix any invalid memory writes that it finds.
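For example, an out-of-bounds heap write like this hypothetical one shows up in Memcheck's output as an "Invalid write", with a stack trace pointing at the offending line:

#include <cstdlib>

int main() {
    int *buf = static_cast<int *>(std::malloc(10 * sizeof(int)));
    buf[10] = 1;       // invalid write: one element past the end of the allocation
    std::free(buf);
    return 0;
}

Run the program as valgrind ./a.out and work through everything Memcheck reports before chasing the stack corruption any further.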

Certain optimisations, such as frame pointer omission, can make it harder for the debugger to understand the stack; recompiling with -fno-omit-frame-pointer (on GCC/Clang) often restores usable backtraces.

If you have the code, compile the program in debug and run Valgrind on it.
If you don't have the code, contact the author/provider of the program.
The corrupt stack message simply means the code is doing something weird with memory. It does not necessarily mean the program has had a segmentation fault. Also, the program can keep running after a segmentation fault if it chooses to handle the SIGSEGV signal (a sketch of this follows below).
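To illustrate that last point: a process can install a SIGSEGV handler and survive, and a handler that simply returns will usually cause the faulting instruction to be re-executed, which yields exactly the kind of CPU-burning, never-exiting process described above. A minimal sketch (hypothetical code, not the actual Plugin):

#include <csignal>

static void on_segv(int) {
    // Returning from a SIGSEGV handler restarts the faulting
    // instruction, so the fault fires again immediately: the
    // process never dies, it just spins at 100% CPU.
}

int main() {
    std::signal(SIGSEGV, on_segv);
    int *p = nullptr;
    *p = 1;            // faults forever instead of crashing
    return 0;
}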
If by forking you mean that you have some process which spawns and runs other, smaller processes, just monitor for such spikes and restart the offending process. This assumes you have no access to fix the program itself.

There could be some interesting manipulation of the stack taking place through assembly code, such as true tail-recursion optimization, self-modifying code, non-returning functions, etc., which may leave the debugger unable to properly backtrace the stack and cause it to report a corrupt stack. That doesn't necessarily mean memory is corrupted, but something non-traditional is definitely happening under the hood.

Related

Determining memory consumption per function call in C++ if program is killed due to OOM

I am running some calculations in a program written in C++. For some input parameters I run out of memory and my program is killed. I tried to pinpoint the problematic part of my program using gdb (which usually works when I have other issues, such as accessing memory at the wrong location), but my program is killed by the OS (Linux) when it runs out of memory, which makes it impossible to use backtrace on it.
Using valgrind and massif does not help either, as there is no massif log if the program is killed for being out of memory.
Therefore, are there other approaches I could use? One approach I can see is to simply print the current function name each time a function is called (and then check the log to see which function was entered last before the memory ran out), but that would add a lot of code I don't want to write by hand. Are there "automated" methods that do this for me?
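One "automated" method of this kind, assuming you build with GCC or Clang, is the -finstrument-functions flag, which makes the compiler call user-supplied hooks on every function entry and exit. A minimal sketch:

#include <cstdio>

// Build with: g++ -g -finstrument-functions prog.cpp
// The hooks themselves must not be instrumented, hence the attribute.
extern "C" {

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void * /*call_site*/) {
    // stderr is unbuffered, so the last entry survives the OOM kill;
    // resolve addresses afterwards with: addr2line -f -e prog <addr>
    std::fprintf(stderr, "enter %p\n", this_fn);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void * /*call_site*/) {
    std::fprintf(stderr, "exit  %p\n", this_fn);
}

} // extern "C"

The last "enter" line in the log before the process disappears tells you which function was running when the OOM killer struck.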

Analysing a Crash

I'm writing an image processing tool in C++. The tool works fine, but suddenly it just stops, without throwing an exception or crashing in any way that would tell me which line or which area of the code caused the problem.
How can I find the faulty code in that situation?
There are a few things you can do.
First of all, though, it sounds more like an infinite loop, a deadlock, or like you're using all of your system resources so that everything slows down and takes a very long (possibly unbounded) time. If that's the case, you'll have to find it with debugging.
Things that you can try - not necessarily in this order:
Look for shared variables you're using. Is there a chance that you have a deadlock between threads and mutexes? Think about it and try to fix it.
Check for use of uninitialized variables/pointers. Sometimes (rarely) you can get very strange behavior when you invoke undefined behavior. I'm not a Windows C++ dev (I work on Linux), but it wouldn't be the first time I saw a lockup caused by a segmentation fault.
Add error output (std::cerr/stderr) to your processing logic so you can see how far it gets before it stops. After that, set a conditional breakpoint around that point so you can watch it happen in the debugger and see the state of your variables and what might be wrong.
Do a stack trace so you can see what calls have been hit most recently. This will at least tell you the last function chain that executed.
Have you used stack tracing before? Look up the MSDN documentation on how to use it; there are different kinds of stack traces depending on your application. If you're on Linux instead, see the sketch below.
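On Linux the quick equivalent of those MSDN facilities is glibc's backtrace(). A minimal sketch, assuming glibc and linking with -rdynamic so the symbol names resolve:

#include <cstdio>
#include <cstdlib>
#include <execinfo.h>

// Print the current call chain to stderr; call this from the code
// path you suspect is hanging or crashing.
void print_trace() {
    void *frames[64];
    int n = backtrace(frames, 64);                // raw return addresses
    char **names = backtrace_symbols(frames, n);  // best-effort symbolication
    for (int i = 0; i < n; ++i)
        std::fprintf(stderr, "  %s\n", names[i]);
    std::free(names);
}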
You could
Use a debugger
Add logging code to your program
Cut sections of code from your program until it starts working
Start with the first.

Recover from crash with a core dump

A C++ program crashed on FreeBSD 6.2 and the OS was kind enough to create a core dump. Is it possible to amputate some stack frames, reset the instruction pointer, and restart the process in gdb, and if so, how?
Is it possible to amputate some stack frames, reset the instruction pointer and restart the process in gdb?
I assume you mean: change the process state, and set it to start executing again (as if it never crashed in the first place).
No. For one thing, how do you propose GDB (if it magically had this capability) would handle your file descriptors, which the kernel automatically closed when your process died?
Yes, gdb can debug core dumps just as well as running programs. Assuming that a.out is the name of your program's executable and that a.core is the name of your core file, invoke gdb like so:
gdb a.out a.core
And then you can debug like normal, except you cannot continue execution in any way (even if you could, the program would just crash again). You can examine the stack trace, registers, memory, etc.
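A typical post-mortem session then looks something like this (commands for illustration, output omitted):

(gdb) bt                  # backtrace at the point of the crash
(gdb) frame 2             # select a frame of interest
(gdb) info locals         # local variables in that frame
(gdb) info registers      # CPU registers at the moment of the crash
(gdb) x/16xw $sp          # raw memory around the stack pointer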
Possible duplicate of this: Best practices for recovering from a segmentation fault
Summary: it is possible but not recommended. The way to do it is to use setjmp() and longjmp() from a signal handler. (Please look at the complete source code example in the duplicate post.)
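A minimal sketch of that pattern, assuming POSIX (note that jumping out of a SIGSEGV handler is only well-defined in narrow circumstances, and program state may be inconsistent afterwards):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf recovery_point;

static void on_segv(int) {
    siglongjmp(recovery_point, 1);   // jump back past the faulting code
}

int main() {
    signal(SIGSEGV, on_segv);
    if (sigsetjmp(recovery_point, 1) == 0) {
        int *p = nullptr;
        *p = 42;                     // faults; the handler jumps back
    } else {
        puts("recovered from SIGSEGV (state may be inconsistent)");
    }
    return 0;
}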

What is the difference between a Valgrind run and a regular C++ run

I'm trying to identify a bug I have in my code where I get a segmentation fault while trying to assign a value to a pointer obtained from a vector (it is described better in the link). When I run the code under Valgrind I don't get the segfault.
What does Valgrind do differently? I think I need to consider the differences in memory management between a Valgrind session and a regular C++ run, but I don't really know where to start.
From the Valgrind FAQ:
4.4. My program crashes normally, but doesn't under Valgrind, or vice
versa. What's happening?
When a program runs under Valgrind, its environment is slightly
different to when it runs natively. For example, the memory layout is
different, and the way that threads are scheduled is different.
Most of the time this doesn't make any difference, but it can,
particularly if your program is buggy. For example, if your program
crashes because it erroneously accesses memory that is unaddressable,
it's possible that this memory will not be unaddressable when run
under Valgrind. Alternatively, if your program has data races, these
may not manifest under Valgrind.
There isn't anything you can do to change this, it's just the nature
of the way Valgrind works that it cannot exactly replicate a native
execution environment. In the case where your program crashes due to a
memory error when run natively but not when run under Valgrind, in
most cases Memcheck should identify the bad memory operation.
So there is nothing you can do about it. And actually, you need not worry that your program doesn't crash under Valgrind. You should read the error messages it reports and fix them. Start with the Invalid read/Invalid write errors: they almost always indicate a bug in the code. In this particular case you can also run your code in an infinite loop from a simple bash script until it produces the error message. Most likely you are working with invalid iterators, which is undefined behaviour in C++ (a sketch follows below).
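For instance, a pointer or iterator into a std::vector is invalidated when the vector reallocates. A minimal sketch of that kind of bug (hypothetical code, not the asker's):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    int *p = &v[0];     // points into the vector's current buffer
    v.push_back(4);     // may reallocate and free the old buffer
    *p = 99;            // undefined behaviour: may segfault natively yet
                        // appear to work under Valgrind, but Memcheck
                        // reports an invalid write either way
    std::printf("%d\n", v[0]);
    return 0;
}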
Maybe the issue is timing-dependent. When you run your code under Valgrind it runs more slowly, because Valgrind instruments and diagnoses your code at run time.
Valgrind keeps track of your program's memory usage; this is how it tells you about leaks. To achieve this it hijacks malloc and friends and substitutes its own versions. This means that when you run your code normally you probably read/write some data you accidentally freed, causing the segfault, whereas Valgrind may keep that memory around to see whether it is truly lost, so by (un)luck the access still lands in valid memory. Just a guess.
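In other words, a use-after-free like this hypothetical one might crash natively but sail through under Valgrind's replacement allocator, while still being flagged as an invalid read:

#include <cstdio>
#include <cstdlib>

int main() {
    int *p = static_cast<int *>(std::malloc(sizeof(int)));
    *p = 7;
    std::free(p);
    std::printf("%d\n", *p);   // use-after-free: the access may still land
                               // in mapped memory under Valgrind
    return 0;
}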
Valgrind runs your program on a virtual CPU, that is, it executes every instruction in software (apart from kernel calls). Multi-threaded programs get serialized, i.e. only one thread of execution makes progress at a time.
If your application is multi-threaded, then when it is executed under Valgrind, race conditions and missing synchronization may be masked by this thread serialization, so the effects of such bugs are never observed.

What causes a SIGTRAP in a debug session

In my C++ program I'm using a library which "sends?" a SIGTRAP on certain operations when I'm debugging it (using gdb as the debugger). I can then choose whether I wish to continue or stop the program. If I choose to continue, the program works as expected, but setting custom breakpoints after a SIGTRAP has been caught causes the debugger/program to crash.
So here are my questions:
What causes such a SIGTRAP? Is it a leftover line of code that can be removed, or is it raised by the debugger when it "finds something it doesn't like"?
Is a SIGTRAP, generally speaking, a bad thing? And if so, why does the program run flawlessly when I compile a release version but not a debug version?
What does a SIGTRAP indicate?
This is a more general version of a question I posted yesterday: Boost Filesystem: recursive_directory_iterator constructor causes SIGTRAPS and debug problems.
I think that question was far too specific, and I don't want you to solve my problem but to help me (and hopefully others) understand the background.
Thanks a lot.
With processors that support instruction breakpoints or data watchpoints, the debugger will ask the CPU to watch for instruction accesses to a specific address, or data reads/writes to a specific address, and then run full-speed.
When the processor detects the event, it traps into the kernel, and the kernel sends SIGTRAP to the process being debugged. Normally SIGTRAP would kill the process, but because it is being debugged, the debugger is notified of the signal and handles it, typically by letting you inspect the state of the process before continuing execution.
With processors that don't support hardware breakpoints or watchpoints, the entire debugging environment is probably done through code interpretation and memory emulation, which is immensely slower. (I imagine clever tricks could be done by setting page-table flags to forbid reading or writing, whichever needs to be trapped, and letting the kernel fix up the page tables, signal the debugger, and then restrict the page flags again. This could probably support a near-arbitrary number of watchpoints and breakpoints, and run only marginally slower in the cases where the watched or trapped addresses aren't frequently accessed.)
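As an aside, a SIGTRAP need not come from the debugger's own breakpoints at all: debug builds of libraries sometimes embed a breakpoint deliberately, e.g. in assertion machinery, which would explain a trap that appears only in debug builds. Raising one yourself is a one-liner, assuming POSIX:

#include <csignal>

int main() {
    std::raise(SIGTRAP);   // behaves like a hard-coded breakpoint under gdb;
                           // with no debugger attached, the process dies here
    return 0;
}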
The question I placed into the comment field looks apropos here, only because Windows isn't actually sending a SIGTRAP, but rather signaling a breakpoint in its own native way. I assume when you're debugging programs, that debug versions of system libraries are used, and ensure that memory accesses appear to make sense. You might have a bug in your program that is papered-over at runtime, but may in fact be causing further problems elsewhere.
I haven't done development on Windows, but perhaps you could get further details by looking through your Windows Event Log?
While working in Eclipse with the MinGW/gcc toolchain, I noticed it reacting very badly to vectors in my code, resulting in an unclear SIGTRAP signal and sometimes even abnormal debugger behavior (e.g. jumping back up in the code and apparently continuing execution in reverse order!).
I copied the files from my project into Visual Studio and resolved the issues there, then copied the changes back to Eclipse and voilà, it worked like a charm. The causes were things like confusing the reserve() and resize() functions when initializing vectors, or trying to access elements out of the bounds of the vector (a sketch of the distinction follows below).
Hope this will help someone else.
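The reserve()/resize() distinction mentioned above is roughly this (a minimal sketch):

#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(10);   // allocates capacity only; v.size() is still 0
    // v[5] = 1;     // would be out of bounds (size is 0): undefined behaviour

    std::vector<int> w;
    w.resize(10);    // creates 10 value-initialized elements; w.size() is 10
    w[5] = 1;        // fine: index 5 < w.size()
    // w.at(20);     // at() would throw std::out_of_range instead of silent UB
    return 0;
}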
I received a SIGTRAP from my debugger and found out that the cause was a missing return value:

std::string getName() { printf("Name!"); }   // no return statement: flowing off the end of a non-void function is undefined behaviour
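The fix is presumably just to return the value (and compiling with -Wall would have warned about the missing return):

#include <cstdio>
#include <string>

std::string getName() {
    std::printf("Name!");
    return "Name!";   // the return that was missing
}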