Global application slowdown after freeing memory while using Valgrind - C++

I'm having an issue with my application that I only observe while I use Valgrind.
My program involves a large simulation. When I unload the simulation portion of the program while monitoring for errors with Valgrind, the application suffers a permanent slowdown. I would have expected the opposite, as unloading basically leaves my application with very little to do. Valgrind reports no errors. The slowdown does not occur (or is not observable) when I don't use Valgrind.
I have tried benchmarking various portions of my application using timers, and they all seem to slow down fairly evenly across the board. My application also contains multiple asynchronous threads, which all slow down. Processor usage does not seem to increase when viewed through the system monitor.
I'll note that I'm using OpenGL with the fglrx drivers, which are known to have some issues with Valgrind.
Should I be concerned about this even though it only occurs with Valgrind? Is this slowdown likely caused by freeing a large amount of data while using Valgrind, or is it more likely indicative of a serious bug in my code?
Basically, I am trying to ascertain whether this is entirely dependent on Valgrind usage, or whether Valgrind is amplifying the consequences of a bug in my code that is otherwise symptom-free (but may cause me problems later).

Related

Software crash after steep rise of process' working set memory

I ask this question because we're really stuck finding the cause of a software crash. I know that questions like "Why does the software crash?" are not appreciated, but we really don't know how to find the problem.
We are currently running a long-term test of our software. To find potential memory leaks, we used the Windows Performance Monitor to track several memory metrics, such as private bytes, working set, and virtual bytes.
The software ran for quite a long time (about 30 hours) without any problems. It does the same thing the whole time: reading an image from the hard drive, doing some inspection, and showing some results.
Then it suddenly crashes. Inspecting the memory metrics in Performance Monitor, we saw a strange, steep rise in the working set bytes graph at 10:17 AM. We have encountered this several times, and according to the dump files the exception code is always 0xc0000005: "the thread tried to read from or write to a virtual address for which it does not have the appropriate access". However, it appears at different positions, including places where no pointers are used.
Does anyone know what could cause such a steep rise of the working set, and why it could cause a software crash? How can we find out whether our software has a bug when the crash occurs at a different position every time?
The application is written in C++ and runs on a Windows 7 32-bit PC.
It's actually impossible to know from the information you have provided, but I would suggest that you have some memory corruption (hence the access violation). It could be a buffer-overflow issue; for example, a string is missing its terminating null character, so something keeps being appended indefinitely.
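Purely for illustration, here is a minimal sketch of that kind of missing-terminator bug; the buffer names are made up and not taken from the question:

#include <cstdio>
#include <cstring>

// A buffer copied without its terminating null: any later operation that scans
// "until the terminator" reads (or writes) past the end of the buffer.
int main()
{
    const char src[] = "inspection_result";   // 17 characters plus '\0'
    char dst[32];
    std::memcpy(dst, src, std::strlen(src));  // copies 17 bytes, but NOT the '\0'
    // strlen(dst) now scans past the valid data: undefined behaviour, and code that
    // appends "after the terminator" can keep writing past the buffer indefinitely.
    std::printf("%zu\n", std::strlen(dst));
    return 0;
}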
The recommended next step is to download the Debugging Tools for Windows suite. Set up WinDbg with your correct symbol files and analyse the stack trace to find the general area of the crash. Depending on the cause of the memory corruption this will be more or less useful: you could have corrupted the memory a long time before your crash occurs.
Ideally also run a static analysis tool on the code.
Given the information you have now, there is little chance of getting an answer. You need more information, more specifically:
Gather more intelligence (is there anything specific about the files that cause the crash? What about the last-but-one file?)
Insert more tracing and logging (as much as you can without making it 2x slower). At the very least you'll see where it crashes, and then you will be able to insert more tracing/logging around that place.
As you're on Windows, consider handling 0xc0000005 via _set_se_translator, converting it into a C++ exception, and adding even more logging as that exception unwinds (a sketch of the translator follows below).
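A minimal sketch of that translator, assuming MSVC and the /EHa exception model; the function name and message format are illustrative, not prescribed by the API:

#include <windows.h>
#include <eh.h>
#include <cstdio>
#include <stdexcept>

// Translate SEH exceptions (such as 0xC0000005) into C++ exceptions so they can be
// logged as the stack unwinds. Requires compiling with /EHa, and the translator must
// be installed in every thread that should use it.
void SehTranslator(unsigned int code, EXCEPTION_POINTERS* /*info*/)
{
    char msg[64];
    std::snprintf(msg, sizeof(msg), "SEH exception 0x%08X", code);
    throw std::runtime_error(msg);
}

int main()
{
    _set_se_translator(SehTranslator);
    try
    {
        int* volatile p = nullptr;
        *p = 42;                                 // access violation, now catchable in C++
    }
    catch (const std::exception& e)
    {
        std::printf("caught: %s\n", e.what());   // log it, then decide whether to continue
    }
    return 0;
}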
There is no silver bullet for this kind of problem; it comes down to gathering more information and figuring it out.
P.S. As an unlikely shot: I've seen similar things caused by a bug in the MS heap. If you're not using the Low Fragmentation Heap (LFH) yet (not sure, it might be the default now), there is a 1% chance that changing your default heap to LFH will help.

How to hunt down a memory leak Valgrind says doesn't exist?

I have a program that accepts data from a socket, does some quality control and assorted other conditioning to it, then writes it out to a named pipe. I ran Valgrind on it and fixed all the memory leaks that originally existed. I then created a 'demo' environment on a system where I had 32 instances of this program running, each being fed unique data and each outputting to its own pipe. We tested it and everything looked fine. Then I tried stress testing it by boosting the rate at which data is sent in to an absurd level, and things looked good at first... but my programs kept consuming more and more memory until I had no resources left.
I turned to Valgrind and ran the exact same setup, except with each program running inside Valgrind with --leak-check=full. A few odd things happened. First, the memory did leak, but only to the point where each program had consumed 0.9% of my memory (previously the largest memory hog had a full 6% of my memory). With Valgrind running, the CPU cost of the programs shot up and I was now at 100% CPU with a huge load average, so it's possible the lack of available CPU caused the programs to all run slowly enough that the leak took too long to manifest. When I stopped these programs, Valgrind showed no direct memory leaks. It showed some possible memory leaks, but I checked them and I don't think any of them represent real leaks; besides, the possible leaks only amounted to a few kilobytes while the program was consuming over 100 MB. The reachable (non-leaked) memory reported by Valgrind was also in the KB range, so Valgrind seems to believe that my programs are consuming a fraction of the memory that top says they are using.
I've run a few other tests and got odd results. A single program, even running at triple the rate at which my original memory leak was detected, never seems to consume more than 0.9% of memory; two programs leak up to 1.9% and 1.3% of memory respectively, but no more; and so on. It's as if the amount of memory leaked, and the rate at which it leaks, somehow depends on how many instances of my program are running at one time, which makes no sense; each instance should be 100% independent of the others.
I also found that if I run 32 instances with only one of them running inside Valgrind, the valgrinded instance (that's a word if I say it is!) leaks memory, but at a slower rate than the ones running outside of Valgrind. The Valgrind instance still says I have no direct leaks and reports far less memory consumption than top shows.
I'm rather stumped as to what could be causing this result, and why Valgrind refuses to be aware of the memory leak. I thought it might be an outside library, but I don't really use any external libraries, just basic C++ functions/objects. I also considered that the data could be written to the output pipe too fast, causing the buffer to grow indefinitely, but 1) there should be an upper limit to which such a buffer can grow, and 2) once memory has been leaked, if I drop the data input rate to nothing, the memory stays consumed rather than slowly dropping back to a reasonable amount.
Can anyone give me a hint as to where I should look from here? I'm totally stumped as to why the memory is behaving this way.
Thanks.
This sounds like a problem I had recently.
If your program accepts data and buffers it internally without any limits, then it may be reading and buffering faster than it can output the data. In that case, memory use will continue to increase without limit.
The more instances of the program that you run, the slower each instance will go, and the faster the buffers will increase.
This may or may not be your problem, but without more information it is the best I can do.
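If that is what is happening, bounding the internal buffer is one way to confirm it. A minimal sketch with made-up names (nothing here comes from the question's code): the socket reader blocks when the pipe writer falls behind, instead of letting the queue, and therefore memory use, grow without limit.

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t max_items) : max_items_(max_items) {}

    void push(std::string item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [&] { return items_.size() < max_items_; });  // block the producer
        items_.push_back(std::move(item));
        not_empty_.notify_one();
    }

    std::string pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [&] { return !items_.empty(); });
        std::string item = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
    std::deque<std::string> items_;
    std::size_t max_items_;
};

If memory stops growing once the queue is capped (or the producer starts blocking), the unbounded buffer was the culprit.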
You should first look for a soft leak. It happens when some static or singleton object gradually grows a buffer or container and collects garbage in it. Technically it is not a leak, but its effects are just as bad.
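To make the idea concrete, here is a hypothetical example of such a soft leak; the class is invented purely for illustration:

#include <string>
#include <vector>

// Every pointer is still reachable, so Valgrind reports nothing as "definitely lost",
// yet the process grows forever because a singleton keeps collecting data.
class EventLog {
public:
    static EventLog& instance() {
        static EventLog log;
        return log;
    }
    void record(std::string entry) {
        entries_.push_back(std::move(entry));   // never trimmed: grows with every call
    }
private:
    std::vector<std::string> entries_;
};

// Memory use scales with uptime and input rate, while leak checkers still see the
// memory as reachable rather than leaked.
void handle_message(const std::string& msg) {
    EventLog::instance().record(msg);
}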
May I suggest you give MemoryScape a try? This tool does a pretty good job at memory leak detection. It's not free, but given the time and energy spent, it is worth trying.

What's the best way of finding a heap corruption that only occurs under a performance test?

The software I work on (written in C++) has a heap corruption problem at the moment. Our perf test team keeps getting WER faults when the number of users logged on to the box reaches a certain threshold, but the dumps they've given me just show corruption in innocent areas (like when std::string frees its underlying memory, for example).
I've tried using Appverifier, and this did throw up a number of issues which I've now fixed. However, I'm now in the situation where the testers can load up the machine as much as possible with Appverifier and have a clean run, but still get heap corruption when running without Appverifier (I guess because they can get more users on, etc., without it). This has meant I've been unable to get a dump which actually shows the problem.
Does anyone have any other ideas for useful techniques or technologies I can use? I've done as much analysis as I can on the heap corruption dumps without Appverifier, but I can't see any common themes. No threads are doing anything interesting at the same time as the crash, and the thread which crashes is innocent, which makes me think the corruption occurred some time before.
The best tool is Appverifier in combination with gFlags, but there are many other solutions that may help.
For example, you could specify a heap check every 16 malloc, realloc, free, and _msize operations with the following code:
#include <crtdbg.h>

int main()
{
    // Get the current flag bits (requires linking against the debug CRT: /MTd or /MDd)
    int tmp = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
    // Clear the upper 16 bits and OR in the desired frequency
    tmp = (tmp & 0x0000FFFF) | _CRTDBG_CHECK_EVERY_16_DF;
    // Set the new bits
    _CrtSetDbgFlag(tmp);
    return 0;
}
You have my sympathies: a very difficult problem to track down.
As you say, these normally occur some time prior to the crash, generally as the result of a misbehaving write (e.g. writing to deleted memory, running off the end of an array, exceeding the allocated size in a memcpy, etc.).
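As a purely illustrative example (none of this comes from the original code) of why the crashing thread can look innocent, consider an off-by-one write that silently damages heap bookkeeping long before anything fails:

#include <cstring>

int main()
{
    char* buf = new char[8];
    std::memcpy(buf, "12345678", 9);   // copies the '\0' one byte past the allocation
    // ... an arbitrary amount of correct code runs here ...
    delete[] buf;                      // the heap may only notice the damage now, or much later,
                                       // and possibly on a completely different thread
    return 0;
}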
In the past (on Linux; I gather you're on Windows) I've used heap-checking tools (Valgrind, Purify, Intel Inspector), but as you've observed these often affect the performance and thus obscure the bug. (You don't say whether it's a multi-threaded app, or whether it processes a variable dataset such as incoming messages.)
I have also overloaded the new and delete operators to detect double deletes, but this is quite a specific situation.
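In case it is useful, here is a rough sketch of that kind of overload; none of it comes from the original poster's code, and it is a debugging aid rather than something to ship:

#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_set>

// Replace global new/delete to flag double deletes. The thread_local flag keeps the
// tracking set's own allocations from being tracked; expect a few spurious "untracked"
// warnings at program exit when the set tears itself down.
namespace {
    std::mutex g_mutex;
    thread_local bool g_in_tracker = false;

    std::unordered_set<void*>& live_set() {
        static std::unordered_set<void*> s;
        return s;
    }
}

void* operator new(std::size_t size)
{
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    if (!g_in_tracker) {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_in_tracker = true;
        live_set().insert(p);        // may itself allocate; the flag stops recursion
        g_in_tracker = false;
    }
    return p;
}

void operator delete(void* p) noexcept
{
    if (!p) return;
    if (!g_in_tracker) {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_in_tracker = true;
        bool known = live_set().erase(p) != 0;
        g_in_tracker = false;
        if (!known) {
            std::fprintf(stderr, "double or untracked delete of %p\n", p);
            return;                  // skip the free so the report isn't followed by a crash
        }
    }
    std::free(p);
}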
If none of the available tools help, then you're on your own and it's going to be a long debugging process.
The best advice I can offer is to work on reducing the test scenario that reproduces it. Then attempt to reduce the amount of code being exercised, i.e. stub out parts of the functionality. Eventually you'll zero in on the problem, but I've seen very good people spend six weeks or more tracking these down on a large application (~1.5 million LOC).
All the best.
You should elaborate further on what your software actually does. Is it multi-threaded? When you talk about the "number of users logged on to the box", does each user open a different instance of your software in a different session? Is your software a web service? Do instances talk to each other (e.g. with named pipes)?
If your error only occurs at high load and does not occur when AppVerifier is running, the only two possibilities I can think of (without more information) are a concurrency issue in how you've implemented multi-threading, or a hardware issue on the test machine that only manifests under heavy load (have your testers used more than one machine?).

Finding heap corruption

This is an extension of my previous question, Application crash with no explanation.
I have a lot of crashes that are presumably caused by heap corruption on an application server. These crashes only occur in production; they cannot be reproduced in a test environment.
I'm looking for a way to track down these crashes.
Application Verifier was suggested, and it would be fine, but it's unusable with our production server. When we try to start it in production under Application Verifier, it becomes so slow that it's completely unusable, even though this is a fairly powerful server (64-bit application, 16 GB memory, 8 processors). Running without Application Verifier, it only uses about 1 GB of memory and no more than 10-15% of any processor's cycles.
Are there any other tools that will help find heap corruption, without adding a huge overhead?
Use the debug version of the Microsoft runtime libraries. Turn on red-zoning and get your heap automatically checked every 128 (say) heap operations by calling _CrtSetDbgFlag() once during initialisation.
_CRTDBG_DELAY_FREE_MEM_DF can be quite useful for finding memory-used-after-free bugs, but your heap size grows monotonically while using it.
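As a rough sketch of how those two suggestions combine (this assumes a build against the debug CRT; the flag names are the documented crtdbg.h ones):

#include <crtdbg.h>

int main()
{
    // Check the debug heap every 128 heap operations, and keep freed blocks around
    // (filled with a known pattern) so use-after-free is more likely to be caught.
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);          // read the current flags
    flags = (flags & 0x0000FFFF) | _CRTDBG_CHECK_EVERY_128_DF;
    flags |= _CRTDBG_DELAY_FREE_MEM_DF;                       // note: heap grows monotonically
    _CrtSetDbgFlag(flags);

    // ... run the application ...
    return 0;
}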
Would there be any benefit in running it virtualized and taking scheduled snapshots, so that you can hopefully get a snapshot from just a little before it actually crashes? Then take the pre-crash snapshot and start it in a lab environment. If you can get it to crash again there, restore the snapshot and start inspecting your server process.
Mudflap with GCC. It does code instrumentation for production code.
You have to compile your software with -fmudflap. It will check any wrong pointer access (heap/stack/static). It is designed to work on production code with a small slowdown (between 1.5x and 5x). You can also disable checking on read accesses for a speedup.

Identifying Major Page Fault cause

I've been asked to look at an internal application, written in C++ and running on Linux, that's having some difficulties.
Periodically it will incur a large number of major page faults (~200k), which cause the wall-clock run time to increase by 10x or more; then on some runs it will have none.
I've tried isolating different pieces of the code, but I am struggling to reproduce the page faults when testing it.
Does anyone have any suggestions for getting more information about major page faults out of the application or out of Linux? All I really have is a total.
You may like to consider Valgrind, described on its home page as:
Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.
Specifically Valgrind contains a tool called Massif, for which the following (paraphrased) overview is given in the manual:
Massif is a heap profiler. It measures how much heap memory your program uses. [..]
Heap profiling can help you reduce the amount of memory your program uses. On modern machines with virtual memory, this provides the following benefits:
It can speed up your program -- a smaller program will interact better with your machine's caches and avoid paging.
If your program uses lots of memory, it will reduce the chance that it exhausts your machine's swap space.