Identifying Major Page Fault cause - c++

I've been asked to look at an internal application written in C++ and running on Linux that's having some difficulties.
Periodically it incurs a large number of major page faults (~200k), which increases the wall-clock run time by 10x or more; on other runs it incurs none.
I've tried isolating different pieces of the code but am struggling to repeat the page fault errors when testing it.
Does anyone have any suggestions for getting any more information out of the application/Linux on major page faults? All I have really is a total.

You may like to consider Valgrind, described on the home page as:
Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.
Specifically Valgrind contains a tool called Massif, for which the following (paraphrased) overview is given in the manual:
Massif is a heap profiler. It measures how much heap memory your program uses. [..]
Heap profiling can help you reduce the amount of memory your program uses. On modern machines with virtual memory, this provides the following benefits:
It can speed up your program -- a smaller program will interact better with your machine's caches and avoid paging.
If your program uses lots of memory, it will reduce the chance that it exhausts your machine's swap space.
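Independently of Massif, one way to get more than a single total is to sample the fault counters from inside the process, around the sections you suspect. This is a minimal sketch, assuming a POSIX system where getrusage fills in ru_majflt and ru_minflt (Linux does). FaultCounts, read_fault_counts and measure_faults are hypothetical names, not part of any library:

```cpp
#include <sys/resource.h>  // getrusage, struct rusage (POSIX)
#include <cstdio>

// Snapshot of the fault counters the kernel keeps for this process.
struct FaultCounts {
    long major;  // faults that needed I/O (page had to be read from disk)
    long minor;  // faults served without I/O
};

// Read the current counters; returns {-1, -1} if getrusage fails.
inline FaultCounts read_fault_counts() {
    struct rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return {-1, -1};
    return {usage.ru_majflt, usage.ru_minflt};
}

// Hypothetical helper: runs `section` and reports how many faults
// the whole process incurred while it ran.
template <typename Fn>
FaultCounts measure_faults(const char* label, Fn&& section) {
    const FaultCounts before = read_fault_counts();
    section();
    const FaultCounts after = read_fault_counts();
    const FaultCounts delta{after.major - before.major,
                            after.minor - before.minor};
    std::fprintf(stderr, "%s: major=%ld minor=%ld\n",
                 label, delta.major, delta.minor);
    return delta;
}
```

Wrapping each phase of a run this way (for example measure_faults("load", [&] { load(); }), where load stands in for your own code) should show which phase the major faults are charged to on a slow run. The counters are per process, so faults from other threads running at the same time are included.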

Related

Troubleshoot C++ program memory usage issue

I am authoring a C++ program and find that it consumes too much memory. I would like to know which part of the program consumes the most memory; ideally, I would like to know what percentage of memory is consumed by each kind of C++ object the program is using at a particular moment.
In Java, I know tools like Eclipse Memory Analyzer (https://www.eclipse.org/mat/) which could take a heap dump and show/visualize such memory usage, and I wonder if this can be done for a C++ program. For example, I expect to use a tool/approach letting me know a particular vector<shared_ptr<MyObject>> is holding 30% of the memory.
Note:
I develop the program mainly on macOS (compiled with Apple Clang), so it would be better if the approach works on macOS. But I deploy to Linux as well (compiled with gcc), so approaches/tools on Linux are okay.
I tried using Apple's Instruments for this purpose, but so far I can only use it to find memory allocation issues. I have no idea how to figure out the memory consumption of the program at a particular moment (the memory consumption should be related to the C++ objects in the program so that I can take action to reduce it accordingly).
I don't find an easy way to visualize/summarize each part of my program's memory yet. So far, the best tool/approach that I found is Apple's Instruments (if you are on macOS).
In Instruments, you can use the Allocations profiling template. With this template, choose File ==> Recording Options and check the "Discard events for freed memory" option.
You will then be able to see the memory that has not been freed (i.e. the data still in memory) during allocation recording. If your program's debug symbols are loaded, you can see which function leads to this result.
Although this doesn't address all the issues, it does help to identify part of the problem.
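If you can change the container types, another option that works on both macOS and Linux is to charge allocations to a per-container tally with a custom allocator. This is a rough sketch of that idea, not something Instruments provides. MemoryTally, TrackingAllocator and ExampleTag are hypothetical names, and the allocator assumes C++17 and ignores over-aligned types and size overflow:

```cpp
#include <atomic>
#include <cstddef>
#include <new>
#include <vector>

// One live-byte counter per Tag type.
template <typename Tag>
struct MemoryTally {
    static std::atomic<std::size_t>& live_bytes() {
        static std::atomic<std::size_t> bytes{0};
        return bytes;
    }
};

// Minimal allocator that charges every allocation to Tag's tally.
template <typename T, typename Tag>
struct TrackingAllocator {
    using value_type = T;

    TrackingAllocator() noexcept = default;
    template <typename U>
    TrackingAllocator(const TrackingAllocator<U, Tag>&) noexcept {}

    T* allocate(std::size_t n) {
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        MemoryTally<Tag>::live_bytes() += n * sizeof(T);
        return p;
    }
    void deallocate(T* p, std::size_t n) noexcept {
        ::operator delete(p);
        MemoryTally<Tag>::live_bytes() -= n * sizeof(T);
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U, typename Tag>
bool operator==(const TrackingAllocator<T, Tag>&,
                const TrackingAllocator<U, Tag>&) noexcept { return true; }
template <typename T, typename U, typename Tag>
bool operator!=(const TrackingAllocator<T, Tag>&,
                const TrackingAllocator<U, Tag>&) noexcept { return false; }

// Hypothetical tag: every container using it shares one counter.
struct ExampleTag {};
using ExampleVector = std::vector<int, TrackingAllocator<int, ExampleTag>>;
```

Reading MemoryTally<ExampleTag>::live_bytes() at the moment of interest and dividing by the process total gives a rough percentage for that group of containers. Note that for a vector<shared_ptr<MyObject>> this only counts the vector's own buffer of pointers; the MyObject instances are counted only if they are allocated through the same allocator.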

Checking all sorts of memory usage during the runtime of a C++ Application

I'm using CentOS 7 and I'm running a C++ application. Recently I switched to a newer version of a library that the application uses for various MySQL C API functions. After integrating the new library, I saw a tremendous increase in the program's memory usage: the application crashes if left running for more than a day or two. Precisely, its memory usage keeps increasing up to a point where the application alone is using 74.9% of the system's total memory, and then it is forcefully shut down by the system.
Is there any way to track the memory usage of the whole application, including static variables? I've already tried Valgrind's Massif tool.
Can anyone tell me what the possible reasons for the increased memory usage could be, or suggest any tools that can give me deep insight into how the memory is being allocated (both static and dynamic)? Is there any tool that can report memory allocation for a C++ application running in a Linux environment?
Thanks in advance!
Static memory is allocated when the program starts. Are you seeing memory growth or a startup increase?
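One way to answer that question is to have the process log its own resident set size at intervals and see whether it keeps climbing after startup. This is a minimal sketch, assuming Linux, where /proc/self/status contains a "VmRSS:" line in kB. parse_vm_rss_kb and read_vm_rss_kb are hypothetical helper names:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Parses one line of /proc/<pid>/status. Returns the value in kB if it is
// the VmRSS line, otherwise -1.
long parse_vm_rss_kb(const std::string& line) {
    const std::string key = "VmRSS:";
    if (line.compare(0, key.size(), key) != 0) return -1;
    std::istringstream fields(line.substr(key.size()));
    long kb = -1;
    return (fields >> kb) ? kb : -1;
}

// Returns this process's resident set size in kB, or -1 if /proc is not
// available (for example on a non-Linux system).
long read_vm_rss_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        const long kb = parse_vm_rss_kb(line);
        if (kb >= 0) return kb;
    }
    return -1;
}
```

Logging read_vm_rss_kb() once right after initialization and then every few minutes should separate a one-time startup cost (a flat line) from growth over time (a steadily rising one).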
Since it takes 'a day or two to crash', the trouble is likely a memory leak or unbounded growth of a data structure. Valgrind should be able to help with both. If Valgrind shows a big leak with the --leak-check=full option then you will likely have found the issue.
To check for unbounded growth, put a preemptive _exit() in the program at a point where you suspect the heap has grown. For example, put a timer on the main loop and have the program call _exit() after 10 minutes. If Valgrind shows a big 'in use at exit' figure then you likely have unbounded growth of a data structure rather than a leak. Massif can help track this down: ms_print gives details of the allocations along with their call stacks.
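The timer on the main loop could look roughly like this. It is a sketch only: deadline_reached and run_until_deadline are hypothetical names, and do_work stands in for one iteration of the real application's loop:

```cpp
#include <chrono>
#include <unistd.h>  // _exit

using Clock = std::chrono::steady_clock;

// True once `limit` has elapsed since `start`.
inline bool deadline_reached(Clock::time_point start, Clock::duration limit,
                             Clock::time_point now = Clock::now()) {
    return now - start >= limit;
}

// Hypothetical main loop: runs `do_work` repeatedly and bails out with
// _exit() once the time limit is reached.
template <typename Work>
[[noreturn]] void run_until_deadline(Work&& do_work, Clock::duration limit) {
    const auto start = Clock::now();
    for (;;) {
        do_work();
        if (deadline_reached(start, limit)) {
            // _exit skips destructors and atexit handlers, so whatever the
            // heap holds at this point is still allocated when the run ends.
            _exit(0);
        }
    }
}
```

Calling run_until_deadline(step, std::chrono::minutes(10)) under Valgrind, with step being your own loop body, ends the run at the point of interest without giving the program a chance to tidy up first.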
If you find an issue, try switching back to the older version of your library. If the problem goes away, check and make sure you are using the API properly in the new version. If you don't have the source code then you are a bit stuck in terms of a fix.
If you want to go the extra mile, you can write a shared library interposer for malloc/free to see what is happening. Here is a good start. Linux has the backtrace functionality that can help with determining the exact stack.
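For the backtrace part, a minimal sketch might look like the following, assuming glibc's <execinfo.h> (also available on macOS). capture_stack is a hypothetical helper name. Keep in mind that backtrace_symbols itself allocates, so calling this from inside a malloc interposer would need a guard against re-entry:

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols
#include <cstddef>
#include <cstdlib>     // std::free
#include <string>
#include <vector>

// Captures up to `max_frames` return addresses of the calling thread and
// renders each one as a string.
std::vector<std::string> capture_stack(int max_frames = 32) {
    std::vector<std::string> frames;
    if (max_frames <= 0) return frames;

    std::vector<void*> addresses(static_cast<std::size_t>(max_frames));
    const int depth = backtrace(addresses.data(), max_frames);

    char** symbols = backtrace_symbols(addresses.data(), depth);
    if (symbols == nullptr) return frames;

    frames.assign(symbols, symbols + depth);
    std::free(symbols);  // a single block; the strings inside are not freed separately
    return frames;
}
```

With GCC on Linux, function names usually only appear in the output if the program is linked with -rdynamic; otherwise you get raw addresses that can be resolved afterwards with a tool such as addr2line.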
Finally, if you must use the 3rd party library and find the heap growing without bound or leaking then you can use the shared library interposer to directly call free/delete. This is a risky last-ditch unrecommended strategy but I've used in production to limp a process along.

Finding memory consumed using core file

I am analysing a high memory consumption problem in our software. I have a core file corresponding to this high memory consumption (the core file was generated by killing our application). But I am not able to view the actual memory consumption using this core file. I used TotalView and gdb, but with neither of them can I get a snapshot of the total memory consumed by my process or see which library is eating up all the memory.
The memory consumption builds up over 10 to 20 days, so I am trying to find out what has caused it.
Can valgrind help me in analysing this core file?
Any input/suggestion is highly appreciated.
@suresh,
Hi, I'm the product manager for TotalView at Rogue Wave Software.
Can you describe the scenario a bit more? Is the program running along with "normal memory consumption" for a long time and then suddenly the memory consumption goes through the roof? Or is the program slowly and steadily consuming memory till it exhausts available resources?
When it is crashing is it crashing because it literally runs out of memory or are you killing it because it has started swapping and being unresponsive?
In general I'd recommend running it under MemoryScape (in TotalView or the Standalone version) and when it starts to show unexpected memory consumption you want to pause it and run a memory leak report. It is likely that this will point right to the problem.
It is possible that the memory use isn't a "classical" leak because you still have pointers referencing the data -- but you are simply over-allocating. In this case you won't see anything on the leak report, but you may be able to pick out which allocation is "gone bad" by watching which allocations are growing. There are a number of ways to do this.
Pause periodically and note how the heap usage breaks down in the Heap Status Source Code report. For example you may note that the number of allocations associated with a specific source code file just keeps increasing.
If you are using TotalView you can use the "set heap baseline" functionality when the program seems to be behaving well, then filter against this baseline. Again you may want to use the source code report (though the graphical or backtrace reports support filtering too).
Or you can use the "export memory data" feature to store an image of what the "normal" heap status is. This creates a binary heap status file. Then let the program run till you get into the state where your program has high memory consumption. At that point you pause your live app, load the stored heap data file and you can do a comparison.
Wow, this is getting long. One final thought. You said you are getting core files. Under the debugger you should get a chance to examine the running program before it gets cleaned up. If this doesn't happen let me know. If you really want to work via core files (for example, this is happening in a production environment and you don't want to run the debugger there) let us know -- there are techniques where we can instrument the application using the HIA and then enable you to do offline analysis of your heap status.
Good luck!
Chris Gotbrath
Principal Product Manager for TotalView and ThreadSpotter, Rogue Wave Software
email: first . last at roguewave . com

Global application slow down after freeing memory while using Valgrind

I'm having an issue with my application that I only observe while I use Valgrind.
My program involves a large simulation. When I unload the simulation portion of the program, while monitoring for errors with Valgrind it results in a permanent slowdown in the application. I would have expected the opposite as unloading basically leaves my application with very little to do... Valgrind reports no errors. This slowdown does not occur (or is not observable) when I don't use Valgrind.
I have tried benchmarking various portions of my application using timers and they all seem to slow down fairly evenly across the board. My application also contains multiple asynchronous threads that all slow down. Processor usage does not seem to increase when viewed through the system monitor...
I'll note that I'm using OpenGL with fglrx drivers, which are known to have some issues with Valgrind.
Should I be concerned about this even though it only occurs with Valgrind? Is it likely that this slowdown is caused by freeing a large amount of data while using Valgrind, or is it most likely indicative of a serious bug in my code?
Basically I am trying to ascertain if this is entirely dependent on Valgrind usage or if Valgrind usage is amplifying the consequences of a bug in my code that otherwise is symptom free (but may later cause me problems).

How to profile memory usage?

I am aware of Valgrind, but it just detects memory management issues. What I am searching for is a tool that gives me an overview of how much memory each part of my program consumes. A graphical representation, e.g. a tree map (as KCachegrind does for Callgrind), would be cool.
I am working on a Linux machine, so Windows tools will not help me very much.
Use massif, which is part of the Valgrind tools. massif-visualizer can help you graph the data or you can just use the ms_print command.
Try out the heap profiler delivered with gperftools, by Google. I've always built it from sources, but it's available as a precompiled package under several Linux distros.
It's as simple to use as linking a dynamic library to your executables and running the program. It collects information about every dynamic memory allocation (as far as I've seen) and saves a memory dump to disk every time one of the following happens:
HEAP_PROFILE_ALLOCATION_INTERVAL bytes have been allocated by the program (default: 1 GB)
the high-water memory usage mark increases by HEAP_PROFILE_INUSE_INTERVAL bytes (default: 100 MB)
HEAP_PROFILE_TIME_INTERVAL seconds have elapsed (default: inactive)
You explicitly call HeapProfilerDump() from your code
The last one, in my experience, is the most useful because you can control exactly when to have a snapshot of the heap usage and then compare two different snapshots and see what's wrong.
Finally, there are several possible output formats, such as textual or graphical (in the form of a directed graph).
Using this tool I've been able to spot incorrect memory usages that I couldn't find using Massif.
A "newer" option is Heaptrack. Unlike Massif, it uses an instrumented version of malloc/free that records all the calls and dumps a log.
The GUI is nice (but requires Qt5 IIRC) and the resulting timings (because you may want to track time as well) are less biased than Valgrind's (as they are not emulated).
Use the Callgrind tool with Valgrind (valgrind --tool=callgrind).