I was reading through this article on detecting memory leaks using WinDbg. I am trying to find a way to print the user stack for every UserPtr that appears when the heap is filtered for blocks of a particular size. Is this possible?
I am looking to achieve something like:
foreach(userPtr)
dump_to_a_file !heap -p -a userPtr
where userPtr is each UserPtr in output like the following:
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
003360e0 03f0 0000 [07] 003360e8 01f64 - (busy)
00338060 03f0 03f0 [07] 00338068 01f64 - (busy)
00339fe0 03f0 03f0 [07] 00339fe8 01f64 - (busy)
I am trying to do this to avoid manually checking thousands of such UserPtrs. Thanks for any help that you could give.
The output of the !heap -flt s xxx command contains a lot of text before and after the heap entry table. Let's get rid of that additional text with a little hack:
.shell -ci "!heap -flt s xxx" find "["
Now the output is quite stable and can be used in a .foreach loop:
.foreach (userptr {.shell -ci "!heap -flt s xxx" find "["}) { .echo ${userptr}}
See how it splits each line into tokens. Let's get rid of the first 4 tokens (entry, size, prev, flags) and the last 3 tokens (usersize, -, state) using /pS 4 /ps 7. (The skip count after each match is 7 because the 3 trailing tokens of one line are followed by the 4 leading tokens of the next line.)
.foreach /pS 4 /ps 7 (userptr {.shell -ci "!heap -flt s xxx" find "["}) { .echo ${userptr}}
Now that you have the pure addresses, do something useful with them, such as !heap -p -a:
.foreach /pS 4 /ps 7 (userptr {.shell -ci "!heap -flt s xxx" find "["}) { !heap -p -a ${userptr}}
To dump the results into a file, surround the loop with .logopen and .logclose:
.logopen d:\debug\logs\heap.log; .foreach /pS 4 /ps 7 (userptr {.shell -ci "!heap -flt s xxx" find "["}) { !heap -p -a ${userptr}}; .logclose
There you go.
You can use umdh.exe for this. Umdh can dump all allocations, or a delta between two snapshots of the same process, which is the most convenient way to locate memory leaks. You can find the tool in the directory where the Debugging Tools for Windows are installed.
The catch you need to know about when using umdh.exe is that it only resolves symbols when performing a delta operation, i.e. comparing two snapshots of the process. If you really, really need every call stack, just take the first snapshot at the very beginning of process execution.
Umdh.exe also aggregates allocations with the same call stack into buckets, so in the diff output you will see something like this:
+ 18f0 ( 2354 - a64) 11 allocs BackTrace113457DC
+ c ( 11 - 5) BackTrace113457DC allocations
ntdll!RtlAllocateHeap+38CB9
msvcrt!_calloc_impl+134
msvcrt!_calloc_crt+16
msvcrt!_CRTDLL_INIT+FC
ntdll!LdrxCallInitRoutine+16
ntdll!LdrpCallInitRoutine+43
ntdll!LdrpInitializeThread+106
ntdll!_LdrpInitialize+6A
ntdll!LdrInitializeThunk+10
This is an example of a call stack shared by 11 allocations: the number of allocations with this call stack increased from 5 to 11 between the snapshots, and the memory consumed by them grew from 0xa64 to 0x2354 bytes (the leading 0x18f0 in the first line is simply that delta: 0x2354 - 0xa64).
Sample steps to show how to use umdh.exe:
Set up the _NT_SYMBOL_PATH environment variable, which umdh requires. Assuming you are in the directory that contains your private symbols (%CD%):
set _NT_SYMBOL_PATH=%CD%;srv*http://msdl.microsoft.com/download/symbols
Start your process under the debugger and stop it at a breakpoint or wherever you need.
Create your first snapshot (<pid> is the ID of your process):
umdh -p:<pid> -f:MyFirstSnapshot.txt
Resume execution of your process and stop it a second time at the place you need.
Create your second snapshot:
umdh -p:<pid> -f:MySecondSnapshot.txt
Create a diff with symbols resolved:
umdh MyFirstSnapshot.txt MySecondSnapshot.txt -f:MyDiff.txt
I have a fairly robust CI test suite for a C++ library; these tests (around 50) run over the same Docker image but on different machines.
On one machine ("A") all the memcheck (valgrind) tests pass (i.e. no memory leaks).
On the other ("B"), all tests produce the same valgrind error below.
51/56 MemCheck #51: combinations.cpp.x ....................***Exception: SegFault 0.14 sec
valgrind: m_libcfile.c:66 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.
Cannot find memory tester output file: /builds/user/boost-multi/build/Testing/Temporary/MemoryChecker.51.log
The machines are very similar, both are intel i7.
The only difference I can think of is that one is:
A. Ubuntu 22.10, Linux 5.19.0-29, docker 20.10.16
and the other:
B. Fedora 37, Linux 6.1.7-200.fc37.x86_64, docker 20.10.23
and perhaps some configuration of docker I don't know about.
Is there some configuration of Docker that might cause the difference? Or of the kernel? Or some option in valgrind to work around this problem?
I know for a fact that on real machines (not Docker) valgrind doesn't produce any memory errors.
The options I use for valgrind are always --leak-check=yes --num-callers=51 --trace-children=yes --leak-check=full --track-origins=yes --gen-suppressions=all.
Valgrind version in the image is 3.19.0-1 from the debian:testing image.
Note that this isn't an error reported by valgrind, it is an error within valgrind.
Perhaps, after all, the only difference is that the Ubuntu version of valgrind is compiled in release mode and the error is just ignored. (This doesn't make sense, though: valgrind is the same in both cases because the Docker image is the same.)
I tried removing --num-callers=51 or setting it at 12 (default value), to no avail.
I found a difference between the images and the real machine and a workaround.
It has to do with the number of file descriptors.
(This was pointed out briefly in one of the valgrind bug threads on macOS: https://bugs.kde.org/show_bug.cgi?id=381815#c0)
Inside the docker image running in Ubuntu 22.10:
ulimit -n
1048576
Inside the docker image running in Fedora 37:
ulimit -n
1073741816
(which looks like a ridiculous number or an overflow)
In the Fedora 37 and the Ubuntu 22.10 real machines:
ulimit -n
1024
So, doing this in the CI recipe, "solved" the problem:
- ulimit -n # reports current value
- ulimit -n 1024 # workaround needed by valgrind in docker running in Fedora 37
- ctest ... (with memcheck)
I have no idea why this workaround works.
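If you control how the containers are started, the same cap can presumably be applied at the Docker level instead of inside the CI recipe; a sketch, where the image name and command are placeholders:
docker run --ulimit nofile=1024:1024 my-ci-image ctest ...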
For reference:
$ ulimit --help
...
-n the maximum number of open file descriptors
First off, "you are doing it wrong" with your Valgrind arguments. For CI I recommend a two-stage approach. Use as many default arguments as possible for the CI run (--trace-children=yes may well be necessary, but not the others). If your codebase is leaky then you may need to check for leaks, but if you can maintain a zero-leak policy (or only suppressed leaks) then you can tell from the summary whether there are new leaks. Once your CI detects an issue, you can run again with the kitchen-sink options to get full information. Your runs will be significantly faster without all those options.
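As a sketch of the two stages (the test binary name and the --error-exitcode flag are my additions, not from the question):
# CI run: near-default options, fast; fail the job on any Valgrind error
valgrind --error-exitcode=1 --trace-children=yes ./my_test
# Investigation run: kitchen sink, only after CI flags a problem
valgrind --trace-children=yes --leak-check=full --track-origins=yes --num-callers=51 --gen-suppressions=all ./my_test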
Back to the question.
Valgrind is trying to dup() some file (the guest exe, a tempfile or something like that). The fd that it gets is higher than what it thinks the nofile rlimit is, so it is asserting.
A billion files is ridiculous.
Valgrind will try to call prlimit for RLIMIT_NOFILE, with a fallback call to rlimit, and a second fallback of setting the limit to 1024.
To really see what is going on you need to modify the Valgrind source (m_main.c, setup_file_descriptors, set the local 'show' to True). With this change I see
fd limits: host, before: cur 65535 max 65535
fd limits: host, after: cur 65535 max 65535
fd limits: guest : cur 65523 max 65523
Otherwise with strace I see
2049 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=65535, rlim_max=65535}) = 0
2049 prlimit64(0, RLIMIT_NOFILE, {rlim_cur=65535, rlim_max=65535}, NULL) = 0
(all the above on RHEL 7.6 amd64)
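For reference, similar traces can be captured without modifying Valgrind, assuming strace is available (--tool=none and /bin/true are placeholders for the real tool and guest program):
strace -f -e trace=prlimit64 valgrind --tool=none /bin/true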
EDIT: Note that the above shows Valgrind querying and setting the resource limit. If you use ulimit to lower the limit before running Valgrind, then Valgrind will try to honour that limit. Also note that Valgrind reserves a small number (8) of files for its own use.
I added my program (it loads a file and does some computation) to the apps of TizenRT on ARTIK053. The program runs successfully the first time, but a data abort failure occurs when I run it a second time. The specific error info is as follows:
arm_dataabort:
Data abort. PC: 040d25a0 DFAR: 00000011 DFSR: 0000080d
up_assert: Assertion failed at file:armv7-r/arm_dataabort.c line: 111 task: ghsom_test
up_dumpstate: Current sp: 020c3eb0
up_dumpstate: User stack:
up_dumpstate: base: 020c3fd0
up_dumpstate: size: 00000fd4
up_dumpstate: used: 00000220
up_dumpstate: User Stack
up_stackdump: 020c3ea0: 00000003 020c3eb0 040c9638 041d38b8 00000000 040c9644 00000011 0000080
.....
.....
up_taskdump: Idle Task: PID=0 Stack Used=1024 of 1024
up_taskdump: hpwork: PID=1 Stack Used=164 of 2028
up_taskdump: lpwork: PID=2 Stack Used=164 of 2028
up_taskdump: logm: PID=3 Stack Used=300 of 2028
up_taskdump: LWIP_TCP/IP: PID=4 Stack Used=228 of 4068
up_taskdump: tash: PID=6 Stack Used=948 of 4076
up_taskdump: ghsom_test: PID=10 Stack Used=616 of 4052
I checked the remaining free RAM space; it is enough for my program. I also added some print statements to my main function to find the line where the error occurs. I found that if I comment out some lines before the error line, the error moves further down the next time I run the program, as if I had released some stack space. So I guess it might be an issue related to the stack size that can be assigned to a single process. Does anyone know the reason, and how to solve the issue? To be clear, it only happens the second time (and thereafter) that I run the program.
With the stackdump you can almost always figure out where the error originated from.
Since you have the image file for your build you can do
arm-none-eabi-addr2line -f -p -i -b build/out/bin/tinyara 0xADDR
where ADDR is one of the relevant addresses in the stack dump.
You can usually check the "current sp" (stack pointer), but it often points to the arm_dataabort shown in the failure above.
Then you can check the PC address, and also look for addresses in the stack dump (starting from the back of it) that are close in value to the PC.
In your case those could be addresses like (in this order): 040c9644, 041d38b8, 040c9638.
So basically:
arm-none-eabi-addr2line -f -p -i -b build/out/bin/tinyara 0x040c9644
Notice the 0x in front of the address.
The command will give you a good indication of where this address comes from in your binary, like:
up_idlepm_static at /home/user/tizenrt/os/arch/arm/src/chip/s5j_idle.c:111
(inlined by) up_idle at /home/user/tizenrt/os/arch/arm/src/chip/s5j_idle.c:254
If the address does not point to a code line, the output will look like:
?? ??:0
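To check several candidate addresses in one go, a small shell loop helps; a sketch assuming a POSIX shell and the candidate addresses listed above:
for addr in 040c9644 041d38b8 040c9638; do
    arm-none-eabi-addr2line -f -p -i -b build/out/bin/tinyara 0x$addr
done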
hope that helps
I am investigating a native memory leak through WinDbg.
With consecutive dumps, !heap -s returns more heaps every dump. Number of heaps returned in the first dump: 1170, second dump: 1208.
There are three sizes of heaps that are returning a lot:
0:000> !heap -stat -h 2ba60000
heap # 2ba60000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
1ffa 1 - 1ffa (35.44)
1000 1 - 1000 (17.73)
a52 1 - a52 (11.44)
82a 1 - 82a (9.05)
714 1 - 714 (7.84)
64c 1 - 64c (6.98)
Most blocks refer to the same callstack:
777a5887 ntdll!RtlAllocateHeap
73f9f1de sqlsrv32!SQLAllocateMemory
73fc0370 sqlsrv32!SQLAllocConnect
73fc025d sqlsrv32!SQLAllocHandle
74c6a146 odbc32!GetInfoForConnection
74c6969d odbc32!SQLInternalDriverConnectW
74c6bc24 odbc32!SQLDriverConnectW
74c63141 odbc32!SQLDriverConnect
When will a new heap be created, and how would you dive deeper into this to find the root cause?
If you are able to do live debugging, you can try to set a breakpoint:
bp ntdll!RtlCreateHeap "kc;gc"
This will display the call stack and continue. Maybe you'll see the culprit.
Do the same with ntdll!RtlDebugCreateHeap as well:
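That is, mirroring the breakpoint above:
bp ntdll!RtlDebugCreateHeap "kc;gc"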
I'm trying to analyze a managed process memory dump and suspect it of native memory leaks. In order to be able to use WinDbg (and the !heap extension from there) I activated the user-mode stack trace database for the server process.
I see a lot of blocks of size 68. Among those blocks (the ones that I could manually verify using !heap -p -a) there are many call stacks of the form:
!heap -p -a 000000003ca5cfd0
address 000000003ca5cfd0 found in
_HEAP # 1ea0000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
000000003ca5cfa0 0009 0000 [00] 000000003ca5cfd0 00068 - (busy)
7766bbed ntdll! ?? ::FNODOBFM::`string'+0x000000000001913b
7fef7b76a57 msvcr120!malloc+0x000000000000005b
7fef7b76967 msvcr120!operator new+0x000000000000001f
7fe9a5cdaf8 +0x000007fe9a5cdaf8
Do you have any idea what these allocations are? They take up hundreds of MB in my dump file.
EDIT
lm shows the following around the area 7fe9a5cdaf8 (truncated)
start end module name
00000000`773b0000 00000000`774cf000 kernel32 (pdb symbols)
00000000`774d0000 00000000`775ca000 user32 (deferred)
00000000`775d0000 00000000`77779000 ntdll (pdb symbols)
00000000`77790000 00000000`77797000 psapi (deferred)
00000000`777a0000 00000000`777a3000 normaliz (deferred)
00000001`3f810000 00000001`3f818000 ManagedService (deferred)
000007fe`dd2d0000 000007fe`de398000 System_Web_ni (deferred)
I assume that no native image was created for your application (using NGen). In that case the module (DLL) only contains IL code, which will never be executed directly. So, from the native point of view, there won't be any stack frames pointing inside the module.
Instead, the IL code will be JIT-compiled at another location in memory, e.g. 7fe9a5cdaf8 in your case. That's where the real code is executed, so that's what you see from the native side.
To map a JIT-compiled instruction back to its .NET method descriptor, do the following:
0:000> .symfix
0:000> .loadby sos mscorwks ; *** .NET 2
0:000> .loadby sos clr ; *** .NET 4
0:000> !ip2md 7fe9a5cdaf8
The output should then show the .NET method name (example here, since I don't have your dump):
MethodDesc: 000007ff00033450
Method Name: ManagedService.Program.Main()
Class: 000007ff00162438
MethodTable: 000007ff00033460
mdToken: 0600001f
Module: 000007ff00032e30
IsJitted: yes
CodeAddr: 000007ff00170120
I want to write a script for gdb that will save the backtrace (stack) of a process every 10 ms. How can I do this?
It would be something like call-graph profiling for the 'penniless', i.e. for people who can't use any sort of advanced profiler.
Yes, there are a lot of advanced profilers, for popular CPUs and for popular OSes. Shark is very impressive and easy to use, but I want to get this basic functionality with such a script working with gdb.
Can you get lsstack? Perhaps you could run that from a script outside your app. Why 10 ms? The percentages will be about the same at 100 ms or more. If the app is too fast, you could artificially slow it down with an outer loop, and that wouldn't change the percentages either. For that matter, you could just use Ctrl-C to take the samples manually under gdb, if the app runs long enough and your goal is to find out where the performance problems are.
(1) Manual. Execute the following in a shell. Keep pressing Ctrl+C repeatedly at the shell prompt.
gdb -x print_callstack.gdb -p pid
or, (2) send signals to the pid repeatedly, the same number of times, from another shell, as in the loop below:
let count=0; \
while [ $count -le 100 ]; do \
kill -INT pid ; sleep 0.10; \
let count=$count+1; \
done
The source of print_callstack.gdb from (1) is as follows:
set pagination 0
set $count = 0
while $count < 100
backtrace
continue
set $count = $count + 1
end
detach
quit
See also the man page of pstack: https://linux.die.net/man/1/pstack
cat > gdb.run
set pagination 0
backtrace
continue
backtrace
continue
... as many more backtrace + continue's as needed
backtrace
continue
detach
quit
Of course, omit the duplicate newlines, how do you do single newlines in this forum software? :(
gdb -x gdb.run -p $pid
Then just do
kill -INT $pid ; sleep 0.01
in a loop in another script.
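For example, the outer loop could be a minimal sketch like this (assuming $pid holds the target process ID; the loop stops once the process has exited and kill starts failing):
while kill -INT $pid 2>/dev/null; do
    sleep 0.01
done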
kill -INT sends the same signal the OS sends when you hit Ctrl-C. Exercise for the reader: make the gdb script use a loop with $n iterations.