Automate gdb: show backtrace every 10 ms - profiling

I want to write a script for gdb, which will save backtrace (stack) of process every 10 ms. How can I do this?
It can be smth like call graph profiling for 'penniless' (for people, who can't use any sort of advanced profiler).
Yes, there are a lot of advanced profilers. For popular CPUs and for popular OSes. Shark is very impressive and easy to use, but I want to get a basic functionality with such script, working with gdb.

Can you get lsstack? Perhaps you could run that from a script outside your app. Why 10ms? Percentages will be about the same at 100ms or more. If the app is too fast, you could artificially slow it down with an outer loop, and that wouldn't change the percentages either. For that matter, you could just use Ctrl-C to get the samples manually under gdb, if the app runs long enough and if your goal is to find out where the performance problems are.

(1) Manual. Execute the following in a shell. Keep pressing Ctrl+C repeatedly on shell prompt.
gdb -x print_callstack.gdb -p pid
or, (2) send signals to pid repeatedly same number of times on another shell as in below loop
let count=0; \
while [ $count -le 100 ]; do \
kill -INT pid ; sleep 0.10; \
let $count=$count+1; \
done
The source of print_callstack.gdb from (1) is as below:
set pagination 0
set $count = 0
while $count < 100
backtrace
continue
set $count = $count + 1
end
detach
quit
man page of pstack https://linux.die.net/man/1/pstack

cat > gdb.run
set pagination 0
backtrace
continue
backtrace
continue
... as many more backtrace + continue's as needed
backtrace
continue
detach
quit
Of course, omit the duplicate newlines, how do you do single newlines in this forum software? :(
gdb -x gdb.run -p $pid
Then just use do
kill -INT $pid ; sleep 0.01
in a loop in another script.
kill -INT is what the OS does when you hit ctrl-C. Exercise for the reader: make the gdb script use a loop with $n iterations.

Related

Shell script to terminate program but causes output file to not be written

I am wanting to run a program in the background that collects some performance data and then run an application in the foreground. When the foreground application finishes it detects this and the closes the application in the background. The issue is that when the background application closes without first closing the file, I'm assuming, the output of the file remains empty. Is there a way to constantly write the output file so that if the background application unexpectedly closes the output is preserved?
Here is my shell script:
./background_application -o=output.csv &
background_pid=$!
./foreground_application
ps -a | grep foreground_application
if pgrep foreground_application > /dev/null
then
result=1
else
result=0
fi
while [ result -ne 0 ]
do
if pgrep RPx > /dev/null
then
result=1
else
result=0
fi
sleep 10
done
kill $background_pid
echo "Finished"
I have access to the source code for the background application written in C++ it is a basic loop and runs fflush(outputfile) every loop iteration.
This would be shorter:
./background_application -o=output.csv &
background_pid=$!
./foreground_application
cp output.csv output_last_look.csv
kill $background_pid
echo "Finished"

How to run record instruction-history and function-call-history in GDB?

(EDIT: per the first answer below the current "trick" seems to be using an Atom processor. But I hope some gdb guru can answer if this is a fundamental limitation, or whether there adding support for other processors is on the roadmap?)
Reverse execution seems to be working in my environment: I can reverse-continue, see a plausible record log, and move around within it:
(gdb) start
...Temporary breakpoint 5 at 0x8048460: file bang.cpp, line 13.
Starting program: /home/thomasg/temp/./bang
Temporary breakpoint 5, main () at bang.cpp:13
13 f(1000);
(gdb) record
(gdb) continue
Continuing.
Breakpoint 3, f (d=900) at bang.cpp:5
5 if(d) {
(gdb) info record
Active record target: record-full
Record mode:
Lowest recorded instruction number is 1.
Highest recorded instruction number is 1005.
Log contains 1005 instructions.
Max logged instructions is 200000.
(gdb) reverse-continue
Continuing.
Breakpoint 3, f (d=901) at bang.cpp:5
5 if(d) {
(gdb) record goto end
Go forward to insn number 1005
#0 f (d=900) at bang.cpp:5
5 if(d) {
However the instruction and function histories aren't available:
(gdb) record instruction-history
You can't do that when your target is `record-full'
(gdb) record function-call-history
You can't do that when your target is `record-full'
And the only target type available is full, the other documented type "btrace" fails with "Target does not support branch tracing."
So quite possibly it just isn't supported for this target, but as it's a mainstream modern one (gdb 7.6.1-ubuntu, on amd64 Linux Mint "Petra" running an "Intel(R) Core(TM) i5-3570") I'm hoping that I've overlooked a crucial step or config?
It seems that there is no other solution except a CPU that supports it.
More precisely, your kernel has to support Intel Processor Tracing (Intel PT). This can be checked in Linux with:
grep intel_pt /proc/cpuinfo
See also: https://unix.stackexchange.com/questions/43539/what-do-the-flags-in-proc-cpuinfo-mean
The commands only works in record btrace mode.
In the GDB source commit beab5d9, it is nat/linux-btrace.c:kernel_supports_pt that checks if we can enter btrace. The following checks are carried out:
check if /sys/bus/event_source/devices/intel_pt/type exists and read the type
do a syscall (SYS_perf_event_open, &attr, child, -1, -1, 0); with the read type, and see if it returns >=0. TODO: why not use the C wrapper?
The first check fails for me: the file does not exist.
Kernel side
cd into the kernel 4.1 source and:
git grep '"intel_pt"'
we find arch/x86/kernel/cpu/perf_event_intel_pt.c which sets up that file. In particular, it does:
if (!test_cpu_cap(&boot_cpu_data, X86_FEATURE_INTEL_PT))
goto fail;
so intel_pt is a pre-requisite.
How I've found kernel_supports_pt
First grep for:
git grep 'Target does not support branch tracing.'
which leads us to btrace.c:btrace_enable. After a quick debug with:
gdb -q -ex start -ex 'b btrace_enable' -ex c --args /home/ciro/git/binutils-gdb/install/bin/gdb --batch -ex start -ex 'record btrace' ./hello_world.out
Virtual box does not support it either: Extract execution log from gdb record in a VirtualBox VM
Intel SDE
Intel SDE 7.21 already has this CPU feature, checked with:
./sde64 -- cpuid | grep 'Intel processor trace'
But I'm not sure if the Linux kernel can be run on it: https://superuser.com/questions/950992/how-to-run-the-linux-kernel-on-intel-software-development-emulator-sde
Other GDB methods
More generic questions, with less efficient software solutions:
call graph: List of all function calls made in an application
instruction trace: Displaying each assembly instruction executed in gdb
At least a partial answer (for the "am I doing it wrong" aspect) - from gdb-7.6.50.20140108/gdb/NEWS
* A new record target "record-btrace" has been added. The new target
uses hardware support to record the control-flow of a process. It
does not support replaying the execution, but it implements the
below new commands for investigating the recorded execution log.
This new recording method can be enabled using:
record btrace
The "record-btrace" target is only available on Intel Atom processors
and requires a Linux kernel 2.6.32 or later.
* Two new commands have been added for record/replay to give information
about the recorded execution without having to replay the execution.
The commands are only supported by "record btrace".
record instruction-history prints the execution history at
instruction granularity
record function-call-history prints the execution history at
function granularity
It's not often that I envy the owner of an Atom processor ;-)
I'll edit the question to refocus upon the question of workarounds or plans for future support.

Efficient variable watching in C/C++

I'm currently writing a multi-threaded, high efficient and scalable algorithm. Because I have to guess a parameter for the code and I'm not sure how the calculation performs on a specific data set, I would like to watch a variable. The test only works with a real world, huge data set. It is possible to analyze the collected data after profiling. Imagine the following, simple code example (real code can contain multiple watch points:
// function get's called by loops of multiple threads
void payload(data_t* data, double threshold) {
double value = calc(data);
// here I want to watch the value
if (value < threshold) {
doSomething(data);
} else {
doSomethingElse(data);
}
}
I thought about the following approaches:
Using cout or other system outputs
Use a binary output (file, network)
Set a breakpoint via gdb/lldb
Use variable watching + logging via gdb/lldb
I'm not happy with the results because: To use 1. and 2. I have to change the code, but this is a debugging/evaluating task. Furthermore 1. requires locking and 1.+2. requires I/O operations, which heavily slows down the entire code and makes testing with real data nearly impossible. 3. is also too slow. To use 4., I have to know the variable address because it's not a global variable, but because threads get created by a dynamic scheduler, this would require breaking + stepping for each thread.
So my conclusion is, that I need a profiler/debugger that works at machine code level and dumps/logs/watches the variable without double->string conversion and is highly efficient, or to sum up with other words: I would like to profile the internal state of my algorithm without heavy slow-down and without doing deep modification. Does anybody know a tool that is able to this?
OK, this took some time but now I'm able to present a solution for my problem. It's called tracepoints. Instead of breaking the program every time, it's more lightweight and (ideally) doesn't change performance/timing too much. It does not require code changes. Here is an explanation how to use them using gdb:
Make sure you compiled your program with debugging symbols (using the -g flag). Now, start the gdb server and provide a network port (e.g. 10000) and the program arguments:
gdbserver :10000 ./program --parameters you --want --to use
Now, switch to a second console and start gdb (program parameters are not required here):
gdb ./program
All following commands are entered in the gdb command line interface. So let's connect to the server:
target remote :10000
After you got the connection confirmation, use trace or ftrace to set a tracepoint to a specific source location (try ftrace first, it should be faster but doesn't work on all platforms):
trace source.c:127
This should create tracepoint #1. Now you can setup an action for this tracepoint. Here I want to collect the data from myVariable
action 1
collect myVariable
end
If expect much data or want to use the data later (after restart), you can set a binary trace file:
tsave trace.bin
Now, start tracing and run the program:
tstart
continue
You can wait for program exit or interrupt your program using CTRL-C (still on gdb console, not on server side). Continue by telling gdb that you want to stop tracing:
tstop
Now we come the tricky part and I'm not really happy with the following code because it's really slow:
set pagination off
set logging file trace.txt
tfind start
while ($trace_frame != -1)
set logging on
printf "%f\n", myVariable
set logging off
tfind
end
This dumps all variable data to a text file. You can add some filter or preparation here. Now you're done and you can exit gdb. This will also shutdown the server:
quit
For detailed documentation especially for explanation of filtering and more advanced tracepoint positions, you can visit the following document: http://sourceware.org/gdb/onlinedocs/gdb/Tracepoints.html
To isolate trace file writing from your program execution, you can use cgroups or another network connected computer. When using another computer, you have to add the host to the port information (e.g. 192.168.1.37:10000). To load a binary trace file later, just start gdb as shown above (forget the server) and change the target:
gdb ./program
target tfile trace.bin
you can set hardware watchpoint using gdb debugger, for example if you have
bool b;
variable and you want to be notified every time the value of it has chenged (by any thread)
you would declare a watchpoint like this:
(gdb) watch *(bool*)0x7fffffffe344
example:
root#comp:~# gdb prog
GNU gdb (GDB) 7.5-ubuntu
Copyright ...
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses...done.
(gdb) watch *(bool*)0x7fffffffe344
Hardware watchpoint 1: *(bool*)0x7fffffffe344
(gdb) start
Temporary breakpoint 2 at 0x40079f: file main.cpp, line 26.
Starting program: /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = true
New value = false
main () at main.cpp:50
50 if (strcmp(mask, "255.0.0.0") != 0) {
(gdb) c
Continuing.
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = false
New value = true
main () at main.cpp:41
41 if (ifa ->ifa_addr->sa_family == AF_INET) { // check it is IP4
(gdb) c
Continuing.
mask:255.255.255.0
eth0 IP Address 192.168.1.5
[Inferior 1 (process 18146) exited normally]
(gdb) q

Executing commands with pipes and timeout in c++ (and reading stdout)

I need your help !
I made a reporting deamon (in c++) which needs to periodicaly execute a bunch of commands on a server.
A simple example command would be : "/bin/ps aux | /usr/bin/wc -l"
The first idea was to fork a child process that runs the command with popen() and set up an alarm() in the parent process to kill the child after 5 seconds if the command has not exited already.
I tried using "sleep 20000" as command, the child process is killed but the sleep command is still running... not good.
The second idea was to use execlp() instead of popen(), it works with simple commands (ie with no pipes) such as "ls -lisa" or "sleep 20000". I can get the result and the processes are killed if they're not done after 5 seconds.
Now I need to execute that "/bin/ps aux | /usr/bin/wc -l" command, obviously it won't work with execlp() directly, so I tried that "hack" :
execlp("sh","sh","-c","/bin/ps aux | /usr/bin/wc -l",NULL);
I works... or so I thought... I tried
execlp("sh","sh","-c","sleep 20000",NULL);
just to be sure and the child process is killed after 5 secs (my timeout) but the sleep command just stays there...
i'm open for suggestions (I'd settle for a hack) !
Thanks in advance !
TLDR;
I need a way to :
execute a "complex" command such as "/bin/ps aux | /usr/bin/wc -l"
get its output
make sure it's killed if it takes more than 5 seconds (the ps command is just and example, actual commands may hang forever)
Use timeout command from coreutils:
/usr/bin/timeout 5 /bin/sh -c "/bin/ps aux | /usr/bin/wc -l"

lockfile purpose in init.d daemon scripts (linux)

When looking at various daemon scripts in /etc/init.d/, I can't seem to understand the purpose of the 'lockfile' variable. It seems like the 'lockfile' variable is not being checked before starting the daemon.
For example, some code from /etc/init.d/ntpd:
prog=ntpd
lockfile=/var/lock/subsys/$prog
start() {
[ "$EUID" != "0" ] && exit 4
[ "$NETWORKING" = "no" ] && exit 1
[ -x /usr/sbin/ntpd ] || exit 5
[ -f /etc/sysconfig/ntpd ] || exit 6
. /etc/sysconfig/ntpd
# Start daemons.
echo -n $"Starting $prog: "
daemon $prog $OPTIONS
RETVAL=$?
echo
[ $RETVAL -eq 0 ] && touch $lockfile
return $RETVAL
}
What is the 'lockfile' variable doing?
Also, when writing my own daemon in C++ (such as following the example at the bottom of http://www.itp.uzh.ch/~dpotter/howto/daemonize), do I put the compiled binary directly in /etc/init.d/ or do I put a script there that calls the binary. (i.e. replacing the 'daemon $prog' in the code above with a call to my binary?)
The whole thing is a very fragile and misguided attempt to keep track of whether a given daemon is running in order to know whether/how to shut it down later. Using pids does not help, because a pid is meaningless to any process except the direct parent of a process; any other use has unsolvable and dangerous race conditions (i.e. you could end up killing another unrelated process). Unfortunately, this kind of ill-designed (or rather undesigned) hackery is standard practice on most unix systems...
There are a couple approaches to solving the problem correctly. One is the systemd approach, but systemd is disliked among some circles for being "bloated" and for making it difficult to use a remote /usr mounted after initial boot. In any case, solutions will involve either:
Use of a master process that spawns all daemons as direct children (i.e. inhibiting "daemonizing" within the individual daemons) and which thereby can use their pids to watch for them exiting, keep track of their status, and kill them as desired.
Arranging for every daemon to inherit an otherwise-useless file descriptor, which it will keep open and atomically close only as part of process termination. Pipes (anonymous or named fifos), sockets, or even ordinary files are all possibilities, but file types which give EOF as soon as the "other end" is closed are the most suitable, since it's possible to block waiting for this status. With ordinary files, the link count (from stat) could be used but there's no way to wait on it without repeated polling .
In any case, the lockfile/pidfile approach is ugly, error-prone, and hardly any better than lazy approaches like simply killall foobard (which of course are also wrong).
What is the 'lockfile' variable doing?
It could be nothing or it could be eg. injected into $OPTIONS by this line
. /etc/sysconfig/ntpd
The daemon takes the option -p pidfile where $lockfile could go. The daemon writes its $PID in this file.
do I put the compiled binary directly in /etc/init.d/ or do I put a script there that calls the binary
The latter. There should be no binaries in /etc, and its customary to edit /etc/init.d scripts for configuration changes. Binaries should go to /(s)bin or /usr/(s)bin.
The rc scripts keep track of whether or not it is running and don't bother stopping what is not running.