Efficient variable watching in C/C++

Efficient variable watching in C/C++ - c++

I'm currently writing a multi-threaded, high efficient and scalable algorithm. Because I have to guess a parameter for the code and I'm not sure how the calculation performs on a specific data set, I would like to watch a variable. The test only works with a real world, huge data set. It is possible to analyze the collected data after profiling. Imagine the following, simple code example (real code can contain multiple watch points:
// function get's called by loops of multiple threads
void payload(data_t* data, double threshold) {
double value = calc(data);
// here I want to watch the value
if (value < threshold) {
doSomething(data);
} else {
doSomethingElse(data);
}
}
I thought about the following approaches:
Using cout or other system outputs
Use a binary output (file, network)
Set a breakpoint via gdb/lldb
Use variable watching + logging via gdb/lldb
I'm not happy with the results because: To use 1. and 2. I have to change the code, but this is a debugging/evaluating task. Furthermore 1. requires locking and 1.+2. requires I/O operations, which heavily slows down the entire code and makes testing with real data nearly impossible. 3. is also too slow. To use 4., I have to know the variable address because it's not a global variable, but because threads get created by a dynamic scheduler, this would require breaking + stepping for each thread.
So my conclusion is, that I need a profiler/debugger that works at machine code level and dumps/logs/watches the variable without double->string conversion and is highly efficient, or to sum up with other words: I would like to profile the internal state of my algorithm without heavy slow-down and without doing deep modification. Does anybody know a tool that is able to this?

OK, this took some time but now I'm able to present a solution for my problem. It's called tracepoints. Instead of breaking the program every time, it's more lightweight and (ideally) doesn't change performance/timing too much. It does not require code changes. Here is an explanation how to use them using gdb:
Make sure you compiled your program with debugging symbols (using the -g flag). Now, start the gdb server and provide a network port (e.g. 10000) and the program arguments:
gdbserver :10000 ./program --parameters you --want --to use
Now, switch to a second console and start gdb (program parameters are not required here):
gdb ./program
All following commands are entered in the gdb command line interface. So let's connect to the server:
target remote :10000
After you got the connection confirmation, use trace or ftrace to set a tracepoint to a specific source location (try ftrace first, it should be faster but doesn't work on all platforms):
trace source.c:127
This should create tracepoint #1. Now you can setup an action for this tracepoint. Here I want to collect the data from myVariable
action 1
collect myVariable
end
If expect much data or want to use the data later (after restart), you can set a binary trace file:
tsave trace.bin
Now, start tracing and run the program:
tstart
continue
You can wait for program exit or interrupt your program using CTRL-C (still on gdb console, not on server side). Continue by telling gdb that you want to stop tracing:
tstop
Now we come the tricky part and I'm not really happy with the following code because it's really slow:
set pagination off
set logging file trace.txt
tfind start
while ($trace_frame != -1)
set logging on
printf "%f\n", myVariable
set logging off
tfind
end
This dumps all variable data to a text file. You can add some filter or preparation here. Now you're done and you can exit gdb. This will also shutdown the server:
quit
For detailed documentation especially for explanation of filtering and more advanced tracepoint positions, you can visit the following document: http://sourceware.org/gdb/onlinedocs/gdb/Tracepoints.html
To isolate trace file writing from your program execution, you can use cgroups or another network connected computer. When using another computer, you have to add the host to the port information (e.g. 192.168.1.37:10000). To load a binary trace file later, just start gdb as shown above (forget the server) and change the target:
gdb ./program
target tfile trace.bin

you can set hardware watchpoint using gdb debugger, for example if you have
bool b;
variable and you want to be notified every time the value of it has chenged (by any thread)
you would declare a watchpoint like this:
(gdb) watch *(bool*)0x7fffffffe344
example:
root#comp:~# gdb prog
GNU gdb (GDB) 7.5-ubuntu
Copyright ...
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses...done.
(gdb) watch *(bool*)0x7fffffffe344
Hardware watchpoint 1: *(bool*)0x7fffffffe344
(gdb) start
Temporary breakpoint 2 at 0x40079f: file main.cpp, line 26.
Starting program: /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = true
New value = false
main () at main.cpp:50
50 if (strcmp(mask, "255.0.0.0") != 0) {
(gdb) c
Continuing.
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = false
New value = true
main () at main.cpp:41
41 if (ifa ->ifa_addr->sa_family == AF_INET) { // check it is IP4
(gdb) c
Continuing.
mask:255.255.255.0
eth0 IP Address 192.168.1.5
[Inferior 1 (process 18146) exited normally]
(gdb) q

Related

How to retain a stacktrace when Cortex-M3 gone in hardfault?

Using the following setup:
Cortex-M3 based µC
gcc-arm cross toolchain
using C and C++
FreeRtos 7.5.3
Eclipse Luna
Segger Jlink with JLinkGDBServer
Code Confidence FreeRtos debug plugin
Using JLinkGDBServer and eclipse as debug frontend, I always have a nice stacktrace when stepping through my code. When using the Code Confidence freertos tools (eclipse plugin), I also see the stacktraces of all threads which are currently not running (without that plugin, I see just the stacktrace of the active thread). So far so good.
But now, when my application fall into a hardfault, the stacktrace is lost.
Well, I know the technique on how to find out the code address which causes the hardfault (as seen here).
But this is very poor information compared to full stacktrace.
Ok, some times when falling into hardfault there is no way to retain a stacktrace, e.g. when the stack is corrupted by the faulty code. But if the stack is healty, I think that getting a stacktrace might be possible (isn't it?).
I think the reason for loosing the stacktrace when in hardfault is, that the stackpointer would be swiched from PSP to MSP automatically by the Cortex-M3 architecture. One idea is now, to (maybe) set the MSP to the previous PSP value (and maybe have to do some additional stack preperation?).
Any suggestions on how to do that or other approaches to retain a stacktrace when in hardfault?
Edit 2015-07-07, added more details.
I uses this code to provocate a hardfault:
__attribute__((optimize("O0"))) static void checkHardfault() {
volatile uint32_t* varAtOddAddress = (uint32_t*)-1;
(*varAtOddAddress)++;
}
When stepping into checkHardfault(), my stacktrace looks good like this:
gdb-> backtrace
#0 checkHardfault () at Main.cxx:179
#1 0x100360f6 in GetOneEvent () at Main.cxx:185
#2 0x1003604e in executeMainLoop () at Main.cxx:121
#3 0x1001783a in vMainTask (pvParameters=0x0) at Main.cxx:408
#4 0x00000000 in ?? ()
When run into the hardfault (at (*varAtOddAddress)++;) and find myself inside of the HardFault_Handler(), the stacktrace is:
gdb-> backtrace
#0 HardFault_Handler () at Hardfault.c:312
#1 <signal handler called>
#2 0x10015f36 in prvPortStartFirstTask () at freertos/portable/GCC/ARM_CM3/port.c:224
#3 0x10015fd6 in xPortStartScheduler () at freertos/portable/GCC/ARM_CM3/port.c:301
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The quickest way to get the debugger to give you the details of the state prior to the hard fault is to return the processor to the state prior to the hard fault.
In the debugger, write a script that takes the information from the various hardware registers and restore PC, LR, R0-R14 to the state just prior to causing the hard fault, then do your stack dump.
Of course, this isn't always helpful when you end up at the hard fault because of popping stuff off of a blown stack or stomping on stuff in memory. You generally tend to corrupt a bunch of the important registers, return back to some crazy spot in memory, and then execute whatever's there. You can end up hard faulting many thousands (millions?) of cycles after your real problem happens.

Consider using the following gdb macro to restore the register contents:
define hfstack
set $frame_ptr = (unsigned *)$sp
if $lr & 0x10
set $sp = $frame_ptr + (8 * 4)
else
set $sp = $frame_ptr + (26 * 4)
end
set $lr = $frame_ptr[5]
set $pc = $frame_ptr[6]
bt
end
document hfstack
set the correct stack context after a hard fault on Cortex M
end

What could cause std::difftime to create a SIGBUS crash?

Today, I had to realize to my horror that my C++ simulation program crashed after running for 12 days, just several lines before its end, leaving me with nothing but a (truncated) core dump.
Analysis of the core dump with gdb revealed, that the
Program terminated with signal SIGBUS, Bus error.
and that the crash occured at the following line of my code:
seconds = std::difftime(stopTime, startTime); // seconds is of type double
The variables stopTime and startTime are of type std::time_t and I was able to extract their values at crash time from the core dump:
startTime: 1426863332
stopTime: 1427977226
The stack trace above the difftime-call looks like this:
#0 0x.. in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1 0x.. in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
I wrote a small program to reproduce the error, but without success. Just calling std::difftime(stopTime, startTime) with the above values does not cause a SIGBUS crash. Of course, I don't want that to happen again. I have successfully executed the same program several times before (although with different arguments) with comparable execution times. What could cause this problem and how can I prevent it in the future?
Here is some additional system information.
GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]
Linux Kernel: 3.11.10-25-desktop, x86_64
C++ standard library: 6.0.18
Edit
Here is some more context. First, the complete stack trace (ellipsis [..] mine):
#0 0x00007f309a4a5bca in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1 0x00007f309a4ac195 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
#2 0x0000000000465453 in CStopwatch::getTime (this=0x7fff0db48c60, delimiterHourMinuteSecondsBy="") at [..] CStopwatch.cpp:86
#3 0x00000000004652a9 in CStopwatch::stop (this=0x7fff0db48c60) at [..] CStopwatch.cpp:51
#4 0x0000000000479a0c in main (argc=33, argv=0x7fff0db499c8) at [..] coherent_ofdm_tsync_mse.cpp:998
The problem occurs in an object of class CStopwatch which is created at the beginning of the program. The stopwatch is started in main() at the very top. After the simulation is finished, the function CStopwatch::stop( ) is called.
The constructor of the stopwatch class:
/*
* Initialize start and stop time on construction
*/
CStopwatch::CStopwatch()
{
this->startTime = std::time_t( 0 );
this->stopTime = std::time_t( 0 );
this->isRunning = false;
}
The function CStopwatch::stop( )
/*
* Stop the timer and return the elapsed time as a string
*/
std::string CStopwatch::stop( )
{
if ( this->isRunning ) {
this->stopTime = std::time( 0 );
}
this->isRunning = false;
return getTime( );
}
The function CStopwatch::getTime()
/*
* Return the elapsed time as a string
*/
std::string CStopwatch::getTime( std::string delimiterHourMinuteSecondsBy )
{
std::ostringstream timeString;
// ...some string init
// time in seconds
double seconds;
if ( this->isRunning ){
// return instantaneous time
seconds = std::difftime(time(0), startTime);
} else {
// return stopped time
seconds = std::difftime(stopTime, startTime); // <-- line where the
// program crashed
}
// ..convert seconds into a string
return timeString.str( );
}
At the beginning of the program CStopwatch::start( ) is called
/*
* Start the timer, if watch is already running, this is effectively a reset
*/
void CStopwatch::start( )
{
this->startTime = std::time( 0 );
this->isRunning = true;
}

There are only a few reasons that a program may receive SIGBUS on Linux. Several are listed in answers to this question.
Look in /var/log/messages around the time of the crash, it is likely that you'll find that there was a disk failure, or some other cause for kernel unhappiness.
Another (unlikely) possibility is that someone updated libstdc++.so.6 while your program was running, and has done so incorrectly (by writing over existing file, rather than removing it and creating new file in its place).

It looks like std::difftime is being lazily loaded on its first access; if some of the runtime linker's internal state had been damaged elsewhere in your program, it could cause this.
Note that _dl_runtime_resolve would have to complete before the std::difftime call can begin, so the error is unlikely to be with your time values. You can easily verify by opening the core file in gdb:
(gdb) frame 2 # this is CStopwatch::getTime
(gdb) print this
(gdb) print *this
etc. etc.
If gdb is able to read and resolve the address, and the values look sane, that definitely didn't cause a SIGBUS at runtime. Alternatively, it's possible your stack is smashed; if _dl_fixup is preparing the trampoline jump rather than just handling relocation etc.; we can't be certain without looking at the code, but can check the stack itself:
(gdb) print %rsp
(gdb) x/16xb $rsp-16 # print the top 16 bytes of the stack
The easy workaround to try is setting the LD_BIND_NOW environment variable and forcing symbol resolution at startup. This just hides the problem though, because some memory is still getting damaged somewhere, and we're only hiding the symptom.
As for fixing the problem properly - even if short runs don't exhibit the error, it's possible some memory damage is occurring but is asymptomatic. Try running a shorter simulation under valgrind and fix all warnings and errors unless you're certain they're benign.

Impossible to tell without further context, but:
this could be null or corrupt
startTime could be a null reference
stopTime could be a null reference
I was going to suggest you set a breakpoint on the line and print out stopTime and startTime, but you've already nearly done that by looking at the core file.
It looks as if something is going wrong linking the function in. Might it be that you are compiling against a different set of headers from the standard library you are linking to?
It may just be memory related:
if this is deeply nested, you might simply have a stack overflow.
if this is the first time it's being called, perhaps it is trying to allocate memory for the library, load it in, and link it, and that failed due to hitting a memory limit
If this code path is called many many times, and never crashes elsewhere, maybe it's time to run memtest86 overnight.

changing thread number doesn't affect code

I am trying to learn xeon-phi , and while studying the Intel Xeon-Phi Coprocessor HPC book , I tried to run the code here. (from book)
The code uses openmp and 2 threads.
But the results I am taking are the same as running with 1 thread.
( no use of openmp at all )
I even used in mic different combinations but still the same:
export OMP_NUM_THREADS=2
export MIC_OMP_NUM_THREADS=124
export MIC_ENV_PREFIX=MIC
It seems that somehow openmp is not enabled?Am I missing something here?
The code using only 1 thread is here
I compiled using:
icc -mmic -openmp -qopt-report -O3 hello.c
Thanks!

I am not sure exactly which book you are talking about, but perhaps this will help.
The code you show does not use the offload programming style and must be run natively on the the coprocessor, meaning you copy the executable to the coprocessor and run it there or you use the micnativeloadex utility to run the code from the host processor. You show that you know the code must be run natively because you compiled it with the -mmic option.
If you use micnativeloadex, then the number of omp threads on the coprocessor is set by executing "export MIC_OMP_NUM_THREADS=124" on the host. If you copy the executable to the coprocessor and then log in to run it there, the number of omp threads on the coprocessor is set by executing "export OMP_NUM_THREADS=124" on the coprocessor. If you use "export OMP_NUM_THREADS=2" on the coprocessor, you get only two threads; the MIC_OMP_NUM_THREADS environment variable is not used if you set it directly on the coprocessor.
I don't see any place in the code where it prints out the number of threads, so I don't know for sure how you determined the number of threads actually being used. I suspect you were using a tool like micsmc. However micsmc tells you how may cores are in use, not how many threads are in use.
By default, the omp threads are laid out in order, so that the first core would run threads 0,1,2,3, the second core would run threads 4,5,6,7 and so on. If you are using only two threads, both threads would run on the first core.
So, is that what you are seeing - not that you are using only one thread but instead that you are using only one core?

I was looking at the serial version of the code you are using. For the following lines:
for(j=0; j<MAXFLOPS_ITERS; j++)
{
//
// scale 1st array and add in the 2nd array
// example usage - y = mx + b;
//
for(k=0; k<LOOP_COUNT; k++)
{
fa[k] = a * fa[k] + fb[k];
}
}
I see that here you do not scan the complete array. Instead you keep on updating the first 128 (LOOP_COUNT) elements of the array Fa. If you wish to compare this serial version to the parallel code you are referring to, then you will have to ensure that the program does same amount of work in both versions.
Thanks

I noticed three things in your first program omp:
the total floating point operations should reflect the number of threads doing the work. Therefore,
gflops = (double)( 1.0e-9*LOOP_COUNTMAXFLOPS_ITERSFLOPSPERCALC*numthreads);
You harded code the number of thread = 2. If you want to use the OMP env variable, you should comment out the API "omp_set_num_threads(2);"
After transferring the binary to the coprocessor, to set the OMP env variable in the coprocessor please use OMP_NUM_THREADS, and not MIC_OMP_NUM_THREADS. For example, if you want 64 threads to run your program in the coprocessor:
% ssh mic0
% export OMP_NUM_THREADS=64

Run part of program inside Fortran code for a limited time

I wanted to run a code (or an external executable) for a specified amount of time. For example, in Fortran I can
call system('./run')
Is there a way I can restrict its run to let's say 10 seconds, for example as follows
call system('./run', 10)
I want to do it from inside the Fortran code, example above is for system command, but I want to do it also for some other subroutines of my code. for example,
call performComputation(10)
where performComputation will be able to run only for 10 seconds. The system it will run on is Linux.
thanks!

EDITED
Ah, I see - you want to call a part of the current program a limited time. I see a number of options for that...
Option 1
Modify the subroutines you want to run for a limited time so they take an additional parameter, which is the number of seconds they may run. Then modify the subroutine to get the system time at the start, and then in their processing loop get the time again and break out of the loop and return to the caller if the time difference exceeds the maximum allowed number of seconds.
On the downside, this requires you to change every subroutine. It will exit the subroutine cleanly though.
Option 2
Take advantage of a threading library - e.g. pthreads. When you want to call a subroutine with a timeout, create a new thread that runs alongside your main program in parallel and execute the subroutine inside that thread of execution. Then in your main program, sleep for 10 seconds and then kill the thread that is running your subroutine.
This is quite easy and doesn't require changes to all your subroutines. It is not that elegant in that it chops the legs off your subroutine at some random point, maybe when it is least expecting it.
Imagine time running down the page in the following example, and the main program actions are on the left and the subroutine actions are on the right.
MAIN SUBROUTINE YOUR_SUB
... something ..
... something ...
f_pthread_create(,,,YOUR_SUB,) start processing
sleep(10) ... calculate ...
... calculate ...
... calculate ...
f_pthread_kill()
... something ..
... something ...
Option 3
Abstract out the subroutines you want to call and place them into their own separate executables, then proceed as per my original answer below.
Whichever option you choose, you are going to have to think about how you get the results from the subroutine you are calling - will it store them in a file? Does the main program need to access them? Are they in global variables? The reason is that if you are going to follow options 2 or 3, there will not be a return value from the subroutine.
Original Answer
If you don't have timeout, you can do
call system('./run & sleep 10; kill $!')

Yes there is a way. take a look at the linux command timeout
# run command for 10 seconds and then send it SIGTERM kill message
# if not finished.
call system('timeout 10 ./run')
Example
# finishes in 10 seconds with a return code of 0 to indicate success.
sleep 10
# finishes in 1 second with a return code of `124` to indicate timed out.
timeout 1 sleep 10
You can also choose the type of kill signal you want to send by specifying the -s parameter. See man timeout for more info.

How to run record instruction-history and function-call-history in GDB?

(EDIT: per the first answer below the current "trick" seems to be using an Atom processor. But I hope some gdb guru can answer if this is a fundamental limitation, or whether there adding support for other processors is on the roadmap?)
Reverse execution seems to be working in my environment: I can reverse-continue, see a plausible record log, and move around within it:
(gdb) start
...Temporary breakpoint 5 at 0x8048460: file bang.cpp, line 13.
Starting program: /home/thomasg/temp/./bang
Temporary breakpoint 5, main () at bang.cpp:13
13 f(1000);
(gdb) record
(gdb) continue
Continuing.
Breakpoint 3, f (d=900) at bang.cpp:5
5 if(d) {
(gdb) info record
Active record target: record-full
Record mode:
Lowest recorded instruction number is 1.
Highest recorded instruction number is 1005.
Log contains 1005 instructions.
Max logged instructions is 200000.
(gdb) reverse-continue
Continuing.
Breakpoint 3, f (d=901) at bang.cpp:5
5 if(d) {
(gdb) record goto end
Go forward to insn number 1005
#0 f (d=900) at bang.cpp:5
5 if(d) {
However the instruction and function histories aren't available:
(gdb) record instruction-history
You can't do that when your target is `record-full'
(gdb) record function-call-history
You can't do that when your target is `record-full'
And the only target type available is full, the other documented type "btrace" fails with "Target does not support branch tracing."
So quite possibly it just isn't supported for this target, but as it's a mainstream modern one (gdb 7.6.1-ubuntu, on amd64 Linux Mint "Petra" running an "Intel(R) Core(TM) i5-3570") I'm hoping that I've overlooked a crucial step or config?

It seems that there is no other solution except a CPU that supports it.
More precisely, your kernel has to support Intel Processor Tracing (Intel PT). This can be checked in Linux with:
grep intel_pt /proc/cpuinfo
See also: https://unix.stackexchange.com/questions/43539/what-do-the-flags-in-proc-cpuinfo-mean
The commands only works in record btrace mode.
In the GDB source commit beab5d9, it is nat/linux-btrace.c:kernel_supports_pt that checks if we can enter btrace. The following checks are carried out:
check if /sys/bus/event_source/devices/intel_pt/type exists and read the type
do a syscall (SYS_perf_event_open, &attr, child, -1, -1, 0); with the read type, and see if it returns >=0. TODO: why not use the C wrapper?
The first check fails for me: the file does not exist.
Kernel side
cd into the kernel 4.1 source and:
git grep '"intel_pt"'
we find arch/x86/kernel/cpu/perf_event_intel_pt.c which sets up that file. In particular, it does:
if (!test_cpu_cap(&boot_cpu_data, X86_FEATURE_INTEL_PT))
goto fail;
so intel_pt is a pre-requisite.
How I've found kernel_supports_pt
First grep for:
git grep 'Target does not support branch tracing.'
which leads us to btrace.c:btrace_enable. After a quick debug with:
gdb -q -ex start -ex 'b btrace_enable' -ex c --args /home/ciro/git/binutils-gdb/install/bin/gdb --batch -ex start -ex 'record btrace' ./hello_world.out
Virtual box does not support it either: Extract execution log from gdb record in a VirtualBox VM
Intel SDE
Intel SDE 7.21 already has this CPU feature, checked with:
./sde64 -- cpuid | grep 'Intel processor trace'
But I'm not sure if the Linux kernel can be run on it: https://superuser.com/questions/950992/how-to-run-the-linux-kernel-on-intel-software-development-emulator-sde
Other GDB methods
More generic questions, with less efficient software solutions:
call graph: List of all function calls made in an application
instruction trace: Displaying each assembly instruction executed in gdb

At least a partial answer (for the "am I doing it wrong" aspect) - from gdb-7.6.50.20140108/gdb/NEWS
* A new record target "record-btrace" has been added. The new target
uses hardware support to record the control-flow of a process. It
does not support replaying the execution, but it implements the
below new commands for investigating the recorded execution log.
This new recording method can be enabled using:
record btrace
The "record-btrace" target is only available on Intel Atom processors
and requires a Linux kernel 2.6.32 or later.
* Two new commands have been added for record/replay to give information
about the recorded execution without having to replay the execution.
The commands are only supported by "record btrace".
record instruction-history prints the execution history at
instruction granularity
record function-call-history prints the execution history at
function granularity
It's not often that I envy the owner of an Atom processor ;-)
I'll edit the question to refocus upon the question of workarounds or plans for future support.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js