On a large C application, I have set a hardware watchpoint on a memory address as follows:
(gdb) watch *0x12F5D58
Hardware watchpoint 3: *0x12F5D58
As you can see, it's a hardware watchpoint, not a software one, so the usual explanation for watchpoint slowness (software watchpoints single-stepping the program) should not apply.
Now the application's running time under the debugger has gone from less than ten seconds to one hour and counting. The watchpoint has triggered three times so far, the first time after 15 minutes, when the memory page containing the address was made readable by sbrk. Surely during those 15 minutes the watchpoint should have been cheap, since the memory page was inaccessible? And that still does not explain why it is so slow afterwards.
The platform is x86_64 and the GDB versions are the Ubuntu 9.10 package:
$ gdb --version
GNU gdb (GDB) 7.0-ubuntu
[...]
and stock GDB 7.1 built from sources:
$ gdb-7.1 --version
GNU gdb (GDB) 7.1
Thanks in advance for any ideas as to what might be the cause, or how to fix or work around it.
EDIT: removed cast
EDIT: gdb 7.1
I discovered that watching a large character buffer was very slow, whereas watching a character in that buffer was very fast.
e.g.
static char buf[1024];
static char *buf_address = buf; /* note: &buf would have type char (*)[1024] */
watch buf - excruciatingly slow.
watch *buf_address - very fast.
I've actually had trouble with hardware watchpoints in GDB 7.x, which is not acceptable, since watchpoints are a necessity in my job.
On advice from a co-worker, I downloaded the source for 6.7.1 and built it locally. Watchpoints work much better now.
Might be worth a try.
It's most likely because you're casting it each time. Try this:
(gdb) watch *0x12F5D58
Another option is that you have too many hardware watchpoints set, so gdb is forced to use software watchpoints. Try checking how many watchpoints you have using:
(gdb) info break
and see if you can disable some watchpoints.
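For example (an illustrative session; the numbers are made up), watchpoints show up as hw watchpoint entries, and disable takes the breakpoint number:
(gdb) info break
Num     Type           Disp Enb Address            What
2       hw watchpoint  keep y                      *0x12F5D58
3       hw watchpoint  keep y                      *0x12F5D60
(gdb) disable 3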
On x86 you have the following limitation: all your watchpoints together can cover no more than four memory addresses, and each watched address covers at most one memory word. This is because hardware watchpoints (the fast ones) use the processor's debug registers, and you have four of them, hence four locations to watch.
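An illustrative consequence (made-up addresses): a fifth hardware watchpoint is accepted when you type it, but fails as soon as the program resumes:
(gdb) watch *0x1014
Hardware watchpoint 5: *0x1014
(gdb) continue
Continuing.
Could not insert hardware watchpoints:
You may have requested too many hardware breakpoints/watchpoints.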
Related
I see it is possible in GDB to set a breakpoint which will fire when a specific memory address is read or written.
I am wondering how it works. Does GDB keep some sort of copy of the process memory and check what has changed after each instruction? Or is there a syscall or kernel feature for that?
(Intel x86 32- and 64-bit architectures)
I am wondering how it works.
There are two ways: software watchpoints and hardware watchpoints (only available on some architectures).
Software watchpoints work by single-stepping the application and checking whether the value has changed after every instruction. These are painfully slow (1000x slower), and in practice aren't usable for anything other than a toy program. They also can't detect access, only a change of the value in the watched location.
Hardware watchpoints require processor support. Intel x86 chips have debug registers, which can be programmed to watch for access (awatch, rwatch) or change (watch) of a given memory location. When the processor detects that the location of interest has been accessed, it raises a debug exception, which the OS translates into a signal, and (as usual) the signal is delivered to the debugger before the target sees it.
HW watchpoints execute at native speed, but (on x86) you can have only up to 4 distinct addresses (in practice, I've never needed more than 2).
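As a concrete sketch (counter here stands for a hypothetical global variable in your program), the three flavors are set like this; watch fires on writes that change the value, rwatch on reads, and awatch on both:
(gdb) watch counter
Hardware watchpoint 1: counter
(gdb) rwatch counter
Hardware read watchpoint 2: counter
(gdb) awatch counter
Hardware access (read/write) watchpoint 3: counter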
Does execution of the current instruction fire a read watchpoint at the EIP address?
It should. You could trivially answer this yourself. Just try it.
Does a push onto the stack fire a write watchpoint on the stack memory address?
Likewise.
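A sketch of such an experiment (the two addresses are made-up examples of what print would show on your machine):
(gdb) print $pc
$1 = (void (*)()) 0x401136 <main+6>
(gdb) rwatch *(int *) 0x401136
Hardware read watchpoint 1: *(int *) 0x401136
(gdb) print $sp
$2 = (void *) 0x7fffffffe2a0
(gdb) watch *(long *) 0x7fffffffe2a0
Hardware watchpoint 2: *(long *) 0x7fffffffe2a0
(gdb) continue
Whether either watchpoint fires on an instruction fetch or a push is exactly what the experiment will tell you.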
I am compiling a C++ project which is not too large, about a 6 MB binary. When I debug it and want to print some variable, I type the first two characters and press Tab to complete. Then GDB reads symbols forever and freezes. How can I solve this problem? Thank you!
I type the first two characters and press Tab to complete. Then GDB reads symbols forever and freezes. How can I solve this problem
Doctor, it hurts when I do that.
Well, don't do that.
Seriously, if you have a very large binary (it's unclear whether your 6MB is the size with debug info or without), and lots of variables, then GDB will necessarily have to spend some time searching for variables matching your two initial characters.
That said,
we routinely debug binaries that are 2GB in size or larger, and
have spent quite a lot of effort improving GDB experience with such binaries
So perhaps your first step should be to take the latest release of GDB, and see if the problem has already been solved for you.
Update:
My binary is 6MB with debug info
That's not large at all. Certainly it should not cause more than a few seconds delay to list all variables in such a binary.
My GDB version is "GNU gdb (GDB) 7.6.2"
That's the latest release.
It's probably safe to conclude that there is a bug in GDB.
If you can construct a minimal test case that shows the problem, then your best bet is to report it as a bug in http://sourceware.org/bugzilla.
If you can't, you'll have to debug GDB yourself. A reasonable place to start is running strace -p <pid-of-hung-gdb> and gdb -p <pid-of-hung-gdb>; (gdb) where to find out exactly where GDB is getting stuck.
If you can update to GDB 7.10, your tab-completion freeze-ups should disappear.
GDB 7.10 (as of August 2015) contains a feature to address this problem.
set max-completions
Set the maximum number of candidates to be considered during
completion. The default value is 200. This limit allows GDB to avoid
generating large completion lists, the computation of which can cause
the debugger to become temporarily unresponsive.
[The above quote is taken from the patch shown on the gitweb site for gdb]
The GDB news release lists the feature as: "The number of candidates to be considered during completion can now be limited."
Updating to GDB 7.10 solved the problem for me. The default value of 200 for max-completions was sufficient. I did not customize it.
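For reference, the relevant commands look like this (50 is just an example value; unlimited is also accepted):
(gdb) set max-completions 50
(gdb) show max-completions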
I've been trying to debug a segmentation fault recently using valgrind on my Raspberry Pi (Model B), running Debian GNU/Linux 7.0 (wheezy). Every time I run valgrind on a compiled C++ program, I get something like the following:
disInstr(arm): unhandled instruction: 0xF1010200
cond=15(0xF) 27:20=16(0x10) 4:4=0 3:0=0(0x0)
valgrind: Unrecognized instruction at address 0x4843638.
at 0x4843638: ??? (in /usr/lib/arm-linux-gnueabihf/libconfi_rpi.so)
Then the normal valgrind output follows, with a SIGILL raised and my program terminated. At first I assumed there was some memory error in my program that was causing it to execute a piece of non-instruction memory as an instruction, but then I ran the following hello-world code and got the same result.
#include <iostream>
using namespace std;

int main() {
    cout << "Hello World" << endl;
    return 0;
}
There can't possibly be a memory leak/segfault in that, so why is it giving me this error?
I'm pretty new to valgrind, but I ran it with the most basic invocation: valgrind ./a.out.
From your code (a simple hello world), valgrind complains about an unrecognized instruction at address 0x4843638. My guess is:
Valgrind needs to intercept calls to your malloc function (C standard library). This lets it track how many resources you allocated and freed, which is what memory-leak detection (for example) is based on. If valgrind does not recognize your standard library environment (or the instruction set of your processor), it may not behave as expected, which would be the cause of your crash. You should check your valgrind version and download one built for your platform.
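Checking the installed version is a one-liner (your output will differ):
$ valgrind --version
valgrind-3.7.0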
EDIT:
http://valgrind.org/docs/manual/faq.html
3.3. My program dies, printing a message like this along the way:
vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x2E 0x5
One possibility is that your program has a bug and erroneously jumps
to a non-code address, in which case you'll get a SIGILL signal.
Memcheck may issue a warning just before this happens, but it might
not if the jump happens to land in addressable memory.
Another possibility is that Valgrind does not handle the instruction.
If you are using an older Valgrind, a newer version might handle the
instruction. However, all instruction sets have some obscure, rarely
used instructions. Also, on amd64 there are an almost limitless number
of combinations of redundant instruction prefixes, many of them
undocumented but accepted by CPUs. So Valgrind will still have
decoding failures from time to time. If this happens, please file a
bug report.
EDIT2:
From Wikipedia, the Raspberry Pi CPU:
700 MHz ARM1176JZF-S core (ARM11 family, ARMv6 instruction set)[3]
2.11. Limitations
On ARM, essentially the entire ARMv7-A instruction set is supported,
in both ARM and Thumb mode. ThumbEE and Jazelle are not supported.
NEON, VFPv3 and ARMv6 media support is fairly complete.
Your program/library just happened to contain an instruction which is not supported yet.
On a Raspberry Pi 3 with a NOOBS install of Raspbian, implement ayke's answer by doing the following in a terminal window:
cd /etc
sudo nano ld.so.preload
Remove the line that includes "libarmmem.so" (/usr/lib/arm-linux-gnueabihf/libarmmem.so)
Save and exit ld.so.preload
Run valgrind
Place the line back into ld.so.preload after valgrind testing is complete
The pre-loaded "libarmmem.so" contains the instruction "setend" in the "memcmp" function which causes the unhandled instruction error. The standard library (used when the pre-loaded "libarmmem.so" library is not loaded) does not include the "setend" instruction in "memcmp".
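You can check this yourself (assuming binutils is installed) by disassembling the preloaded library and searching for the instruction; any hits confirm that the library contains the setend that valgrind cannot handle:
$ objdump -d /usr/lib/arm-linux-gnueabihf/libarmmem.so | grep setend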
TL;DR: remove the package raspi-copies-and-fills if you're using Raspbian. It may also work in some other Linux variations like NOOBS.
As Phong already noted, this instruction is not supported by Valgrind.
There is a bug report which explains the issue:
This keeps cropping up, for example most recently in bug 366464.
Maybe I should explain more why this isn't supported. It's because we
don't have a feasible way to do it. Valgrind's JIT instruments code
blocks as they are first visited, and the endianness of the current
blocks are "baked in" to the instrumentation. So there are two
options:
(1) when a SETEND instruction is executed, throw away all the JITted code that Valgrind has created, and JIT new code blocks with the new endianness.
(2) JIT code blocks in an endian-agnostic way and have a runtime test
for each memory access, to decide on whether to call a big or little
endian instrumentation helper function.
(1) gives zero performance overhead for code that doesn't use SETEND
but a gigantic (completely infeasible) hit for code that does.
(2) makes endian changes free, but penalises all memory traffic
regardless of whether SETEND is actually used.
So I don't find either of those acceptable. And I can't think of any
other way to implement it.
In other words, it is hard to implement this instruction in valgrind.
Summarizing the thread: the most common source of this instruction is the set of faster memory-management functions that the Raspberry Pi ships (memcmp, memset, etc.).
I solved it by (temporarily) removing raspi-copies-and-fills from my Raspbian install.
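On Raspbian that is simply the following; the package can be reinstalled the same way once you are done with valgrind:
$ sudo apt-get remove raspi-copies-and-fills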
Valgrind is apparently problematic on Raspberry Pi:
https://web.archive.org/web/20131003042418/http://www.raspberrypisoft.com/tag/valgrind/
I suggest using other tools to find the seg fault.
I am using Ubuntu 12.04. So far I have used anjuta and codelite as IDEs for C++ school projects.
However, with both of them I have encountered one problem:
After starting the debugger, everything works fine until I try to add an array to the watches section. It does not display anything, and when I try to continue debugging it freezes and I have to stop the debug session. I have to mention that watching variables works well.
Thank you,
LE: Actually, the debugger freezes only in the case of large arrays... it may be a bug in codelite then. Any opinion?
I have to mention that watching variables works well.
When you set the watchpoint on a variable, GDB probably says Hardware watchpoint N (but your IDE may be hiding that message).
When you set a watchpoint on anything larger than 8 bytes on an x86 processor, GDB cannot set a hardware watchpoint (because x86 hardware doesn't support such watchpoints). GDB sets a software watchpoint instead. Software watchpoints are implemented as follows:
single-step the program
did the value change? No: go to step 1. Yes: stop.
Software watchpoints are really slow. If you watch your system with top, you'll likely discover that GDB is consuming 100% CPU.
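To see the effect for yourself, you can deliberately force software watchpoints even for a small variable and observe the same slowdown (some_int is a stand-in for any scalar in your program; note that GDB then announces a plain "Watchpoint" rather than "Hardware watchpoint"):
(gdb) set can-use-hw-watchpoints 0
(gdb) watch some_int
Watchpoint 1: some_int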
If you really need to watch an entire array, this answer shows how that can be done with valgrind.
In the past, I have been debugging executables loaded in the internal SRAM of my Cortex M3 (STM32F2) without problems. I have recently been loading my executable to Flash (because of size issues).
Ever since, debugging with GDB has not been working. As I understand, when the executable is in Flash, only hardware breakpoint can be used (as opposed to software breakpoints), and I have six hardware breakpoints. However, when setting just one hardware breakpoint GDB yields an error message:
(gdb) break main
Breakpoint 1 at 0x800019a: file src/main.c, line 88.
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.
(gdb) Warning:
Cannot insert hardware breakpoint 1.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
What could be going wrong? Have my hardware breakpoints been taken in the background?
Note: I used OpenOCD to load the executable through JTAG.
So, there are basically two ways (plus one really bad way) that breakpoints can be implemented on any given debugger/platform combination:
Use some hardware capabilities ("hardware breakpoints") to cause the processor to trap when it hits a particular address. This is typically restricted to just a couple of breakpoints, if it's available at all.
For each breakpoint that's being set, replace the instruction at the breakpoint with a "trap" instruction of some variety (i.e., an instruction that will break into the debugger). When one of the breakpoints is hit, swap the original instruction back in and single-step once to make it run.
Single-step through the whole program. This one doesn't really count, because it's horrifically slow.
It sounds as though your debugger is only using method #2 ("software breakpoints"). The gotcha with this method is that it requires the program memory to be writable, and flash memory isn't writable one instruction at a time, so this technique won't work.
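In GDB you can request method #1 explicitly with hbreak, which fails immediately instead of silently converting when no hardware slot is free (the address and source location below are taken from your own session for illustration):
(gdb) hbreak main
Hardware assisted breakpoint 2 at 0x800019a: file src/main.c, line 88.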