I have been debugging QEMU with gdb.
To trace unexpected memory accesses I set a hardware watchpoint at a specific address. However, gdb does not stop even though the value at that address changes. This is the first time I have used the hardware watchpoint feature in gdb.
I do not know why this happens and would like to solve the problem.
The following is the gdb console output.
$ gdb --args ./qemu-system-x86_64 -m 512 -hda linux-0.2.img
...
(gdb) x 0x7fffbbe8e000
0x7fffbbe8e000: 0x00000000
(gdb) watch *(int *)0x7fffbbe8e000
Hardware watchpoint 1: *(int *)0x7fffbbe8e000
(gdb) c
Continuing.
[Thread 0x7fffc2dad700 (LWP 3162) exited]
[New Thread 0x7fffc2dad700 (LWP 3169)]
[Thread 0x7fffc2dad700 (LWP 3169) exited]
[New Thread 0x7fffc2dad700 (LWP 3173)]
qemu: /home/nutsman/git_repo/M-QEMU/qemu-2.3.1/exec.c:3007: ldl_phys_internal: Assertion `val1 == val' failed.
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffc23ca700 (LWP 3163)]
0x00007ffff61f4cc9 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) x 0x7fffbbe8e000
0x7fffbbe8e000: 0x6c7cebfa
Thank you, Employed Russian. The memory is user-space and allocated with MAP_PRIVATE, so no other program should be able to change its contents.
Could you suggest alternative tools to find the part of QEMU that changes the value, or system calls that can write to user-space memory?
However, gdb does not stop even though the value at that address changes
GDB can detect when the value is changed by the program itself, running in user space. It can't (and doesn't) detect changes made by the kernel (e.g. as a result of a read(2) or mremap(2) system call). If the address in question is part of a MAP_SHARED mapping and some other process modifies the memory, GDB will not stop either.
Try using software watchpoints. Do set can-use-hw-watchpoints 0 in GDB before setting the watchpoint. That will make GDB check the value of that memory address after each step. It will be painfully slow, but at least you might catch the unintended modification.
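For the address from the session above, the sequence would look roughly like this (a sketch; the exact confirmation messages depend on your GDB version):
$ gdb --args ./qemu-system-x86_64 -m 512 -hda linux-0.2.img
(gdb) set can-use-hw-watchpoints 0
(gdb) watch *(int *)0x7fffbbe8e000
(gdb) continue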
It's possible to map multiple virtual memory addresses (possibly across different processes/paging structures) to the same physical memory address. Building off what Employed Russian said, I'm guessing that the watchpoint looks for writes to the specified virtual memory address, not the physical memory address. If that's true, it won't catch a write to a different virtual memory address that maps to the same physical address.
By control transfer I mean: after the tracee executes a function and returns, which signal is generated so that GDB can wait*() on it and seize control again? It is not SIGTRAP, though many people claim that ...
after the tracee executes a function and returns, which signal is generated so that GDB can wait*() on it and seize control again?
The tracee is stopped, and control is transferred back to GDB, only when one of the "interesting" events happens.
The interesting events are:
A breakpoint fires,
The tracee encounters a signal (e.g. SIGSEGV or SIGFPE as a result of performing invalid memory access or invalid floating-point operation),
The tracee disappears altogether (such as getting SIGKILLed by an outside program),
[There might be other "interesting" events, but I can't think of anything else right now.]
Now, a technically correct answer to "what signal does GDB use ..." is: none at all. Control isn't transferred unless one of the above events happens.
Perhaps your question is: how does control get back to GDB after executing something like finish command (which steps out of the current function)?
The answer to that is: GDB sets a temporary breakpoint on the instruction immediately after the CALL instruction that got us into the current function.
Finally, what causes the kernel to stop the tracee and make waitpid in GDB return upon execution of the breakpoint instruction?
On x86, GDB uses the INT3 (opcode 0xCC) instruction to set breakpoints (there is an alternate mechanism using debug registers, but it is limited to 4 simultaneous breakpoints, and usually reserved for hardware watchpoints instead). When the tracee executes INT3 instruction, SIGTRAP is indeed the signal that the kernel generates (i.e. other answers you've found are correct).
Without knowing what led you to believe it isn't SIGTRAP, it's hard to guess how you convinced yourself that it isn't.
Update:
I tried to manually send a SIGTRAP signal to the tracee, trying to cause a spurious wake-up of GDB, but failed.
Fail in what way?
What I expect you observed is that GDB stopped with Program received signal SIGTRAP .... That's because GDB knows where it has placed its breakpoints.
When GDB receives SIGTRAP and the tracee's instruction pointer matches one of its breakpoints, GDB "knows" that it's the breakpoint that has fired, and acts accordingly.
But when GDB receives SIGTRAP and the tracee IP doesn't match any of the breakpoints, then GDB treats it as any other signal: prints a message and waits for you to tell it what to do next.
"GDB sets a temporary breakpoint ... that means GDB has to modify tracee's code area, which may be read-only. So, how does GDB cope with that?
You are correct: GDB needs to modify (typically non-writable) .text section to insert any breakpoint using INT3 method. Fortunately, that is one of the "superpowers" granted to it by the kernel via ptrace(POKE_TEXT, ...).
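As an illustration of the mechanism (not GDB's actual code), here is a minimal C sketch of how a debugger can plant an INT3 in a stopped tracee with ptrace. The pid and addr values are placeholders; a real debugger gets them by attaching to the process and resolving a symbol, and would also check errno.

#include <sys/ptrace.h>
#include <sys/types.h>

/* Remember the original word so the breakpoint can be removed later.
 * Error handling is omitted for brevity. */
static long saved_word;

void insert_breakpoint(pid_t pid, unsigned long addr)
{
    /* Read the word that currently sits at the breakpoint address. */
    saved_word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, 0);

    /* Overwrite the low byte with 0xCC (INT3).  On little-endian x86
     * the low byte of this word is the first instruction byte. */
    long patched = (saved_word & ~0xffL) | 0xCC;
    ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patched);
}

void remove_breakpoint(pid_t pid, unsigned long addr)
{
    /* Put the original instruction bytes back. */
    ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)saved_word);
}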
P.S. It's a fun exercise to write a program that checksums the code bytes of one of its own functions. You can then perform the checksum before and after placing a breakpoint on the "to be checksummed" function, and observe that the checksum differs when a breakpoint is present.
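A minimal sketch of that exercise, assuming x86/x86-64 Linux with GCC; victim() is a hypothetical function the program never calls, it only sums its code bytes. Run it under GDB, let it print once, press Ctrl-C, type break victim and continue, and the printed checksum should change because GDB has patched an INT3 over the start of victim().

#include <stdio.h>
#include <stddef.h>
#include <unistd.h>

/* Function whose code bytes we checksum; it is never actually called. */
void victim(void)
{
    puts("never called");
}

/* Naive byte sum over n bytes starting at p. */
static unsigned checksum(const unsigned char *p, size_t n)
{
    unsigned sum = 0;
    while (n--)
        sum += *p++;
    return sum;
}

int main(void)
{
    for (;;) {
        /* 32 bytes is arbitrary; it only needs to cover the first
         * instruction, where GDB plants the breakpoint.  Casting a
         * function pointer to a data pointer is not strictly portable
         * C, but works on the usual Linux/GCC targets. */
        printf("checksum of victim(): %u\n",
               checksum((const unsigned char *)victim, 32));
        sleep(2);
    }
}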
P.P.S. If you are curious about what GDB is doing, setting maintenance debug inferior will provide a lot of clues.
Is there an equivalent command in GDB to that of WinDbg's !process 0 7?
I want to extract all the threads in a dump file along with their backtraces in GDB. info threads doesn't output the stack traces. So, is there a command that does?
Generally, backtrace is used to get the stack of the current thread, but if you need the stack traces of all the threads, use the following command.
thread apply all bt
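If you want the whole thing written to a file in one shot (closer in spirit to WinDbg's !process 0 7 dump), a hedged one-liner; the binary and core file names here are placeholders:
$ gdb --batch -ex "thread apply all bt full" ./myprogram ./core > all_threads.txt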
Is there a command that does?
thread apply all where
When debugging with several threads, it is also useful to switch to a particular thread number and get the backtrace for that thread only.
From the GNU GDB threads documentation
For debugging purposes, GDB associates its own thread number--a small integer assigned in thread-creation order--with each thread in your program.
Usage:
info threads
Then identify the thread that you want to look at.
thread <thread_id>
Finally, use backtrace for just that thread:
bt
If your process is running:
pstack $pid
I am trying to determine the state of a specific thread in my software, from another thread.
Specifically, I'd like to know whether it is stuck on I/O.
I was thinking of doing it by getting the backtrace (unless someone has another idea?), since I know which function it's supposed to be stuck in.
But I can't figure out how to get the backtrace of that specific thread without triggering the SEGFAULT handler... yet gdb is able to do it (and I doubt it creates SEGFAULTs).
Can anyone help? Any ideas?
[Edit] All 3 answers refer to gdb. I KNOW I can do it from gdb; I wanted to know how to do it from software (even linking to gdb libs somehow would be an answer, but how?)
I know which function it's supposed to be stuck in, but I can't figure
out how to get the backtrace of that specific thread
You can get the backtraces of all threads and look in the output for the function it is supposed to be stuck in. Here is how to get all backtraces in gdb:
(gdb) thread apply all bt
(gdb) info threads [lists all the threads and marks the thread you are currently on]
(gdb) thread apply all bt [shows the backtrace of every thread, so you can see which one is stuck in the function you are interested in before switching to it]
(gdb) thread <threadno> [switches to the specific thread you are interested in; a bt will then show its backtrace]
Ref http://www.delorie.com/gnu/docs/gdb/gdb_25.html
Since you know which function you think you are getting stuck in, you could set a breakpoint at the beginning of that function. GDB allows you to attach a series of commands to a breakpoint that are automatically executed when the breakpoint is hit, allowing you to print the backtrace for the thread that was executing when the breakpoint was hit.
(gdb) break filename:line
(gdb) commands
Type commands for breakpoint(s) 1, one per line
End with a line saying just "end"
>info threads
>bt
>continue
>end
The above will give you the list of threads, with the * by the active thread for the breakpoint, followed by the backtrace.
I can't get into the specifics, for a variety of reasons, but here's the essential architecture of what I'm working with:
I have a C++ framework, which uses C++ object files built by me to execute a dynamic simulation.
The C++ libraries call, among other things, a shared (.so) library, written in Ada.
As best as I can tell, the Ada library (which is a large collection of nontrivial code) is generating exceptions on fringe cases, but I'm having trouble isolating the function that is generating the exception.
Here's what I'm using:
CentOS 4.8 (Final)
gcc 3.4.6 (w/ gnat)
gdb 6.3.0.0-1.162.el4rh
This is the error I get under normal execution:
terminate called without an active exception
raised PROGRAM_ERROR : unhandled signal
I can get gdb to catch the exception as soon as it returns to the C++, but I can't get it to catch inside the Ada code. I've made sure to compile everything with -g, but that doesn't seem to help the problem.
When I try to catch/break on the signal/exception in gdb (which politely tells me Catch of signal not yet implemented), I get this:
[Thread debugging using libthread_db enabled]
[New thread -1208371520 (LWP 14568)]
terminate called without an active exception
Program received signal SIGABRT, Aborted.
[Switching to thread -1208371520 (LWP 14568)]
0x001327a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
I believe the terminate called [...] line is from the framework. When I try to capture that break, then run a backtrace (bt), I get something like this:
#0 0x001327a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00661825 in raise () from /lib/tls/libc.so.6
#2 0x00663289 in abort () from /lib/tls/libc.so.6
#3 0x0061123e in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0x0060eed1 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#5 0x0060ef06 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0x0060f0a3 in __cxa_rethrow () from /usr/lib/libstdc++.so.6
#7 0x001fe526 in cpputil::ExceptionBase::Rethrow (scope=@0xbfe67470) at ExceptionBase.cpp:140
At that point, it's into the framework code.
I've read several guides and tutorials and man pages online, but I'm at a bit of a loss. I'm hoping that someone here can help get me pointed in the right direction.
It sounds like you're able to compile the Ada source code. Assuming that's the case, in the subprogram(s) being called whose execution raises the exceptions, add an exception handler at the end that dumps the exception information:
   when E : others =>
      Ada.Text_IO.Put_Line(Ada.Exceptions.Exception_Information(E));
      raise;
You'll also need to add a 'with' of Ada.Exceptions to the package. And Ada.Text_IO if that isn't already present.
I'm not sure exactly what you'll get out of that version of GNAT, but it's probably the invocation addresses, which you can then decode using addr2line.
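A hedged example of that decoding step; the library name and address here are placeholders, not values from this post. Note that for an address inside a shared library you want the offset within the library, not the absolute address it happened to be loaded at:
$ addr2line -e libmy_ada_code.so -f -C 0x0001f2a4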
Could you start the C++ framework from an Ada main? If so, and you can propagate the exceptions through the C++ framework to the Ada main, its last chance handler ought to give you a pretty good report with exception, source file and line where it occurred, and a stack dump for addr2line. My experience with these is that the debugger usually isn't needed after that.
I could be off beam here because I haven't used a Gnat anywhere near as old as yours...
I got a core dump that looks very different from the ones I usually get: most of the threads are in __kernel_vsyscall():
9 process 11334 0xffffe410 in __kernel_vsyscall ()
8 process 11453 0xffffe410 in __kernel_vsyscall ()
7 process 11454 0xffffe410 in __kernel_vsyscall ()
6 process 11455 0xffffe410 in __kernel_vsyscall ()
5 process 11474 0xffffe410 in __kernel_vsyscall ()
4 process 11475 0xffffe410 in __kernel_vsyscall ()
3 process 11476 0xffffe410 in __kernel_vsyscall ()
2 process 11477 0xffffe410 in __kernel_vsyscall ()
1 process 11323 0x08220782 in MyClass::myfunc ()
What does that mean?
EDIT:
In particular, I usually see a lot of threads in "pthread_cond_wait" and "___newselect_nocancel" and now those are on the second frame in each thread - why is this core different?
__kernel_vsyscall is the method used by linux-gate.so (a part of the Linux kernel) to make a system call using the fastest available method, preferably the sysenter instruction. The mechanism is properly explained by Johan Petersson.
When you make a system call (like reading from a file, talking to hardware, or writing to a socket) you're traditionally triggering a software interrupt. The kernel then handles the interrupt in kernel mode and your call returns with the result. Most of the time it's unusual to have a lot of threads sitting in a syscall, unless they're making blocking calls, in which case it's expected.
More specifically, it means the thread is waiting on a kernel level system call. But that's (unfortunately for my points) already in the name :)
In addition to the good link already given explaining what linux-gate.so is, I'd like to answer "why is this core different?". Most recent (newer than 2.5.68) 32-bit Linux systems use the VDSO page (aka linux-gate.so.1), and 64-bit systems will soon start doing so as well (the 64-bit VDSO was introduced in kernel 2.6.24).
If you develop on an older system, or with an old glibc, then you would never see __kernel_vsyscall(), either because the kernel didn't create a VDSO at all, or because the (old) glibc doesn't use it even when a VDSO is present.
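A quick way to check whether a VDSO is mapped into processes on the machine you are running on (a sketch; the exact name and address of the mapping vary by kernel):
$ grep -i vdso /proc/self/maps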
As Adam said, the main reason is performance. See this link for some old numbers http://lkml.org/lkml/2002/12/9/13.
If you have a vDSO-enabled kernel, you're not using interrupts to run syscalls as Stefan said; in fact, it was because interrupts were getting slower that the whole vDSO mechanism was added to the kernel.