I got a core that looks very different from the ones I usually get - most of the threads are in __kernel_vsyscall():
9 process 11334 0xffffe410 in __kernel_vsyscall ()
8 process 11453 0xffffe410 in __kernel_vsyscall ()
7 process 11454 0xffffe410 in __kernel_vsyscall ()
6 process 11455 0xffffe410 in __kernel_vsyscall ()
5 process 11474 0xffffe410 in __kernel_vsyscall ()
4 process 11475 0xffffe410 in __kernel_vsyscall ()
3 process 11476 0xffffe410 in __kernel_vsyscall ()
2 process 11477 0xffffe410 in __kernel_vsyscall ()
1 process 11323 0x08220782 in MyClass::myfunc ()
What does that mean?
EDIT:
In particular, I usually see a lot of threads in "pthread_cond_wait" and "___newselect_nocancel" and now those are on the second frame in each thread - why is this core different?
__kernel_vsyscall is the method used by linux-gate.so (a part of the Linux kernel) to make a system call using the fastest available mechanism, preferably the sysenter instruction. This is properly explained by Johan Petersson.
When you make a system call (like reading from a file, talking to hardware, or writing to sockets) you're traditionally triggering a software interrupt. The system then handles the interrupt in kernel mode and your call returns with the result. It's unusual to have a lot of threads in a syscall unless they're making blocking calls, in which case it's expected.
More specifically, it means the thread is waiting on a kernel level system call. But that's (unfortunately for my points) already in the name :)
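To make the "blocking calls" point concrete, here is a minimal sketch (my own toy example, not code from the question) of a process whose threads are all parked in blocking calls. A core taken from it on a VDSO-enabled 32-bit system would look much like the one above: __kernel_vsyscall() in frame #0 of each blocked thread, with the real blocking call (read, select, pthread_cond_wait) one frame further down.
// Toy example: every thread below ends up blocked inside a system call.
// Build with: g++ -pthread blocked_threads.cpp
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

static int pipe_fds[2];
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

static void* reader(void*) {          // blocks in read(2)
    char buf[1];
    read(pipe_fds[0], buf, sizeof buf);
    return 0;
}

static void* selector(void*) {        // blocks in select(2)
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(pipe_fds[0], &rfds);
    select(pipe_fds[0] + 1, &rfds, 0, 0, 0);
    return 0;
}

static void* waiter(void*) {          // blocks in pthread_cond_wait
    pthread_mutex_lock(&mtx);
    pthread_cond_wait(&cv, &mtx);
    pthread_mutex_unlock(&mtx);
    return 0;
}

int main() {
    pipe(pipe_fds);
    pthread_t t[3];
    pthread_create(&t[0], 0, reader, 0);
    pthread_create(&t[1], 0, selector, 0);
    pthread_create(&t[2], 0, waiter, 0);
    pause();                          // the main thread blocks as well
}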
In addition to the already given good link explaining what linux-gate.so is, I'd like to answer "why is this core different?". Most recent (newer than 2.5.68) 32-bit Linux systems use the VDSO page (aka linux-gate.so.1), and 64-bit systems will soon start doing so as well (a 64-bit VDSO was introduced in kernel 2.6.24).
If you develop on an older system, or with an old glibc, then you would never see __kernel_vsyscall(), either because the kernel didn't create VDSO at all, or because (old) glibc doesn't use it even when VDSO is present.
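As a side note, if you want to check whether a given process actually got a VDSO, you can either grep /proc/<pid>/maps for [vdso] (or linux-gate on old 32-bit systems), or ask the auxiliary vector. The sketch below assumes a glibc new enough to provide getauxval (2.16+), so it obviously won't work on the really old systems discussed above.
// Prints where the kernel mapped the VDSO for this process (0 means none).
// Assumes glibc >= 2.16 for getauxval().
#include <sys/auxv.h>
#include <elf.h>
#include <cstdio>

int main() {
    unsigned long vdso = getauxval(AT_SYSINFO_EHDR);
    if (vdso != 0)
        std::printf("VDSO mapped at 0x%lx\n", vdso);
    else
        std::printf("no VDSO in this process\n");
}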
As Adam said, the main reason is performance. See this link for some old numbers: http://lkml.org/lkml/2002/12/9/13.
If you have a vDSO-enabled kernel, you're not using interrupts to make syscalls, as Stefan said; in fact, it was because interrupts were getting slower that the whole vDSO mechanism was added to the kernel.
I have debugged QEMU with gdb.
To trace unexpected memory accesses, I set a hardware watchpoint at a specific address. However, gdb does not stop when the value at that address is changed. This is the first time I have used the hardware watchpoint feature in gdb.
I do not know why this happened, and would like to solve this problem.
The following is the gdb console output.
$ gdb --args ./qemu-system-x86_64 -m 512 -hda linux-0.2.img
...
(gdb) x 0x7fffbbe8e000
0x7fffbbe8e000: 0x00000000
(gdb) watch *(int *)0x7fffbbe8e000
Hardware watchpoint 1: *(int *)0x7fffbbe8e000
(gdb) c
Continuing.
[Thread 0x7fffc2dad700 (LWP 3162) exited]
[New Thread 0x7fffc2dad700 (LWP 3169)]
[Thread 0x7fffc2dad700 (LWP 3169) exited]
[New Thread 0x7fffc2dad700 (LWP 3173)]
qemu: /home/nutsman/git_repo/M-QEMU/qemu-2.3.1/exec.c:3007: ldl_phys_internal: Assertion `val1 == val' failed.
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffc23ca700 (LWP 3163)]
0x00007ffff61f4cc9 in __GI_raise (sig=sig#entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) x 0x7fffbbe8e000
0x7fffbbe8e000: 0x6c7cebfa
Thank you, Employed Russian. The memory is user-space and allocated with MAP_PRIVATE, so no other program should be able to change its contents.
Can you suggest alternative tools to find the part of QEMU which changes the value, or system calls which can write to user-space memory?
However, gdb does not stop while the value in the address is changed
GDB can detect when the value is changed while the program is running in userspace. It can't (and doesn't) detect changes made by the kernel (e.g. as a result of a read(2) or mremap(2) system call). If the address in question is part of a MAP_SHARED mapping, and some other process modifies the memory, GDB will not stop either.
Try using software watchpoints. Do set can-use-hw-watchpoints 0 in GDB before setting the watchpoint. That will make GDB check the value of that memory address after each step. It will be painfully slow, but at least you might catch the unintended modification.
It's possible to map multiple virtual memory addresses (possibly across different processes/paging structures) to the same physical memory address. Building off what Employed Russian said, I'm guessing that the watchpoint looks for writes to the specified virtual memory address, not the physical memory address. If that's true, it won't catch a write to a different virtual memory address that maps to the same physical address.
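To illustrate the "kernel writes the memory" case from the answer above, here is a toy sketch (my own example, not QEMU code). Watch target with the default hardware watchpoint and it will typically not fire, because the new value is written by the kernel inside the read(2) call; after set can-use-hw-watchpoints 0, the (much slower) software watchpoint does catch the change.
// (gdb) watch target                    <- hardware watchpoint usually misses this
// (gdb) set can-use-hw-watchpoints 0    <- software watchpoint catches it
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

int target = 0;

int main() {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd >= 0) {
        read(fd, &target, sizeof target);   // the kernel writes into target
        close(fd);
    }
    std::printf("target is now 0x%x\n", (unsigned)target);
}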
Is there an equivalent command in GDB to that of WinDbg's !process 0 7?
I want to extract all the threads in a dump file along with their backtraces in GDB. info threads doesn't output the stack traces. So, is there a command that does?
Generally, backtrace is used to get the stack trace of the current thread, but if you need the stack traces of all the threads, use the following command.
thread apply all bt
Is there a command that does?
thread apply all where
When debugging with several threads, it is also useful to switch to a particular thread number and get the backtrace for that thread only.
From the GNU GDB threads documentation
For debugging purposes, GDB associates its own thread number--a small integer assigned in thread-creation order--with each thread in your program.
Usage:
info threads
Then identify the thread that you want to look at.
thread <thread_id>
Finally, use backtrace for just that thread:
bt
If your process is running:
pstack $pid
I am trying to understand the state of a specific thread in my software, from another thread.
Specifically, I'd like to know if it's stuck on I/O.
I was thinking of doing it by getting the backtrace (unless someone has another idea?), since I know which function it's supposed to be stuck in.
But I can't figure out how to get the backtrace of that specific thread without calling the SEGFAULT handler... yet gdb is able to do it (I doubt it creates SEGFAULTs).
Can anyone help? Any ideas?
[Edit] All 3 answers refer to gdb. I KNOW I can do it from gdb; I wanted to know how to do it from software (even linking to gdb libs somehow would be an answer, but how?)
I know what function it's supposed to be stuck on.. but I can't figure
out how to get the backtrace of that specific thread
You can get backtraces of all threads and look through the output for the function it is supposed to be stuck in. Here is how to get all backtraces in gdb:
(gdb) thread apply all bt
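Since the [Edit] asks how to do this from software rather than from an interactive session, one possible sketch (my own, with no error handling; it assumes gdb is installed and that attaching is permitted, which on kernels with Yama ptrace restrictions may require prctl(PR_SET_PTRACER, ...) or extra privileges) is to spawn gdb in batch mode against the current process and search its output for the function you expect the stuck thread to be in:
// Returns the output of "thread apply all bt" for the current process by
// running gdb in batch mode. The caller can then look for the suspect function.
#include <stdio.h>
#include <string>
#include <unistd.h>

std::string all_thread_backtraces() {
    char cmd[128];
    snprintf(cmd, sizeof cmd,
             "gdb -p %d -batch -ex 'thread apply all bt' 2>/dev/null",
             (int)getpid());
    std::string out;
    FILE* p = popen(cmd, "r");
    if (p) {
        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, p)) > 0)
            out.append(buf, n);
        pclose(p);
    }
    return out;
}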
(gdb) info threads [will list all the threads and also indicate the thread you are currently on]
(gdb) thread apply all bt [will show the backtrace of all threads, so that you can see which thread is stuck in the function you are interested in before switching to that thread]
(gdb) thread <threadno> [will switch to the specific thread you are interested in; a bt will then show its backtrace.]
Ref http://www.delorie.com/gnu/docs/gdb/gdb_25.html
Since you know which function you think you are getting stuck in, you could set a breakpoint at the beginning of that function. GDB allows you to attach a series of commands to a breakpoint that are automatically executed when the breakpoint is hit, allowing you to print the backtrace for the thread that was executing when the breakpoint was hit.
(gdb) break filename:line
(gdb) commands
Type commands for breakpoint(s) 1, one per line
End with a line saying just "end"
>info threads
>bt
>continue
>end
The above will give you the list of threads, with the * by the active thread for the breakpoint, followed by the backtrace.
I can't get into the specifics, for a variety of reasons, but here's the essential architecture of what I'm working with:
I have a C++ framework, which uses C++ object files built by me to execute a dynamic simulation.
The C++ libraries call, among other things, a shared (.so) library, written in Ada.
As best I can tell, the Ada library (which is a large collection of nontrivial code) is generating exceptions in fringe cases, but I'm having trouble isolating the function that is generating the exception.
Here's what I'm using:
CentOS 4.8 (Final)
gcc 3.4.6 (w/ gnat)
gdb 6.3.0.0-1.162.el4rh
This is the error I get under normal execution:
terminate called without an active exception
raised PROGRAM_ERROR : unhandled signal
I can get gdb to catch the exception as soon as it returns to the C++, but I can't get it to catch inside the Ada code. I've made sure to compile everything with -g, but that doesn't seem to help the problem.
When I try to catch/break on the signal/exception in gdb (which politely tells me Catch of signal not yet implemented), I get this:
[Thread debugging using libthread_db enabled]
[New thread -1208371520 (LWP 14568)]
terminate called without an active exception
Program received signal SIGABRT, Aborted.
[Switching to thread -1208371520 (LWP 14568)]
0x001327a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
I believe the terminate called [...] line is from the framework. When I try to capture that break, then run a backtrace (bt), I get something like this:
#0 0x001327a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb makes me want to flip tables.)
#1 0x00661825 in raise () from /lib/tls/libc.so.6
#2 0x00663289 in abort () from /lib/tls/libc.so.6
#3 0x0061123e in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0x0060eed1 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#5 0x0060ef06 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0x0060f0a3 in __cxa_rethrow () from /usr/lib/libstdc++.so.6
#7 0x001fe526 in cpputil::ExceptionBase::Rethrow (scope=@0xbfe67470) at ExceptionBase.cpp:140
At that point, it's into the framework code.
I've read several guides and tutorials and man pages online, but I'm at a bit of a loss. I'm hoping that someone here can help get me pointed in the right direction.
It sounds like you're able to compile the Ada source code. Assuming that's the case, in the subprogram(s) being called during whose execution the exceptions are raised, add an exception handler at the end that dumps the exception information:
when E : others =>
   Ada.Text_IO.Put_Line(Ada.Exceptions.Exception_Information(E));
   raise;
You'll also need to add a 'with' of Ada.Exceptions to the package. And Ada.Text_IO if that isn't already present.
I'm not sure exactly what you'll get out of that version of GNAT, but it's probably the invocation addresses, which you can then decode using addr2line.
Could you start the C++ framework from an Ada main? If so, and you can propagate the exceptions through the C++ framework to the Ada main, its last chance handler ought to give you a pretty good report with exception, source file and line where it occurred, and a stack dump for addr2line. My experience with these is that the debugger usually isn't needed after that.
I could be off beam here because I haven't used a Gnat anywhere near as old as yours...
I have an ncurses app that does the following, sometimes instantly after launch, sometimes after some fiddling.
malloc: *** error for object 0x100300400: double free
Program received signal SIGABRT, Aborted
(gdb) where
#0 0x00007fff846a7426 in read ()
#1 0x00007fff83f3d775 in _nc_wgetch ()
#2 0x00007fff83f3de3f in wgetch ()
(and so on into my code)
Does anyone have suggestions for likely things to pursue?
It looks like you are using glibc, likely on an x86_64 Linux system.
The tool to use for any kind of heap corruption on Linux/x86_64 is Valgrind. It will just immediately give you the answer, so there is no point in guessing where the problem might be (and it could be anywhere).
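For illustration only, here is a toy double free (not the asker's ncurses code). Built with -g and run under valgrind, Memcheck reports an "Invalid free()" with both the stack of the bad free and the stack where the block was first freed, which is exactly the information needed here.
// g++ -g doublefree.cpp && valgrind ./a.out
#include <cstdlib>

int main() {
    char* p = static_cast<char*>(std::malloc(16));
    std::free(p);
    std::free(p);   // Valgrind flags this second free as invalid
}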