I am currently self-studying 2020 MIT 6.S081: Operating System Engineering https://pdos.csail.mit.edu/6.828/2020/schedule.html. I have followed all the steps for MAC OS to set up the environment correctly.
When I launched gdb and qemu together, I was able to break points normally when debugging the kernel executable.
However, when I attempted to do the same thing for the user executables, I was unable to break any points with the error:
Cannot access memory at address 0x...
It turned out I can set break points for some particular lines, but when I hit continue, another error has shown in the screenshot above.
Any way to get around this? Thank you!
I'm also self-studying 6.S081 and have had the same issue. I solved this just now so try this.
In Makefile, add -gdwarf-2 option to CFLAGS variable:
CFLAGS = -Wall -Werror -O -fno-omit-frame-pointer -ggdb -gdwarf-2
In my Makefile it's on line 94, and I didn't change other options.
I find this solution from here:
http://staff.ustc.edu.cn/~bjhua/courses/ats/2014/hw/hw-interface.html (search 'error reading variable' in the page)
https://www.reddit.com/r/RISCV/comments/plgwyk/comment/hcalnf1/?utm_source=share&utm_medium=web2x&context=3
and one more thing: If you successfully solve the problem with this but fail to watch variables in gdb, try chainging -O option for CFLAGS to -O0. It would prevent your code from being optimized (ref). (For me, this doesn't work. my code start to stuck frequently after changing this option so I just quit watching variables)
You can check out whether your gdb has warned you can't use file .gdbinit to start gdb? If not, you can simply add a line to .gdbinit: set riscv use-compressed-breakpoints yes,like this:
set confirm off
set architecture riscv:rv64
target remote 127.0.0.1:26000
symbol-file kernel/kernel
set disassemble-next-line auto
set riscv use-compressed-breakpoints yes
then use riscv64-unknown-elf-gdb to start gdb.You will see it work well.
But if you have the warning, then you should use
riscv64-unknown-elf-gdb -iex 'add-auto-load-safe-path .' to start gdb additionally.
Related
We get core files from running our software on a Customer's box. Unfortunately because we've always compiled with -O2 without debugging symbols this has lead to situations where we could not figure out why it was crashing, we've modified the builds so now they generate -g and -O2 together. We then advice the Customer to run a -g binary so it becomes easier to debug.
I have a few questions:
What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.
Here's an example of a crash that requires us to tell the Customer to get a -g ver. of the binary:
Program terminated with signal 11, Segmentation fault.
#0 0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>
Ideally I'd like to solve find out why exactly the app crashed - I suspect it's memory corruption but I am not 100% sure.
Remote debugging is strictly not allowed.
Thanks
What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
It the executable is dynamically linked, as yours is, the stack GDB produces will (most likely) not be meaningful.
The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address. So it looks into your copy of libc.so.6 and discovers that this is in select, so it prints that.
But the chances that 0x00454ff1 is also in select in your customers copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.
You can use disas select, and observe that 0x00454ff1 is either in the middle of instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.
You can however help yourself: you just need to get a copy of all libraries that are listed in (gdb) info shared from the customer system. Have the customer tar them up with e.g.
cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...
Then, on your system:
mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core # Note: very important to set solib-... before loading core
(gdb) where # Get meaningful stack trace!
We then advice the Customer to run a -g binary so it becomes easier to debug.
A much better approach is:
build with -g -O2 -o myexe.dbg
strip -g myexe.dbg -o myexe
distribute myexe to customers
when a customer gets a core, use myexe.dbg to debug it
You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
You can indeed get useful information from a crash dump, even one from an optimized compile (although it's what is called, technically, "a major pain in the ass.") a -g compile is indeed better, and yes, you can do so even when the machine on which the dump happened is another distribution. Basically, with one caveat, all the important information is contained in the executable and ends up in the dump.
When you match the core file with the executable, the debugger will be able to tell you where the crash occurred and show you the stack. That in itself should help a lot. You should also find out as much as you can about the situation in which it happens -- can they reproduce it reliably? If so, can you reproduce it?
Now, here's the caveat: the place where the notion of "everything is there" breaks down is with shared object files, .so files. If it is failing because of a problem with those, you won't have the symbol tables you need; you may only be able to see what library .so it happens in.
There are a number of books about debugging, but I can't think of one I'd recommend.
As far as I remember, you dont need to ask your customer to run with the binary built with -g option. What is needed is that you should have a build with -g option. With that you can load the core file and it will show the whole stack trace. I remember few weeks ago, I created core files, with build (-g) and without -g and the size of core was same.
Inspect the values of local variables you see when you walk the stack ? Especially around the select() call. Do this on customer's box, just load the dump and walk the stack...
Also , check the value of FD_SETSIZE on both your DEV and PROD platforms !
Copying the resolution from my question which was considered a duplicate of this.
set solib-absolute-prefix from the accepted solution did not help for me. set sysroot was absolutely necessary to make gdb load locally provided libs.
Here is the list of commands I used to open core dump:
# note: all the .so files obtained from user machine must be put into local directory.
#
# most importantly, the following files are necessary:
# 1. libthread_db.so.1 and libpthread.so.0: required for thread debugging.
# 2. other .so files are required if they occur in call stack.
#
# these files must also be renamed exactly as the symlinks
# i.e. libpthread-2.28.so should be renamed to libpthread.so.0
# load executable file
file ./thedarkmod.x64
# force gdb to forget about local system!
# load all .so files using local directory as root
set sysroot .
# drop dump-recorded paths to .so files
# i.e. load ./libpthread.so.0 instead of ./lib/x86_64-linux-gnu/libpthread.so.0
set solib-search-path .
# disable damn security protection
set auto-load safe-path /
# load core dump file
core core.6487
# print stacktrace
bt
I am trying to set breakpoints in a MIPS32r6 program that computes the Mandelbrot Set in Brainfsck. The program itself is written in C++, compiled with Clang, and I am debugging with LLDB.
The issue that I am having is that when in LLDB, I can set certain breakpoints, mainly on lower line numbers, with no issues. However, after Line #70 in Main.cpp, the breakpoints are coming up as 'unresolved' (even though executing breakpoint list shows them with completely reasonable addresses). That is to say, all breakpoints that I try to set after Line #70 are coming up as unresolved, and all reasonable breakpoints before Line #70 resolve without issue.
I've placed a copy of the binary that I've linked here: http://filebin.ca/2tJzo2LLBJWO/MipsTest.bin
And a copy of Main.cpp here: https://paste.ee/p/WYs8Y
I am building with the following options:
clang -mcompact-branches=always -fasynchronous-unwind-tables -funwind-tables -fexceptions -fcxx-exceptions -mips32r6 -O0 -g -glldb ...
lld --discard-none -znorelro --eh-frame-hdr ...
At this point, I am unsure as to what might be causing this issue.
I'd try doing target modules dump line-table Main.cpp in lldb to see what lldb thinks the line table looks like. Then look at the binary's DWARF line table with something like readelf --debug-dump=decodedline MipsTest.bin (I think that's right - I'm looking at a readelf main page on the web).
Using your sample binary, I get:
(lldb) b s -l 72
Breakpoint 1: where = MipsTest.bin`main + 544 at Main.cpp:72, address = 0x000134a0
So we found an address for the breakpoint. If it is unresolved when you run, that means we weren't able to implement the breakpoint at that address (e.g. for some reason couldn't write the trap into the program memory there.)
Is there any gcc option I can set that will give me the line number of the segmentation fault?
I know I can:
Debug line by line
Put printfs in the code to narrow down.
Edits:
bt / where on gdb give No stack.
Helpful suggestion
I don't know of a gcc option, but you should be able to run the application with gdb and then when it crashes, type where to take a look at the stack when it exited, which should get you close.
$ gdb blah
(gdb) run
(gdb) where
Edit for completeness:
You should also make sure to build the application with debug flags on using the -g gcc option to include line numbers in the executable.
Another option is to use the bt (backtrace) command.
Here's a complete shell/gdb session
$ gcc -ggdb myproj.c
$ gdb a.out
gdb> run --some-option=foo --other-option=bar
(gdb will say your program hit a segfault)
gdb> bt
(gdb prints a stack trace)
gdb> q
[are you sure, your program is still running]? y
$ emacs myproj.c # heh, I know what the error is now...
Happy hacking :-)
You can get gcc to print you a stacktrace when your program gets a SEGV signal, similar to how Java and other friendlier languages handle null pointer exceptions. See my answer here for more details:
how to generate a stacktace when my C++ app crashes ( using gcc compiler )
The nice thing about this is you can just leave it in your code; you don't need to run things through gdb to get the nice debug output.
If you compile with -g and follow the instructions there, you can use a command-line tool like addr2line to get file/line information from the output.
Run it under valgrind.
you also need to build with debug flags on -g
You can also open the core dump with gdb (you need -g though).
If all the preceding suggestions to compile with debugging (-g) and run under a debugger (gdb, run, bt) are not working for you, then:
Elementary: Maybe you're not running under the debugger, you're just trying to analyze the postmortem core dump. (If you start a debug session, but don't run the program, or if it exits, then when you ask for a backtrace, gdb will say "No stack" -- because there's no running program at all. Don't forget to type "run".) If it segfaulted, don't forget to add the third argument (core) when you run gdb, otherwise you start in the same state, not attached to any particular process or memory image.
Difficult: If your program is/was really running but your gdb is saying "No stack" perhaps your stack pointer is badly smashed. In which case, you may be a buffer overflow problem somewhere, severe enough to mash your runtime state entirely. GCC 4.1 supports the ProPolice "Stack Smashing Protector" that is enabled with -fstack-protector-all. It can be added to GCC 3.x with a patch.
There is no method for GCC to provide this information, you'll have to rely on an external program like GDB.
GDB can give you the line where a crash occurred with the "bt" (short for "backtrace") command after the program has seg faulted. This will give you not only the line of the crash, but the whole stack of the program (so you can see what called the function where the crash happened).
The No stack problem seems to happen when the program exit successfully.
For the record, I had this problem because I had forgotten a return in my code, which made my program exit with failure code.
I created a Hello World example in C++ and tried to debug it with lldb from the terminal on Mac OSX.
> lldb a.out
Current executable set to 'a.out' (x86_64).
I can set breakpoints on names (eg. 'main'), but not on line numbers. If I try
breakpoint set --file test.c --line 5
or
b test.c:5
I get
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
The file 'test.c' is located in the current folder. What goes wrong?
Acording to Dwarf Debugging Information Format, the line number, file location information are stored in the Dwarf format. Such information is what GDB used to set line number as a breakpoint.
Usually, the Dwarf format information is generated by compiler, such as GCC with the -g options. Please try with -g options to see whether it works.
Meanwhile, there also some other debug helpful options in GCC which might be more helpful to you, such as -g3, compiler will generate more detail information for debug, such as macros.
Does anybody know how so use gdb in emacs?
I am using this command to create my program
/home/cdim/Local/gcc-4.9.2/bin/gfortran -ffree-form -g ./utests/test_gdb.f -o test_gdb
I am going to Emacs Tools then Debugger (GDB). I then click on the run button and nothing happens.
What does test_gdb do if you run it outside of gdb? If it sends no output to the screen then this (i.e., no output) is exactly what you will see when you run it inside of gdb - if you have set no breakpoints. Did you set a breakpoint? And how much nothing happens when you click run? Even if test_gdb produces no output, if all is well you should still see gdb display a notification like
[Inferior 1 (process 12345) exited normally]
Consider test.f:
Program p
Integer :: i = 1
Print *, i
End
I would compile this with gfortran -ffree-form -g -ggdb test.f -o test_gdb.
(From https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Debugging-Options.html#Debugging-Options:
-ggdb
Produce debugging information for use by GDB. This means to use the most expressive format available (DWARF 2, stabs, or the native format if neither of those are supported), including GDB extensions if at all possible.
)
Then, as you said, go to Tools -> Debugger (GDB) (or issue M-x gdb) in emacs and make sure the gdb invocation is using the full path to the executable, e.g. Run gdb (like this): gdb -i=mi /foo/bar/test_gdb. Hit return in that minibuffer.
Now, set a breakpoint in the new *gud-test_gdb* buffer:
(gdb) break p
Breakpoint 1 at 0x4007e1: file test.f, line 3.
Then go to menu entry Gud -> Run.
Esc+x then entern gdb ... and input your application file. it will start gdb in emacs
I solved problem by moving to Trisquel 7.0. Might have been a setup problem.