GDB 'find' command terminating early - gdb

I am looking for a specific series of bytes in the memory of a program in GDB.
'find' starting above a certain address (0x104f90) works, but 'find' starting below that address does not:
(gdb) find /w 0x104f90, 0x108fe4, 0x6863203b
0x108e08
0x108e58
0x108ee8
vs
(gdb) find /w 0x104f80, 0x108fe4, 0x6863203b
Pattern not found.
The memory around this address is (seemingly) accessible by GDB:
(gdb) x/12x 0x104f80
0x104f80: 0x00000000 0x00000000 0x00000000 0x00000000
0x104f90: 0x00000000 0x00000000 0x00000000 0x00000000
0x104fa0: 0x00000000 0x00000000 0x00000000 0x00000000
And both of these addresses are on the heap -- info proc mappings says the heap runs from 0xe7000 - 0x109000
Can anyone advise on what I'm missing here? Thank you!

The problem was that I was using gdbserver, and there is a bug in gdbserver where the 'find' function gives up if it doesn't find what it's looking for in 16,000 bytes. See https://sourceware.org/pipermail/gdb-patches/2020-April/167829.html for the official bug report.
The solutions are either update to gdb 10 (which will have a fix), or limit 'find' queries to less than 16,000 bytes

Related

How to tell which pointers are the frame pointers from the GDB x/64x $sp command?

I am having a problem when running a stack trace:
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I found this article online that may help. However, I don't know how this person knew which of the reported pointers where the frame pointers. Here is my output from the x/64x $sp command:
(gdb) x/64x $sp
0xbee06598: 0x0000000b 0x009ce9a4 0x434e7d0f 0x42b48838
0xbee065a8: 0x00000000 0x00000006 0x0140c928 0x42b48838
0xbee065b8: 0xbee065dc 0x00108b5c 0x0140c920 0x0140c980
0xbee065c8: 0x009b4008 0x0140c928 0x00000064 0xbee06a50
0xbee065d8: 0xbee06654 0x0040c904 0x00c224b8 0x009ce9a4
0xbee065e8: 0x000000dc 0x0000000a 0x413e7000 0x00c224b8
0xbee065f8: 0xbee06634 0x4129e4e4 0x000000dc 0x000573aa
0xbee06608: 0x00c22160 0x000003e8 0x0b00000a 0x009ce9a4
0xbee06618: 0xf40dc000 0x40569106 0xd43d8000 0x4069cfa1
0xbee06628: 0x106f40dc 0x41112809 0x9f43a87b 0x41539a12
0xbee06638: 0x00000000 0x0140c920 0x00000000 0xbee06790
0xbee06648: 0x00000064 0xbee06a50 0xbee066fc 0x0075fdd0
0xbee06658: 0xbee0667c 0x41242734 0xbee06674 0x00c22160
0xbee06668: 0x000003e8 0x009c7470 0xbee0668c 0x0b00000a
0xbee06678: 0xbee0669c 0x41274a74 0x000003e8 0x00c22160
0xbee06688: 0xbee0669c 0x00c223c0 0x007a1250 0x009b1d68
Is there a simple way of learning which pointers are the frame pointers?
I don't know how this person knew which of the reported pointers where the frame pointers.
He guessed.
Given that your $sp is 0xbee06598, the likely candidates are all the 0xbee0... ones.
Note: if your code is built by a fairly recent GCC with optimization, and you didn't supply -fno-omit-frame-pointer, there may not be frame pointers at all.

Invalid cast in gdb (armv7)

I have a similar issue like in this question GDB corrupted stack frame - How to debug?, like this:
(gdb) bt
#0 0x76bd6978 in fputs () from /lib/libc.so.6
#1 0x0000b080 in getfunction1 ()
#2 0x0000b080 in getfunction1 ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Chris Dodd wrote an answer to point the top of the stack to the program counter (PC). In a 32bit machine it shall be
(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 4
However, after run the first line I got invalid cast:
(gdb) set $pc = *(void **)$esp
Invalid cast.
(gdb) set $esp = $esp + 4
Argument to arithmetic operation not a number or boolean.
Why do I get this messages? and how can I make a workaround to figure out where the crash occurs? I work on a armv7 machine with Linux.
ESP does not exist in ARM. It's MSP (Main Stack Pointer) or PSP (Stack pointer).
ARM Registers
As ESP does not exists, that's why you get invalid cast. If you do the same command with another valid ARM register there is no error

Debugging a nasty SIGILL crash: Text Segment corruption

Ours is a PowerPC based embedded system running Linux. We are encountering a random SIGILL crash which is seen for wide variety of applications. The root-cause for the crash is zeroing out of the instruction to be executed. This indicates corruption of the text segment residing in memory. As the text segment is loaded read-only, the application cannot corrupt it. So I am suspecting some common sub-system (DMA?) causing this corruption. Since the problem takes days to reproduce (crash due to SIGILL) it is getting difficult to investigate. So to begin with I want to be able to know if and when the text segment of any application has been corrupted.
I have looked at the stack trace and all the pointers, registers are proper.
Do you guys have any suggestions how I can go about it?
Some Info:
Linux 3.12.19-rt30 #1 SMP Fri Mar 11 01:31:24 IST 2016 ppc64 GNU/Linux
(gdb) bt
0 0x10457dc0 in xxx
Disassembly output:
=> 0x10457dc0 <+80>: mr r1,r11
0x10457dc4 <+84>: blr
Instruction expected at address 0x10457dc0: 0x7d615b78
Instruction found after catching SIGILL 0x10457dc0: 0x00000000
(gdb) maintenance info sections
0x10006c60->0x106cecac at 0x00006c60: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
Expected (from the application binary):
(gdb) x /32 0x10457da0
0x10457da0 : 0x913e0000 0x4bff4f5d 0x397f0020 0x800b0004
0x10457db0 : 0x83abfff4 0x83cbfff8 0x7c0803a6 0x83ebfffc
0x10457dc0 : 0x7d615b78 0x4e800020 0x7c7d1b78 0x7fc3f378
0x10457dd0 : 0x4bcd8be5 0x7fa3eb78 0x4857e109 0x9421fff0
Actual (after handling SIGILL and dumping nearby memory locations):
Faulting instruction address: 0x10457dc0
0x10457da0 : 0x913E0000
0x10457db0 : 0x83ABFFF4
=> 0x10457dc0 : 0x00000000
0x10457dd0 : 0x4BCD8BE5
0x10457de0 : 0x93E1000C
Edit:
One lead that we have is that the corruption is always occurring at an offset that ends with 0xdc0.
For e.g.
Faulting instruction address: 0x10653dc0 << printed by our application after catching SIGILL
Faulting instruction address: 0x1000ddc0 << printed by our application after catching SIGILL
flash_erase[8557]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001
nandwrite[8561]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001
awk[4448]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001
awk[16002]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001
getStats[20670]: unhandled signal 4 at 0fecfdc0 nip 0fecfdc0 lr 0fecfdbc code 30001
expr[27923]: unhandled signal 4 at 0fe74dc0 nip 0fe74dc0 lr 0fe74dc0 code 30001
Edit 2: Another lead is that the corruption is always occurring at physical frame number 0x00a4d. I suppose with PAGE_SIZE of 4096 this translates to physical address of 0x00A4DDC0. We are suspecting couple of our kernel drivers and investigating further. Is there any better idea (like putting hardware watchpoint) which could be more efficient? How about KASAN as suggested below?
Any help is appreciated. Thanks.
1.) Text segment is RO, but the permissions could be changed by mprotect, you can check that if you think it is possible
2.) If it is kernel problem:
Run kernel with KASAN and KUBSAN (undefined behaviour) sanitizers
Focus on drivers code not included in mainline
The hint here is one byte corruption. Maybe i'm wrong, but it means that DMA is not to blame. It looks like some kind of invalid store.
3.) Hardware. I think, your problem looks like a hardware problem (RAM issue).
You can try to decrease RAM system frequency in bootloader
Check if this problem reproduces on stable mainline software, that is how you can prove that it's it

How to retain a stacktrace when Cortex-M3 gone in hardfault?

Using the following setup:
Cortex-M3 based µC
gcc-arm cross toolchain
using C and C++
FreeRtos 7.5.3
Eclipse Luna
Segger Jlink with JLinkGDBServer
Code Confidence FreeRtos debug plugin
Using JLinkGDBServer and eclipse as debug frontend, I always have a nice stacktrace when stepping through my code. When using the Code Confidence freertos tools (eclipse plugin), I also see the stacktraces of all threads which are currently not running (without that plugin, I see just the stacktrace of the active thread). So far so good.
But now, when my application fall into a hardfault, the stacktrace is lost.
Well, I know the technique on how to find out the code address which causes the hardfault (as seen here).
But this is very poor information compared to full stacktrace.
Ok, some times when falling into hardfault there is no way to retain a stacktrace, e.g. when the stack is corrupted by the faulty code. But if the stack is healty, I think that getting a stacktrace might be possible (isn't it?).
I think the reason for loosing the stacktrace when in hardfault is, that the stackpointer would be swiched from PSP to MSP automatically by the Cortex-M3 architecture. One idea is now, to (maybe) set the MSP to the previous PSP value (and maybe have to do some additional stack preperation?).
Any suggestions on how to do that or other approaches to retain a stacktrace when in hardfault?
Edit 2015-07-07, added more details.
I uses this code to provocate a hardfault:
__attribute__((optimize("O0"))) static void checkHardfault() {
volatile uint32_t* varAtOddAddress = (uint32_t*)-1;
(*varAtOddAddress)++;
}
When stepping into checkHardfault(), my stacktrace looks good like this:
gdb-> backtrace
#0 checkHardfault () at Main.cxx:179
#1 0x100360f6 in GetOneEvent () at Main.cxx:185
#2 0x1003604e in executeMainLoop () at Main.cxx:121
#3 0x1001783a in vMainTask (pvParameters=0x0) at Main.cxx:408
#4 0x00000000 in ?? ()
When run into the hardfault (at (*varAtOddAddress)++;) and find myself inside of the HardFault_Handler(), the stacktrace is:
gdb-> backtrace
#0 HardFault_Handler () at Hardfault.c:312
#1 <signal handler called>
#2 0x10015f36 in prvPortStartFirstTask () at freertos/portable/GCC/ARM_CM3/port.c:224
#3 0x10015fd6 in xPortStartScheduler () at freertos/portable/GCC/ARM_CM3/port.c:301
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
The quickest way to get the debugger to give you the details of the state prior to the hard fault is to return the processor to the state prior to the hard fault.
In the debugger, write a script that takes the information from the various hardware registers and restore PC, LR, R0-R14 to the state just prior to causing the hard fault, then do your stack dump.
Of course, this isn't always helpful when you end up at the hard fault because of popping stuff off of a blown stack or stomping on stuff in memory. You generally tend to corrupt a bunch of the important registers, return back to some crazy spot in memory, and then execute whatever's there. You can end up hard faulting many thousands (millions?) of cycles after your real problem happens.
Consider using the following gdb macro to restore the register contents:
define hfstack
set $frame_ptr = (unsigned *)$sp
if $lr & 0x10
set $sp = $frame_ptr + (8 * 4)
else
set $sp = $frame_ptr + (26 * 4)
end
set $lr = $frame_ptr[5]
set $pc = $frame_ptr[6]
bt
end
document hfstack
set the correct stack context after a hard fault on Cortex M
end

gdb watch point does not work on gdb load command

I set a watch point on memory location in gdb and load a application through gdb load command. watch point does not hit during load command although memory location is being changed during loading of application.
(gdb) watch *0x1c
Hardware watchpoint 2: *0x1c
(gdb) p/x *0x1c
$2 = 0xffffffff
(gdb) load
Loading section .text, size 0x53b4 lma 0x0
Loading section .eh_frame, size 0x4 lma 0x53b4
Loading section .ARM.exidx, size 0x8 lma 0x53b8
Loading section .rodata, size 0x238 lma 0x53c0
Loading section .data, size 0x520 lma 0x55f8
Start address 0xe4, load size 23320
Transfer rate: 9 KB/sec, 4664 bytes/write.
(gdb) p/x *0x1c
$3 = 0xfdfcdf08
(gdb)
Does watch point work during gdb load command?
I think it would be very unexpected if this triggered. Watchpoints are for changes made by running your program. In this case, your program isn't running, it is being loaded.