Invalid cast in gdb (armv7) - c++

I have an issue similar to the one in the question GDB corrupted stack frame - How to debug?:
(gdb) bt
#0 0x76bd6978 in fputs () from /lib/libc.so.6
#1 0x0000b080 in getfunction1 ()
#2 0x0000b080 in getfunction1 ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Chris Dodd wrote an answer suggesting that you pop the return address from the top of the stack into the program counter (PC). On a 32-bit x86 machine that is:
(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 4
However, after running the first line I get an invalid cast:
(gdb) set $pc = *(void **)$esp
Invalid cast.
(gdb) set $esp = $esp + 4
Argument to arithmetic operation not a number or boolean.
Why do I get these messages, and how can I work around them to figure out where the crash occurs? I am working on an armv7 machine running Linux.

ESP does not exist on ARM; it is an x86 register. On ARM the stack pointer is SP (r13), which on Cortex-M is banked as MSP (Main Stack Pointer) or PSP (Process Stack Pointer).
ARM Registers
Because ESP does not exist, gdb cannot evaluate the expression, which is why you get the invalid cast. If you run the same commands with a valid ARM register, there is no error.
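A sketch of the equivalent workaround on an armv7 Linux target, using gdb's $sp alias for ARM's stack pointer and 4-byte stack slots (note that on ARM the return address of the innermost frame often still sits in $lr rather than on the stack, so `set $pc = $lr` may be the more useful first step):

```
(gdb) set $pc = *(void **)$sp
(gdb) set $sp = $sp + 4
```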

Why is GDB not able to see my local variable i?
I am debugging a multi-process application running on an embedded linux device.
I am having a hard time attaching to a child process (note that I cannot start the child process by itself; it needs the parent to run it).
Using set follow-fork-mode child did not work (gdb hangs while loading dynamic libraries).
My strategy is to have this in my code to stop the child
#ifdef DEBUG
int i = 0;
while (i == 0)
{
    usleep(100000); // sleep for 0.1 seconds
}
#endif // DEBUG
This gives me time to attach to the child using gdbserver --attach :1234 <child-pid>; then, after setting a breakpoint where I want (line 200 in main.cpp), I issue the following command in gdb: set var i = 1 to break out of the while loop.
This isn't working and I am wondering why. Here is the output from gdb:
(gdb) bt
#0 0x00007ff3feab7ff0 in nanosleep () from target:/lib/libc.so.6
#1 0x00007ff3feae1c14 in usleep () from target:/lib/libc.so.6
#2 0x000055cc4e56908a in main (argc=<optimized out>, argv=<optimized out>)
at platform/aas/intc/ucmclient/src/main.cpp:195
(gdb) break main.cpp:200
Breakpoint 1 at 0x55cc4e56908c: file platform/aas/intc/ucmclient/src/main.cpp, line 200.
(gdb) set var i=1
No symbol "i" in current context.
The backtrace seems to show I am in the while loop. Why then is the variable i not in the locals?

How to retain a stacktrace when Cortex-M3 gone in hardfault?

Using the following setup:
Cortex-M3 based µC
gcc-arm cross toolchain
using C and C++
FreeRtos 7.5.3
Eclipse Luna
Segger Jlink with JLinkGDBServer
Code Confidence FreeRtos debug plugin
Using JLinkGDBServer and Eclipse as debug frontend, I always have a nice stacktrace when stepping through my code. When using the Code Confidence FreeRtos tools (Eclipse plugin), I also see the stacktraces of all threads which are currently not running (without that plugin, I see just the stacktrace of the active thread). So far so good.
But now, when my application falls into a hardfault, the stacktrace is lost.
Well, I know the technique on how to find out the code address which causes the hardfault (as seen here).
But this is very poor information compared to a full stacktrace.
Ok, sometimes when falling into a hardfault there is no way to retain a stacktrace, e.g. when the stack is corrupted by the faulty code. But if the stack is healthy, I think that getting a stacktrace should be possible (shouldn't it?).
I think the reason for losing the stacktrace in a hardfault is that the stack pointer is switched from PSP to MSP automatically by the Cortex-M3 architecture. One idea is now to (maybe) set the MSP to the previous PSP value (and maybe do some additional stack preparation?).
Any suggestions on how to do that or other approaches to retain a stacktrace when in hardfault?
Edit 2015-07-07, added more details.
I use this code to provoke a hardfault:
__attribute__((optimize("O0"))) static void checkHardfault() {
    volatile uint32_t* varAtOddAddress = (uint32_t*)-1;
    (*varAtOddAddress)++;
}
When stepping into checkHardfault(), my stacktrace looks good like this:
gdb-> backtrace
#0 checkHardfault () at Main.cxx:179
#1 0x100360f6 in GetOneEvent () at Main.cxx:185
#2 0x1003604e in executeMainLoop () at Main.cxx:121
#3 0x1001783a in vMainTask (pvParameters=0x0) at Main.cxx:408
#4 0x00000000 in ?? ()
When I run into the hardfault (at (*varAtOddAddress)++;) and find myself inside HardFault_Handler(), the stacktrace is:
gdb-> backtrace
#0 HardFault_Handler () at Hardfault.c:312
#1 <signal handler called>
#2 0x10015f36 in prvPortStartFirstTask () at freertos/portable/GCC/ARM_CM3/port.c:224
#3 0x10015fd6 in xPortStartScheduler () at freertos/portable/GCC/ARM_CM3/port.c:301
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
The quickest way to get the debugger to give you the details of the state prior to the hard fault is to return the processor to the state prior to the hard fault.
In the debugger, write a script that takes the information from the various hardware registers and restores PC, LR, and R0-R14 to the state just prior to the hard fault, then do your stack dump.
Of course, this isn't always helpful when you end up at the hard fault because of popping stuff off of a blown stack or stomping on stuff in memory. You generally tend to corrupt a bunch of the important registers, return back to some crazy spot in memory, and then execute whatever's there. You can end up hard faulting many thousands (millions?) of cycles after your real problem happens.
Consider using the following gdb macro to restore the register contents:
define hfstack
  # $sp in the handler points at the exception frame pushed by hardware.
  set $frame_ptr = (unsigned *)$sp
  # EXC_RETURN bit 4 set means a basic 8-word frame was stacked; clear
  # means a 26-word frame including FP state (Cortex-M4F and later).
  # $frame_ptr is a typed pointer, so the arithmetic already scales by 4.
  if $lr & 0x10
    set $sp = $frame_ptr + 8
  else
    set $sp = $frame_ptr + 26
  end
  # Stacked frame layout: r0, r1, r2, r3, r12, lr, pc, xpsr
  set $lr = $frame_ptr[5]
  set $pc = $frame_ptr[6]
  bt
end
document hfstack
set the correct stack context after a hard fault on Cortex M
end

gdb - identify the reason of the segmentation fault

I have a server/client system that runs well on my machines, but it core dumps on one of my users' machines (OS: CentOS 5). Since I don't have access to the user's machine, I built a debug-mode binary and asked the user to try it. The crash did happen again after around 2 days of running, and he sent me the core dump file. Loading the core dump file with gdb shows the crash location, but I don't understand the reason (sorry, my previous experience is mostly with Windows; I don't have much experience with Linux/gdb). I would like to have your input. Thanks!
1. The /var/log/messages on the user's machine shows the segfault:
Jan 16 09:20:39 LPZ08945 kernel: LSystem[4688]: segfault at 0000000000000000 rip 00000000080e6433 rsp 00000000f2afd4e0 error 4
This message indicates that there is a segfault at instruction pointer 80e6433 and stack pointer f2afd4e0. It looks like the program tries to read/write at address 0.
2. Loading the core dump file into gdb shows the crash location:
$gdb LSystem core.19009
GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
... (many lines of outputs from gdb omitted)
Core was generated by `./LSystem'.
Program terminated with signal 11,
Segmentation fault.
#0 0x080e6433 in CLClient::connectToServer (this=0xf2afd898, conn=11) at liccomm/LClient.cpp:214
214 memcpy((char *) & (a4.sin_addr), pHost->h_addr, pHost->h_length);
gdb says the crash occurs at Line 214?
3. Frame information. (at Frame #0)
(gdb) info frame
Stack level 0, frame at 0xf2afd7e0:
eip = 0x80e6433 in CLClient::connectToServer (liccomm/LClient.cpp:214); saved eip 0x80e6701
called by frame at 0xf2afd820
source language c++.
Arglist at 0xf2afd7d8, args: this=0xf2afd898, conn=11
Locals at 0xf2afd7d8, Previous frame's sp is 0xf2afd7e0
Saved registers:
ebx at 0xf2afd7cc, ebp at 0xf2afd7d8, esi at 0xf2afd7d0, edi at 0xf2afd7d4, eip at 0xf2afd7dc
The frame is at f2afd7e0; why is it different from the rsp in Part 1, which is f2afd4e0? I guess the user may have provided me with a mismatched core dump file (whose pid is 19009) and /var/log/messages file (which indicates pid 4688).
4. The source
(gdb) list +
209
210 //pHost is declared as struct hostent* and 'pHost = gethostbyname(serverAddress);'
211 memset( &a4, 0, sizeof(a4) );
212 a4.sin_family = AF_INET;
213 a4.sin_port = htons( nPort );
214 memcpy((char *) & (a4.sin_addr), pHost->h_addr, pHost->h_length);
215
216 aalen = sizeof(a4);
217 aa = (struct sockaddr *)&a4;
I could not see anything wrong with Line 214, and this part of the code must have run many times during the 2 days of runtime.
5. The variables
Since gdb indicated that Line 214 was the culprit, I printed everything.
memcpy((char *) & (a4.sin_addr), pHost->h_addr, pHost->h_length);
(gdb) print a4.sin_addr
$1 = {s_addr = 0}
(gdb) print &(a4.sin_addr)
$2 = (in_addr *) 0xf2afd794
(gdb) print pHost->h_addr_list[0]
$3 = 0xa24af30 "\202}\204\250"
(gdb) print pHost->h_length
$4 = 4
(gdb) print memcpy
$5 = {<text variable, no debug info>} 0x2fcf90 <memcpy>
So I basically printed everything that's at Line 214. ('pHost->h_addr_list[0]' is 'pHost->h_addr' due to '#define h_addr h_addr_list[0]')
I was not able to catch anything wrong. Do you see anything fishy? Is it possible the memory has been corrupted somewhere else? I appreciate your help!
[edited] 6. Backtrace
(gdb) bt
#0 0x080e6433 in CLClient::connectToServer (this=0xf2afd898, conn=11) at liccomm/LClient.cpp:214
#1 0x080e6701 in CLClient::connectToLMServer (this=0xf2afd898) at liccomm/LClient.cpp:121
... (Frames 2~7 omitted, not relevant)
#8 0x080937f2 in handleConnectionStarter (par=0xf3563f98) at LManager.cpp:166
#9 0xf7f5fb41 in ?? ()
#10 0xf3563f98 in ?? ()
#11 0xf2aff31c in ?? ()
#12 0x00000000 in ?? ()
I followed the nested calls. They are correct.
The problem with the memcpy is that the source is not of the same type as the destination.
You should use inet_addr to convert addresses from string to binary:
a4.sin_addr = inet_addr(pHost->h_addr);
The previous code may not work depending on the implementation (some implementations return struct in_addr, others return unsigned long), but the principle is the same.

Valgrind stack misses a function completely

I have two C files:
a.c
int main() {
    ...
    getvtable()->function();
}
The vtable points to a function located in b.c:
void function() {
    malloc(42);
}
Now if I trace the program in Valgrind, I get the following:
==29994== 4,155 bytes in 831 blocks are definitely lost in loss record 26 of 28
==29994== at 0x402CB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==29994== by 0x40A24D2: (below main) (libc-start.c:226)
So the call to function is completely omitted from the stack! How is this possible? If I use GDB, a correct stack including function is shown.
Debug symbols are included; Linux, 32-bit.
Upd:
Answering the first question: I get the following output when debugging with Valgrind's GDB server. The breakpoint is never hit, while it is hit when I debug directly with GDB.
stasik@gemini:~$ gdb -q
(gdb) set confirm off
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 11665
[Switching to Thread 11665]
0x040011d0 in ?? ()
(gdb) file /home/stasik/leak.so
Reading symbols from /home/stasik/leak.so...done.
(gdb) break function
Breakpoint 1 at 0x110c: file ../../source/leakclass.c, line 32.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>end
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) source thread-frames.py
Stack level 0, frame at 0x42348a0:
eip = 0x404efcb; saved eip 0x4f2f544c
called by frame at 0x42348a4
Arglist at 0x4234898, args:
Locals at 0x4234898, Previous frame's sp is 0x42348a0
Saved registers:
ebp at 0x4234898, eip at 0x423489c
Stack level 1, frame at 0x42348a4:
eip = 0x4f2f544c; saved eip 0x6e492056
called by frame at 0x42348a8, caller of frame at 0x42348a0
Arglist at 0x423489c, args:
Locals at 0x423489c, Previous frame's sp is 0x42348a4
Saved registers:
eip at 0x42348a0
Stack level 2, frame at 0x42348a8:
eip = 0x6e492056; saved eip 0x205d6f66
called by frame at 0x42348ac, caller of frame at 0x42348a4
Arglist at 0x42348a0, args:
Locals at 0x42348a0, Previous frame's sp is 0x42348a8
Saved registers:
eip at 0x42348a4
Stack level 3, frame at 0x42348ac:
eip = 0x205d6f66; saved eip 0x61746144
---Type <return> to continue, or q <return> to quit---
called by frame at 0x42348b0, caller of frame at 0x42348a8
Arglist at 0x42348a4, args:
Locals at 0x42348a4, Previous frame's sp is 0x42348ac
Saved registers:
eip at 0x42348a8
Stack level 4, frame at 0x42348b0:
eip = 0x61746144; saved eip 0x65736162
called by frame at 0x42348b4, caller of frame at 0x42348ac
Arglist at 0x42348a8, args:
Locals at 0x42348a8, Previous frame's sp is 0x42348b0
Saved registers:
eip at 0x42348ac
Stack level 5, frame at 0x42348b4:
eip = 0x65736162; saved eip 0x70616d20
called by frame at 0x42348b8, caller of frame at 0x42348b0
Arglist at 0x42348ac, args:
Locals at 0x42348ac, Previous frame's sp is 0x42348b4
Saved registers:
eip at 0x42348b0
Stack level 6, frame at 0x42348b8:
eip = 0x70616d20; saved eip 0x2e646570
called by frame at 0x42348bc, caller of frame at 0x42348b4
Arglist at 0x42348b0, args:
---Type <return> to continue, or q <return> to quit---
Locals at 0x42348b0, Previous frame's sp is 0x42348b8
Saved registers:
eip at 0x42348b4
Stack level 7, frame at 0x42348bc:
eip = 0x2e646570; saved eip 0x0
called by frame at 0x42348c0, caller of frame at 0x42348b8
Arglist at 0x42348b4, args:
Locals at 0x42348b4, Previous frame's sp is 0x42348bc
Saved registers:
eip at 0x42348b8
Stack level 8, frame at 0x42348c0:
eip = 0x0; saved eip 0x0
caller of frame at 0x42348bc
Arglist at 0x42348b8, args:
Locals at 0x42348b8, Previous frame's sp is 0x42348c0
Saved registers:
eip at 0x42348bc
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) continue
Continuing.
I see two possible reasons:
Valgrind is using a different stack unwind method than GDB.
The address space layout is different when running your program under the two environments, and you're only hitting stack corruption under Valgrind.
We can gain more insight by using Valgrind's builtin gdbserver.
Save this Python snippet to thread-frames.py
import gdb
f = gdb.newest_frame()
while f is not None:
    f.select()
    gdb.execute('info frame')
    f = f.older()
t.gdb
set confirm off
file MY-PROGRAM
break function
commands
silent
end
run
source thread-frames.py
quit
v.gdb
set confirm off
target remote | vgdb
file MY-PROGRAM
break function
commands
silent
end
continue
source thread-frames.py
quit
(Change MY-PROGRAM, function in the scripts above and the commands below as required)
Get details about the stack frames under GDB:
$ gdb -q -x t.gdb
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbffff2f0:
eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
called by frame at 0xbffff310
source language c.
Arglist at 0xbffff2e8, args:
Locals at 0xbffff2e8, Previous frame's sp is 0xbffff2f0
Saved registers:
ebp at 0xbffff2e8, eip at 0xbffff2ec
Stack level 1, frame at 0xbffff310:
eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0xb7e33963
caller of frame at 0xbffff2f0
source language c.
Arglist at 0xbffff2f8, args:
Locals at 0xbffff2f8, Previous frame's sp is 0xbffff310
Saved registers:
ebp at 0xbffff2f8, eip at 0xbffff30c
Get the same data under Valgrind:
$ valgrind --vgdb=full --vgdb-error=0 ./MY-PROGRAM
In another shell:
$ gdb -q -x v.gdb
relaying data between gdb and process 574
0x04001020 in ?? ()
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbe88e2c0:
eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
called by frame at 0xbe88e2e0
source language c.
Arglist at 0xbe88e2b8, args:
Locals at 0xbe88e2b8, Previous frame's sp is 0xbe88e2c0
Saved registers:
ebp at 0xbe88e2b8, eip at 0xbe88e2bc
Stack level 1, frame at 0xbe88e2e0:
eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0x4051963
caller of frame at 0xbe88e2c0
source language c.
Arglist at 0xbe88e2c8, args:
Locals at 0xbe88e2c8, Previous frame's sp is 0xbe88e2e0
Saved registers:
ebp at 0xbe88e2c8, eip at 0xbe88e2dc
If GDB can successfully unwind the stack while connected to Valgrind's gdbserver, then it's a problem with Valgrind's stack unwind algorithm. You can inspect the "info frame" output carefully for inline and tail call frames or some other reason that could throw Valgrind off. Otherwise it's probably stack corruption.
Ok, compiling all .so parts and the main program with an explicit -O0 seems to solve the problem. It seems that some optimization in the 'core' program that was loading the .so (the .so was always compiled unoptimized) was breaking the stack.
This is Tail-call optimization in action.
The function function calls malloc as the last thing it does. The compiler sees this and kills the stack frame for function before it calls malloc. The advantage is that when malloc returns it returns directly to whichever function called function. I.e. it avoids malloc returning to function only to hit yet another return instruction.
In this case the optimization has prevented an unnecessary jump and made stack usage slightly more efficient, which is nice, but in the case of a recursive tail call then this optimization is a huge win as it turns a recursion into something more like iteration.
As you've discovered already, disabling optimization makes debugging much easier. If you want to debug optimized code (for performance testing, perhaps), then, as @Zang MingJie already said, you can disable this one optimization with -fno-optimize-sibling-calls.

Why would cortex-m3 reset to address 0 in gdb?

I am building a cross-compile toolchain for the Stellaris LM3S8962 Cortex-M3 chip. The test C++ application I have written executes for some time and then faults. The fault occurs when I try to access a memory-mapped hardware device. At the moment my working hypothesis is that I am missing some essential chip initialization in my startup sequence.
What I would like to understand is why execution in gdb halts with the program counter set to 0. I have the vector table at 0x0, but the first entry is the initial stack pointer. Shouldn't I end up in one of the fault handlers I specify in the vector table?
(gdb)
187 UARTSend((unsigned char *)secret, 2);
(gdb) cont
Continuing.
lm3s.cpu -- clearing lockup after double fault
Program received signal SIGINT, Interrupt.
0x00000000 in g_pfnVectors ()
(gdb) info registers
r0 0x1 1
r1 0x32 50
r2 0xffffffff 4294967295
r3 0x0 0
r4 0x74518808 1951500296
r5 0xc24c0551 3259762001
r6 0x42052dac 1107635628
r7 0x20007230 536900144
r8 0xf85444a9 4166272169
r9 0xc450591b 3293600027
r10 0xd8812546 3632342342
r11 0xb8420815 3091335189
r12 0x3 3
sp 0x200071f0 0x200071f0
lr 0xfffffff1 4294967281
pc 0x1 0x1 <g_pfnVectors+1>
fps 0x0 0
cpsr 0x60000023 1610612771
The toolchain is based on gcc, gdb, openocd.
Your debug session happily gave you a clue:
clearing lockup after double fault
Your CPU was in the locked-up state. That means it could not run its Hard Fault handler (maybe there is a 0 in its vector).
I usually get these when I have forgotten to "power" the peripheral; the resulting Bus Error escalates first to a Hard Fault and then to the locked-up state. This should be mentioned in your MCU's manual, btw.