TI ARM CLANG won't resolve symbol even though objdump shows it's there - c++

I am trying to compile my code in CCS (Code Composer Studio) using the TI ARM CLANG compiler.
I am trying to implement Ethernet functionality using TI's enet SDK.
I call a function from my main() that lives in the enet SDK, but the build fails with the error:
unresolved symbol Enet_initOsalCfg(EnetOsal_Cfg_s, first referenced in *)
I have added its library in the linker tab.
To confirm I am not doing something stupid, I used the same toolchain's objdump to disassemble the library, and unless I am mistaken the dump clearly shows that the symbol is present.
The function I call in main() has the declaration:
void Enet_initOsalCfg(EnetOsal_Cfg *osalCfg);
The following is a snippet from the objdump output for the symbol with the same name as my function:
Disassembly of section .text.Enet_initOsalCfg:
00000000 <Enet_initOsalCfg>:
0: 00 48 2d e9 push {r11, lr}
4: 08 d0 4d e2 sub sp, sp, #8
8: 04 00 8d e5 str r0, [sp, #4]
c: 04 00 9d e5 ldr r0, [sp, #4]
10: fe ff ff eb bl #-8 <Enet_initOsalCfg+0x10>
14: 08 d0 8d e2 add sp, sp, #8
18: 00 88 bd e8 pop {r11, pc}
Disassembly of section .rel.text.Enet_initOsalCfg:
00000000 <.rel.text.Enet_initOsalCfg>:
0: 10 00 00 00 andeq r0, r0, r0, lsl r0
4: 1c 9c 00 00 andeq r9, r0, r12, lsl r12
Disassembly of section .ARM.exidx.text.Enet_initOsalCfg:
00000000 <.ARM.exidx.text.Enet_initOsalCfg>:
0: 00 00 00 00 andeq r0, r0, r0
4: 01 00 00 00 andeq r0, r0, r1
Disassembly of section .rel.ARM.exidx.text.Enet_initOsalCfg:
00000000 <.rel.ARM.exidx.text.Enet_initOsalCfg>:
0: 00 00 00 00 andeq r0, r0, r0
4: 2a 72 00 00 andeq r7, r0, r10, lsr #4
What am I missing here?
Excuse me if I am being stupid

Big oops, our friend in the comments found my problem: I forgot extern "C". Sorry for being stupid, I had been scratching my head over this for 4 hours. My apologies :P
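For anyone hitting the same error, here is a minimal sketch of the usual shape of the fix: give the C function C linkage when it is declared from C++ code. DemoOsalCfg, Demo_initOsalCfg and demoInit are hypothetical stand-ins so that the example builds on its own; the real SDK header and types are not reproduced here.

```cpp
#include <cstdint>

// Hypothetical stand-in for an SDK configuration struct (not the real EnetOsal_Cfg).
struct DemoOsalCfg {
    std::uint32_t initialized;
};

// With C linkage the symbol is referenced as plain "Demo_initOsalCfg", which is
// what a C-compiled library exports. Without extern "C", a C++ compiler would
// reference a mangled name that encodes the parameter types, and the linker
// would not find that name in the C library.
extern "C" void Demo_initOsalCfg(DemoOsalCfg *cfg);

// In a real project this definition lives in the C library. It is defined here
// only so that the sketch links by itself.
extern "C" void Demo_initOsalCfg(DemoOsalCfg *cfg) {
    cfg->initialized = 1u;
}

// Small helper that calls the function the way main() would.
std::uint32_t demoInit() {
    DemoOsalCfg cfg{0u};
    Demo_initOsalCfg(&cfg);
    return cfg.initialized;
}
```

If the library header cannot be edited, wrapping the #include of that header in an extern "C" { ... } block inside the C++ source has the same effect. The parameter type shown inside the parentheses of the linker message is a hint that the reference was a mangled C++ name.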

Related

Understanding segmentation fault in core dump on NULL pointer check

I am having difficulty understanding how this segmentation fault is possible. The architecture of the machine is armv7l.
The core dump:
Dump of assembler code for function DLL_Disconnect:
0x6cd3a460 <+0>: 15 4b ldr r3, [pc, #84] ; (0x6cd3a4b8 <DLL_Disconnect+88>)
0x6cd3a462 <+2>: 00 21 movs r1, #0
0x6cd3a464 <+4>: 15 4a ldr r2, [pc, #84] ; (0x6cd3a4bc <DLL_Disconnect+92>)
0x6cd3a466 <+6>: 30 b5 push {r4, r5, lr}
0x6cd3a468 <+8>: 83 b0 sub sp, #12
0x6cd3a46a <+10>: 7b 44 add r3, pc
0x6cd3a46c <+12>: 01 91 str r1, [sp, #4]
0x6cd3a46e <+14>: 04 46 mov r4, r0
0x6cd3a470 <+16>: 9d 58 ldr r5, [r3, r2]
=> 0x6cd3a472 <+18>: 28 68 ldr r0, [r5, #0]
0x6cd3a474 <+20>: c0 b1 cbz r0, 0x6cd3a4a8 <DLL_Disconnect+72>
0x6cd3a476 <+22>: 21 46 mov r1, r4
...
0x6cd3a4b6 <+86>: 00 bf nop
0x6cd3a4b8 <+88>: 96 b6 00 00 .word 0x0000b696 <- replaced from objdump, as gdb prints as instruction
0x6cd3a4bc <+92>: 1c 02 00 00 .word 0x0000021c <- also replaced
The registers:
r0 0x0 0
r1 0x0 0
r2 0x21c 540
r3 0x6cd45b04 1825856260
r4 0x0 0
r5 0x1dddc 122332
...
sp 0x62afeb40 0x62afeb40
lr 0x72a3091b 1923287323
pc 0x6cd3a472 0x6cd3a472 <DLL_Disconnect+18>
cpsr 0x600c0030 1611399216
fpscr 0x0 0
The segmentation fault occurs when "ldr r0, [r5, #0]" tries to read the memory address held in r5. I get a similar message when trying to access it in GDB:
(gdb) print *$r5
Cannot access memory at address 0x1dddc
However, all the register values involved are computed from static values, so I don't understand how the memory address can be inaccessible.
The source code is loaded and executed via a shared library using dlopen and dlsym:
CClient* gl_pClient = NULL;
extern "C" unsigned long DLL_Disconnect(unsigned long ulHandle)
{
CProtocol* pCProtocol = NULL;
unsigned long ulResult = ACTION_INTERNAL_ERROR;
if (gl_pClient == NULL)
{
return ACTION_API_NOT_INITIALIZED;
}
...
The assembly code resolves the address of global variable gl_pClient using dll relocations, which are loaded using program-counter-relative addressing. Then the code loads from that address and crashes. It looks like the relocations got corrupted, so that the resolved address is invalid.
There isn't much else that can be said without a reproduction.
You may want to run your program under valgrind, which may report memory corruption.
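To make the address computation concrete, here is a small sketch that replays the arithmetic from the disassembly and register dump in the question. The constant and function names are mine; the numbers are taken from the question. It relies on the Thumb rule that reading pc yields the address of the current instruction plus 4.

```cpp
#include <cstdint>

// In Thumb state, reading pc in "add r3, pc" yields the instruction address + 4.
constexpr std::uint32_t thumbPcValue(std::uint32_t insnAddr) {
    return insnAddr + 4u;
}

// r3 = literal stored at DLL_Disconnect+88, plus pc as seen at 0x6cd3a46a.
constexpr std::uint32_t kBase = 0x0000b696u + thumbPcValue(0x6cd3a46au);

// "ldr r5, [r3, r2]" reads the word at r3 + r2, with r2 = 0x21c from +92.
constexpr std::uint32_t kSlot = kBase + 0x0000021cu;

static_assert(kBase == 0x6cd45b04u, "matches r3 in the register dump");
static_assert(kSlot == 0x6cd45d20u, "address of the word that was loaded into r5");
```

So the code itself computed the expected base (r3 matches), and the bad value 0x1dddc came from the word stored at 0x6cd45d20, presumably the relocated slot for gl_pClient. That value does not look like an address inside a library mapped around 0x6cd3a000; it may be a value that was never relocated or was overwritten, though that is a guess. In GDB, x/wx 0x6cd45d20 shows what the slot holds in the core file.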

Reading from flash that's not part of the application

I'm programming bare-metal embedded, so no OS etc. on a STM32L4 (ARM Cortex M4). I have a separate page in flash, which is written by a bootloader (it is not and should not be part of my application binary, this is a must). In this page, I store configuration parameters that will be used in my application. This configuration page may change, but not during runtime, after a change I reset the processor.
How can I access this data in flash most nicely?
My definition of nice is (in this order of priority):
- support for (u)int32_t, (u)int8_t, bool, char[fixed-size]
- little overhead when compared to #define PARAM (1) or constexpr
- typesafe usage (i.e. uint8_t var = CONFIG_CHAR_ARRAY shall issue at least a warning)
- no RAM copy
- readability of the configuration parameters while debugging (using STM32CubeIDE)
The solution shall scale for all possible 2048 bytes of the flashpage. Code generation is anyhow part of the process.
So far, I have tested two variants (I am coming from plain C but am using (potentially modern) C++ in this project). My current testcase is
if (param) function_call();
but it should also work for other cases such as
for(int i = 0; i < param2; i++)
define with pointer cast
#define CONF_PARAM1 (*(bool*)(CONFIG_ADDRESS + 0x0083))
Which leads to (using -Os):
8008872: 4b1b ldr r3, [pc, #108] ; (80088e0 <_Z16main_applicationv+0xac>)
8008874: 781b ldrb r3, [r3, #0]
8008876: b10b cbz r3, 800887c <_Z16main_applicationv+0x48>
8008878: f7ff ff56 bl 8008728 <_Z10function_callv>
80088e0: 0801f883 .word 0x0801f883
const variable
const bool CONF_PARAM1 = *(bool*)(CONFIG_ADDRESS + 0x0083);
leading to
800887c: 4b19 ldr r3, [pc, #100] ; (80088e4 <_Z16main_applicationv+0xb0>)
800887e: 781b ldrb r3, [r3, #0]
8008880: b10b cbz r3, 8008886 <_Z16main_applicationv+0x52>
8008882: f000 f899 bl 8008728 <_Z10function_callv>
80088e4: 200000c0 .word 0x200000c0
I dislike option 2, as it adds a RAM copy (which would not scale well for 2048 bytes of config). Option 1 looks like very old C style and does not help while debugging. I struggle to find another option using the linker script, as I cannot find a way to avoid the variable ending up in the application's binary.
Is there any better way of doing this?
If you make your constant a reference, the compiler won't copy it into a variable; it will probably just load the address into a variable. You can then wrap the generation of the references in a templated function to make your code cleaner:
#include <cstdint>
#include <iostream>
template <typename T>
const T& configOption(uintptr_t offset)
{
const uintptr_t CONFIG_ADDRESS = 0x1000;
return *reinterpret_cast<T*>(CONFIG_ADDRESS + offset);
}
auto& CONF_PARAM1 = configOption< bool >(0x0083);
auto& CONF_PARAM2 = configOption< int >(0x0087);
int main()
{
std::cout << CONF_PARAM1 << ", " << CONF_PARAM2 << "\n";
}
GCC optimises this fairly well: https://godbolt.org/z/r27o5Q
As proposed by @old_timer in a comment above, I favour this solution:
In the linker file, I put
CONF_PARAM = _config_start + 0x0083;
In my config.hpp, I put
extern const bool CONF_PARAM;
which then can easily be accessed in any source file
if (CONF_PARAM)
This basically fulfills all "nice"-definitions of mine, as far as I can see.
No need to re-invent the wheel - placing data in flash is a fairly common use-case in embedded systems. When dealing with such data flash, there are some important considerations:
All data must sit at the very same address, with the same type, from case to case. This means that struct is problematic because of padding (and even more so class). If you align all data on 32 bit boundaries, this shouldn't be a problem, so I strongly recommend that you do so. Then the program becomes portable between compilers.
All these variables and pointers to them must be declared with volatile qualifier, otherwise the optimizer might go haywire. Things like (*(bool*)(CONFIG_ADDRESS + 0x0083)) are brittle and might break at any point, unless you add volatile.
You can place data at a fixed location in memory, but how to do so is compiler/linker-specific. And since it isn't standardized, it's always a pain to get right. With gcc-flavoured compilers it might be something like: __attribute__((section(".dataflash"))) where .dataflash is your custom section that you must reserve space for in the linker script. You'll need to take a closer look at how to do this with your specific toolchain (others use #pragmas etc. instead); I'll use the __attribute__ here just to illustrate.
If this section gets downloaded together with the executable binary, or only through bootloader, is up to you to define. Linker scripts typically come with a "no init" option.
So you could do something like:
// flashdata.h
typedef struct
{
uint32_t stuff;
uint32_t more_stuff;
...
} flashdata_t;
extern volatile const flashdata_t flash_data __attribute__((section(".dataflash")));
And then define it as:
// flashdata.c
volatile const flashdata_t flash_data __attribute__((section(".dataflash")));
And now you can use it as any struct, flash_data.stuff.
If you are using C, you can even split up each uint32_t chunk with union, such as typedef union { uint32_t u32; uint8_t u8 [4]; } and similar, but that isn't possible in C++, because it doesn't allow union type punning.
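For the C++ case, here is a minimal sketch of the usual replacement for union punning: copy the bytes with std::memcpy. The helper names toBytes and fromBytes are made up for this illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copy the object representation of a 32-bit word into four bytes.
// The byte order follows the target's endianness, just as the union would in C.
std::array<std::uint8_t, 4> toBytes(std::uint32_t word) {
    std::array<std::uint8_t, 4> bytes{};
    std::memcpy(bytes.data(), &word, sizeof word);
    return bytes;
}

// Inverse direction: rebuild the word from its four bytes.
std::uint32_t fromBytes(const std::array<std::uint8_t, 4> &bytes) {
    std::uint32_t word = 0;
    std::memcpy(&word, bytes.data(), sizeof word);
    return word;
}
```

std::memcpy does not accept pointers to volatile objects, so for a volatile flash word the value would first be read into a local variable (for example from flash_data.stuff) and then passed to toBytes.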
You can isolate the variables in question in their own section. There is more than one way to do that. The tools build normally and do all the addressing work. As with using structs across compile domains, you need to be extremely careful and should probably put checks into the code. But you can build the binary and load only the code, leaving out the other flash contents; then, at that time or later, you can change the VALUES of the variables in the other section, rebuild, and isolate those into their own load.
Testing the theory
vectors.s
.globl _start
_start:
.word 0x20001000
.word reset
.thumb_func
reset:
bl main
b .
.globl dummy
.thumb_func
dummy:
bx lr
so.c
extern volatile unsigned int x;
extern volatile unsigned short y;
extern volatile unsigned char z[7];
extern void dummy ( unsigned int );
int main ( void )
{
dummy(x);
dummy(y);
dummy(z[0]<<z[1]);
return(0);
}
flashvars.c
volatile unsigned int x=1;
volatile unsigned short y=3;
volatile unsigned char z[7]={1,2,3,4,5,6,7};
flash.ld
MEMORY
{
rom0 : ORIGIN = 0x08000000, LENGTH = 0x1000
rom1 : ORIGIN = 0x08002000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom0
.vars : { flashvars.o } > rom1
}
build
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 vectors.s -o vectors.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld vectors.o so.o flashvars.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy -R .vars -O binary so.elf so.bin
examine
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <reset>:
8000008: f000 f802 bl 8000010 <main>
800000c: e7fe b.n 800000c <reset+0x4>
0800000e <dummy>:
800000e: 4770 bx lr
08000010 <main>:
8000010: 4b08 ldr r3, [pc, #32] ; (8000034 <main+0x24>)
8000012: b510 push {r4, lr}
8000014: 6818 ldr r0, [r3, #0]
8000016: f7ff fffa bl 800000e <dummy>
800001a: 4b07 ldr r3, [pc, #28] ; (8000038 <main+0x28>)
800001c: 8818 ldrh r0, [r3, #0]
800001e: b280 uxth r0, r0
8000020: f7ff fff5 bl 800000e <dummy>
8000024: 4b05 ldr r3, [pc, #20] ; (800003c <main+0x2c>)
8000026: 7818 ldrb r0, [r3, #0]
8000028: 785b ldrb r3, [r3, #1]
800002a: 4098 lsls r0, r3
800002c: f7ff ffef bl 800000e <dummy>
8000030: 2000 movs r0, #0
8000032: bd10 pop {r4, pc}
8000034: 0800200c stmdaeq r0, {r2, r3, sp}
8000038: 08002008 stmdaeq r0, {r3, sp}
800003c: 08002000 stmdaeq r0, {sp}
Disassembly of section .vars:
08002000 <z>:
8002000: 04030201 streq r0, [r3], #-513 ; 0xfffffdff
8002004: 00070605 andeq r0, r7, r5, lsl #12
08002008 <y>:
8002008: 00000003 andeq r0, r0, r3
0800200c <x>:
800200c: 00000001 andeq r0, r0, r1
that looks good
hexdump -C so.bin
00000000 00 10 00 20 09 00 00 08 00 f0 02 f8 fe e7 70 47 |... ..........pG|
00000010 08 4b 10 b5 18 68 ff f7 fa ff 07 4b 18 88 80 b2 |.K...h.....K....|
00000020 ff f7 f5 ff 05 4b 18 78 5b 78 98 40 ff f7 ef ff |.....K.x[x.@....|
00000030 00 20 10 bd 0c 20 00 08 08 20 00 08 00 20 00 08 |. ... ... ... ..|
00000040
as does that.
arm-none-eabi-objcopy -j .vars -O binary so.elf sovars.bin
hexdump -C sovars.bin
00000000 01 02 03 04 05 06 07 00 03 00 00 00 01 00 00 00 |................|
00000010 47 43 43 3a 20 28 47 4e 55 29 20 39 2e 33 2e 30 |GCC: (GNU) 9.3.0|
00000020 00 41 30 00 00 00 61 65 61 62 69 00 01 26 00 00 |.A0...aeabi..&..|
00000030 00 05 43 6f 72 74 65 78 2d 4d 30 00 06 0c 07 4d |..Cortex-M0....M|
00000040 09 01 12 04 14 01 15 01 17 03 18 01 19 01 1a 01 |................|
00000050 1e 02 |..|
00000052
hah, okay a little more work.
MEMORY
{
rom0 : ORIGIN = 0x08000000, LENGTH = 0x1000
rom1 : ORIGIN = 0x08002000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom0
.vars : { flashvars.o(.data) } > rom1
}
hexdump -C sovars.bin
00000000 01 02 03 04 05 06 07 00 03 00 00 00 01 00 00 00 |................|
00000010
much better.
I strongly recommend against structs across compile domains, and this falls into that category: the build for the real data is separate, and between the code build and the data build you could get data that doesn't land in the same place. When I do things like this I put in protections to catch the problem during execution before it goes off the rails (or better, at build time). It is not a case of if, it is a case of when. Implementation defined means implementation defined.
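As an illustration of what such protections could look like if a struct is used despite the caveat, here is a sketch with build-time and run-time checks. Everything in it is hypothetical: the ConfigLayout struct, the magic and version constants, and the idea that the data build stores those two words next to the values.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout that the code build and the data build must agree on.
struct ConfigLayout {
    std::uint32_t x;     // expected at offset 0
    std::uint16_t y;     // expected at offset 4
    std::uint8_t  z[7];  // expected at offset 6
};

// Build-time protection: compilation fails if this compiler lays the struct
// out differently from what the data image was generated for.
static_assert(offsetof(ConfigLayout, x) == 0, "x moved");
static_assert(offsetof(ConfigLayout, y) == 4, "y moved");
static_assert(offsetof(ConfigLayout, z) == 6, "z moved");
static_assert(sizeof(ConfigLayout) == 16, "size changed");

// Run-time protection: hypothetical magic and version words that the data
// build would write into the flash page alongside the values.
constexpr std::uint32_t kExpectedMagic = 0x43464731u;
constexpr std::uint32_t kLayoutVersion = 1u;

constexpr bool layoutMatches(std::uint32_t storedMagic, std::uint32_t storedVersion) {
    return storedMagic == kExpectedMagic && storedVersion == kLayoutVersion;
}
```

The application would call layoutMatches with the two words read from the data page early in startup and refuse to use the configuration when it returns false.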
But thinking about your question this became an easy solution. And yes technically this data is read only, const this or that, but 1) does volatile and const go together? and 2) do you really want/need to do that?
Does it even need to be volatile? Probably not, I just banged that out to start with. After switching it to const, the tool puts them in .rodata. Well, my tool does; it depends on how you write your linker script and, I think, on the version of binutils.
so.c
extern const unsigned int x;
extern const unsigned short y;
extern const unsigned char z[7];
extern void dummy ( unsigned int );
int main ( void )
{
dummy(x);
dummy(y);
dummy(z[0]<<z[1]);
return(0);
}
flashvars.c
const unsigned int x=1;
const unsigned short y=3;
const unsigned char z[7]={1,2,3,4,5,6,7};
flash.ld
MEMORY
{
rom0 : ORIGIN = 0x08000000, LENGTH = 0x1000
rom1 : ORIGIN = 0x08002000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom0
.vars : { flashvars.o(.rodata) } > rom1
}
output
Disassembly of section .text:
08000000 <_start>:
8000000: 20001000 andcs r1, r0, r0
8000004: 08000009 stmdaeq r0, {r0, r3}
08000008 <reset>:
8000008: f000 f802 bl 8000010 <main>
800000c: e7fe b.n 800000c <reset+0x4>
0800000e <dummy>:
800000e: 4770 bx lr
08000010 <main>:
8000010: 4b08 ldr r3, [pc, #32] ; (8000034 <main+0x24>)
8000012: b510 push {r4, lr}
8000014: 6818 ldr r0, [r3, #0]
8000016: f7ff fffa bl 800000e <dummy>
800001a: 4b07 ldr r3, [pc, #28] ; (8000038 <main+0x28>)
800001c: 8818 ldrh r0, [r3, #0]
800001e: f7ff fff6 bl 800000e <dummy>
8000022: 4b06 ldr r3, [pc, #24] ; (800003c <main+0x2c>)
8000024: 7818 ldrb r0, [r3, #0]
8000026: 785b ldrb r3, [r3, #1]
8000028: 4098 lsls r0, r3
800002a: f7ff fff0 bl 800000e <dummy>
800002e: 2000 movs r0, #0
8000030: bd10 pop {r4, pc}
8000032: 46c0 nop ; (mov r8, r8)
8000034: 0800200c stmdaeq r0, {r2, r3, sp}
8000038: 08002008 stmdaeq r0, {r3, sp}
800003c: 08002000 stmdaeq r0, {sp}
Disassembly of section .vars:
08002000 <z>:
8002000: 04030201 streq r0, [r3], #-513 ; 0xfffffdff
8002004: 00070605 andeq r0, r7, r5, lsl #12
08002008 <y>:
8002008: 00000003 andeq r0, r0, r3
0800200c <x>:
800200c: 00000001 andeq r0, r0, r1
hexdump -C so.bin
00000000 00 10 00 20 09 00 00 08 00 f0 02 f8 fe e7 70 47 |... ..........pG|
00000010 08 4b 10 b5 18 68 ff f7 fa ff 07 4b 18 88 ff f7 |.K...h.....K....|
00000020 f6 ff 06 4b 18 78 5b 78 98 40 ff f7 f0 ff 00 20 |...K.x[x.@..... |
00000030 10 bd c0 46 0c 20 00 08 08 20 00 08 00 20 00 08 |...F. ... ... ..|
00000040
hexdump -C sovars.bin
00000000 01 02 03 04 05 06 07 00 03 00 00 00 01 00 00 00 |................|
00000010

Hardfault on CC2538 (Cortex M3) startup, in __libc_init_array

I'm trying to port an mbed-os (RTX RTOS) project to the CC2538 (ARM Cortex-M3). It is compiled with the mbed-cli toolchain, which integrates arm-none-eabi-gcc. When I try to boot the MCU, I get stuck in a Hard Fault during startup.
00202678 <__libc_init_array>:
202678: b570 push {r4, r5, r6, lr}
20267a: 4e0f ldr r6, [pc, #60] ; (2026b8 <__libc_init_array+0x40>)
20267c: 4d0f ldr r5, [pc, #60] ; (2026bc <__libc_init_array+0x44>)
20267e: 1b76 subs r6, r6, r5
202680: 10b6 asrs r6, r6, #2
202682: bf18 it ne
202684: 2400 movne r4, #0
202686: d005 beq.n 202694 <__libc_init_array+0x1c>
202688: 3401 adds r4, #1
20268a: f855 3b04 ldr.w r3, [r5], #4
20268e: 4798 blx r3
202690: 42a6 cmp r6, r4
202692: d1f9 bne.n 202688 <__libc_init_array+0x10>
202694: 4e0a ldr r6, [pc, #40] ; (2026c0 <__libc_init_array+0x48>)
202696: 4d0b ldr r5, [pc, #44] ; (2026c4 <__libc_init_array+0x4c>)
202698: f004 fec2 bl 207420 <_etext>
20269c: 1b76 subs r6, r6, r5
20269e: 10b6 asrs r6, r6, #2
2026a0: bf18 it ne
2026a2: 2400 movne r4, #0
2026a4: d006 beq.n 2026b4 <__libc_init_array+0x3c>
2026a6: 3401 adds r4, #1
2026a8: f855 3b04 ldr.w r3, [r5], #4
2026ac: 4798 blx r3
2026ae: 42a6 cmp r6, r4
2026b0: d1f9 bne.n 2026a6 <__libc_init_array+0x2e>
2026b2: bd70 pop {r4, r5, r6, pc}
2026b4: bd70 pop {r4, r5, r6, pc}
2026b6: bf00 nop
I traced the code flow. In the final steps the PC executes
2026a4: d006 beq.n 2026b4 <__libc_init_array+0x3c>
then
2026b4: bd70 pop {r4, r5, r6, pc}
At this moment the PC gets the value 0, so execution jumps to address 0x00000000 and causes the Hard Fault error.
After the CPU executes
202678: b570 push {r4, r5, r6, lr}
[register]
R0 =00000000
R1 =00000001
R2 =00000000
R3 =00000002
R4 =00000000
R5 =00000000
R6 =00000000
R7 =00000000
R8 =00000000
R9 =00000000
R10=00000000
R11=00000000
R12=00200F51
SP =200019F0
LR =00200A77
PC =0020267A
[memory]
200019b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200019c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200019d0: f0 09 00 20 00 00 00 00 00 00 00 00 04 0a 00 20
200019e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200019f0: 00 00 00 00 00 00 00 00 00 00 00 00 77 0a 20 00
20001a00: 00 00 00 00 5d 0c 20 00 00 04 00 00 01 01 00 00
Before the CPU executes
2026b4: bd70 pop {r4, r5, r6, pc}
Debugger dump
[register]
R0 =00000000
R1 =00000001
R2 =00000000
R3 =00000002
R4 =00000000
R5 =00000000
R6 =00000000
R7 =00000000
R8 =00000000
R9 =00000000
R10=00000000
R11=00000000
R12=00200F51
SP =200019C0
LR =0020269D
PC =002026B4
[memory]
200019b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200019c0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200019d0: 00 00 00 00 9d 26 20 00 02 00 00 00 00 00 00 00
200019e0: 00 00 00 00 00 00 00 00 00 00 00 00 9d 26 20 00
200019f0: 00 00 00 00 00 00 00 00 00 00 00 00 77 0a 20 00
20001a00: 00 00 00 00 5d 0c 20 00 00 04 00 00 01 01 00 00
If I manually set the stack pointer to 0x200019f0 just before the pop instruction in __libc_init_array is executed, it successfully jumps to main() at the end, so that seems to work around the problem.
My question is: why does stack handling go wrong in __libc_init_array()?
I can't even find the source code of __libc_init_array() anywhere in the mbed-os project.
The .ld file is attached:
MEMORY
{
FLASH_FW (rx) : ORIGIN = 0x00200000 + 0,
LENGTH = (0x00200000 + (((((0) << 0 | (512) << 4 | (32) << 16 | ((1) ? 0x01000000 : 0) | ((1) ? 0x02000000 : 0)) & 0x0000FFF0) >> 4) << 10) - 0x0000002C) - (0x00200000 + 0)
FLASH_CCA (RX) : ORIGIN = (0x00200000 + (((((0) << 0 | (512) << 4 | (32) << 16 | ((1) ? 0x01000000 : 0) | ((1) ? 0x02000000 : 0)) & 0x0000FFF0) >> 4) << 10) - 0x0000002C), LENGTH = 0x0000002C
NRSRAM (RWX) : ORIGIN = 0x20000000, LENGTH = 0
FRSRAM (RWX) : ORIGIN = (((((((0) << 0 | (512) << 4 | (32) << 16 | ((1) ? 0x01000000 : 0) | ((1) ? 0x02000000 : 0)) & 0x00FF0000) >> 16) << 10) - ((((((((0) << 0 | (512) << 4 | (32) << 16 | ((1) ? 0x01000000 : 0) | ((1) ? 0x02000000 : 0)) & 0x00FF0000) >> 16) << 10)) < (16384)) ? ((((((0) << 0 | (512) << 4 | (32) << 16 | ((1) ? 0x01000000 : 0) | ((1) ? 0x02000000 : 0)) & 0x00FF0000) >> 16) << 10)) : (16384))) ? 0x20000000 : 0x20004000), LENGTH = (((((0) << 0 | (512) << 4 | (32) << 16 | ((1) ? 0x01000000 : 0) | ((1) ? 0x02000000 : 0)) & 0x00FF0000) >> 16) << 10)
}
/* Linker script to place sections and symbol values. Should be used together
* with other linker script that defines memory regions FLASH and RAM.
* It references following symbols, which must be defined in code:
* Reset_Handler : Entry of reset handler
*
* It defines following symbols, which code can use without definition:
* __exidx_start
* __exidx_end
* __etext
* __data_start__
* __preinit_array_start
* __preinit_array_end
* __init_array_start
* __init_array_end
* __fini_array_start
* __fini_array_end
* __data_end__
* __bss_start__
* __bss_end__
* __end__
* end
* __HeapLimit
* __StackLimit
* __StackTop
* __stack
*/
ENTRY(flash_cca_lock_page)
SECTIONS
{
.text :
{
_text = .;
*(.vectors)
*(.text*)
*(.rodata*)
_etext = .;
} > FLASH_FW= 0
.socdata (NOLOAD) :
{
*(.udma_channel_control_table)
} > FRSRAM
.data : ALIGN(4)
{
_data = .;
*(.data*)
_edata = .;
} > FRSRAM AT > FLASH_FW
_ldata = LOADADDR(.data);
.ARM.exidx :
{
*(.ARM.exidx*)
} > FLASH_FW
.bss :
{
_bss = .;
*(.bss*)
*(COMMON)
_ebss = .;
} > FRSRAM
.heap :
{
__end__ = .;
end = __end__;
*(.heap*)
__HeapLimit = .;
} > RAM
.stack (NOLOAD) :
{
*(.stack)
} > FRSRAM
_heap = .;
_eheap = ORIGIN(FRSRAM) + LENGTH(FRSRAM);
.nrdata (NOLOAD) :
{
_nrdata = .;
*(.nrdata*)
_enrdata = .;
} > NRSRAM
.flashcca :
{
*(.flashcca)
} > FLASH_CCA
}
@notlikethat is right that the problem is in the linker script file, but the root cause is not section overlapping. As I posted above, in __libc_init_array()
202678: b570 push {r4, r5, r6, lr}
at this time the stack pointer points to 0x200019F0, but somehow at the pop the stack pointer points to 0x200019C0, and that causes the Hard Fault. I traced the code flow: __libc_init_array() jumps to the <_init> section at
202698: f004 fec2 bl 207420 <_etext>
which looks like this in memory:
00207420 <_init>:
207420: b5f8 push {r3, r4, r5, r6, r7, lr}
207422: bf00 nop
00207424 <_fini>:
207426: b5f8 push {r3, r4, r5, r6, r7, lr}
207428: bf00 nop
I suspect that this function makes the stack pointer overshoot, because the <_init> section contains only the push instruction and no pop instruction.
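The numbers in the two debugger dumps fit this picture. As a sketch (the helper names are mine, the addresses come from the dumps): two six-register pushes with no matching pops account exactly for the 0x30 bytes between the two stack pointer values.

```cpp
#include <cstdint>

// A Thumb "push {reglist}" lowers sp by 4 bytes per pushed register.
constexpr std::uint32_t spAfterPush(std::uint32_t sp, unsigned registerCount) {
    return sp - 4u * registerCount;
}

// lr is the highest-numbered register in the list, so it lands in the
// highest word of the pushed block, directly below the old sp.
constexpr std::uint32_t lrSlotOfPush(std::uint32_t spBefore) {
    return spBefore - 4u;
}

constexpr std::uint32_t kSpInInitArray = 0x200019F0u;  // after push {r4, r5, r6, lr}
constexpr std::uint32_t kSpAfterInit = spAfterPush(kSpInInitArray, 6);  // push in <_init>
constexpr std::uint32_t kSpAfterFini = spAfterPush(kSpAfterInit, 6);    // push in <_fini>

static_assert(kSpAfterInit == 0x200019D8u, "sp after the first six-register push");
static_assert(kSpAfterFini == 0x200019C0u, "sp seen right before the failing pop");
static_assert(lrSlotOfPush(kSpInInitArray) == 0x200019ECu, "first saved lr");
static_assert(lrSlotOfPush(kSpAfterInit) == 0x200019D4u, "second saved lr");
```

Both lr slots hold 9d 26 20 00 (0x0020269D) in the second memory dump, which is consistent with <_init> falling through into <_fini> and both pushes being executed. The final pop {r4, r5, r6, pc} then takes the word at 0x200019CC, which is 0, as the new PC.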
I searched the web for more about the <_init> section and confirmed that it shouldn't consist of only 2 instructions, and that it is affected by the linker file.
In the previous linker file, I didn't take care of the .init and .fini sections.
I then made some modifications, and it now looks like this:
.text :
{
_text = .;
*(.vectors)
*(.text*)
KEEP(*(.init))
KEEP(*(.fini))
/* .ctors */
*crtbegin.o(.ctors)
*crtbegin?.o(.ctors)
*(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)
*(SORT(.ctors.*))
*(.ctors)
/* .dtors */
*crtbegin.o(.dtors)
*crtbegin?.o(.dtors)
*(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors)
*(SORT(.dtors.*))
*(.dtors)
*(.rodata*)
KEEP(*(.eh_frame*))
_etext = .;
} > FLASH_FW= 0
.socdata (NOLOAD) :
After compiling, the <_init> and <_fini> sections have changed as follows.
00207040 <_init>:
207040: b5f8 push {r3, r4, r5, r6, r7, lr}
207042: bf00 nop
207044: bcf8 pop {r3, r4, r5, r6, r7}
207046: bc08 pop {r3}
207048: 469e mov lr, r3
20704a: 4770 bx lr
0020704c <_fini>:
20704c: b5f8 push {r3, r4, r5, r6, r7, lr}
20704e: bf00 nop
207050: bcf8 pop {r3, r4, r5, r6, r7}
207052: bc08 pop {r3}
207054: 469e mov lr, r3
207056: 4770 bx lr
207058: 6465626d .word 0x6465626d
20705c: 73736120 .word 0x73736120
207060: 61747265 .word 0x61747265
207064: 6e6f6974 .word 0x6e6f6974
207068: 69616620 .word 0x69616620
20706c: 3a64656c .word 0x3a64656c
207070: 2c732520 .word 0x2c732520
207074: 6c696620 .word 0x6c696620
207078: 25203a65 .word 0x25203a65
and then it jumps into main() successfully.
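For orientation, the disassembly of __libc_init_array in the question has the shape of the newlib routine: one loop over the preinit array, a call to _init, and a second loop over the init array. The following is a host-runnable model of that control flow, not the library source. The tables and hook functions are hypothetical stand-ins for the linker-provided __preinit_array_start/__preinit_array_end and __init_array_start/__init_array_end ranges.

```cpp
#include <vector>

using InitFn = void (*)();

// Records the order in which the hooks run.
std::vector<int> &callLog() {
    static std::vector<int> log;
    return log;
}

void preinitHook() { callLog().push_back(1); }
void legacyInit()  { callLog().push_back(2); }  // stands in for _init
void ctorHook()    { callLog().push_back(3); }  // e.g. a static constructor

// Local tables instead of the ranges that a linker script would define.
InitFn preinitArray[] = {preinitHook};
InitFn initArray[] = {ctorHook};

// Same control flow as the disassembly: loop, call to _init, loop.
void runInitArrays() {
    for (InitFn fn : preinitArray) fn();
    legacyInit();
    for (InitFn fn : initArray) fn();
}

// Helper that runs the sequence and returns the recorded order.
std::vector<int> runAndGetOrder() {
    callLog().clear();
    runInitArrays();
    return callLog();
}
```

This is why a truncated _init matters: it sits in the middle of the startup sequence and must return normally for the second loop and the final pop to see a balanced stack.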

strange behavior when trying to compile a source with tcc against gcc generated .o file

I am trying to compile a source with tcc (ver 0.9.26) against a gcc-generated .o file, but it behaves strangely. The gcc (ver 5.3.0) is from MinGW 64 bit.
More specifically, I have the following two files (te1.c, te2.c). I ran the following commands on a Windows 7 box:
c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O elf64-x86-64 te1.o #this is needed because te1.o from previous step is in COFF format, tcc only understand ELF format
c:\tcc> tcc te2.c te1.o
c:\tcc> te2.exe
567in dummy!!!
Note that it cut off 4 bytes from the string 1234567in dummy!!!\n. I wonder what could have gone wrong.
Thanks
Jin
========file te1.c===========
#include <stdio.h>
void dummy () {
printf1("1234567in dummy!!!\n");
}
========file te2.c===========
#include <stdio.h>
void printf1(char *p) {
printf("%s\n",p);
}
extern void dummy();
int main(int argc, char *argv[]) {
dummy();
return 0;
}
Update 1
I saw a difference in the assembly between te1.o (te1.c compiled by tcc) and te1_gcc.o (te1.c compiled by gcc). In the tcc-compiled file I saw lea -0x4(%rip),%rcx; in the gcc-compiled file I saw lea 0x0(%rip),%rcx.
Not sure why.
C:\temp>objdump -d te1.o
te1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <dummy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 81 ec 20 00 00 00 sub $0x20,%rsp
b: 48 8d 0d fc ff ff ff lea -0x4(%rip),%rcx # e <dummy+0xe>
12: e8 fc ff ff ff callq 13 <dummy+0x13>
17: c9 leaveq
18: c3 retq
19: 00 00 add %al,(%rax)
1b: 00 01 add %al,(%rcx)
1d: 04 02 add $0x2,%al
1f: 05 04 03 01 50 add $0x50010304,%eax
C:\temp>objdump -d te1_gcc.o
te1_gcc.o: file format pe-x86-64
Disassembly of section .text:
0000000000000000 <dummy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 20 sub $0x20,%rsp
8: 48 8d 0d 00 00 00 00 lea 0x0(%rip),%rcx # f <dummy+0xf>
f: e8 00 00 00 00 callq 14 <dummy+0x14>
14: 90 nop
15: 48 83 c4 20 add $0x20,%rsp
19: 5d pop %rbp
1a: c3 retq
1b: 90 nop
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop
Update2
Using a binary editor, I changed the machine code in te1.o (produced by gcc) from lea 0(%rip),%rcx to lea -0x4(%rip),%rcx. After linking it with tcc, the resulting exe works fine.
More precisely, I did
c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O elf64-x86-64 te1.o
c:\tcc> use a binary editor to change the bytes from (48 8d 0d 00 00 00 00) to (48 8d 0d fc ff ff ff)
c:\tcc> tcc te2.c te1.o
c:\tcc> te2
1234567in dummy!!!
Update 3
As requested, here is the output of objdump -r te1.o
C:\temp>gcc -c te1.c
C:\temp>objdump -r te1.o
te1.o: file format pe-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000000b R_X86_64_PC32 .rdata
0000000000000010 R_X86_64_PC32 printf1
RELOCATION RECORDS FOR [.pdata]:
OFFSET TYPE VALUE
0000000000000000 rva32 .text
0000000000000004 rva32 .text
0000000000000008 rva32 .xdata
C:\temp>objdump -d te1.o
te1.o: file format pe-x86-64
Disassembly of section .text:
0000000000000000 <dummy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 20 sub $0x20,%rsp
8: 48 8d 0d 00 00 00 00 lea 0x0(%rip),%rcx # f <dummy+0xf>
f: e8 00 00 00 00 callq 14 <dummy+0x14>
14: 90 nop
15: 48 83 c4 20 add $0x20,%rsp
19: 5d pop %rbp
1a: c3 retq
1b: 90 nop
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop
This has nothing to do with tcc or calling conventions. It has to do with different linker conventions for the elf64-x86-64 and pe-x86-64 formats.
With PE, the linker implicitly subtracts 4 when calculating the final offset.
With ELF, it does not do this. Because of this, 0 is the correct initial value for PE, and -4 is correct for ELF.
Unfortunately, objcopy does not convert this, which is a bug in objcopy.
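The two conventions can be written out as formulas. This is a sketch of the arithmetic only; the function names and the example addresses are made up, and S, A and P follow the usual ELF notation (symbol address, addend, address of the 4-byte field being patched).

```cpp
#include <cstdint>

// ELF PC-relative 32-bit relocation: value = S + A - P.
constexpr std::int64_t elfPc32(std::int64_t S, std::int64_t A, std::int64_t P) {
    return S + A - P;
}

// PE/COFF REL32: relative to the byte following the 4-byte field,
// so the "-4" is built into the relocation type.
constexpr std::int64_t peRel32(std::int64_t S, std::int64_t P) {
    return S - (P + 4);
}

// Hypothetical addresses, chosen only for the demonstration.
constexpr std::int64_t kSymbol = 0x402000;
constexpr std::int64_t kField = 0x40100b;

// With an addend of -4, ELF produces the same displacement as PE.
static_assert(elfPc32(kSymbol, -4, kField) == peRel32(kSymbol, kField), "conventions agree");

// Keeping the PE addend of 0 in an ELF object lands 4 bytes too far.
static_assert(elfPc32(kSymbol, 0, kField) - peRel32(kSymbol, kField) == 4, "off by 4");
```

That 4-byte overshoot matches both symptoms in the question: the string pointer skips the first 4 characters, and the call lands 4 bytes past the start of printf1.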
Add
extern void printf1(char *p);
to your te1.c file.
Otherwise the compiler will assume a 32-bit integer argument since there's no prototype, while pointers are 64 bits long.
Edit: this is still not working. I found out that the function never returns (since calling the printf1 a second time does nothing!). Seems that the 4 first bytes are consumed as return address or something like that. In gcc 32-bit mode it works fine.
Sounds like a calling convention problem to me but still cannot figure it out.
Another clue: calling printf from te1.c side (gcc, using tcc stdlib bindings) crashes with segv.
I disassembled the executable. First part is repeated call from tcc side
40104f: 48 8d 05 b3 0f 00 00 lea 0xfb3(%rip),%rax # 0x402009
401056: 48 89 45 f8 mov %rax,-0x8(%rbp)
40105a: 48 8b 4d f8 mov -0x8(%rbp),%rcx
40105e: e8 9d ff ff ff callq 0x401000
401063: 48 8b 4d f8 mov -0x8(%rbp),%rcx
401067: e8 94 ff ff ff callq 0x401000
40106c: 48 8b 4d f8 mov -0x8(%rbp),%rcx
401070: e8 8b ff ff ff callq 0x401000
401075: 48 8b 4d f8 mov -0x8(%rbp),%rcx
401079: e8 82 ff ff ff callq 0x401000
40107e: e8 0d 00 00 00 callq 0x401090
401083: b8 00 00 00 00 mov $0x0,%eax
401088: e9 00 00 00 00 jmpq 0x40108d
40108d: c9 leaveq
40108e: c3 retq
The second part is a repeated (6 times) call to the same function. As you can see, the address is different (shifted by 4 bytes, like your data)! It kind of works just once because the first 4 bytes are the following instructions:
401000: 55 push %rbp
401001: 48 89 e5 mov %rsp,%rbp
so the stack is destroyed if those are skipped!
40109f: 48 89 45 f8 mov %rax,-0x8(%rbp)
4010a3: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010a7: 48 89 c1 mov %rax,%rcx
4010aa: e8 55 ff ff ff callq 0x401004
4010af: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010b3: 48 89 c1 mov %rax,%rcx
4010b6: e8 49 ff ff ff callq 0x401004
4010bb: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010bf: 48 89 c1 mov %rax,%rcx
4010c2: e8 3d ff ff ff callq 0x401004
4010c7: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010cb: 48 89 c1 mov %rax,%rcx
4010ce: e8 31 ff ff ff callq 0x401004
4010d3: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010d7: 48 89 c1 mov %rax,%rcx
4010da: e8 25 ff ff ff callq 0x401004
4010df: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010e3: 48 89 c1 mov %rax,%rcx
4010e6: e8 19 ff ff ff callq 0x401004
4010eb: 90 nop

G++ 4.6 -std=gnu++0x: Static Local Variable Constructor Call Timing and Thread Safety

void a() { ... }
void b() { ... }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
...
}
Assume f is called multiple times from various threads (potentially contended) after the entry of main. (and of course that the only calls to a and b are those seen above)
When the above code is compiled with g++ 4.6 in -std=gnu++0x mode:
Q1. Is it guaranteed that a() will be called at least once and return before b() is called? That is to ask, on the first call to f(), is the constructor of x called at the same time an automatic duration local variable (non-static) would be (and not at global static initialization time for example)?
Q2. Is it guaranteed that b() will be called exactly once? Even if two threads execute f for the first time at the same time on different cores? If yes, by which specific mechanism does the GCC generated code provide synchronization? Edit: Additionally could one of the threads calling f() obtain access to x before the constructor of X returns?
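A small single-threaded probe of the ordering part of these questions, as a sketch: a and b append to a trace string (the trace helper is an addition for this illustration) so that the call order can be inspected after f has been called twice.

```cpp
#include <string>

// Shared trace of which functions ran, in order.
std::string &trace() {
    static std::string t;
    return t;
}

void a() { trace() += 'a'; }
void b() { trace() += 'b'; }

struct X {
    X() { b(); }
};

void f() {
    a();
    static X x;  // constructed the first time control passes through this line
    (void)x;     // silence unused-variable warnings
}
```

After two calls to f() in a fresh process the trace reads "aba": a runs before the constructor on the first call, and the constructor does not run again on the second. This does not exercise the concurrent case. For that, C++11 requires a thread that reaches the declaration while another thread is initializing x to wait until the initialization completes, and GCC's thread-safe statics (on by default) are meant to provide this.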
Update: I am trying to compile an example and decompile to investigate mechanism...
test.cpp:
struct X;
void ext1(int x);
void ext2(X& x);
void a() { ext1(1); }
void b() { ext1(2); }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
ext2(x);
}
Then:
$ g++ -std=gnu++0x -c -o test.o ./test.cpp
$ objdump -d test.o -M intel > test.dump
test.dump:
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z1av>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: bf 01 00 00 00 mov edi,0x1
9: e8 00 00 00 00 call e <_Z1av+0xe>
e: 5d pop rbp
f: c3 ret
0000000000000010 <_Z1bv>:
10: 55 push rbp
11: 48 89 e5 mov rbp,rsp
14: bf 02 00 00 00 mov edi,0x2
19: e8 00 00 00 00 call 1e <_Z1bv+0xe>
1e: 5d pop rbp
1f: c3 ret
0000000000000020 <_Z1fv>:
20: 55 push rbp
21: 48 89 e5 mov rbp,rsp
24: 41 54 push r12
26: 53 push rbx
27: e8 00 00 00 00 call 2c <_Z1fv+0xc>
2c: b8 00 00 00 00 mov eax,0x0
31: 0f b6 00 movzx eax,BYTE PTR [rax]
34: 84 c0 test al,al
36: 75 2d jne 65 <_Z1fv+0x45>
38: bf 00 00 00 00 mov edi,0x0
3d: e8 00 00 00 00 call 42 <_Z1fv+0x22>
42: 85 c0 test eax,eax
44: 0f 95 c0 setne al
47: 84 c0 test al,al
49: 74 1a je 65 <_Z1fv+0x45>
4b: 41 bc 00 00 00 00 mov r12d,0x0
51: bf 00 00 00 00 mov edi,0x0
56: e8 00 00 00 00 call 5b <_Z1fv+0x3b>
5b: bf 00 00 00 00 mov edi,0x0
60: e8 00 00 00 00 call 65 <_Z1fv+0x45>
65: bf 00 00 00 00 mov edi,0x0
6a: e8 00 00 00 00 call 6f <_Z1fv+0x4f>
6f: 5b pop rbx
70: 41 5c pop r12
72: 5d pop rbp
73: c3 ret
74: 48 89 c3 mov rbx,rax
77: 45 84 e4 test r12b,r12b
7a: 75 0a jne 86 <_Z1fv+0x66>
7c: bf 00 00 00 00 mov edi,0x0
81: e8 00 00 00 00 call 86 <_Z1fv+0x66>
86: 48 89 d8 mov rax,rbx
89: 48 89 c7 mov rdi,rax
8c: e8 00 00 00 00 call 91 <_Z1fv+0x71>
Disassembly of section .text._ZN1XC2Ev:
0000000000000000 <_ZN1XC1Ev>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 48 83 ec 10 sub rsp,0x10
8: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi
c: e8 00 00 00 00 call 11 <_ZN1XC1Ev+0x11>
11: c9 leave
12: c3 ret
I don't see the synchronization mechanism. Is it added at link time?
Update2: Ok when I link it I can see it...
400973: 84 c0 test %al,%al
400975: 75 2d jne 4009a4 <_Z1fv+0x45>
400977: bf 98 20 40 00 mov $0x402098,%edi
40097c: e8 1f fe ff ff callq 4007a0 <__cxa_guard_acquire@plt>
400981: 85 c0 test %eax,%eax
400983: 0f 95 c0 setne %al
400986: 84 c0 test %al,%al
400988: 74 1a je 4009a4 <_Z1fv+0x45>
40098a: 41 bc 00 00 00 00 mov $0x0,%r12d
400990: bf a0 20 40 00 mov $0x4020a0,%edi
400995: e8 a6 00 00 00 callq 400a40 <_ZN1XC1Ev>
40099a: bf 98 20 40 00 mov $0x402098,%edi
40099f: e8 0c fe ff ff callq 4007b0 <__cxa_guard_release@plt>
4009a4: bf a0 20 40 00 mov $0x4020a0,%edi
4009a9: e8 72 ff ff ff callq 400920 <_Z4ext2R1X>
4009ae: 5b pop %rbx
4009af: 41 5c pop %r12
4009b1: 5d pop %rbp
It surrounds it with __cxa_guard_acquire and __cxa_guard_release, whatever they do.
Q1. Yes. According to C++11, 6.7/4:
such a variable is initialized the first time control passes through its declaration
so it will be initialised after the first call to a().
Q2. Under GCC, and any compiler that supports the C++11 thread model: yes, initialisation of local static variables is thread safe. Other compilers might not give that guarantee. The exact mechanism is an implementation detail. I believe GCC uses an atomic flag to indicate whether it's initialised, and a mutex to protect initialisation when the flag is not set, but I could be wrong. Certainly, this thread implies that it was originally implemented like that.
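To make the "flag plus mutex" idea concrete, here is a minimal hand-written sketch of roughly what a guarded `static X x;` amounts to. It is only an illustration of the pattern, not GCC's actual runtime code; the names get_x, x_ptr, x_init_mutex and x_storage are hypothetical, and X is given a data member purely so there is something to observe.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

struct X {
    X() : value(42) {}
    int value;
};

namespace {
// Hypothetical stand-ins for the compiler-generated guard and storage.
std::atomic<X*> x_ptr{nullptr};                 // null until X is constructed
std::mutex x_init_mutex;                        // serialises first-time init
alignas(X) unsigned char x_storage[sizeof(X)];  // raw storage for the object
}

// Behaves roughly like: X& get_x() { static X x; return x; }
X& get_x() {
    // Fast path: once published, no lock is taken.
    X* p = x_ptr.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(x_init_mutex);
        // Re-check under the lock: another thread may have won the race.
        p = x_ptr.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new (x_storage) X();                    // run the constructor once
            x_ptr.store(p, std::memory_order_release);  // publish the finished object
        }
    }
    assert(p != nullptr);
    return *p;
}
```

The release store happens only after the constructor has returned, so a thread that sees a non-null pointer on the fast path also sees a fully constructed object. The object is never destroyed here, which a real function-local static would be at program exit.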
UPDATE: your code does indeed contain the initialisation code. You can see it more clearly if you link it, and then disassemble the program, so that you can see which functions are being called. I also used objdump -SC to interleave the source and demangle C++ names. It uses internal locking functions __cxa_guard_acquire and __cxa_guard_release, to make sure only one thread executes the initialisation code.
#void f()
#{
400724: push rbp
400725: mov rbp,rsp
400728: push r13
40072a: push r12
40072c: push rbx
40072d: sub rsp,0x8
# a();
400731: call 400704 <a()>
# static X x;
# if (!guard) {
400736: mov eax,0x601050
40073b: movzx eax,BYTE PTR [rax]
40073e: test al,al
400740: jne 400792 <f()+0x6e>
# if (__cxa_guard_acquire(&guard)) {
400742: mov edi,0x601050
400747: call 4005c0 <__cxa_guard_acquire@plt>
40074c: test eax,eax
40074e: setne al
400751: test al,al
400753: je 400792 <f()+0x6e>
# // initialise x
400755: mov ebx,0x0
40075a: mov edi,0x601058
40075f: call 4007b2 <X::X()>
# __cxa_guard_release(&guard);
400764: mov edi,0x601050
400769: call 4005e0 <__cxa_guard_release@plt>
# } else {
40076e: jmp 400792 <f()+0x6e>
# // already initialised
400770: mov r12d,edx
400773: mov r13,rax
400776: test bl,bl
400778: jne 400784 <f()+0x60>
40077a: mov edi,0x601050
40077f: call 4005f0 <__cxa_guard_abort@plt>
400784: mov rax,r13
400787: movsxd rdx,r12d
40078a: mov rdi,rax
40078d: call 400610 <_Unwind_Resume@plt>
# }
# }
# ext2(x);
400792: mov edi,0x601058
400797: call 4007d1 <_Z4ext2R1X>
#}
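If you would rather observe the behaviour than read disassembly, a small threaded check along these lines can help. It is a sketch under assumptions: a, b, X and f mirror the question, while the counters and run_threads are hypothetical helpers added only for the test, and it assumes a C++11 compiler with thread-safe statics enabled (the default in GCC) and linking with -pthread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> a_calls{0};
std::atomic<int> b_calls{0};
std::atomic<int> a_calls_seen_by_b{0};  // value of a_calls when b() ran

void a() { a_calls.fetch_add(1); }

void b() {
    // Record how many a() calls had completed before the constructor ran.
    a_calls_seen_by_b.store(a_calls.load());
    b_calls.fetch_add(1);
}

struct X {
    X() { b(); }
};

void f() {
    a();
    static X x;  // guarded initialisation happens here, on first pass only
    (void)x;
}

// Hypothetical helper: run f() on n threads and return how often b() ran.
int run_threads(int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(f);
    for (auto& t : threads) t.join();
    return b_calls.load();
}
```

Calling run_threads(8) should report that b() ran once, with a_calls equal to 8 and a_calls_seen_by_b at least 1. A passing run does not prove the guarantee, since races can hide, but a failing one would disprove it.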
As far as I know, it is guaranteed that b is called only once. However, it is not guaranteed that the initialisation is performed in a thread-safe way, which means another thread could potentially work with a half-initialised or uninitialised x. (That's kind of funny, because static mutexes are basically useless this way.)