Linker generates different code (objdump) - c++

I have a makefile that generates a shared library. Calling make from different shells (VxWorks wrenv and Cygwin) results in different libs. Thy VxWorks version is working the Cygwin version not. The differences are very small and not visible with the readelf program.
Using objdumparm -S *.so shows the difference at the end of the functions. See the following output.
Example 1: (two last line differ)
Cygwin:
00010a14 <_ZN3rag3MD511save_digestEPhPKm>:
10a14: e1a0c00d mov ip, sp
10a18: e92dda30 push {r4, r5, r9, fp, ip, lr, pc}
...
10b18: 00000000 andeq r0, r0, r0
10b1c: 3b6e71a1 blcc 1bad1a8 <_stack+0x1b2d1a8>
10b20: 505c3d7f subspl r3, ip, pc, ror sp
VxWorks:
00010a14 <_ZN3rag3MD511save_digestEPhPKm>:
10a14: e1a0c00d mov ip, sp
10a18: e92dda30 push {r4, r5, r9, fp, ip, lr, pc}
...
10b18: 00000000 andeq r0, r0, r0
10b1c: 00000338 andeq r0, r0, r8, lsr r3
10b20: 0000033c andeq r0, r0, ip, lsr r3
or Example 2: (last line differs)
Cygwin:
00010b7c <_ZN3rag3MD55beginEv>:
10b7c: e1a0c00d mov ip, sp
10b80: e92dda10 push {r4, r9, fp, ip, lr, pc}
...
10bd4: e89daa10 ldm sp, {r4, r9, fp, sp, pc}
10bd8: 00000000 andeq r0, r0, r0
10bdc: 695c7960 ldmdbvs ip, {r5, r6, r8, fp, ip, sp, lr}^
VxWorks:
00010b7c <_ZN3rag3MD55beginEv>:
10b7c: e1a0c00d mov ip, sp
10b80: e92dda10 push {r4, r9, fp, ip, lr, pc}
...
10bd4: e89daa10 ldm sp, {r4, r9, fp, sp, pc}
10bd8: 00000000 andeq r0, r0, r0
10bdc: 000005ec andeq r0, r0, ip, ror #11
The parameters for the linker are identical. Do these differences come from the relocation process?
It possible to say why this happens, the reason?

Hmm, the linker wasn't the bad guy! Found out that the compiler was generating different .sho objects. Using the -fno-builtin option to compile the .sho's did remove the differences...

Related

Does armclang saves all needed register on stack with attribute("IRQ")?

I'm working with Keil ARMCompiler 6.15 (armclang.exe) and I'm in doubt of the correctness of the generated assembler code.
It seems to me that the attribute 'interrupt("IRQ")' is ignored.
For me r1 and r2 should be saved on the stack, too.
When I remove the attribute 'used' my complete function is removed (optimization).
Can anyone see the mistake I made or what I've forgotten?
Originally the code was created for gcc.
Attributes used for interrupt routines:
#define INTERRUPT_PROCEDURE __attribute__((interrupt("IRQ"),used,section(".IsrSection")))
#define ISR_VARIABLE __attribute__((section(".IsrSection")))
#define FAST_SHARED_DATA __attribute__((section(".FastSharedDataSection")))
C++ Code:
uint64_t volatile FAST_SHARED_DATA systick_value = uint64_t(0);
extern "C" {
void INTERRUPT_PROCEDURE SysTick_Handler()
{
systick_value++;
}
}
Assembler Code:
0x08001280 push {r4, r6, r7, lr}
0x08001282 add r7, sp, #8
0x08001284 mov r4, sp
0x08001286 bfc r4, #0, #3
0x0800128a mov sp, r4
0x0800128c movw r0, #8192 ; 0x2000
0x08001290 movt r0, #8192 ; 0x2000
0x08001294 ldrd r1, r2, [r0]
0x08001298 adds r1, #1
0x0800129a adc.w r2, r2, #0
0x0800129e strd r1, r2, [r0]
0x080012a2 sub.w r4, r7, #8
0x080012a6 mov sp, r4
0x080012a8 pop {r4, r6, r7, pc}
0x080012aa movs r0, r0
0x080012ac movs r0, r0
0x080012ae movs r0, r0
You do not need this attribute. It is needed in very rare circumstances when the stack is not aligned to 8 bytes (STKALGN bit is not set) by the hardware and you are going to use functions with 64 bits parameters (like uint64_t). ARM automatically saves R0-R3 + some others registers on the stack when entering the ISR handler. If you use FPU you may want to enable FPU registers stackup as well.

arm vector table pointing one byte after

I have small application that compiles and runs well on my ARM Cortex M4. But when I disassemble binary file, that I flush, here is how first bytes look like:
00000000 <.data>:
0: 20020000 andcs r0, r2, r0
4: 080003b5 stmdaeq r0, {r0, r2, r4, r5, r7, r8, r9}
8: 08000345 stmdaeq r0, {r0, r2, r6, r8, r9}
c: 08000351 stmdaeq r0, {r0, r4, r6, r8, r9}
080003b5 should be the address of Reset handler (I have .word Reset_Handler there), but disassembling ELF shows that Reset handler is actually located at 080003b4, which is 1 byte before:
080003b4 <Reset_Handler>:
80003b4: 2100 movs r1, #0
80003b6: e003 b.n 80003c0 <InitData>
(It's running in THUMB mode, I have 2byte instructions).
Even if I disassemble the binary file, it's located at 080003b4:
000003b4 <.data+0x3b4>:
3b4: 2100 movs r1, #0
3b6: e003 b.n 0x3c0
My question is, why does it point 1 byte after? This code surprisingly works on actual board. Even without disassembling, shouldn't instructions be aligned by 2 byte? how can address be 0x000003b5?
Answer: ARM uses it for switching to THUMB mode.

GDB disassembly output is weird

Device - IPAD running IOS 9.3.3 ( ARM architecture 64 bit - jailbroken)
When i try to disassemble the function , i get weird output with unrecognized instruction set such as vqrshl, svcge. How do i fix it? I would really appreciate your help.
Please see the gdb output below
gdb) disassemble
Dump of assembler code for function mach_msg_trap:
0x21f1f894 <mach_msg_trap+0>: mov r12, sp
0x21f1f898 <mach_msg_trap+4>: push {r4, r5, r6, r8}
0x21f1f89c <mach_msg_trap+8>: ldm r12, {r4, r5, r6}
0x21f1f8a0 <mach_msg_trap+12>: mvn r12, #30 ; 0x1e
0x21f1f8a4 <mach_msg_trap+16>: svc 0x00000080
0x21f1f8a8 <mach_msg_trap+20>: pop {r4, r5, r6, r8}
0x21f1f8ac <mach_msg_trap+24>: bx lr
End of assembler dump.
(gdb) break -[AntiPiracyViewController isJailbroken]
Breakpoint 1 at 0x73314
(gdb) C
Continuing.
warning: Unrecognized osabi 0 in arm_set_osabi_from_host_info
Breakpoint 1, 0x00073314 in -[AntiPiracyViewController isJailbroken] ()
(gdb) disassemble
Dump of assembler code for function -[AntiPiracyViewController isJailbroken]:
0x00073314 <-[AntiPiracyViewController isJailbroken]+0>: vqrshl.s8 <illegal reg q13.5>, q8, <illegal reg q8.5>
0x00073318 <-[AntiPiracyViewController isJailbroken]+4>: svcge 0x00032148
0x0007331c <-[AntiPiracyViewController isJailbroken]+8>: smlabteq r0, r0, r2, pc
0x00073320 <-[AntiPiracyViewController isJailbroken]+12>: adccs pc, r2, r1, asr #4
0x00073324 <-[AntiPiracyViewController isJailbroken]+16>: andeq pc, r0, r0, asr #5
0x00073328 <-[AntiPiracyViewController isJailbroken]+20>: ldrbtmi r4, [r8], #-1145
0x0007332c <-[AntiPiracyViewController isJailbroken]+24>: stmdavs r0, {r0, r3, r11, sp, lr}
0x00073330 <-[AntiPiracyViewController isJailbroken]+28>: cdp 0, 5, cr15, cr8, cr0, {0}
0x00073334 <-[AntiPiracyViewController isJailbroken]+32>: msrcs SPSR_f, r1, asr #4
0x00073338 <-[AntiPiracyViewController isJailbroken]+36>: smlabteq r0, r0, r2, pc
0x0007333c <-[AntiPiracyViewController isJailbroken]+40>: rscscs pc, r2, #268435460 ; 0x10000004
0x00073340 <-[AntiPiracyViewController isJailbroken]+44>: vmvn.i32 q10, #589824 ; 0x00090000
0x00073344 <-[AntiPiracyViewController isJailbroken]+48>: ldrbtmi r0, [r10], #-512
0x00073348 <-[AntiPiracyViewController isJailbroken]+52>: undefined instruction 0xf0006809
0x0007334c <-[AntiPiracyViewController isJailbroken]+56>: vceq.f32 q15, <illegal reg q0.5>, q6
0x00073350 <-[AntiPiracyViewController isJailbroken]+60>: vorr.i32 q9, #0 ; 0x00000000
0x00073354 <-[AntiPiracyViewController isJailbroken]+64>: strmi r0, [r4], -r0, lsl #2
0x00073358 <-[AntiPiracyViewController isJailbroken]+68>: rsbcs pc, lr, r1, asr #4
0x0007335c <-[AntiPiracyViewController isJailbroken]+72>: vmvn.i32 q10, #589824 ; 0x00090000
0x00073360 <-[AntiPiracyViewController isJailbroken]+76>: ldrbtmi r0, [r8]
0x00073364 <-[AntiPiracyViewController isJailbroken]+80>: stmdavs r0, {r0, r3, r11, sp, lr}
0x00073368 <-[AntiPiracyViewController isJailbroken]+84>: cdp 0, 3, cr15, cr12, cr0, {0}
0x0007336c <-[AntiPiracyViewController isJailbroken]+88>: undefined instruction 0xf000463f
0x00073370 <-[AntiPiracyViewController isJailbroken]+92>: strtmi lr, [r2], -r2, asr #28
0x00073374 <-[AntiPiracyViewController isJailbroken]+96>: vmax.s8 d20, d1, d5
0x00073378 <-[AntiPiracyViewController isJailbroken]+100>: vmvn.i32 d18, #2 ; 0x00000002
0x0007337c <-[AntiPiracyViewController isJailbroken]+104>: ldrbtmi r0, [r8]
0x00073380 <-[AntiPiracyViewController isJailbroken]+108>: strtmi r6, [r8], -r1, lsl #16
0x00073384 <-[AntiPiracyViewController isJailbroken]+112>: cdp 0, 2, cr15, cr14, cr0, {0}
0x00073388 <-[AntiPiracyViewController isJailbroken]+116>: strtmi r4, [r8], -r6, lsl #12
0x0007338c <-[AntiPiracyViewController isJailbroken]+120>: cdp 0, 3, cr15, cr0, cr0, {0}
0x00073390 <-[AntiPiracyViewController isJailbroken]+124>: undefined instruction 0xf0004620
0x00073394 <-[AntiPiracyViewController isJailbroken]+128>: rscslt lr, r0, #736 ; 0x2e0
0x00073398 <-[AntiPiracyViewController isJailbroken]+132>: svclt 0x00182800
0x0007339c <-[AntiPiracyViewController isJailbroken]+136>: ldcllt 0, cr2, [r0, #4]!
End of assembler dump.

Assembly, global variable

I have the following source code:
const ClassTwo g_classTwo;
void ClassOne::first()
{
g_classTwo.doSomething(1);
}
void ClassOne::second()
{
g_classTwo.doSomething(2);
}
Which produces the following objdump:
void ClassOne::first()
{
1089c50: e1a0c00d mov ip, sp
1089c54: e92dd800 push {fp, ip, lr, pc}
1089c58: e24cb004 sub fp, ip, #4
1089c5c: e24dd008 sub sp, sp, #8
1089c60: e50b0010 str r0, [fp, #-16]
g_classTwo.doSomething(1);
1089c64: e59f3014 ldr r3, [pc, #20] ; 1089c80 <ClassOne::first()+0x30>
1089c68: e08f3003 add r3, pc, r3
1089c6c: e1a00003 mov r0, r3
1089c70: e3a01001 mov r1, #1
1089c74: ebffffe2 bl 1089c04 <ClassTwo::doSomething(int) const>
}
1089c78: e24bd00c sub sp, fp, #12
1089c7c: e89da800 ldm sp, {fp, sp, pc}
1089c80: 060cd35c .word 0x060cd35c
01089c84 <ClassOne::second()>:
void ClassOne::second()
{
1089c84: e1a0c00d mov ip, sp
1089c88: e92dd800 push {fp, ip, lr, pc}
1089c8c: e24cb004 sub fp, ip, #4
1089c90: e24dd008 sub sp, sp, #8
1089c94: e50b0010 str r0, [fp, #-16]
g_classTwo.doSomething(2);
1089c98: e59f3014 ldr r3, [pc, #20] ; 1089cb4 <ClassOne::second()+0x30>
1089c9c: e08f3003 add r3, pc, r3
1089ca0: e1a00003 mov r0, r3
1089ca4: e3a01002 mov r1, #2
1089ca8: ebffffd5 bl 1089c04 <ClassTwo::doSomething(int) const>
}
1089cac: e24bd00c sub sp, fp, #12
1089cb0: e89da800 ldm sp, {fp, sp, pc}
1089cb4: 060cd328 .word 0x060cd328
Both methods are loading the address of g_classTwo with a pc relative offset: ldr r3, [pc, #20], which translates to 0x060cd35c and 0x060cd328 for the first and second method respectively.
Why are the addresses different even though they are both addressing the same global variable?
How do those addresses relate to the nm output for the same symbol: 07156fcc b g_classTwo?
In ClassOne::first() you have:
1089c64: e59f3014 ldr r3, [pc, #20] ; 1089c80 <ClassOne::first()+0x30>
1089c68: e08f3003 add r3, pc, r3
1089c6c: e1a00003 mov r0, r3
...
1089c80: 060cd35c .word 0x060cd35c
In ClassOne::second() you have:
1089c98: e59f3014 ldr r3, [pc, #20] ; 1089cb4 <ClassOne::second()+0x30>
1089c9c: e08f3003 add r3, pc, r3
1089ca0: e1a00003 mov r0, r3
...
1089cb4: 060cd328 .word 0x060cd328
In both, r0 is the this pointer (g_classTwo). As you can see, after loading an address from the literal pool into r3 it is summed to pc to get r0.
In ClassOne::first(), you get r0 = pc + r3 = 0x01089c70 + 0x060cd35c = 0x07156fcc.
In ClassOne::second(), you get r0 = pc + r3 = 0x01089ca4 + 0x060cd328 = 0x07156fcc.
So for both the this pointer is 0x07156fcc, which is the address of g_classTwo.

GCC generates different code depending on array index value

This code (arm):
void blinkRed(void)
{
for(;;)
{
bb[0x0008646B] ^= 1;
sys.Delay_ms(14);
}
}
...is compiled to folowing asm-code:
08000470: ldr r4, [pc, #20] ; (0x8000488 <blinkRed()+24>) // r4 = 0x422191ac
08000472: ldr r6, [pc, #24] ; (0x800048c <blinkRed()+28>)
08000474: movs r5, #14
08000476: ldr r3, [r4, #0]
08000478: eor.w r3, r3, #1
0800047c: str r3, [r4, #0]
0800047e: mov r0, r6
08000480: mov r1, r5
08000482: bl 0x80001ac <CSTM32F100C6::Delay_ms(unsigned int)>
08000486: b.n 0x8000476 <blinkRed()+6>
It is ok.
But, if I just change array index (-0x400)....
void blinkRed(void)
{
for(;;)
{
bb[0x0008606B] ^= 1;
sys.Delay_ms(14);
}
}
...I've got not so optimized code:
08000470: ldr r4, [pc, #24] ; (0x800048c <blinkRed()+28>) // r4 = 0x42218000
08000472: ldr r6, [pc, #28] ; (0x8000490 <blinkRed()+32>)
08000474: movs r5, #14
08000476: ldr.w r3, [r4, #428] ; 0x1ac
0800047a: eor.w r3, r3, #1
0800047e: str.w r3, [r4, #428] ; 0x1ac
08000482: mov r0, r6
08000484: mov r1, r5
08000486: bl 0x80001ac <CSTM32F100C6::Delay_ms(unsigned int)>
0800048a: b.n 0x8000476 <blinkRed()+6>
The difference is that in the first case r4 is loaded with target address immediately (0x422191ac) and then access to memory is performed with 2-byte instructions, but in the second case r4 is loaded with some intermediate
address (0x42218000) and then access to memory is performed with 4-bytes instruction with offset (+0x1ac) to target address (0x422181ac).
Why compiler does so?
I use:
arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -g2 -Wall -O1 -std=gnu++14 -fno-exceptions -fno-use-cxa-atexit -fstrict-volatile-bitfields -c -DSTM32F100C6T6B -DSTM32F10X_LD_VL
bb is:
__attribute__ ((section(".bitband"))) volatile u32 bb[0x00800000];
In .ld it is defined as:
in MEMORY section:
BITBAND(rwx): ORIGIN = 0x42000000, LENGTH = 0x02000000
in SECTIONS section:
.bitband (NOLOAD) :
SUBALIGN(0x02000000)
{
KEEP(*(.bitband))
} > BITBAND
I would consider it an artefact/missing optimization opportunity of -O1.
It can be understood in more detail if we look at the code generated with -O- to load bb[...]:
First case:
movw r2, #:lower16:bb
movt r2, #:upper16:bb
movw r3, #37292
movt r3, 33
adds r3, r2, r3
ldr r3, [r3, #0]
Second case:
movw r3, #:lower16:bb
movt r3, #:upper16:bb
add r3, r3, #2195456 ; 0x218000 = 4*0x86000
add r3, r3, #428
ldr r3, [r3, #0]
The code in the second case is better and it can be done this way because the constant can be added with two add instructions (which is not the case if the index is 0x0008646B).
-O1 does only optimizations which are not time consuming. So apparently it merges early the add and the ldr so it misses later the opportunity to load the whole address with one pc relative ldr.
Compile with -O2 (or -fgcse) and the code looks like expected.