I know there are already a lot of topics on this subject, but none of them explains how it is possible to have one definition of an inline function per translation unit. I know that inline functions have external linkage, so calls to an inline function from different translation units all refer to the same function. I also know that on Linux the compiler emits an inline function as a weak symbol (shown as "W" by nm), so the first definition of the inline function that is encountered is the one that gets used. However, when I run nm on the object files created by g++ on Windows, no weak symbol is attached to the function. How does the linker know which definition to use, and why does it not complain about multiple definitions? (I don't show the linker step here, but I tested it.)
Here is some example code:
// main1.cpp
#include "main.hpp"
/* inline accepts one definition in each TU */
inline void Test::hello_world()
{
    std::cout << "Hello world" << std::endl;
}
/*void Test::hello_world()
{
    std::cout << "Hello world" << std::endl;
}*/
extern void return_hello_world();
int main()
{
    Test t;
    t.hello_world();
    return_hello_world();
    return 0;
}
// main2.cpp
#include "main.hpp"
/* inline accepts one definition in each TU */
inline void Test::hello_world()
{
    std::cout << "Hello world" << std::endl;
}
/* void Test::hello_world()
{
    std::cout << "Hello world" << std::endl;
} */
void return_hello_world()
{
    Test t;
    t.hello_world();
}
// main.hpp
#ifndef TEST
#define TEST
#include <iostream>
class Test
{
public:
    void hello_world();
};
#endif
The results I receive are:
Windows:
Experimentation4> g++ -g -O0 -c main1.cpp main2.cpp
Experimentation4> nm main1.o
0000000000000000 b .bss
0000000000000000 d .ctors
0000000000000000 d .data
0000000000000000 N .debug_abbrev
0000000000000000 N .debug_aranges
0000000000000000 N .debug_frame
0000000000000000 N .debug_info
0000000000000000 N .debug_line
0000000000000000 N .debug_ranges
0000000000000000 N .debug_str
0000000000000000 p .pdata
0000000000000000 p .pdata$_ZN4Test11hello_worldEv
0000000000000000 r .rdata
0000000000000000 r .rdata$.refptr._ZSt4cout
0000000000000000 r .rdata$.refptr._ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
0000000000000000 r .rdata$zzz
0000000000000000 R .refptr._ZSt4cout
0000000000000000 R .refptr._ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
0000000000000000 t .text
0000000000000000 t .text$_ZN4Test11hello_worldEv
0000000000000000 r .xdata
0000000000000000 r .xdata$_ZN4Test11hello_worldEv
U __main
0000000000000029 t __tcf_0
0000000000000080 t _GLOBAL__sub_I_main
U _Z18return_hello_worldv
0000000000000044 t _Z41__static_initialization_and_destruction_0ii
0000000000000000 T _ZN4Test11hello_worldEv
U _ZNSolsEPFRSoS_E
U _ZNSt8ios_base4InitC1Ev
U _ZNSt8ios_base4InitD1Ev
U _ZSt4cout
U _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
0000000000000000 r _ZStL19piecewise_construct
0000000000000000 b _ZStL8__ioinit
U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
U atexit
0000000000000000 T main
Linux:
Experimentation2$ g++ -O0 -g -c main1.cpp main2.cpp
Experimentation2$ nm main1.o
U __cxa_atexit
U __dso_handle
0000000000000081 t _GLOBAL__sub_I_main
0000000000000000 T main
U __stack_chk_fail
U _Z18return_hello_worldv
0000000000000043 t _Z41__static_initialization_and_destruction_0ii
0000000000000000 W _ZN4Test11hello_worldEv
U _ZNSolsEPFRSoS_E
U _ZNSt8ios_base4InitC1Ev
U _ZNSt8ios_base4InitD1Ev
U _ZSt4cout
U _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
0000000000000000 b _ZStL8__ioinit
U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
I am trying to run an AWS Lambda function written in Go with cgo. I compiled everything on Amazon Linux AMI 2018.03.0 (HVM) and attached all the necessary libraries. When I run it, the Lambda function does not behave deterministically: sometimes it works properly, and sometimes I get the exception shown below. It looks as if Lambda has two different environments (architectures). When it works, the function responds properly to every query until the instance dies or I deploy a new version. The same binary can work now and fail half an hour later.
SIGILL: illegal instruction
PC=0xb94571 m=10 sigcode=2
goroutine 0 [idle]:
runtime: unknown pc 0xb94571
stack: frame={sp:0x7f8957ffcef0, fp:0x0} stack=[0x7f89577fd830,0x7f8957ffd430)
00007f8957ffcdf0: 00007f8950000078 00007f8968e64de2
00007f8957ffce00: 0000000800000002 0000000000001000
00007f8957ffce10: 3ff0000000000000 0000000000000000
00007f8957ffce20: 0000000100000000 0000000000000001
00007f8957ffce30: 0000000100000002 0000000000000100
00007f8957ffce40: 0000000000000000 00000000014351f8
00007f8957ffce50: 00000000000000fa 00007f8950000078
00007f8957ffce60: 00007f8950000020 000000000002d97e
00007f8957ffce70: 0000000000002710 000000000002d990
00007f8957ffce80: 00007f8950000078 00007f8968e65ba5
00007f8957ffce90: 000000000002d97e 00007f896c05b5c4
00007f8957ffcea0: 0000000000000b66 00007f89693c3678
00007f8957ffceb0: 0000000000000005 000000000000000b
00007f8957ffcec0: 000000000f71e367 00007f896c05bcfb
00007f8957ffced0: 0000007c0000009b 00007f8957ffcf10
00007f8957ffcee0: 00007f89693c30ec 00007f89693c3c48
00007f8957ffcef0: <00007f8957ffd020 00007f8957ffd010
00007f8957ffcf00: d04e2b6900000027 eeba4bc401709342
00007f8957ffcf10: 0000000000000000 3fdfdf19b3eb7010
00007f8957ffcf20: 00000000000005e2 00007f8964663010
00007f8957ffcf30: 00000000000000f8 00000000000001f6
00007f8957ffcf40: 00000000000001f3 00000000000002ee
00007f8957ffcf50: 0000000000000000 00007f896c107010
00007f8957ffcf60: 0000000000000000 bfdfdf19b3eb7010
00007f8957ffcf70: 00007f896c26aca8 00007f896c2754a8
00007f8957ffcf80: 00007f896c275150 00007f896c05bfdf
00007f8957ffcf90: 00000000000000f9 00007f896c26aca8
00007f8957ffcfa0: 0000000000000005 0000000000000000
00007f8957ffcfb0: 00007f8900000001 00007f896c275150
00007f8957ffcfc0: 00007f8957ffd0f0 00007f8957ffd0e0
00007f8957ffcfd0: 00007f8900000029 0000000000b7652d
00007f8957ffcfe0: 0000000000000000 00007f896c2754a8
runtime: unknown pc 0xb94571
stack: frame={sp:0x7f8957ffcef0, fp:0x0} stack=[0x7f89577fd830,0x7f8957ffd430)
00007f8957ffcdf0: 00007f8950000078 00007f8968e64de2
00007f8957ffce00: 0000000800000002 0000000000001000
00007f8957ffce10: 3ff0000000000000 0000000000000000
00007f8957ffce20: 0000000100000000 0000000000000001
00007f8957ffce30: 0000000100000002 0000000000000100
00007f8957ffce40: 0000000000000000 00000000014351f8
00007f8957ffce50: 00000000000000fa 00007f8950000078
00007f8957ffce60: 00007f8950000020 000000000002d97e
00007f8957ffce70: 0000000000002710 000000000002d990
00007f8957ffce80: 00007f8950000078 00007f8968e65ba5
00007f8957ffce90: 000000000002d97e 00007f896c05b5c4
00007f8957ffcea0: 0000000000000b66 00007f89693c3678
00007f8957ffceb0: 0000000000000005 000000000000000b
00007f8957ffcec0: 000000000f71e367 00007f896c05bcfb
00007f8957ffced0: 0000007c0000009b 00007f8957ffcf10
00007f8957ffcee0: 00007f89693c30ec 00007f89693c3c48
00007f8957ffcef0: <00007f8957ffd020 00007f8957ffd010
00007f8957ffcf00: d04e2b6900000027 eeba4bc401709342
00007f8957ffcf10: 0000000000000000 3fdfdf19b3eb7010
00007f8957ffcf20: 00000000000005e2 00007f8964663010
00007f8957ffcf30: 00000000000000f8 00000000000001f6
00007f8957ffcf40: 00000000000001f3 00000000000002ee
00007f8957ffcf50: 0000000000000000 00007f896c107010
00007f8957ffcf60: 0000000000000000 bfdfdf19b3eb7010
00007f8957ffcf70: 00007f896c26aca8 00007f896c2754a8
00007f8957ffcf80: 00007f896c275150 00007f896c05bfdf
00007f8957ffcf90: 00000000000000f9 00007f896c26aca8
00007f8957ffcfa0: 0000000000000005 0000000000000000
00007f8957ffcfb0: 00007f8900000001 00007f896c275150
00007f8957ffcfc0: 00007f8957ffd0f0 00007f8957ffd0e0
00007f8957ffcfd0: 00007f8900000029 0000000000b7652d
00007f8957ffcfe0: 0000000000000000 00007f896c2754a8
goroutine 12 [syscall]:
runtime.cgocall(0xb20bc0, 0xc420144af0, 0x240)
/usr/local/go/src/runtime/cgocall.go:128 +0x64 fp=0xc420144ab0 sp=0xc420144a78 pc=0x4b7c84
github.com/app/vectorization._Cfunc_GetFaceRectangles(0xc420413680, 0x7f89500008c0, 0xc, 0x0)
_cgo_gotypes.go:99 +0x4e fp=0xc420144af0 sp=0xc420144ab0 pc=0xb1df8e
github.com/app/vectorization.GetFaceBoundingRectangles(0xc420402f40, 0x32, 0xc40000000c, 0x0, 0x0, 0x0, 0x0, 0x0)
...
I am dealing with a pesky compilation/linker error involving C++ homebrew compiled for the Game Boy Advance. While my library, libsaturn, is getting compiled fine with the proper C++ symbols, the compiler seems to be discarding the namespace qualifiers given in the header files for the library and just placing in undefined symbols as if they had C linkage. I have inspected the object code files using nm and confirmed that this is the case.
This is the full repository for the program I am trying to build, although I have provided snippets below. Compiling the repository requires a devkitARM installation and Node.js; I have only ensured things work on Linux sadly.
I originally encountered this problem with devkitARM r46, but unfortunately dialing things back to r45 does not alleviate the issue. I have had the code combed over by several peers of mine and they have found nothing out-of-the-ordinary with my config.
This is the code I am dealing with. I’m afraid it isn’t going to be all that useful considering I have very little idea where the source of the issue lies, but regardless…
src/mainloop.cc
bool saturn::mainloop( )
{
return false;
}
include/saturn/mainloop.hh
namespace saturn
{
bool mainloop( );
}
test/src/main.cc
int main( )
{
while(!saturn::mainloop( ))
{
// Do something here
}
return 0;
}
alex#henen-nesw saturn $ arm-none-eabi-nm -pC /tmp/saturn-buildtool/code/test+src+main.cc.o
00000000 T main
U init
U halt
U mainloop
alex#henen-nesw saturn $ arm-none-eabi-nm -pC libsaturn.a
src+bios.cc.o:
00000000 T bios::halt()
U _sat__bios_halt
00000014 T bios::softReset()
U _sat__bios_soft_reset
00000028 T bios::waitInterrupt(unsigned long, unsigned long)
U _sat__bios_intr_wait
0000004c T bios::waitVblank()
U _sat__bios_vblank_intr_wait
src+bootsector.s.o:
00000000 T __start
000000c0 T _sat__rom_start
000000ec T _sat__irq_handler
U main
src+error.cc.o:
00000000 r kDispcntBgMode0
00000002 r kDispcntBgMode1
00000004 r kDispcntBgMode2
00000006 r kDispcntBgMode3
00000008 r kDispcntBgMode4
0000000a r kDispcntBgMode5
0000000c r kDispcntCgbMode
0000000e r kDispcntFrameSel
00000010 r kDispcntHblankIntv
00000012 r kDispcntObjVramDim
00000014 r kDispcntForceBlank
00000016 r kDispcntShowBg0
00000018 r kDispcntShowBg1
0000001a r kDispcntShowBg2
0000001c r kDispcntShowBg3
0000001e r kDispcntShowObj
00000020 r kDispcntShowWin0
00000022 r kDispcntShowWin1
00000024 r kDispcntShowObjWin
00000028 r ioDispcnt
0000002c r ioGreenswap
00000030 r kDispstatVblank
00000032 r kDispstatHblank
00000034 r kDispstatVcounter
00000036 r kDispstatVblankIrq
00000038 r kDispstatHblankIrq
0000003a r kDispstatVcounterIrq
0000003c r ioDispstat
00000040 r ioVcount
00000044 r kBgcntMosaic
00000046 r kBgcntPalMode4
00000048 r kBgcntPalMode8
0000004a r kBgcntOverflow
0000004c r ioBg0cnt
00000050 r ioBg1cnt
00000054 r ioBg2cnt
00000058 r ioBg3cnt
0000005c r ioBg0hofs
00000060 r ioBg0vofs
00000064 r ioBg1hofs
00000068 r ioBg1vofs
0000006c r ioBg2hofs
00000070 r ioBg2vofs
00000074 r ioBg3hofs
00000078 r ioBg3vofs
0000007c r ioBg2pa
00000080 r ioBg2pb
00000084 r ioBg2pc
00000088 r ioBg2pd
0000008c r ioBg2xL
00000090 r ioBg2xH
00000094 r ioBg2yL
00000098 r ioBg2yH
0000009c r ioBg3pa
000000a0 r ioBg3pb
000000a4 r ioBg3pc
000000a8 r ioBg3pd
000000ac r ioBg3xL
000000b0 r ioBg3xH
000000b4 r ioBg3yL
000000b8 r ioBg3yH
000000bc r ioWin0H
000000c0 r ioWin1H
000000c4 r ioWin0V
000000c8 r ioWin1V
000000cc r ioWinIn
000000d0 r ioWinOut
000000d4 r ioMosaic
000000d8 r ioBldcnt
000000dc r ioBldalpha
000000e0 r ioBldy
000000e4 r segBios
000000e8 r segEwram
000000ec r segIwram
000000f0 r segIo
000000f4 r segPal
00000000 b segPalBg
000000f8 r segPalObj
000000fc r segVram
00000004 b segVramBg
00000100 r segVramObj
00000104 r segOam
00000108 r segRom
0000010c r segSram
00000110 r ioDma0sad
00000114 r ioDma0dad
00000118 r ioDma0cntL
0000011c r ioDma0cntH
00000120 r ioDma1sad
00000124 r ioDma1dad
00000128 r ioDma1cntL
0000012c r ioDma1cntH
00000130 r ioDma2sad
00000134 r ioDma2dad
00000138 r ioDma2cntL
0000013c r ioDma2cntH
00000140 r ioDma3sad
00000144 r ioDma3dad
00000148 r ioDma3cntL
0000014c r ioDma3cntH
000000a0 t __static_initialization_and_destruction_0(int, int)
000000e4 t _GLOBAL__sub_I_error.cc
00000000 T saturn::error(saturn::Error)
src+init.cc.o:
00000000 T saturn::init()
U saturn::error(saturn::Error)
src+lowbios.s.o:
00000000 T _sat__bios_soft_reset
00000004 T _sat__bios_register_ram_reset
00000008 T _sat__bios_halt
0000000c T _sat__bios_stop
00000010 T _sat__bios_intr_wait
00000014 T _sat__bios_vblank_intr_wait
00000018 T _sat__bios_div
0000001c T _sat__bios_div_arm
00000020 T _sat__bios_sqrt
00000024 T _sat__bios_arc_tan
00000028 T _sat__bios_arc_tan2
0000002c T _sat__bios_cpu_set
00000030 T _sat__bios_cpu_fast_set
00000034 T _sat__bios_bg_affine_set
00000038 T _sat__bios_obj_affine_set
0000003c T _sat__bios_bit_unpack
00000040 T _sat__bios_lzss_decomp_wram
00000044 T _sat__bios_lzss_decomp_vram
00000048 T _sat__bios_huff_decomp
0000004c T _sat__bios_rl_decomp_wram
00000050 T _sat__bios_rl_decomp_vram
00000054 T _sat__bios_diff_8bit_unfilter_wram
00000058 T _sat__bios_diff_8bit_unfilter_vram
0000005c T _sat__bios_diff_16bit_unfilter
00000060 T _sat__bios_sound_bias
00000064 T _sat__bios_midi_key_to_freq
00000068 T _sat__bios_multi_boot
src+mainloop.cc.o:
00000000 T saturn::mainloop()
src+math.cc.o:
00000000 T saturn::divide(long, long)
U _sat__bios_div
00000030 T saturn::sqroot(unsigned long)
U _sat__bios_sqrt
00000050 T saturn::modulus(long, long)
src+memory.cc.o:
Thanks!
Here's a simplified build script that links the test program correctly: https://gist.github.com/nnevatie/9fe11e3933ed3f51e5344639c6881bd5
I am trying to compile the QDP++ C++ library using the Intel 17 compiler and Intel MPI on an Intel Xeon supercomputer. Its dependency, the QMP C library, has already been compiled with that compiler (mpiicc) without any compilation errors.
When trying to configure QDP++, I get this error from the Intel compiler:
~/local-jureca/lib/libqmp.a(QMP_init.o): In function `QMP_comm_get_allocated':
QMP_init.c:(.text+0x55): undefined reference to `QMP_abort_mpi(int)'
And this from GCC 5.4:
~/local-jureca/lib/libqmp.a(QMP_init.o):QMP_init.c:function QMP_comm_get_allocated: error: undefined reference to 'QMP_abort_mpi(int)'
So it is an actual problem and not just something that the Intel C++ with MPI compiler cannot do.
The output of nm ~/local-jureca/lib/libqmp.a contains this stanza, which is where QMP_comm_get_allocated resides:
QMP_init.o:
U atol
U __ctype_b_loc
U exit
U __gcc_personality_v0
U __gxx_personality_v0
U malloc
00000000000015f0 T QMP_abort
00000000000015c0 T QMP_abort_string
0000000000000010 D QMP_allocated_comm
0000000000000018 d QMP_allocated_comm_s
0000000000000000 D QMP_args
0000000000000000 b QMP_args_s
U QMP_barrier
U QMP_comm_declare_logical_topology_map
0000000000000000 T QMP_comm_get_allocated
0000000000001760 T QMP_comm_get_default
00000000000000e0 T QMP_comm_get_job
U QMP_comm_get_logical_coordinates
U QMP_comm_logical_topology_is_declared
0000000000000070 T QMP_comm_set_allocated
00000000000016f0 T QMP_comm_set_default
0000000000000150 T QMP_comm_set_job
U QMP_comm_split
0000000000000048 D QMP_default_comm
U QMP_error
00000000000001c0 T QMP_finalize_msg_passing
U QMP_get_allocated_dimensions
U QMP_get_allocated_number_of_dimensions
U QMP_get_msg_passing_type
0000000000001620 T QMP_init_msg_passing
0000000000000040 D QMP_job_comm
0000000000000008 D QMP_machine
0000000000000048 b QMP_machine_s
0000000000000080 B QMP_stack_level
U __svml_idiv4
0000000000000000 r __$U0
0000000000000017 r __$U1
000000000000002e r __$U2
000000000000003f r __$U3
0000000000000091 r __$U4
000000000000007c r __$U5
0000000000000050 r __$U6
000000000000005a r __$U7
0000000000000067 r __$U8
U _Unwind_Resume
0000000000000500 t _Z12process_argsPiPPPc
000000000000002c r _Z12process_argsPiPPPc$$LSDA
U _Z13QMP_abort_mpii
U _Z20QMP_init_machine_mpiPiPPPc16QMP_thread_levelPS3_
U _Z28QMP_finalize_msg_passing_mpiv
00000000000001e0 t _Z9get_colorv
0000000000000000 r _Z9get_colorv$$LSDA
Another stanza contains the function QMP_abort_mpi which supposedly cannot be found:
QMP_init_mpi.o:
U malloc
U MPI_Abort
U MPI_Comm_dup
U MPI_Comm_rank
U MPI_Comm_size
U MPI_Finalize
U MPI_Get_processor_name
U MPI_Init_thread
0000000000000010 T QMP_abort_mpi
U QMP_abort_string
U QMP_allocated_comm
0000000000000000 T QMP_finalize_msg_passing_mpi
0000000000000020 T QMP_init_machine_mpi
U QMP_machine
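Comparing the two stanzas above, QMP_init.o lists _Z13QMP_abort_mpii as undefined, while QMP_init_mpi.o defines the plain name QMP_abort_mpi. The following is a minimal C++ sketch, not part of the original post, of how a name of the first form is built under the Itanium C++ ABI. It covers the simplest case only: a non-template function in the global namespace with builtin parameter types. The helper mangle_free_function and its param_codes argument are hypothetical names introduced for illustration.

```cpp
#include <string>

// Simplified illustration of Itanium C++ ABI mangling for a free function
// in the global namespace: "_Z" + <length of name> + <name> + <parameter codes>.
// param_codes uses the ABI's builtin type codes, e.g. "i" for int and
// "v" for an empty parameter list. Namespaces, templates, pointers and
// class types are deliberately not handled here.
std::string mangle_free_function(const std::string& name,
                                 const std::string& param_codes) {
    return "_Z" + std::to_string(name.size()) + name + param_codes;
}
```

For example, mangle_free_function("QMP_abort_mpi", "i") yields _Z13QMP_abort_mpii, which is the undefined symbol in the QMP_init.o stanza, whereas the definition in QMP_init_mpi.o carries the unmangled C name.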
The relevant output of configure is:
configure: Parscalar build! Checking for QMP
checking for qmp-config... ~/local-jureca/bin/qmp-config
configure: Found QMP configuration program ~/local-jureca/bin/qmp-config
configure: QMP compile flags: -I~/local-jureca/include
configure: QMP linking flags: -L~/local-jureca/lib
configure: QMP libraries flags: -lqmp
checking if we can compile/link of a simple QMP C++ program... no
configure: error: Cannot compile/link a basic QMP C++ program!
Check QMP_CFLAGS, QMP_LDFLAGS, QMP_LIBS.
So it seems to find the library just fine.
A somewhat larger chunk from config.log is this:
configure:4227: /usr/local/software/jureca/Stages/2016b/software/impi/2017.0.098-iccifort-2017.0.098-GCC-5.4.0/bin64/mpiicpc -o conftest -O2 -Wall -fopenmp --std=c++11 -I~/local-jureca/include -L~/local-jureca/lib conftest.cpp -lqmp >&5
~/local-jureca/lib/libqmp.a(QMP_init.o): In function `QMP_comm_get_allocated':
QMP_init.c:(.text+0x55): undefined reference to `QMP_abort_mpi(int)'
~/local-jureca/lib/libqmp.a(QMP_init.o): In function `QMP_comm_set_allocated':
QMP_init.c:(.text+0xc7): undefined reference to `QMP_abort_mpi(int)'
~/local-jureca/lib/libqmp.a(QMP_init.o): In function `QMP_comm_get_job':
QMP_init.c:(.text+0x135): undefined reference to `QMP_abort_mpi(int)'
~/local-jureca/lib/libqmp.a(QMP_init.o): In function `QMP_comm_set_job':
QMP_init.c:(.text+0x1a7): undefined reference to `QMP_abort_mpi(int)'
~/local-jureca/lib/libqmp.a(QMP_init.o): In function `QMP_finalize_msg_passing':
QMP_init.c:(.text+0x1cf): undefined reference to `QMP_finalize_msg_passing_mpi()'
~/local-jureca/lib/libqmp.a(QMP_init.o): In function `get_color()':
QMP_init.c:(.text+0x29b): undefined reference to `QMP_abort_mpi(int)'
QMP_init.c:(.text+0x45c): undefined reference to `QMP_abort_mpi(int)'
QMP_init.c:(.text+0x4be): undefined reference to `QMP_abort_mpi(int)'
configure:4227: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "qdp++"
| #define PACKAGE_TARNAME "qdp--"
| #define PACKAGE_VERSION "1.44.0"
| #define PACKAGE_STRING "qdp++ 1.44.0"
| #define PACKAGE_BUGREPORT "edwards#jlab.org"
| #define PACKAGE_URL ""
| #define PACKAGE "qdp--"
| #define VERSION "1.44.0"
| #define QDP_ND 4
| #define QDP_NC 3
| #define QDP_NS 4
| #define QDP_AC_ALIGNMENT_SIZE 16
| #define QDP_USE_GENERIC_OPTS 1
| #define QDP_USE_BLUEGENEL 1
| #define BASE_PRECISION 64
| #define QDP_USE_CB2_LAYOUT 1
| #define ARCH_PARSCALAR 1
| /* end confdefs.h. */
| #include "qmp.h"
| int
| main ()
| {
|
| int argc ; char **argv ;
| QMP_thread_level_t prv;
| ;
| QMP_init_msg_passing(&argc, &argv, QMP_THREAD_SINGLE, &prv) ;
| ;
| QMP_finalize_msg_passing() ;
|
| ;
| return 0;
| }
configure:4253: checking if we can compile/link of a simple QMP C++ program
configure:4261: result: no
configure:4263: error: Cannot compile/link a basic QMP C++ program!
Check QMP_CFLAGS, QMP_LDFLAGS, QMP_LIBS.
I have replaced the absolute path to my home directory with ~ because the actual path is not interesting.
What happens here? And how can I go about fixing it?
Update 1
This is part of the make output for QMP. It seems to be compiled with mpiicc which is a C (not C++) compiler:
depbase=`echo mpi/QMP_topology_mpi.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
/usr/local/software/jureca/Stages/2016b/software/impi/2017.0.098-iccifort-2017.0.098-GCC-5.4.0/bin64/mpiicc -DHAVE_CONFIG_H -I. -I../include -I../include -O2 -Wall -fopenmp -MT mpi/QMP_topology_mpi.o -MD -MP -MF $depbase.Tpo -c -o mpi/QMP_topology_mpi.o mpi/QMP_topology_mpi.c &&\
mv -f $depbase.Tpo $depbase.Po
And then the static library is linked together from the various units:
rm -f libqmp.a
ar cru libqmp.a QMP_comm.o QMP_error.o QMP_grid.o QMP_init.o QMP_machine.o QMP_mem.o QMP_split.o QMP_topology.o QMP_util.o mpi/QMP_comm_mpi.o mpi/QMP_error_mpi.o mpi/QMP_init_mpi.o mpi/QMP_mem_mpi.o mpi/QMP_split_mpi.o mpi/QMP_topology_mpi.o
Using Ulrich Drepper's relinfo.pl script, one can easily count the number of relocations of a DSO, but it doesn't work on .o files.
Say I have a large shared library and I'm not happy about the number of its relocations. Is there a way to find out where they come from (symbol, or at least .o), to check whether they're of the easily fixable type (e.g. const char * str = "Hello World"; -> const char str[] = "Hello World";)?
Short answer: Use objdump or readelf instead.
Long answer: Let's look at an actual example case, example.c:
#include <stdio.h>
static const char global1[] = "static const char []";
static const char *global2 = "static const char *";
static const char *const global3 = "static const char *const";
const char global4[] = "const char []";
const char *global5 = "const char *";
const char *const global6 = "const char *const";
char global7[] = "char []";
char *global8 = "char *";
char *const global9 = "char *const";
int main(void)
{
    static const char local1[] = "static const char []";
    static const char *local2 = "static const char *";
    static const char *const local3 = "static const char *const";
    const char local4[] = "const char []";
    const char *local5 = "const char *";
    const char *const local6 = "const char *const";
    char local7[] = "char []";
    char *local8 = "char *";
    char *const local9 = "char *const";
    printf("Global:\n");
    printf("\t%s\n", global1);
    printf("\t%s\n", global2);
    printf("\t%s\n", global3);
    printf("\t%s\n", global4);
    printf("\t%s\n", global5);
    printf("\t%s\n", global6);
    printf("\t%s\n", global7);
    printf("\t%s\n", global8);
    printf("\t%s\n", global9);
    printf("\n");
    printf("Local:\n");
    printf("\t%s\n", local1);
    printf("\t%s\n", local2);
    printf("\t%s\n", local3);
    printf("\t%s\n", local4);
    printf("\t%s\n", local5);
    printf("\t%s\n", local6);
    printf("\t%s\n", local7);
    printf("\t%s\n", local8);
    printf("\t%s\n", local9);
    return 0;
}
You can compile it to an object file using e.g.
gcc -W -Wall -c example.c
and to an executable using
gcc -W -Wall example.c -o example
You can use objdump -tr example.o to dump the symbol and relocation information for the (non-dynamic) object file, or objdump -TtRr example to dump the same for the executable file (and dynamic object files). Using
objdump -t example.o
on x86-64 I get
example.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 example.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .rodata 0000000000000000 .rodata
0000000000000000 l O .rodata 0000000000000015 global1
0000000000000000 l O .data 0000000000000008 global2
0000000000000048 l O .rodata 0000000000000008 global3
00000000000000c0 l O .rodata 0000000000000015 local1.2053
0000000000000020 l O .data 0000000000000008 local2.2054
00000000000000d8 l O .rodata 0000000000000008 local3.2055
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000050 g O .rodata 000000000000000e global4
0000000000000008 g O .data 0000000000000008 global5
0000000000000080 g O .rodata 0000000000000008 global6
0000000000000010 g O .data 0000000000000008 global7
0000000000000018 g O .data 0000000000000008 global8
00000000000000a0 g O .rodata 0000000000000008 global9
0000000000000000 g F .text 000000000000027a main
0000000000000000 *UND* 0000000000000000 puts
0000000000000000 *UND* 0000000000000000 printf
0000000000000000 *UND* 0000000000000000 putchar
0000000000000000 *UND* 0000000000000000 __stack_chk_fail
The output is described in man 1 objdump, under the -t heading. Note that the second "column" is actually fixed-width: seven characters wide, describing the type of the object. The third column is the section name, *UND* for undefined, .text for code, .rodata for read-only (immutable) data, .data for initialized mutable data, and .bss for uninitialized mutable data, and so on.
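To make the column layout concrete, here is a minimal C++ sketch that splits one symbol-table line into its fields. It assumes the layout described above: a hexadecimal address, one space, a flag field exactly seven characters wide, then the section name, the size, and the symbol name separated by whitespace. SymbolLine and parse_symbol_line are hypothetical names introduced for illustration, and extra annotations that objdump may print on some lines (such as visibility markers) are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// One parsed line of `objdump -t` output (assumed layout, see lead-in).
struct SymbolLine {
    std::string address;  // hexadecimal value of the symbol
    std::string flags;    // fixed-width flag field, spaces included
    std::string section;  // e.g. .text, .rodata, .data, *UND*
    std::string size;     // hexadecimal size (or alignment)
    std::string name;     // symbol name
};

// Returns true and fills `out` if `line` looks like a symbol-table entry.
// On failure `out` may be partially modified and should be ignored.
bool parse_symbol_line(const std::string& line, SymbolLine& out) {
    const std::size_t kFlagWidth = 7;
    const std::size_t addr_end = line.find(' ');
    if (addr_end == std::string::npos || addr_end == 0) return false;
    // The first field must consist of hexadecimal digits only;
    // this rejects header lines such as "SYMBOL TABLE:".
    for (std::size_t i = 0; i < addr_end; ++i) {
        if (!std::isxdigit(static_cast<unsigned char>(line[i]))) return false;
    }
    if (line.size() < addr_end + 1 + kFlagWidth) return false;
    out.address = line.substr(0, addr_end);
    // The flag field is taken by position, because it contains spaces.
    out.flags = line.substr(addr_end + 1, kFlagWidth);
    // The remaining fields are whitespace-separated (space or tab).
    std::istringstream rest(line.substr(addr_end + 1 + kFlagWidth));
    return static_cast<bool>(rest >> out.section >> out.size >> out.name);
}
```

Reading the flag field by position rather than by splitting on whitespace is the point of the sketch: a flag field such as "l     O" would otherwise be split into two tokens, and an all-blank flag field (as on *UND* lines) would vanish entirely.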
We can see from the above symbol table that the local4, local5, local6, local7, local8, and local9 variables didn't get entries in the symbol table at all. This is because they are local to main(). The contents of the strings they refer to are stored in .data or .rodata (or constructed on the fly), depending on what the compiler deems best.
Let's look at the relocation records next. Using
objdump -r example.o
I get
example.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000037 R_X86_64_32S .rodata+0x000000000000005e
0000000000000040 R_X86_64_32S .rodata+0x000000000000006b
0000000000000059 R_X86_64_32S .rodata+0x0000000000000088
0000000000000062 R_X86_64_32S .rodata+0x000000000000008f
0000000000000067 R_X86_64_32 .rodata+0x00000000000000a8
000000000000006c R_X86_64_PC32 puts-0x0000000000000004
0000000000000071 R_X86_64_32 .rodata+0x00000000000000b0
0000000000000076 R_X86_64_32 .rodata
0000000000000083 R_X86_64_PC32 printf-0x0000000000000004
000000000000008a R_X86_64_PC32 .data-0x0000000000000004
000000000000008f R_X86_64_32 .rodata+0x00000000000000b0
000000000000009f R_X86_64_PC32 printf-0x0000000000000004
00000000000000a6 R_X86_64_PC32 .rodata+0x0000000000000044
00000000000000ab R_X86_64_32 .rodata+0x00000000000000b0
00000000000000bb R_X86_64_PC32 printf-0x0000000000000004
00000000000000c0 R_X86_64_32 .rodata+0x00000000000000b0
00000000000000c5 R_X86_64_32 global4
00000000000000d2 R_X86_64_PC32 printf-0x0000000000000004
00000000000000d9 R_X86_64_PC32 global5-0x0000000000000004
00000000000000de R_X86_64_32 .rodata+0x00000000000000b0
00000000000000ee R_X86_64_PC32 printf-0x0000000000000004
00000000000000f5 R_X86_64_PC32 global6-0x0000000000000004
00000000000000fa R_X86_64_32 .rodata+0x00000000000000b0
000000000000010a R_X86_64_PC32 printf-0x0000000000000004
000000000000010f R_X86_64_32 .rodata+0x00000000000000b0
0000000000000114 R_X86_64_32 global7
0000000000000121 R_X86_64_PC32 printf-0x0000000000000004
0000000000000128 R_X86_64_PC32 global8-0x0000000000000004
000000000000012d R_X86_64_32 .rodata+0x00000000000000b0
000000000000013d R_X86_64_PC32 printf-0x0000000000000004
0000000000000144 R_X86_64_PC32 global9-0x0000000000000004
0000000000000149 R_X86_64_32 .rodata+0x00000000000000b0
0000000000000159 R_X86_64_PC32 printf-0x0000000000000004
0000000000000163 R_X86_64_PC32 putchar-0x0000000000000004
0000000000000168 R_X86_64_32 .rodata+0x00000000000000b5
000000000000016d R_X86_64_PC32 puts-0x0000000000000004
0000000000000172 R_X86_64_32 .rodata+0x00000000000000b0
0000000000000177 R_X86_64_32 .rodata+0x00000000000000c0
0000000000000184 R_X86_64_PC32 printf-0x0000000000000004
000000000000018b R_X86_64_PC32 .data+0x000000000000001c
0000000000000190 R_X86_64_32 .rodata+0x00000000000000b0
00000000000001a0 R_X86_64_PC32 printf-0x0000000000000004
00000000000001a7 R_X86_64_PC32 .rodata+0x00000000000000d4
00000000000001ac R_X86_64_32 .rodata+0x00000000000000b0
00000000000001bc R_X86_64_PC32 printf-0x0000000000000004
00000000000001c1 R_X86_64_32 .rodata+0x00000000000000b0
00000000000001d6 R_X86_64_PC32 printf-0x0000000000000004
00000000000001db R_X86_64_32 .rodata+0x00000000000000b0
00000000000001ef R_X86_64_PC32 printf-0x0000000000000004
00000000000001f4 R_X86_64_32 .rodata+0x00000000000000b0
0000000000000209 R_X86_64_PC32 printf-0x0000000000000004
000000000000020e R_X86_64_32 .rodata+0x00000000000000b0
0000000000000223 R_X86_64_PC32 printf-0x0000000000000004
0000000000000228 R_X86_64_32 .rodata+0x00000000000000b0
000000000000023d R_X86_64_PC32 printf-0x0000000000000004
0000000000000242 R_X86_64_32 .rodata+0x00000000000000b0
0000000000000257 R_X86_64_PC32 printf-0x0000000000000004
0000000000000271 R_X86_64_PC32 __stack_chk_fail-0x0000000000000004
RELOCATION RECORDS FOR [.data]:
OFFSET TYPE VALUE
0000000000000000 R_X86_64_64 .rodata+0x0000000000000015
0000000000000008 R_X86_64_64 .rodata+0x000000000000005e
0000000000000018 R_X86_64_64 .rodata+0x0000000000000088
0000000000000020 R_X86_64_64 .rodata+0x0000000000000015
RELOCATION RECORDS FOR [.rodata]:
OFFSET TYPE VALUE
0000000000000048 R_X86_64_64 .rodata+0x0000000000000029
0000000000000080 R_X86_64_64 .rodata+0x000000000000006b
00000000000000a0 R_X86_64_64 .rodata+0x000000000000008f
00000000000000d8 R_X86_64_64 .rodata+0x0000000000000029
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text
The relocation records are grouped by the section in which the relocation resides. Because string contents are in the .data or .rodata sections, we can restrict ourselves to looking at the relocations where the VALUE starts with .data or .rodata. (Mutable strings, like char global7[] = "char []";, are stored in .data, and immutable strings and string literals in .rodata.)
If we were to compile the code with debugging symbols enabled, it would be easier to determine which variable was used to refer to which string, but I might just look at the actual contents at each relocation value (target), to see which references to the immutable strings need fixing.
The command combination
objdump -r example.o | awk '($3 ~ /^\..*\+/) { t = $3; sub(/\+/, " ", t); n[t]++ } END { for (r in n) printf "%d %s\n", n[r], r }' | sort -g
will output the number of relocations per target, followed by the target section, followed by the target offset in the section, sorted with the target that occurs most in relocations last. That is, the last lines output above are the ones you need to concentrate on. For me, I get
1 .rodata
1 .rodata 0x0000000000000044
1 .rodata 0x00000000000000a8
1 .rodata 0x00000000000000b5
1 .rodata 0x00000000000000c0
1 .rodata 0x00000000000000d4
2 .rodata 0x0000000000000015
2 .rodata 0x0000000000000029
2 .rodata 0x000000000000005e
2 .rodata 0x000000000000006b
2 .rodata 0x0000000000000088
2 .rodata 0x000000000000008f
18 .rodata 0x00000000000000b0
If I add optimization (gcc -W -Wall -O3 -fomit-frame-pointer -c example.c), the result is
1 .rodata 0x0000000000000020
1 .rodata 0x0000000000000040
1 .rodata.str1.1
1 .rodata.str1.1 0x0000000000000058
2 .rodata.str1.1 0x000000000000000d
2 .rodata.str1.1 0x0000000000000021
2 .rodata.str1.1 0x000000000000005f
2 .rodata.str1.1 0x000000000000006c
3 .rodata.str1.1 0x000000000000003a
3 .rodata.str1.1 0x000000000000004c
18 .rodata.str1.1 0x0000000000000008
which shows that compiler options do have a big effect, but that one target is used 18 times either way: section .rodata offset 0xb0 (.rodata.str1.1 offset 0x8 if optimization is enabled at compile time).
That is the "\t%s\n" string literal.
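The per-target tally used above can also be sketched in C++, for readers who would rather post-process saved objdump -r output in a program than in awk. This is only an illustration under stated assumptions: each relocation line has three whitespace-separated fields (OFFSET, TYPE, VALUE), and every VALUE starting with a dot is counted, whether or not it carries a +offset. The function name count_relocation_targets is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Counts how often each section-relative target appears in the lines of
// `objdump -r` output. Keys have the form ".section 0xOFFSET" (the first
// '+' is replaced by a space), or just ".section" when there is no offset.
std::map<std::string, int> count_relocation_targets(
        const std::vector<std::string>& lines) {
    std::map<std::string, int> counts;
    for (const std::string& line : lines) {
        std::istringstream fields(line);
        std::string offset, type, value;
        // Skip lines with fewer than three fields.
        if (!(fields >> offset >> type >> value)) continue;
        // Keep only section-relative targets such as ".rodata+0x...";
        // this also skips header lines and symbol targets like printf-0x4.
        if (value.empty() || value[0] != '.') continue;
        const std::size_t plus = value.find('+');
        if (plus != std::string::npos) value[plus] = ' ';
        ++counts[value];
    }
    return counts;
}
```

Sorting the resulting map entries by count then gives the same kind of listing as the sort -g step, with the most frequently referenced target last.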
Modifying the original program into
char *local8 = "char *";
char *const local9 = "char *const";
const char *const fmt = "\t%s\n";
printf("Global:\n");
printf(fmt, global1);
printf(fmt, global2);
and so on, replacing the format string with an immutable string pointer fmt, eliminates those 18 relocations altogether. (You can also use the equivalent const char fmt[] = "\t%s\n";, of course.)
The above analysis indicates that, at least with GCC-4.6.3, most of the avoidable relocations are caused by (repeated use of) string literals. Replacing them with an array of const chars (const char fmt[] = "\t%s\n";) or a const pointer to const chars (const char *const fmt = "\t%s\n";) seems an effective and safe strategy to me: in both cases the contents go into the read-only .rodata section, and the pointer or array reference itself is immutable too.
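As a small self-contained illustration of that source-level change, here is a sketch in C++ (the answer's example is C, but the declaration is the same). The repeated literal is given a single named, immutable array that every call site shares. The names kEntryFormat and format_entry are hypothetical, and whether the relocation count actually drops for a given compiler and set of options should be checked with objdump -r as described above.

```cpp
#include <cstdio>
#include <string>

// One immutable array holding the format string, instead of repeating
// the "\t%s\n" literal at every call site.
static const char kEntryFormat[] = "\t%s\n";

// Formats one entry using the shared format array.
// The output is truncated if it does not fit into the local buffer.
std::string format_entry(const char* text) {
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, kEntryFormat, text);
    return std::string(buffer);
}
```

Every caller of format_entry, or every direct use of kEntryFormat, now refers to the same named object, which keeps the change purely at the source level.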
Furthermore, conversion of string literals to immutable string pointers or char arrays is completely a source-level task. That is, if you convert all string literals using the above method, you can eliminate at least one relocation per string literal.
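As a minimal sketch of that source-level conversion, here is a small C fragment. The helper name format_line is hypothetical and not part of the original program, and it uses snprintf() rather than printf() only so that the result can be checked. Whether the relocation count really drops depends on the compiler version and flags, so verify with objdump -r on your own object files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One shared, immutable format string. Both the characters and the array
   itself are read-only, so every use below refers to this single object
   instead of repeating the string literal at each call site. */
static const char fmt[] = "\t%s\n";

/* Hypothetical helper (not in the original program): formats one line into
   buf using the shared format. Returns what snprintf() returns, that is,
   the length of the complete line excluding the terminating NUL. */
int format_line(char *buf, size_t size, const char *text)
{
    assert(buf != NULL && text != NULL);
    return snprintf(buf, size, fmt, text);
}
```

Calling format_line(buf, sizeof buf, "char *") with a large enough buffer stores a tab, the text, and a newline in buf. The array form is used here; the const char *const fmt form from the answer should work the same way at the source level.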
In fact, I don't see how object-level analysis will help you much, here. It will tell you if your modifications reduce the number of relocations needed, of course.
The above awk stanza can be extended to a function that outputs the string constants for dynamic references with positive offsets:
#!/bin/bash
if [ $# -ne 1 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
exec >&2
echo ""
echo "Usage: $0 [ -h | --help ]"
echo "       $0 object.o"
echo ""
exit 1
fi
export LANG=C LC_ALL=C
objdump -wr "$1" | awk '
BEGIN {
RS = "[\t\v\f ]*[\r\n][\t\n\v\f\r ]*"
FS = "[\t\v\f ]+"
}
$1 ~ /^[0-9A-Fa-f]+/ {
n[$3]++
}
END {
for (s in n)
printf "%d %s\n", n[s], s
}
' | sort -g | gawk -v filename="$1" '
BEGIN {
RS = "[\t\v\f ]*[\r\n][\t\n\v\f\r ]*"
FS = "[\t\v\f ]+"
cmd = "objdump --file-offsets -ws " filename
while ((cmd | getline) > 0)
if ($3 == "section") {
s = $4
sub(/:$/, "", s)
o = $NF
sub(/\)$/, "", o)
start[s] = strtonum(o)
}
close(cmd)
}
{
if ($2 ~ /\..*\+/) {
s = $2
o = $2
sub(/\+.*$/, "", s)
sub(/^[^\+]*\+/, "", o)
o = strtonum(o) + start[s]
cmd = "dd if=\"" filename "\" of=/dev/stdout bs=1 skip=" o " count=256"
OLDRS = RS
RS = "\0"
cmd | getline hex
close(cmd)
RS = OLDRS
gsub(/\\/, "\\\\", hex)
gsub(/\t/, "\\t", hex)
gsub(/\n/, "\\n", hex)
gsub(/\r/, "\\r", hex)
gsub(/\"/, "\\\"", hex)
if (hex ~ /[\x00-\x1F\x7F-\x9F\xFE\xFF]/ || length(hex) < 1)
printf "%s\n", $0
else
printf "%s = \"%s\"\n", $0, hex
} else
print $0
}
'
This is a bit crude, just slapped together, so I don't know how portable it is. On my machine, it does seem to find the string literals for the few test cases I tried it on; you should probably rewrite it to match your own needs. Or even use an actual programming language with ELF support to examine the object files directly.
For the example program shown above (prior to the modifications I suggest to reduce the number of relocations), compiled without optimization, the above script yields the output
1 .data+0x000000000000001c = ""
1 .data-0x0000000000000004
1 .rodata
1 .rodata+0x0000000000000044 = ""
1 .rodata+0x00000000000000a8 = "Global:"
1 .rodata+0x00000000000000b5 = "Local:"
1 .rodata+0x00000000000000c0 = "static const char []"
1 .rodata+0x00000000000000d4 = ""
1 .text
1 __stack_chk_fail-0x0000000000000004
1 format
1 global4
1 global5-0x0000000000000004
1 global6-0x0000000000000004
1 global7
1 global8-0x0000000000000004
1 global9-0x0000000000000004
1 putchar-0x0000000000000004
2 .rodata+0x0000000000000015 = "static const char *"
2 .rodata+0x0000000000000029 = "static const char *const"
2 .rodata+0x000000000000005e = "const char *"
2 .rodata+0x000000000000006b = "const char *const"
2 .rodata+0x0000000000000088 = "char *"
2 .rodata+0x000000000000008f = "char *const"
2 puts-0x0000000000000004
18 .rodata+0x00000000000000b0 = "\t%s\n"
18 printf-0x0000000000000004
Finally, you might notice that using a function pointer to printf() instead of calling printf() directly would remove another 18 relocations from the example code, but I would consider that a mistake.
For code, you want relocations, as indirect function calls (calls via function pointers) are much slower than direct calls. Simply put, those relocations make function and subroutine calls much faster, so you most definitely want to keep those.
Apologies for the long answer; hope you find this useful. Questions?
Based on Nominal Animal's answer, which I still have to fully digest, I have come up with the following simple shell script that seems to work for finding what I called the "easily fixable" variety:
for i in path/to/*.o ; do
REL="$(objdump -TtRr "$i" 2>/dev/null | grep '.data.rel.ro.local[^]+-]')"
if [ -n "$REL" ]; then
echo "$(basename "$i"):"
echo "$REL" | c++filt
echo
fi
done
Sample output (for the QtGui library):
qimagereader.o:
0000000000000000 l O .data.rel.ro.local 00000000000000c0 _qt_BuiltInFormats
0000000000000000 l d .data.rel.ro.local 0000000000000000 .data.rel.ro.local
qopenglengineshadermanager.o:
0000000000000000 l O .data.rel.ro.local 0000000000000090 QOpenGLEngineShaderManager::getUniformLocation(QOpenGLEngineShaderManager::Uniform)::uniformNames
0000000000000000 l d .data.rel.ro.local 0000000000000000 .data.rel.ro.local
qopenglpaintengine.o:
0000000000000000 l O .data.rel.ro.local 0000000000000020 vtable for (anonymous namespace)::QOpenGLStaticTextUserData
0000000000000000 l d .data.rel.ro.local 0000000000000000 .data.rel.ro.local
qtexthtmlparser.o:
0000000000000000 l O .data.rel.ro.local 00000000000003b0 elements
0000000000000000 l d .data.rel.ro.local 0000000000000000 .data.rel.ro.local
Looking up those symbols in the source file usually leads quickly to a fix, or else to the discovery that they're not easily fixable.
But I guess I'll have to revisit Nominal Animal's answer once I run out of .data.rel.ro.locals to fix...