Could libgcc_s.so conflicts lead to CPU overload when using exceptions? - c++

I developed a C++ server application for an embedded i386-compatible environment, so no cross compiler was needed. A colleague developed a dynamic library, which makes (large) use of exceptions, to handle the network communications. Once the library is copied to the target file system, a client connection causes an abort with the common message terminated after throwing an instance of..., even though libstdc++ is available on the embedded OS.
After several attempts, including statically linking the libraries, we apparently found a solution: copying the libgcc_s.so.1 used at compile time on a Fedora 3 virtual machine to the embedded file system and launching the server with the environment variable LD_LIBRARY_PATH set to the path of the Fedora library.
On the embedded OS we have a BusyBox with only a few reduced tools, but with the uptime command we noticed that after the client connection the CPU usage rose from 20% to 100% (and I don't know how, even more). The first impression is an application bug, but it was never observed during the debug sessions on the Fedora machine, and /proc/<pid>/status for the task shows this:
Name: taskname
State: S (sleeping)
SleepAVG: 97%
Tgid: 589
Pid: 589
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmSize: 3396 kB
VmLck: 0 kB
VmRSS: 1604 kB
VmData: 492 kB
VmStk: 84 kB
VmExe: 84 kB
VmLib: 2512 kB
VmPTE: 20 kB
Threads: 1
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000080000000
SigIgn: 0000000000001004
SigCgt: 0000000380004a02
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
So I cannot figure out what is using the CPU so heavily, even after the client disconnects.
This behaviour does not occur when the server is launched on the Fedora machine.
I suspect that mixing the Fedora 3 libgcc_s.so.1 with the embedded system could lead to some strange side effect, but I don't have any clue.
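For reference, this is roughly how one can check which libgcc_s.so.1 the binary resolves and how the GCC runtime could be linked statically instead (a sketch only; the binary name is a placeholder):
ldd ./server | grep libgcc                                # shows which libgcc_s.so.1 the dynamic linker picks up
g++ -static-libgcc -static-libstdc++ main.o -o server     # link the GCC and C++ runtimes statically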
So I started looking for another way to deploy the server:
Copying the other required libraries (libstdc++ and libc) from Fedora 3 to the embedded OS: same behaviour.
Reversing the process: copying the required libraries into the source tree and forcing the linker to use them. Launching the application (on the compiler host), the error message terminated after throwing an instance of... reappeared.
Additional info:
If useful: running ldd -v libgcc_s.so.1 (ldd is not available on the embedded system) on the two libraries gave the following results:
HOST LIBRARY:
libc.so.6 => /lib/tls/libc.so.6 (0x00694000)
/lib/ld-linux.so.2 (0x0067b000)
Version information:
/lib/libgcc_s.so.1:
libc.so.6 (GLIBC_2.2.4) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.1.3) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/tls/libc.so.6
/lib/tls/libc.so.6:
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.0) => /lib/ld-linux.so.2
EMBEDDED LIBRARY:
libc.so.6 => /lib/tls/libc.so.6 (0xf6ec3000)
/lib/ld-linux.so.2 (0x0067b000)
Version information:
./libgcc_s.so.1:
libc.so.6 (GLIBC_2.1.3) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/tls/libc.so.6
/lib/tls/libc.so.6:
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.0) => /lib/ld-linux.so.2
Does anyone have an explanation or a suggestion?
Thank you
A. Cappelli
More info about the processor types:
Compiler host /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.40GHz
stepping : 1
cpu MHz : 3390.524
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss nx pni
bogomips : 6471.68
Embedded machine /proc/cpuinfo:
processor : 0
vendor_id : AuthenticAMD
cpu family : 4
model : 9
model name : 486 DX/4-WB
stepping : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu
bogomips : 65.40

If your embedded system has a recent enough Linux kernel, you can try the Linux performance counters tool (perf). Once it is installed, run perf record ./server on your embedded system. This will generate a perf.data file when the server exits. After that you can analyze the file using perf report in the same directory as the file. It will show how much CPU% each library and executable symbol used, so you can narrow the issue down to specific libraries or to your server code. More info about perf here
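A minimal session might look like this (a sketch, assuming perf is installed on the target; -g adds call-graph data and is optional):
perf record -g ./server    # profile the server; writes perf.data when it exits
perf report                # per-symbol, per-library CPU% breakdown
perf top                   # alternatively, a live view while the server runs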

Related

Running a DPDK application binary with all dependent library (compiled in different machine) on another machine

A DPDK application with several runtime-dependent libraries is compiled on one machine.
The binaries and libraries are copied from that machine to another machine with similar specs and environment.
The DPDK application is run with the parameters given below, but it crashes during rte_eal_init().
App-binary -l 1 -a 0000:02:00.0 -a 0000:03:00.0 -d /opt/upf/lib/ --proc-type=primary --file-prefix=.app_0000:02:00.0
This is the backtrace from the GNU debugger for the crash core file:
#0 0x00007faaa0ead337 in raise () from /lib64/libc.so.6
#1 0x00007faaa0eaea28 in abort () from /lib64/libc.so.6
#2 0x00007faaa125104f in __rte_panic () from /opt/upf/lib/librte_eal.so.21
#3 0x00007faa9e228e1c in tailqinitfn_rte_ring_tailq () from /opt/upf/lib/librte_ring.so.21.0
#4 0x00007faaa278a973 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#5 0x00007faaa278f54e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#6 0x00007faaa278a784 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#7 0x00007faaa278eb3b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#8 0x00007faaa0c73eeb in dlopen_doit () from /lib64/libdl.so.2
#9 0x00007faaa278a784 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007faaa0c744ed in _dlerror_run () from /lib64/libdl.so.2
#11 0x00007faaa0c73f81 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#12 0x00007faaa125bc55 in eal_plugins_init () from /opt/upf/lib/librte_eal.so.21
#13 0x00007faaa126f2ba in rte_eal_init () from /opt/upf/lib/librte_eal.so.21
#14 0x000000000041414a in Dpdk_LibTask (arg=<optimized out>) at /root/5g_upf/core/service/common/dpdk/dpdk.c:1244
#15 0x00007faaa2566e65 in start_thread () from /lib64/libpthread.so.0
#16 0x00007faaa0f7588d in clone () from /lib64/libc.so.6
Updates:
Host Machine details:
i3 8100 3.6GHz 4 Cores
8 GB RAM
CentOS 7
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
GNU ld version 2.27-44.base.el7
DPDK 20.11.0
3 NICs bound to DPDK
0000:01:00.0 '82574L Gigabit Network Connection 10d3' drv=uio_pci_generic unused=e1000e
0000:07:00.0 '82574L Gigabit Network Connection 10d3' drv=uio_pci_generic unused=e1000e
0000:08:00.0 '82574L Gigabit Network Connection 10d3' drv=uio_pci_generic unused=e1000e
Target Machine details:
i3 8100 3.6GHz 4 Cores
8 GB RAM
CentOS 7
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
GNU ld version 2.27-44.base.el7
DPDK 20.11.0
3 NICs bound to DPDK
0000:02:00.0 '82574L Gigabit Network Connection 10d3' drv=uio_pci_generic unused=e1000e
0000:03:00.0 '82574L Gigabit Network Connection 10d3' drv=uio_pci_generic unused=e1000e
0000:04:00.0 '82574L Gigabit Network Connection 10d3' drv=uio_pci_generic unused=e1000e
[Based on the live debug with Sumesh.]
Application background:
The application depends on DPDK libraries, 3rd-party libraries and GNU libraries.
The actual open source project builds DPDK 20.11.
A Docker instance is started with docker run, with root permission shared, and the libraries are copied over to a local folder (on the same machine).
With LD_LIBRARY_PATH set to the desired folder, the DPDK library dependencies are resolved.
What caused the issue:
Machine-A is used to build the DPDK 20.11 libraries.
Instead of running the Docker instance on Machine-A, Machine-B is chosen as the target machine.
The DPDK libraries are copied from Machine-A to Machine-B.
docker run is used to start the application on Machine-B.
How to fix the issue (a sketch follows this list):
The DPDK libraries, when built, carry numerous other library and version dependencies.
Build and install DPDK on the target (Machine-B); do not copy the libraries over.
In docker run, share the permission to access /usr/lib/lib64, which houses the DPDK, 3rd-party and GNU libraries.
Update LD_LIBRARY_PATH so that it points at the right folders to resolve the dependencies.
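A hypothetical docker run invocation illustrating these points (the image name and mount paths are placeholders, not the exact ones from this setup):
docker run --privileged \
    -v /usr/lib64:/usr/lib64:ro \
    -v /opt/upf/lib:/opt/upf/lib:ro \
    -e LD_LIBRARY_PATH=/opt/upf/lib:/usr/lib64 \
    upf-image ./App-binary -l 1 -a 0000:02:00.0 -a 0000:03:00.0 \
    -d /opt/upf/lib/ --proc-type=primary --file-prefix=.app_0000:02:00.0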
Note:
Tested and validated on both host and Docker with hello-world for sanity.
Sumesh is updating the scripts to reflect the folder permissions for the custom application.

Mix and match GCC and Intel compilers: link OpenMP correctly

I have a scientific C++ application that is parallelized with OpenMP and compiled typically with GCC/8.2.0. The application further depends on gsl and fftw, the latter using OpenMP as well. The application uses a C API to access a Fortran library that is also parallelized with OpenMP and can use either Intel's MKL or openblas as backend. Compilation of the library is preferred using the Intel/19.1.0 toolchain. I have successfully compiled, linked, and tested everything using GCC/8.2.0 and openblas (as a baseline). However, test studies on minimal examples suggest MKL with Intel would be faster, and speed is important for my use case.
icc --version gives me: icc (ICC) 19.1.0.166 20191121; the operating system is CentOS 7. Bear in mind I'm on a cluster and have limited control over what I can install. Software is centrally managed using spack and environments are loaded by specifying a compiler layer (only one at a time).
I have considered different approaches for getting the Intel/MKL library into my code:
Compile C++ and Fortran code using the Intel toolchain. While that's probably the tidiest solution, the compiler throws "internal error: 20000_7001" for a particular file with an OMP include. I could not find documentation for that particular error code and have not gotten feedback from Intel either (https://community.intel.com/t5/Intel-C-Compiler/Compilation-error-internal-error-20000-7001/m-p/1365208#M39785). I allocated > 80 GB of memory for compilation, as I have experienced the compiler breaking down before when limited resources were available. Maybe someone here has seen that error code?
Compile C++ and Fortran code with GCC/8.2.0 but link dynamically to Intel-compiled MKL as backend for the Fortran library. I managed to do that from the GCC/8.2.0 layer by extending LIBRARY_PATH and LD_LIBRARY_PATH to where MKL lives on the cluster. It seems like only GNU OMP is linked and MKL was found. Analysis shows that CPU load is quite low (but higher than with the binary from the GCC/8.2.0 + openblas set-up). Execution time of my program improved by ~30%. However, I got this runtime error in at least one case when running the binary with 20 cores: libgomp: Thread creation failed: Resource temporarily unavailable.
Sticking with GCC/8.2.0 for my C++ code and linking dynamically against the precompiled Fortran library, which was itself compiled with Intel/MKL using Intel OMP. This approach turned out to be tricky. As with approach (2), I loaded the GCC environment and manually extended LD_LIBRARY_PATH. A minimal example that is not OMP-parallelized itself worked beautifully out of the box. However, even though I managed to compile and link my C++ program as well, I got an immediate runtime error as soon as the OMP call in the Fortran library occurred.
Here is the output of ldd of the compiled C++ code:
linux-vdso.so.1 => (0x00007fff2d7bb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ab227c25000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002ab227e41000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002ab228337000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002ab228595000)
libz.so.1 => /lib64/libz.so.1 (0x00002ab228a36000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002ab228c4c000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002ab228e53000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002ab22a16d000)
libm.so.6 => /lib64/libm.so.6 (0x00002ab22a4f1000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002ab22a7f3000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002ab22aa21000)
libc.so.6 => /lib64/libc.so.6 (0x00002ab22ac39000)
/lib64/ld-linux-x86-64.so.2 (0x00002ab227a01000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002ab22b007000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002ab22bb73000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002ab22e0df000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002ab2323ff000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002ab232763000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002ab232d01000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002ab234688000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002ab2349f2000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002ab234de2000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002ab235059000)
I did some research and found interesting discussions here and at Intel's documentation regarding crashes with two different OMP implementations:
Telling GCC to *not* link libgomp so it links libiomp5 instead
https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming-guide/openmp-support/openmp-library-support/using-the-openmp-libraries.html
http://www.nacad.ufrj.br/online/intel/Documentation/en_US/compiler_c/main_cls/optaps/common/optaps_par_compat_libs_using.htm
I followed the guidelines provided for the Intel OpenMP compatibility libraries. Compilation of my C++ code was done from the GCC environment using the -fopenmp flag as always. During the linking stage (g++), I took the same linker command I usually use but replaced -fopenmp with -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 -liomp5 -lpthread. The resulting binary runs like a charm and is roughly twice as fast as my original build (GCC/openblas).
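In outline, the compile and link steps look roughly like this (a sketch; the object file and the other -l flags stand in for the real ones):
g++ -fopenmp -O2 -c main.cpp -o main.o        # compile as usual with -fopenmp
g++ main.o -o app \
    -lgsl -lgslcblas -lfftw3 -lfftw3_omp -lxtb \
    -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 \
    -liomp5 -lpthread                         # at link time, libiomp5 replaces -fopenmp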
Here is the output of ldd of the compiled C++ code:
linux-vdso.so.1 => (0x00007ffd7eb9a000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002b4fb08da000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b4fb0cca000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002b4fb0ee6000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002b4fb13dc000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002b4fb163a000)
libz.so.1 => /lib64/libz.so.1 (0x00002b4fb1adb000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002b4fb1cf1000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002b4fb1ef8000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002b4fb3212000)
libm.so.6 => /lib64/libm.so.6 (0x00002b4fb3596000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002b4fb3898000)
libc.so.6 => /lib64/libc.so.6 (0x00002b4fb3ab0000)
/lib64/ld-linux-x86-64.so.2 (0x00002b4fb06b6000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b4fb3e7e000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002b4fb4082000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b4fb42b0000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002b4fb4e1c000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002b4fb7388000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002b4fbb6a8000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002b4fbba0c000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002b4fbbfaa000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002b4fbd931000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002b4fbdc9b000)
Unlike in approach (2), the binary is linked against both libiomp5 and libgomp. I suspect that I get references to libgomp because I link to libfftw3_omp, which was compiled with GCC/8.2.0. I find it quite puzzling that ldd seems to give exactly the same entries as for my first attempt with approach (3); only the order seems to have changed (libiomp5 before libgomp).
While I am quite happy to have gotten a working binary in the end, I have some questions I could not resolve by myself:
do you interpret Intel's documentation and the previous SO post as I do, and agree that the Intel OpenMP compatibility libraries are applicable in my case and that I have used the correct workflow? Or do you think approach (3) is a recipe for disaster in the future?
does any of you have more experience with Intel's C++ compiler and has seen the error code described in approach (1)? (see update below)
do you think it's worth investigating whether I can get rid of libgomp completely by, for example, manually linking against an Intel-compiled libfftw3_omp that only depends on libiomp5? (see update below)
do you have an explanation for why thread creation fails in some cases using approach (2)?
Thank you very much in advance!
// Update: In the meantime I managed to tweak approach (3) by not linking against GCC/8.2.0-compiled gsl and fftw but using Intel/19.1.0-compiled gsl and fftw instead. The resulting binary is similar in speed to what I got before; however, it links only against libiomp5.so, which seems like the cleaner solution to me.
// Update: Manually excluding compiler optimizations for the files that throw internal errors in CMakeLists.txt (CMake: how to disable optimization of a single *.cpp file?) gave me a working binary, though with linker warnings.

Can't see symbols from Erlang NIF library in core file

I'm working on an Erlang wrapper over a 3rd party C library on Ubuntu Linux on x86, so I'm creating a NIF. Sometimes my code (I think) crashes, resulting in a core file. Unfortunately the stacktrace is not really helpful:
(gdb) bt
#0 0x00007fc22229968a in ?? ()
#1 0x0000000060e816d8 in ?? ()
#2 0x0000000007cd48b0 in ?? ()
#3 0x00007fc228031410 in ?? ()
#4 0x00007fc228040b80 in ?? ()
#5 0x00007fc228040c50 in ?? ()
#6 0x00007fc22223de0b in ?? ()
#7 0x0000000000000000 in ?? ()
even though I built my NIF .so file with debug info:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=b70dd1f2450f5c0e9980c8396aaad2e1cd29024c, with debug_info, not stripped
The beam also has debug info:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e0a5dba6507b8c2b333faebc89fbc6ea2f7263b9, for GNU/Linux 3.2.0, with debug_info, not stripped
However, info sharedlibrary shows neither the NIF nor the 3rd party lib:
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x00007fc28942ed50 0x00007fc289432004 Yes /lib/x86_64-linux-gnu/libgtk3-nocsd.so.0
0x00007fc289429220 0x00007fc28942a179 Yes /lib/x86_64-linux-gnu/libdl.so.2
0x00007fc2892e83c0 0x00007fc28938ef18 Yes /lib/x86_64-linux-gnu/libm.so.6
0x00007fc2892b76a0 0x00007fc2892c517c Yes /lib/x86_64-linux-gnu/libtinfo.so.6
0x00007fc28928dae0 0x00007fc28929d4d5 Yes /lib/x86_64-linux-gnu/libpthread.so.0
0x00007fc2890b9630 0x00007fc28922e20d Yes /lib/x86_64-linux-gnu/libc.so.6
0x00007fc289657100 0x00007fc289679674 Yes (*) /lib64/ld-linux-x86-64.so.2
0x00007fc24459c040 0x00007fc2445ab8ad Yes /home/nar/otp/23.3.4.2/lib/crypto-4.9.0.2/priv/lib/crypto.so
0x00007fc2239e3000 0x00007fc223b7c800 Yes (*) /lib/x86_64-linux-gnu/libcrypto.so.1.1
0x00007fc2896500e0 0x00007fc28965028c Yes /home/nar/otp/23.3.4.2/lib/crypto-4.9.0.2/priv/lib/crypto_callback.so
0x00007fc289649380 0x00007fc28964bc1c Yes /home/nar/otp/23.3.4.2/lib/asn1-5.0.15/priv/lib/asn1rt_nif.so
0x00007fc289638720 0x00007fc28963bd70 Yes /lib/x86_64-linux-gnu/librt.so.1
I found this answer mentioning that "The Erlang VM doesn't load NIF libraries with global symbols exposed". Could this be the reason why I don't see the symbols? Is there a way to tell gdb to look up symbols from my .so file?
I built the Erlang VM with debug enabled (I used kerl to build and set KERL_BUILD_DEBUG_VM to true), then started Erlang with the -debug option. This way some asserts in the code appear to have been enabled; they fired and that led me to the bugs in my code. Since then I don't have the crashes.
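For reference, one possible way (untested here) to make gdb see the NIF symbols manually is to load them at the library's .text load address taken from the core; the paths and address below are placeholders:
gdb /path/to/beam.smp core.dump
(gdb) info proc mappings                                     # locate where my_nif.so was mapped (if supported for core files)
(gdb) add-symbol-file /path/to/my_nif.so 0x00007fc222200000 # address of the library's text section; value is a placeholder
(gdb) bt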

arm: ./busybox: line 1: syntax error: unexpected word (expecting ")")

I am setting up a virtual machine specifically for cross-compiling for armv7l. As a test I decided to compile BusyBox, and while the cross-compilation itself works fine, upon uploading the resulting binary to a router with the correct architecture, the binary complains: ./busybox: line 1: syntax error: unexpected word (expecting ")")
I did not have this issue when compiling for x86, so I believe the problem is with my build environment.
It's based on Ubuntu 18 server, and I've installed these packages:
gcc-arm-linux-gnueabi
binutils-arm-linux-gnueabi
libncurses5-dev
gawk
build-essentials
make
my buildscript:
export ac_cv_linux_vers=2
export CC=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-gcc
export GCC=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-gcc
export CXX=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-g++
export CPP=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-cpp
export LD=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-ld
export AR=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-ar
export AS=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-as
export NM=/usr/local/arm-2011.09/bin/arm-none-linux-gnueabi-nm
export RANLIB=/usr/local/arm-2011.09/arm-none-linux-gnueabi/bin/ranlib
export CC1=/usr/local/arm-2011.09/libexec/gcc/arm-none-linux-gnueabi/4.6.1/cc1
export PATH=/usr/local/arm-2011.09/bin:/usr/local/arm-2011.09/:/usr/local/arm-2011.09/lib:/usr/local/arm-2011.09/libexec/gcc/arm-none-linux-gnueabi/4.6.1:$PATH
export ac_cv_func_getpgrp_void=yes
export ac_cv_func_setpgrp_void=yes
export LDFLAGS="-static"
export CFLAGS="-Os -s"
# I already did make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
make install
Any obvious flaws with my build process?
The platform on which I am trying to run busybox:
# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 1 (v7l)
BogoMIPS : 1594.16
Features : swp half thumb fastmult edsp thumbee tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x4
CPU part : 0xc09
CPU revision : 1
CPU physical :0
processor : 1
model name : ARMv7 Processor rev 1 (v7l)
BogoMIPS : 1594.16
Features : swp half thumb fastmult edsp thumbee tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x4
CPU part : 0xc09
CPU revision : 1
CPU physical :1
Hardware : Hisilicon A9
Revision : 0000
Serial : 0000000000000000
# uname -m
armv7l
Solved it. It turned out that my make menuconfig was incomplete.
Busybox Settings -> Build Options -> Build Busybox as a static binary (no shared libs)
Busybox Settings -> Build Options -> Cross compiler prefix -> Set this option equal to "arm-linux-gnueabi-"
Busybox Settings -> Installation Options -> Don't use /usr -> Enable
Linux Module Utilities -> () Default directory containing modules -> blank
After that, the binary produced by the build script worked as intended; a non-interactive sketch of the same configuration follows.
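A non-interactive sketch of the same settings (CONFIG_STATIC and CONFIG_CROSS_COMPILER_PREFIX are the .config symbols I assume back those menuconfig entries; verify the names against your BusyBox version):
make defconfig
sed -i 's/# CONFIG_STATIC is not set/CONFIG_STATIC=y/' .config
sed -i 's|CONFIG_CROSS_COMPILER_PREFIX=""|CONFIG_CROSS_COMPILER_PREFIX="arm-linux-gnueabi-"|' .config
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
file busybox    # should report an ARM, statically linked executable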

coredump at __correctly_grouped_prefixwc

The program, in the live environment, segfaults from time to time. I tried to gdb the core dump file,
but I can't find the line of code that causes the core dump.
Program terminated with signal 11, Segmentation fault.
#0 0x00000038f3a41bf5 in __correctly_grouped_prefixwc () from /lib64/libc.so.6
(gdb) bt
#0 0x00000038f3a41bf5 in __correctly_grouped_prefixwc () from /lib64/libc.so.6
#1 0x0000000000000000 in ?? ()
(gdb) info r
rax 0x1ac1b108 448901384
rbx 0x2add423b4ff0 47129787322352
rcx 0x2add48128640 47129885312576
rdx 0x0 0
rsi 0x1 1
rdi 0x2add48000020 47129884098592
rbp 0x2add3f1aef50 0x2add3f1aef50
rsp 0x2add423b4ff0 0x2add423b4ff0
r8 0x2 2
r9 0x2 2
r10 0x0 0
r11 0x0 0
r12 0x0 0
r13 0x3 3
r14 0x1000 4096
r15 0x2add3f1b0000 47129734873088
rip 0x38f3a41bf5 0x38f3a41bf5 <__correctly_grouped_prefixwc+165>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fctrl 0x37f 895
fstat 0x0 0
ftag 0xffff 65535
fiseg 0x0 0
fioff 0xc54f06 12930822
foseg 0x2add 10973
fooff 0x423b3f00 1111179008
fop 0x0 0
mxcsr 0x1fa1 [ IE PE IM DM ZM OM UM PM ]
cat /etc/redhat-release
CentOS release 5.5 (Final)
Since I want to debug glibc at the source level, I ran yum install yum-utils to install the debuginfo-install program.
Then I ran sudo debuginfo-install glibc, with the following result:
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* addons: centos.ustc.edu.cn
* base: mirror.bit.edu.cn
* extras: centos.ustc.edu.cn
* updates: centos.ustc.edu.cn
Checking for new repos for mirrors
Could not find debuginfo for main pkg: glibc-2.5-123.x86_64
Could not find debuginfo for main pkg: glibc-2.5-123.i686
No debuginfo packages available to install
I then tried yum search glibc-debuginfo:
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* addons: centos.ustc.edu.cn
* base: mirrors.163.com
* extras: centos.ustc.edu.cn
* updates: centos.ustc.edu.cn
Warning: No matches found for: glibc-debuginfo
No Matches found
No matches found again.
I then tried yum search glibc:
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* addons: centos.ustc.edu.cn
* base: mirrors.163.com
* extras: centos.ustc.edu.cn
* updates: centos.ustc.edu.cn
================================================================================ Matched: glibc =================================================================================
compat-glibc.i386 : Compatibility C library
compat-glibc.x86_64 : Compatibility C library
compat-glibc-headers.x86_64 : Header files for development using standard C libraries.
glibc.i686 : The GNU libc libraries.
glibc.x86_64 : The GNU libc libraries.
glibc-common.x86_64 : Common binaries and locale data for glibc
glibc-devel.i386 : Object files for development using standard C libraries.
glibc-devel.x86_64 : Object files for development using standard C libraries.
glibc-headers.x86_64 : Header files for development using standard C libraries.
glibc-utils.x86_64 : Development utilities from GNU C library
kernel-headers.x86_64 : Header files for the Linux kernel for use by glibc
nss_db.i386 : An NSS library for the Berkeley DB.
nss_db.x86_64 : An NSS library for the Berkeley DB.
yp-tools.x86_64 : NIS (or YP) client programs.
yum-protect-packages.noarch : Yum plugin to prevents Yum from removing itself and other protected packages
I tried sudo yum install glibc-devel.x86_64 and ran gdb on the core dump file again,
but it displays the following:
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
How can I find the line of code that causes the core dump? I tried to google, but found nothing. Any ideas?
First, although __correctly_grouped_prefixwc caused the segmentation fault, it's likely that it was passed incorrect arguments from some other piece of code, perhaps strtod or strtol or something that called them. That being said, here is how to set things up so that gdb can show the line of source code in __correctly_grouped_prefixwc that caused the segmentation fault.
To do source-level debugging, you need an executable or shared object's debug info, and its source code. Linux and Unix distributions in general do not include these by default, to conserve storage space, but they make them available as packages.
On CentOS, you just need to install the debuginfo package for each executable or library you're interested in. To do this, run
sudo yum install yum-utils
which will install the debuginfo-install program, then run
sudo debuginfo-install glibc
to download and install the glibc-debuginfo-2.5-123 package (your version number may vary). This will install, among many other files, /usr/lib/debug/lib64/libc.so.6.debug, /usr/lib/debug/lib64/libc-2.5.so.debug, and /usr/src/debug/glibc-2.5-20061008T1257/stdlib/grouping.c, which are what you need.
debuginfo-install is a short python program that enables the debuginfo repositories and downloads and installs the debuginfo package corresponding to the package you give as an argument, plus all its dependencies. As an alternative, you can download the debuginfo packages directly from http://debuginfo.centos.org (or any mirrors) and install them using rpm -i.
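In outline, the manual route looks like this (the exact package filename/version must match what the repo actually lists; the one below is only an example):
wget http://debuginfo.centos.org/5/x86_64/glibc-debuginfo-2.5-123.x86_64.rpm
sudo rpm -i glibc-debuginfo-2.5-123.x86_64.rpm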
You mentioned that you got the error No debuginfo packages available to install. Perhaps you don't have the debuginfo repo configured. On my CentOS 5 system, the configuration is in the file /etc/yum.repos.d/CentOS-Debuginfo.repo
# All debug packages from all the various CentOS-5 releases
# are merged into a single repo, split by BaseArch
#
# Note: packages in the debuginfo repo are currently not signed
#
[base-debuginfo]
name=CentOS-5 - Debuginfo
baseurl=http://debuginfo.centos.org/5/$basearch/
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
enabled=0
For other releases, general instructions for adding the debuginfo repo are in this CentOS wiki article.
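With that repo file in place (note enabled=0), you can also enable it for a single command instead of editing the file; the package name follows the pattern described above:
sudo yum --enablerepo=base-debuginfo install glibc-debuginfo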