Whenever I create two separate libraries with LLVM 3.0 and link them together, I always get the following stack trace on exit.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000001004b0000
#0 0x00007fff8a95cda2 in memmove$VARIANT$sse42 ()
#1 0x00000001006020a0 in llvm::PassRegistry::removeRegistrationListener ()
#2 0x00000001005fbe60 in llvm::cl::list<llvm::PassInfo const*, bool, llvm::PassNameParser>::~list ()
#3 0x00007fff8a9767c8 in __cxa_finalize ()
#4 0x00007fff8a976652 in exit ()
I am creating one shared library from the Core component and one from the Target component.
I have tried calling:
LLVMPassRegistryRef pass_registry = LLVMGetGlobalPassRegistry();
LLVMInitializeCore(pass_registry);
Any ideas on how to proceed?
I've found a simple solution, in case anyone is wondering. The --enable-shared option (disabled by default) on the configure script produces an LLVM-3.X shared library. Linking against that, rather than against the libraries reported by llvm-config --libs core, solved it.
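A minimal sketch of the build and link steps, assuming an LLVM 3.0 source tree (the library name LLVM-3.0 and the example link line are assumptions that depend on your installation):
./configure --enable-shared
make && make install
# link against the single shared library instead of the per-component static archives
g++ myapp.cpp `llvm-config --cxxflags --ldflags` -lLLVM-3.0 -o myapp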
So I am experiencing this really weird behavior of gdb on Linux (KDE Neon 5.20.2):
I start gdb and load my executable using the file command:
(gdb) file mumble
Reading symbols from mumble...
As you can see, it did find debug symbols. Then I start my program (using start), which causes gdb to pause at the entry to the main function. At this point I can also print the backtrace using bt, and it works as expected.
If I now continue my program and interrupt it at any point during startup, I can still display the backtrace without issues. However, if I do something in my application that happens in a different thread than the startup (which all happens in thread 1) and interrupt my program there, gdb is no longer able to display the backtrace properly. Instead it gives
(gdb) bt
#0 0x00007ffff5bedaff in ?? ()
#1 0x0000555556a863f0 in ?? ()
#2 0x0000555556a863f0 in ?? ()
#3 0x0000000000000004 in ?? ()
#4 0x0000000100000001 in ?? ()
#5 0x00007fffec005000 in ?? ()
#6 0x00007ffff58a81ae in ?? ()
#7 0x0000000000000000 in ?? ()
which shows that it can't find the respective debug symbols.
I compiled my application with CMake (gcc) using -DCMAKE_BUILD_TYPE=Debug. I also verified that debug symbols are present in the binary using objdump --debug mumble (which also printed a few lines of objdump: Error: LEB value too large, but I'm not sure whether that is related to the problem I am seeing).
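A sketch of the commands described above (CMake 3.13+ source/build-directory syntax; the build directory and the binary path are placeholders):
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
objdump -g build/mumble | less   # -g / --debugging dumps the debug info sections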
While playing around with gdb, I also encountered the error
Cannot find user-level thread for LWP <SomeNumber>: generic error
a few times, which makes me suspect that there may indeed be some issue involving threads here...
Finally, I tried starting gdb and running set verbose on before loading my binary, which yields
(gdb) set verbose on
(gdb) file mumble
Reading symbols from mumble...
Reading in symbols for /home/user/Documents/Git/mumble/src/mumble/main.cpp...done.
This also looks suspicious to me, as only main.cpp is explicitly listed here (even though the project has many, many more source files). I should also note that all of the successful backtraces I am able to produce (as described above) originate from main.cpp.
I am honestly a bit clueless as to what the root issue might be here. Does anyone have an idea of what could be going on, or how I could investigate further?
Note: I also tried using clang as a compiler but the result was the same.
Used program versions:
cmake: 3.18.4
gcc: 9.3.0
clang: 10.0.0
make: 4.2.1
I am working with DPDK version 18.11.8 stable on Linux, using a gcc x64 build.
At runtime I get a segmentation fault. Running gdb on the core dump gives this backtrace:
#0 0x0000000000f65680 in rte_eth_devices ()
#1 0x000000000048a03a in rte_eth_rx_burst (nb_pkts=7, rx_pkts=0x7fab40620480, queue_id=0, port_id=<optimized out>)
    at /opt/dpdk/dpdk-18.08/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3825
#2 Socket_poll (ucRxPortId=<optimized out>, ucRxQueId=ucRxQueId@entry=0 '\000', uiMaxNumOfRxFrm=uiMaxNumOfRxFrm@entry=7, pISocketListener=pISocketListener@entry=0xf635d0 <FH_gtFrontHaulObj+16>)
    at /data/<snip>/SocketClass.c:2188
#3 0x000000000048b941 in FH_perform (args_ptr=<optimized out>) at /data/<snip>/FrontHaul.c:281
#4 0x00000000005788e4 in eal_thread_loop ()
#5 0x00007fab419fae65 in start_thread () from /lib64/libpthread.so.0
#6 0x00007fab4172388d in clone () from /lib64/libc.so.6
So it seems that rte_eth_rx_burst() calls rte_eth_devices () and that function crashes, presumably because of an illegal memory access. Possibly a hugepages problem?
I want to enable more debug info in DPDK. I am building DPDK using:
usertools/dpdk-setup.sh
Am I correct in thinking that the build commands in that script use make and I should modify the appropriate:
config/defconfig_*
file (defconfig_x86_64-native-linuxapp-gcc in my case) ?
If so, would these values be appropriate?
CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=y
RTE_LOG_LEVEL=RTE_LOG_DEBUG
RTE_LIBRTE_ETHDEV_DEBUG=y
(I'm not sure whether all of the values should be prefixed with 'CONFIG_'.)
I tried building DPDK using:
$ export EXTRA_CFLAGS='-O0 -g'
$ make install T=x86_64-native-linuxapp-gcc
but that gave no extra info in the backtrace.
EDIT: the error has been identified and fixed; the application now runs without crashing.
Using the dpdk-debug chat room, we were able to rebuild the libraries and the application with the proper CFLAGS. With gdb we then identified the probable cause: rte_eth_rx_burst was not being passed a pointer array for the mbufs.
Based on the GDB details for frame 1, it looks like the application is not built with EXTRA_CFLAGS (assuming you are using the DPDK example Makefile). The right way to build a DPDK application for debugging is to follow these steps:
cd [dpdk target folder]
make clean
make EXTRA_CFLAGS='-O0 -ggdb'
cd [application folder]
make EXTRA_CFLAGS='-O0 -ggdb'
then use GDB in TUI or non-TUI mode to analyze the error; a minimal invocation is sketched below.
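For example (the binary path and core file are placeholders for your own build output):
gdb --tui ./build/app/myapp            # TUI mode
gdb ./build/app/myapp /path/to/core    # non-TUI, against the core dump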
Notes:
One of the most common mistakes I make with rx_burst is passing *mbuf_array instead of **mbuf_array as the argument; see the sketch after these notes.
If a custom Makefile is used for the application, pass the extra flags as CFLAGS += "-O0 -ggdb".
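For reference, a minimal sketch of the rx_burst call pattern; BURST_SIZE, port_id, and queue_id are placeholders and error handling is omitted:
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32   /* placeholder burst size */

/* Poll one RX queue once; the port and queue are assumed to be configured already. */
static void poll_once(uint16_t port_id, uint16_t queue_id)
{
    /* An array of mbuf pointers: it decays to "struct rte_mbuf **", which is what
     * rte_eth_rx_burst() expects for rx_pkts. Passing a single "struct rte_mbuf *"
     * here is the mistake described in the note above. */
    struct rte_mbuf *bufs[BURST_SIZE];

    const uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);

    for (uint16_t i = 0; i < nb_rx; i++) {
        /* ... process bufs[i] ... */
        rte_pktmbuf_free(bufs[i]);
    }
}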
I am trying to create a C++ standalone app based on ArmNN that operates on ONNX models. To start with I have downloaded a few standard models for testing, and while trying to load the model I see a crash saying "Tensor numDimensions must be greater than 0".
The strange thing is, the function that I am invoking to load the model takes just one parameter, which is the model name. There is no place for me to specify the dimensions or anything else. Am I doing something wrong here, or is this not the way to load the model?
I have compiled ArmNN with support for ONNX as detailed here. The build and include folders have been copied to an ARM Linux machine where I am trying to run the code. I am using a Makefile to compile and run it.
The model I am currently using is downloaded from here.
Initially I was on the ArmNN master branch, and while searching for this error message I came across the ArmNN release notes, where it was mentioned that this very error had been fixed in release 19.05. So I switched to tag v19.05, rebuilt everything from scratch, and tried to run the application again, but the same error kept popping up.
Here is the C++ code -
#include "armnn/ArmNN.hpp"
#include "armnn/Exceptions.hpp"
#include "armnn/Tensor.hpp"
#include "armnn/INetwork.hpp"
#include "armnnOnnxParser/IOnnxParser.hpp"
int main(int argc, char** argv)
{
    armnnOnnxParser::IOnnxParserPtr parser = armnnOnnxParser::IOnnxParser::Create();
    std::cout << "\nmodel load start";
    armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("model.onnx");
    std::cout << "\nmodel load end";
    std::cout << "\nmain end";
    return 0;
}
The Makefile looks like this -
ARMNN_LIB = /home/root/Rahul/armnn_onnx/build
ARMNN_INC = /home/root/Rahul/armnn_onnx/include
all: onnx_test
onnx_test: onnx_test.cpp
	g++ -O3 -std=c++14 -I$(ARMNN_INC) onnx_test.cpp -I. -I/usr/include -L/usr/lib -lopencv_core -lopencv_imgcodecs -lopencv_highgui -o onnx_test -L$(ARMNN_LIB) -larmnn -lpthread -larmnnOnnxParser
clean:
	-rm -f onnx_test
test: onnx_test
	LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$(ARMNN_LIB) ./onnx_test
Expected output -
The code should load the model as expected and do a clean exit.
Actual error message -
terminate called after throwing an instance of 'armnn::InvalidArgumentException'
what(): Tensor numDimensions must be greater than 0
model load startAborted (core dumped)
A gdb backtrace is provided below -
(gdb) r
Starting program: /home/root/Rahul/sample_onnx/onnx_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
terminate called after throwing an instance of 'armnn::InvalidArgumentException'
what(): Tensor numDimensions must be greater than 0
model load start
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig#entry=6) at /usr/src/debug/glibc/2.26-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 }
(gdb) bt
#0 __GI_raise (sig=sig#entry=6) at /usr/src/debug/glibc/2.26-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1 0x0000ffffbe97ff00 in __GI_abort () at /usr/src/debug/glibc/2.26-r0/git/stdlib/abort.c:90
#2 0x0000ffffbec0c0f8 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0x0000ffffbec09afc in ?? () from /usr/lib/libstdc++.so.6
#4 0x0000ffffbec09b50 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x0000ffffbec09e20 in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x0000ffffbefdad84 in armnn::TensorShape::TensorShape(unsigned int, unsigned int const*) () from /home/root/Rahul/armnn_onnx/build/libarmnn.so
#7 0x0000ffffbed454d8 in armnnOnnxParser::(anonymous namespace)::ToTensorInfo(onnx::ValueInfoProto const&) [clone .constprop.493] () from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#8 0x0000ffffbed46080 in armnnOnnxParser::OnnxParser::SetupInfo(google::protobuf::RepeatedPtrField<onnx::ValueInfoProto> const*) () from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#9 0x0000ffffbed461ac in armnnOnnxParser::OnnxParser::LoadGraph() () from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#10 0x0000ffffbed46760 in armnnOnnxParser::OnnxParser::CreateNetworkFromModel(onnx::ModelProto&) () from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#11 0x0000ffffbed469b0 in armnnOnnxParser::OnnxParser::CreateNetworkFromBinaryFile(char const*) () from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#12 0x0000000000400a48 in main ()
It looks like a scalar in ONNX is represented as a tensor with no dimensions. So the problem here is that armnnOnnxParser is not correctly handling ONNX scalars. I would suggest raising an issue on the armnn Github.
I think you should try with at least one input layer and one output layer; a usage sketch follows the helper below.
// Helper function to make input tensors
armnn::InputTensors MakeInputTensors(
    const std::pair<armnn::LayerBindingId, armnn::TensorInfo>& input,
    const void* inputTensorData)
{
    return { { input.first, armnn::ConstTensor(input.second, inputTensorData) } };
}
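A hypothetical usage sketch of this helper, reusing the parser from the question's code; the input tensor name "data" and the zero-filled buffer are assumptions, not taken from the model above:
// "data" is a placeholder for the model's actual input tensor name
armnnOnnxParser::BindingPointInfo inputBinding = parser->GetNetworkInputBindingInfo("data");
std::vector<float> inputData(inputBinding.second.GetNumElements(), 0.0f);  // requires <vector>
armnn::InputTensors inputTensors = MakeInputTensors(inputBinding, inputData.data());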
For reference visit: https://developer.arm.com/solutions/machine-learning-on-arm/developer-material/how-to-guides/configuring-the-arm-nn-sdk-build-environment-for-onnx
I have an Ubuntu 13.04 system with the latest SVN version of the Boost C++ libraries installed. The Boost installation was built using the system's native gcc version, v4.7.3. I use Boost pretty extensively, and it works very well when I compile using gcc; I have used many of them, including Boost.Thread (which I will talk about more below), without any issues.
My problem occurs if I try to build a program using the Intel C++ compiler (I've personally used a few different versions in the v13.x series) that link with the installed Boost libraries. When I do so, I get a segmentation fault immediately after program startup; it appears to occur during static initialization of the Boost.Thread library. Here's a simple example program:
#include <boost/version.hpp>
#include <boost/thread.hpp>
int main()
{
    boost::this_thread::sleep(boost::posix_time::seconds(1));
}
I compile it using Intel C++:
icpc test.cc -lboost_thread -lboost_system -I/path/to/boost/inc/dir -L/path/to/boost/lib/dir
As I said, when I run the resulting program, I get a near-immediate segfault. Via gdb, the stack trace from the point of the segfault is as follows:
#0 0x00007ffff79b6351 in boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_exception_>() () from ./libboost_thread.so.1.55.0
#1 0x00007ffff79b02e1 in _GLOBAL__sub_I_thread.cpp () from ./libboost_thread.so.1.55.0
#2 0x00007ffff7de9876 in call_init (l=l#entry=0x7ffff7ff9a10, argc=argc#entry=1,
argv=argv#entry=0x7fffffffe0b8, env=env#entry=0x7fffffffe0c8) at dl-init.c:84
#3 0x00007ffff7de9930 in call_init (env=<optimized out>, argv=<optimized out>,
argc=<optimized out>, l=0x7ffff7ff9a10) at dl-init.c:55
#4 _dl_init (main_map=0x7ffff7ffe268, argc=1, argv=0x7fffffffe0b8, env=0x7fffffffe0c8)
at dl-init.c:133
#5 0x00007ffff7ddb68a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6 0x0000000000000001 in ?? ()
#7 0x00007fffffffe391 in ?? ()
#8 0x0000000000000000 in ?? ()
Not very enlightening, but it's clearly dying during the initialization of libboost_thread.so. If I rebuild Boost including debug symbols, then I get a slightly better picture:
#0 shared_count (r=..., this=0x7ffff7bbc5f8 <boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_exception_>()::ep+8>)
at ./boost/smart_ptr/shared_ptr.hpp:328
#1 shared_ptr (this=0x7ffff7bbc5f0 <boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_exception_>()::ep>) at ./boost/smart_ptr/shared_ptr.hpp:328
#2 exception_ptr (ptr=..., this=0x7ffff7bbc5f0 <boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_exception_>()::ep>)
at ./boost/exception/detail/exception_ptr.hpp:53
#3 boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_exception_> () at ./boost/exception/detail/exception_ptr.hpp:130
#4 0x00007ffff79b02e1 in __static_initialization_and_destruction_0 (__initialize_p=<optimized out>, __priority=<optimized out>) at ./boost/exception/detail/exception_ptr.hpp:143
#5 _GLOBAL__sub_I_thread.cpp(void) () at libs/thread/src/pthread/thread.cpp:767
#6 0x00007ffff7de9876 in call_init (l=l#entry=0x7ffff7ff9a10, argc=argc#entry=1, argv=argv#entry=0x7fffffffe0b8, env=env#entry=0x7fffffffe0c8) at dl-init.c:84
#7 0x00007ffff7de9930 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=0x7ffff7ff9a10) at dl-init.c:55
#8 _dl_init (main_map=0x7ffff7ffe268, argc=1, argv=0x7fffffffe0b8, env=0x7fffffffe0c8) at dl-init.c:133
#9 0x00007ffff7ddb68a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#10 0x0000000000000001 in ?? ()
#11 0x00007fffffffe391 in ?? ()
#12 0x0000000000000000 in ?? ()
It's unclear to me what static/global object is causing the problem to occur, so I'm not sure how to proceed. I have duplicated this behavior using a number of Boost versions and a few different versions of the Intel C++ compiler in the v13.x series, which is the only release that I have access to at the moment. I have tried every compiler permutation (i.e. I have built Boost with both gcc and icpc and I've built my test application with both also); the only permutation that fails is where Boost is built with gcc and my test application is built using icpc. In every other case, the test application runs successfully.
With that said, you might be led to the obvious question:
Why not just rebuild Boost using icpc and call it a day? That approach would seem to be effective, given my experimentation, but I have customers who like to use icpc to build my software. Those same customers are likely to have a Linux-distro-provided Boost package installed; they do not have any control over the build environment that was used to generate that package (and, in all likelihood, it was compiled using gcc anyway). Therefore, if it is possible to support such a mixed-compiler configuration, that would be optimal.
Does anyone have any recommendations as to how I might address this static initialization issue?
This is a long shot, but... If you have a different g++ in your PATH than the one used to build the Boost libraries, get rid of it or pass -gxx-name /usr/bin/g++ to icpc. (The Intel compiler adapts itself to the version of GCC it thinks you are using. -gxx-name lets you force the issue.)
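For example, adapting the compile line from the question (paths are placeholders):
icpc -gxx-name /usr/bin/g++ test.cc -lboost_thread -lboost_system -I/path/to/boost/inc/dir -L/path/to/boost/lib/dir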
OK that probably did not help.
Ubuntu 13.04 is not supported prior to Intel Composer XE 2013 SP1, aka. compiler version 14.0.0. See the "System Requirements" section of the Release Notes and compare it to the same section for the last 13.x release.
Intel definitely aims for link compatibility with GCC. If you can reproduce this problem on a clean install of a supported version of Linux, you should be able to submit a support ticket and get it fixed.
A core dump was produced at the customer's end for my application, and while looking at the backtrace I find that I don't have the symbols loaded...
(gdb) where
#0 0x000000364c032885 in ?? ()
#1 0x000000364c034065 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb) bt full
#0 0x000000364c032885 in ?? ()
No symbol table info available.
#1 0x000000364c034065 in ?? ()
No symbol table info available.
#2 0x0000000000000000 in ?? ()
No symbol table info available.
One thing I want to mention here is that the application being used is built with the -g option.
To me it seems that the required libraries are not being loaded. I tried to load them manually using the "symbol-file" command, but this doesn't help.
What could be the possible issue?
No symbol table info available.
Chances are you invoked GDB incorrectly. Don't do this:
gdb core
gdb -c core
Do this instead:
gdb exename core
Also see this answer for what you'll likely have to do to get a meaningful crash stack trace for a core from a customer's machine.
I was facing a similar issue and later found out that I was missing the -g option. Make sure you have compiled the binary with -g.
This happens when you run gdb with a path to an executable that does not correspond to the one that produced the core dump.
Make sure that you provide gdb with the correct path.
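For example (substitute the actual executable and core dump paths):
gdb /path/to/exename /path/to/core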