Code jumps into unreferenced shared object - c++

My statically linked CryptoPP code (when called via Matlab mex on Linux) jumps into the libmwflcryptocryptopp.so binary (and then it freezes).
How can my code jump into a foreign .so instead of the statically linked library?
from function CryptoPP::SourceTemplate<CryptoPP::FileStore>::PumpAll2 in my vcpkg_installed/.../cryptopp/filters.h:1443 it jumps into CryptoPP::BufferedTransformation::TransferAllTo2 from Matlabs own libmwflcryptocryptopp.so file.
the gdb stack trace
#0 0x00007ffff02e6cab in CryptoPP::BufferedTransformation::Peek(unsigned char&) const () from /usr/local/MATLAB/R2020b/bin/glnxa64/libmwflcryptocryptopp.so
#1 0x00007ffff02e6c19 in CryptoPP::BufferedTransformation::AnyRetrievable() const () from /usr/local/MATLAB/R2020b/bin/glnxa64/libmwflcryptocryptopp.so
#2 0x00007ffff02e6ffe in CryptoPP::BufferedTransformation::TransferMessagesTo2(CryptoPP::BufferedTransformation&, unsigned int&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /usr/local/MATLAB/R2020b/bin/glnxa64/libmwflcryptocryptopp.so
#3 0x00007ffff02e7129 in CryptoPP::BufferedTransformation::TransferAllTo2(CryptoPP::BufferedTransformation&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /usr/local/MATLAB/R2020b/bin/glnxa64/libmwflcryptocryptopp.so
#4 0x00007fff9758aa4d in CryptoPP::SourceTemplate<CryptoPP::FileStore>::PumpAll2 (this=0x7fffdd87fbc0, blocking=<optimized out>)
at /home/keinkoenig/build-o1/vcpkg_installed/x64-linux/include/cryptopp/filters.h:1443
#5 0x00007fff9758a130 in CryptoPP::Source::PumpAll (this=0x7fffdd87fbc0) at /home/keinkoenig/build-o1/vcpkg_installed/x64-linux/include/cryptopp/filters.h:1420
#6 CryptoPP::Source::SourceInitialize (parameters=warning: RTTI symbol not found for class 'CryptoPP::AlgorithmParameters'
..., pumpAll=true, this=0x7fffdd87fbc0) at /home/keinkoenig/build-o1/vcpkg_installed/x64-linux/include/cryptopp/filters.h:1420
#7 CryptoPP::FileSource::FileSource (attachment=0x7fffd6063160, pumpAll=true, in=..., this=0x7fffdd87fbc0) at /home/keinkoenig/build-o1/vcpkg_installed/x64-linux/include/cryptopp/files.h:102
#8 mexFunction (nlhs=<optimized out>, plhs=<optimized out>, nrhs=<optimized out>, prhs=<optimized out>) at /home/keinkoenig/src/CryptoppMinimal/CryptoppMinimal.cpp:18
minmal example code
#include <mex.h>
#include <cryptopp/files.h>
#include <cryptopp/aes.h>
#include <cryptopp/modes.h>
#include <fstream>
unsigned char Key[CryptoPP::AES::DEFAULT_KEYLENGTH] = { 0x00,0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88,0x99,0xAA,0xBB,0xCC,0xDD,0xEE,0xFF };
unsigned char IV[CryptoPP::AES::BLOCKSIZE] = { 0x00,0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88,0x99,0xAA,0xBB,0xCC,0xDD,0xEE,0xFF };
extern "C"
{
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
std::ifstream instream(mxArrayToString(prhs[0]));
std::string decryptedString;
CryptoPP::CFB_Mode<CryptoPP::AES>::Decryption cfbDecryption(Key, sizeof(Key), IV);
CryptoPP::FileSource(instream, true, new CryptoPP::StreamTransformationFilter( cfbDecryption, new CryptoPP::StringSink( decryptedString)));
}
}
and the CMakeLists linking
set(PROJECT_NAME CryptoppMinimal)
...
target_link_libraries(${PROJECT_NAME} PRIVATE cryptopp-static)
starting console matlab with gdb
/usr/local/MATLAB/R2020b/bin/matlab -nojvm -Dgdb
and running the test mex function in Matlab
addpath ~/build-o1/bin
CryptoppMinimal('/home/keinkoenig/src/encrypted.dat')

TL;DR; your cryptocpp is getting LD_PRELOADed by Matlab, even if it is being linked statically.
A big hint towards what is going on is the path of the .so being loaded:
#3 0x00007ffff02e7129 [...] /usr/local/MATLAB/R2020b/bin/glnxa64/libmwflcryptocryptopp.so
Compared to the headers that were used during compilation:
#5 [...] /home/keinkoenig/build-o1/vcpkg_installed/x64-linux/include/cryptopp/filters.h:1420
What this shows is that not only is cryptocpp running from a dynamic library, it's running from a different .so than the one you compiled against!
How could this happen? If a static library is compiled with position independent code, calls to functions in the library will still be dispatched in the same way they would for a dynamic library: with a jump table.
And much like dynamic libraries, if the jump table is already populated when the code loaded, the existing function pointers will be reused.
So, if:
crypto-cpp was compiled with -fPIC
Matlab happens to have a crypto-cpp .so loaded prior to your code being loaded.
Then the version of crypto-cpp embedded within your project will be ignored, and Matlab's will be used instead.
Having a quick look at the crypto-cpp project, we find, in the Makefile:
# Add -fPIC for targets *except* X86, X32, Cygwin or MinGW
ifeq ($(IS_X86)$(IS_CYGWIN)$(IS_MINGW),000)
ifeq ($(findstring -fpic,$(CXXFLAGS))$(findstring -fPIC,$(CXXFLAGS)),)
CRYPTOPP_CXXFLAGS += -fPIC
endif
endif
Which appears to confirm this is what's going on.
You can confirm that by loading and running your library from outside Matlab, and it should be calling the static library's code as expected.

Related

Is armadillo solve() thread safe?

In my code I have loop in which I construct and over determined linear system and try to solve it:
#pragma omp parallel for
for (int i = 0; i < n[0]+1; i++) {
for (int j = 0; j < n[1]+1; j++) {
for (int k = 0; k < n[2]+1; k++) {
arma::mat A(max_points, 2);
arma::mat y(max_points, 1);
// initialize A and y
arma::vec solution = solve(A,y);
}
}
}
Sometimes, quite randomly the program hangs or the results in the solution vector are NaN. And if I put do this:
arma::vec solution;
#pragma omp critical
{
solution = solve(weights*A,weights*y);
}
then these problem don't seem to happen anymore.
When it hangs, it does so because some threads are waiting at the OpenMP barrier:
Thread 2 (Thread 0x7fe4325a5700 (LWP 39839)):
#0 0x00007fe44d3c2084 in gomp_team_barrier_wait_end () from /usr/lib64/gcc-4.9.2/lib64/gcc/x86_64-redhat-linux-gnu/4.9.2/libgomp.so.1
#1 0x00007fe44d3bf8c2 in gomp_thread_start () at ../.././libgomp/team.c:118
#2 0x0000003f64607851 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003f642e890d in clone () from /lib64/libc.so.6
And the other threads are stuck inside Armadillo:
Thread 1 (Thread 0x7fe44afe2e60 (LWP 39800)):
#0 0x0000003ee541f748 in dscal_ () from /usr/lib64/libblas.so.3
#1 0x00007fe44c0d3666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2 0x00007fe44c058736 in dgelq2_ () from /usr/lib64/atlas/liblapack.so.3
#3 0x00007fe44c058ad9 in dgelqf_ () from /usr/lib64/atlas/liblapack.so.3
#4 0x00007fe44c059a32 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5 0x00007fe44f09fb3d in bool arma::auxlib::solve_ud<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6 0x00007fe44f0a0f87 in arma::Col<double>::Col<arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> >(arma::Base<double, arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> > const&) ()
at /usr/include/armadillo_bits/glue_solve_meat.hpp:39
As you can see from the stacktrace my version of Armadillo uses atlas. And according to this documentation atlas seems to be thread safe: ftp://lsec.cc.ac.cn/netlib/atlas/faq.html#tsafe
Update 9/11/2015
I finally got some time to run more tests, based on the suggestions of Vladimir F.
When I compile armadillo with ATLAS's BLAS, I'm still able to reproduce then hangs and the NaNs. When it hangs, the only thing that changes in the stacktrace is the call to BLAS:
#0 0x0000003fa8054718 in ATL_dscal_xp1yp0aXbX#plt () from /usr/lib64/atlas/libatlas.so.3
#1 0x0000003fb05e7666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2 0x0000003fb0576a61 in dgeqr2_ () from /usr/lib64/atlas/liblapack.so.3
#3 0x0000003fb0576e06 in dgeqrf_ () from /usr/lib64/atlas/liblapack.so.3
#4 0x0000003fb056d7d1 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5 0x00007ff8f3de4c34 in void arma::lapack::gels<double>(char*, int*, int*, int*, double*, int*, double*, int*, double*, int*, int*) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6 0x00007ff8f3de1787 in bool arma::auxlib::solve_od<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/auxlib_meat.hpp:3434
Compiling without ATLAS, only with netlib BLAS and LAPACK, I was able to reproduce the NaNs but not the hangs.
In both cases, surrounding solve() with #pragma omp critical I have no problems at all
Are you sure your systems are over determined? solve_ud in your stack trace says otherwise. Though you have solve_od too, and probably that's nothing to do with the issue. But it doesn't hurt to find why that's happening and fix it if you think the systems should be od.
Is armadillo solve() thread safe?
That I think depends on your lapack version, also see this. Looking at the code of solve_od all the variables accessed seem to be local. Note the warning in the code:
NOTE: the dgels() function in the lapack library supplied by ATLAS 3.6
seems to have problems
Thus it seems only lapack::gels can cause trouble for you. If fixing lapack is not possible, a workaround is to stack your systems and solve a single large system. That probably would be even more efficient if your individual systems are small.
The thread safety of Armadillo's solve() function depends (only) on the BLAS library that you use. The LAPACK implementations are thread safe when BLAS is. The Armadillo solve() function is not thread safe when linking to the reference BLAS library. However, it is thread safe when using OpenBLAS. Additionally, ATLAS provides a BLAS implementation that also mentions it is thread safe, and the Intel MKL is thread safe as well, but I have no experience with Armadillo linked to those libraries.
Of course, this only applies when you run solve() from multiple threads with different data.

Xcode 5.1 std::string concatenation crash on iOS 5.1 devices in release builds

When built with the latest XCode 5.1, iOS SDK 7.1 and LLVM 5.1, using libstdc++ for the C++ standard library, get crash in the std::string append method only on iOS 5.1 device.
Here's code example:
class TestClass
{
public:
TestClass()
{
m_string = "string1";
}
void AppendString()
{
m_string += std::string("string2");
std::string newString2 = m_string + "string3";
}
private:
string m_string;
};
App crashes in AppendString() method on this line:
std::string newString2 = m_string + "string3";
If I remove this line m_string += std::string("string2"); before creating newString2, code works fine.
Here's the stacktrace of crash:
#0 0x34c99fe8 in memcpy$VARIANT$CortexA8 ()
#1 0x3706f95a in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) ()
#2 0x3706fbfa in std::string::reserve(unsigned long) ()
#3 0x3706fdb0 in std::string::append(char const*, unsigned long) ()
#4 0x000c0e5a in std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*) [inlined] at /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.1.sdk/usr/include/c++/4.2.1/bits/basic_string.h:2121
#5 0x000c0e3c in TestClass::AppendString() at /Users/sergei/Documents/2_3/House of Fun/cpp/CasinoSceneManager.cpp:122
Have anybody experinced same crash? It happens only on devices with iOS 5.1 and 5.1.1 and only if app was compiled with release flag, not debug.
Thank you for the help.
One more note.
If I change Optimization level to None [-O0] in Code Generation section in project settings, everything works fine.
Maybe the problem has some connection to code optimization.
Apple released a new GM seed for Xcode 5.1.1. In the release notes they say they fixed a couple crashes:
Fixed a compiled code crash on when targeting iOS 5.1.1. (16485980)!
Fixed a compiled code crash when using ARC and C++. (16368824)
http://adcdownload.apple.com//Developer_Tools/xcode_5.1.1_gm_seed/release_notes_xcode_5.1.1_gm_seed.pdf

Using GCC's function instrumentation, why does using C++ STL containers or stream I/O cause a segfault?

I recently read about using GCC's code generation features (specifically, the -finstrument-functions compiler flag) to easily add instrumentation to my programs. I thought it sounded really cool and went to try it out on a previous C++ project. After several revisions of my patch, I found that any time I tried to use an STL container or print to stdout using C++ stream I/O, my program would immediately crash with a segfault. My first idea was to maintain a std::list of Event structs
typedef struct
{
unsigned char event_code;
intptr_t func_addr;
intptr_t caller_addr;
pthread_t thread_id;
timespec ts;
}Event;
list<Event> events;
which would be written to a file when the program terminated. GDB told me that when I tried to add an Event to the list, calling events.push_back(ev) itself initiated an instrumentation call. This wasn't terrible surprising and made sense after I thought about it for a bit, so on to plan 2.
The example in the blog which got me involved in all this mess didn't do anything crazy, it simply wrote a string to a file using fprintf(). I didn't think there would be any harm in using C++'s stream-based I/O instead of the older (f)printf(), but that assumption proved to be wrong. This time, instead of a nearly-infinite death spiral, GDB reported a fairly normal-looking descent into the standard library... followed by a segfault.
A Short Example
#include <list>
#include <iostream>
#include <stdio.h>
using namespace std;
extern "C" __attribute__ ((no_instrument_function)) void __cyg_profile_func_enter(void*, void*);
list<string> text;
extern "C" void __cyg_profile_func_enter(void* /* unused */, void* /* unused */)
{
// Method 1
text.push_back("NOPE");
// Method 2
cout << "This explodes" << endl;
// Method 3
printf("This works!");
}
Sample GDB Backtrace
Method 1
#0 _int_malloc (av=0x7ffff7380720, bytes=29) at malloc.c:3570
#1 0x00007ffff704ca45 in __GI___libc_malloc (bytes=29) at malloc.c:2924
#2 0x00007ffff7652ded in operator new(unsigned long) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff763ba89 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff763d495 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff763d5e3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00000000004028c1 in __cyg_profile_func_enter () at src/instrumentation.cpp:82
#7 0x0000000000402c6f in std::move<std::string&> (__t=...) at /usr/include/c++/4.6/bits/move.h:82
#8 0x0000000000402af5 in std::list<std::string, std::allocator<std::string> >::push_back(std::string&&) (this=0x6055c0, __x=...) at /usr/include/c++/4.6/bits/stl_list.h:993
#9 0x00000000004028d2 in __cyg_profile_func_enter () at src/instrumentation.cpp:82
#10 0x0000000000402c6f in std::move<std::string&> (__t=...) at /usr/include/c++/4.6/bits/move.h:82
#11 0x0000000000402af5 in std::list<std::string, std::allocator<std::string> >::push_back(std::string&&) (this=0x6055c0, __x=...) at /usr/include/c++/4.6/bits/stl_list.h:993
#12 0x00000000004028d2 in __cyg_profile_func_enter () at src/instrumentation.cpp:82
#13 0x0000000000402c6f in std::move<std::string&> (__t=...) at /usr/include/c++/4.6/bits/move.h:82
#14 0x0000000000402af5 in std::list<std::string, std::allocator<std::string> >::push_back(std::string&
...
Method 2
#0 0x00007ffff76307d1 in std::ostream::sentry::sentry(std::ostream&) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff7630ee9 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff76312ef in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x000000000040251e in __cyg_profile_func_enter () at src/instrumentation.cpp:81
#4 0x000000000040216d in _GLOBAL__sub_I__ZN8GLWindow7attribsE () at src/glwindow.cpp:164
#5 0x0000000000402f2d in __libc_csu_init ()
#6 0x00007ffff6feb700 in __libc_start_main (main=0x402cac <main()>, argc=1, ubp_av=0x7fffffffe268,
init=0x402ed0 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffe258) at libc-start.c:185
#7 0x0000000000401589 in _start ()
Environment:
Ubuntu Linux 12.04 (x64)
GCC 4.6.3
Intel 3750K CPU
8GB RAM
The problem with using cout in the instrumentation function is that the instrumentation function is being called by __libc_csu_init() which is a very early part of the runtime's initialization - before global C++ objects get a chance to be constructed (in fact, I think __libc_csu_init() is responsible for kicking off those constructors - at least indirectly).
So cout hasn't had a chance to be constructed yet and trying to use it doesn't work very well...
And that may well be the problem you run into with trying to use std::List after fixing the infinite recursion (mentioned in Dave S' answer).
If you're willing to lose some instrumentation during initialization, you can do something like:
#include <iostream>
#include <stdio.h>
int initialization_complete = 0;
using namespace std;
extern "C" __attribute__ ((no_instrument_function)) void __cyg_profile_func_enter(void*, void*);
extern "C" void __cyg_profile_func_enter(void* /* unused */, void* /* unused */)
{
if (!initialization_complete) return;
// Method 2
cout << "This explodes" << endl;
// Method 3
printf("This works! ");
}
void foo()
{
cout << "foo()" << endl;
}
int main()
{
initialization_complete = 1;
foo();
}
The first case seems to be an infinite loop, resulting in stack overflow. This is probably because std::list is a template, and it's code is generated as part of the translation unit where you're using it. This causes it to get instrumented as well. So you call push_back, which calls the handler, which calls push_back, ...
The second, if I had to guess, might be similar, though it's harder to tell.
The solution is to compile the instrumentation functions separately, without the -finstrument-functions. Note, the example blog compiled the trace.c separately, without the option.

Initialization across c++ translation units

Or I have two files, each with one global initialization. One depends on the other.
Simplified example:
file1.h:
#include <string>
extern const std::string PREFIX;
file1.cpp:
#include "file1.h"
const std::string PREFIX = "prefix,";
file2.cpp:
#include "file1.h"
std::string MSG = PREFIX + "body";
int main(){}
I compile them as such:
/usr/local/bin/g++-4.6.2 -c -Wall -g -o file1.o file1.cpp
/usr/local/bin/g++-4.6.2 -c -Wall -g -o file2.o file2.cpp
/usr/local/bin/g++-4.6.2 -Wall -g -o example file1.o file2.o
When I run this, it segfaults. gdb trace:
Starting program: example
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b7ae0b in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) ()
from /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.6.2/libstdc++.so.6
(gdb) bt
#0 0x00007ffff7b7ae0b in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) ()
from /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.6.2/libstdc++.so.6
#1 0x00000000004009b5 in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=Traceback (most recent call last):
File "/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.6.2/../../../../share/gcc-4.6.2/python/libstdcxx/v6/printers.py", line 587, in to_string
return ptr.lazy_string (length = len)
RuntimeError: Cannot access memory at address 0xffffffffffffffe8
, __rhs=0x400af0 "body") at /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.6.2/include/c++/bits/basic_string.h:2346
#2 0x000000000040095f in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at file2.cpp:3
#3 0x000000000040098b in _GLOBAL__sub_I_MSG () at file2.cpp:6
#4 0x0000000000400ab6 in __do_global_ctors_aux ()
#5 0x00000000004006e3 in _init ()
#6 0x00007ffff7ffa5a0 in ?? ()
#7 0x0000000000400a45 in __libc_csu_init ()
#8 0x00007ffff72dcbe0 in __libc_start_main () from /lib/libc.so.6
#9 0x00000000004007c9 in _start ()
Is there anyway to do what I'm trying to do (keeping in mind that this is an overly simplified example)? Or am I at the mercy of the initialization order chosen by the compiler/linker?
You need to make the prefix string be there before the other strings. One way is to change it to a C string
// header
#include <string>
extern const char * const PREFIX;
// .cpp file
const char * const PREFIX = "prefix,";
Another way is to return the prefix from a function and use PREFIX() where you previously used PREFIX.
// header
inline const string& PREFIX() {
static const string value = "prefix,";
return value;
}
At last, just a hint: Names with all uppercase letters are used for macros only in most coding conventions, so I would avoid such names for variables and functions.
I guess this appears because the code attempts to initialize your globals in the wrong order. There is a SO question about initialisation order, its answers should prove useful.
It's obviously not a portable solution, but for gcc you can use function attributes - with the __attribute__ syntax. i.e., the __init_priority__ (priority) attribute. The initialization priorities cross translation units. link

unable to create a C++ ruby extension

I have problems creating a ruby extension to export a C++ library I wrote to ruby under OSX. This simple example:
#include <boost/regex.hpp>
extern "C" void Init_bayeux()
{
boost::regex expression("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
}
results in a bad_cast exception being thrown:
#0 0x00000001014663bd in __cxa_throw ()
#1 0x00000001014cf6b2 in __cxa_bad_cast ()
#2 0x00000001014986f9 in std::use_facet<std::collate<char> > ()
#3 0x0000000101135a4f in boost::re_detail::cpp_regex_traits_base<char>::imbue (this=0x7fff5fbfe4d0, l=#0x7fff5fbfe520) at cpp_regex_traits.hpp:218
#4 0x0000000101138d42 in cpp_regex_traits_base (this=0x7fff5fbfe4d0, l=#0x7fff5fbfe520) at cpp_regex_traits.hpp:173
#5 0x000000010113eda6 in boost::re_detail::create_cpp_regex_traits<char> (l=#0x7fff5fbfe520) at cpp_regex_traits.hpp:859
#6 0x0000000101149bee in cpp_regex_traits (this=0x101600200) at cpp_regex_traits.hpp:880
#7 0x0000000101142758 in regex_traits (this=0x101600200) at regex_traits.hpp:75
#8 0x000000010113d68c in regex_traits_wrapper (this=0x101600200) at regex_traits.hpp:169
#9 0x000000010113bae1 in regex_data (this=0x101600060) at basic_regex.hpp:166
#10 0x000000010113981e in basic_regex_implementation (this=0x101600060) at basic_regex.hpp:202
#11 0x0000000101136e1a in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::do_assign (this=0x7fff5fbfe710, p1=0x100540ae0 "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?", p2=0x100540b19 "", f=0) at basic_regex.hpp:652
#12 0x0000000100540a66 in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::assign (this=0x7fff5fbfe710, p1=0x100540ae0 "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?", p2=0x100540b19 "", f=0) at basic_regex.hpp:379
#13 0x0000000100540a13 in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::assign (this=0x7fff5fbfe710, p=0x100540ae0 "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?", f=0) at basic_regex.hpp:364
#14 0x000000010054096e in basic_regex (this=0x7fff5fbfe710, p=0x100540ae0 "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?", f=0) at basic_regex.hpp:333
#15 0x00000001005407e2 in Init_bayeux () at bayeux.cpp:10
#16 0x0000000100004593 in dln_load (file=0x1008bc000 "/Users/todi/sioux/lib/debug/rack/bayeux.bundle") at dln.c:1293
I compile the extension with:
g++ ./source/rack/bayeux.cpp -o /Users/todi/sioux/obj/debug/rack/bayeux.o -Wall -pedantic -Wno-parentheses -Wno-sign-compare -fno-common -c -pipe -I/Users/todi/sioux/source -ggdb -O0
And finally link the dynamic library with:
g++ -o /Users/todi/sioux/lib/debug/rack/bayeux.bundle -bundle -ggdb /Users/todi/sioux/obj/debug/rack/bayeux.o -L/Users/todi/sioux/lib/debug -lrack -lboost_regex-mt-d -lruby
I've searched the web and tried all kind of link and compiler switches. If I build a executable there is no such problem. Does someone else had such a problem and found a solution?
I've further investigated this and found that the function causing the exception looks like this:
std::locale loc = std::locale("C");
std::use_facet< std::collate<char> >( loc );
In the source of std::collate<> I found the throw statment:
use_facet(const locale& __loc)
{
const size_t __i = _Facet::id._M_id();
const locale::facet** __facets = __loc._M_impl->_M_facets;
if (__i >= __loc._M_impl->_M_facets_size || !__facets[__i])
__throw_bad_cast();
#ifdef __GXX_RTTI
return dynamic_cast<const _Facet&>(*__facets[__i]);
#else
return static_cast<const _Facet&>(*__facets[__i]);
#endif
}
Does this makes any sense to you?
Update: I've tried Jan's suggestion:
Todis-MacBook-Pro:rack todi$ g++ -shared -fpic -o bayeux.bundle bayeux.cpp
Todis-MacBook-Pro:rack todi$ ruby -I. -rbayeux -e 'puts :ok'
terminate called after throwing an instance of 'std::bad_cast'
what(): std::bad_cast
Abort trap
versions:
Todis-MacBook-Pro:rack todi$ ruby -v
ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.6.0]
Todis-MacBook-Pro:rack todi$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin10/4.5.2/lto-wrapper
Target: x86_64-apple-darwin10
Configured with: ../gcc-4.5.2/configure --prefix=/opt/local --build=x86_64-apple-darwin10 --enable-languages=c,c++,objc,obj-c++,fortran,java --libdir=/opt/local/lib/gcc45 --includedir=/opt/local/include/gcc45 --infodir=/opt/local/share/info --mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-4.5 --with-local-prefix=/opt/local --with-system-zlib --disable-nls --program-suffix=-mp-4.5 --with-gxx-include-dir=/opt/local/include/gcc45/c++/ --with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local --enable-stage1-checking --disable-multilib --enable-fully-dynamic-string
Thread model: posix
gcc version 4.5.2 (GCC)
Update:
It's not the bound-check in use_facet() that throws, but the next line, that actually does a dynamic cast. This example boils it down to maybe something with RTTI:
#define private public
#include <locale>
#include <iostream>
#include <typeinfo>
extern "C" void Init_bayeux()
{
std::locale loc = std::locale("C");
printf( "size: %i\n", loc._M_impl->_M_facets_size );
printf( "id: %i\n", std::collate< char >::id._M_id() );
const std::locale::facet& fac = *loc._M_impl->_M_facets[ std::collate< char >::id._M_id() ];
printf( "name: %s\n", typeid( fac ).name());
printf( "name: %s\n", typeid( std::collate<char> ).name());
const std::type_info& a = typeid( fac );
const std::type_info& b = typeid( std::collate<char> );
printf( "equal: %i\n", !a.before( b ) && !b.before( a ) );
dynamic_cast< const std::collate< char >& >( fac );
}
I've used printf() because usage of cout also fails. The output of the code above is:
size: 28
id: 5
name: St7collateIcE
name: St7collateIcE
equal: 1
terminate called after throwing an instance of 'std::bad_cast'
what(): std::bad_cast
Abort trap
Build with:
g++ -shared -fpic -o bayeux.bundle bayeux.cpp && ruby -I. -rbayeux -e 'puts :ok'
Update:
If I rename Init_bayeux to main() and link it to an executable, the output is the same, but no call to terminate.
Update:
When I write a little program to load the shared library and to execute Init_bayeux(), again, no exception is thrown:
#include <dlfcn.h>
int main()
{
void* handle = dlopen("bayeux.bundle", RTLD_LAZY|RTLD_GLOBAL );
void(*f)(void) = (void(*)(void)) dlsym( handle, "Init_bayeux" ) ;
f();
}
So it looks to me, that it might be a problem with how the ruby.exe was build. Does that make sense?
Update:
I had a look at the addresses containing the names of the two type_info objects. Same content, but different addresses. I added the -flat_namespace switch to the link command. Now the dynamic_cast works. The original Problem with the boost regex library still exists, but I think this might be solvable by linking boost statically into the shared library or by rebuilding the boost libraries with the -flat_namespace switch.
Update:
Now I'm back to the very first example with the boost regex expression, build with this command:
g++ -shared -flat_namespace -fPIC -o bayeux.bundle /Users/todi/boost_1_49_0/stage/lib/libboost_regex.a bayeux.cpp
But when loading the extension into the ruby interpreter, initializing of static symbols fails:
ruby(59384,0x7fff712b8cc0) malloc: *** error for object 0x7fff70b19500: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Program received signal SIGABRT, Aborted.
0x00007fff8a6ab0b6 in __kill ()
(gdb) bt
#0 0x00007fff8a6ab0b6 in __kill ()
#1 0x00007fff8a74b9f6 in abort ()
#2 0x00007fff8a663195 in free ()
#3 0x0000000100541023 in boost::re_detail::cpp_regex_traits_char_layer<char>::init (this=0x10060be50) at basic_string.h:237
#4 0x0000000100543904 in boost::object_cache<boost::re_detail::cpp_regex_traits_base<char>, boost::re_detail::cpp_regex_traits_implementation<char> >::do_get (k=#0x7fff5fbfddd0) at cpp_regex_traits.hpp:366
#5 0x000000010056005b in create_cpp_regex_traits<char> (l=<value temporarily unavailable, due to optimizations>) at pending/object_cache.hpp:69
#6 0x0000000100544c33 in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::do_assign (this=0x7fff5fbfe090, p1=0x100567158 "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?", p2=0x100567191 "", f=0) at cpp_regex_traits.hpp:880
#7 0x0000000100566280 in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::assign ()
#8 0x000000010056622d in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::assign ()
#9 0x0000000100566188 in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::basic_regex ()
#10 0x0000000100566025 in Init_bayeux ()
#11 0x0000000100003a23 in dln_load (file=0x10201a000 "/Users/todi/sioux/source/rack/bayeux.bundle") at dln.c:1293
#12 0x000000010016569d in vm_pop_frame [inlined] () at /Users/todi/.rvm/src/ruby-1.9.2-p320/vm_insnhelper.c:1465
#13 0x000000010016569d in rb_vm_call_cfunc (recv=4303980440, func=0x100042520 <load_ext>, arg=4303803000, blockptr=0x1, filename=<value temporarily unavailable, due to optimizations>, filepath=<value temporarily unavailable, due to optimizations>) at vm.c:1467
#14 0x0000000100043382 in rb_require_safe (fname=4303904640, safe=0) at load.c:602
#15 0x000000010017cbf3 in vm_call_cfunc [inlined] () at /Users/todi/.rvm/src/ruby-1.9.2-p320/vm_insnhelper.c:402
#16 0x000000010017cbf3 in vm_call_method (th=0x1003016b0, cfp=0x1004ffef8, num=1, blockptr=0x1, flag=8, id=<value temporarily unavailable, due to optimizations>, me=0x10182cfa0, recv=4303980440) at vm_insnhelper.c:528
...
Again, this doesn't fail, when I load the shared library by the little c program from above.
Update:
Now I link the first example static:
g++ -shared -fPIC -flat_namespace -nodefaultlibs -o bayeux.bundle -static -lstdc++ -lpthread -lgcc_eh -lboost_regex-mt bayeux.cpp
With the same error:
ruby(15197,0x7fff708aecc0) malloc: *** error for object 0x7fff7027e500: pointer being freed was not allocated
otool -L confirmed that every library is linked static:
bayeux.bundle:
bayeux.bundle (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.11)
debug:
If I link against the boost debug version, then it works like expected.
For the records: I've now build boost and my application with the very same compiler (version 4.2.1 [official apple version]). No problems so far. Why it will not work as expected when the ruby extension links all libraries statically is a miracle to me. Thank to all who put time into this issue.
Kind regards
Torsten