There is a bug in the Intel compiler's handling of user-defined reductions in OpenMP, which was discussed here (including the workaround). Now I want to pass the vector to a function and do the same thing, but I get this error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
This is the example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <functional> // std::plus
#include "omp.h"
#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<double>())) \
initializer(omp_priv = omp_orig)
int foo(std::vector<double> &w){
#pragma omp parallel reduction(vec_double_plus:w)
{
#pragma omp for
for (int i = 0; i < 2; ++i)
for (int j = 0; j < w.size(); ++j)
w[j] += 1;
};
return 0;
}
int main() {
omp_set_num_threads(2);
std::vector<double> w(10,0);
foo(w);
for(auto i:w)
if(i != 2)
std::cout << i << std::endl;
return 0;
}
Again it works fine with GNU/6.4.0 but fails with intel/2018.1.163. Any ideas?
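One portable way to sidestep the user-defined reduction is to accumulate into a thread-private vector and merge it under a critical section. The sketch below is not the workaround linked above; it is a generic alternative, and the name foo_manual is hypothetical. It assumes w starts at zero, as in the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical replacement for foo(): each thread accumulates into its own
// vector, then the partial results are merged into w one thread at a time.
int foo_manual(std::vector<double> &w) {
    #pragma omp parallel
    {
        // Thread-private accumulator, sized from the shared vector.
        std::vector<double> local(w.size(), 0.0);

        #pragma omp for
        for (int i = 0; i < 2; ++i)
            for (std::size_t j = 0; j < local.size(); ++j)
                local[j] += 1;

        // Merge the partial sums; only one thread executes this at a time.
        #pragma omp critical
        for (std::size_t j = 0; j < w.size(); ++j)
            w[j] += local[j];
    }
    return 0;
}
```

Because no declare reduction directive is involved, the compiler never has to copy-construct the private vector from omp_orig, which is the step that fails in the backtrace.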
Update: I changed the values to make it easier to debug. I work on a remote node, so I am using a terminal. I used gdb to debug the code compiled with intel/2018.1.163. I'm not sure whether that is the right approach or whether there is a better way to debug the code. This is the error from gdb:
[New Thread 0x2aaaac68a780 (LWP 15573)]
terminate called recursively
terminate called after throwing an instance of 'std::bad_alloc
Program received signal SIGABRT, Aborted.
0x00002aaaabaf91f7 in raise () from /lib64/libc.so.6
And, this is the cmake configuration:
cmake_minimum_required(VERSION 3.2)
project(openmp_reduction001)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)
find_package(OpenMP)
if(OPENMP_FOUND)
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()
add_executable(openmp_reduction001 main.cpp)
Update 2: The gdb backtrace is shown below. The compute node I used loads the Intel compiler module as the default compiler, but it picks up header files from /usr/include/c++/4.8.5. Is that normal? /usr/include/c++/ contains only the 4.4.7, 4.8.2, and 4.8.5 folders. Another issue is in frame #12, where the length of the vector is -15; this is probably why std::allocator receives a very large value for its n parameter.
#0 0x00002aaaabaf91f7 in raise () from /lib64/libc.so.6
#1 0x00002aaaabafa8e8 in abort () from /lib64/libc.so.6
#2 0x00002aaaaad2fa55 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00002aaaaad2da36 in ?? () from /lib64/libstdc++.so.6
#4 0x00002aaaaad2da63 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00002aaaaad2dc83 in __cxa_throw () from /lib64/libstdc++.so.6
#6 0x00002aaaaad826d2 in std::__throw_bad_alloc() () from /lib64/libstdc++.so.6
#7 0x0000000000404022 in __gnu_cxx::new_allocator<double>::allocate (this=0x2aaaac689af0, __n=18446744073709551602)
at /usr/include/c++/4.8.5/ext/new_allocator.h:102
#8 0x0000000000403856 in std::_Vector_base<double, std::allocator<double> >::_M_allocate (this=0x2aaaac689af0, __n=18446744073709551602)
at /usr/include/c++/4.8.5/bits/stl_vector.h:168
#9 0x000000000040394b in std::_Vector_base<double, std::allocator<double> >::_M_create_storage (this=0x7fffffffa370, __n=18446744073709551601)
at /usr/include/c++/4.8.5/bits/stl_vector.h:181
#10 0x00000000004037a6 in std::_Vector_base<double, std::allocator<double> >::_Vector_base (this=0x7fffffffa370, __n=18446744073709551601, __a=...)
at /usr/include/c++/4.8.5/bits/stl_vector.h:136
#11 0x00000000004037fa in std::_Vector_base<double, std::allocator<double> >::_Vector_base (this=0x7fffffffa370)
at /usr/include/c++/4.8.5/bits/stl_vector.h:134
#12 0x0000000000403b15 in std::vector<double, std::allocator<double> >::vector (this=0x7fffffffa370,
__x=std::vector of length -15, capacity -17592185515333 = {...}) at /usr/include/c++/4.8.5/bits/stl_vector.h:312
#13 0x0000000000402cd3 in __udr_i_0x914e698 (__omp_priv=0x7fffffffa370, __omp_orig=0x7fffffffa838)
at /uufs/chpc.utah.edu/common/home/u1013493/openmp_reduction001/main.cpp:8
#14 0x0000000000402e7d in L__Z3fooRSt6vectorIdSaIdEE_14__par_region0_2_4 () at /uufs/chpc.utah.edu/common/home/u1013493/openmp_reduction001/main.cpp:14
#15 0x00002aaaab39e7a3 in __kmp_invoke_microtask ()
from /uufs/chpc.utah.edu/sys/installdir/intel/compilers_and_libraries_2018.1.163/linux/compiler/lib/intel64/libiomp5.so
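The huge __n in frames #9 and #10 is consistent with the length of -15 reported in frame #12: a negative count converted to an unsigned 64-bit size wraps around to just below 2^64. A minimal sketch of that arithmetic, assuming a 64-bit size type (the function name as_unsigned_count is hypothetical):

```cpp
#include <cstdint>

// Convert a signed element count to the unsigned 64-bit value an allocator
// would receive; negative inputs wrap around modulo 2^64.
std::uint64_t as_unsigned_count(std::int64_t n) {
    return static_cast<std::uint64_t>(n);
}
```

Here as_unsigned_count(-15) yields 18446744073709551601, the value of __n in frame #9. A request of that size cannot be satisfied, which would explain the std::bad_alloc. It suggests that the omp_orig vector handed to the initializer is garbage rather than the caller's w.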
I'm building a simple utility program that queries a MySQL database and uses regex to isolate strings in the table data.
I'm using MariaDB Connector/C++ and the latest version of MariaDB. The code was copied from the MariaDB website. I have simplified the software to illustrate the problem. See below:
// g++ -o mariadb_connect mariadb_connect.cpp -lmariadbcpp
// From https://mariadb.com/docs/clients/connector-cpp/
// with three additional lines that cause segfault
#include <iostream>
#include <mariadb/conncpp.hpp>
#include <regex> // <-- Added to the example
int main()
{
try
{
// Instantiate Driver
sql::Driver* driver = sql::mariadb::get_driver_instance();
// Configure Connection
// The URL or TCP connection string format is
// ``jdbc:mariadb://host:port/database``.
sql::SQLString url("jdbc:mariadb://localhost:3306/??????");
// Use a properties map for the user name and password
sql::Properties properties({
{"user", "???????"},
{"password", "????????"}
});
// Establish Connection
// Use a smart pointer for extra safety
std::unique_ptr<sql::Connection> conn(driver->connect(url, properties));
// Use Connection
std::cout << "Using the connection" << std::endl; // <-- Added
std::regex regexp("(faststatic.com)(.*)"); // <-- Added (Causes segfault)
// Close Connection
conn->close();
}
// Catch Exceptions
catch (sql::SQLException& e)
{
std::cout << "Error Connecting to MariaDB Platform: "
<< e.what() << std::endl;
// Exit (Failed)
return 1;
}
// Exit (Success)
return 0;
}
(???? used for private data)
Compiled with g++ on an AWS EC2 instance running the Amazon Linux 2 AMI.
It compiled and ran fine until I added the std::regex regexp(...) line. It still compiles with the addition, but segfaults on execution.
I ran it under gdb with a breakpoint set on main, which gives the following output:
(gdb) b main
Breakpoint 1 at 0x40404b: file mariadb_connect.cpp, line 15.
(gdb) run
Starting program: /home/msellers/proj/preload_images/spike/mariadb_connect
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x000000000064a588 in ?? ()
Here is the output of the gdb bt command after the segfault:
(gdb) bt
#0 0x000000000064a588 in ?? ()
#1 0x0000000000409155 in std::__detail::_Scanner<char>::_M_scan_normal (this=0x7fffffffe018) at /usr/include/c++/7/bits/regex_scanner.tcc:119
#2 0x00000000004084a1 in std::__detail::_Scanner<char>::_M_advance (this=0x7fffffffe018) at /usr/include/c++/7/bits/regex_scanner.tcc:80
#3 0x00007ffff7c3e060 in std::__detail::_Compiler<std::regex_traits<char> >::_M_match_token (this=this@entry=0x7fffffffe000, token=std::__detail::_ScannerBase::_S_token_subexpr_begin) at /usr/local/include/c++/4.9.4/bits/regex_compiler.tcc:541
#4 0x00007ffff7c513a2 in std::__detail::_Compiler<std::regex_traits<char> >::_M_match_token (token=std::__detail::_ScannerBase::_S_token_subexpr_begin, this=0x7fffffffe000) at /usr/local/include/c++/4.9.4/bits/regex_compiler.tcc:316
#5 std::__detail::_Compiler<std::regex_traits<char> >::_M_atom (this=this@entry=0x7fffffffe000) at /usr/local/include/c++/4.9.4/bits/regex_compiler.tcc:326
#6 0x00007ffff7c515b0 in std::__detail::_Compiler<std::regex_traits<char> >::_M_term (this=0x7fffffffe000) at /usr/local/include/c++/4.9.4/bits/regex_compiler.tcc:136
#7 std::__detail::_Compiler<std::regex_traits<char> >::_M_alternative (this=0x7fffffffe000) at /usr/local/include/c++/4.9.4/bits/regex_compiler.tcc:118
#8 0x00007ffff7c51809 in std::__detail::_Compiler<std::regex_traits<char> >::_M_disjunction (this=this@entry=0x7fffffffe000) at /usr/local/include/c++/4.9.4/bits/regex_compiler.tcc:97
#9 0x00007ffff7c51e18 in std::__detail::_Compiler<std::regex_traits<char> >::_Compiler (this=0x7fffffffe000, __b=<optimized out>, __e=<optimized out>, __traits=..., __flags=<optimized out>)
at /usr/local/include/c++/4.9.4/bits/regex_compiler.tcc:82
#10 0x00007ffff7c5222d in std::__detail::__compile_nfa<std::regex_traits<char> > (__first=<optimized out>, __last=<optimized out>, __traits=..., __flags=<optimized out>) at /usr/local/include/c++/4.9.4/bits/regex_compiler.h:158
#11 0x00007ffff7c524da in std::basic_regex<char, std::regex_traits<char> >::basic_regex<char const*> (__f=<optimized out>, __last=<optimized out>, __first=<optimized out>, this=0x7ffff7dc2a40 <sql::mariadb::UrlParser::URL_PARAMETER>)
at /usr/local/include/c++/4.9.4/bits/regex.h:540
#12 std::basic_regex<char, std::regex_traits<char> >::basic_regex (this=0x7ffff7dc2a40 <sql::mariadb::UrlParser::URL_PARAMETER>, __p=<optimized out>, __f=<optimized out>) at /usr/local/include/c++/4.9.4/bits/regex.h:452
#13 0x00007ffff7c331ee in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/buildbot/src/src/UrlParser.cpp:34
#14 _GLOBAL__sub_I_UrlParser.cpp(void) () at /home/buildbot/src/src/UrlParser.cpp:444
#15 0x00007ffff7de7dc2 in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe2b8, env=env@entry=0x7fffffffe2c8) at dl-init.c:72
#16 0x00007ffff7de7eb6 in call_init (env=0x7fffffffe2c8, argv=0x7fffffffe2b8, argc=1, l=<optimized out>) at dl-init.c:119
#17 _dl_init (main_map=0x7ffff7ffe130, argc=1, argv=0x7fffffffe2b8, env=0x7fffffffe2c8) at dl-init.c:120
#18 0x00007ffff7dd9f2a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#19 0x0000000000000001 in ?? ()
#20 0x00007fffffffe520 in ?? ()
#21 0x0000000000000000 in ?? ()
(gdb)
Does this help?
Mark
GCC version 7.3.1
In the backtrace, we see that the crash is happening in the GCC-7 regexp implementation:
#1 0x0000000000409155 in std::__detail::_Scanner<char>::_M_scan_normal (this=0x7fffffffe018) at /usr/include/c++/7/bits/regex_scanner.tcc:119
We also see that the crash happens while a global object inside (presumably [1]) the MariaDB connector is being initialized, using the GCC 4.9.4 version of libstdc++:
#12 std::basic_regex<char, std::regex_traits<char> >::basic_regex (this=0x7ffff7dc2a40 <sql::mariadb::UrlParser::URL_PARAMETER>, __p=<optimized out>, __f=<optimized out>) at /usr/local/include/c++/4.9.4/bits/regex.h:452
#13 0x00007ffff7c331ee in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/buildbot/src/src/UrlParser.cpp:34
It is very likely that this 4.9.4 vs. 7.3.1 mismatch is the cause of the crash, and that either building the application with g++ 4.9.4 or building the MariaDB connector with g++ 7.3.1 will fix the problem.
In theory, newer versions of GCC's libstdc++ should be backward compatible with older ones, but verifying ABI compatibility in C++ is quite hard, and many mistakes have been made. Also, g++ 4.9.4 is ancient.
Another possible solution is to build the application with clang using libc++; this avoids any possibility of symbol conflicts [2].
[1] You can verify whether frame #13 really comes from the MariaDB connector by executing these GDB commands: frame 13, info symbol $pc.
[2] To achieve this, you may need to explicitly tell clang to use libc++, as it may default to using libstdc++. Use clang++ -stdlib=libc++ ... to be sure. Documentation here.
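To see which compiler and standard library headers the application itself is built against, a small helper based on predefined macros can be compiled into it. This is only a sketch: the function name describe_toolchain is hypothetical, and it reports the headers used at compile time, not the shared library loaded at run time (running ldd on the binary shows the latter).

```cpp
#include <sstream>
#include <string>

// Describe the compiler and C++ standard library this translation unit
// was compiled with, using predefined macros only.
std::string describe_toolchain() {
    std::ostringstream out;
#if defined(__clang__)  // checked first because clang also defines __GNUC__
    out << "clang " << __clang_major__ << '.' << __clang_minor__;
#elif defined(__GNUC__)
    out << "gcc " << __GNUC__ << '.' << __GNUC_MINOR__ << '.' << __GNUC_PATCHLEVEL__;
#else
    out << "unknown compiler";
#endif
#if defined(__GLIBCXX__)  // libstdc++ release datestamp
    out << ", libstdc++ datestamp " << __GLIBCXX__;
#elif defined(_LIBCPP_VERSION)
    out << ", libc++ " << _LIBCPP_VERSION;
#else
    out << ", unknown standard library";
#endif
    return out.str();
}
```

Printing the result at the top of main gives a quick check that the build really uses the toolchain you expect, before comparing it with the 4.9.4 paths that appear in the connector's frames.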
I am building a custom protoc compiler plugin based on Google's C++ libraries for protobuf.
I ran into a strange error when running it on Linux, while it runs fine on macOS:
terminate called after throwing an instance of 'std::system_error'
what(): Unknown error -1
After setting up my debugger and experimenting with it, I got this stack trace:
#1 0x00007f61c097b897 in abort () from /usr/lib/libc.so.6
#2 0x00007f61c0d1381d in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f61c0d204da in __cxxabiv1::__terminate (handler=<optimized out>) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x00007f61c0d20537 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00007f61c0d2078e in __cxxabiv1::__cxa_throw (obj=obj@entry=0x5568aec89df0, tinfo=tinfo@entry=0x7f61c0e5a750 <typeinfo for std::system_error>, dest=dest@entry=0x7f61c0d4cc60 <std::system_error::~system_error()>) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6 0x00007f61c0d167ff in std::__throw_system_error (__i=-1) at /build/gcc/src/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:89
#7 0x00007f61c14b5c63 in std::call_once<void (&)(google::protobuf::internal::DescriptorTable const*), google::protobuf::internal::DescriptorTable const*&> (__f=@0x7f61c14c0780: {void (const google::protobuf::internal::DescriptorTable *)} 0x7f61c14c0780 <google::protobuf::(anonymous namespace)::AssignDescriptorsImpl(google::protobuf::internal::DescriptorTable const*)>, __once=...) at /usr/include/c++/7/mutex:698
#8 google::protobuf::internal::AssignDescriptors (table=<optimized out>, table@entry=0x7f61c17dddc0 <descriptor_table_google_2fprotobuf_2fdescriptor_2eproto>) at google/protobuf/generated_message_reflection.cc:2407
#9 0x00007f61c148f440 in google::protobuf::FileDescriptorProto::GetMetadataStatic () at ./google/protobuf/descriptor.pb.h:623
#10 google::protobuf::FileDescriptorProto::GetMetadata (this=<optimized out>) at google/protobuf/descriptor.pb.cc:2281
#11 0x00005568acbb22ed in google::protobuf::Message::GetReflection (this=0x7ffd47e4efc0) at /home/leo/CLionProjects/protoc-gen-java-leo/protoc/include/google/protobuf/message.h:333
#12 0x00005568acbad61f in google::protobuf::compiler::java_leo::(anonymous namespace)::CollectExtensions (message=..., extensions=0x7ffd47e4eef0) at /home/leo/CLionProjects/protoc-gen-java-leo/src/google/protobuf/compiler/java_leo/java_file.cc:84
#13 0x00005568acbad8ca in google::protobuf::compiler::java_leo::(anonymous namespace)::CollectExtensions (file_proto=..., alternate_pool=..., extensions=0x7ffd47e4eef0, file_data="\n\021addressbook.proto\022\btutorial\032\037google/protobuf/timestamp.proto\032\roptions.proto\"\255\002\n\006Person\022\"\n\002id\030\001 \001(\tB\022\222\202\031\016java.util.UUIDR\002id\022\022\n\004name\030\002 \001(\tR\004name\022\020\n\003age\030\003 \001(\005R\003age\022\024\n\005email\030\004 \001(\tR\005email\022\064\n\006phones\030\005 \003(\v2\034.tutorial.Person.PhoneNumberR\006phones\022=\n\flast_updated\030\006 \001(\v2\032.google.protobuf.TimestampR\vlastUpdated\032N\n\vPhoneNumber\022\026\n\006number\030\001 \001(\tR\006number\022'\n\004type\030\002 \001(\016\062\023.tutorial.PhoneTypeR\004type\"7\n\vAddressBook\022(\n\006people\030\001 \003(\v2\020.tutorial.PersonR\006people*+\n\tPhoneType\022\n\n\006MOBILE\020\000\022\b\n\004HOME\020\001\022\b\n\004WORK\020\002B+\n\024com.example.tutorialB\021AddressBookProtosP\001b\006proto3") at /home/leo/CLionProjects/protoc-gen-java-leo/src/google/protobuf/compiler/java_leo/java_file.cc:122
#14 0x00005568acbaf23f in google::protobuf::compiler::java_leo::FileGenerator::GenerateDescriptorInitializationCodeForImmutable (this=0x5568aec7d590, printer=0x7ffd47e4f380) at /home/leo/CLionProjects/protoc-gen-java-leo/src/google/protobuf/compiler/java_leo/java_file.cc:439
#15 0x00005568acbaed2a in google::protobuf::compiler::java_leo::FileGenerator::Generate (this=0x5568aec7d590, printer=0x7ffd47e4f380) at /home/leo/CLionProjects/protoc-gen-java-leo/src/google/protobuf/compiler/java_leo/java_file.cc:351
#16 0x00005568acbb73ea in google::protobuf::compiler::java_leo::JavaGenerator::Generate (this=0x7ffd47e4f788, file=0x5568aec77500, parameter="", context=0x7ffd47e4f5f0, error=0x7ffd47e4f5d0) at /home/leo/CLionProjects/protoc-gen-java-leo/src/google/protobuf/compiler/java_leo/java_generator.cc:158
#17 0x00007f61c0f06fae in google::protobuf::compiler::CodeGenerator::GenerateAll (this=0x7ffd47e4f788, files=std::vector of length 2, capacity 2 = {...}, parameter="", generator_context=0x7ffd47e4f5f0, error=0x7ffd47e4f5d0) at google/protobuf/compiler/code_generator.cc:58
#18 0x00007f61c0f16733 in google::protobuf::compiler::GenerateCode (request=..., generator=..., response=response@entry=0x7ffd47e4f6a0, error_msg=error_msg@entry=0x7ffd47e4f680) at google/protobuf/compiler/plugin.cc:133
#19 0x00007f61c0f16b17 in google::protobuf::compiler::PluginMain (argc=<optimized out>, argv=0x7ffd47e4f8c8, generator=0x7ffd47e4f788) at google/protobuf/compiler/plugin.cc:169
#20 0x00005568acbe9ed5 in main (argc=1, argv=0x7ffd47e4f8c8) at /home/leo/CLionProjects/protoc-gen-java-leo/main.cpp:10
#21 0x00007f61c097d153 in __libc_start_main () from /usr/lib/libc.so.6
#22 0x00005568acb9598e in _start ()
Since I'm a total C++ noob, it took me around 8 hours of trying things out and googling before I found the reason, so I wanted to share it here for the next person who might run into this problem.
It turns out that the pthread library is required, and the program crashes when calling std::call_once if it is not linked. A better error message would have been great; "what(): Unknown error -1" didn't help much :D
All I did was to add this line in my CMakeLists.txt:
target_link_libraries(${CMAKE_PROJECT_NAME} pthread)
Now it runs like a charm :)
Hope this helps someone.
I am using C++ for a program that retrieves information about files. Among other things, I want to find the MIME type of a given file.
To do so, I use libmagic as follows:
#include <iostream>
#include <string>
#include <magic.h>
void foo (std::string path)
{
magic_t magic;
magic = magic_open (MAGIC_MIME_TYPE);
magic_load(magic, NULL);
magic_compile(magic, NULL);
std::string filetype (magic_file(magic, path.c_str()));
magic_close(magic);
std::cout << filetype << std::endl;
}
int main(int argc, char *argv[])
{
std::string str = "test.cxx";
foo (str);
return 0;
}
Trying on a computer running on Debian Jessie with gcc 4.9.2 and glibc 2.19, it works just fine.
However, on another computer running Arch Linux with gcc 5.1.0 and glibc 2.21, I get the following at runtime:
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
gdb gives me additional information:
Program received signal SIGABRT, Aborted.
0x00007ffff6fb1528 in raise () from /usr/lib/libc.so.6
#0 0x00007ffff6fb1528 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff6fb293a in abort () from /usr/lib/libc.so.6
#2 0x00007ffff78c9b3d in __gnu_cxx::__verbose_terminate_handler ()
at /build/gcc/src/gcc-5-20150519/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007ffff78c7996 in __cxxabiv1::__terminate (handler=<optimized out>)
at /build/gcc/src/gcc-5-20150519/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x00007ffff78c79e1 in std::terminate ()
at /build/gcc/src/gcc-5-20150519/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00007ffff78c7bf8 in __cxxabiv1::__cxa_throw (obj=0x613fb0,
tinfo=0x7ffff7baea78 <typeinfo for std::logic_error>,
dest=0x7ffff78dd040 <std::logic_error::~logic_error()>)
at /build/gcc/src/gcc-5-20150519/libstdc++-v3/libsupc++/eh_throw.cc:87
#6 0x00007ffff78f08bf in std::__throw_logic_error (
__s=__s@entry=0x7ffff7976100 "basic_string::_S_construct null not valid")
at /build/gcc/src/gcc-5-20150519/libstdc++-v3/src/c++11/functexcept.cc:74
#7 0x00007ffff790acef in std::string::_S_construct<char const*> (__beg=<optimized out>,
__end=<optimized out>, __a=...)
at /build/gcc/src/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:577
#8 0x00007ffff790b0e6 in _S_construct_aux<char const*> (__a=..., __end=<optimized out>,
__beg=0x0)
at /build/gcc/src/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:4136
#9 _S_construct<char const*> (__a=..., __end=<optimized out>, __beg=0x0)
at /build/gcc/src/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:4157
#10 std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (
this=0x7fffffffe980, __s=0x0, __a=...)
at /build/gcc/src/gcc-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:659
#11 0x0000000000400df3 in foo (path="test.cxx") at test.cxx:11
#12 0x0000000000400ece in main (argc=1, argv=0x7fffffffeae8) at test.cxx:21
So I'm not sure whether this is something I can fix in my code, or whether it is a bug in glibc or libmagic.
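Frame #10 shows __s=0x0, so the std::string constructor in foo received a null pointer. That is what magic_file returns on error, and libstdc++ rejects a null C string with exactly this logic_error. A minimal sketch of guarding the conversion, with a hypothetical helper name string_or:

```cpp
#include <string>

// Hypothetical helper: build a std::string from a C string that may be null,
// substituting a fallback instead of passing null to the constructor.
std::string string_or(const char *text, const std::string &fallback) {
    return text ? std::string(text) : fallback;
}
```

In foo this could be used as string_or(magic_file(magic, path.c_str()), "unknown"). When the result is the fallback, magic_error(magic) should describe what went wrong. Checking the return values of magic_open and magic_load in the same way would show whether the failure starts earlier, for example if the magic database could not be loaded on that machine.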
I'm having trouble using boost_thread with clang. The clang version is 3.6.0 and the Boost version is 1.55.0, both from the new Ubuntu 15.04. A program that used to work with previous versions of clang now segfaults at startup. There are no problems when I use g++ instead.
Here is an example program to illustrate the point.
#include <iostream>
#include <boost/thread.hpp>
using namespace std;
void output() {
try {
int x = 0;
for (;;) {
boost::this_thread::sleep(boost::posix_time::milliseconds(100));
cerr << x++ << endl;
}
} catch (boost::thread_interrupted&) {}
}
int main(int argc, char* argv[]) {
try {
boost::thread output_worker(output);
boost::this_thread::sleep(boost::posix_time::milliseconds(1000));
output_worker.interrupt();
output_worker.join();
} catch (...) {
cerr << "Unexpected error!" << endl;
exit(1);
}
}
If I compile it with g++ it works, i.e.
g++ thread.cpp -lboost_thread -lboost_system
If I compile it with clang
clang++ thread.cpp -lboost_thread -lboost_system
I get a segfault with the gdb trace below
Starting program: /home/dejan/test/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bd0580 in boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_alloc_>() ()
from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0
(gdb) bt
#0 0x00007ffff7bd0580 in boost::exception_ptr boost::exception_detail::get_static_exception_object<boost::exception_detail::bad_alloc_>() ()
from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0
#1 0x00007ffff7bcb16a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0
#2 0x00007ffff7de95ba in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdf98, env=env@entry=0x7fffffffdfa8)
at dl-init.c:72
#3 0x00007ffff7de96cb in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:30
#4 _dl_init (main_map=0x7ffff7ffe188, argc=1, argv=0x7fffffffdf98, env=0x7fffffffdfa8) at dl-init.c:120
#5 0x00007ffff7dd9d0a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6 0x0000000000000001 in ?? ()
#7 0x00007fffffffe2fe in ?? ()
#8 0x0000000000000000 in ?? ()
Am I doing something wrong?
Compiling with clang -std=c++11 makes Boost switch its internal implementation, which resolves the segmentation fault.
It is not an ideal solution, but it is the way I will be going with our code.
With different gcc optimization levels my program dies from different OS signals, and I wonder whether the cause is the same.
I was getting a core dump due to an abort() in a multithreaded C++ program compiled with -O2.
Program terminated with signal 6, Aborted.
#0 0x00007ff2572d28a5 in raise () from /lib64/libc.so.6
I was not able to find the cause, as the abort seems to happen in the destructor of a local std::vector, which made no sense to me.
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ff248d6c700 (LWP 16767))]#0 0x00007ff2572d28a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ff2572d28a5 in raise () from /lib64/libc.so.6
#1 0x00007ff2572d4085 in abort () from /lib64/libc.so.6
#2 0x00007ff25730fa37 in __libc_message () from /lib64/libc.so.6
#3 0x00007ff257315366 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ff257317e93 in _int_free () from /lib64/libc.so.6
#5 0x000000000044dd45 in deallocate (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/ext/new_allocator.h:95
#6 _M_deallocate (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:146
#7 ~_Vector_base (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:132
#8 ~vector (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:313
#9 ...
Studying the code more deeply, I realized that the vector was initialized from another vector coming from another thread and, here is the point, no mutex was used to do that. To simplify, I wrote this code that reproduces it (please ignore that stopThread is not protected):
void* doWork(void*)
{
while(!stopThread)
{
double min = std::numeric_limits<int>::max();
double max = std::numeric_limits<int>::min();
pthread_mutex_lock(&_mutex);
std::vector<double> localVector = (sharedVector);
sharedVector.clear();
pthread_mutex_unlock(&_mutex);
for(unsigned int index = 0; index < localVector.size(); ++index)
{
std::cout << "Thread 2 " << localVector[index] << ", " << std::endl;
if(min > localVector[index])
{
min = localVector[index];
}
if(max < localVector[index])
{
max = localVector[index];
}
}
}
return NULL;
}
int main()
{
pthread_mutex_init(&_mutex, NULL);
stopThread = false;
pthread_create(&_thread, NULL, doWork, NULL);
for(int i = 0; i < 10000; i++)
{
sharedVector.push_back(i);
std::cout << "Thread 1 " << i << std::endl;
usleep(5000);
}
stopThread = true;
pthread_join(_thread, NULL);
pthread_cancel(_thread);
std::cout << "Finished! " << std::endl;
}
I fixed that, but I cannot say I solved the problem (I know I fixed a problem, but not necessarily the one I was looking for), as the core dump happens about once a month.
So I decided to compile with -O0 to see if I could get more details from the core file, and then I forced the program to crash. Now I get a segfault where I expected it.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f4598f70cd7 in memmove () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f4598f70cd7 in memmove () from /lib64/libc.so.6
#1 0x000000000045fb84 in std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<double> (__first=0x7f4580977ba0, __last=0x7f4580977ba8, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:378
#2 0x0000000000465f01 in std::__copy_move_a<false, double const*, double*> (__first=0x7f4580977ba0, __last=0x7f4580977ba8, __result=0x0) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:397
#3 0x0000000000465e66 in std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:436
#4 0x0000000000465d6d in std::copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:468
#5 0x0000000000465c84 in std::__uninitialized_copy<true>::uninitialized_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001,
__result=0x0) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_uninitialized.h:93
#6 0x0000000000465ad9 in std::uninitialized_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_uninitialized.h:117
#7 0x0000000000465718 in std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, double> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_uninitialized.h:257
#8 0x00000000004650f9 in std::vector<double, std::allocator<double> >::vector (this=0x7f4594d90d70, __x=std::vector of length 1, capacity 4 = {...})
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:243
#9 ...
I looked for documentation but found nothing saying that the type of error can change with the optimization level.
However, when I run the code above, which reproduces the problem, a segmentation fault happens when compiling with -O0, but with -O2 it finishes fine.
Thanks for your time.
You're locking the mutex while the worker thread accesses the shared vector, but not when the main thread modifies it. You need to guard all accesses to shared mutable data.
for(int i = 0; i < 10000; i++)
{
pthread_mutex_lock(&_mutex); // Add this
sharedVector.push_back(i);
pthread_mutex_unlock(&_mutex); // Add this
std::cout << "Thread 1 " << i << std::endl;
usleep(5000);
}
You might also consider using a condition variable to notify the worker thread when the vector changes, so that the worker doesn't consume resources busy-waiting.
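A minimal sketch of the condition-variable idea, written with C++11 std::mutex and std::condition_variable instead of the raw pthread calls in the question. The class name SampleQueue and its member names are hypothetical. Every access to the shared vector goes through the same mutex, and the worker sleeps until there is data or a stop request.

```cpp
#include <condition_variable>
#include <mutex>
#include <vector>

// Hypothetical wrapper around the shared vector: all reads and writes
// happen while holding mutex_.
class SampleQueue {
public:
    // Producer side: append a value and wake one waiting consumer.
    void push(double value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            samples_.push_back(value);
        }
        ready_.notify_one();
    }

    // Consumer side: block until data is available or stop() was called,
    // then take everything collected so far (may be empty after stop()).
    std::vector<double> wait_and_take() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return stopped_ || !samples_.empty(); });
        std::vector<double> taken;
        taken.swap(samples_);
        return taken;
    }

    // Ask consumers to stop waiting; replaces the unprotected stopThread flag.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::vector<double> samples_;
    bool stopped_ = false;
};
```

The main thread would call push(i) in its loop and stop() at the end. The worker would loop on wait_and_take() and exit once it receives an empty batch after a stop request. Swapping the vector out under the lock keeps the critical section short, and the min/max scan then runs on a private copy.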