std::lock_guard<std::mutex> segfaults on construction?

I'm attempting to access a shared std::queue using a std::mutex and a std::lock_guard. The mutex (pending_md_mtx_) is a member variable of another object (whose address is valid). My code seems to be segfaulting on the construction of the lock_guard.
Any ideas? Should I be using a std::unique_lock (or some other object) instead? I'm running GCC 4.6 (--std=c++0x) under Ubuntu Linux. I can't post the entire class, but the only accesses to the mutex and queue are listed below.
template <typename ListenerT>
class Driver
{
public:
    template <typename... Args>
    Driver(Args&&... args) :
        listener_(std::forward<Args>(args)...) {}
    void enqueue_md(netw::Packet* packet)
    {
        std::lock_guard<std::mutex> lock(pending_md_mtx_);
        pending_md_.push(packet);
    }
    void process_md()
    {
        std::lock_guard<std::mutex> lock(pending_md_mtx_);
        while (pending_md_.size())
        {
            netw::Packet* pkt=pending_md_.front();
            pending_md_.pop();
            process_md(*pkt);
        }
    }
    //... Other code which I can't post...
private:
    ListenerT listener_;
    std::mutex pending_md_mtx_;
    std::queue<netw::Packet*> pending_md_;
};
GDB Stacktrace:
(gdb) bt
#0 __pthread_mutex_lock (mutex=0x2f20aa75e6f4000) at pthread_mutex_lock.c:50
#1 0x000000000041a2dc in __gthread_mutex_lock (__mutex=0xff282ceacb40) at /usr/include/c++/4.6/x86_64-linux-gnu/./bits/gthr-default.h:742
#2 lock (this=0xff282ceacb40) at /usr/include/c++/4.6/mutex:90
#3 lock_guard (__m=..., this=0x7f2874fc4db0) at /usr/include/c++/4.6/mutex:445
#4 driver::Driver<Listener, false>::enqueue_md (this=0xff282ceac8a0, packet=...) at exec/../../driver/Driver.hpp:95

I was getting a segfault on constructing the std::lock_guard; it turned out my code was using an uninitialized std::shared_ptr<my_object_with_mutex>. Using a properly constructed my_object_with_mutex resolves the problem.
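A minimal sketch of guarding against that situation, assuming the object is reached through a std::shared_ptr. Worker and try_enqueue are hypothetical names, not taken from the question; the point is only that an empty shared_ptr is checked before any member (and therefore the mutex) is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>

// Hypothetical stand-in for the object that owns the mutex.
class Worker {
public:
    void enqueue(int value) {
        std::lock_guard<std::mutex> lock(mtx_);
        pending_.push(value);
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mtx_);
        return pending_.size();
    }
private:
    std::mutex mtx_;
    std::queue<int> pending_;
};

// Returns false instead of dereferencing an empty shared_ptr.
// Without the check, worker->enqueue() would run with an invalid `this`
// and typically crash inside the lock_guard constructor.
bool try_enqueue(const std::shared_ptr<Worker>& worker, int value) {
    if (!worker) return false;  // default-constructed shared_ptr holds nullptr
    worker->enqueue(value);
    return true;
}
```

In real code an assertion or an exception may be more appropriate than a silent false; the check itself is what matters.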

I recently encountered this problem. It was caused by a line of code that overran a buffer after acquiring the lock. It seems odd for a line of code below the lock to cause a problem a few lines earlier, but I suppose the buffer overrun corrupted memory in a way that only showed up on a second call to the function.

The root cause in my case:
An object A references an object B.
On a call to B.func(), I see a segfault in lock_guard.
Object B was never set for object A (not initialized, a NULL pointer), leading to a segfault on accessing a field (the mutex, in my case).
The error can be diagnosed in GDB by noticing this=0x0:
...
#4 0x000055e3a9e14a3c in B<C>::write (this=0x4e2280, msg=0x55e3aac03be0) at /proj/B.hpp:35
#5 0x000055e3a9e206e6 in A::write (this=0x0, msg=0x55e3aac03be0) at /proj/A.cpp:286
#6 0x000055e3a9e2069a in A::write (this=0x7f21eae64010, msg=0x55e3aac03be0) at /proj/A.cpp:277
...

In my case the root cause was the same (the object with the mutex was uninitialized), but the reason was different.
The object that had the mutex had a function named reset. Guess what, shared_ptr also has a function named reset, and I called that instead!
Avoid using reset as a name, or double-check that you're not calling obj.reset() when you mean obj->reset()!
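To make the difference concrete, here is a small sketch. Counter, reset_pointee and reset_pointer are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that, like the one described above, has its own reset().
struct Counter {
    int value = 5;
    void reset() { value = 0; }
};

// p->reset() calls Counter::reset() on the pointee; the pointer stays valid.
bool reset_pointee(std::shared_ptr<Counter>& p) {
    p->reset();
    return p != nullptr;
}

// p.reset() calls std::shared_ptr::reset(); the pointer is empty afterwards,
// so any later p->member access dereferences a null pointer.
bool reset_pointer(std::shared_ptr<Counter>& p) {
    p.reset();
    return p != nullptr;
}
```

Both calls compile without any diagnostic, which is why the mistake is easy to miss.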

Related

Caller failing to hold lock before calling function std::_Mutex_base::unlock

I want to create my own mutex class for usage with std::lock_guard etc. I have a simple implementation:
class A
{
    std::mutex m;
public:
    void lock() {
        m.lock();
    }
    void unlock() {
        m.unlock();
    }
};
but when compiling with MSVC, it gives a warning caller failing to hold lock before calling function std::_Mutex_base::unlock. Why is that?

I think the compiler's warning is fair: it is saying that the unlock function may attempt to unlock a mutex that is not locked. See https://www.cplusplus.com/reference/mutex/mutex/unlock/ :
If the mutex is not currently locked by the calling thread, it causes
undefined behavior.
Undefined behaviour is a very bad thing, so the warning is useful: if this circumstance occurred, you would be in trouble.
So I don't think there's much you can do about that using std::mutex. However, if you were using a POSIX mutex, there's an error code it would return, which I guess you could just ignore (https://linux.die.net/man/3/pthread_mutex_unlock), and then react to the other error codes.
EPERM
The current thread does not own the mutex.
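For what it's worth, a wrapper like the one in the question is normally never unlocked by hand. A sketch of the intended usage, assuming the class is only ever driven through std::lock_guard (guarded_increment and the added try_lock are illustrative additions, and this is not claimed to silence the MSVC analysis warning):

```cpp
#include <cassert>
#include <mutex>

// Same shape as the class in the question: lock()/unlock() make it usable
// with std::lock_guard. try_lock() is an extra added for this sketch.
class A {
    std::mutex m;
public:
    void lock() { m.lock(); }
    void unlock() { m.unlock(); }
    bool try_lock() { return m.try_lock(); }
};

// The guard calls lock() in its constructor and unlock() in its destructor,
// so every unlock is paired with a lock made by the same thread.
int guarded_increment(A& mutex, int& shared) {
    std::lock_guard<A> guard(mutex);
    return ++shared;
}
```

As long as callers go through a guard, the precondition on unlock() (the calling thread owns the mutex) holds by construction.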

How do I avoid or suppress the race in this lock free stack?

I'm using a lock free stack (via tagged pointers) to manage a pool of small blocks of memory. The list nodes are created and destroyed in-place when the blocks are inserted into, and removed from, the pool.
This is a very simplified test program, which only pops from the stack. So, no ABA problem and no tagged pointers. It is sufficient to demonstrate the race I'm running into:
#include <atomic>
#include <list>
#include <thread>
#include <type_traits>
struct Node {
Node() = default;
Node(Node *n) { next.store(n); }
std::atomic<Node *> next;
};
using Memory = std::aligned_storage_t<sizeof(Node)>;
struct Stack {
bool pop_and_use() {
for (Node *current_head = head.load(); current_head;) {
Node *next = current_head->next.load(); // READ RACE
if (head.compare_exchange_weak(current_head, next, std::memory_order_seq_cst)) {
current_head->~Node();
Memory *mem = reinterpret_cast<Memory *>(current_head);
new (mem) int{0}; // use memory with non-atomic write (WRITE RACE)
return true;
}
}
return false;
}
void populate(Memory *mem, int count) {
for (int i = 0; i < count; ++i) {
head = new (mem + i) Node(head.load());
}
}
std::atomic<Node *> head{};
};
int main() {
Memory storage[10000];
Stack test_list;
test_list.populate(storage, 10000);
std::thread worker([&test_list]() {
while (test_list.pop_and_use()) {
};
});
while (test_list.pop_and_use()) {};
worker.join();
return 0;
}
Thread sanitizer reports the following error:
clang++-10 -fsanitize=thread tsan_test_2.cpp -o tsan_test_2 -O2 -g2 -Wall -Wextra && ./tsan_test_2
LLVMSymbolizer: error reading file: No such file or directory
==================
WARNING: ThreadSanitizer: data race (pid=35998)
Atomic read of size 8 at 0x7fff48bd57b0 by thread T1:
#0 __tsan_atomic64_load <null> (tsan_test_2+0x46d88e)
#1 std::__atomic_base<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/atomic_base.h:713:9 (tsan_test_2+0x4b3e6c)
#2 std::atomic<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/atomic:452:21 (tsan_test_2+0x4b3e6c)
#3 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:17:39 (tsan_test_2+0x4b3e6c)
#4 main::$_0::operator()() const /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:40:22 (tsan_test_2+0x4b3e6c)
#5 void std::__invoke_impl<void, main::$_0>(std::__invoke_other, main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:60:14 (tsan_test_2+0x4b3e6c)
#6 std::__invoke_result<main::$_0>::type std::__invoke<main::$_0>(main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:95:14 (tsan_test_2+0x4b3e6c)
#7 decltype(std::__invoke(_S_declval<0ul>())) std::thread::_Invoker<std::tuple<main::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:244:13 (tsan_test_2+0x4b3e6c)
#8 std::thread::_Invoker<std::tuple<main::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:253:11 (tsan_test_2+0x4b3e6c)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196:13 (tsan_test_2+0x4b3e6c)
#10 <null> <null> (libstdc++.so.6+0xbd6de)
Previous write of size 4 at 0x7fff48bd57b0 by main thread:
#0 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:21:9 (tsan_test_2+0x4b3d5d)
#1 main /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:43:20 (tsan_test_2+0x4b3d5d)
Location is stack of main thread.
Location is global '??' at 0x7fff48bad000 ([stack]+0x0000000287b0)
Thread T1 (tid=36000, running) created by main thread at:
#0 pthread_create <null> (tsan_test_2+0x4246bb)
#1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xbd994)
#2 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 (libc.so.6+0x21b96)
SUMMARY: ThreadSanitizer: data race (/home/BOSDYN/akhripin/tmp/tsan_test_2+0x46d88e) in __tsan_atomic64_load
==================
ThreadSanitizer: reported 1 warnings
The problem arises when the two threads read the same value of current_head, but one of them completes the pop and overwrites the node before the other has a chance to read current_head->next.
This is similar to the problem discussed here: Why would 'deleting' nodes in this lock-free stack class would cause race condition? except the memory is not actually being deallocated.
I know that from the machine's perspective, this race is benign -- if the read race occurs, the compare-and-swap will not succeed -- but I think this is still getting into undefined behavior territory in C++.
Is there any way to write this code without getting a race condition?
Is there any way to annotate the code to make thread sanitizer ignore it? I experimented with __tsan_acquire and __tsan_release but could not find something that consistently worked.
Update I'm pretty convinced that there is no way to perform the atomic read safely in standard C++ -- the object just doesn't exist any more. But -- can I go from relying on undefined behavior to relying on implementation-defined behavior? What's the best I could do, given typical architectures and toolchains (x86/ARM, gcc/clang)?
Update 2 One implementation-specific approach that seems to work is to replace the load with inline assembly:
inline Node *load_next_wrapper(Node *h) {
Node *ret;
asm volatile("movq (%1), %0" : "=r"(ret) : "r"(&h->next));
return ret;
}
This is both architecture and compiler specific -- but I think this does replace "undefined" behavior with "implementation-defined" behavior.

Tagged pointers are fine if you simply want to reuse the same nodes in the data structure, i.e., you don't destroy a node, but simply put it on a free-list so it can be reused when you need a new node in the next push operation. In this case tagged pointers are sufficient to prevent the ABA problem, but they are no solution to the memory reclamation problem that you face here.
Another object of some type will be constructed in the same location. Eventually, it will be destroyed and the memory would return to the pool.
This is the real issue - you are destroying the object and reusing the memory for something else. As many others have already explained in the comments this causes undefined behavior. I am not sure what you mean by "return to the pool" - return to the memory manager? Ignoring the UB for a moment - you are right that this race is usually benign (from the hardware perspective), but if you do release the memory at some point, you could actually run into a segmentation fault (e.g. in case the memory manager decides to return the memory to the OS).
How to avoid undefined behavior in this scenario
If you want to reuse the memory for something else, you have to use a memory reclamation scheme like lock-free reference counting, hazard pointers, epoch based reclamation or DEBRA. These can ensure that an object is only destroyed once it is guaranteed that all references to it have been dropped, so it can no longer be accessed by any thread.
My xenium library provides C++ implementations of various reclamation schemes (including all those previously mentioned) that you could use in this situation.
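If a full reclamation scheme is more than you need, one simpler alternative (not one of the schemes named above, and only a sketch under the assumption of a fixed-size pool) is to keep the links in a side table that is never destroyed and hand out block indices instead of reusing the node's own memory. IndexStack and drain_concurrently are hypothetical names. Like the test program in the question, this only pops; adding push would bring back the ABA problem and would need tagged indices.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Lock-free stack of block indices. The link table next_ outlives every pop,
// so reading next_[i] is always a read of a live std::atomic<int>.
class IndexStack {
public:
    static constexpr int kEmpty = -1;

    explicit IndexStack(std::size_t count) : next_(count) {
        for (std::size_t i = 0; i < count; ++i) {
            next_[i].store(head_.load());
            head_.store(static_cast<int>(i));
        }
    }

    // Returns the index of a block now owned by the caller, or kEmpty.
    int pop() {
        int current = head_.load();
        while (current != kEmpty) {
            int next = next_[static_cast<std::size_t>(current)].load();
            // On failure, current is reloaded with the latest head.
            if (head_.compare_exchange_weak(current, next)) return current;
        }
        return kEmpty;
    }

private:
    std::vector<std::atomic<int>> next_;
    std::atomic<int> head_{kEmpty};
};

// Pops from two threads and returns how many blocks were handed out.
std::size_t drain_concurrently(std::size_t count) {
    IndexStack stack(count);
    std::vector<int> blocks(count, 0);  // stand-in for the pooled memory
    std::atomic<std::size_t> popped{0};
    auto worker = [&] {
        for (int i = stack.pop(); i != IndexStack::kEmpty; i = stack.pop()) {
            blocks[static_cast<std::size_t>(i)] = 1;  // plain write to an exclusively owned block
            ++popped;
        }
    };
    std::thread other(worker);
    worker();
    other.join();
    return popped.load();
}
```

The cost is one extra atomic per block, stored outside the block, in exchange for never reading memory whose object lifetime has ended.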

Destructor of STL string abort

I have a multi threaded program where I forgot to use a mutex. Once, the program aborted with the following stack trace:
T abort
T __libc_message
t malloc_printerr
T free
T operator delete(void*)
W std::basic_string<char, std::char_traits<char>, std::allocator<char>>::~basic_string()
I used the GNU C++ compiler 4.4.3 (on Ubuntu 10.04). Could this behaviour be caused by using the string as in the following example? The real code is much more complicated, but I want to know whether the following simple code could cause such an abort.
Thread which aborts when the destructor of the copy string is called:
void f()
{
std::string s = someglobalstring;
}
Thread which modifies string:
void g()
{
someglobalstring = newcontent;
}
Questions:
Are newer C++ implementations thread safe with reading and writing of std::string?
Is it expected that the destructor aborts here?

Strings are not thread safe. If you want to do this, use a std::mutex when accessing your string.
void g()
{
std::lock_guard<std::mutex> lock(m);
someglobalstring = newcontent;
}
Same for f, and define m (type std::mutex) with the string.
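Put together, a sketch of both functions could look like this. The names m, f, g and someglobalstring follow the snippets above; making f return the copy and passing newcontent as a parameter are assumptions made so the example is self-contained.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

std::mutex m;  // guards someglobalstring
std::string someglobalstring = "initial";

// Reader: the copy is made while the lock is held.
std::string f() {
    std::lock_guard<std::mutex> lock(m);
    return someglobalstring;
}

// Writer: the assignment happens under the same lock.
void g(const std::string& newcontent) {
    std::lock_guard<std::mutex> lock(m);
    someglobalstring = newcontent;
}
```

Every access to the string, including reads, has to take the same mutex; protecting only the writer is not enough.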

can I use a boost::shared_ptr when creating&accepting a socket in boost::asio async mode?

Sorry if I wasn't able to put a better title to my question.
I was debugging my program when I noticed something very interesting. The code is very straightforward. Please follow my comments inline:
//my session class
class Session
{
public:
/// Constructor.
Session(boost::asio::io_service &io_service)
: socket_(io_service)
{
}
boost::asio::ip::tcp::socket& socket()
{
return socket_;
}
void async_read(/*...*/);
void async_write(/*...*/);
//blah blah
private:
std::vector<char> inbound_data_;//<---note this variable, but don't mind it until i tell you
std::string outbound_data_;
boost::asio::ip::tcp::socket socket_;
};
typedef boost::shared_ptr<Session> session_ptr; //just for easy reading
//and this is my connection server class
class ConnectionServer {
public:
void ConnectionServer::CreatSocketAndAccept() {
session_ptr new_sess(new Session(io_service_));//<--I created a scope limited shared_ptr
Print()<< "new_sess.use_count()= " << new_sess.use_count() << std::endl;//prints 1
acceptor_.async_accept(new_sess->socket(),//<-used it for async connection acceptance
boost::bind(&ConnectionServer::handle_accept, this,
boost::asio::placeholders::error, new_sess));
Print()<< "new_sess.use_count()= " << new_sess.use_count() << std::endl;//prints 2
}//<-- Scope is ending. what happens to my new_sess? who keeps a copy of my session?
//and now the strangest thing:
void ConnectionServer::handle_accept(const boost::system::error_code& e, session_ptr sess) {
if (!e) {
Print()<< "sess.use_count()= " << sess.use_count() << std::endl;//prints 4 !!!! while I have never copied the session anywhere else in between
Print() << "Connection Accepted" << std::endl;
handleNewClient(sess);
}
else
{
std::cout << "Connection Refused" << std::endl;
}
CreatSocketAndAccept();
}
I don't know who (in boost::asio) copies my shared_ptr internally and when it is going to release them all.
In fact, I noticed this situation when:
My application runs to completion, and at the time when containers full of nested shared_ptr-managed objects are being cleaned up (automatically and not by me),
I get a seg fault after ~Session() is called, where the program is trying to deal with a std::vector<char> (this is the variable I told you to remember in the beginning).
I could see this through the Eclipse debugger.
I am not good at reading seg faults, but I guess the program is trying to clear a vector that doesn't exist.
Sorry for the long question
I value your time and appreciate your kind comments.
EDIT-1:
I just modified my application to use raw pointers for creating new Session(s) rather than shared_ptr. The seg fault is gone if I don't delete the Session. So at least I am sure the cause of the seg fault is in Session.
EDIT-2:
As I mentioned in my previous update, the problem occurs when I try to delete the session but every time the trace leading to the seg fault is different.
sometimes this:
Basic Debug [C/C++ Application]
SimMobility_Short [10350] [cores: 0]
Thread [1] 10350 [core: 0] (Suspended : Signal : SIGSEGV:Segmentation fault)
malloc_consolidate() at malloc.c:4,246 0x7ffff5870e20
malloc_consolidate() at malloc.c:4,215 0x7ffff5871b19
_int_free() at malloc.c:4,146 0x7ffff5871b19
__gnu_cxx::new_allocator<char>::deallocate() at new_allocator.h:100 0xa4ab4a
std::_Vector_base<char, std::allocator<char> >::_M_deallocate() at stl_vector.h:175 0xab9508
std::_Vector_base<char, std::allocator<char> >::~_Vector_base() at stl_vector.h:161 0xabf8c7
std::vector<char, std::allocator<char> >::~vector() at stl_vector.h:404 0xabeca4
sim_mob::Session::~Session() at Session.hpp:35 0xabea8d
safe_delete_item<sim_mob::Session>() at LangHelpers.hpp:136 0xabef31
sim_mob::ConnectionHandler::~ConnectionHandler() at ConnectionHandler.cpp:40 0xabd7e6
<...more frames...>
gdb
and sometimes this:
Basic Debug [C/C++ Application]
SimMobility_Short [10498] [cores: 1]
Thread [1] 10498 [core: 1] (Suspended : Signal : SIGSEGV:Segmentation fault)
_int_free() at malloc.c:4,076 0x7ffff5871674
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at 0x7ffff639d540
sim_mob::ConnectionHandler::~ConnectionHandler() at ConnectionHandler.cpp:30 0xabd806
boost::checked_delete<sim_mob::ConnectionHandler>() at checked_delete.hpp:34 0xadd482
boost::detail::sp_counted_impl_p<sim_mob::ConnectionHandler>::dispose() at sp_counted_impl.hpp:78 0xadd6a2
boost::detail::sp_counted_base::release() at sp_counted_base_gcc_x86.hpp:145 0x849d5e
boost::detail::shared_count::~shared_count() at shared_count.hpp:305 0x849dd7
boost::shared_ptr<sim_mob::ConnectionHandler>::~shared_ptr() at shared_ptr.hpp:164 0x84a668
sim_mob::ClientHandler::~ClientHandler() at ClientHandler.cpp:42 0xac726d
sim_mob::ClientHandler::~ClientHandler() at ClientHandler.cpp:45 0xac72da
<...more frames...>
gdb
Does it mean my memory is already corrupted? How can I do more checks? Thank you

This line is where the magic lives:
acceptor_.async_accept(new_sess->socket(),//<-used it for async connection acceptance
boost::bind(&ConnectionServer::handle_accept, this,
boost::asio::placeholders::error, new_sess));
The async_accept has an (optional) second parameter - a completion function which you are using here. You are using boost::bind to create a functor that matches the completion function declaration. You are passing a new_sess smart pointer to that handler (this is why the smart_pointer is not deleted when you leave the scope).
In other words: The async_accept function takes either a functor with no parameters or a functor that accepts an error. You may now create a class that overloads the operator() with that signature. Instead you use boost::bind. Boost::bind allows you to either provide the parameters when the (inner) function is called or when constructing the functor by calling boost::bind. You provided some parameters when calling boost::bind - the smart pointer to the session.
This is a common pattern with boost::asio. You pass your context to the asynchronous function. When this function detects an error all you need to do is to leave the function. The context then leaves the scope and will be deleted. When no error is detected you pass the context (via boost::bind) to the next async function and the context will be kept alive.

You should be able to use shared_ptr in that way, I use it in the same manner without issue.
Internally, asio keeps a copy of your shared_ptr (via boost::bind) until it calls handle_accept. This is what allows you to pass the shared_ptr to begin with. If you did not add it as one of the arguments, then the object would be cleaned up as soon as the shared_ptr went out of scope in the function where you created it.
I suspect that you have other undefined behavior that using a raw pointer does not uncover.
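The effect can be reproduced without asio. The following is only a sketch: it uses std::bind, std::function and std::shared_ptr as stand-ins for boost::bind, asio's stored handler and boost::shared_ptr, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>

struct Session {};  // hypothetical stand-in for the real session class

void handle_accept(std::shared_ptr<Session> sess) { (void)sess; }

// While a bound handler exists, it owns one copy of the shared_ptr.
long count_while_handler_pending() {
    auto sess = std::make_shared<Session>();
    std::function<void()> pending = std::bind(handle_accept, sess);
    return sess.use_count();  // local variable + copy stored in the handler
}

// Once the handler is destroyed, its copy is released again.
long count_after_handler_destroyed() {
    auto sess = std::make_shared<Session>();
    {
        std::function<void()> pending = std::bind(handle_accept, sess);
    }
    return sess.use_count();  // only the local variable is left
}
```

A library is free to copy the handler internally while the operation is in flight, which would be one explanation for counts higher than 2 inside the callback; those extra copies go away together with the handler copies that hold them.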

To (try to) answer your second question: It seems like you are issuing a double delete on the session. This is only possible if you create a second shared_ptr from a raw pointer. This is something you shouldn't do. Are you passing a raw pointer to the session to any function that in turn creates a shared_ptr from it?
You could try to let Session inherit from enable_shared_from_this. This will fix the problem as long as every place that needs a shared_ptr obtains it through shared_from_this(), so that all of them share the same reference counter. But you should not see this as a real fix. The real fix would be to eliminate the multiple shared_ptr instantiations.
Edit: Added another debug possibility
Something else you could try would be to set a breakpoint in the destructor of the session and see the backtrace of the first/second delete.

As covered in this answer, it is fine to use shared pointers with Boost.Asio's async_* functions.
Based on the call stacks and behavior, it looks as though at least one resource is being deleted twice. Is it possible that Session is being managed through both a raw pointer and a shared_ptr?
Managing with boost::shared_ptr:
void ConnectionServer::CreatSocketAndAccept() {
session_ptr new_sess(new Session(io_service_)); // shared pointer
...
}
Managing with raw-pointer:
sim_mob::Session::~Session()
safe_delete_item<sim_mob::Session>() // raw pointer
sim_mob::ConnectionHandler::~ConnectionHandler()
If ConnectionHandler was managing Session with boost::shared_ptr, then the call stack should show boost::shared_ptr<sim_mob::Session>::~shared_ptr(). Also, be careful not to create a shared_ptr from a raw pointer that is already being managed by a shared_ptr, as it will result in the shared_ptrs managing the resource as two distinct resources, resulting in a double deletion:
// p1 and p2 use the same reference count to manage the int.
boost::shared_ptr<int> p1(new int(42));
boost::shared_ptr<int> p2(p1); // good
// p3 uses a different reference count, causing int to be managed
// as if it was a different resource.
boost::shared_ptr<int> p3(p1.get()); // bad
As a side note, one common idiom is to have Session inherit from enable_shared_from_this. It allows for Session to remain alive throughout the duration of its asynchronous call chains by passing the shared pointer as a handle to the instance in place of this. For example, it would allow for Session to remain alive while an asynchronous read operation is outstanding, as long as the result of shared_from_this() is bound as the instance handle to the Session::async_read callback.
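A minimal sketch of that idiom, using the std:: equivalents rather than Boost (an assumption made to keep the example dependency-free; handle() and owners_after_handle() are hypothetical names):

```cpp
#include <cassert>
#include <memory>

class Session : public std::enable_shared_from_this<Session> {
public:
    // Returns a shared_ptr that shares the existing reference count.
    // Only valid if this object is already owned by a shared_ptr.
    std::shared_ptr<Session> handle() { return shared_from_this(); }
};

// Both pointers manage the same resource through one reference count.
long owners_after_handle() {
    auto first = std::make_shared<Session>();
    auto second = first->handle();
    return second.use_count();
}
```

Inside Session, handle() would be bound into callbacks in place of this, so the session cannot be destroyed while an operation that refers to it is still pending.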

boost::mutex, pthread_mutex_destroy failed - debug suggestions?

We have several locks (boost::mutex) in static classes, but when the program exits, pthread_mutex_destroy fails in the destructor of the mutex (there is an assertion checking this in boost).
As far as I know, pthread_mutex_destroy will only fail in two cases.
[EBUSY] The implementation has detected an attempt to destroy the object referenced by mutex while it is locked or referenced (for example, while being used in a pthread_cond_timedwait() or pthread_cond_wait()) by another thread.
[EINVAL] The value specified by mutex is invalid.
When I run in GDB and I print the lock I see that it is unlocked.
Unfortunately I'm having trouble printing errno in GDB.
#3 0x000000000044a2c6 in ~mutex (this=0x847840, __in_chrg=<value optimized out>) at /usr/include/boost/thread/pthread/mutex.hpp:47
47 BOOST_VERIFY(!pthread_mutex_destroy(&m));
(gdb) p m
$1 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 4294967294, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 12 times>"\376, \377\377\377", '\000' <repeats 23 times>, __align = 0}
Now that I am writing this post, the values of __nusers and __size look strange. This could hint at the lock being invalid, but I know that the lock was valid at some point (I wrap the boost::mutex in a Lock class, where I printed the value of this (0x847840) in the constructor, destructor and lock/unlock functions).
Any help as to how to debug this would be greatly appreciated.
Edit
The Locks class inherits from boost::mutex, and exports a scopedlock (from memory):
lock_type::scoped_lock getScopedLock() {
return lock_type::scoped_lock( *this );
}
I've also tried to add the lock as a member, instead of inheriting from it, with no change in behavior.
I do not think that the getScopedLock function could introduce any problems (the scoped lock is returned by value, but a copy is not made because of RVO), but thought it could be worth mentioning.
It is used as follows (we are using c++0x):
auto lock = lock_.getScopedLock();
The complete stack trace:
(gdb) where
#0 0x00007ffff559da75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff55a15c0 in *__GI_abort () at abort.c:92
#2 0x00007ffff5596941 in *__GI___assert_fail (assertion=0x55851c "!pthread_mutex_destroy(&m)", file=<value optimized out>, line=47,
function=0x5595a0 "boost::mutex::~mutex()") at assert.c:81
#3 0x000000000044a2c6 in ~mutex (this=0x847840, __in_chrg=<value optimized out>) at /usr/include/boost/thread/pthread/mutex.hpp:47
#4 0x000000000044d923 in ~Lock (this=0x847840, __in_chrg=<value optimized out>) at include/Locks.h:43
#5 0x00007ffff55a3262 in __run_exit_handlers (status=0) at exit.c:78
#6 *__GI_exit (status=0) at exit.c:100
#7 0x00000000004ea9a6 in start () at src/main.cc:191
#8 0x00000000004de5aa in main (argc=1, argv=0x7fffffffe7b8) at src/main.cc:90

You typically get this error when you unlock your mutex without locking it first.
boost::mutex m;
m.unlock();
My guess is that somewhere you are using lock and unlock members rather than RAII,
and that you have lost a call to lock.
Note that most of the time you should not be calling the lock and unlock members. Use the scoped_lock which calls the functions for you.
struct s
{
    void foo()
    {
        boost::mutex::scoped_lock l(m_mutex);
        //do something
    }
private:
    boost::mutex m_mutex;
};
Also, you mention that you are inheriting from boost::mutex. This can cause problems because boost::mutex does not have a virtual destructor, so it's best not to do that.

Ok turns out there were two problems.
There was one lock, which never got used but when stopping I did call unlock.
Obviously I didn't read the documentation correctly, as there is a precondition on unlock that the current thread must own the lock.
Thank you Tom for getting me to see this.
The second problem was that somewhere I have a scoped lock, and I want to unlock it before it goes out of scope:
auto lock = lock_.getScopedLock();
if( something )
lock.unlock();
Originally, this read lock_.unlock();, so I was unlocking the mutex, not via the scoped lock.
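The same pattern can be sketched with the standard library, assuming std::unique_lock as the counterpart of the Boost scoped lock (process and this file-level lock_ are hypothetical):

```cpp
#include <cassert>
#include <mutex>

std::mutex lock_;  // hypothetical shared mutex

// Returns whether the guard still owns the mutex just before it is destroyed.
bool process(bool release_early) {
    std::unique_lock<std::mutex> lock(lock_);
    if (release_early)
        lock.unlock();        // the guard records that it gave up ownership
    return lock.owns_lock();  // the destructor unlocks only if this is true
}
```

Unlocking through the guard keeps its bookkeeping correct. Calling lock_.unlock() directly would leave the guard believing it still owns the mutex, and its destructor would unlock a second time.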
@Tom, the reason I don't like writing boost::mutex::scoped_lock l(lock_) is that if you write boost::mutex::scoped_lock l() there will be no errors whatsoever.
Now, the only danger I see is that someone writes lock_.getScopedLock() without storing it in a variable, I guess when someone else starts touching the code we'd just define a macro for getting the scoped lock (yes yes, we could do the same for the variant without getScopedLock ;)).
In any case, I'm not inheriting from boost::mutex anymore, but instead keeping it as a member. You are right that we should not risk inheriting from it.
@Daniel,
Compiling with -lpthread did not help, I don't have time to look at that particular problem at the moment, as I don't need it, but thank you for your suggestion anyway.
@Sam,
I did run in valgrind, but it showed no interesting output to the lock problem.
