We have several locks (boost::mutex) in static classes, but when the program exits, pthread_mutex_destroy fails in the destructor of the mutex (there is an assertion checking this in boost).
As far as I know, pthread_mutex_destroy will only fail in two cases.
[EBUSY] The implementation has detected an attempt to destroy the object referenced by mutex while it is locked or referenced (for example, while being used in a pthread_cond_timedwait() or pthread_cond_wait()) by another thread.
[EINVAL] The value specified by mutex is invalid.
When I run in GDB and I print the lock I see that it is unlocked.
Unfortunately I'm having trouble printing errno in GDB.
#3 0x000000000044a2c6 in ~mutex (this=0x847840, __in_chrg=<value optimized out>) at /usr/include/boost/thread/pthread/mutex.hpp:47
47 BOOST_VERIFY(!pthread_mutex_destroy(&m));
(gdb) p m
$1 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 4294967294, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 12 times>, "\376\377\377\377", '\000' <repeats 23 times>, __align = 0}
Now that I am writing this post, the values of __nusers and __size look strange. This could hint at the lock being invalid, but I know that the lock was valid at some point (I wrap the boost::mutex in a Lock class, where I printed the value of this (0x847840) in the constructor, destructor, and lock/unlock functions).
Any help as to how to debug this would be greatly appreciated.
Edit
The Locks class inherits from boost::mutex and exposes a scoped lock (written from memory):
lock_type::scoped_lock getScopedLock() {
    return lock_type::scoped_lock( *this );
}
I've also tried to add the lock as a member, instead of inheriting from it, with no change in behavior.
I do not think that the getScopedLock function could introduce any problems (the scoped lock is returned by value, but a copy is not made because of RVO), but I thought it was worth mentioning.
It is used as follows (we are using C++0x):
auto lock = lock_.getScopedLock();
The complete stack trace:
(gdb) where
#0 0x00007ffff559da75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff55a15c0 in *__GI_abort () at abort.c:92
#2 0x00007ffff5596941 in *__GI___assert_fail (assertion=0x55851c "!pthread_mutex_destroy(&m)", file=<value optimized out>, line=47,
function=0x5595a0 "boost::mutex::~mutex()") at assert.c:81
#3 0x000000000044a2c6 in ~mutex (this=0x847840, __in_chrg=<value optimized out>) at /usr/include/boost/thread/pthread/mutex.hpp:47
#4 0x000000000044d923 in ~Lock (this=0x847840, __in_chrg=<value optimized out>) at include/Locks.h:43
#5 0x00007ffff55a3262 in __run_exit_handlers (status=0) at exit.c:78
#6 *__GI_exit (status=0) at exit.c:100
#7 0x00000000004ea9a6 in start () at src/main.cc:191
#8 0x00000000004de5aa in main (argc=1, argv=0x7fffffffe7b8) at src/main.cc:90
You typically get this error when you unlock your mutex without locking it first.
boost::mutex m;
m.unlock();
My guess is that somewhere you are using the lock and unlock members rather than RAII, and that you have lost a call to lock.
Note that most of the time you should not be calling the lock and unlock members. Use scoped_lock, which calls those functions for you.
struct s
{
    void foo()
    {
        boost::mutex::scoped_lock l(m_mutex);
        // do something
    }
private:
    boost::mutex m_mutex;
};
Also, you mention that you are inheriting from boost::mutex. This can cause problems because boost::mutex does not have a virtual destructor, so it's best not to do that.
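For example, a hypothetical sketch mirroring the question's Lock class shows where this bites:

#include <boost/thread/mutex.hpp>

struct Lock : boost::mutex {}; // derived class, as in the question

void risky()
{
    boost::mutex *m = new Lock;
    delete m; // undefined behavior: boost::mutex has no virtual destructor,
              // so ~Lock() is never reliably invoked through the base pointer
}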
OK, it turns out there were two problems.
There was one lock that never got used, but when stopping I called unlock on it anyway.
Obviously I didn't read the documentation correctly, as there is a precondition on unlock that the current thread must own the lock.
Thank you Tom for getting me to see this.
The second problem was that somewhere I have a scoped lock, and I want to unlock it before it goes out of scope:
auto lock = lock_.getScopedLock();
if( something )
lock.unlock();
Originally, this read lock_.unlock();, so I was unlocking the mutex directly rather than via the scoped lock.
#Tom, the reason I don't like writing boost::mutex::scoped_lock l(lock_) is that if you write boost::mutex::scoped_lock l() there will be no errors whatsoever (it is parsed as a function declaration, so nothing is ever locked).
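A minimal sketch of that trap (hypothetical names):

#include <boost/thread/mutex.hpp>

boost::mutex lock_;

void example() {
    boost::mutex::scoped_lock l1(lock_); // locks lock_ for this scope
    boost::mutex::scoped_lock l2();      // "most vexing parse": declares a
                                         // function returning scoped_lock;
                                         // compiles cleanly, locks nothing
}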
Now, the only danger I see is that someone writes lock_.getScopedLock() without storing the result in a variable. I guess when someone else starts touching the code we'd just define a macro for getting the scoped lock (yes, yes, we could do the same for the variant without getScopedLock ;)).
In any case, I'm not inheriting from boost::mutex anymore, but instead keeping it as a member. You are right that we should not risk inheriting from it.
#Daniel,
Linking with -lpthread did not help. I don't have time to look at that particular problem at the moment, as I don't need it, but thank you for your suggestion anyway.
#Sam,
I did run under Valgrind, but it showed no output relevant to the lock problem.
Related
The clang ThreadSanitizer reports a data race in the following code:
#include <future>
#include <iostream>
#include <thread>  // was missing: needed for std::thread
#include <vector>

int main() {
    std::cout << "start!" << std::endl;
    for (size_t i = 0; i < 100000; i++) {
        std::promise<void> p;
        std::future<void> f = p.get_future();
        std::thread t = std::thread([p = std::move(p)]() mutable {
            p.set_value();
        });
        f.get();
        t.join();
    }
    std::cout << "done!" << std::endl;
    return 0;
}
I can fix the race by replacing p = std::move(p) with &p. However, I couldn't find documentation explaining whether the promise and future objects are thread safe, or whether it matters in which order they are destroyed. My understanding was that since the promise and future communicate via a "shared state", the state should be thread-safe and the destruction order shouldn't matter, but TSan disagrees. (Without TSan, the program seems to behave correctly and doesn't crash.)
Does this code actually have a potential race, or is this a TSan false positive?
You can reproduce this with Clang 9 by running the following commands in an Ubuntu 19.10 Docker container:
$ docker run -it ubuntu:eoan /bin/bash
Inside container:
# apt update
# apt install clang-9 libc++-9-dev libc++abi-9-dev
# clang++-9 -fsanitize=thread -lpthread -std=c++17 -stdlib=libc++ -O0 -g test.cpp -o test
(See test.cpp file contents above)
# ./test
Example output showing a data race (actual output varies a bit between runs):
==================
WARNING: ThreadSanitizer: data race (pid=9731)
Write of size 8 at 0x7b2000000018 by thread T14:
#0 operator delete(void*) <null> (test+0x4b4e9e)
#1 std::__1::__shared_count::__release_shared() <null> (libc++.so.1+0x83f2c)
#2 std::__1::__tuple_leaf<1ul, test()::$_0, false>::~__tuple_leaf() /usr/lib/llvm-9/bin/../include/c++/v1/tuple:170:7 (test+0x4b7d38)
#3 std::__1::__tuple_impl<std::__1::__tuple_indices<0ul, 1ul>, std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>::~__tuple_impl() /usr/lib/llvm-9/bin/../include/c++/v1/tuple:361:37 (test+0x4b7ce9)
#4 std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>::~tuple() /usr/lib/llvm-9/bin/../include/c++/v1/tuple:466:28 (test+0x4b7c98)
#5 std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> >::operator()(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>*) const /usr/lib/llvm-9/bin/../include/c++/v1/memory:2338:5 (test+0x4b7c16)
#6 std::__1::unique_ptr<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>, std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> > >::reset(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>*) /usr/lib/llvm-9/bin/../include/c++/v1/memory:2593:7 (test+0x4b7b80)
#7 std::__1::unique_ptr<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0>, std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> > >::~unique_ptr() /usr/lib/llvm-9/bin/../include/c++/v1/memory:2547:19 (test+0x4b74ec)
#8 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, test()::$_0> >(void*) /usr/lib/llvm-9/bin/../include/c++/v1/thread:289:1 (test+0x4b7397)
Previous atomic read of size 1 at 0x7b2000000018 by main thread:
#0 pthread_cond_wait <null> (test+0x4268d8)
#1 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) <null> (libc++.so.1+0x422de)
#2 main /test/test.cpp:61:9 (test+0x4b713c)
Thread T14 (tid=18144, running) created by main thread at:
#0 pthread_create <null> (test+0x425c6b)
#1 std::__1::__libcpp_thread_create(unsigned long*, void* (*)(void*), void*) /usr/lib/llvm-9/bin/../include/c++/v1/__threading_support:336:10 (test+0x4b958c)
#2 std::__1::thread::thread<test()::$_0, void>(test()::$_0&&) /usr/lib/llvm-9/bin/../include/c++/v1/thread:303:16 (test+0x4b6fc4)
#3 test() /test/test.cpp:44:25 (test+0x4b6d96)
#4 main /test/test.cpp:61:9 (test+0x4b713c)
SUMMARY: ThreadSanitizer: data race (/test/test+0x4b4e9e) in operator delete(void*)
==================
When a promise goes out of scope*, the following happens:
- if the shared state is not ready, the promise stores an exception of type future_error with error condition broken_promise in the shared state, then makes the state ready;
- otherwise, the state is already ready.
Calling get() on the future can therefore only throw an exception if no value was ever set on the promise before it went out of scope.
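For illustration, a minimal sketch (not the question's code) of that one failure mode, where the promise dies before setting a value:

#include <future>
#include <iostream>

int main() {
    std::future<void> f;
    {
        std::promise<void> p;
        f = p.get_future();
        // p goes out of scope without set_value(): a broken_promise error
        // is stored in the shared state and the state is made ready
    }
    try {
        f.get();
    } catch (const std::future_error& e) {
        std::cout << e.code().message() << std::endl; // typically "broken promise"
    }
    return 0;
}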
Now, it's actually pretty hard to make a promise go out of scope before the shared state has a value: either the thread has exited via an exception anyway, or you have a logic error where not all branches call promise::set_value.
Your specific code does not appear to exhibit any symptoms like this. Moving a promise simply moves ownership of the shared state to the new promise.
As for race conditions, get_future is guaranteed not to have any data races with promise::set_value and its variations. future::get is also guaranteed to wait until the shared state is ready. When a promise goes out of scope, it "releases" its shared state after making it ready, which destroys the shared state only if it held the last reference to it. Since you hold another reference (via the future), you're safe.
Now, it's always possible that the implementation has data races (by accident), but per the standard the code you posted shouldn't have any.
*Refer to [futures.state]
I'm using a lock free stack (via tagged pointers) to manage a pool of small blocks of memory. The list nodes are created and destroyed in-place when the blocks are inserted into, and removed from, the pool.
This is a very simplified test program, which only pops from the stack. So, no ABA problem and no tagged pointers. It is sufficient to demonstrate the race I'm running into:
#include <atomic>
#include <list>
#include <new> // for placement new
#include <thread>
#include <type_traits>

struct Node {
    Node() = default;
    Node(Node *n) { next.store(n); }
    std::atomic<Node *> next;
};

using Memory = std::aligned_storage_t<sizeof(Node)>;

struct Stack {
    bool pop_and_use() {
        for (Node *current_head = head.load(); current_head;) {
            Node *next = current_head->next.load(); // READ RACE
            if (head.compare_exchange_weak(current_head, next, std::memory_order_seq_cst)) {
                current_head->~Node();
                Memory *mem = reinterpret_cast<Memory *>(current_head);
                new (mem) int{0}; // use memory with non-atomic write (WRITE RACE)
                return true;
            }
        }
        return false;
    }

    void populate(Memory *mem, int count) {
        for (int i = 0; i < count; ++i) {
            head = new (mem + i) Node(head.load());
        }
    }

    std::atomic<Node *> head{};
};

int main() {
    Memory storage[10000];
    Stack test_list;
    test_list.populate(storage, 10000);
    std::thread worker([&test_list]() {
        while (test_list.pop_and_use()) {}
    });
    while (test_list.pop_and_use()) {}
    worker.join();
    return 0;
}
Thread sanitizer reports the following error:
clang++-10 -fsanitize=thread tsan_test_2.cpp -o tsan_test_2 -O2 -g2 -Wall -Wextra && ./tsan_test_2
LLVMSymbolizer: error reading file: No such file or directory
==================
WARNING: ThreadSanitizer: data race (pid=35998)
Atomic read of size 8 at 0x7fff48bd57b0 by thread T1:
#0 __tsan_atomic64_load <null> (tsan_test_2+0x46d88e)
#1 std::__atomic_base<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/atomic_base.h:713:9 (tsan_test_2+0x4b3e6c)
#2 std::atomic<Node*>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/atomic:452:21 (tsan_test_2+0x4b3e6c)
#3 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:17:39 (tsan_test_2+0x4b3e6c)
#4 main::$_0::operator()() const /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:40:22 (tsan_test_2+0x4b3e6c)
#5 void std::__invoke_impl<void, main::$_0>(std::__invoke_other, main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:60:14 (tsan_test_2+0x4b3e6c)
#6 std::__invoke_result<main::$_0>::type std::__invoke<main::$_0>(main::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:95:14 (tsan_test_2+0x4b3e6c)
#7 decltype(std::__invoke(_S_declval<0ul>())) std::thread::_Invoker<std::tuple<main::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:244:13 (tsan_test_2+0x4b3e6c)
#8 std::thread::_Invoker<std::tuple<main::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:253:11 (tsan_test_2+0x4b3e6c)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196:13 (tsan_test_2+0x4b3e6c)
#10 <null> <null> (libstdc++.so.6+0xbd6de)
Previous write of size 4 at 0x7fff48bd57b0 by main thread:
#0 Stack::pop_and_use() /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:21:9 (tsan_test_2+0x4b3d5d)
#1 main /home/BOSDYN/akhripin/tmp/tsan_test_2.cpp:43:20 (tsan_test_2+0x4b3d5d)
Location is stack of main thread.
Location is global '??' at 0x7fff48bad000 ([stack]+0x0000000287b0)
Thread T1 (tid=36000, running) created by main thread at:
#0 pthread_create <null> (tsan_test_2+0x4246bb)
#1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xbd994)
#2 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 (libc.so.6+0x21b96)
SUMMARY: ThreadSanitizer: data race (/home/BOSDYN/akhripin/tmp/tsan_test_2+0x46d88e) in __tsan_atomic64_load
==================
ThreadSanitizer: reported 1 warnings
The problem arises when the two threads read the same value of current_head, but one of them completes the pop and overwrites the node before the other has a chance to read current_head->next.
This is similar to the problem discussed here: Why would 'deleting' nodes in this lock-free stack class would cause race condition?, except that the memory is not actually being deallocated.
I know that from the machine's perspective, this race is benign -- if the read race occurs, the compare-and-swap will not succeed -- but I think this is still getting into undefined behavior territory in C++.
Is there any way to write this code without getting a race condition?
Is there any way to annotate the code to make thread sanitizer ignore it? I experimented with __tsan_acquire and __tsan_release but could not find something that consistently worked.
Update: I'm pretty convinced that there is no way to perform the atomic read safely in standard C++ -- the object just doesn't exist any more. But -- can I go from relying on undefined behavior to relying on implementation-defined behavior? What's the best I could do, given typical architectures and toolchains (x86/ARM, gcc/clang)?
Update 2: One implementation-specific approach that seems to work is to replace the load with inline assembly:
inline Node *load_next_wrapper(Node *h) {
    Node *ret;
    asm volatile("movq (%1), %0" : "=r"(ret) : "r"(&h->next));
    return ret;
}
This is both architecture and compiler specific -- but I think this does replace "undefined" behavior with "implementation-defined" behavior.
Tagged pointers are fine if you simply want to reuse the same nodes in the data structure, i.e., you don't destroy them, but simply put them on a free-list so they can be reused the next time a push operation needs a new node. In this case tagged pointers are sufficient to prevent the ABA problem, but they are no solution to the *memory reclamation problem* that you face here.
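To make that concrete, here is a minimal sketch (my own illustration, not the question's code) of the reuse pattern: popped nodes are parked on a free-list instead of being destroyed, so every Node stays alive and the contended next.load() always targets a live atomic object (a benign CAS retry, not UB). Tagged pointers against ABA are omitted for brevity:

#include <atomic>

// Node as in the question's code.
struct Node {
    std::atomic<Node *> next{nullptr};
};

struct RecyclingStack {
    std::atomic<Node *> head{nullptr};
    std::atomic<Node *> free_head{nullptr};

    bool pop_and_recycle() {
        for (Node *current_head = head.load(); current_head;) {
            Node *next = current_head->next.load();
            if (head.compare_exchange_weak(current_head, next)) {
                // Park the node on the free-list instead of destroying it;
                // it remains a live Node, so concurrent atomic loads of its
                // next pointer are well-defined.
                Node *old_free = free_head.load();
                do {
                    current_head->next.store(old_free);
                } while (!free_head.compare_exchange_weak(old_free, current_head));
                return true;
            }
        }
        return false;
    }
};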
You write: "Another object of some type will be constructed in the same location. Eventually, it will be destroyed and the memory would return to the pool."
This is the real issue: you are destroying the object and reusing the memory for something else. As many others have already explained in the comments, this causes undefined behavior. I am not sure what you mean by "return to the pool" - return to the memory manager? Ignoring the UB for a moment: you are right that this race is usually benign (from the hardware perspective), but if you do release the memory at some point, you could actually run into a segmentation fault (e.g., in case the memory manager decides to return the memory to the OS).
How to avoid undefined behavior in this scenario
If you want to reuse the memory for something else, you have to use a memory reclamation scheme like lock-free reference counting, hazard pointers, epoch-based reclamation or DEBRA. These can ensure that an object is only destroyed once it is guaranteed that all references to it have been dropped, so it can no longer be accessed by any thread.
My xenium library provides C++ implementations of various reclamation schemes (including all those previously mentioned) that you could use in this situation.
#include <iostream>
#include <pthread.h>

using namespace std;

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void* Func(void *)
{
    pthread_mutex_lock(&mutex);
    cout << "First thread execution" << endl;
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main()
{
    pthread_t th1;
    pthread_create(&th1, NULL, Func, NULL);
    pthread_mutex_lock(&mutex);
    cout << "In main thread" << endl;
    pthread_mutex_unlock(&mutex);
    // pthread_join(th1, NULL); // Note this code is commented
    return 0;
}
I have executed the above program on Linux Fedora 22 (also on http://www.cpp.sh/) around 20 times, and in those 20 executions I have seen the following outputs:
Output1:
In main thread
First thread execution
Output2:
First thread execution
In main thread
Output3:
In main thread
Output4:
In main thread
First thread execution
First thread execution
Outputs 1 to 3 are expected, as the main thread does not wait for the child thread to exit; the execution order of the two threads (main and child) depends entirely on kernel thread scheduling.
But output 4 is strange: First thread execution gets printed twice!
Now, if I run the program after uncommenting 'pthread_join(th1, NULL)' or adding 'pthread_exit(NULL)', I never get the strange output (i.e., First thread execution is never printed twice), even if I run the code 10000 times.
My questions to experts are:
Without pthread_join/pthread_exit, what is happening behind the scenes such that First thread execution got printed twice?
The responsibility of pthread_join is to retrieve the exit code of a particular thread, and after a successful call to pthread_join the kernel frees that thread's resources. If I do not call pthread_join on a joinable thread it results in a resource leak, but why the strange behavior described above?
We might say this is undefined behavior, but it would be great if an expert could provide a technical explanation.
How can pthread_join/pthread_exit prevent the strange behavior described above? What hidden thing do they do so that the strange behavior doesn't appear?
Thanks in advance.
I've observed this kind of double printing in a similar situation. While your thread was waiting in the write system call doing its normal output, specifically, in this stack:
#0 0x00007ffff78f4640 in write () from /lib64/libc.so.6
#1 0x00007ffff788fb93 in _IO_file_write () from /lib64/libc.so.6
#2 0x00007ffff788fa72 in new_do_write () from /lib64/libc.so.6
#3 0x00007ffff7890e05 in _IO_do_write () from /lib64/libc.so.6
#4 0x00007ffff789114f in _IO_file_overflow () from /lib64/libc.so.6
the program was terminated normally, and normal termination caused the output subsystem to flush all buffers. The output buffer on stdout was not yet marked free (the write system call had not returned yet), so it was written out again:
#0 0x00007ffff78f4640 in write () from /lib64/libc.so.6
#1 0x00007ffff788fb93 in _IO_file_write () from /lib64/libc.so.6
#2 0x00007ffff788fa72 in new_do_write () from /lib64/libc.so.6
#3 0x00007ffff7890e05 in _IO_do_write () from /lib64/libc.so.6
#4 0x00007ffff7890140 in _IO_file_sync () from /lib64/libc.so.6
#5 0x00007ffff7891f56 in _IO_default_setbuf () from /lib64/libc.so.6
#6 0x00007ffff7890179 in _IO_file_setbuf () from /lib64/libc.so.6
#7 0x00007ffff7892703 in _IO_cleanup () from /lib64/libc.so.6
#8 0x00007ffff78512f8 in __run_exit_handlers () from /lib64/libc.so.
In any case, join your threads (had you used C++ std::thread, it would have reminded you to do that) or otherwise synchronize access to the output stream.
The main thread might end earlier than the spawned thread.
Ending the main thread implies ending the whole process, with all threads being brought down abruptly. This might invoke undefined behaviour, so anything can happen.
To get around this (both options are sketched below),
either join the spawned thread using pthread_join() from main(),
or end the main thread using pthread_exit(), which ends only the main thread and keeps the process from being ended.
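A minimal sketch of the corrected main(), reusing Func and mutex from the question's listing:

int main()
{
    pthread_t th1;
    pthread_create(&th1, NULL, Func, NULL);

    pthread_mutex_lock(&mutex);
    cout << "In main thread" << endl;
    pthread_mutex_unlock(&mutex);

    pthread_join(th1, NULL); // option 1: wait for the child thread, then
                             // let the process exit normally
    return 0;

    // option 2 (instead of the join + return above): end only the main
    // thread; the process keeps running until the child thread finishes.
    // pthread_exit(NULL);
}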
I'm attempting to access a shared std::queue using a std::mutex and a std::lock_guard. The mutex (pending_md_mtx_) is a member variable of another object (whose address is valid). My code seems to be segfault'ing on the construction of the lock_guard.
Any ideas? Should I be using a std::unique_lock (or some other object) instead? I'm running GCC 4.6 (--std=c++0x) under Ubuntu Linux. I can't post the entire class, but the only accesses to the mutex and queue are listed below.
template <typename ListenerT>
class Driver
{
public:
    template <typename... Args>
    Driver(Args&&... args) :
        listener_(std::forward<Args>(args)...) {}

    void enqueue_md(netw::Packet* packet)
    {
        std::lock_guard<std::mutex> lock(pending_md_mtx_);
        pending_md_.push(packet);
    }

    void process_md()
    {
        std::lock_guard<std::mutex> lock(pending_md_mtx_);
        while (pending_md_.size())
        {
            netw::Packet* pkt = pending_md_.front();
            pending_md_.pop();
            process_md(*pkt);
        }
    }

    //... Other code which I can't post...

private:
    ListenerT listener_;
    std::mutex pending_md_mtx_;
    std::queue<netw::Packet*> pending_md_;
};
GDB Stacktrace:
(gdb) bt
#0 __pthread_mutex_lock (mutex=0x2f20aa75e6f4000) at pthread_mutex_lock.c:50
#1 0x000000000041a2dc in __gthread_mutex_lock (__mutex=0xff282ceacb40) at /usr/include/c++/4.6/x86_64-linux-gnu/./bits/gthr-default.h:742
#2 lock (this=0xff282ceacb40) at /usr/include/c++/4.6/mutex:90
#3 lock_guard (__m=..., this=0x7f2874fc4db0) at /usr/include/c++/4.6/mutex:445
#4 driver::Driver<Listener, false>::enqueue_md (this=0xff282ceac8a0, packet=...) at exec/../../driver/Driver.hpp:95
I was getting a segfault on constructing the std::lock_guard; it turned out my code was using an uninitialized std::shared_ptr<my_object_with_mutex>. Using a properly constructed my_object_with_mutex resolved the problem.
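A minimal sketch of that failure mode (hypothetical type names, not the original code):

#include <memory>
#include <mutex>

struct MyObjectWithMutex { // hypothetical stand-in
    std::mutex mtx;
    void use() { std::lock_guard<std::mutex> lock(mtx); }
};

int main() {
    std::shared_ptr<MyObjectWithMutex> obj;      // default-constructed: null
    // obj->use();  // segfaults while constructing the lock_guard
    obj = std::make_shared<MyObjectWithMutex>();
    obj->use();     // fine once the object is properly constructed
    return 0;
}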
I recently encountered this problem. It was caused by a line of code producing a buffer overrun after acquiring the lock. It may seem odd for a line of code below the lock to cause a problem a few lines earlier, but I suppose the buffer overrun corrupted the mutex, which then caused a problem on a second call to the function.
The root cause of the issue in my case:
- An object A references an object B.
- On a call to B.func() I see a SegFault on the lock_guard.
- Object B was never set for object A (not initialized, a NULL pointer), leading to a SegFault when accessing a field (the mutex, in my case).
The error could be diagnosed in GDB by noticing this=0x0:
...
#4 0x000055e3a9e14a3c in B<C>::write (this=0x4e2280, msg=0x55e3aac03be0) at /proj/B.hpp:35
#5 0x000055e3a9e206e6 in A::write (this=0x0, msg=0x55e3aac03be0) at /proj/A.cpp:286
#6 0x000055e3a9e2069a in A::write (this=0x7f21eae64010, msg=0x55e3aac03be0) at /proj/A.cpp:277
...
In my case the root cause was the same (the object holding the mutex was uninitialized), but the reason was not.
The object that had the mutex had a function named reset. Guess what: shared_ptr also has a function named reset, and I called that instead!
Avoid using reset as a name, or double-check that you're not calling obj.reset() instead of obj->reset()!
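A sketch of the mix-up (Worker is a hypothetical stand-in for the real object):

#include <iostream>
#include <memory>
#include <mutex>

struct Worker {
    std::mutex mtx;
    void reset() { // the pointee's own reset()
        std::lock_guard<std::mutex> lock(mtx);
        std::cout << "Worker::reset()\n";
    }
};

int main() {
    auto w = std::make_shared<Worker>();
    w->reset(); // OK: calls Worker::reset()
    w.reset();  // resets the shared_ptr itself: w is now null
    // w->reset(); // would now segfault in the lock_guard, as described above
    return 0;
}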
I am struggling with calling a clutter function from an extra thread.
I use boost::thread for threading and the clutter library 1.0.
To be specific, the thread contains a looped function that emits boost::signals2::signal with parameters of x and y coordinates every once in a while.
That signal is connected to a function that hands those variables to clutter, i.e. x,y in
clutter_stage_get_actor_at_pos(CLUTTER_STAGE(actor),
CLUTTER_PICK_ALL, x, y);
And that is where I get a segfault.
Apparently clutter has some thread-handling routines. I tried calling
g_thread_init(NULL);
clutter_threads_init();
before starting clutter_main(). I also tried enclosing the clutter function in
clutter_threads_enter();
clutter_stage_get_actor_at_pos(CLUTTER_STAGE(actor),
CLUTTER_PICK_ALL, x, y);
clutter_threads_leave();
but that also does not do the trick.
Every hint is appreciated, thank you in advance!
Addendum
I just forged a minimal sample of what I am trying to do. I already 'protected' the clutter_main() routine as suggested. Some Clutter functions (e.g., setting the stage color or an actor's position) seem to work from the separate thread. Is there still something wrong with my code?
#include <clutter/clutter.h>
#include <boost/thread.hpp>

ClutterActor *stage;
ClutterActor *rect = NULL;

void receive_loop()
{
    while (1)
    {
        sleep(1);
        clutter_threads_enter();
        ClutterActor *clicked = clutter_stage_get_actor_at_pos(CLUTTER_STAGE(stage), CLUTTER_PICK_ALL, 300, 500);
        clutter_threads_leave();
    }
}

int main(int argc, char *argv[])
{
    clutter_init(&argc, &argv);
    g_thread_init(NULL);
    clutter_threads_init();

    stage = clutter_stage_get_default();
    clutter_actor_set_size(stage, 800, 600);

    rect = clutter_rectangle_new();
    clutter_actor_set_size(rect, 256, 128);
    clutter_actor_set_position(rect, 300, 500);
    clutter_group_add(CLUTTER_GROUP(stage), rect);

    clutter_actor_show(stage);

    boost::thread thread = boost::thread(&receive_loop);

    clutter_threads_enter();
    clutter_main();
    clutter_threads_leave();
    return 0;
}
Well, I think I found the answer:
Clutter Docs General
It says in section "threading model":
The only safe and portable way to use the Clutter API in a multi-threaded environment is to never access the API from a thread that did not call clutter_init() and clutter_main().
The common pattern for using threads with Clutter is to use worker threads to perform blocking operations and then install idle or timeout sources with the result when the thread finished.
Clutter provides thread-aware variants of g_idle_add() and g_timeout_add() that acquire the Clutter lock before invoking the provided callback: clutter_threads_add_idle() and clutter_threads_add_timeout().
So my correction to the minimal sample code would be to alter the receive_loop() to
void receive_loop()
{
    while (1)
    {
        sleep(1);
        static int pos[2]; // static so the array is still valid when the
                           // idle callback later runs in the main thread
        pos[0] = 400;
        pos[1] = 200;
        clutter_threads_add_idle_full(G_PRIORITY_HIGH_IDLE,
                                      get_actor,
                                      pos,
                                      NULL);
    }
}
and to add the get_actor function (as in the example code on the mentioned doc page):
static gboolean
get_actor(gpointer data)
{
    int *pos = (int *) data;
    ClutterActor *clicked = clutter_stage_get_actor_at_pos(CLUTTER_STAGE(stage), CLUTTER_PICK_ALL, pos[0], pos[1]);
    return FALSE; // one-shot: the idle source is removed after it runs
}
clutter_threads_add_idle_full takes care of acquiring the Clutter thread lock, etc.
I struggled with a very similar situation in the Python bindings for Clutter. I was never able to make the Clutter thread support work the way I wanted.
What finally did the trick was using an idle proc (gobject.idle_add in Python) to push the work I needed done into the main Clutter thread. That way only one thread makes Clutter calls and everything is fine.
I played with your code and it seems you are doing everything OK, though I'm no expert in Clutter. I also ran your program under GDB and some interesting things showed up:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb799db70 (LWP 3023)]
0x002d97c6 in glDisable () from /usr/lib/nvidia-current/libGL.so.1
(gdb) thread apply all bt
Thread 2 (Thread 0xb799db70 (LWP 3023)):
#0 0x002d97c6 in glDisable () from /usr/lib/nvidia-current/libGL.so.1
#1 0x001b3ec3 in cogl_disable_fog () from /usr/lib/libclutter-glx-1.0.so.0
#2 0x0018b00a in ?? () from /usr/lib/libclutter-glx-1.0.so.0
#3 0x0019dc82 in clutter_stage_get_actor_at_pos () from /usr/lib/libclutter-glx-1.0.so.0
#4 0x080498de in receive_loop () at seg.cpp:19
Apparently the crash happened in glDisable () from /usr/lib/nvidia-current/libGL.so.1. Note that I use NVIDIA's OpenGL driver on my GeForce 8600 GT.
Can you confirm that your application also crashes on computers with other video cards (not NVIDIA)? I doubt the crash is due to a bug in NVIDIA's OpenGL implementation.
For me it seems that *clutter_threads_enter/leave()* is not protecting *clutter_stage_get_actor_at_pos()*, since I tested *receive_loop()* being called as a callback:
g_signal_connect(stage, "button-press-event", G_CALLBACK(receive_loop), NULL);
so we know that your code itself seems to be OK.
I encourage you to send your question to Clutter discussion and help mailing list: clutter-app-devel-list
a mailing list for application developers using Clutter, its integration libraries or toolkits based on Clutter.
You can either use clutter_threads_add_idle to update the ClutterActor, or you need to fix clutter_threads_enter/leave to also switch the OpenGL context so that it can be used from inside a thread.
The crash (the same backtrace as shown above, ending in glDisable () from /usr/lib/nvidia-current/libGL.so.1) happens because the calling thread didn't acquire the OpenGL context.