Goal: I want to modify internal information and access this information from many threads synchronously as fast as possible
I simplified code bellow, but this is how I tried to achieve this.
I have 2 shared pointers.
One is called m_mutable_data and the other is called m_const_data.
m_mutable_data is updated in strand guarded way. m_const_data is updated with contents of m_mutable_data every 60s also in the strand guarded way.
This is the only place m_const_data shared pointer is reset with new data. m_const_data is read synchronously by many threads, 1000+ times per second.
Code
class black_list_container : public std::enable_shared_from_this<black_list_container>
{
struct meta_data
{
bool blacked;
}
struct black_list_data
{
std::unordered_map<uint32_t,meta_data> data;
}
public:
#pragma optimize( "", off )
bool is_blacked(uint32_t id)
{
// This call is called from many different threads (1000+ calls per second)
// should be synchronous and as fast as possible
auto c = m_const_data;
return c->data[id].blacked;
}
#pragma optimize( "", on )
#pragma optimize( "", off )
void update_const_data()
{
// Called internaly by timer every 60s to update m_const_data with contents of m_mutable_data
// Guarded with strand
m_strand->post([self{shared_from_this()}]{
auto snapshot = new black_list_data();
snapshot->data = m_mutable_data->data;
m_const_data.reset(snapshot);
});
}
#pragma optimize( "", on )
private:
void internal_modification_mutable_data()
{
// Called internaly by different metrics
// Guarded with strand
m_strand->post([self{shared_from_this()}]{
// .... do some modification on internal m_mutable_data
});
}
boost::asio::io_context::strand m_strand;
std::shared_ptr<black_list_data> m_mutable_data;
std::shared_ptr<black_list_data> m_const_data;
};
Very, very seldom this code crashes in method 'is_blacked' on line
auto c = m_const_data;
This is the backtrace
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./STRATUM-01'.
Program terminated with signal 6, Aborted.
#0 0x00007fe09aaf1387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 libgcc-4.8.5-39.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64
(gdb) bt
#0 0x00007fe09aaf1387 in raise () from /lib64/libc.so.6
#1 0x00007fe09aaf2a78 in abort () from /lib64/libc.so.6
#2 0x00007fe09ab33ed7 in __libc_message () from /lib64/libc.so.6
#3 0x00007fe09ab3c299 in _int_free () from /lib64/libc.so.6
#4 0x00000000005fae36 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fe0440aeaa0) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/shared_ptr_base.h:154
#5 0x00000000006b9205 in ~__shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/shared_ptr_base.h:684
#6 ~__shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/shared_ptr_base.h:1123
#7 ~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/shared_ptr.h:93
#8 black_list_container_impl::is_blacked (this=0x7fe08c287e50, id=23654) at /var/lib/jenkins/workspace/validator/src/black_list_container.cpp:69
I'm not exactly sure why destruction of shared_ptr is called in frame #7
Obviously I did not achieve my goal so please direct me into pattern that actually achieves my goal in thread safe way.
I know I could have used
std::atomic<std::shared_ptr<black_list_data>> m_const_data;
but would not this affect performance while reading from many different threads?
I think I found the answer to my question in this article Atomic Smart Pointers.
So I have to change code in update_const_data() to
auto snapshot = std::make_shared<black_list_data>();
snapshot->data = m_mutable_data->data;
std::atomic_store(&m_const_data, snapshot);
and code in is_blacked() to
auto c = std::atomic_load(&m_const_data);
Related
Previously I had asked a question regarding how to terminate thread blocked for I/O. I have used pthread_kill() instead of pthread_cancel() or writing to pipes, considering few advantages. After implementing the code to send signal (SIGUSR2) to the target thread using pthread_kill(), initially it worked fine and didn't face any issues.
static void signalHandler(int signum) {
LogInfo("SIGNAL HANDLER : %d",signum); //print info message using logger utility
}
void setSignalHandler() {
struct sigaction actions;
memset(&actions, 0, sizeof(actions));
sigemptyset(&actions.sa_mask);
actions.sa_flags = 0;
actions.sa_handler = signalHandler;
int rc = sigaction(SIGUSR2,&actions,NULL);
if(rc < 0) {
LogError("Error in sigaction"); //print error using logger utility
}
}
void stopThread() {
fStop = 1; //set the flag to stop the thread
if( ftid ) { //ftid is pthread_t variable of thread that needs to be terminated.
LogInfo("Before pthread_kill()");
int kill_result = pthread_kill(ftid,SIGUSR2); // send signal to interrupt the thread if blocked for poll() I/O
LogInfo("pthread_kill() returned : %d ( %s )",kill_result,strerror(kill_result));
int result = pthread_join( ftid, NULL );
LogInfo("Event loop stopped for %s", fStreamName);
LogInfo( "pthread_join() -> %d ( %s )\n", result, strerror(result) );
if( result == 0 ) ftid = 0;
}
}
Recently I have noticed a deadlock issue where few threads (which tried to log statements) blocked with below stacktrace
#0 0x00007fc1b2dc565c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fc1b2d70f0c in _L_lock_2462 () from /lib64/libc.so.6
#2 0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
getCurrentTimestamp(char*) function is called from our logger module when LogInfo() or LogError() functions are invoked. This function internally calls gettimeofday() to print the current time in logs. So I'm suspecting LogInfo() function (which calls gettimeofday()) called inside signalHandler() might be causing the deadlock issue. But i'm not confident of this as this issue is not getting reproduced frequently and i did not get any information stating that gettimeofday() is not async signal safe.
gettimeofday() is thread safe and i'm considering it's not async signal safe as it was not listed under async signal safe functions.
1) Can I consider gettimeofday() is not async signal safe and it was the root cause of deadlock ?
2) Are there any known issues where use of pthread_kill() might cause deadlock ?
EDIT :
Below is the definition of getCurrentTimeStamp() function
char* Logger::getCurrentTimestamp ( char *tm_buff ) {
char curr_time[ 32 ] = { 0 };
time_t current = time( NULL );
struct timeval detail_time;
if ( tm_buff ) {
strftime( curr_time, sizeof(curr_time), "%Y-%m-%d-%H:%M:%S", localtime( ¤t ) );
gettimeofday( &detail_time, NULL );
sprintf( tm_buff, "%s:%03d", curr_time, (int) (detail_time.tv_usec / 1000) );
return (tm_buff);
}
return (NULL);
}
stack trace of the thread that was interrupted by signal.
#0 0x00007fc1b2dc565c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fc1b2d70f0c in _L_lock_2462 () from /lib64/libc.so.6
#2 0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
#4 0x000000000046e80f in Log::write(Logger*, LogType, std::string const&, char const*) ()
#5 0x000000000046df74 in Log::i(Logger*, char const*, std::string const&, std::string const&, char const*, ...) ()
#6 0x0000000000456b67 in ?? ()
#7 <signal handler called>
#8 0x00007fc1b2da8dc5 in _xstat () from /lib64/libc.so.6
#9 0x00007fc1b2d71056 in __tzfile_read () from /lib64/libc.so.6
#10 0x00007fc1b2d703b4 in tzset_internal () from /lib64/libc.so.6
#11 0x00007fc1b2d70d63 in __tz_convert () from /lib64/libc.so.6
#12 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
#13 0x000000000047030e in Logger::writeLog(int, std::string&, std::string&) ()
#14 0x0000000000470633 in Logger::Write(Logger*, int, std::string, std::string const&) ()
#15 0x000000000046eb02 in Log::write(Logger*, LogType, std::string const&, char const*) ()
#16 0x000000000046df74 in Log::i(Logger*, char const*, std::string const&, std::string const&, char const*, ...) ()
gettimeofday itself is not signal safe, as you already found out.
However, "POSIX.1-2008 marks gettimeofday() as obsolete, recommending the use of clock_gettime(2) instead."(source) and clock_gettime is signal-safe (and also thread-safe), so that might be an alternative for you.
gettimeofday is not defined as async-signal-safe, but if you pass 0 for the second (timezone) argument, it doesn't have to do anything that time and clock_gettime (both of which are officially async-signal-safe) don't also do, so as a matter of QoI it should be async-signal-safe in that case.
Your deadlock is inside the internal libc function __tz_convert.
#2 0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
It appears to have been called directly from Logger::getCurrentTimestamp, but that's because it was "tail called" from a documented API function. There are only four functions in GNU libc that call __tz_convert (I grepped the source code): localtime, localtime_r, gmtime, and gmtime_r. Therefore, your problem is not that you are calling gettimeofday, but that you are calling one of those functions.
localtime and gmtime are obviously not async-signal-safe since they write to a global variable. localtime_r and gmtime_r are not async-signal-safe either, because they have to look at the global database of timezone information (yes, even gmtime_r does this — it might be possible to change it not to need to do that, but it still wouldn't be a thing you could rely on cross-platform).
I don't think there's a good workaround. Formatted output from an async signal handler is going to trip over all kinds of other problems; my advice is to restructure your code so that you never need to call logging functions from async signal handlers.
I am working on a pthread code to do repeated matrix vector product. While doing so, I first wrote the serial matrix vector code for multiplication and then later I attempted to put the matrix vector product into separate threads.
The code https://github.com/viswans/parallel-computing-cs525/blob/pthread/pthread_page_rank/src/pthread/pagerankPthread.cpp does what I just described. Particularly when the number of threads is increased from 8 to 9, the binary results in a segmentation fault.
On debugging using gdb I noticed that there was a null pointer being dereferenced, and I added a watch point on that pointer to see if it is being set properly. What I noticed was that the argument to the function being called from pthread_create seems to be flushed and set to 0!
Old value = 37843
New value = 45242576
0x0000000000403436 in __gnu_cxx::new_allocator<(anonymous namespace)::ThreadStruct>::construct<(anonymous namespace)::ThreadStruct, (anonymous namespace)::ThreadStruct> (this=0x2b25970, __p=0x2b260e0) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.4/include/g++-v4/ext/new_allocator.h:120
120 { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
(gdb) c
Continuing.
[New Thread 0x7ffff2985700 (LWP 3390)]
[New Thread 0x7ffff2184700 (LWP 3391)]
[New Thread 0x7ffff1983700 (LWP 3392)]
[New Thread 0x7ffff1182700 (LWP 3393)]
Hardware watchpoint 3: *(0x2b260e8)
Old value = 45242576
New value = 0
0x00007ffff708eedb in __memset_sse2 () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff708eedb in __memset_sse2 () from /lib64/libc.so.6
#1 0x00007ffff7ded2e2 in allocate_dtv () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7ded9be in _dl_allocate_tls () from /lib64/ld-linux-x86-64.so.2
#3 0x00007ffff7bc9fc5 in pthread_create##GLIBC_2.2.5 () from /lib64/libpthread.so.0
#4 0x0000000000402b47 in PageRank::PageRankPthread::calculatePageRank (matrix=std::shared_ptr (count 1, weak 0) 0x2b258d0,�
input=std::vector of length 196591, capacity 196591 = {...}, num_threads=9, criterion=...) at src/pthread/pagerankPthread.cpp:84
#5 0x0000000000401d5d in mainPthread (argc=3, argv=0x7fffffffe6b8) at src/pthread/mainPthread.cpp:31
#6 0x000000000040be47 in main (argc=3, argv=0x7fffffffe6b8) at src/main.cpp:9
Any insight about why pthread_create would flush the arguments would be much appreciated.
Thanks
Sudharshan
You call push_back on the tstruct vector, which invalidates all pointers into that vector, causing the threads to access structures that have moved. One simple fix is to add tstruct.reserve(num_threads); after std::vector< ThreadStruct > tstruct;.
But you should really rethink this and do things in a more sensible way. Is a vector of structures a suitable collection to use when you need a pointer into the collection to remain valid as the collection is modified?
I'm communicating with a hardware device using QSerialPort. New data does not emit the "readyRead"-Signal, so I decided to write a read thread using QThread.
This is the code:
void ReadThread::run()
{
while(true){
readData();
if (buffer.size() > 0) parseData();
}
}
and
void ReadThread::readData()
{
buffer.append(device->readAll();
}
with buffer being an private QByteArray and device being a pointer to the QSerialPort. ParseData will parse the data and emit some signals. Buffer is cleared when parseData is left.
This works, however after some time (sometimes 10 seconds, sometimes 1 hour) the program crashes with SIGSEGV with the following trace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff3498700 (LWP 24870)]
malloc_consolidate (av=av#entry=0x7fffec000020) at malloc.c:4151
(gdb) bt
#0 malloc_consolidate (av=av#entry=0x7fffec000020) at malloc.c:4151
#1 0x00007ffff62c2ee8 in _int_malloc (av=av#entry=0x7fffec000020, bytes=bytes#entry=32769) at malloc.c:3423
#2 0x00007ffff62c4661 in _int_realloc (av=av#entry=0x7fffec000020, oldp=oldp#entry=0x7fffec0013b0, oldsize=oldsize#entry=64, nb=nb#entry=32784) at malloc.c:4286
#3 0x00007ffff62c57b9 in __GI___libc_realloc (oldmem=0x7fffec0013c0, bytes=32768) at malloc.c:3029
#4 0x00007ffff70d1cdd in QByteArray::reallocData(unsigned int, QFlags<QArrayData::AllocationOption>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5 0x00007ffff70d1f07 in QByteArray::resize(int) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#6 0x00007ffff799f9fc in free (bytes=<optimized out>, this=0x609458)
at ../../include/QtSerialPort/5.3.2/QtSerialPort/private/../../../../../src/serialport/qt4support/include/private/qringbuffer_p.h:140
#7 read (maxLength=<optimized out>, data=<optimized out>, this=0x609458)
at ../../include/QtSerialPort/5.3.2/QtSerialPort/private/../../../../../src/serialport/qt4support/include/private/qringbuffer_p.h:326
#8 QSerialPort::readData (this=<optimized out>, data=<optimized out>, maxSize=<optimized out>) at qserialport.cpp:1341
#9 0x00007ffff722bdf0 in QIODevice::read(char*, long long) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#10 0x00007ffff722cbaf in QIODevice::readAll() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#11 0x00007ffff7bd0741 in readThread::readData (this=0x6066c0) at ../reader.cpp:212
#12 0x00007ffff7bc80d0 in readThread::run (this=0x6066c0) at ../reader.cpp:16
#13 0x00007ffff70cdd2e in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007ffff6e1c0a4 in start_thread (arg=0x7ffff3498700) at pthread_create.c:309
#15 0x00007ffff632f04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
I'm not sure how to reproduce the problem correctly, since it appears randomly. If I comment out the "readData()" in my while loop, the crashes do not appear anymore (of course no data can be parsed, then).
Does anyone have a clue what this could be?
What is the buffer? Could it be, another thread is reading the data from the buffer and clears it afterwards?
Try to lock it (and all other data shared between threads) e.g. with a mutex
QMutex mx; // could be also member of the ReadThread class
void ReadThread::readData()
{
mx.lock();
buffer.append(device->readAll();
mx.unlock();
}
And do the same in the code which reads and clears the buffer from another thread (I'm not doing the assumption, that this is parseData())
Another possibility could be, parseData() calls some code running in GUI-Thread. This doesn't work in Qt4 and probably also in Qt5
You're using the instance of a QObject from multiple threads at once. This generally speaking leads to undefined behavior, as you've just seen. QSerialPort will work just fine on the GUI thread. Only once you get it to work there, you can move it to a worker thread.
Note that if the event loop (app.exec() call in main() or QThread::run()) isn't executing, the signals won't be happening. It looks as if you tried to write pseudo synchronous code and have (predictably) failed. Don't do that.
Something like this is supposed to work:
#include <QtCore>
#include <QtSerialPort>
int main(int argc, char ** argv) {
QCoreApplication app(argc, argv);
QSerialPort port;
port.setPortName(...);
port.setBaudRate(...);
... // etc
if (! port.open(QIODevice::ReadWrite)) {
qWarning() << "can't open the port";
return 1;
}
... // set the port
connect(&port, &QIODevice::readyRead, [&]{
qDebug() << "got" << port.readAll().size() << "bytes";
});
return app.exec(); // the signals will be emitted from here
}
Ensure that all serial port related objects are initialized and used only in the separate thread. Send received data or parsed events to the UI thread by using signal/slot-mechanism.
Note also that if you inherit QThread in readThread, the constructor may be executed in the UI thread and other functions in the readThread. In that case, start the readThread and run separate initialization function before other functions (for example, by sending proper signal from the UI thread).
I wrote a program which uses massive parallel execution. I am working with an Array of objects and an Array of mutexes for synchronization. My code Looks something like this:
std::vector<MyObject> objects;
std::vector<std::mutex> mutexes;
void work(int data)
{
for(unsigned int i = 0; i < objects.size(); ++i)
{
//Check if data Needs to be processed for objects[i]
if(dontNeedToProcess)continue;
mutexes[i].lock();
//Work with data for objects[i]
mutexes[i].unlock();
}
}
The function "work" is called by multiple threads with different data. After some hours (sometimes even days) the program is stuck. When I run it with gdb I can see that the program hangs while locking the mutex.
The Problem is now that I compiled the progrem with optimization (-O2) and the mutexes and "i" are optimized out.
Is it possible that the optimization causes this behavior when using an Array of mutexes?
Edit:
All the threads are at the same Position. The Backtrace Looks like the following:
#0 0xb7779d3c in __kernel_vsyscall ()
#1 0xb67ff672 in __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xb67fb1c2 in _L_lock_920 ()
from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xb67fb043 in __GI___pthread_mutex_lock (mutex=0x9de56530)
at ../nptl/pthread_mutex_lock.c:114
#4 0xb69043c4 in pthread_mutex_lock (mutex=0x9de56530) at forward.c:192
#5 0xb72754f3 in pthread_mutex_lock ()
from /usr/lib/i386-linux-gnu/libasan.so.1
#6 0x0809ac85 in __gthread_mutex_lock (__mutex=<optimized out>)
at /usr/include/i386-linux-gnu/c++/4.9/bits/gthr-default.h:748
#7 lock (this=<optimized out>) at /usr/include/c++/4.9/mutex:135
...
I am trying to use std::shared_ptr to point to the data being produced by one thread and consumed by another. The storage field is a shared pointer to the base class,
Here's the simplest Google Test I could create that reproduced the problem:
#include "gtest/gtest.h"
#include <thread>
struct A
{
virtual ~A() {}
virtual bool isSub() { return false; }
};
struct B : public A
{
bool isSub() override { return true; }
};
TEST (SharedPointerTests, threadedProducerConsumer)
{
int loopCount = 10000;
shared_ptr<A> ptr;
thread producer([loopCount,&ptr]()
{
for (int i = 0; i < loopCount; i++)
ptr = make_shared<B>(); // <--- THREAD
});
thread consumer([loopCount,&ptr]()
{
for (int i = 0; i < loopCount; i++)
shared_ptr<A> state = ptr; // <--- THREAD
});
producer.join();
consumer.join();
}
When run, sometimes gives:
[ RUN ] SharedPointerTests.threadedProducerConsumer
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
GDB shows the crash with two threads at the locations shown. The stacks follow:
Stack 1
#0 0x00000000006f430a in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fffe00008c0)
at /usr/include/c++/4.8/bits/shared_ptr_base.h:144
#1 0x00000000006f26a7 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffdf960bc8,
__in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:553
#2 0x00000000006f1692 in std::__shared_ptr<A, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffdf960bc0,
__in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:810
#3 0x00000000006f16ca in std::shared_ptr<A>::~shared_ptr (this=0x7fffdf960bc0, __in_chrg=<optimized out>)
at /usr/include/c++/4.8/bits/shared_ptr.h:93
#4 0x00000000006e7288 in SharedPointerTests_threadedProducerConsumer_Test::__lambda2::operator() (__closure=0xb9c940)
at /home/drew/dev/SharedPointerTests.hh:54
#5 0x00000000006f01ce in std::_Bind_simple<SharedPointerTests_threadedProducerConsumer_Test::TestBody()::__lambda2()>::_M_invoke<>(std::_Index_tuple<>) (this=0xb9c940) at /usr/include/c++/4.8/functional:1732
#6 0x00000000006efe13 in std::_Bind_simple<SharedPointerTests_threadedProducerConsumer_Test::TestBody()::__lambda2()>::operator()(void) (
this=0xb9c940) at /usr/include/c++/4.8/functional:1720
#7 0x00000000006efb7c in std::thread::_Impl<std::_Bind_simple<SharedPointerTests_threadedProducerConsumer_Test::TestBody()::__lambda2()> >::_M_run(void) (this=0xb9c928) at /usr/include/c++/4.8/thread:115
#8 0x00007ffff6d19ac0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007ffff717bf8e in start_thread (arg=0x7fffdf961700) at pthread_create.c:311
#10 0x00007ffff647ee1d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Stack 2
#0 0x0000000000700573 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<B, std::allocator<B>, (__gnu_cxx::_Lock_policy)2> > >::_S_destroy<std::_Sp_counted_ptr_inplace<B, std::allocator<B>, (__gnu_cxx::_Lock_policy)2> > (__a=..., __p=0x7fffe00008f0)
at /usr/include/c++/4.8/bits/alloc_traits.h:281
#1 0x00000000007003b6 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<B, std::allocator<B>, (__gnu_cxx::_Lock_policy)2> > >::destroy<std::_Sp_counted_ptr_inplace<B, std::allocator<B>, (__gnu_cxx::_Lock_policy)2> > (__a=..., __p=0x7fffe00008f0)
at /usr/include/c++/4.8/bits/alloc_traits.h:405
#2 0x00000000006ffe76 in std::_Sp_counted_ptr_inplace<B, std::allocator<B>, (__gnu_cxx::_Lock_policy)2>::_M_destroy (
this=0x7fffe00008f0) at /usr/include/c++/4.8/bits/shared_ptr_base.h:416
#3 0x00000000006f434c in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fffe00008f0)
at /usr/include/c++/4.8/bits/shared_ptr_base.h:161
#4 0x00000000006f26a7 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffe8161b68,
__in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:553
#5 0x00000000006f16b0 in std::__shared_ptr<A, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffe8161b60,
__in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:810
#6 0x00000000006f4c3f in std::__shared_ptr<A, (__gnu_cxx::_Lock_policy)2>::operator=<B>(std::__shared_ptr<B, (__gnu_cxx::_Lock_policy)2>&&) (this=0x7fffffffdcb0, __r=<unknown type in /home/drew/dev/unittests, CU 0x0, DIE 0x58b8c>)
at /usr/include/c++/4.8/bits/shared_ptr_base.h:897
#7 0x00000000006f2d2a in std::shared_ptr<A>::operator=<B>(std::shared_ptr<B>&&) (this=0x7fffffffdcb0,
__r=<unknown type in /home/drew/dev/unittests, CU 0x0, DIE 0x55e1c>)
at /usr/include/c++/4.8/bits/shared_ptr.h:299
#8 0x00000000006e7232 in SharedPointerTests_threadedProducerConsumer_Test::__lambda1::operator() (__closure=0xb9c7a0)
at /home/drew/dev/SharedPointerTests.hh:48
#9 0x00000000006f022c in std::_Bind_simple<SharedPointerTests_threadedProducerConsumer_Test::TestBody()::__lambda1()>::_M_invoke<>(std::_Index_tuple<>) (this=0xb9c7a0) at /usr/include/c++/4.8/functional:1732
#10 0x00000000006efe31 in std::_Bind_simple<SharedPointerTests_threadedProducerConsumer_Test::TestBody()::__lambda1()>::operator()(void) (
this=0xb9c7a0) at /usr/include/c++/4.8/functional:1720
#11 0x00000000006efb9a in std::thread::_Impl<std::_Bind_simple<SharedPointerTests_threadedProducerConsumer_Test::TestBody()::__lambda1()> >::_M_run(void) (this=0xb9c788) at /usr/include/c++/4.8/thread:115
#12 0x00007ffff6d19ac0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007ffff717bf8e in start_thread (arg=0x7fffe8162700) at pthread_create.c:311
#14 0x00007ffff647ee1d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
I have tried various approaches here, including using std::dynamic_pointer_cast but I haven't had any luck.
In reality the producer stores many different subclasses of A by their type_id in a std::map<type_id const*,std::shared_ptr<A>> (one instance per type) which I look up from the consumer by type.
My understanding is that std::shared_ptr is threadsafe for these types of operations. What am I missing?
shared_ptr has thread-safety on its control block. When a shared_ptr is created and points to a newly created resource it creates a control block. According to MSDN this holds:
The shared_ptr objects that own a resource share a control block. The control block holds:
the number of shared_ptr objects that own the resource,
the number of weak_ptr objects that point to the resource,
the deleter for that resource if it has one,
the custom allocator for the control block if it has one.
This means that shared_ptr will ensure that there are no synchronization issues with multiple copies of shared_ptr pointing to the same memory. However, it does not manage the synchronization of the memory itself. See the section on thread safety (emphasis mine)
Multiple threads can read and write different shared_ptr objects at the same time, even when the objects are copies that share ownership.
Your code shares ptr which means you have a data race. Also note that it is possible for your producer thread to produce several objects before the consumer thread is scheduled to run, meaning that you lose some objects.
As has been pointed out in a comment, you can use atomic operations on shared_ptr. The producer thread then looks like:
thread producer([loopCount,&ptr]()
{
for (int i = 0; i < loopCount; i++)
{
auto p = std::make_shared<B>(); // <--- THREAD
std::atomic_store<A>( &ptr, p );
}
});
The object is created and then atomically stored into ptr. The consumer then needs to atomically load the object.
thread consumer([loopCount,&ptr]()
{
for (int i = 0; i < loopCount; i++)
{
auto state = std::atomic_load<A>( &ptr ); // <--- THREAD
}
});
This still has the disadvantage that objects will be lost when the producer thread is allowed to run for multiple iterations.
These examples were written in Visual Studio 2012. At this time, gcc hasn't fully implemented atomic shared_ptr access, as noted in section 20.7.2.5