multi-producers/consumers performance - c++

I've written a SharedQueue class that is intended to work with several producers and consumers.
class SharedQueue : public boost::noncopyable
{
public:
SharedQueue(size_t size) : m_size(size){};
~SharedQueue(){};
int count() const {return m_container.size();};
void enqueue(int item);
bool enqueue(int item, int millisecondsTimeout);
int dequeue();
private:
const size_t m_size;
boost::mutex m_mutex;
boost::condition_variable m_buffEmpty;
boost::condition_variable m_buffFull;
std::queue<int> m_container;
};
void SharedQueue::enqueue(int item)
{
{
boost::mutex::scoped_lock lock(m_mutex);
while(!(m_container.size() < m_size))
{
std::cout << "Queue is full" << std::endl;
m_buffFull.wait(lock);
}
m_container.push(item);
}
m_buffEmpty.notify_one();
}
int SharedQueue::dequeue()
{
int tmp = 0;
{
boost::mutex::scoped_lock lock(m_mutex);
while(m_container.size() == 0) // loop, not if: guards against spurious wakeups
{
std::cout << "Queue is empty" << std::endl;
m_buffEmpty.wait(lock);
}
tmp = m_container.front();
m_container.pop();
}
m_buffFull.notify_one();
return tmp;
}
SharedQueue Sq(1000);
void producer()
{
int i = 0;
while(true)
{
Sq.enqueue(++i);
}
}
void consumer()
{
while(true)
{
std::cout << "Popping: " << Sq.dequeue() << std::endl;
}
}
int main()
{
boost::thread Producer(producer);
boost::thread Producer1(producer);
boost::thread Producer2(producer);
boost::thread Producer3(producer);
boost::thread Producer4(producer);
boost::thread Consumer(consumer);
Producer.join();
Producer1.join();
Producer2.join();
Producer3.join();
Producer4.join();
Consumer.join();
return 0;
}
As you can see, I use boost::condition_variable. Is there any way to improve the performance? Should I perhaps consider another synchronization method?
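The timed enqueue overload is declared in the class above, but its definition is not shown. For comparison only, here is a minimal sketch of how such a queue might look using the C++11 standard library equivalents of the Boost types. The class name BoundedQueue and its member names are assumptions made for illustration, not part of the original code.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Hypothetical std-only counterpart of SharedQueue (all names are assumptions).
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Waits up to millisecondsTimeout for free space.
    // Returns true if the item was queued, false if the queue stayed full.
    bool enqueue(int item, int millisecondsTimeout) {
        {
            std::unique_lock<std::mutex> lock(mutex_);
            // wait_for with a predicate re-checks the condition after spurious wakeups.
            const bool hasRoom = notFull_.wait_for(
                lock, std::chrono::milliseconds(millisecondsTimeout),
                [this] { return items_.size() < capacity_; });
            if (!hasRoom) {
                return false;
            }
            items_.push(item);
        }
        notEmpty_.notify_one(); // notify outside the lock, as in the original
        return true;
    }

    // Blocks until an item is available, then removes and returns it.
    int dequeue() {
        int item = 0;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            notEmpty_.wait(lock, [this] { return !items_.empty(); });
            item = items_.front();
            items_.pop();
        }
        notFull_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::condition_variable notFull_;
    std::queue<int> items_;
};
```

With a capacity of 1, a second timed enqueue returns false once the timeout expires, and succeeds again after a dequeue has made room.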

In real-life scenarios, as opposed to synthetic tests, I think your implementation is good enough.
If, however, you're expecting 10^6 or more operations per second, and you're developing for Windows, then your solution is not that good.
On Windows, Boost has traditionally performed really badly in its multithreading classes.
For mutexes, CriticalSection objects are usually much faster. For condition variables, the Boost authors are reinventing the wheel instead of using the correct Win32 API.
On Windows, I expect the native multi-producer/multi-consumer queue object called the "I/O completion port" to be several times more efficient than any possible user-mode implementation. Its main purpose is I/O, but it's perfectly OK to call the PostQueuedCompletionStatus API to post anything you want to the queue. The only drawback is that the queue has no upper limit, so you must limit the queue size yourself.

This is not a direct answer to your question, but it might be a good alternative.
Depending on how much you want to increase the performance, it may be worthwhile to take a look at the Disruptor Pattern: http://www.2robots.com/2011/08/13/a-c-disruptor/

Related

How to safely and properly use threads in C++?

I have a logging system for my application. This is what I do now:
static void Background() {
while(IsAlive){
while(!logs.empty()){
ShowLog(logs.front());
logs.pop();
}
while(logs.empty()){
Sleep(200);
}
}
}
static void Init(){
// Some Checks
Logger::backgroundThread = std::thread(Background);
backgroundThread.detach();
}
static void Log(std::string log){
logs.push(log);
}
static void ShowLog(std::string log){
// Actual implementation is bit complex but that does not involve threads so i guess it is irrelevant for this question
std::cout << log << std::endl;
}
Here, logs is a std::queue<std::string>.
I am not sure whether this is a good approach.
Is there a better way to achieve this?
Note: I am using C++17.
namespace { // Anonymous namespace instead of static functions.
std::mutex log_mutex;
void Background() {
while(IsAlive){
std::queue<std::string> log_records;
{
// Exchange data for minimizing lock time.
std::unique_lock lock(log_mutex);
logs.swap(log_records);
}
if (log_records.empty()) {
Sleep(200);
continue;
}
while(!log_records.empty()){
ShowLog(log_records.front());
log_records.pop();
}
}
}
void Log(std::string log){
std::unique_lock lock(log_mutex);
logs.push(std::move(log));
}
}
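The version above still polls with Sleep(200). As a possible variation, the sketch below replaces the sleep with a std::condition_variable, so the background thread wakes only when a record arrives or shutdown is requested. The AsyncLogger class, its member names, and the sink callback are assumptions introduced for illustration; the sink stands in for ShowLog.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical logger; the names are assumptions, not from the question.
class AsyncLogger {
public:
    explicit AsyncLogger(std::function<void(const std::string&)> sink)
        : sink_(std::move(sink)), worker_([this] { run(); }) {
        assert(sink_); // a callable sink is required
    }

    // Requests shutdown and joins, so pending records are written first.
    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            alive_ = false;
        }
        wake_.notify_one();
        worker_.join();
    }

    void log(std::string record) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(record));
        }
        wake_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (alive_ || !pending_.empty()) {
            // Sleeps without polling until there is work or shutdown is requested.
            wake_.wait(lock, [this] { return !alive_ || !pending_.empty(); });
            std::queue<std::string> batch;
            batch.swap(pending_); // take everything, then release the lock
            lock.unlock();
            while (!batch.empty()) {
                sink_(batch.front());
                batch.pop();
            }
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::string> pending_;
    bool alive_ = true;
    std::function<void(const std::string&)> sink_;
    std::thread worker_; // declared last so the other members exist before it starts
};
```

Joining in the destructor, rather than detaching the thread, means records queued just before shutdown are not lost.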

Is there something wrong with this rwLock implementation?

My program is deadlocking, and I have no idea why, since it doesn't happen when I run it in a debugger. My first suspect is my RwLock. I wrote my own version because I only wanted to use standard libraries (I don't think a reader-writer lock is included until C++17), and this isn't the sort of thing I normally do.
class RwLock
{
std::mutex mutex;
std::unique_lock<std::mutex> unique_lock;
std::condition_variable condition;
int reading_threads;
bool writing_threads;
public:
RwLock();
~RwLock();
void read_lock();
void read_unlock();
void write_lock();
void write_unlock();
};
RwLock::RwLock() :
mutex(),
unique_lock(mutex, std::defer_lock),
condition(),
reading_threads(0),
writing_threads(false)
{
}
RwLock::~RwLock()
{
//TODO: find something smarter to do here.
write_lock();
}
void RwLock::read_lock()
{
unique_lock.lock();
while(writing_threads)
{
condition.wait(unique_lock);
}
++reading_threads;
unique_lock.unlock();
}
void RwLock::read_unlock()
{
unique_lock.lock();
if(--reading_threads == 0)
{
condition.notify_all();
}
unique_lock.unlock();
}
void RwLock::write_lock()
{
unique_lock.lock();
while(writing_threads)
{
condition.wait(unique_lock);
}
writing_threads = 1;
while(reading_threads)
{
condition.notify_all();
}
unique_lock.unlock();
}
void RwLock::write_unlock()
{
unique_lock.lock();
writing_threads = 0;
condition.notify_all();
unique_lock.unlock();
}
std::shared_timed_mutex predates C++17: it was added in C++14.
Use it instead; it will almost certainly have fewer bugs and be faster.
C++17 introduces shared_mutex, which can be even faster. But I strongly doubt you can implement a faster shared rwlock than shared_timed_mutex using C++ standard primitives.
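As a rough illustration of that suggestion, the sketch below guards a map with std::shared_timed_mutex, taking a std::shared_lock for readers and a std::unique_lock for writers. The Registry class and its members are hypothetical names chosen for the example. With a C++17 compiler, std::shared_mutex should be usable in the same way.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical example class guarded by the standard reader-writer lock.
class Registry {
public:
    void set(const std::string& key, int value) {
        // Exclusive ownership: only one writer, and no readers, at a time.
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        values_[key] = value;
    }

    // Returns the stored value, or fallback if the key is absent.
    int get(const std::string& key, int fallback) const {
        // Shared ownership: any number of readers may hold this concurrently.
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        const auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }

private:
    mutable std::shared_timed_mutex mutex_;
    std::map<std::string, int> values_;
};

// Usage sketch: returns true if a written value can be read back.
bool registry_roundtrip() {
    Registry r;
    r.set("answer", 42);
    return r.get("answer", -1) == 42 && r.get("missing", -1) == -1;
}
```

Both lock types release in their destructors, so there is no read_unlock or write_unlock call to forget.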
Looks good except for two issues in this code:
void RwLock::write_lock()
{
unique_lock.lock();
while(writing_threads)
{
condition.wait(unique_lock);
}
writing_threads = 1;
while(reading_threads)
{
condition.notify_all();
}
unique_lock.unlock();
}
First, you increment writing_threads too late. A reader could sneak in. It's possible that you don't mind or even want this, but typically this is undesired.
Second, your notify in the last while loop should be a wait. Putting it together, we get:
void RwLock::write_lock()
{
unique_lock.lock();
++writing_threads;
while((writing_threads > 1) || (reading_threads > 0))
{
condition.wait(unique_lock);
}
unique_lock.unlock();
}
void RwLock::write_unlock()
{
unique_lock.lock();
--writing_threads; // note change here
condition.notify_all();
unique_lock.unlock();
}
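This is actually a bit simpler, which is nice. Note that writing_threads must now be declared as an int counter rather than a bool. Also be aware that if two writers are ever blocked in write_lock at the same time, each sees writing_threads > 1 and neither proceeds, so this version is only safe when at most one writer can be waiting.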
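A further point, offered as a likely contributor rather than a confirmed diagnosis of the deadlock: the original class shares one std::unique_lock member between all threads. A std::unique_lock object is not itself thread-safe, and calling lock() on one that already owns the mutex throws std::system_error, so each call should create its own local lock. The sketch below makes that change and tracks waiting writers separately from the active writer, so that several writers can queue up. The member names are assumptions.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Writer-preferring reader-writer lock; a sketch, not a tuned implementation.
class RwLock {
public:
    void read_lock() {
        std::unique_lock<std::mutex> lock(mutex_); // local to this call
        // Readers hold back while a writer is active or waiting.
        condition_.wait(lock, [this] { return !writer_active_ && waiting_writers_ == 0; });
        ++reading_threads_;
    }

    void read_unlock() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(reading_threads_ > 0);
        if (--reading_threads_ == 0) {
            condition_.notify_all();
        }
    }

    void write_lock() {
        std::unique_lock<std::mutex> lock(mutex_);
        ++waiting_writers_; // announce first so new readers cannot sneak in
        condition_.wait(lock, [this] { return !writer_active_ && reading_threads_ == 0; });
        --waiting_writers_;
        writer_active_ = true;
    }

    void write_unlock() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(writer_active_);
        writer_active_ = false;
        condition_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable condition_;
    int reading_threads_ = 0;
    int waiting_writers_ = 0;
    bool writer_active_ = false;
};

// Usage sketch: several writers bump a plain int while one reader observes it.
// Returns the final value, which should equal writers * perWriter.
int contended_count(int writers, int perWriter) {
    RwLock rw;
    int value = 0;
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w) {
        threads.emplace_back([&rw, &value, perWriter] {
            for (int i = 0; i < perWriter; ++i) {
                rw.write_lock();
                ++value;
                rw.write_unlock();
            }
        });
    }
    threads.emplace_back([&rw, &value, perWriter] {
        for (int i = 0; i < perWriter; ++i) {
            rw.read_lock();
            const int seen = value; // read under the shared lock
            (void)seen;
            rw.read_unlock();
        }
    });
    for (auto& t : threads) {
        t.join();
    }
    return value;
}
```

Because writers are preferred, a steady stream of writers can starve readers; whether that matters depends on the workload.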

slim reader writer lock Raii

I have a Windows server application that uses multiple threads to handle requests. I needed a reader-writer lock to guard access to a shared std::unordered_map, and I wanted to do this in a manner similar to std::unique_lock (resource acquisition is initialization). So I came up with this SRWRaii class.
class SRWRaii
{
public:
SRWRaii(const SRWLOCK& lock, bool m_exclusive = false)
:m_lock(lock), m_exclusive(m_exclusive)
{
if (m_exclusive)
{
AcquireSRWLockExclusive(const_cast<SRWLOCK*>(&m_lock));
}
else
{
AcquireSRWLockShared(const_cast<SRWLOCK*>(&m_lock));
}
}
~SRWRaii()
{
if (m_exclusive)
{
ReleaseSRWLockExclusive(const_cast<SRWLOCK*>(&m_lock));
}
else
{
ReleaseSRWLockShared(const_cast<SRWLOCK*>(&m_lock));
}
}
private:
const SRWLOCK& m_lock;
bool m_exclusive;
};
Then I use this as follows
SRWLOCK g_mutex;
void Initialize()
{
InitializeSRWLock(&g_mutex);
}
void Reader()
{
SRWRaii lock(g_mutex);
// Read from unordered_map
}
void Writer()
{
SRWRaii lock(g_mutex, true); // exclusive
// add or delete from unordered_map
}
Being new to C++, I am a little suspicious of this critical code. Are there issues with the above approach of implementing an RAII wrapper over SRWLOCK? What improvements can be made to the above code?

Threads in C++11

I'm updating my C++ skills to C++11. I'm up to threads, always a problem area. Consider this testing code:
// threaded.h
class MyThreadedClass
{
public:
MyThreadedClass();
bool StartThread();
bool IsThreadDone();
inline void WorkThread();
private:
std::thread* workThread;
std::atomic<bool> threadDone;
};
// threaded.cpp
MyThreadedClass::MyThreadedClass() {
workThread = nullptr;
threadDone.store(true);
}
bool MyThreadedClass::StartThread() {
if (!threadDone.load()) { return false; }
threadDone.store(false);
workThread = new std::thread(&MyThreadedClass::WorkThread, this);
workThread->detach();
return true;
}
bool MyThreadedClass::IsThreadDone() {
return threadDone.load();
}
inline void MyThreadedClass::WorkThread() {
while (some_condition) { /*Do work*/ }
threadDone.store(true);
}
// main.cpp
int main() {
MyThreadedClass testInstance;
testInstance.StartThread();
for (int x = 0; x < 10; x++) {
do {
// This is sometimes true:
if (testInstance.StartThread()) { return 1;}
} while (!testInstance.IsThreadDone());
}
return 0;
}
I wanted to look at a worst-case scenario for this type of code, so I'm pounding it continually in main while waiting for the thread to terminate. Sometimes the failure condition in main is triggered. As with many threading problems, it's not consistent and therefore not easy to debug.
The threadDone variable is used because I access a file in my actual code and don't want multiple threads accessing the same file.
Insight into what I'm missing, or ways to redesign this with C++11 idioms, would be welcome.
With std::mutex instead of std::atomic, the implementation is quite simple.
class MyThreadedClass
{
std::mutex _mutex;
std::unique_ptr<std::thread> _thread;
bool _done{false};
public:
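MyThreadedClass() = default;
// Join on destruction: destroying a joinable std::thread calls std::terminate.
~MyThreadedClass() { if (_thread && _thread->joinable()) _thread->join(); }
bool StartThread()
{
std::lock_guard<std::mutex> lock(_mutex);
if (_thread || _done) return false;
_thread = std::make_unique<std::thread>(&MyThreadedClass::WorkThread, this);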
return true;
}
bool IsThreadDone()
{
std::lock_guard<std::mutex> lock(_mutex);
return _done;
}
void WorkThread()
{
// do some work
std::lock_guard<std::mutex> lock(_mutex);
_done = true;
}
};
The best reference / learning book I know of for C++11 concurrency is here: C++ Concurrency in Action: Practical Multithreading by Anthony Williams.
You have a race condition (though not a "data race") in your main() function, which I've transcribed, inlined, and simplified below:
void MyThreadedClass::WorkThread() {
threadDone.store(true);
}
int main() {
MyThreadedClass testInstance;
testInstance.threadDone.store(false);
(new std::thread(&MyThreadedClass::WorkThread, &testInstance))->detach();
// This sometimes fails:
assert(!testInstance.threadDone.load());
return 0;
}
That assertion will fail if WorkThread happens to run to completion before the main thread is scheduled again. Since the standard doesn't constrain the scheduling, you'd need to write some code to block if you need the worker thread to wait.
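One C++11 way to write that blocking code is std::async with a std::future, which reports completion without a hand-rolled flag. The sketch below is only an illustration: do_work and run_and_wait are hypothetical names, and do_work stands in for the real task such as the file access.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical stand-in for the real work (for example, the file access).
int do_work() {
    return 42;
}

// Starts the work on another thread and returns its result once it has finished.
int run_and_wait() {
    std::future<int> result = std::async(std::launch::async, do_work);

    // Optional polling, similar in spirit to IsThreadDone():
    // wait_for returns future_status::ready once the task has completed.
    while (result.wait_for(std::chrono::milliseconds(1)) != std::future_status::ready) {
        // The caller could do other useful work here.
    }

    // get() would also block by itself if the task were still running.
    return result.get();
}
```

Since the future owns the task's result, a second task can be refused or deferred simply by checking whether the previous future is ready.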

Multithreaded event system

I am trying to design a multithreaded event system in C++. In it, the objects may be located in different threads and every object should be able to queue events for other threads. Each thread has its own event queue and event dispatcher, as well as an event loop. It should be possible to change the thread affinity of the objects.
Let's say we have two threads: A and B, and an object myobj, which belongs to B. Obviously, A needs a pointer to myobj in order to be able to send events to it. A doesn't have any pointer to B, but it needs some way to get a reference to it in order to be able to lock the event queue and add the event to it.
I could store a pointer to B in myobj, but then I obviously need to protect myobj. If I place a mutex in myobj, myobj could be destructed while the mutex is being locked, thus causing a segmentation fault.
I could also use a global table where I associate each object with its corresponding thread. However, this would consume a lot of memory and cause any thread that wants to send an event to block until A has finished.
What is the most efficient safe strategy to implement this? Is there perhaps some kind of design pattern for this?
Thanks in advance.
I've implemented a thread wrapper base class, ThreadEventComponent, for sending and processing events between instances of itself. Each ThreadEventComponent has its own event queue that is automatically locked internally whenever it is used. The events themselves are routed through a static map of type map<EventKey, vector<ThreadEventComponent*>> that is also automatically locked whenever it is used. As you can see, multiple ThreadEventComponent-derived instances can subscribe to the same event. Each event sent with SendEvent(Event&) is copied per instance to ensure that multiple threads aren't fighting over the same data held within the event.
Admittedly, this is not the most efficient strategy, as opposed to sharing memory. There are optimizations to be made regarding the addEvent(Event&) method. Drawbacks aside, it does work well for configuring a thread to do some operation outside of the main thread.
Both MainLoop() and ProcessEvent(Event*) are virtual functions to be implemented by the derived class. ProcessEvent(Event*) is called whenever an event is available in the queue. After that, MainLoop() is called regardless of the event queue state. MainLoop() is where you should tell your thread to sleep and where any other operations such as file reading/writing or network reading/writing should go.
The following code is something I've been working on for my own personal use, to get my head around threading in C++. This code has never been reviewed, so I'd love to hear any suggestions you have. I am aware of two elements that are less than desirable in this code sample: 1) I'm using new at run time, the drawback being that allocating memory takes time, but this can be mitigated by creating a memory buffer in the ThreadEventComponent base class to construct new events in. 2) Casting an Event to TEvent<T> can cause run-time errors if not done correctly in ProcessEvent. I'm not sure what the best solution for this is.
Note: I have EventKey implemented as a string, but you can change it to whatever type you wish as long as it has a default value along with the equality and assignment operators available.
Event.h
#include <string>
using namespace std;
typedef string EventKey;
class Event
{
public:
Event()
: mKey()
{
}
Event(EventKey key)
: mKey(key)
{
}
Event(const Event& e)
: mKey(e.mKey)
{
}
virtual ~Event()
{
}
EventKey GetKey()
{
return mKey;
}
protected:
EventKey mKey;
};
template<class T>
class TEvent : public Event
{
public:
TEvent()
: Event()
{
}
TEvent(EventKey type, T& object)
: Event(type), mObject(object)
{
}
TEvent(const TEvent<T>& e)
: Event(e.mKey), mObject(e.mObject)
{
}
virtual ~TEvent()
{
}
T& GetObject()
{
return mObject;
}
private:
T mObject;
};
ThreadEventComponent.h
#include "Event.h"
#include <thread>
#include <atomic>
#include <algorithm>
#include <vector>
#include <queue>
#include <map>
#include <mutex>
#include <assert.h>
class ThreadEventComponent
{
public:
ThreadEventComponent();
virtual ~ThreadEventComponent(); // virtual: the class is meant to be derived from
void Start(bool detached = false);
void Stop();
void ForceStop();
void WaitToFinish();
virtual void Init() = 0;
virtual void MainLoop() = 0;
virtual void ProcessEvent(Event* incoming) = 0;
template<class T>
void SendEvent(TEvent<T>& e)
{
sEventListLocker.lock();
EventKey key = e.GetKey();
for (unsigned int i = 0; i < sEventList[key].size(); i++)
{
assert(sEventList[key][i] != nullptr);
sEventList[key][i]->addEvent<T>(e);
}
sEventListLocker.unlock();
}
void SendEvent(Event& e);
void Subscribe(EventKey key);
void Unsubscribe(EventKey key);
protected:
template<class T>
void addEvent(TEvent<T>& e)
{
mQueueLocker.lock();
// The event gets copied per thread
mEventQueue.push(new TEvent<T>(e));
mQueueLocker.unlock();
}
void addEvent(Event& e);
thread mThread;
atomic<bool> mShouldExit;
private:
void threadLoop();
queue<Event*> mEventQueue;
mutex mQueueLocker;
typedef map<EventKey, vector<ThreadEventComponent*>> EventMap;
static EventMap sEventList;
static mutex sEventListLocker;
};
ThreadEventComponent.cpp
#include "ThreadEventComponent.h"
ThreadEventComponent::EventMap ThreadEventComponent::sEventList = ThreadEventComponent::EventMap();
std::mutex ThreadEventComponent::sEventListLocker;
ThreadEventComponent::ThreadEventComponent()
{
mShouldExit = false;
}
ThreadEventComponent::~ThreadEventComponent()
{
}
void ThreadEventComponent::Start(bool detached)
{
mShouldExit = false;
mThread = thread(&ThreadEventComponent::threadLoop, this);
if (detached)
mThread.detach();
}
void ThreadEventComponent::Stop()
{
mShouldExit = true;
}
void ThreadEventComponent::ForceStop()
{
mQueueLocker.lock();
while (!mEventQueue.empty())
{
delete mEventQueue.front();
mEventQueue.pop();
}
mQueueLocker.unlock();
mShouldExit = true;
}
void ThreadEventComponent::WaitToFinish()
{
if(mThread.joinable())
mThread.join();
}
void ThreadEventComponent::SendEvent(Event& e)
{
sEventListLocker.lock();
EventKey key = e.GetKey();
for (unsigned int i = 0; i < sEventList[key].size(); i++)
{
assert(sEventList[key][i] != nullptr);
sEventList[key][i]->addEvent(e);
}
sEventListLocker.unlock();
}
void ThreadEventComponent::Subscribe(EventKey key)
{
sEventListLocker.lock();
if (find(sEventList[key].begin(), sEventList[key].end(), this) == sEventList[key].end())
{
sEventList[key].push_back(this);
}
sEventListLocker.unlock();
}
void ThreadEventComponent::Unsubscribe(EventKey key)
{
sEventListLocker.lock();
// Finds event listener of correct type
EventMap::iterator mapIt = sEventList.find(key);
assert(mapIt != sEventList.end());
// Finds the pointer to itself
std::vector<ThreadEventComponent*>::iterator elIt =
std::find(mapIt->second.begin(), mapIt->second.end(), this);
assert(elIt != mapIt->second.end());
// Removes it from the event list
mapIt->second.erase(elIt);
sEventListLocker.unlock();
}
void ThreadEventComponent::addEvent(Event& e)
{
mQueueLocker.lock();
// The event gets copied per thread
mEventQueue.push(new Event(e));
mQueueLocker.unlock();
}
void ThreadEventComponent::threadLoop()
{
Init();
bool shouldExit = false;
while (!shouldExit)
{
if (mQueueLocker.try_lock())
{
if (mEventQueue.empty())
{
mQueueLocker.unlock();
if(mShouldExit)
shouldExit = true;
}
else
{
Event* e = mEventQueue.front();
mEventQueue.pop();
mQueueLocker.unlock();
ProcessEvent(e);
delete e;
}
}
MainLoop();
}
}
Example Class - A.h
#include "ThreadEventComponent.h"
#include <cmath> // sqrt
class A : public ThreadEventComponent
{
public:
A() : ThreadEventComponent()
{
}
void Init()
{
Subscribe("a stop");
Subscribe("a");
}
void MainLoop()
{
this_thread::sleep_for(50ms);
}
void ProcessEvent(Event* incoming)
{
if (incoming->GetKey() == "a")
{
auto e = static_cast<TEvent<vector<int>>*>(incoming);
mData = e->GetObject();
for (unsigned int i = 0; i < mData.size(); i++)
{
mData[i] = sqrt(mData[i]);
}
SendEvent(TEvent<vector<int>>("a done", mData));
}
else if(incoming->GetKey() == "a stop")
{
Stop(); // the loop exits once the queue is empty
}
}
private:
vector<int> mData;
};
Example Class - B.h
#include "ThreadEventComponent.h"
int compare(const void * a, const void * b)
{
return (*(int*)a - *(int*)b);
}
class B : public ThreadEventComponent
{
public:
B() : ThreadEventComponent()
{
}
void Init()
{
Subscribe("b stop");
Subscribe("b");
}
void MainLoop()
{
this_thread::sleep_for(50ms);
}
void ProcessEvent(Event* incoming)
{
if (incoming->GetKey() == "b")
{
auto e = static_cast<TEvent<vector<int>>*>(incoming);
mData = e->GetObject();
qsort(&mData[0], mData.size(), sizeof(int), compare);
SendEvent(TEvent<vector<int>>("b done", mData));
}
else if (incoming->GetKey() == "b stop")
{
Stop(); // the loop exits once the queue is empty
}
}
private:
vector<int> mData;
};
Test Example - main.cpp
#include <iostream>
#include <cstdlib> // srand, rand, EXIT_SUCCESS
#include <ctime> // time
#include "A.h"
#include "B.h"
class Master : public ThreadEventComponent
{
public:
Master() : ThreadEventComponent()
{
}
void Init()
{
Subscribe("a done");
Subscribe("b done");
}
void MainLoop()
{
this_thread::sleep_for(50ms);
}
void ProcessEvent(Event* incoming)
{
if (incoming->GetKey() == "a done")
{
TEvent<vector<int>>* e = static_cast<TEvent<vector<int>>*>(incoming);
cout << "A finished" << endl;
mDataSetA = e->GetObject();
for (unsigned int i = 0; i < mDataSetA.size(); i++)
{
cout << mDataSetA[i] << " ";
}
cout << endl << endl;
}
else if (incoming->GetKey() == "b done")
{
TEvent<vector<int>>* e = static_cast<TEvent<vector<int>>*>(incoming);
cout << "B finished" << endl;
mDataSetB = e->GetObject();
for (unsigned int i = 0; i < mDataSetB.size(); i++)
{
cout << mDataSetB[i] << " ";
}
cout << endl << endl;
}
}
private:
vector<int> mDataSetA;
vector<int> mDataSetB;
};
int main()
{
srand(time(0));
A a;
B b;
a.Start();
b.Start();
vector<int> data;
for (int i = 0; i < 100; i++)
{
data.push_back(rand() % 100);
}
Master master;
master.Start();
master.SendEvent(TEvent<vector<int>>("a", data));
master.SendEvent(TEvent<vector<int>>("b", data));
master.SendEvent(TEvent<vector<int>>("a", data));
master.SendEvent(TEvent<vector<int>>("b", data));
master.SendEvent(Event("a stop"));
master.SendEvent(Event("b stop"));
a.WaitToFinish();
b.WaitToFinish();
// cin.get();
master.Stop(); // the loop exits once the queue is empty
master.WaitToFinish();
return EXIT_SUCCESS;
}
I have not used it myself, but Boost.Signals2 claims to be thread-safe.
The primary motivation for Boost.Signals2 is to provide a version of the original Boost.Signals library which can be used safely in a multi-threaded environment.
Of course, using this would make your project depend on Boost, which might not be in your interest.
[edit] It seems slots are executed in the emitting thread (no queue), so this might not be what you had in mind after all.
I'd consider making the thread part of a class in order to encapsulate it. That way you can easily design your interfaces around the thread loops (provided as member functions of these classes) and have defined entry points for sending data to the thread loop (e.g. using a std::queue protected by a mutex).
I don't know if this is a designated, well-known design pattern, but it's what I use in my day-to-day production code at work, and my colleagues and I have had good experience with it.
I'll try to illustrate the point:
class A {
public:
A() : terminate_(false), data_(0) {}
bool start();
bool stop();
bool terminate() const;
void terminate(bool value);
int data() const;
void data(int value);
private:
std::thread thread_;
void threadLoop();
bool terminate_;
mutable std::mutex internalDataGuard_;
int data_;
};
bool A::start() {
thread_ = std::thread(&A::threadLoop, this);
return true;
}
bool A::stop() {
terminate(true);
thread_.join();
return true;
}
bool A::terminate() const {
std::lock_guard<std::mutex> lock(internalDataGuard_);
return terminate_;
}
void A::terminate(bool value) {
std::lock_guard<std::mutex> lock(internalDataGuard_);
terminate_ = value;
}
int A::data() const {
std::lock_guard<std::mutex> lock(internalDataGuard_);
return data_;
}
void A::data(int value) {
std::lock_guard<std::mutex> lock(internalDataGuard_);
data_ = value;
// Notify thread loop about data changes
}
void A::threadLoop() {
while(!terminate())
{
// Wait (blocking) for data changes
}
}
To set up signalling of data changes, there are several choices and (OS) constraints:
The simplest thing you could use to wake up the thread loop to process changed or new data is a semaphore. In C++11, the nearest approximation to a semaphore is a condition variable. Advanced versions of the pthreads API also provide condition variable support. In any case, since only one thread should be waiting there and no event broadcasting is necessary, it should be easy to implement with simple locking mechanisms.
If you have the option of using an advanced OS, you might prefer to implement event signalling with something like poll(), which provides a lock-free implementation in user space.
Some frameworks, such as Boost, Qt, Platinum C++, and others, also support event handling through signal/slot abstractions; you might have a look at their documentation and implementation to get a grip on what's necessary and what the state of the art is.
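The condition-variable option can be sketched roughly as follows, filling in the "notify" and "wait" placeholders from the class above. This is a minimal illustration under assumptions: the Worker class, the dirty_ flag, and the stop() method that returns the last handled value are hypothetical names added for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker in the style of class A above; names are assumptions.
class Worker {
public:
    Worker() : thread_(&Worker::threadLoop, this) {}
    ~Worker() { stop(); }

    // Entry point for other threads: store the value and wake the loop.
    void data(int value) {
        {
            std::lock_guard<std::mutex> lock(guard_);
            data_ = value;
            dirty_ = true;
        }
        changed_.notify_one();
    }

    // Asks the loop to finish, waits for it, and returns the last value it handled.
    int stop() {
        {
            std::lock_guard<std::mutex> lock(guard_);
            terminate_ = true;
        }
        changed_.notify_one();
        if (thread_.joinable()) {
            thread_.join();
        }
        return lastHandled_;
    }

private:
    void threadLoop() {
        std::unique_lock<std::mutex> lock(guard_);
        for (;;) {
            // Blocks without polling until data changes or termination is requested.
            changed_.wait(lock, [this] { return dirty_ || terminate_; });
            if (!dirty_) {
                break; // terminate_ is set and nothing is pending
            }
            lastHandled_ = data_; // "process" the change
            dirty_ = false;
        }
    }

    std::mutex guard_;
    std::condition_variable changed_;
    int data_ = 0;
    int lastHandled_ = 0;
    bool dirty_ = false;
    bool terminate_ = false;
    std::thread thread_; // declared last so the other members exist before the loop starts
};
```

Pending data is handled before the loop honours a termination request, so a value set just before stop() is not lost.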
"Obviously, A needs a pointer to myobj in order to be able to send events to it."
I question the above assumption -- To me, allowing thread A to have a pointer to an object that is controlled/owned/accessed by thread B is kind of asking for trouble... in particular, some code running in thread A might be tempted later on to use that pointer to directly call methods on myobj, causing race conditions and discord; or B might delete myobj, at which point A is holding a dangling-pointer and is thereby in a precarious state.
If I was designing the system, I would try to do it in such a way that cross-thread messaging was done without requiring pointers-to-objects-in-other-threads, for the reasons you mention -- they are unsafe, in particular such a pointer might become a dangling-pointer at any time.
So then the question becomes, how do I send a message to an object in another thread, if I don't have a pointer to that object?
One way would be to give each object a unique ID by which it can be specified. This ID could be an integer (either hard-coded or dynamically assigned using an atomic counter or similar), or perhaps a short string if you wanted it to be more easily human-readable.
Then instead of the code in thread A sending the message directly to myobj, it would send a message to thread B, and the message would include a field indicating the ID of the object that is intended to receive the message.
When thread B's event loop receives the message, it would use the included ID value to look up the appropriate object (using an efficient key-value lookup mechanism such as std::unordered_map) and call the appropriate method on that object. If the object had already been destroyed, then the key-value lookup would fail (because you'd have a mechanism to make sure that the object removed itself from its thread's object-map as part of its destructor), and thus trying to send a message to a destroyed-object would fail cleanly (as opposed to invoking undefined behavior).
Note that this approach does mean that thread A's code has to know which thread myobj is owned by, in order to know which thread to send the message to. Typically thread A would need to know that anyway, but if you're going for a design that abstracts away even the knowledge about which thread a given object is running in, you could include an owner-thread-ID as part of the object-ID, so that your postMessage() method could examine the destination-object-ID to figure out which thread to send the message to.
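The ID-based scheme described above might be sketched like this. Everything here is a hypothetical illustration: ThreadMailbox, Message, ObjectId, and the handler callbacks are names invented for the example, and the event loop that would call dispatch() on the owning thread is left out.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <unordered_map>
#include <utility>

// All names here are hypothetical illustrations of the ID-based scheme.
using ObjectId = int;

struct Message {
    ObjectId target; // ID of the object that should receive the message
    int payload;
};

// One mailbox per thread. Other threads only ever call post().
class ThreadMailbox {
public:
    // Safe to call from any thread.
    void post(const Message& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        inbox_.push(message);
    }

    // The remaining methods are called only on the owning thread.
    void add(ObjectId id, std::function<void(int)> handler) {
        objects_[id] = std::move(handler);
    }

    // An object would call this from its destructor.
    void remove(ObjectId id) {
        objects_.erase(id);
    }

    // Delivers queued messages. Returns how many were dropped
    // because their target object no longer exists.
    int dispatch() {
        std::queue<Message> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(inbox_);
        }
        int dropped = 0;
        while (!batch.empty()) {
            const Message message = batch.front();
            batch.pop();
            const auto it = objects_.find(message.target);
            if (it == objects_.end()) {
                ++dropped; // fails cleanly instead of touching a dead object
                continue;
            }
            const std::function<void(int)> handler = it->second; // copy before calling
            handler(message.payload);
        }
        return dropped;
    }

private:
    std::mutex mutex_;
    std::queue<Message> inbox_;
    std::unordered_map<ObjectId, std::function<void(int)>> objects_;
};
```

Only the inbox needs a mutex; the object map is touched by the owning thread alone, which is what lets a message to a destroyed object fail cleanly.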