safely distributing a pointer update between threads


tl;dr:
class Controller
{
public:
volatile Netconsole* nc;
void init(); //initialize the threads
void calculate(); // handler for the "mothership app"
void senderThreadLoop(); //also calls reinitNet() if connection is broken.
void listenerThreadLoop();
inline void reinitNet(){ delete nc; nc = new Netconsole(); }
};
// inside
Json::Value header = nc->Recv();
error: passing 'volatile Netconsole' as 'this' argument discards qualifiers [-fpermissive]
A pointer to an instance of a utility class (Netconsole) shared between two threads must be updated in both threads if the utility class is re-instantiated, but declaring it volatile generates the above error. If it's updated in just one thread, the other thread may still use the old, invalid pointer. How can I ensure it's updated in both, without calling methods through the pointer triggering the above error?
Extended info:
The "smart glue logic" library I'm writing is used to pass and convert messages between a 3rd party software and a custom device. It consists of three essential threads:
a handler: the main thread of the 3rd party app periodically calls a "calculate" function in my library to handle new updates - data to send, data received
a sender thread that converts and sends whatever the handler pushed into the send buffer
a listener thread that converts and pushes any data received from the device into receive buffer.
Both the sender and the listener threads use the same utility class that handles network communication with the device; upon initialization the class creates a connection to the device, and the two threads wait for new data to send and perform blocking reads, respectively. In case of any problems, the sender thread performs all "maintenance" work, while the listener thread enters a safe state awaiting the return of connectivity.
Now, since the two threads share one connection to the device, they both share the same instance of the communication class, as a pointer to that class.
The problem is in the reconnect procedure: it destroys and re-creates the helper class instance, reusing the safe shutdown and initialization already present in the destructor and constructor. As a result, the pointer changes. Without volatile it's quite likely the listener won't see the updated pointer. With volatile, the compiler complains, needlessly, because nc (the pointer) won't change at a random moment: first the listener is notified of a problem, then it enters a safe state where it doesn't perform any operations on 'nc' and notifies the sender it's ready. Only then does the sender perform the repair and notify the listener to resume normal operation.
So what's the right solution in this situation?

What you need is a sequence of operations. The producing thread has two relevant operations: "initialize new Netconsole" and "write pointer". The consuming thread also has two operations: "read pointer" and "use new Netconsole object". Those 4 operations must be sequenced in exactly that order for the update to be visible.
By far the simplest way to achieve this is to use two memory barriers. A write barrier (std::memory_order_release on the pointer write) prevents the first two operations from being reordered, and a read barrier (std::memory_order_acquire on the pointer load) prevents the last two operations from being reordered.
As the two threads run independently, your program correctness shouldn't depend on whether a particular object update happened before a particular object use. The updating thread might just have been a bit slow, and that should not break your program. So the third ordering between write and read isn't really relevant and you shouldn't try to "fix" it.
To summarize: yes, the 4 operations have to happen in exactly the right order for the result to be visible, but if the second and third operations are reordered, the consuming thread simply reads the old pointer and the update is entirely invisible to it for now. It's an atomic update, all or nothing.
There's still the matter of cleaning up the old object. The producing thread cannot just assume that the consuming thread has already seen the pointer update; there must be synchronization to ensure both threads agree that the old object is unused. The easiest approach is for the producing thread to strictly not use the old object after the new object has been created (the memory barrier helps here), and for the consuming thread to clean up the old object as soon as it knows there's a new object (because that happens strictly after the read barrier, thus after the write barrier, and in turn after the last use by the producing thread).
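A minimal C++11 sketch of this scheme, assuming the shared pointer becomes a `std::atomic<Netconsole*>` (so `volatile` is dropped entirely) and that, as in the question's handshake, the listener observes every replacement before the next one happens (otherwise an intermediate object would leak). `Netconsole` is reduced to a stub with a hypothetical `id` member, and only the pointer-handling part of `Controller` is shown; unlike the question's `reinitNet()`, the sender no longer deletes the old object itself, the listener does.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for the question's Netconsole; the real constructor would connect.
struct Netconsole {
    int id;  // hypothetical member used only to tell instances apart
    explicit Netconsole(int i) : id(i) {}
};

// Only the pointer-handling part of the question's Controller is shown.
class Controller {
public:
    explicit Controller(int firstId) : nc_(new Netconsole(firstId)) {}

    // Assumes both threads have already stopped.
    ~Controller() {
        Netconsole* current = nc_.load();
        if (seen_ != current) delete seen_;  // delete of nullptr is a no-op
        delete current;
    }

    // Sender thread: build the replacement completely (op 1), then publish it
    // with release semantics (op 2). After this call the sender must not touch
    // the old object again; the listener will free it.
    void reinitNet(int newId) {
        Netconsole* fresh = new Netconsole(newId);
        nc_.store(fresh, std::memory_order_release);
    }

    // Listener thread: load with acquire semantics (op 3) so everything the
    // constructor wrote is visible before the object is used (op 4).
    Netconsole* listenerView() {
        Netconsole* latest = nc_.load(std::memory_order_acquire);
        if (latest != seen_) {
            delete seen_;    // previous object: nobody uses it any more
            seen_ = latest;
        }
        return seen_;
    }

private:
    std::atomic<Netconsole*> nc_;
    Netconsole* seen_ = nullptr;  // read and written only by the listener
};
```

In the listener loop, calling `listenerView()` once per iteration (instead of dereferencing a shared raw pointer) is what places the acquire load between "read pointer" and "use object".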

Related

C++ constructor memory synchronization

Assume that I have code like:
void InitializeComplexClass(ComplexClass* c);
class Foo {
public:
Foo() {
i = 0;
InitializeComplexClass(&c);
}
private:
ComplexClass c;
int i;
};
If I now do something like Foo f; and hand a pointer to f over to another thread, what guarantees do I have that any stores done by InitializeComplexClass() will be visible to the CPU executing the other thread that accesses f? What about the store writing zero into i? Would I have to add a mutex to the class, take a writer lock on it in the constructor and take corresponding reader locks in any methods that accesses the member?
Update: Assume I hand a pointer over to a bunch of other threads once the constructor has returned. I'm not assuming that the code is running on x86; it could instead be running on something like PowerPC, which has a lot of freedom to do memory reordering. I'm essentially interested in what sorts of memory barriers the compiler has to inject into the code when the constructor returns.

In order for the other thread to know about your new object, you have to hand over the object or signal the other thread somehow, and signalling a thread means writing to memory. Both x86 and x64 perform all memory writes in order: the CPU does not reorder them with respect to each other. This is called "Total Store Ordering", so the CPU's write queue works first in, first out.
Given that you create an object first and then pass it on to another thread, these changes to memory will also occur in order and the other thread will always see them in the same order. By the time the other thread learns about the new object, the contents of this object were guaranteed to be available to that thread even earlier (if the thread had only somehow known where to look).
In conclusion, you do not have to synchronise anything this time. Handing over the object after it has been initialised is all the synchronisation you need.
Update: On non-TSO architectures you do not have this guarantee, so you need to synchronise. Use the MemoryBarrier() macro (or any interlocked operation), or some synchronisation API. Signalling the other thread through such an API also synchronises; otherwise it would not be a synchronisation API.
x86 and x64 CPUs may reorder writes after later reads, but that is not relevant here. Just for better understanding: writes can be ordered after reads because writes to memory go through a write queue and flushing that queue may take some time. On the other hand, the read cache is always consistent with the latest updates from other processors (those that have gone through their own write queues).
This topic has been made unbelievably confusing for many, but in the end there are only a few things an x86/x64 programmer has to worry about:
- First, the existence of the write queue (one should not worry about the read cache at all!).
- Second, concurrent writing and reading of the same variable in different threads when the variable is not written atomically (for example, it is wider than the CPU stores in one operation), which may cause data tearing and requires synchronisation mechanisms.
- And finally, concurrent updates to the same variable from multiple threads, for which we have interlocked operations, or again synchronisation mechanisms.
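In portable C++11 the "hand over the object after it has been initialised" step can be written so it is correct regardless of the CPU's ordering model: a release store of the pointer paired with an acquire load in the reader. On x86 these typically compile to plain moves; on weaker architectures such as PowerPC the compiler emits whatever barrier is needed. This is a sketch; `InitializeComplexClass()` gets a hypothetical body, and the global `handoff` pointer, `publish()` and `readWhenReady()` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct ComplexClass { int values[4]; };

// Hypothetical body standing in for the question's InitializeComplexClass().
void InitializeComplexClass(ComplexClass* c) {
    for (int k = 0; k < 4; ++k) c->values[k] = k * 10;
}

class Foo {
public:
    Foo() { i = 0; InitializeComplexClass(&c); }
    int sum() const { return i + c.values[0] + c.values[1] + c.values[2] + c.values[3]; }
private:
    ComplexClass c;
    int i;
};

std::atomic<Foo*> handoff{nullptr};

// Publisher: all stores made by Foo's constructor happen-before this release store.
void publish(Foo* f) { handoff.store(f, std::memory_order_release); }

// Reader: spins until the pointer appears; the acquire load makes every
// store from Foo's constructor visible to this thread.
int readWhenReady() {
    Foo* f;
    while ((f = handoff.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return f->sum();
}
```

With the hypothetical initializer above, a reader that sees the pointer always computes 0 + 0 + 10 + 20 + 30 = 60, never a partially initialized value.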

If you do:
Foo f;
// HERE: InitializeComplexClass() and "i" member init are guaranteed to be completed
passToOtherThread(&f);
/* From this point, you cannot guarantee the state/members
of 'f' since another thread can modify it */
If you're passing an instance pointer to another thread, you need to implement guards so that both threads can safely interact with the same instance. If you ONLY plan to use the instance on the other thread, you do not need guards. However, do not pass a pointer to a stack object as in your example; pass a new instance like this:
passToOtherThread(new Foo());
And make sure to delete it when you are done with it.
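With C++11 `std::thread`, the completion of the thread constructor synchronizes with the start of the new thread's function, so everything written to the object before the thread is created is visible to it. A sketch that also follows the "pass a new instance and delete it when done" advice by transferring ownership with `std::unique_ptr`, so the delete cannot be forgotten; `Foo`'s members, `consume()` and this `passToOtherThread()` signature are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for the question's Foo.
struct Foo {
    int i = 0;
    int payload[3] = {1, 2, 3};
};

// Runs on the new thread. It owns the Foo, which is deleted automatically
// when `f` goes out of scope at the end of the function.
int consume(std::unique_ptr<Foo> f) {
    return f->i + f->payload[0] + f->payload[1] + f->payload[2];
}

// Hypothetical passToOtherThread(): everything done to *f before this call is
// visible to the new thread, because completing the std::thread constructor
// synchronizes with the start of the thread function.
std::thread passToOtherThread(std::unique_ptr<Foo> f, int* result) {
    return std::thread([result](std::unique_ptr<Foo> owned) {
        *result = consume(std::move(owned));
    }, std::move(f));
}
```

The caller must keep `*result` alive and `join()` the returned thread before reading it; the join is what makes the worker's write visible.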

Thread safeness at MongoDB C++ driver regarding indirect connection usage through cursor

This question is a kind of follow-up to this one, regarding thread safety in the Mongo C++ driver. For reference, I'm using the legacy-1.0.2 version of the driver.
So, after reading the answer to that question, it is clear that it is not safe for two threads to use the same DBClientConnection at the same time. However, what about "indirect" usage of connections through cursors? Let me explain with an example.
Consider a program with a connection pool (i.e. an array of DBClientConnection objects) and a way of ensuring that only one thread at a time uses a given instance of the pool. Let's consider the following case:
Thread T1 takes connection C1 from the pool. Nobody except T1 is accessing C1 from that time.
Thread T1 does a query() operation using C1 and gets a DBClientCursor object (let's name it U1). To be clear, I'm referring to this particular operation.
Thread T1 returns C1 to the pool because, once it has the U1 object, the connection object itself is no longer useful to it.
T1 starts processing the results of U1 in a long while(u1->more()) loop. Let's assume that the cursor is so large that not all the results can be returned in the first batch (at the wire protocol level), so a connection to the DB is needed to fetch further results through the wire protocol implemented by the driver.
While T1 is doing that work, a new thread (T2) gets C1 from the pool, i.e. the connection previously used by T1.
T2 does some operation using C1 while T1 is still using U1 (a cursor generated through C1).
Would that case be problematic? Although the program honours the thread safety "contract" regarding DBClientConnection objects (i.e. only one thread at a time can access the same DBClientConnection instance) and DBClientCursor objects (i.e. only one thread at a time can access the same DBClientCursor instance), there could be indirect concurrent access to the same connection (at the internal level) through cursors, and I don't know whether that could be a problem.
Rationale for this question: I have a program in which the above case may occur and I'm getting weird, semi-random crashes due to exceptions in DBClientCursor methods, in particular next()/nextSafe() and more(). The case of more() is particularly weird... that method is so simple that it shouldn't fail (except for concurrency problems, of course).

It is not legal to do what you suggest: internally the DBClientCursor contains a pointer to the DBClientBase object, and it calls methods on it. Therefore, you must keep the connection out of the pool until you destroy the cursor.
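One way to make that rule hard to break is to bundle the pooled connection and the cursor in a single RAII "lease" whose member order guarantees the cursor is destroyed before the connection goes back to the pool. `Pool`, `Connection`, `Cursor` and `CursorLease` below are hypothetical placeholders, not the MongoDB legacy driver's API; in real code the cursor would be the object returned by `query()`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-ins, not the MongoDB driver's types.
struct Connection { int id; };
struct Cursor {
    Connection* conn;  // borrowed, like the cursor's internal pointer to its connection
    explicit Cursor(Connection* c) : conn(c) {}
};

class Pool {
public:
    explicit Pool(int n) {
        for (int k = 0; k < n; ++k) free_.push_back(new Connection{k});
    }
    ~Pool() { for (Connection* c : free_) delete c; }  // assumes all leases returned
    Connection* take() {
        std::lock_guard<std::mutex> lock(m_);
        if (free_.empty()) return nullptr;
        Connection* c = free_.back();
        free_.pop_back();
        return c;
    }
    void give(Connection* c) {
        std::lock_guard<std::mutex> lock(m_);
        free_.push_back(c);
    }
    std::size_t available() {
        std::lock_guard<std::mutex> lock(m_);
        return free_.size();
    }
private:
    std::mutex m_;
    std::vector<Connection*> free_;
};

// Keeps the connection checked out for as long as the cursor exists.
// Members are destroyed in reverse declaration order, so cursor_ is
// destroyed before returner_ hands the connection back to the pool.
class CursorLease {
public:
    CursorLease(Pool& p, Connection* c) : returner_{&p, c}, cursor_(new Cursor(c)) {}
    CursorLease(const CursorLease&) = delete;
    CursorLease& operator=(const CursorLease&) = delete;
    Cursor& cursor() { return *cursor_; }
private:
    struct Returner {
        Pool* pool;
        Connection* conn;
        ~Returner() { pool->give(conn); }
    } returner_;
    std::unique_ptr<Cursor> cursor_;
};
```

T1 would then iterate through `lease.cursor()` and let the lease go out of scope after the `while (more())` loop; T2 cannot receive C1 before that.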

interprocess object passing

I need a class with one activity that is performed once every 5 seconds in its own thread. It is a web service client, so it needs an endpoint to be specified. While the object is running, the main thread can change the endpoint. This is my class:
class Worker
{
public:
void setEndpoint(const std::string& endpoint);
private:
void activity (void);
mutex endpoint_mutex;
volatile std::auto_ptr<std::string> newEndpoint;
WebServiceClient client;
};
Does the newEndpoint object need to be declared volatile? I would certainly do it if the read were in some loop (to keep the compiler from optimizing it out), but here I don't know.
In each run, the activity() function checks for a new endpoint (if there is one, it passes it to the client and performs some reconnection steps) and then does its work.
void Worker::activity(void)
{
endpoint_mutex.lock(); //don't consider exceptions
std::auto_ptr<std::string>& ep = const_cast<std::auto_ptr<std::string>&>(newEndpoint);
if (NULL != ep.get())
{
client.setEndpoint(*ep);
ep.reset(NULL);
endpoint_mutex.unlock();
client.doReconnectionStuff();
client.doReconnectionStuff2();
}
else
{
endpoint_mutex.unlock();
}
client.doSomeStuff();
client.doAnotherStuff();
.....
}
I lock the mutex, which means that the newEndpoint object cannot change any more, so I cast away the volatile qualifier to be able to call its non-volatile member functions.
The setEndpoint method (called from another threads):
void Worker::setEndpoint(const std::string& endpoint)
{
endpoint_mutex.lock(); //again - don't consider exceptions
std::auto_ptr<std::string>& ep = const_cast<std::auto_ptr<std::string>&>(newEndpoint);
ep.reset(new std::string(endpoint));
endpoint_mutex.unlock();
}
Is this thing thread safe? If not, what is the problem? Do I need the newEndpoint object to be volatile?

volatile is used in the following cases per MSDN:
The volatile keyword is a type qualifier used to declare that an object can be modified in the program by something such as the operating system, the hardware, or a concurrently executing thread. Objects declared as volatile are not used in certain optimizations because their values can change at any time. The system always reads the current value of a volatile object at the point it is requested, even if a previous instruction asked for a value from the same object. Also, the value of the object is written immediately on assignment.
The question in your case is: how often does your newEndpoint actually change? You create a connection in thread A, and then you do some work. While this is going on, nothing else can fiddle with your endpoint, as it is protected by a mutex. So, per my analysis, and from what I can see in your code, this variable does not change unpredictably enough to need volatile.
I cannot see the call site of your class, so I don't know if you are using the same class instance 100 times or more, or if you are creating new objects.
This is the kind of analysis you need to make when asking whether something should be volatile or not.
Also, on your thread-safety, what happens in these functions:
client.doReconnectionStuff();
client.doReconnectionStuff2();
Are they using any of the shared state from your Worker class? Are they sharing and modifying any other state used by another thread? If yes, you need to do the appropriate synchronization.
If not, then you're ok.
Threading requires some thinking; you need to ask yourself these questions. You need to look at all state and ask whether or not you're sharing it. If you're dealing with pointers, then you need to ask who owns the pointer, and whether you're ever sharing it among threads, accidentally or not, and act accordingly. If you pass a pointer to a function that is run in a different thread, then you're sharing the object that the pointer points to. If you then alter what it points to in this new thread, you are sharing and need to synchronize.
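For comparison, a sketch of the question's `Worker` in C++11: since every access to the pending endpoint happens with the mutex held, the lock already provides the ordering and visibility, so `volatile` and the `const_cast`s it forces can simply be dropped. `std::unique_ptr` replaces `std::auto_ptr` (deprecated in C++11, removed in C++17), the endpoint is moved out under the lock and the reconnection runs outside it, and `WebServiceClient` is a hypothetical stub that only records calls.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stub for the question's WebServiceClient.
struct WebServiceClient {
    std::string endpoint;
    int reconnects = 0;
    void setEndpoint(const std::string& e) { endpoint = e; }
    void doReconnectionStuff() { ++reconnects; }
};

class Worker {
public:
    // Called from other threads.
    void setEndpoint(const std::string& endpoint) {
        std::unique_ptr<std::string> fresh(new std::string(endpoint));
        std::lock_guard<std::mutex> lock(endpoint_mutex_);
        newEndpoint_ = std::move(fresh);  // a newer call simply replaces an unread one
    }

    // Called periodically from the worker thread.
    void activity() {
        std::unique_ptr<std::string> ep;
        {
            std::lock_guard<std::mutex> lock(endpoint_mutex_);
            ep = std::move(newEndpoint_);  // leaves newEndpoint_ empty
        }
        if (ep) {  // client_ is used only by this thread, so no lock is needed here
            client_.setEndpoint(*ep);
            client_.doReconnectionStuff();
        }
        // ... regular work with client_ ...
    }

    // For inspection only while the worker thread is not running.
    const WebServiceClient& client() const { return client_; }

private:
    std::mutex endpoint_mutex_;
    std::unique_ptr<std::string> newEndpoint_;  // guarded by endpoint_mutex_
    WebServiceClient client_;                   // used only by the worker thread
};
```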

how to pass data to running thread

When using pthread, I can pass data at thread creation time.
What is the proper way of passing new data to an already running thread?
I'm considering making a global variable and make my thread read from that.
Thanks

That will certainly work. Basically, threads are just lightweight processes that share the same memory space. Global variables, being in that memory space, are available to every thread.
The trick is not with the readers so much as the writers. If you have a simple chunk of global memory, like an int, then assigning to that int will probably be safe. But consider something a little more complicated, like a struct. Just to be definite, let's say we have
struct S { int a; float b; } s1, s2;
Now s1,s2 are variables of type struct S. We can initialize them
s1 = { 42, 3.14f };
and we can assign them
s2 = s1;
But when we assign them the processor isn't guaranteed to complete the assignment to the whole struct in one step -- we say it's not atomic. So let's now imagine two threads:
thread 1:
while (true){
printf("{%d,%f}\n", s2.a, s2.b );
sleep(1);
}
thread 2:
while(true){
sleep(1);
s2 = s1;
s1.a += 1;
s1.b += 3.14f ;
}
We can see that we'd expect s2 to have the values {42, 3.14}, {43, 6.28}, {44, 9.42} ....
But what we see printed might be anything like
{42,3.14}
{43,3.14}
{43,6.28}
or
{43,3.14}
{44,6.28}
and so on. The problem is that thread 1 may get control and "look at" s2 at any time during that assignment.
The moral is that while global memory is a perfectly workable way to do it, you need to take into account the possibility that your threads will cross over one another. There are several solutions to this, with the basic one being to use semaphores. A semaphore has two operations, confusingly named from Dutch as P and V.
P waits until a variable is greater than 0 and then decrements it; V increments the variable. For mutual exclusion, the variable starts at 1. The only special thing is that they do this atomically -- they can't be interrupted.
Now you write the code as
thread 1:
while (true){
P();
printf("{%d,%f}\n", s2.a, s2.b );
V();
sleep(1);
}
thread 2:
while(true){
sleep(1);
P();
s2 = s1;
V();
s1.a += 1;
s1.b += 3.14f ;
}
and you're guaranteed that you'll never have thread 2 half-completing an assignment while thread 1 is trying to print.
(Pthreads has semaphores, by the way.)
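The same protection in C++11, with a `std::mutex` standing in for the P/V pair (a swapped-in primitive, not the semaphore described above): the writer replaces the whole struct only while holding the lock, and the reader takes a snapshot under the same lock, so it never observes a half-finished assignment. The `Shared` wrapper and its method names are illustrative.

```cpp
#include <cassert>
#include <mutex>

struct S { int a; float b; };

class Shared {
public:
    explicit Shared(S initial) : value_(initial) {}

    // Writer side (thread 2): the whole struct is replaced while the lock is held.
    void store(const S& next) {
        std::lock_guard<std::mutex> lock(m_);  // P()
        value_ = next;
    }                                          // V() when `lock` is destroyed

    // Reader side (thread 1): copy out under the lock, then print the copy
    // without holding it, so slow I/O does not block the writer.
    S snapshot() const {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }

private:
    mutable std::mutex m_;
    S value_;
};
```

Thread 1 would call `S copy = s2.snapshot();` and then `printf` the copy; thread 2 would call `s2.store(s1)` before bumping `s1`.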

I have been using the message-passing, producer-consumer queue-based, comms mechanism, as suggested by asveikau, for decades without any problems specifically related to multiThreading. There are some advantages:
1) The 'threadCommsClass' instances passed on the queue can often contain everything required for the thread to do its work - member/s for input data, member/s for output data, methods for the thread to call to do the work, somewhere to put any error/exception messages, and a 'returnToSender(this)' event to call, which returns everything to the requester by some thread-safe means that the worker thread does not need to know about. The worker thread then runs asynchronously on one set of fully encapsulated data that requires no locking. 'returnToSender(this)' might queue the object onto another P-C queue, it might PostMessage it to a GUI thread, it might release the object back to a pool or just dispose() it. Whatever it does, the worker thread does not need to know about it.
2) There is no need for the requesting thread to know anything about which thread did the work - all the requestor needs is a queue to push on. In an extreme case, the worker thread on the other end of the queue might serialize the data and communicate it to another machine over a network, only calling returnToSender(this) when a network reply is received - the requestor does not need to know this detail - only that the work has been done.
3) It is usually possible to arrange for the 'threadCommsClass' instances and the queues to outlive both the requester thread and the worker thread. This greatly eases those problems when the requester or worker is terminated and dispose()'d before the other - since they share no data directly, there can be no AV/whatever. This also blows away all those 'I can't stop my work thread because it's stuck on a blocking API' issues - why bother stopping it if it can just be orphaned and left to die with no possibility of writing to something that is freed?
4) A threadpool reduces to a one-line for loop that creates several work threads and passes them the same input queue.
5) Locking is restricted to the queues. The more mutexes, condVars, critical sections and other synchro locks there are in an app, the more difficult it is to control it all and the greater the chance of an intermittent deadlock that is a nightmare to debug. With queued messages, (ideally), only the queue class has locks. The queue class must work 100% with multiple producers/consumers, but that's one class, not an app full of uncoordinated locking, (yech!).
6) A threadCommsClass can be raised anytime, anywhere, in any thread and pushed onto a queue. It's not even necessary for the requester code to do it directly, eg. a call to a logger class method, 'myLogger.logString("Operation completed successfully");' could copy the string into a comms object, queue it up to the thread that performs the log write and return 'immediately'. It is then up to the logger class thread to handle the log data when it dequeues it - it may write it to a log file, it may find after a minute that the log file is unreachable because of a network problem. It may decide that the log file is too big, archive it and start another one. It may write the string to disk and then PostMessage the threadCommsClass instance on to a GUI thread for display in a terminal window, whatever. It doesn't matter to the log requesting thread, which just carries on, as do any other threads that have called for logging, without significant impact on performance.
7) If you do need to kill off a thread waiting on a queue, rather than waiting for the OS to kill it on app close, just queue it a message telling it to terminate.
There are surely disadvantages:
1) Shoving data directly into thread members, signaling it to run and waiting for it to finish is easier to understand and will be faster, assuming that the thread does not have to be created each time.
2) Truly asynchronous operation, where the thread is queued some work and, sometime later, returns it by calling some event handler that has to communicate the results back, is more difficult to handle for developers used to single-threaded code and often requires state-machine type design where context data must be sent in the threadCommsClass so that the correct actions can be taken when the results come back. If there is the occasional case where the requestor just has to wait, it can send an event in the threadCommsClass that gets signaled by the returnToSender method, but this is obviously more complex than simply waiting on some thread handle for completion.
Whatever design is used, forget the simple global variables as other posters have said. There is a case for some global types in thread comms - one I use very often is a thread-safe pool of threadCommsClass instances, (this is just a queue that gets pre-filled with objects). Any thread that wishes to communicate has to get a threadCommsClass instance from the pool, load it up and queue it off. When the comms is done, the last thread to use it releases it back to the pool. This approach prevents runaway new(), and allows me to easily monitor the pool level during testing without any complex memory-managers, (I usually dump the pool level to a status bar every second with a timer). Leaking objects, (level goes down), and double-released objects, (level goes up), are easily detected and so get fixed.
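A minimal sketch of such a pre-filled pool, assuming a mutex-protected free list is an acceptable "thread-safe queue" and that an empty pool should return `nullptr` rather than allocate (which is what stops runaway `new`). `CommsObject`, its fields and `CommsPool` are hypothetical; `level()` is the number you would dump to a status bar.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical message object passed between threads.
struct CommsObject {
    std::string input;
    std::string output;
    std::string error;
};

class CommsPool {
public:
    explicit CommsPool(std::size_t n) {
        for (std::size_t k = 0; k < n; ++k) storage_.emplace_back(new CommsObject);
        for (auto& p : storage_) free_.push_back(p.get());
    }

    // Returns nullptr when the pool is empty instead of allocating more.
    CommsObject* acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (free_.empty()) return nullptr;
        CommsObject* obj = free_.front();
        free_.pop_front();
        return obj;
    }

    // Called by whichever thread is the last to use the object.
    void release(CommsObject* obj) {
        *obj = CommsObject();  // clear stale data; the caller still owns obj here
        std::lock_guard<std::mutex> lock(m_);
        free_.push_back(obj);
    }

    // A falling level over time suggests leaks; a level above the
    // initial size suggests double releases.
    std::size_t level() const {
        std::lock_guard<std::mutex> lock(m_);
        return free_.size();
    }

private:
    mutable std::mutex m_;
    std::deque<CommsObject*> free_;
    std::vector<std::unique_ptr<CommsObject>> storage_;  // owns every object
};
```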
MultiThreading can be safe and deliver scalable, high-performance apps that are almost a pleasure to maintain/enhance, (almost:), but you have to lay off the simple globals - treat them like Tequila - quick and easy high for now but you just know they'll blow your head off tomorrow.
Good luck!
Martin

Global variables are bad to begin with, and even worse with multi-threaded programming. Instead, the creator of the thread should allocate some sort of context object that's passed to pthread_create, which contains whatever buffers, locks, condition variables, queues, etc. are needed for passing information to and from the thread.

You will need to build this yourself. The most typical approach requires some cooperation from the other thread, as it would be a bit of a weird interface to "interrupt" a running thread with some data and code to execute on it... That would also have some of the same trickiness as POSIX signals or IRQs, where it's easy to shoot yourself in the foot while handling them if you haven't carefully thought it through... (Simple example: You can't call malloc inside a signal handler because you might be interrupted in the middle of malloc, so you might crash while accessing malloc's internal data structures, which are only partially updated.)
The typical approach is to have your thread creation routine basically be an event loop. You can build a queue structure and pass that as the argument to the thread creation routine. Then other threads can enqueue things and the thread's event loop will dequeue it and process the data. Note this is cleaner than a global variable (or global queue) because it can scale to have multiple of these queues.
You will need some synchronization on that queue data structure. Entire books could be written about how to implement your queue structure's synchronization, but the most simple thing would have a lock and a semaphore. When modifying the queue, threads take a lock. When waiting for something to be dequeued, consumer threads would wait on a semaphore which is incremented by enqueuers. It's also a good idea to implement some mechanism to shut down the consumer thread.
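A sketch of that event-loop thread in C++11. It substitutes a `std::condition_variable` for the semaphore (same role: consumers sleep until enqueuers signal), uses a `closed` flag as the shutdown mechanism, and passes the queue to the thread as its argument instead of making it global. `WorkQueue`, `consumerLoop` and the `int` payload are placeholders.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push_back(item);
        }
        cv_.notify_one();  // plays the role of incrementing the semaphore
    }

    // Shutdown: wakes every consumer; they drain what is left, then stop.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; returns false once closed and empty.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> items_;
    bool closed_ = false;
};

// The thread's event loop: process items until the queue is closed.
void consumerLoop(WorkQueue* q, long* total) {
    int item;
    while (q->pop(item)) *total += item;
}
```

Usage: `std::thread worker(consumerLoop, &q, &total);`, any number of `q.push(...)` calls from other threads, then `q.close(); worker.join();`.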

Cleaning up threads referencing an object when deleting the object (in C++)

I have an object (Client * client) which starts multiple threads to handle various tasks (such as processing incoming data). The threads are started like this:
// Start the thread that will process incoming messages and stuff them into the appropriate queues.
mReceiveMessageThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)receiveRtpMessageFunction, this, 0, 0);
These threads all have references back to the initial object, like so:
// Thread initialization function for receiving RTP messages from a newly connected client.
static int WINAPI receiveRtpMessageFunction(LPVOID lpClient)
{
LOG_METHOD("receiveRtpMessageFunction");
Client * client = (Client *)lpClient;
while(client ->isConnected())
{
if(client ->receiveMessage() == ERROR)
{
Log::log("receiveRtpMessageFunction Failed to receive message");
}
}
return SUCCESS;
}
Periodically, the Client object gets deleted (for various good and sufficient reasons). But when that happens, the processing threads that still have references to the (now deleted) object throw exceptions of one sort or another when trying to access member functions on that object.
So I'm sure that there's a standard way to handle this situation, but I haven't been able to figure out a clean approach. I don't want to just terminate the thread, as that doesn't allow for cleaning up resources. I can't set a property on the object, as it's precisely properties on the object that become inaccessible.
Thoughts on the best way to handle this?

I would solve this problem by introducing a reference count to your object. The worker thread would hold a reference and so would the creator of the object. Instead of using delete, you decrement from the reference count and whoever drops the last reference is the one that actually calls delete.
You can use existing reference counting mechanisms (shared_ptr etc.), or you can roll your own with the Win32 APIs InterlockedIncrement() and InterlockedDecrement() or similar (maybe the reference count is a volatile LONG starting out at 1...).
The only other thing that's missing is that when the main thread releases its reference, it should signal to the worker thread to drop its own reference. One way you can do this is by an event; you can rewrite the worker thread's loop as calls to WaitForMultipleObjects(), and when a certain event is signalled, you take that to mean that the worker thread should clean up and drop the reference.
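A sketch of this answer with standard C++ pieces swapped in: `std::shared_ptr` does the reference counting instead of hand-rolled `InterlockedIncrement`/`InterlockedDecrement`, and a `std::atomic<bool>` stop flag replaces the Win32 event and `WaitForMultipleObjects()`. The worker holds its own `shared_ptr`, so the `Client` is destroyed only by whoever drops the last reference. `Client`'s members, including the demo-only `destroyed` counter, are hypothetical stand-ins.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

class Client {
public:
    ~Client() { ++destroyed; }
    bool isConnected() const { return !stopRequested.load(); }
    void receiveMessage() {
        ++received;  // stand-in for real I/O
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    std::atomic<bool> stopRequested{false};  // the "signal" instead of an event
    std::atomic<int> received{0};
    static std::atomic<int> destroyed;       // demo-only counter
};
std::atomic<int> Client::destroyed{0};

// The worker owns a reference, so the Client cannot be freed under it.
// Its reference is dropped when this function returns.
void receiveLoop(std::shared_ptr<Client> client) {
    while (client->isConnected()) client->receiveMessage();
}
```

The creator signals and then releases its own reference (`c->stopRequested = true; c.reset();`); the object is deleted when the later of the two references goes away.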

You don't have much leeway because of the running threads.
No combination of shared_ptr + weak_ptr may save you... you may call a method on the object when it's valid and then order its destruction (using only shared_ptr would).
The only thing I can imagine is to first terminate the various threads and then destroy the object. This way you ensure that each thread terminates gracefully, cleaning up its own mess if necessary (and it might need the object to do that).
This means that you cannot delete the object out of hand, since you must first resynchronize with those who use it, and that you need some event handling for the synchronization part (since you basically want to tell the threads to stop, and not wait indefinitely for them).
I leave the synchronization part to you, there are many alternatives (events, flags, etc...) and we don't have enough data.
You can deal with the actual cleanup from either the destructor itself or by overloading the various delete operations, whichever suits you.
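A sketch of "stop the threads first, then destroy", with the cleanup done in the destructor: it raises a stop flag and joins every worker before any member is torn down, so no thread can touch a dead object. A plain atomic flag is just one of the synchronization options mentioned above; the flag, the loop body and the demo-only `exitedThreads` counter are hypothetical. A real receive loop that blocks in a socket call would also need the socket closed or a timeout so the thread actually gets to check the flag.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

class Client {
public:
    explicit Client(int workers) {
        // Threads start in the constructor body, after every member exists.
        for (int k = 0; k < workers; ++k)
            threads_.emplace_back(&Client::receiveLoop, this);
    }

    // Resynchronize with every thread that uses the object before it goes away.
    ~Client() {
        running_ = false;                          // 1. ask the threads to stop
        for (std::thread& t : threads_) t.join();  // 2. wait until they have
    }                                              // 3. only now are members destroyed

    static std::atomic<int> exitedThreads;  // demo-only counter

private:
    void receiveLoop() {
        while (running_) {
            // stand-in for receiveMessage(); must return periodically to see the flag
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // per-thread cleanup may still use the object safely here
        ++exitedThreads;
    }

    std::atomic<bool> running_{true};
    std::vector<std::thread> threads_;
};
std::atomic<int> Client::exitedThreads{0};
```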

You'll need to have some other state object the threads can check to verify that the "client" is still valid.
One option is to encapsulate your client reference inside some other object that remains persistent, and provide a reference to that object from your threads.

You could use the observer pattern with proxy objects for the client in the threads. The proxies act like smart pointers, forwarding access to the real client. When you create them, they register themselves with the client, so that it can invalidate them from its destructor. Once they're invalidated, they stop forwarding and just return errors.

This could be handled by passing a (boost) weak pointer to the threads.
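A sketch of the weak-pointer approach with `std::weak_ptr` (`boost::weak_ptr` offers the same `lock()`/`expired()` calls): each iteration promotes the weak pointer to a `shared_ptr`, which keeps the client alive for that iteration only, and the thread exits once the owner has released the client. This makes deletion safe for the thread; it does not by itself give the thread a chance to clean up early. `Client` and this `receiveLoop` signature are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stub; `handled` is touched only by the worker thread here.
struct Client {
    int handled = 0;
    void receiveMessage() { ++handled; }
};

// Returns how many iterations ran before the client disappeared.
int receiveLoop(std::weak_ptr<Client> weak) {
    int iterations = 0;
    for (;;) {
        std::shared_ptr<Client> client = weak.lock();  // empty once the owner let go
        if (!client) break;
        client->receiveMessage();                     // safe: we hold a reference
        ++iterations;
        client.reset();  // don't keep the client alive while sleeping
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return iterations;
}
```

The owner keeps the only long-lived `shared_ptr`; resetting it (instead of calling `delete`) is how the client is "deleted" in this design.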
