Shared memory - need for synchronization

I've seen a project where communication between processes was made using shared memory (e.g. using ::CreateFileMapping under Windows) and every time one of the processes wanted to notify that some data is available in shared memory, a synchronization mechanism using named events notified the interested party that the content of the shared memory changed.
I am concerned that the appropriate memory fences are not present, so the process that reads the new information may not know that it has to invalidate its copy of the data and read it from main memory once it is "published" by the producer process.
Do you know how can this be accomplished on Windows using shared memory?
EDIT
Just wanted to add that after creating the file mapping, each process calls the MapViewOfFile() API only once, and every new modification to the shared data is read through the pointer obtained from that initial call to MapViewOfFile(). Does correct synchronization require the reading process to call MapViewOfFile() again every time the data in shared memory changes?

If you use a Windows Named Event for signaling changes, then everything should be OK.
Process A changes the data and calls SetEvent.
Process B waits for the event using WaitForSingleObject or similar, and sees that it is set.
Process B then reads the data. WaitForSingleObject contains all the necessary synchronization to ensure that the changes made by process A before the call to SetEvent are read by process B.
Of course, if you make any changes to the data after calling SetEvent, then these may or may not show up when process B reads the data.
If you don't want to use Events, you could use a Mutex created with CreateMutex, or you could write lock-free code using the Interlocked... functions such as InterlockedExchange and InterlockedIncrement.
However you do the synchronization, you do not need to call MapViewOfFile more than once.

What you're looking for for shared memory on windows is the InterlockedExchange function. See the msdn article here. The REALLY important part is quoted:
This function generates a full memory barrier (or fence) to ensure
that memory operations are completed in order.
This will function cross-process. I've worked with it before, and found it 100% reliable for implementing a mutex-like construct on top of shared memory.
The way to do that is to exchange the lock value with the "set" value. If you get "clear" back, you have it (it was clear), but if you get "set" back, then somebody else had it. You loop, sleeping between attempts, until you "get" it. Basically this:
#define LOCK_SET 1
#define LOCK_CLEAR 0
volatile LONG* lock_location = LOCK_LOCATION; // must point into shared memory; InterlockedExchange takes a LONG volatile*
if (InterlockedExchange(lock_location, LOCK_SET) == LOCK_CLEAR)
{
return true; // got the lock
}
else
{
return false; // didn't get the lock
}
As above, and loop until you "get" it.
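For readers who want to try the pattern outside Windows, here is a minimal sketch of the same exchange-based lock using std::atomic in place of InterlockedExchange. The names try_lock, lock_blocking and unlock are made up for this illustration. It is shown for threads in one process; placing a std::atomic<int> in shared memory for cross-process use additionally assumes the type is lock-free on the target platform.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

constexpr int LOCK_CLEAR = 0;
constexpr int LOCK_SET = 1;

// One attempt: atomically store LOCK_SET and inspect the previous value.
// The default ordering (sequentially consistent) mirrors the full barrier
// that the quoted documentation describes.
bool try_lock(std::atomic<int>& lock)
{
    return lock.exchange(LOCK_SET) == LOCK_CLEAR;
}

// Loop until the lock is obtained, sleeping between attempts.
void lock_blocking(std::atomic<int>& lock)
{
    while (!try_lock(lock)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Release the lock so that another party's exchange gets LOCK_CLEAR back.
void unlock(std::atomic<int>& lock)
{
    lock.store(LOCK_CLEAR);
}
```

The first caller of try_lock gets true, every later caller gets false until unlock is called.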

Let's call process A the data producer and process B the data consumer. So far, you have a mechanism for process A to notify process B that new data has been produced. I suggest you create a reverse notification (from B to A) which tells process A that the data has been consumed. If, for performance reasons, you don't want process A to wait for the data to be consumed, you could set up a ring buffer in the shared memory.
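A rough sketch of what such a ring buffer might look like, assuming exactly one producer and one consumer. SpscRing is a hypothetical name, and the sketch is written for two threads; laying it out in a shared memory mapping would further assume a trivially copyable T and lock-free atomics.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer.
// One slot is always left empty, so it holds at most N - 1 items.
template <typename T, std::size_t N>
class SpscRing {
public:
    // Producer side: returns false when the buffer is full.
    bool push(const T& value)
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false; // full: the consumer has not caught up yet
        }
        slots_[head] = value;
        head_.store(next, std::memory_order_release); // publish the slot
        return true;
    }

    // Consumer side: returns false when the buffer is empty.
    bool pop(T& out)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false; // empty
        }
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release); // free the slot
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0}; // next slot to write
    std::atomic<std::size_t> tail_{0}; // next slot to read
};
```

With this layout the producer only blocks (or drops, or retries) when the consumer falls a full buffer behind, which is the performance benefit the answer is hinting at.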

Related

How to minimize the mutex locking for an object when only 1 thread mostly uses that object and the other thread(s) use it rarely?

Scenario
Suppose there are "Thread_Main" and "Thread_DB", with a shared SQLite database object. It's guaranteed that,
"Thread_main" seldom uses SQLite object for reading (i.e. SELECT())
"Thread_DB" uses the SQLite object most of the time for various INSERT, UPDATE, DELETE operations
To avoid data races and UB, SQLite should be compiled with SQLITE_THREADSAFE=1 (default) option. That means, before every operation, an internal mutex will be locked, so that DB is not writing when reading and vice versa.
"Thread_Main"   "Thread_DB"   no. of operation on DB
=============   ===========   ======================
something       INSERT        1
something       UPDATE        2
something       DELETE        3
something       INSERT        4
...             ...           ... (collapsed)
something       INSERT        500
something       DELETE        501
...             ...           ... (collapsed)
something       UPDATE        1000
something       UPDATE        1001
...             ...           ... (collapsed)
SELECT          INSERT        1200 <--- here is a serious requirement of mutex
...             ...           ... (collapsed)
Problem
As seen above, out of hundreds of operations, the mutex is really needed only once in a while. However, to safeguard that rare situation, we have to lock it for every operation.
Question: Is there a way for "Thread_DB" to hold the mutex most of the time, so that locking is not required on every operation? The lock/unlock would then happen only when "Thread_Main" requests it.
Notes
One way is to queue up the SELECT in the "Thread_DB". But in larger scenario with several DBs running, this will slow down the response and it won't be real time. Can't keep the main thread waiting for it.
I also considered having an integer/boolean variable set by "Thread_Main" to indicate that "Thread_Main" wants to SELECT. If any operation is running in "Thread_DB" at that time, it can unlock the mutex. This is fine. But if no write operation is running on that SQLite object, then "Thread_Main" will keep waiting, as there is no one in "Thread_DB" to unlock. This will again delay or even hang "Thread_Main".

Here's a suggestion: modify your program somewhat so that Thread_Main has no access to the shared object; only Thread_DB is able to access it. Once you've done that, you won't need to do any serialization at all, and Thread_DB can work at full efficiency.
Of course the fly in the ointment is that Thread_Main does sometimes need to interact with the DB object; how can it do that if it doesn't have any access to it?
The solution to that issue is message-passing. When Thread_Main needs to do something with the DB, it should pass a Message object of some sort to Thread_DB. The Message object should contain all the details necessary to characterize the desired interaction. When Thread_DB receives the Message object, Thread_DB can call its execute(SQLite & db) method (or whatever you want to call it), at which point the necessary data insertion/extraction can occur from within the context of the Thread_DB thread. When the interaction has completed, any results can be stored inside the Message object and the Message object can then be passed back to the main thread for the main thread to deal with the results. (the main thread can either block waiting for the Message to be sent back, or continue to operate asynchronously to the DB thread, it's up to you)
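A minimal sketch of that message-passing arrangement, under several assumptions: FakeDb is a hypothetical stand-in for the SQLite object, DbThread and post are invented names, and a std::packaged_task plays the role of the Message object with its execute method. The future returned by post lets the main thread either block for the result or check back later.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SQLite object; only the DB thread touches it.
struct FakeDb {
    std::vector<int> rows;
};

// A message: work to run against the database on the DB thread.
using Message = std::packaged_task<int(FakeDb&)>;

class DbThread {
public:
    DbThread() : worker_([this] { run(); }) {}

    ~DbThread()
    {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Callable from any thread; the future receives the result later.
    std::future<int> post(std::function<int(FakeDb&)> work)
    {
        Message msg(std::move(work));
        std::future<int> result = msg.get_future();
        {
            std::lock_guard<std::mutex> guard(mutex_);
            inbox_.push(std::move(msg));
        }
        wake_.notify_one();
        return result;
    }

private:
    void run()
    {
        FakeDb db; // owned exclusively by this thread, so no DB-level locking
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            wake_.wait(lock, [this] { return stopping_ || !inbox_.empty(); });
            if (inbox_.empty()) {
                return; // stopping and nothing left to do
            }
            Message msg = std::move(inbox_.front());
            inbox_.pop();
            lock.unlock();
            msg(db); // execute the message in the DB thread's context
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<Message> inbox_;
    bool stopping_ = false;
    std::thread worker_; // declared last so the other members exist first
};
```

The only lock left is the one around the inbox, and it is held just long enough to push or pop a message, never for the duration of a database operation.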

Waiting for Memory Value to Change

I have two separate processes, a client and server process. They are linked using shared memory.
A client will begin his response by first altering a certain part of the shared memory to the input value and then flipping a bit indicating that the input is valid and that the value has not already been computed.
The server waits for a kill signal, or new data to come in. Right now the relevant server code looks like so:
while(!((*metadata)&SERVER_KILL)){
    //while no kill signal
    bool valid_client = ((*metadata)&CLIENT_REQUEST_VALID)==CLIENT_REQUEST_VALID;
    bool not_already_finished = ((*metadata)&SERVER_RESPONSE_VALID)!=SERVER_RESPONSE_VALID;
    if(valid_client && not_already_finished){
        *int2 = sqrt(*int1);
        *metadata = *metadata | SERVER_RESPONSE_VALID;
        //place square root of input in memory, set
        //metadata to indicate value has been found
    }
}
The problem with this is that the while loop takes up too many resources.
Most solutions to this problem are usually with a multithreaded application in which case you can use condition variables and mutexes to control the progression of the server process. Since these are single threaded applications, this solution is not applicable. Is there a lightweight solution that allows for waiting for these memory locations to change all while not completely occupying a hardware thread?

You can poll or block... You can also wait for an interrupt, but that would probably also entail some polling or blocking.
Is message-passing on the table? That would allow you to block. Maybe a socket?
You can also send a signal from one process to another. You would write an interrupt handler for the receiving process.
Note that when an interrupt handler runs, it preempts the process' thread of execution. In other words, the main thread is paused while the handler runs. So your interrupt handler shouldn't grab a lock if there's a chance that the lock is already held, as it will create a deadlock situation. You can avoid this by using a re-entrant lock or one of the special lock types that disables interrupts before grabbing the lock. Things that grab locks: mutex.lock (obviously), I/O, allocating memory, condition.signal (much less obvious).
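One common way to stay clear of the deadlock described above is to have the handler do nothing except set a flag. The sketch below assumes SIGTERM as the notification signal and uses made-up names (g_wakeup_requested, on_wakeup, install_wakeup_handler, take_wakeup); it is an illustration of the idea, not a complete server loop.

```cpp
#include <csignal>

// Written by the handler, read by the main flow. volatile sig_atomic_t is a
// type the standard allows a signal handler to write.
volatile std::sig_atomic_t g_wakeup_requested = 0;

// The handler takes no locks, allocates nothing, and does no I/O:
// it only records that the signal arrived.
extern "C" void on_wakeup(int)
{
    g_wakeup_requested = 1;
}

// Install the handler; returns false if installation failed.
bool install_wakeup_handler()
{
    return std::signal(SIGTERM, on_wakeup) != SIG_ERR;
}

// Called from the main flow: consume the flag and report whether it was set.
bool take_wakeup()
{
    if (g_wakeup_requested == 0) {
        return false;
    }
    g_wakeup_requested = 0;
    return true;
}
```

The main flow can then do any locking, I/O or allocation it needs after take_wakeup returns true, outside the handler.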

Cross-Process Mutex Read/Write Locking

I'm trying to make inter-process communication in C/C++ on Windows environment.
I am creating a shared memory page file and two processes get the handle to that file. It's like this:
Process1: Initialize shared memory area. Wait for Process2 to fill it.
Process2: Get handle to shared memory area. Put stuff in it.
I am creating a named mutex in process1 as well. Now process1 acquires ownership of the mutex soon after creating it (using WaitForSingleObject). Obviously, there is nothing in the memory area so I need to release the mutex. Now I need to wait until the memory is filled instead of trying to acquire the mutex again.
I was thinking of condition variables. Process2 signals the condition variable once it fills in the memory area and process1 will acquire the information immediately.
However, as per MS Documentation on Condition Variables, they are not shared across processes which is clear from their initialization as they are not named.
Furthermore, the shared memory area can hold up to one element at any given moment which means process2 cannot refill after filling it unless process1 extracts its information.
From the given description it's clear that condition variables are the best for this purpose (or Monitors). So is there a way around this?

Condition variables can be used within a process, but not across processes.
Try a named pipe with PIPE_ACCESS_DUPLEX as the open mode, so that both processes can send and receive.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365150(v=vs.85).aspx

I have used events for this before. Use two named auto-reset events: one "data ready" event and one "buffer ready" event. The writer waits for buffer ready, writes the data and sets the data ready event. The reader waits for the data ready event, reads the memory and sets the buffer ready event. If done properly you should not need the mutex.
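The handshake can be tried out in portable C++ before wiring it up with real named events. In this sketch, AutoResetEvent is a hypothetical in-process imitation of a Windows auto-reset event built from a mutex and a condition variable, two threads stand in for the two processes, and a plain int stands in for the shared memory area.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical in-process analogue of a Windows auto-reset event.
class AutoResetEvent {
public:
    explicit AutoResetEvent(bool initially_set) : set_(initially_set) {}

    void set()
    {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            set_ = true;
        }
        cv_.notify_one();
    }

    // Blocks until the event is set, then resets it automatically.
    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return set_; });
        set_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool set_;
};

// Sends `count` values through a one-element buffer and returns what the
// reader received, in order.
std::vector<int> run_handshake(int count)
{
    int buffer = 0;                     // stands in for the shared memory
    AutoResetEvent buffer_ready(true);  // the buffer starts out empty
    AutoResetEvent data_ready(false);
    std::vector<int> received;

    std::thread writer([&] {
        for (int i = 1; i <= count; ++i) {
            buffer_ready.wait(); // wait until the reader has emptied the buffer
            buffer = i;
            data_ready.set();    // tell the reader there is something to read
        }
    });
    std::thread reader([&] {
        for (int i = 0; i < count; ++i) {
            data_ready.wait();
            received.push_back(buffer);
            buffer_ready.set();  // hand the buffer back to the writer
        }
    });
    writer.join();
    reader.join();
    return received;
}
```

Because the buffer is only ever touched by whichever side was last signalled, the two events alone serialize access and no separate mutex around the buffer is needed.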

safely distributing a pointer update between threads

tl;dr:
class Controller
{
public:
    volatile Netconsole* nc;
    void init(); //initialize the threads
    void calculate(); // handler for the "mothership app"
    void senderThreadLoop(); //also calls reinitNet() if connection is broken.
    void listenerThreadLoop();
    inline void reinitNet(){ delete nc; nc = new Netconsole(); }
};
// inside
Json::Value header = nc->Recv();
error: passing 'volatile Netconsole' as 'this' argument discards qualifiers [-fpermissive]
Pointer to an instance of a utility class (Netconsole) shared between two threads must be updated inside both threads if the utility class is re-instantiated, but declaring it as volatile generates the above error. If it's updated just inside one thread, the other thread may still use old, invalid pointer. How to assure it's updated in both but using methods through the pointer doesn't trigger the above error?
Extended info:
The "smart glue logic" library I'm writing is used to pass and convert messages between a 3rd party software and a custom device. It consists of three essential threads:
a handler: the main thread of the 3rd party app periodically calls a "calculate" function in my library to handle new updates - data to send, data received
a sender thread that converts and sends whatever the handler pushed into the send buffer
a listener thread that converts and pushes any data received from the device into receive buffer.
Both the sender and the listener threads use the same utility class that handles network communication with the device; upon initialization the class creates a connection to the device, and the two threads perform blocking reads or await for new data to send respectively. In case of any problems, the sender thread performs all "maintenance" work, while the listener thread enters a safe state awaiting return of connectivity.
Now, since the two threads share one connection to the device, they both share the same instance of the communication class, as a pointer to that class.
The problem is in the procedure of reconnect - it involves destroying and creating the helper class instance exploiting safe shutdown and initialization already present in the destructor and constructor. As result the pointer changes. Without volatile it's quite likely the listener won't receive the updated pointer. With volatile, it protests - needlessly, because nc (the pointer) won't change at a random moment - first the listener is notified of a problem, then it enters a safe state where it doesn't perform any operations on 'nc' and notifies the sender it's ready. Only then the sender performs the repair and notifies the listener to resume normal operation.
So what's the right solution in this situation?

What you need is a sequence of operations. The producing thread has 2 relevant operations : "initialize new Netconsole" and "write pointer". The consuming thread also has two operations: "read pointer" and "use new Netconsole object". Those 4 operations must be sequenced in exactly that order for the update to be visible.
By far the simplest way to achieve this is with two memory barriers. A write barrier (std::memory_order_release on the pointer write) prevents the first two operations from being reordered, and a read barrier (std::memory_order_acquire on the pointer load) prevents the last two operations from being reordered.
As the two threads run independently, your program correctness shouldn't depend on whether a particular object update happened before a particular object use. The updating thread might just have been a bit slow, and that should not break your program. So the third ordering between write and read isn't really relevant and you shouldn't try to "fix" it.
To summarize: Yes, the 4 operations have to happen in exactly the right order for the result to be visible, but if the second and third operations are reordered then the update is simply invisible to the consuming thread. It's an atomic update, all or nothing.
There's still a matter of cleaning up the old object. The producing thread cannot just assume that the consuming thread has already seen the pointer update. There must be synchronization to ensure both threads agree that the old object is unused. The easiest is if the producing thread strictly does not use the old object after the new object has been created (the memory barrier helps here), and the consuming thread cleans up the old object as soon as it knows there's a new object (because that happens strictly after the read barrier, thus after the write barrier and in turn after the last use by the producing thread)
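A small sketch of the release/acquire pairing described in this answer. Connection is a hypothetical stand-in for Netconsole, and g_conn, publish and current are invented names. The key change from the question's code is that the pointer is a std::atomic rather than volatile.

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for the Netconsole class from the question.
struct Connection {
    std::string peer;
    int generation;
};

// The shared pointer. Making the pointer itself atomic (rather than
// volatile) is what provides the ordering guarantees.
std::atomic<Connection*> g_conn{nullptr};

// Producer: the object must be fully constructed before this call.
// The release half of the exchange publishes that initialization.
// Returns the previous pointer so the caller decides when to delete it.
Connection* publish(Connection* fresh)
{
    return g_conn.exchange(fresh, std::memory_order_acq_rel);
}

// Consumer: the acquire load guarantees that, if the new pointer is
// seen, the object's initialization is seen as well.
Connection* current()
{
    return g_conn.load(std::memory_order_acquire);
}
```

Member functions can be called through the loaded pointer without the "discards qualifiers" error, since the pointee is no longer volatile-qualified. Deciding when the old object returned by publish may be deleted still needs the agreement between threads that the answer describes.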

how to pass data to running thread

When using pthread, I can pass data at thread creation time.
What is the proper way of passing new data to an already running thread?
I'm considering making a global variable and make my thread read from that.
Thanks

That will certainly work. Basically, threads are just lightweight processes that share the same memory space. Global variables, being in that memory space, are available to every thread.
The trick is not with the readers so much as the writers. If you have a simple chunk of global memory, like an int, then assigning to that int will probably be safe in practice (though strictly speaking an unsynchronized write that another thread reads is still a data race). But consider something a little more complicated, like a struct. Just to be definite, let's say we have
struct S { int a; float b; } s1, s2;
Now s1,s2 are variables of type struct S. We can initialize them
s1 = { 42, 3.14f };
and we can assign them
s2 = s1;
But when we assign them the processor isn't guaranteed to complete the assignment to the whole struct in one step -- we say it's not atomic. So let's now imagine two threads:
thread 1:
while (true){
printf("{%d,%f}\n", s2.a, s2.b );
sleep(1);
}
thread 2:
while(true){
sleep(1);
s2 = s1;
s1.a += 1;
s1.b += 3.14f ;
}
We can see that we'd expect s2 to have the values {42, 3.14}, {43, 6.28}, {44, 9.42} ....
But what we see printed might be anything like
{42,3.14}
{43,3.14}
{43,6.28}
or
{43,3.14}
{44,6.28}
and so on. The problem is that thread 1 may get control and "look at" s2 at any time during that assignment.
The moral is that while global memory is a perfectly workable way to do it, you need to take into account the possibility that your threads will cross over one another. There are several solutions to this, with the basic one being to use semaphores. A semaphore has two operations, confusingly named from Dutch as P and V.
P simply waits until a variable is greater than 0 and then goes on, subtracting 1 from the variable; V adds 1 to the variable. (For the use here, the variable starts at 1.) The only thing special is that they do this atomically -- they can't be interrupted.
Now, write your code as
thread 1:
while (true){
P();
printf("{%d,%f}\n", s2.a, s2.b );
V();
sleep(1);
}
thread 2:
while(true){
sleep(1);
P();
s2 = s1;
V();
s1.a += 1;
s1.b += 3.14f ;
}
and you're guaranteed that you'll never have thread 2 half-completing an assignment while thread 1 is trying to print.
(Pthreads has semaphores, by the way.)
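For concreteness, here is roughly how P and V map onto POSIX semaphores (sem_wait and sem_post). This is a sketch assuming a platform that supports unnamed semaphores via sem_init, such as Linux; count_torn_reads is a made-up test harness that keeps a and b in step so that a half-completed assignment would be detectable.

```cpp
#include <semaphore.h>
#include <thread>

struct S { int a; float b; };

// Runs a writer and a reader over a shared struct guarded by a POSIX
// semaphore and returns how many torn (half-updated) reads were seen.
int count_torn_reads(int iterations)
{
    S shared{0, 0.0f};
    sem_t guard;
    sem_init(&guard, 0, 1); // 0: shared between threads only; initial value 1

    int torn = 0;
    std::thread writer([&] {
        for (int i = 1; i <= iterations; ++i) {
            sem_wait(&guard);                     // P
            shared = S{i, static_cast<float>(i)}; // a and b always match
            sem_post(&guard);                     // V
        }
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            sem_wait(&guard);                     // P
            const S copy = shared;
            sem_post(&guard);                     // V
            if (copy.b != static_cast<float>(copy.a)) {
                ++torn;                           // only the reader writes torn
            }
        }
    });
    writer.join();
    reader.join();
    sem_destroy(&guard);
    return torn;
}
```

With the semaphore in place the function should report zero torn reads, because the reader can never copy the struct while the writer is in the middle of assigning it.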

I have been using the message-passing, producer-consumer queue-based, comms mechanism, as suggested by asveikau, for decades without any problems specifically related to multiThreading. There are some advantages:
1) The 'threadCommsClass' instances passed on the queue can often contain everything required for the thread to do its work - member/s for input data, member/s for output data, methods for the thread to call to do the work, somewhere to put any error/exception messages and a 'returnToSender(this)' event to call so returning everything to the requester by some thread-safe means that the worker thread does not need to know about. The worker thread then runs asynchronously on one set of fully encapsulated data that requires no locking. 'returnToSender(this)' might queue the object onto another P-C queue, it might PostMessage it to a GUI thread, it might release the object back to a pool or just dispose() it. Whatever it does, the worker thread does not need to know about it.
2) There is no need for the requesting thread to know anything about which thread did the work - all the requestor needs is a queue to push on. In an extreme case, the worker thread on the other end of the queue might serialize the data and communicate it to another machine over a network, only calling returnToSender(this) when a network reply is received - the requestor does not need to know this detail - only that the work has been done.
3) It is usually possible to arrange for the 'threadCommsClass' instances and the queues to outlive both the requester thread and the worker thread. This greatly eases those problems when the requester or worker are terminated and dispose()'d before the other - since they share no data directly, there can be no AV/whatever. This also blows away all those 'I can't stop my work thread because it's stuck on a blocking API' issues - why bother stopping it if it can be just orphaned and left to die with no possibility of writing to something that is freed?
4) A threadpool reduces to a one-line for loop that creates several work threads and passes them the same input queue.
5) Locking is restricted to the queues. The more mutexes, condVars, critical-sections and other synchro locks there are in an app, the more difficult it is to control it all and the greater the chance of an intermittent deadlock that is a nightmare to debug. With queued messages, (ideally), only the queue class has locks. The queue class must work 100% with multiple producers/consumers, but that's one class, not an app full of uncoordinated locking, (yech!).
6) A threadCommsClass can be raised anytime, anywhere, in any thread and pushed onto a queue. It's not even necessary for the requester code to do it directly, eg. a call to a logger class method, 'myLogger.logString("Operation completed successfully");' could copy the string into a comms object, queue it up to the thread that performs the log write and return 'immediately'. It is then up to the logger class thread to handle the log data when it dequeues it - it may write it to a log file, it may find after a minute that the log file is unreachable because of a network problem. It may decide that the log file is too big, archive it and start another one. It may write the string to disk and then PostMessage the threadCommsClass instance on to a GUI thread for display in a terminal window, whatever. It doesn't matter to the log requesting thread, which just carries on, as do any other threads that have called for logging, without significant impact on performance.
7) If you do need to kill off a thread waiting on a queue, rather than waiting for the OS to kill it on app close, just queue it a message telling it to terminate.
There are surely disadvantages:
1) Shoving data directly into thread members, signaling it to run and waiting for it to finish is easier to understand and will be faster, assuming that the thread does not have to be created each time.
2) Truly asynchronous operation, where the thread is queued some work and, sometime later, returns it by calling some event handler that has to communicate the results back, is more difficult to handle for developers used to single-threaded code and often requires state-machine type design where context data must be sent in the threadCommsClass so that the correct actions can be taken when the results come back. If there is the occasional case where the requestor just has to wait, it can send an event in the threadCommsClass that gets signaled by the returnToSender method, but this is obviously more complex than simply waiting on some thread handle for completion.
Whatever design is used, forget the simple global variables as other posters have said. There is a case for some global types in thread comms - one I use very often is a thread-safe pool of threadCommsClass instances, (this is just a queue that gets pre-filled with objects). Any thread that wishes to communicate has to get a threadCommsClass instance from the pool, load it up and queue it off. When the comms is done, the last thread to use it releases it back to the pool. This approach prevents runaway new(), and allows me to easily monitor the pool level during testing without any complex memory-managers, (I usually dump the pool level to a status bar every second with a timer). Leaking objects, (level goes down), and double-released objects, (level goes up), are easily detected and so get fixed.
MultiThreading can be safe and deliver scaleable, high-performance apps that are almost a pleasure to maintain/enhance, (almost:), but you have to lay off the simple globals - treat them like Tequila - quick and easy high for now but you just know they'll blow your head off tomorrow.
Good luck!
Martin

Global variables are bad to begin with, and even worse with multi-threaded programming. Instead, the creator of the thread should allocate some sort of context object that's passed to pthread_create, which contains whatever buffers, locks, condition variables, queues, etc. are needed for passing information to and from the thread.

You will need to build this yourself. The most typical approach requires some cooperation from the other thread as it would be a bit of a weird interface to "interrupt" a running thread with some data and code to execute on it... That would also have some of the same trickiness as something like POSIX signals or IRQs, both of which it's easy to shoot yourself in the foot while processing, if you haven't carefully thought it through... (Simple example: You can't call malloc inside a signal handler because you might be interrupted in the middle of malloc, so you might crash while accessing malloc's internal data structures which are only partially updated.)
The typical approach is to have your thread creation routine basically be an event loop. You can build a queue structure and pass that as the argument to the thread creation routine. Then other threads can enqueue things and the thread's event loop will dequeue it and process the data. Note this is cleaner than a global variable (or global queue) because it can scale to have multiple of these queues.
You will need some synchronization on that queue data structure. Entire books could be written about how to implement your queue structure's synchronization, but the most simple thing would have a lock and a semaphore. When modifying the queue, threads take a lock. When waiting for something to be dequeued, consumer threads would wait on a semaphore which is incremented by enqueuers. It's also a good idea to implement some mechanism to shut down the consumer thread.
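A bare-bones sketch of that lock-plus-semaphore queue and the event loop around it. WorkQueue, kShutdown, event_loop and run_event_loop_demo are names invented here; the shutdown mechanism is a sentinel value, which is just one possible choice. It assumes POSIX unnamed semaphores are available and, for brevity, ignores the possibility of sem_wait being interrupted by a signal.

```cpp
#include <semaphore.h>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical work queue: a mutex protects the container, and a counting
// semaphore tracks how many items are waiting.
class WorkQueue {
public:
    WorkQueue() { sem_init(&items_, 0, 0); }
    ~WorkQueue() { sem_destroy(&items_); }

    void enqueue(int value)
    {
        {
            std::lock_guard<std::mutex> guard(lock_);
            data_.push_back(value);
        }
        sem_post(&items_); // one more item available
    }

    int dequeue()
    {
        sem_wait(&items_); // blocks while the queue is empty
        std::lock_guard<std::mutex> guard(lock_);
        const int value = data_.front();
        data_.pop_front();
        return value;
    }

private:
    std::mutex lock_;
    sem_t items_;
    std::deque<int> data_;
};

constexpr int kShutdown = -1; // sentinel value, an assumption of this sketch

// Thread routine written as an event loop; here it just sums what it receives.
void event_loop(WorkQueue& queue, long& total)
{
    for (;;) {
        const int item = queue.dequeue();
        if (item == kShutdown) {
            return;
        }
        total += item;
    }
}

// Starts the consumer, feeds it 1..10, asks it to stop, returns the sum.
long run_event_loop_demo()
{
    WorkQueue queue;
    long total = 0;
    std::thread worker(event_loop, std::ref(queue), std::ref(total));
    for (int i = 1; i <= 10; ++i) {
        queue.enqueue(i);
    }
    queue.enqueue(kShutdown);
    worker.join();
    return total;
}
```

Since the queue is passed in as an argument rather than being a global, several such consumer threads can each be given their own queue.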
