I've been getting to grips with the new Visual Studio native Concurrency Runtime (ConcRT). Is it just an oversight, or is there a valid reason that no cross-thread movement of data has move semantics? It's all copy semantics: you can't move into a concurrent queue, you can't move with asend, and you can't even move-construct concurrent queues.
I don't know this specific framework, but generally for inter-thread queues you must have copy semantics.
Imagine I create an object, take a reference/pointer to it, then move it into the queue. Then the other thread moves it out of the queue. Now both threads can access it at the same time.
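A minimal sketch of that hazard, using std::queue as a stand-in for a concurrent queue (names are illustrative):

#include <queue>
#include <string>
#include <utility>

std::queue<std::string> q;  // stand-in for a concurrent queue

void producer() {
    std::string msg = "a message long enough to defeat small-string optimization";
    const char* alias = msg.data();  // keep a pointer into msg's heap buffer
    q.push(std::move(msg));          // buffer ownership moves into the queue
    // Once a consumer thread moves the string out, it owns the very buffer
    // that 'alias' still points into: both threads can now touch it at once.
    (void)alias;
}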
I think that in the general case it is only necessary to have a copy on either add or remove, not both (i.e. only one copy is needed), e.g. copy-in move-out, but this would be semantically the same as copy-in copy-out.
There are a number of areas where rvalue support could enhance ConcRT, agents, and the PPL. Like any big software project, when you are building features that rely on other new features, there is always some risk in being able to deliver everything at once.
PPL was a major step forward but we never said it was "done". :-)
If you have particular suggestions for where ConcRT, PPL, or the Agents library should support move semantics, please open a suggestion on connect.microsoft.com.
What are some recommended strategies for future-proofing present-day C++ coding of concurrent access to std::shared_ptr(-like) and std::unique_ptr(-like) data structures, as the C++ language spec evolves in this area?
Background:
Circa 2021, available C++ language constructs for managing access to std::shared_ptr(-like) and std::unique_ptr(-like) smart pointers in concurrency-friendly ways are in flux. For example, C++20 support for std::atomic<std::shared_ptr> hasn't made it very far into compilers in the wild yet, but the C++20 spec tells us it is coming.
I'm engaged in non-trivial multi-threaded development and need to be able to pass smart pointers (both shared and unique) between threads via (hopefully lock-free) queues and use them in various thread-safe ways. I'd like to develop this code in a way that allows it to be easily upgraded to modern language features once they become available. The ideal would be to be able to do this upgrade from a central place, such as by changing the definition of a few CPP macros and coding in terms of those macros.
Does anyone know of a good strategy (A good set of CPP macros perhaps?) for future-proofing present-day concurrency code?
[ CLARIFYING EDIT after some good comments (Thanks everyone) ]
From what I gather:
Different instances of std::shared_ptr and std::unique_ptr may be read/written from different threads without an issue (as when different instances are passed in to different threads), but the object instance (or memory) they point to may NOT be safely accessed by multiple threads at the same time (so use a mutex or another method to access the pointed-to object if that is the use case). [ Thanks Alex Guteniev for that clarity ]
The SAME instance of a std::shared_ptr may be read/written by multiple threads in a safe way using, pre-C++20, std::atomic_load/std::atomic_store etc., and, post-C++20, std::atomic<std::shared_ptr> (note that C++20 adds std::atomic specializations for std::shared_ptr and std::weak_ptr, but not for std::unique_ptr). My thought is that this might be a place to use CPP MACROS, such as SHARED_GET, SHARED_SET, UNIQUE_GET, UNIQUE_SET, that'd centralize the changes you'd need to make to go from C++17 to C++20 (see the sketch after this list). [ Thanks NicolBolas for the clarity on what is actually coming in C++20. As was pointed out, the link I provided in the comments below is outdated, so be careful not to consider it fact. ]
If you are passing std::unique_ptr between threads, using std::move to pass the pointed-to memory along and using queues to enforce that only a single thread has access at any given time, then you can use both the std::unique_ptr itself AND the pointed-to memory in the thread that receives the pointer without any mutexes or other protection against resource contention.
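For the SHARED_GET/SHARED_SET idea, a minimal sketch might look like the following (macro names are from the list above, the rest of the names are invented here; the UNIQUE_* variants are omitted because C++20 has no std::atomic<std::unique_ptr>):

#include <atomic>
#include <memory>

#ifdef __cpp_lib_atomic_shared_ptr
    // C++20 path: the shared instance itself is a std::atomic<std::shared_ptr<T>>
    #define SHARED_GET(inst)       (inst).load()
    #define SHARED_SET(inst, val)  (inst).store(val)
    template <typename T> using shared_slot = std::atomic<std::shared_ptr<T>>;
#else
    // C++17 path: a plain std::shared_ptr<T> accessed via the atomic free functions
    #define SHARED_GET(inst)       std::atomic_load(&(inst))
    #define SHARED_SET(inst, val)  std::atomic_store(&(inst), (val))
    template <typename T> using shared_slot = std::shared_ptr<T>;
#endif

shared_slot<int> g_value;  // e.g. SHARED_SET(g_value, std::make_shared<int>(1));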
Because of my confusion when asking, my original question was perhaps confusing. Now I'd rephrase the question as: I'm looking for a set of CPP macro #defines that detect C++17 vs. C++20 and use that version's cleanest/correct definition for the following operations:
MAKE_LOCAL_SHARED: Create/load a local std::shared_ptr instance from a common/shared instance that the thread can read/write without contention with the original. It should point to the same memory that the common/shared one pointed to.
BEGIN_USE_SHARED_TGT: Create and hold a std::lock_guard/mutex within the scope of which the pointed-to memory of a local std::shared_ptr instance can be safely used.
END_USE_SHARED_TGT: (probably just a closing brace?) Release the std::lock_guard/mutex when done using the pointed-to memory.
BEGIN_USE_UNIQUE_TGT, END_USE_UNIQUE_TGT: (same as above, for std::unique_ptr)
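A sketch of what those macros might look like, assuming the pointed-to object is guarded by a caller-supplied std::mutex (all names below are hypothetical, and the C++17 free functions are used for the atomic pointer copy):

#include <memory>
#include <mutex>

// Local copy that points to the same memory as the common/shared instance.
#define MAKE_LOCAL_SHARED(local, shared) auto local = std::atomic_load(&(shared))

// Open a scope holding a lock_guard; the matching END_ macro closes the
// scope, releasing the lock.
#define BEGIN_USE_SHARED_TGT(mtx) { std::lock_guard<std::mutex> use_guard_(mtx);
#define END_USE_SHARED_TGT()      }

std::shared_ptr<int> g_data;  // instance shared between threads
std::mutex g_dataMutex;       // guards *g_data, not the pointer itself

void worker() {
    MAKE_LOCAL_SHARED(local, g_data);
    if (local) {
        BEGIN_USE_SHARED_TGT(g_dataMutex)
            *local += 1;      // pointed-to memory used only under the lock
        END_USE_SHARED_TGT()
    }
}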
I'm engaged in non-trivial multi-threaded development and need to be able to pass smart pointers (both shared and unique) between threads via (hopefully lock-free) queues and use them in various thread-safe ways.
When using a (lock-free) queue, you don't access produced and consumed elements at the same time.
For accessing different variables, unique_ptr and shared_ptr are already safe. When two shared_ptrs point to the same object, there is a guarantee that manipulating these shared_ptrs from different threads is thread-safe; this is usually implemented using reference counting. Different unique_ptrs don't point to the same object.
Just use shared_ptr and unique_ptr as usual if you just put them to a queue and don't really access the same variable from multiple threads at the same time.
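As a concrete (mutex-based rather than lock-free) sketch of that handoff with std::unique_ptr, where ownership of the pointed-to object travels through the queue:

#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>

std::queue<std::unique_ptr<int>> q;
std::mutex m;
std::condition_variable cv;

void producer() {
    auto p = std::make_unique<int>(42);
    {
        std::lock_guard<std::mutex> lk(m);
        q.push(std::move(p));            // producer gives up ownership here
    }
    cv.notify_one();
}

void consumer() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return !q.empty(); });
    auto p = std::move(q.front());
    q.pop();
    lk.unlock();
    *p += 1;                             // sole owner now: no locking needed
}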
I've been learning about functional programming and see that it can certainly make parallelism easier to handle, but I do not see how it makes handling shared resources easier. I've seen people talk about variable immutability being a key factor, but how does that help two threads accessing the same resource?
Say two threads are adding a request to a queue. They both get a copy of the queue, make a new copy with their request added (since the queue is immutable), and then return the new queue. The first request to return will be overridden by the second, as the copies of the queue each thread got did not have the other thread's request present.
So I assume there is a locking mechanism a la mutex available in functional languages? How then does that differ from an imperative approach to the problem? Or do practical applications of functional programming still require some imperative operations to handle shared resources?
As soon as your global data can be updated, you're breaking the pure functional paradigm. In that sense, you need some sort of imperative structure. However, this is important enough that most functional languages offer a way to do it, and you need it to be able to communicate with the rest of the world anyway. (The most complicated formal one is the IO monad of Haskell.) Apart from simple bindings to some other synchronization library, they would probably try to implement a lock-free, wait-free data structure if possible.
Some approaches include:
Data that is written only once and never altered can be accessed safely with no locks or waiting on most CPUs. (There is typically a memory fence instruction to ensure that the memory updates in the right order for both the producer and the consumer.)
Some data structures, such as a persistent list, have the property that you can tack on updates without invalidating any existing data. Let's say you have the association list [(1,'a'), (2,'b'), (3,'c')] and you want to update by changing the third entry to 'g'. If you express this as (3,'g'):originalList, then you can update the current list with the new version and keep originalList valid and unaltered. Any thread that saw it can still safely use it (see the C++ sketch following this list).
Even if you have to work around the garbage collector, each thread can make its own thread-local copy of the shared state so long as the original does not get deleted while it is being copied. The underlying low-level implementation would be a producer/consumer model that atomically updates a pointer to the state data and inserts memory-fence instructions before the update and the copy operations.
If the program has a way to compare-and-swap atomically, and the garbage collector is aware of it, each thread can use the read-copy-update pattern. A thread-aware garbage collector will keep the older data around as long as any thread is using it, and recycle it when the last thread is done with it. This should not require locking in software (for example, on modern ISAs, atomically incrementing or decrementing a word-sized counter is a single instruction, and a single compare-and-swap instruction completes without waiting on other threads).
The functional language can add an extension to call an IPC library written in some other language, and update data in place. In Haskell, this would be defined with the IO monad to ensure sequential memory consistency, but nearly every functional language has some way to exchange data with the system libraries.
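To make the sharing idea from the persistent-list point concrete in C++ terms, here is a minimal cons-list sketch (types and names invented for illustration): prepending creates a new head that shares the unchanged tail, so readers of the old list stay safe.

#include <memory>
#include <utility>

struct Node {
    std::pair<int, char> entry;
    std::shared_ptr<const Node> tail;  // shared, immutable tail
};
using List = std::shared_ptr<const Node>;

List cons(std::pair<int, char> entry, List tail) {
    return std::make_shared<const Node>(Node{entry, std::move(tail)});
}

void example() {
    // originalList = [(1,'a'), (2,'b'), (3,'c')]
    List original = cons({1, 'a'}, cons({2, 'b'}, cons({3, 'c'}, nullptr)));
    // updated = (3,'g') : originalList -- original stays valid and unaltered
    List updated = cons({3, 'g'}, original);
}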
So, a functional language does offer some guarantees that are useful for efficient concurrent programming. For example, most current ISAs impose no extra overhead on multiple reader threads when there is at most a single writer, certain consistency bugs cannot occur, and functional languages are well-suited to express this pattern.
I'm writing a generic pure-C++14 implementation of a signal class, with lifetime tracking of the connected objects and the additional benefit that all of this works with copy and move operations.
There are three types of implementations I have now discovered are possible:
Use a global connection_manager that, well, manages all connections. This would be terrible in a highly concurrent scenario where many signals and slots are connected and disconnected, and would need fine-tuned locking to alleviate some of these issues.
Each signal stores its connections, and proper move and copy semantics are implemented. This has two disadvantages: the signal object is large, and move operations are expensive (all the connections need to be updated when the signal or the connectee is moved). This burdens both sides with having proper move constructors.
Each signal stores a pointer to the real signal. This extra level of indirection makes moves a lot cheaper, as the real connected object stays put in memory behind the extra level of indirection. Same goes for the connectee, although each signal emission will need to dereference one extra pointer to get the real object. No move semantics need to be implemented explicitly.
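A rough sketch of option three (lifetime tracking omitted; a minimal illustration, not the full design): the signal object is only a handle to heap-allocated state, so moving or copying the handle never disturbs the connections themselves.

#include <functional>
#include <memory>
#include <vector>

template <typename... Args>
class signal {
    struct state {
        std::vector<std::function<void(Args...)>> slots;
    };
    std::shared_ptr<state> s = std::make_shared<state>();

public:
    void connect(std::function<void(Args...)> slot) {
        s->slots.push_back(std::move(slot));
    }
    void operator()(Args... args) const {
        for (auto& slot : s->slots)   // one extra dereference per emission
            slot(args...);
    }
    // compiler-generated move/copy of the shared_ptr handle suffices
};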
Currently, I almost have version 2 ready. I don't know exactly what Qt does, but I know it covers almost everything I want to be able to do with signals/slots, albeit with no copy/move semantics and with a pre-compilation step.
Am I missing something, or is number three the best way to go in the end if I don't want to have a huge impact on users of signals (i.e. I don't want to increase the size and complexity of their classes just because my signal/connectee classes do a lot of work each time they're moved/copied)?
Note I am looking for real-world experience and suggestions, not opinions.
I need several STL containers to be thread-safe.
Basically I was thinking I just need two methods added to each of the STL container objects:
.lock()
.unlock()
I could also break it into
.lockForReading()
.unlockForReading()
.lockForWriting()
.unlockForWriting()
The way that would work is any number of locks for parallel reading are acceptable, but if there's a lock for writing then reading AND writing are blocked.
An attempt to lock for writing waits until the lockForReading semaphore drops to 0.
Is there a standard way to do this?
Is how I'm planning on doing this wrong or shortsighted?
This is really kind of bad. External code will not recognize or understand your threading semantics, and the easy availability of aliases to objects in the containers makes them poor thread-safe interfaces.
Thread safety occurs at design time. You can't solve thread safety by throwing locks at the problem. You solve thread safety by not having two threads writing to the same data at the same time (in the general case, of course). Moreover, it is not the responsibility of a specific object to handle thread safety, except for dedicated thread-synchronization primitives.
You can have concurrent containers, designed to allow concurrent use. However, their interfaces are vastly different from what's offered by the Standard containers: fewer aliases to objects in the container, for example, and each individual operation is encapsulated.
The standard way to do this is to acquire the lock in a constructor and release it in the destructor. This is more commonly known as Resource Acquisition Is Initialization, or RAII. I strongly suggest you use this methodology rather than
.lock()
.unlock()
which is not exception-safe. You can easily forget to unlock the mutex prior to throwing, resulting in a deadlock the next time a lock is attempted.
There are several synchronization types in the Boost.Thread library that will be useful to you, notably boost::mutex::scoped_lock. Rather than adding lock() and unlock() methods to whatever container you wish to access from multiple threads, I suggest you use a boost::mutex or equivalent and instantiate a boost::mutex::scoped_lock whenever accessing the container.
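For illustration, the same idea with the standard library's std::lock_guard (boost::mutex::scoped_lock is used the same way; the container and mutex names are made up):

#include <mutex>
#include <vector>

std::vector<int> sharedData;
std::mutex sharedDataMutex;

void append(int value) {
    std::lock_guard<std::mutex> lock(sharedDataMutex); // acquired in constructor
    sharedData.push_back(value);
}   // released in the destructor, even if push_back throws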
Is there a standard way to do this?
No, and there's a reason for that.
Is how I'm planning on doing this wrong or shortsighted?
It's not necessarily wrong to want to synchronize access to a single container object, but the interface of the container class is very often the wrong place to put the synchronization (like DeadMG says: object aliases, etc.).
Personally, I think both TBB and things like concurrent_vector may be either overkill or still the wrong tools for a "simple" synchronization problem.
I find that ofttimes just adding a (private) lock object (to the class holding the container) and wrapping up the two or three access patterns to the one container object will suffice, and it will be much easier to grasp and maintain for others down the road.
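A sketch of that idea, with illustrative names:

#include <mutex>
#include <queue>

class TaskQueue {
    std::queue<int> q;
    mutable std::mutex m;   // private lock guarding q
public:
    void push(int v) {
        std::lock_guard<std::mutex> lk(m);
        q.push(v);
    }
    bool try_pop(int& out) {
        std::lock_guard<std::mutex> lk(m);
        if (q.empty()) return false;
        out = q.front();
        q.pop();
        return true;
    }
};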
Sam: You don't want a .lock() method because something could go awry that prevents calling the .unlock() method at the end of the block, whereas if .unlock() is called as a consequence of the destruction of a stack-allocated variable, then any kind of early return from the function that calls .lock() is guaranteed to free the lock.
DeadMG:
Intel's Threading Building Blocks (open source) may be what you're looking for.
There are also Microsoft's concurrent_vector and concurrent_queue, which already come with Visual Studio 2010.
Is it ok to check the current thread inside a function?
For example, if some non-thread-safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is OK to alter the data structure directly in the function. However, if the current thread is some other thread, the actual alteration would have to be delayed, so that it is performed when it is safe to do so.
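For concreteness, a minimal sketch of that pattern (the data structure type and the deferral mechanism are stand-ins):

#include <thread>

struct DataStructure { void mutate() {} };       // stand-in type
void deferUpdate(DataStructure&) { /* e.g. enqueue for the owning thread */ }

// Captured once on the thread that owns the data structure.
const std::thread::id ownerThread = std::this_thread::get_id();

void update(DataStructure& d) {
    if (std::this_thread::get_id() == ownerThread)
        d.mutate();       // owning thread: alter the structure directly
    else
        deferUpdate(d);   // other thread: delay until it is safe
}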
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?
You are not making much sense. You said a non-thread-safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.
It's possible to use such animals as reader/writer locks to differentiate between readers and writers of data structures, but for typical cases the performance advantage usually won't merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplest and most commonly used approaches for ensuring data integrity (most books/articles you read on the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might expect. It's usually best to implement the simplest approach that will work, then worry about optimizing it after the fact.
There is a trick that could work in the case where, as you said, the other threads make changes only once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non-fair scheduling)
check your thread
if it's the "master", just make the change
if it's another thread, suspend scheduling (by disabling interrupts if needed), make the change, then reinstate scheduling
really test to make sure there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.
As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (e.g. a critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
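A minimal sketch of that idea, with illustrative names (the work queue stands in for whatever synchronized queue the main thread already drains):

#include <functional>
#include <mutex>
#include <queue>

std::queue<std::function<void()>> workQueue;  // the existing synchronized queue
std::mutex workMutex;
int sharedState = 0;                          // only ever touched by the main thread

void postUpdateFromThread2(int newValue) {
    std::lock_guard<std::mutex> lk(workMutex);
    workQueue.push([newValue] { sharedState = newValue; });
}

void mainThreadDrainQueue() {
    std::lock_guard<std::mutex> lk(workMutex);
    while (!workQueue.empty()) {
        workQueue.front()();  // runs on the main thread, which owns sharedState
        workQueue.pop();
    }
}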
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The active object may also be able to publish the data structure changes to other interested threads through the Observer pattern.