Josuttis states ["Standard Library", 2nd ed, pg 1003]:
Futures allow you to block until data by another thread is provided or another thread is done. However, a future can pass data from one thread to another only once. In fact, a future's major purpose is to deal with return values or exceptions of threads.
On the other hand, a shared_future<void> can be used by multiple threads, to identify when another thread has done its job.
Also, in general, high-level concurrency features (such as futures) should be preferred to low-level ones (such as condition_variables).
Therefore, I'd like to ask: Is there any situation (requiring synchronization of multiple threads) in which a shared_future<void> won't suffice and a condition_variable is essential?
As already pointed out in the comments by @T.C. and @hlt, the use of futures/shared_futures is limited in the sense that each can deliver its result only once. So for every communication event you have to have a new future. The pros and cons are nicely explained by Scott Meyers in:
Item 39: Consider void futures for one-shot event communication.
Scott Meyers: Effective Modern C++ (emphasis mine)
His conclusion is that using promise/future pairs dodges many of the problems with the use of condition_variables, providing a nicer way of communicating one-shot events. The price you pay is that you use dynamically allocated memory for the shared state and, more importantly, that you need one promise/future pair for every event that you want to communicate.
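To make this concrete, here is a minimal sketch of that one-shot pattern (the worker lambda and all names are illustrative, not taken from Meyers): a promise<void> fires the event exactly once, and a shared_future<void> lets several threads wait for it.

    #include <future>
    #include <iostream>
    #include <thread>

    int main() {
        std::promise<void> ready;                  // the one-shot event
        std::shared_future<void> done = ready.get_future().share();

        auto worker = [done](int id) {             // each thread holds a copy
            done.wait();                           // block until the event fires
            std::cout << "thread " << id << " released\n";
        };
        std::thread t1(worker, 1), t2(worker, 2);

        ready.set_value();                         // fire the event, exactly once
        t1.join();
        t2.join();
    }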
While the notion of preferring high-level abstractions to low-level ones is laudable, there is a misconception here. std::future is not a high-level replacement for std::condition_variable. Instead, it is a specific high-level construct built for one specific use case of std::condition_variable, namely a one-time delivery of a value.
Obviously, not all uses of condition variables fit this scenario. For example, a message queue cannot be implemented with std::future, no matter how hard you try, because each future can deliver a value only once. Such a queue is another high-level construct built on the same low-level building blocks. So yes, shoot for high-level constructs, but do not expect a one-to-one mapping between high and low level.
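For contrast, here is a minimal sketch of such a many-shot message queue built on std::condition_variable (the class and member names are illustrative); note that pop() can hand out values over and over, which no single future can do:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    class MessageQueue {
    public:
        void push(std::string msg) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                queue_.push(std::move(msg));
            }
            cv_.notify_one();                      // wake one waiting consumer
        }
        std::string pop() {                        // blocks until a message arrives
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return !queue_.empty(); });
            std::string msg = queue_.front();
            queue_.pop();
            return msg;                            // callable again and again
        }
    private:
        std::queue<std::string> queue_;
        std::mutex mutex_;
        std::condition_variable cv_;
    };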
I've been learning about functional programming and see that it can certainly make parallelism easier to handle, but I do not see how it makes handling shared resources easier. I've seen people talk about variable immutability being a key factor, but how does that help two threads accessing the same resource? Say two threads are adding a request to a queue. They both get a copy of the queue, make a new copy with their request added (since the queue is immutable), and then return the new queue. The first request to return will be overridden by the second, as the copies of the queue each thread got did not have the other thread's request present. So I assume there is a locking mechanism a la mutex available in functional languages? How then does that differ from an imperative approach to the problem? Or do practical applications of functional programming still require some imperative operations to handle shared resources?
As soon as your global data can be updated, you're breaking the pure functional paradigm. In that sense, you need some sort of imperative structure. However, this is important enough that most functional languages offer a way to do this, and you need it to be able to communicate with the rest of the world anyway. (The most complicated formal one is the IO monad of Haskell.) Apart from simple bindings for some other synchronization library, they would probably try to implement a lock-free, wait-free data structure if possible.
Some approaches include:
Data that is written only once and never altered can be accessed safely with no locks or waiting on most CPUs. (There is typically a memory fence instruction to ensure that the memory updates in the right order for both the producer and the consumer.)
Some data structures, such as a difference list, have the property that you can tack on updates without invalidating any existing data. Let's say you have the association list [(1,'a'), (2,'b'), (3,'c')] and you want to update by changing the third entry to 'g'. If you express this as (3,'g'):originalList, then you can update the current list with the new version, and keep originalList valid and unaltered. Any thread that saw it can still safely use it.
Even if you have to work around the garbage collector, each thread can make its own thread-local copy of the shared state so long as the original does not get deleted while it is being copied. The underlying low-level implementation would be a producer/consumer model that atomically updates a pointer to the state data and inserts memory-fence instructions before the update and the copy operations.
If the program has a way to compare-and-swap atomically, and the garbage collector is aware of it, each thread can use the read-copy-update (RCU) pattern (a C++ sketch of this idea follows the list). A thread-aware garbage collector will keep the older data around as long as any thread is using it, and recycle it when the last thread is done with it. This should not require locking in software (for example, on modern ISAs, incrementing or decrementing a word-sized counter is an atomic operation, and a single atomic compare-and-swap instruction is wait-free).
The functional language can add an extension to call an IPC library written in some other language, and update data in place. In Haskell, this would be defined with the IO monad to ensure sequential memory consistency, but nearly every functional language has some way to exchange data with the system libraries.
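Here is a hypothetical C++ rendering of the difference-list and read-copy-update ideas above, with std::shared_ptr reference counting standing in for the garbage collector (the Node layout and function names are made up for illustration):

    #include <atomic>
    #include <memory>

    // Immutable list node: an "update" prepends a new node that shares the
    // old tail, so existing readers keep seeing a consistent old version.
    struct Node {
        int key;
        char value;
        std::shared_ptr<const Node> next;
    };

    std::shared_ptr<const Node> head;              // always accessed atomically

    std::shared_ptr<const Node> snapshot() {
        return std::atomic_load(&head);            // readers: cheap snapshot
    }

    void update(int key, char value) {
        std::shared_ptr<const Node> expected = std::atomic_load(&head);
        std::shared_ptr<const Node> desired;
        do {                                       // read-copy-update loop
            desired = std::make_shared<const Node>(Node{key, value, expected});
        } while (!std::atomic_compare_exchange_weak(&head, &expected, desired));
    }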
So, a functional language does offer some guarantees that are useful for efficient concurrent programming. For example, most current ISAs impose no extra overhead on multiple reader threads when there is at most a single writer, certain consistency bugs cannot occur, and functional languages are well-suited to express this pattern.
The boost::asio library provides an interesting synchronization model using "strands" to serialize accesses to a resource that would normally require locks. This increases parallelism by essentially turning every lock operation into an enqueue.
Searching for "strands" only yields relevant results about asio, even though they seem like an exceptionally useful primitive for multithreading. Is there some other term for them that I'm missing?
Link to the asio strand documentation: http://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio/reference/io_service__strand.html
I am not aware of an official name for the construct.
The proposal based on Boost.Asio (N2175 - Networking Library Proposal for TR2) documents the strand class, but does not reference any relevant material. Also, the Intel compiler documentation makes a few references to strand in its execution model, defining it as "any sequence of instructions without any parallel control structures."
I've started doing a bit of programming in the iOS and Mac OS X domain; they have a similar concept to strands called a serial dispatch queue, from Grand Central Dispatch. Tasks are executed in the order they are added to the queue, just as with a strand. Similarly, the thread that executes a task is not defined, just as with Asio when multiple threads invoke io_service::run().
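For reference, a minimal sketch of the strand guarantee, assuming the Boost 1.54-era io_service::strand API from the linked documentation (the task bodies are just illustrative): the four handlers never run concurrently and execute in the order posted, even though two threads service the io_service.

    #include <boost/asio.hpp>
    #include <boost/thread.hpp>
    #include <iostream>

    int main() {
        boost::asio::io_service io;
        boost::asio::io_service::strand strand(io);

        // Handlers posted through the same strand are serialized and run
        // in FIFO order, regardless of which pool thread picks them up.
        for (int i = 0; i < 4; ++i)
            strand.post([i] { std::cout << "task " << i << "\n"; });

        boost::thread_group pool;
        for (int i = 0; i < 2; ++i)
            pool.create_thread([&io] { io.run(); });
        pool.join_all();
    }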
Are reading operations on a dense_hash_map thread-safe?
A const C++ object of reentrant type (most are) is generally assumed to be thread-safe.
The documentation of dense_hash_map doesn't specify anything regarding thread safety, so the most defensive approach would be to assume it isn't even reentrant. It takes unprotected global mutable state to make a class non-reentrant, though, and it's hard to find an argument for dense_hash_map to require that. Still, seeing as it can store its contents to disk, reentrancy might be all you can hope for. Assuming the thing is thread-safe even for mutating operations is far-fetched without confirmation from the documentation.
Barring documentation, you might want to have a look at the implementation to see whether you can verify reentrancy for at least some subset of the API.
According to the paper Scalable, High Performance Ethernet Forwarding with CUCKOOSWITCH (2013), google::dense_hash_map is not thread-safe for reads and writes:
[...] We therefore also compare to three non-thread-safe hash tables: the STL’s hash_map and Google’s sparse_hash_map and dense_hash_map. [...] These non-thread-safe tables do not support concurrent reads and writes.
I could not find any other information about google::dense_hash_map being thread-safe or not.
I am looking for the optimal strategy to use STL containers (like std::map and std::vector) and pthreads.
What is the canonical way to go? A simple example:
    std::map<std::string, std::vector<std::string> > myMap;
How do we guarantee safe concurrent access? Something like this?
    pthread_mutex_lock(&mapMutex);
    myMap["key"].push_back("value");   /* write to myMap */
    pthread_mutex_unlock(&mapMutex);
Additionally, I would like to know if pthreads and STL face performance issues when used together.
System: Linux, g++, pthreads, no Boost, no Intel TBB
The C++03 standard does not talk about concurrency at all, so the concurrency aspect is left as an implementation detail for compilers. The documentation that comes with your compiler is therefore where one should look for answers related to concurrency.
Most of the STL implementations are not thread safe as such.
Since STL containers do not provide any explicit thread safety, you will have to use your own synchronization mechanism. And while you are at it, you should use RAII rather than managing the synchronization resource (mutex lock/unlock, etc.) manually.
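Given the constraints above (pthreads, no Boost), a minimal C++03-style RAII wrapper might look like this; the ScopedLock and addEntry names are made up for illustration:

    #include <pthread.h>
    #include <map>
    #include <string>
    #include <vector>

    // Locks in the constructor, unlocks in the destructor, so the mutex is
    // released on every path out of the scope, including exceptions.
    class ScopedLock {
    public:
        explicit ScopedLock(pthread_mutex_t& m) : m_(m) { pthread_mutex_lock(&m_); }
        ~ScopedLock() { pthread_mutex_unlock(&m_); }
    private:
        ScopedLock(const ScopedLock&);             // non-copyable (C++03 style)
        ScopedLock& operator=(const ScopedLock&);
        pthread_mutex_t& m_;
    };

    pthread_mutex_t mapMutex = PTHREAD_MUTEX_INITIALIZER;

    void addEntry(std::map<std::string, std::vector<std::string> >& myMap,
                  const std::string& key, const std::string& value) {
        ScopedLock guard(mapMutex);                // locked for the whole scope
        myMap[key].push_back(value);
    }                                              // unlocked here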
You can refer to the documentation here:
MSDN:
If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
GCC Documentation says:
We currently use the SGI STL definition of thread safety, which states:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
Point to Note: GCC's Standard Library is a derivative of SGI's STL code.
The canonical way to provide concurrency is to hold a lock while accessing the collection.
That works in 90% of the cases where access to the collection isn't performance-critical anyway. If you're accessing a shared collection so much that locking around it harms performance, you should rethink your design. (And odds are, your design is okay and it won't affect performance anywhere near as much as you might suspect.)
You should take a look at Intel Threading Building Blocks (TBB, http://threadingbuildingblocks.org/). It has a few very optimized data structures that handle concurrency internally, using non-blocking strategies.
I need several STL containers, threadsafe.
Basically I was thinking I just need 2 methods added to each of the STL container objects,
.lock()
.unlock()
I could also break it into
.lockForReading()
.unlockForReading()
.lockForWriting()
.unlockForWriting()
The way that would work is that any number of locks for parallel reading are acceptable, but if there's a lock for writing, then both reading and writing are blocked.
An attempt to lock for writing waits until the lockForReading semaphore drops to 0.
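For reference, the semantics described here are exactly those of a readers-writer lock, which POSIX already provides; a minimal sketch:

    #include <pthread.h>

    pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;

    void reader() {
        pthread_rwlock_rdlock(&rwlock);    // any number of concurrent readers
        /* ... read the container ... */
        pthread_rwlock_unlock(&rwlock);
    }

    void writer() {
        pthread_rwlock_wrlock(&rwlock);    // waits for readers to drain, then
        /* ... modify the container ... */ // excludes both readers and writers
        pthread_rwlock_unlock(&rwlock);
    }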
Is there a standard way to do this?
Is how I'm planning on doing this wrong or shortsighted?
This is really kind of bad. External code will not recognize or understand your threading semantics, and the ease with which aliases (references, iterators) to objects inside the containers escape makes them poor thread-safe interfaces.
Thread safety happens at design time. You can't solve thread safety by throwing locks at the problem. You solve thread safety by not having two threads writing to the same data at the same time (in the general case, of course). However, it is not the responsibility of a specific object to handle thread safety, except for dedicated threading synchronization primitives.
You can have concurrent containers, designed to allow concurrent use. However, their interfaces are vastly different from what the standard containers offer: fewer aliases to objects in the container, for example, and each individual operation is encapsulated.
The standard way to do this is to acquire the lock in a constructor and release it in the destructor. This is more commonly known as Resource Acquisition Is Initialization, or RAII. I strongly suggest you use this methodology rather than
.lock()
.unlock()
which is not exception-safe: you can easily forget to unlock the mutex before an exception propagates, resulting in a deadlock the next time a lock is attempted.
There are several synchronization types in the Boost.Thread library that will be useful to you, notably boost::mutex::scoped_lock. Rather than adding lock() and unlock() methods to whatever container you wish to access from multiple threads, I suggest you use a boost::mutex or equivalent and instantiate a boost::mutex::scoped_lock whenever you access the container.
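A minimal sketch of that approach (the map and function names are illustrative):

    #include <boost/thread/mutex.hpp>
    #include <map>
    #include <string>

    std::map<std::string, int> sharedMap;
    boost::mutex mapMutex;

    void increment(const std::string& key) {
        boost::mutex::scoped_lock lock(mapMutex);  // locks the mutex here
        ++sharedMap[key];
    }                                              // destructor unlocks, even
                                                   // if an exception is thrown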
Is there a standard way to do this?
No, and there's a reason for that.
Is how I'm planning on doing this wrong or shortsighted?
It's not necessarily wrong to want to synchronize access to a single container object, but the interface of the container class is very often the wrong place to put the synchronization (like DeadMG says: object aliases, etc.).
Personally I think both TBB and stuff like concurrent_vector may either be overkill or still the wrong tools for a "simple" synchronization problem.
I find that oftentimes just adding a (private) lock object to the class holding the container, and wrapping up the two or three access patterns to the one container object, will suffice and will be much easier to grasp and maintain for others down the road.
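A minimal sketch of that approach, assuming Boost.Thread for the mutex; the RequestLog class and its two access patterns are made up for illustration:

    #include <boost/thread/mutex.hpp>
    #include <string>
    #include <vector>

    class RequestLog {
    public:
        void add(const std::string& entry) {
            boost::mutex::scoped_lock lock(mutex_);
            entries_.push_back(entry);
        }
        std::vector<std::string> snapshot() const {  // copy out under the lock
            boost::mutex::scoped_lock lock(mutex_);
            return entries_;
        }
    private:
        mutable boost::mutex mutex_;                 // private: callers never
        std::vector<std::string> entries_;           // see lock()/unlock()
    };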
@Sam: You don't want a .lock() method because something could go awry that prevents the .unlock() method from being called at the end of the block. If instead .unlock() is called as a consequence of the destruction of a stack-allocated variable, then any kind of early return from the function that called .lock() is guaranteed to release the lock.
@DeadMG:
Intel's Threading Building Blocks (open source) may be what you're looking for.
There are also Microsoft's concurrent_vector and concurrent_queue, which already come with Visual Studio 2010.