Hi guys, I want to know the difference between thread-safe data and thread-safe containers.
Thread safe data:
Generally refers to data which is protected using mutexes, semaphores or other similar constructs.
Data is considered thread safe if measures have been put in place to ensure that:
It can be modified from multiple threads in a controlled manner, so that the resulting data structure doesn't become corrupt and the code doesn't suffer from race conditions.
It can be read reliably, without the data becoming corrupt during the read. This is especially important with STL-style containers, which use iterators.
Mutexes generally work by blocking other threads from accessing shared data while one thread is modifying it. The protected region of code is known as a critical section, and RAII is a common design pattern used in conjunction with critical sections.
Depending on the CPU type, some primitive data types (e.g. int) and operations (increment) might not need mutex protection (e.g. if they resolve down to an atomic instruction in machine language). However:
It is bad practice to make any assumptions about CPU architecture.
You should always code defensively to ensure code will remain thread-safe regardless of the target platform.
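As a minimal sketch of the points above (the names GuardedCounter and run_counter_demo are made up for illustration), the following protects a plain integer with a std::mutex via the RAII wrapper std::lock_guard, and shows std::atomic as the portable alternative to relying on the CPU making an increment atomic:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: a plain integer guarded by a mutex.
struct GuardedCounter {
    std::mutex m;
    long value = 0;

    void increment() {
        std::lock_guard<std::mutex> lock(m); // RAII: unlocks when `lock` leaves scope
        ++value;                             // critical section
    }

    long get() {
        std::lock_guard<std::mutex> lock(m);
        return value;
    }
};

// Starts `threads` workers that each increment both counters `per_thread` times.
// Returns true if neither counter lost an update.
bool run_counter_demo(int threads, int per_thread) {
    GuardedCounter guarded;
    std::atomic<long> atomic_counter{0};

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                guarded.increment();
                atomic_counter.fetch_add(1); // atomic on every platform, no CPU assumptions
            }
        });
    }
    for (auto& th : pool) th.join();

    const long expected = static_cast<long>(threads) * per_thread;
    return guarded.get() == expected && atomic_counter.load() == expected;
}
```

Calling run_counter_demo(4, 10000) should return true. Replacing either counter with an unprotected long would make the result unreliable.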
Thread safe containers:
Are containers which have measures in place to ensure that any changes made to them occur in a thread-safe manner.
For example, a thread safe container may allow items to be inserted or removed using a specific set of public methods which ensure that any code which uses it is thread-safe.
In other words, the container class provides the mutex protection as a service to the caller, and the user doesn't have to roll their own.
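A rough sketch of what such a container might look like is given here. LockedVector and its methods are hypothetical names, not a standard or Boost class. The point is only that the mutex lives inside the class and every public method takes it, so callers never lock anything themselves:

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper: all access to the vector goes through locked methods.
template <typename T>
class LockedVector {
public:
    void push_back(T value) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push_back(std::move(value));
    }

    // Removes the last element into `out`; returns false if the vector was empty.
    bool pop_back(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = std::move(items_.back());
        items_.pop_back();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_);
        return items_.size();
    }

private:
    mutable std::mutex m_;   // mutable so that const methods can lock it
    std::vector<T> items_;
};

// Two threads push concurrently; returns the final element count.
std::size_t run_locked_vector_demo() {
    LockedVector<int> v;
    std::thread a([&] { for (int i = 0; i < 1000; ++i) v.push_back(i); });
    std::thread b([&] { for (int i = 0; i < 1000; ++i) v.push_back(-i); });
    a.join();
    b.join();
    return v.size();
}
```

Note that the wrapper deliberately exposes no iterators or references to elements, since those could be invalidated by another thread as soon as the lock is released.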
With std::atomic, there seems to be no standards-compliant way to sometimes read/write without atomicity. Boost has interlocked operations, but they are in the details namespace so I don't think I'm supposed to use it. But I don't know all of boost. Is there something in boost or stl that I could use? Or perhaps a proposal which would address this, like maybe adding a std::memory_order_no_synchronization? Or access an abstraction of the interlocked machinery?
It seems like the need for this would appear in many designs. Objects that require thread-safety might be held temporarily in a single-threaded context, making atomic access superfluous. For example, when first constructing an object, it would usually be visible only on the thread creating it, until placed somewhere accessible to multiple threads. So long as your object is only reachable by a single thread, you wouldn't need std::atomic's safety at all. But then once ready, you'd publish it to other threads, and suddenly you need locking and enforced atomic access.
In this particular application, I'm building large lockless trees. During construction, there is just zero need for interlocked, so the existing design (which calls os-provided interlocked functions) does not use interlocked until it becomes necessary. After it becomes visible to other threads, all threads should be using an interlocked view. I can't port to std::atomic without introducing a bunch of pointless synchronization.
The best I can even think of now is to use std::memory_order_relaxed, but on ARM, this is still not the same as non-atomic access. On x86/amd64 it is though. Another hack is placement new on std::atomic, which writes a new value without atomicity, but that doesn't provide any way to read back the value non-atomically.
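For illustration only, the memory_order_relaxed approach mentioned above might look like the sketch below. The Node type and the function names are assumptions, not the asker's actual tree. This does not give truly non-atomic access; it only drops the ordering constraints while the structure is still private, and then uses a single release store to publish it:

```cpp
#include <atomic>
#include <thread>

// Hypothetical tree node whose child links are atomics.
struct Node {
    int key = 0;
    std::atomic<Node*> left{nullptr};
    std::atomic<Node*> right{nullptr};
};

// Single-threaded construction: nothing else can see these nodes yet,
// so the links are written with relaxed ordering (no synchronization requested).
Node* build_private_tree() {
    Node* root = new Node;
    root->key = 2;
    Node* l = new Node;
    l->key = 1;
    Node* r = new Node;
    r->key = 3;
    root->left.store(l, std::memory_order_relaxed);
    root->right.store(r, std::memory_order_relaxed);
    return root;
}

// Publishes the tree to a reader thread and returns the sum of keys the reader saw.
int publish_and_read() {
    std::atomic<Node*> published{nullptr};
    int sum = 0;

    std::thread reader([&] {
        Node* root = nullptr;
        // The acquire load pairs with the release store below.
        while ((root = published.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        sum = root->key
            + root->left.load(std::memory_order_acquire)->key
            + root->right.load(std::memory_order_acquire)->key;
    });

    // The release store makes every earlier write, relaxed or plain, visible
    // to a thread that observes the pointer with an acquire load.
    published.store(build_private_tree(), std::memory_order_release);
    reader.join();

    Node* root = published.load();
    delete root->left.load();
    delete root->right.load();
    delete root;
    return sum;
}
```

Whether the relaxed stores compile down to ordinary stores depends on the compiler and target, so this should be measured rather than assumed.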
I'm developing a multi-threaded plugin for a single-threaded application (which has a non-thread-safe API).
My current plugin has two threads: the main one, which is the application's thread, and another one which is used for processing data of the main thread. Long story short, the first one creates objects, gives them an ID, inserts them into a map and sometimes even accesses and deletes them (if the application says so); the second one reads data from that map and alters the objects.
My question is: What techniques can I use in order to make my plugin thread-safe?
First, you have to identify where race conditions may exist. Then, you will have to use some mechanism to assure that the shared data is accessed in a safe way, hence achieving Thread Safety.
For your particular case, it seems the race condition will be on the shared map and possibly the objects (map's values) it contains as well (if it's possible that both threads attempt to alter the same object simultaneously).
My suggestion is that you use a well tested thread safe map implementation, and then if needed add the extra "protection" for the map's values themselves. This way you ensure the map is always in a consistent state for both threads, and if both threads attempt to modify the same object data (map's values), the data won't be corrupted or left inconsistent.
For the map itself, you can search for "Concurrent Hash Map" or "Atomic Hash Map" data structures for C++ and see if they are of good quality and are available for your compiler/platform. Good examples are Intel's TBB concurrent_hash_map or Facebook's folly AtomicHashMap. They both have advantages and disadvantages and you will have to analyze what's best for your situation.
As for the objects the map contains, you can use plain mutexes (simple: lock, modify data, unlock), atomic operations (trickier, only for simple datatypes) or another method, once more depending on your compiler/platform and speed requirements.
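If pulling in TBB or folly is not an option, a simpler (and slower) fallback is one mutex around a std::map plus one mutex per object. The sketch below is only an illustration of that idea; Item, Registry and run_registry_demo are hypothetical names, and std::shared_ptr is used so that an object deleted by the main thread stays alive while the worker is still using it:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical plugin object with its own lock protecting its fields.
struct Item {
    std::mutex m;
    int processed = 0;
};

// Hypothetical ID -> object registry; one mutex protects the map itself.
class Registry {
public:
    int add() {
        std::lock_guard<std::mutex> lock(m_);
        const int id = next_id_++;
        items_[id] = std::make_shared<Item>();
        return id;
    }

    // Returns an empty pointer if the ID is unknown or already erased.
    std::shared_ptr<Item> find(int id) {
        std::lock_guard<std::mutex> lock(m_);
        auto it = items_.find(id);
        if (it == items_.end()) return std::shared_ptr<Item>();
        return it->second;
    }

    bool erase(int id) {
        std::lock_guard<std::mutex> lock(m_);
        return items_.erase(id) > 0;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_);
        return items_.size();
    }

private:
    mutable std::mutex m_;
    std::map<int, std::shared_ptr<Item>> items_;
    int next_id_ = 0;
};

// The main thread erases even IDs while a worker alters objects concurrently.
std::size_t run_registry_demo() {
    Registry reg;
    for (int i = 0; i < 100; ++i) reg.add(); // IDs 0..99

    std::thread worker([&] {
        for (int id = 0; id < 100; ++id) {
            if (auto item = reg.find(id)) {   // may already have been erased
                std::lock_guard<std::mutex> lock(item->m);
                ++item->processed;
            }
        }
    });

    for (int id = 0; id < 100; id += 2) reg.erase(id);
    worker.join();
    return reg.size();
}
```

The map lock is held only for the lookup, never while an object is being processed, so the application thread is not blocked by long-running work on a single object.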
Hope this helps!
I have multiple preforked server processes which accept requests to modify a shared STL C++ list on a server. Each process simply pushes a new element at the end of the list and returns the iterator.
I'm not sure how each process should attempt to acquire a lock on the list. Should the lock cover the entire object, or are STL lists capable of handling concurrency, since we're just pushing an element at the end of the list?
Assuming you meant threads rather than processes, you can share the STL containers, but you need to be careful with respect to synchronization. The STL containers are thread-safe to some extent, but you need to understand the thread-safety guarantees given:
One container can be used by multiple readers concurrently.
If there is one writer for a container, there shall neither be concurrent readers nor concurrent writers.
The guarantees are per container, i.e., different containers can concurrently be used by threads without need of synchronization between them.
The reason for these restrictions is that the interface of the containers is geared towards efficient use within one thread, and you don't want to impede the processing of an unshared container just because it could potentially be shared across threads. Also, the container interface isn't suitable for any sort of container-maintained concurrency mechanism. For example, just because v.empty() just returned false doesn't mean that v.pop() works, because the container can be empty by now: if there were internal synchronization, any lock would have been released once empty() returned, and the container can be changed by the time pop() is called.
It is relatively easy to create a queue to be used for communication between different threads. It would use a std::mutex and a suitable instantiation of std::condition_variable. I think there is something like this proposed for inclusion into the standard but it isn't, yet, part of the standard C++ library. Note, however, that such a class would not return an iterator to the inserted element because by the time you'd access it, the element may be gone again and it would be questionable what the iterator is used for anyway.
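A minimal sketch of such a queue follows. BlockingQueue is a made-up name here, not the class proposed for the standard. It also addresses the empty()/pop() problem described above by checking and removing under a single lock:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical inter-thread queue built from std::mutex and std::condition_variable.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one(); // wake one waiting consumer, if any
    }

    // Check-and-remove in one locked step, so there is no gap between
    // the emptiness test and the pop. Returns false if the queue was empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

    // Blocks until an element is available, then removes and returns it.
    T wait_and_pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); }); // handles spurious wakeups
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// One producer pushes 1..n, one consumer sums what it receives.
long run_queue_demo(int n) {
    BlockingQueue<int> q;
    long sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += q.wait_and_pop();
    });
    for (int i = 1; i <= n; ++i) q.push(i);
    consumer.join();
    return sum;
}
```

Elements are handed out by value, so nothing like an iterator into the queue ever escapes the lock.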
The mechanism for doing this kind of synchronisation between multiple processes requires that the developer deal with several issues. Firstly whatever is being shared between the processes needs to be set up outside of them. What this usually means in practice is the use of shared memory.
Then these processes need to communicate with each other with respect to accessing the memory being shared. After all if one thread starts to work on a data structure being shared, but gets swapped out before completing the operation it will leave the data inconsistent.
This synchronisation can be done using operating system constructs such as semaphores in linux, and will allow competing processes to coordinate.
See This for linux based IPC detail
See This for Windows based IPC detail
For some reference you can use the Boost.Interprocess documentation which provides a platform independent implementation of IPC mechanisms.
The standard library containers offer no automagic protection against concurrent modifications, so you need a global lock for every access of the queue.
You even have to be careful with the iterators or references to list elements, since you may not necessarily know when the corresponding element has been removed from the list.
I wanted to understand what is meant by the lock_free property of atomic variables in C++11. I googled it and looked at the other relevant questions on this forum, but I still have only a partial understanding. I'd appreciate it if someone could explain it end-to-end and in a simple way.
It's probably easiest to start by talking about what would happen if it was not lock-free.
The most obvious way to handle most atomic tasks is by locking. For example, to ensure that only one thread writes to a variable at a time, you can protect it with a mutex. Any code that's going to write to the variable needs to obtain the mutex before doing the write (and release it afterwards). Only one thread can own the mutex at a time, so as long as all the threads follow the protocol, no more than one can write at any given time.
If you're not careful, however, this can be open to deadlock. For example, let's assume you need to write to two different variables (each protected by a mutex) as an atomic operation -- i.e., you need to ensure that when you write to one, you also write to the other. In such a case, if you aren't careful, you can cause a deadlock. For example, let's call the two mutexes A and B. Thread 1 obtains mutex A, then tries to get mutex B. At the same time, thread 2 obtains mutex B, and then tries to get mutex A. Since each is holding one mutex, neither can obtain both, and neither can progress toward its goal.
There are various strategies to avoid deadlocks (e.g., all threads always try to obtain the mutexes in the same order, or upon failure to obtain a mutex within a reasonable period of time, each thread releases the mutex it holds, waits some random amount of time, and then tries again).
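In C++11 the two-mutex case can be handled with std::lock, which acquires several mutexes using a deadlock-avoidance algorithm regardless of the order in which callers name them. The sketch below is a hypothetical example (Account, transfer and run_transfer_demo are illustrative names) in which two threads take the same pair of mutexes in opposite order:

```cpp
#include <mutex>
#include <thread>

// Hypothetical pair of variables, each protected by its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Updates both accounts as one atomic step. Assumes `from` and `to` are distinct objects.
void transfer(Account& from, Account& to, long amount) {
    std::lock(from.m, to.m); // acquires both mutexes without risk of deadlock
    std::lock_guard<std::mutex> lock_from(from.m, std::adopt_lock); // take over ownership
    std::lock_guard<std::mutex> lock_to(to.m, std::adopt_lock);     // for RAII release
    from.balance -= amount;
    to.balance += amount;
}

// Thread 1 locks (a, b) while thread 2 locks (b, a); returns the total balance.
long run_transfer_demo() {
    Account a;
    Account b;
    a.balance = 1000;
    b.balance = 1000;

    std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();

    return a.balance + b.balance;
}
```

With naive nested lock_guards in place of std::lock, this demo could hang in exactly the way described above. In C++17, std::scoped_lock wraps the same idea in a single object.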
With lock-free programming, however, we (obviously enough) don't use locks. This means that a deadlock like above simply cannot happen. When done properly, you can guarantee that all threads continuously progress toward their goal. Contrary to popular belief, it does not mean the code will necessarily run any faster than well written code using locks. It does, however, mean that deadlocks like the above (and some other types of problems like livelocks and some types of race conditions) are eliminated.
Now, as to exactly how you do that: the answer is short and simple: it varies -- widely. In a lot of cases, you're looking at specific hardware support for doing certain specific operations atomically. Your code either uses those directly, or extends them to give higher level operations that are still atomic and lock free. It's even possible (though only rarely practical) to implement lock-free atomic operations without hardware support (but given its impracticality, I'll pass on trying to go into more detail about it, at least for now).
Jerry already mentioned common correctness problems with locks, i.e. they're hard to understand and program correctly.
Another danger with locks is that you lose determinism regarding your execution time: if a thread that has acquired a lock gets delayed (e.g. descheduled by the operating system, or "swapped out"), then it is possible that the entire program is delayed because it is waiting for the lock. By contrast, a lock-free algorithm is always guaranteed to make some progress, even if any number of threads are held up somewhere else.
By and large, lock-free programming is often slower (sometimes significantly so) than locked programming using non-atomic operations, because atomic operations cause a significant hit on caching and pipelining; however, it offers determinism and upper bounds on latency (at least overall latency of your process; as #J99 observed, individual threads may still be starved as long as enough other threads are making progress). Your program may get a lot slower, but it never locks up entirely and always makes some progress.
The very nature of hardware architectures allows for certain small operations to be intrinsically atomic. In fact, this is a necessity for any hardware that supports multitasking and multithreading. At the very heart of any synchronisation primitive, such as a mutex, you need some sort of atomic instruction that guarantees correct locking behaviour.
So, with that in mind, we now know that it is possible for certain types like booleans and machine-sized integers to be loaded, stored and exchanged atomically. Thus when we wrap such a type into an std::atomic template, we can expect that the resulting data type will indeed offer load, store and exchange operations that do not use locks. By contrast, your library implementation is always allowed to implement an atomic Foo as an ordinary Foo guarded by a lock.
To test whether an atomic object is lock-free, you can use the is_lock_free member function. Additionally, there are ATOMIC_*_LOCK_FREE macros that tell you whether atomic primitive types potentially have a lock-free instantiation. If you are writing concurrent algorithms that you want to be lock-free, you should thus include an assertion that your atomic objects are indeed lock-free, or a static assertion on the macro to have value 2 (meaning that every object of the corresponding type is always lock-free).
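Both checks can be written as in the sketch below. The function name is made up, and the static assertion assumes a target where int atomics are always lock-free, which holds on common desktop platforms but is not guaranteed everywhere:

```cpp
#include <atomic>
#include <cassert>

// Compile-time check. The macro expands to 0 (never lock-free),
// 1 (sometimes lock-free) or 2 (always lock-free).
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "this code assumes std::atomic<int> is always lock-free");

// Run-time check on one particular object.
bool int_counter_is_lock_free() {
    std::atomic<int> counter{0};
    const bool lock_free = counter.is_lock_free();
    assert(lock_free); // expected to agree with the static assertion above
    return lock_free;
}
```

On a platform where the static assertion fails, the code simply does not compile, which is usually preferable to silently falling back to a lock inside an algorithm that was meant to be lock-free.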
Lock-free is one of the non-blocking techniques. For an algorithm, it implies a global progress property: whenever a thread of the program is active, it can make a forward step in its action, either for itself or, eventually, for another thread.
Lock-free algorithms are supposed to behave better under heavy contention, where threads acting on a shared resource may spend a lot of time waiting for their next active time slice. They are also almost mandatory in contexts where you can't lock, such as interrupt handlers.
Implementations of lock-free algorithms almost always rely on Compare-and-Swap (some may use things like LL/SC) and on a strategy where the visible modification is reduced to a change of one value (mostly a pointer), making it a linearization point, and looping over this modification if the value has changed. Most of the time, these algorithms try to complete the jobs of other threads when possible. A good example is the lock-free queue of Michael & Scott (http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html).
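The Compare-and-Swap retry loop can be shown on something much smaller than a queue. The sketch below (atomic_store_max and run_max_demo are illustrative names) updates a shared maximum: the single compare_exchange_weak call is the only visible modification, and the loop retries whenever another thread changed the value in the meantime:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Atomically sets `target` to max(target, candidate) using a CAS retry loop.
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load();
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // On failure, compare_exchange_weak reloads `observed` with the
        // current value, so the condition is re-checked against fresh data.
    }
}

// Several threads offer values concurrently; returns the maximum that was stored.
int run_max_demo() {
    std::atomic<int> maximum{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t) {
        pool.emplace_back([&maximum, t] {
            for (int i = 0; i < 1000; ++i) {
                atomic_store_max(maximum, t * 1000 + i);
            }
        });
    }
    for (auto& th : pool) th.join();
    return maximum.load();
}
```

A thread only has to retry when some other thread's update succeeded, which is the global progress property described above: at least one thread always moves forward.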
For lower-level instructions like Compare-and-Swap, it means that the implementation (probably the micro-code of the corresponding instruction) is wait-free (see http://www.diku.dk/OLD/undervisning/2005f/dat-os/skrifter/lockfree.pdf).
For the sake of completeness, a wait-free algorithm enforces progress for each thread: each operation is guaranteed to terminate in a finite number of steps.
If the locks make sure only one thread accesses the locked data at a time, then what controls access to the locking functions?
I thought that boost::mutex::scoped_lock should be at the beginning of each of my functions so the local variables don't get modified unexpectedly by another thread, is that correct? What if two threads are trying to acquire the lock at very close times? Won't the lock's local variables used internally be corrupted by the other thread?
My question is not boost-specific but I'll probably be using that unless you recommend another.
You're right, when implementing locks you need some way of guaranteeing that two processes don't get the lock at the same time. To do this, you need to use an atomic instruction - one that's guaranteed to complete without interruption. One such instruction is test-and-set, an operation that will get the state of a boolean variable, set it to true, and return the previously retrieved state.
This allows you to write code that continually tests whether it can get the lock. Assume x is a variable shared between threads:
while(testandset(x));
// ...
// critical section
// this code can only be executed by one thread at a time
// ...
x = 0; // set x to 0, allow another process into critical section
Since the other threads continually test the lock until they're let into the critical section, this is a very inefficient way of guaranteeing mutual exclusion. However, using this simple concept, you can build more complicated control structures like semaphores that are much more efficient (because the waiting processes aren't looping, they're sleeping).
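In standard C++ the test-and-set primitive is available as std::atomic_flag, so the pseudocode above can be turned into a small runnable spinlock. This is a sketch for illustration (SpinLock and run_spinlock_demo are made-up names), not a replacement for std::mutex:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock built on the test-and-set operation of std::atomic_flag.
class SpinLock {
public:
    void lock() {
        // test_and_set sets the flag and returns its previous state;
        // keep looping while another thread already holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield(); // let other threads run while we wait
        }
    }

    void unlock() {
        flag_.clear(std::memory_order_release); // the equivalent of x = 0
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Four threads increment a plain integer under the spinlock; returns the total.
long run_spinlock_demo() {
    SpinLock spin;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < 10000; ++i) {
                std::lock_guard<SpinLock> guard(spin); // works with any lock()/unlock() type
                ++counter;                             // critical section
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The internal state of the lock is a single flag that is only ever touched through atomic operations, which is why two threads racing for it cannot corrupt it.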
You only need to have exclusive access to shared data. Unless they're static or on the heap, local variables inside functions will have different instances for different threads and there is no need to worry. But shared data (stuff accessed via pointers, for example) should be locked first.
As for how locks work, they're carefully designed to prevent race conditions and often have hardware-level support to guarantee atomicity. That is, there are some machine language constructs that are guaranteed to be atomic. Semaphores (and mutexes) may be implemented via these.
The simplest explanation is that the locks, way down underneath, are based on a hardware instruction that is guaranteed to be atomic and can't clash between threads.
Ordinary local variables in a function are already specific to an individual thread. It's only statics, globals, or other data that can be simultaneously accessed by multiple threads that needs to have locks protecting it.
The mechanism that operates the lock controls access to it.
Any locking primitive needs to be able to communicate changes between processors, so it's usually implemented on top of bus operations, i.e., reading and writing to memory. It also needs to be structured such that two threads attempting to claim it won't corrupt its state. It's not easy, but you can usually trust that any OS implemented lock will not get corrupted by multiple threads.