I'm developing a multi-threaded plugin for a single-threaded application (which has a non-thread-safe API).
My current plugin has two threads: the main one which is application's thread and another one which is used for processing data of the main thread. Long story short, the first one creates objects, gives them an ID, inserts them into a map and sometimes even access and delete them (if application says so); the second one is reading data from that map and is altering objects.
My question is: What tehniques can I use in order to make my plugin thread-safe?
First, you have to identify where race conditions may exist. Then, you will have to use some mechanism to assure that the shared data is accessed in a safe way, hence achieving Thread Safety.
For your particular case, it seems the race condition will be on the shared map and possibly the objects (map's values) it contains as well (if it's possible that both threads attempt to alter the same object simultaneously).
My suggestion is that you use a well tested thread safe map implementation, and then if needed add the extra "protection" for the map's values themselves. This way you ensure the map is always in a consistent state for both threads, and if both threads attempt to modify the same object data (map's values), the data won't be corrupted or left inconsistent.
For the map itself, you can search for "Concurrent Hash Map" or "Atomic Hash Map" data structures for C++ and see if they are of good quality and are available for your compiler/platform. Good examples are Intel's TBB concurrent_hash_map or Facebook's folly AtomicHashMap. They both have advantages and disadvantages and you will have to analyze what's best for your situation.
As for the objects the map contains, you can use plain mutexes (simple, lock, modify data, unlock), atomic operations (trickier, only for simple datatypes) or other method, once more depending on your compiler/platform and speed requirements.
Hope this helps!
Related
So, I have a ton ob objects, each having several fields, including a c-array, which are modified within their "Update()" method. Now I create several threads, each updating a section of these objects. As far as I understand calling lock() before calling the update function would be useless, since this would essentially cause the updates being called in a sequential order just like they would be without multithreading. Now, there objects have pointers, cross referencing to each other. Do I need to call lock every time ANY field is modified, or just before specific operations (like delete, re-initializing arrays, etc?)
Do I need to call lock every time ANY field is modified, or just before specific operations (like delete, re-initializing arrays, etc?)
Neither. You need to have a lock even to read, to make sure another thread isn't part way through modifying the data you're reading. You might want to use a many reader / one writer lock. I suggest you start by having a single lock (whether a simple mutex or the more elaborate multi-reader/writer lock) and get the code working so you can profile it and see whether you actually need more fine-grained locking, then you'll have a bit more experience and understanding of options and advice about how to manage that.
If you do need fine-grained locking, then the trick is to think about where the locks logically belong - for example - there could be one per object. You'll then need to learn about techniques for avoiding deadlocks. You should do some background reading too.
It depends on the consequences of the data changes you want to make. If each thread is, for example, changing well defined sub-blocks of data and each sub-block is entirely independent of all other sub-blocks then it might make sense to have a mutex per sub-block.
That would allow one thread to deal with one set of sub-blocks whilst another gets a different subset to process.
Having threads make changes without gaining a mutex lock first is going to lead to inconsistencies at best...
If the data and processing isn't subdivisible that way then you would probably have to start thinking about how you might handle whole objects in parallel, ie adopt a coarser granularity and one mutex per object. This is perhaps more likely to be possible - different objects are supposed to be independent of each other, so it should in theory be possible to process their data in parallel.
However the unavoidable truth is that some computer jobs require fast single thread performance. For that one starts seriously needing the right sort of supercomputer and perhaps some jolly long pipelines.
I have multiple preforked server processes which accept requests to modify a shared STL C++ list on a server. Each process simply pushes a new element at the end of the list and returns the iterator.
I'm not sure how should each process attempt to acquire lock on the list? Should it be on entire object or are STL Lists capable of handling concurrency since we're just pushing an element at the end of the list?
Assuming you meant threads rather than processes you can share the STL containers but you need to be careful with respect to synchronization. The STL containers are threads safe to some extend but you need to understand the thread safety guarantees given:
One container can be used by multiple readers concurrently.
If there is one writer for a container, there shall neither be concurrent readers nor concurrent writers.
The guarantees are per container, i.e., different containers can concurrently be used by threads without need of synchronization between them.
The reason for these restrictions is that the interface for the containers is geared towards efficient use within one thread and you don't want to impeded the processing of an unshared container with the potential of being shared across threads. Also, the container interface isn't suitable for any sort of container maintained concurrency mechanism. For example, just because v.empty() just returned false it doesn't mean that v.pop() works because the container can be empty by now: If there were internal synchronization any lock would have been released once empty() returned and the container can be changed by the time pop() is called.
It is relatively easy to create a queue to be used for communication between different threads. It would use a std::mutex and a suitable instantiation of std::condition_variable. I think there is something like this proposed for inclusion into the standard but it isn't, yet, part of the standard C++ library. Note, however, that such a class would not return an iterator to the inserted element because by the time you'd access it, the element may be gone again and it would be questionable what the iterator is used for anyway.
The mechanism for doing this kind of synchronisation between multiple processes requires that the developer deal with several issues. Firstly whatever is being shared between the processes needs to be set up outside of them. What this usually means in practice is the use of shared memory.
Then these processes need to communicate with each other with respect to accessing the memory being shared. After all if one thread starts to work on a data structure being shared, but gets swapped out before completing the operation it will leave the data inconsistent.
This synchronisation can be done using operating system constructs such as semaphores in linux, and will allow competing processes to coordinate.
See This for linux based IPC detail
See This for Windows based IPC detail
For some reference you can use the Boost.Interprocess documentation which provides a platform independent implementation of IPC mechanisms.
The standard library containers offer no automagic protection against concurrent modifications, so you need a global lock for every access of the queue.
You even have to be careful with the iterators or references to list elements, since you may not necessarily know when the corresponding element has been removed from the list.
While reading up on POSIX threading, I came across an example of thread-specific-data. I did have one area of confusion in my mind...
The thread-specific-data interface looks a little clunky, especially once you mix in having to use pthread_once, the various initializers, etc.
Is there any reason I can't just use a static std::map where the key is the pthread_self() id and the data value is held in the second part of the std::pair?
I can't think of a reason that this wouldn't work as long as it was wrapped in a mutex, but I see no suggestion of it or anything similar which confuses me given it sounds much easier than the provided API. I know threading can have alot of catch-22's so I thought I'd ask and see if I was about to step in... something unpleasant? :)
I can't think of a reason that this wouldn't work as long as it was wrapped in a mutex
That in itself is a very good reason; implemented properly, you can access your thread-specific data without preventing other threads from simultaneously creating or accessing theirs.
There's also general efficiency (constant time access, versus logarithmic time if you use std::map), no guarantee that pthread_t has a suitable ordering defined, and automatic cleanup along with all the other thread resources.
You could use C++11's thread_local keyword, or boost::thread_specific_ptr, if you don't like the Posix API.
pthread thread-specific-data existed before the standard library containers
thread-specific-data avoids the need of locking and makes sure no other thread messes with the data
The data is cleaned up automatically when the thread disappears
Having said that, nothing stops you from using your own solution. If you can be sure that the container is completely constructed before any threads are running (static threading model), you don't even need the mutex.
Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?
You are not making all too much sense. You said a non-thread safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying any changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.
It's possible to use such animals as reader/writer locks to differentiate between readers and writers of datastructures but the performance advantage for typical cases usually wont merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplist and most commonly used approaches for ensuring data integrity (most books/articles you readon the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might think. It's usually best to implement the simplist approach that will work then worry about optimizing it after the fact.
There is a trick that could work in case, as you said, the other threads will only make changes only once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non fair scheduling)
check your thread
if "master", just change
if other, put off scheduling, if needed by putting off interrupts, make change, reinstall scheduling
really test to see whether there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.
As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (i.e. critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The activ object may also be able to publish the data structure changes through the Observer pattern to other interested threads.
Hi Guys I want to know what is the difference between thread safe Data and Thread Safe Containers
Thread safe data:
Generally refers to data which is protected using mutexes, semaphores or other similar constructs.
Data is considered thread safe if measures have been put in place to ensure that:
It can be modified from multiple threads in a controlled manner, to ensure the resultant data structure doesn't becoming corrupt, or lead to race conditions in the code.
It can be read in a reliable fashion without the data become corrupt during the read process. This is especially important with STL-style containers which use iterators.
Mutexes generally work by blocking access to other threads while one thread is modifying shared data. This is also known as a critical section, and RAII is a common design pattern used in conjunction with critical sections.
Depending on the CPU type, some primitive data types (e.g. int) and operations (increment) might not need mutex protection (e.g. if they resolve down to an atomic instruction in machine language). However:
It is bad practice to make any assumptions about CPU architecture.
You should always code defensively to ensure code will remain thread-safe regardless of the target platform.
Thread safe containers:
Are containers which have measures in place to ensure that any changes made to them occur in a thread-safe manner.
For example, a thread safe container may allow items to be inserted or removed using a specific set of public methods which ensure that any code which uses it is thread-safe.
In other words, the container class provides the mutex protection as a service to the caller, and the user doesn't have to roll their own.