C++ thread safety - map reading

I am working on a program that needs a std::map, specifically one like map<string,map<string,int>>. It is meant to hold something like bank exchange rates: the first string is the original currency, the one in the second map is the desired one, and the int is their rate. This whole map will be read only. Do I still need mutexes? I am a bit confused about the whole thread safety topic, since this is my first bigger multi-threaded program.

If you are talking about the standard std::map† and no thread writes to it, no synchronization is required. Concurrent reads without writes are fine.
If however at least one thread performs writes on the map, you will indeed need some sort of protection like a mutex.
Be aware that std::map::operator[] counts as a write, so use std::map::at (or std::map::find if the key may not exist in the map) instead. You can make the compiler protect you from accidental writes by only referring to the shared map via const map&.
†Was clarified to be the case in the OP. For completeness' sake: Note that other classes may have mutable members. For those, even access through const& may introduce a race. If in doubt, check the documentation or use something else for parallel programming.
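A minimal sketch of that read-only access pattern (the function name, the map alias, and the default value are made up for illustration): take the shared map by const map&, so an accidental operator[] call will not even compile, and use find/at for lookups:
#include <map>
#include <string>

using RateMap = std::map<std::string, std::map<std::string, int>>;

// Safe to call concurrently from many threads, as long as nobody writes to 'rates'.
int rate_or_default(RateMap const& rates, std::string const& from, std::string const& to)
{
    auto const outer = rates.find(from);          // find: read-only lookup
    if (outer == rates.end())
        return 0;
    auto const inner = outer->second.find(to);
    return inner != outer->second.end() ? inner->second : 0;
    // Note: rates[from][to] would not compile here, because operator[] is non-const (it may insert).
}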

The rule of thumb is: if you have shared data and at least one thread will be a writer, then you need synchronization. You do not want a reader to read an element that is being written to, because the reader might see part of the old value and part of the new value.
In your case, since all the threads will only ever be reading the data, there is nothing they can do that will affect the map, so you can have concurrent (unsynchronized) reads.

Wrap a std::map<std::string, std::map<std::string,int>> const in a custom class which has only const member functions [*].
This will make sure that all threads which use an object of the class after its creation will only read from it, which is guaranteed to be safe since C++11.
As documentation says:
All const member functions can be called concurrently by different
threads on the same container.
Wrapping containers in your own custom types is good practice anyway. Increased thread safety is just one positive side effect of that good practice. Other positive effects include increased readability of client code, reduction/adaptation of the container interface to the required functionality, and ease of adding additional constraints and checks.
Here is a brief example:
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

class BankChangeRates
{
public:
    BankChangeRates(std::map<std::string, std::map<std::string,int>> const& data) : data(data) {}

    int get(std::string const& key, std::string const& inner_key) const
    {
        auto const find_iter = data.find(key);
        if (find_iter != data.end())
        {
            auto const inner_find_iter = find_iter->second.find(inner_key);
            if (inner_find_iter != find_iter->second.end())
            {
                return inner_find_iter->second;
            }
        }
        throw std::out_of_range("unknown currency pair"); // error handling
    }

    std::size_t size() const
    {
        return data.size();
    }

private:
    std::map<std::string, std::map<std::string,int>> const data;
};
In any case, the thread-safety problem is then reduced to how to make sure that the constructor does not read from an object to which another thread writes. This is often achieved trivially; for example, the object may be constructed before multi-threading even begins, or it may be initialised with hard-coded initialisation lists. In many other cases, the code which creates the object will generally access only other thread-safe functions and local objects.
The point is that concurrent accesses to your object will always be safe once it has been created.
[*] Of course, the const member functions should keep their promise and not attempt "workarounds" with mutable or const_cast.
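For instance, a minimal usage sketch (the currency data, thread count, and output are my own illustration, not part of the answer above): the object is constructed from a hard-coded initialiser list before any thread is started, and is afterwards only read through its const interface:
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    // Constructed before multi-threading begins; never modified afterwards.
    BankChangeRates const rates({{"EUR", {{"USD", 108}}},
                                 {"USD", {{"EUR", 92}}}});

    std::vector<std::thread> readers;
    for (int i = 0; i < 4; ++i)
        readers.emplace_back([&rates] {
            // Concurrent calls to const member functions are safe.
            std::cout << rates.get("EUR", "USD") << '\n';
        });
    for (auto& t : readers) t.join();
}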

If you are completely sure that both maps are ALWAYS READ-ONLY, then you never need mutexes.
But you have to be extra careful that no one can update the map by any means during program execution. Make sure that you initialize the map at the init stage of the program and then never update it for any reason.
If you suspect that in the future you may need to update it in the middle of program execution, then it is better to have macros around the map accesses which are empty right now. If you later need mutexes around them, just change the macro definitions, as sketched below.
PS: I have used a map in this answer, but it can easily be replaced by any shared resource; it was just for ease of understanding.
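A minimal sketch of that macro idea (the macro names and the lookup function are made up for illustration; the macros expand to nothing today and can be redefined to lock a real mutex later):
#include <map>
#include <string>

// Empty for now, because the map is read-only after initialisation.
// If a writer is ever introduced, redefine these to lock/unlock a real mutex.
#define RATES_LOCK()
#define RATES_UNLOCK()

int lookup(std::map<std::string, int> const& rates, std::string const& key)
{
    RATES_LOCK();
    auto const it = rates.find(key);
    int const result = (it != rates.end()) ? it->second : 0;
    RATES_UNLOCK();
    return result;
}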

Related

c++ multithreading const parameter question

I'm in the design phase of a multi-threading problem I might implement in C++. It will be the first time I implement something multi-threaded in C++. The question is quite simple: if I have a function with a const parameter as input, is it just the function under consideration that is not allowed to alter it? Or does C++ guarantee that the parameter will not change (even if another thread tries to access it mid-function)?
void someFunction(const SomeObject& mustNotChange){
    bool check = false;
    if(mustNotChange.getNumber()==0) check = true;
    //sleep for 10s
    if(check && mustNotChange.getNumber()!=0){ /*CRASH!!!*/ }
}
In your example const doesn't make any difference for other threads as mustNotChange is in your current function stack space (or even a register) which should not be accessible by other threads.
I assume you are more interested in the case where other threads can access the memory, something like:
#include <thread>

void someFunction(const int& mustNotChange)
{
    //...
}

void someOtherFunction(int& mayChange)
{
    //...
}

int main()
{
    int i = 0;
    std::thread t0([&i](){ someFunction(i); });
    std::thread t1([&i](){ someOtherFunction(i); });
    t0.join();
    t1.join();
    return 0;
}
In this case const ensures that someFunction can't change the value of mustNotChange (and if it does anyway, it's undefined behavior), but this doesn't offer any guarantees about other functions that can access the same memory.
As a conclusion:
if you don't share data (as in the same memory location) between threads, as you do in your example, you don't have to worry about data races;
if you share data, but no function can change the shared data (all functions receive it as const), you don't have to worry about data races. Please note that in the example above, if you change i before both threads join, it's still a data race!
if you share data and at least one function can change the data, you must use synchronization mechanisms.
It is not possible to say by just looking at the code if the referred-to object can change or not. This is why it's important to design the application with concurrency in mind, to, for example, minimize explicit sharing of writable data.
It doesn't matter that an object is accessed inside a function or through a const reference, the rules are the same. In the C++ memory model, accessing the same (non-atomic) value concurrently is possible only for reading. As soon as at least one writer is involved (in any thread), no other reads or writes may happen concurrently.
Furthermore, reads and writes in different threads must be synchronized; this is known as the "happens-before" semantics; locking and releasing a mutex or waiting on an atomic are examples of synchronization events which "release" writes to other threads, which subsequently "acquire" those writes.
For more details on C++ concurrency there is a very good book "C++ Concurrency in Action". Herb Sutter also has a nice atomic<> weapons talk.
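For example (a sketch added here for illustration, not from the original answer): protecting a shared int, like the one in the earlier snippet, with a mutex gives the accesses the required happens-before ordering:
#include <mutex>
#include <thread>

int shared_value = 0;
std::mutex m;

void reader()
{
    std::lock_guard<std::mutex> lock(m); // acquires whatever the writer released
    int copy = shared_value;             // no data race: accesses are serialized
    (void)copy;
}

void writer()
{
    std::lock_guard<std::mutex> lock(m); // unlocking releases the write to other threads
    shared_value = 42;
}

int main()
{
    std::thread t0(reader);
    std::thread t1(writer);
    t0.join();
    t1.join();
}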
Basically the answer is yes: you are passing the variable by const reference, and hence the function to which this parameter is scoped is not allowed to alter it.
It is best to think of a const parameter in C++ as a promise by the function not to change the value. It doesn't mean someone else doesn't.

Guarded Data Design Pattern

In our application we deal with data that is processed in a worker thread and accessed in a display thread and we have a mutex that takes care of critical sections. Nothing special.
Now we thought about re-working our code where currently locking is done explicitly by the party holding and handling the data. We thought of a single entity that holds the data and only gives access to the data in a guarded fashion.
For this, we have a class called GuardedData. The caller can request such an object and should keep it only for a short time in local scope. As long as the object lives, it keeps the lock. As soon as the object is destroyed, the lock is released. The data access is coupled with the locking mechanism without any explicit extra work in the caller. The name of the class reminds the caller of the present guard.
#include <boost/thread/locks.hpp> // for boost::lock_guard

template<typename T, typename Lockable>
class GuardedData {
public:
    GuardedData(T &d, Lockable &m) : data(d), guard(m) {}
    T *operator->() { return &data; }  // drill down to the payload while the lock is held
private:
    T &data;                           // the guarded payload
    boost::lock_guard<Lockable> guard; // acquired on construction, released on destruction
};
Again, a very simple concept. The operator-> mimics the semantics of STL iterators for access to the payload.
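A usage sketch (the Model type, the mutex, and the function are assumptions added for illustration; they are not part of the question):
#include <mutex>
#include <string>

struct Model {
    std::string status;
};

Model model;
std::mutex model_mutex;

void update_status()
{
    // The mutex is held exactly as long as 'guarded' lives in this scope.
    GuardedData<Model, std::mutex> guarded(model, model_mutex);
    guarded->status = "processing";
}   // lock released here when 'guarded' is destroyed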
Now I wonder:
Is this approach well known?
Is there maybe a templated class like this already available, e.g. in the boost libraries?
I am asking because I think it is a fairly generic and usable concept. I could not find anything like it though.
Depending upon how this is used, you are almost guaranteed to end up with deadlocks at some point. If you want to operate on 2 pieces of data then you end up locking the mutex twice and deadlocking (unless each piece of data has its own mutex - which would also result in deadlock if the lock order is not consistent - you have no control over that with this scheme without making it really complicated). Unless you use a recursive mutex which may not be desired.
Also, how are your GuardedData objects passed around? boost::lock_guard is not copyable - it raises ownership issues on the mutex i.e. where & when it is released.
It's probably easier to copy the parts of the data you need to the reader/writer threads as and when they need it, keeping the critical section short. The writer would similarly commit to the data model in one go.
Essentially your viewer thread gets a snapshot of the data it needs at a given time. This may even fit entirely in a cpu cache sitting near the core that is running the thread and never make it into RAM. The writer thread may modify the underlying data whilst the reader is dealing with it (but that should invalidate the view). However since the viewer has a copy it can continue on and provide a view of the data at the moment it was synchronized with the data.
The other option is to give the view a smart pointer to the data (which should be treated as immutable). If the writer wishes to modify the data, it copies it at that point, modifies the copy and when completes, switches the pointer to the data in the model. This would necessitate blocking all readers/writers whilst processing, unless there is only 1 writer. The next time the reader requests the data, it gets the fresh copy.
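A rough sketch of that second option (my own illustration, using std::shared_ptr and assuming a single writer thread; the same idea works with boost::shared_ptr):
#include <memory>
#include <mutex>
#include <vector>

struct Data { std::vector<int> values; };

std::shared_ptr<const Data> current;   // readers treat the payload as immutable
std::mutex pointer_mutex;              // protects only the pointer, not the payload

std::shared_ptr<const Data> get_snapshot()
{
    std::lock_guard<std::mutex> lock(pointer_mutex);
    return current;                    // the reader keeps its own reference to this version
}

void update(int value)                 // called from the single writer thread
{
    std::shared_ptr<const Data> old_snapshot = get_snapshot();
    auto copy = old_snapshot ? std::make_shared<Data>(*old_snapshot)
                             : std::make_shared<Data>();
    copy->values.push_back(value);     // modify a private copy; readers never see it half-done
    std::lock_guard<std::mutex> lock(pointer_mutex);
    current = copy;                    // publish the new version by swapping the pointer
}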
Well known, I'm not sure. However, I use a similar mechanism in Qt pretty often called a QMutexLocker. The distinction (a minor one, imho) is that you bind the data together with the mutex. A very similar mechanism to the one you've described is the norm for thread synchronization in C#.
Your approach is nice for guarding one data item at a time but gets cumbersome if you need to guard more than that. Additionally, it doesn't look like your design would stop me from creating this object in a shared place and accessing the data as often as I please, thinking that it's guarded perfectly fine, but in reality recursive access scenarios are not handled, nor are multi-threaded access scenarios if they occur in the same scope.
There seems to be a slight disconnect in the idea. Its use conveys to me that accessing the data is always made to be thread-safe because the data is guarded. Often, this isn't enough to ensure thread-safety. Order of operations on protected data often matters, so the locking is really scope-oriented, not data-oriented. You could get around this in your model by guarding a dummy object and wrapping your guard object in a temporary scope, but then why not just use one of the existing mutex implementations?
Really, it's not a bad approach, but you need to make sure its intended use is understood.

Are non-mutating operations inherently thread safe (C++)?

This is probably a dumb question, but consider the following pseudo-code:
#include <map>
#include <mutex>
#include <string>

struct Person {
    std::string name;
};

class Registry {
public:
    const std::string& name(int id) const { return _people.at(id).name; } // read: no lock
    void name(int id, const std::string& name) {
        std::lock_guard<std::mutex> lock(_mutex);                          // the [[scoped mutex]]
        _people[id].name = name;
    }
private:
    std::mutex _mutex;
    std::map<int, Person> _people;
};
In this simple example, assume Registry is a singleton that will be accessed by multiple threads. I'm locking during an operation that mutates the data, but not during non-mutating access.
Is this thread safe, or should I also lock during the read operation? I'm preventing multiple threads from trying to modify the data at the same time, but I don't know what would happen if a thread was trying to read at the same time another was writing.
If any thread can modify the data, then you need to lock for all access.
Otherwise, one of your "reading" threads could access the data when it is in an indeterminate state. Modifying a map, for example, requires manipulating several pointers. Your reading thread could access the map while some - but not all - of the map has been adjusted.
If you can guarantee that the data is not being modified, multiple reads from multiple threads do not need to be locked, however that introduces a fragile scenario that you would have to keep a close eye on.
It's not thread safe to be reading the data while it is being modified. It is perfectly safe to have multiple threads reading the data at once.
This difference is what reader-writer locks are for; they will allow any number of readers but when a writer tries to lock the resource new readers will no longer be allowed and the writer will block until all the current readers are done. Then the writer will proceed and once it's done all the readers will be allowed access again.
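A sketch of that idea applied to the Registry above (an adaptation on my part, using C++17's std::shared_mutex; boost::shared_mutex offers the same semantics for older compilers):
#include <map>
#include <shared_mutex>
#include <string>

struct Person { std::string name; };

class Registry {
public:
    std::string name(int id) const {                        // return by value: the lock is gone once we return
        std::shared_lock<std::shared_mutex> lock(_mutex);   // any number of concurrent readers
        auto it = _people.find(id);
        return it != _people.end() ? it->second.name : std::string();
    }
    void name(int id, const std::string& new_name) {
        std::unique_lock<std::shared_mutex> lock(_mutex);   // exclusive access for the writer
        _people[id].name = new_name;
    }
private:
    mutable std::shared_mutex _mutex;
    std::map<int, Person> _people;
};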
The reason it's not safe to read data during modification is that the data can be, or can appear to be, in an inconsistent state (e.g., the object may temporarily not fulfill its invariants). If the reader reads it at that point then it's just like there's a bug in the program failing to keep the data consistent.
// example
int array[10];
int size = 0;

int& top() {
    return array[size - 1];
}

void insert(int value) {
    size++;
    top() = value;
}
Any number of threads can call top() at the same time, but if one thread is running insert() then a problem occurs when the lines get interleaved like this:
// thread calling insert            // thread calling top
size++;
                                    return array[size-1];
array[size-1] = value;
The reading thread gets garbage.
Of course this is just one possible way things can go wrong. In general you can't even assume the program will behave as though lines of code on different threads will just interleave. In order to make that assumption valid the language simply tells you that you can't have data races (i.e., what we've been talking about; multiple threads accessing a (non-atomic) object with at least one thread modifying the object)*.
* And, for completeness: that all atomic accesses use a sequentially consistent memory ordering. This doesn't matter for you, since you're not doing low-level work directly with atomic objects.
Is this thread safe, or should I also lock during the read operation?
It is not thread-safe.
Per Paragraph 1.10/4 of the C++11 Standard:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
Moreover, per Paragraph 1.10/21:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [...]
It's not thread safe.
The reading operation could be going through the map (it's a tree) to find the requested object, while a writing operation suddenly adds or removes something from the map (or worse, the actual node the iterator is at).
If you're lucky you'll get an exception; otherwise it will be just undefined behavior while your map is in an inconsistent state.
I don't know what would happen if a thread was trying to read at the same time another was writing.
Nobody knows. Chaos would ensue.
Multiple threads can share a read-only resource, but as soon as anyone wants to write it, it becomes unsafe for everyone to access it in any way until the writing is done.
Why?
Writes are not atomic. They happen over multiple clock cycles. A process attempting to read an object as it's written may find a half-modified version, temporary garbage.
So
Lock your reads if they are expected to be concurrent with your writes.
Absolutely not safe!
If you are changing Person::name from "John Smith" to "James Watt", you may well read back a value of "Jame Smith" or "James mith". Or possibly even something completely different, because "change this value for that" may not just copy the new data into the existing place, but entirely replace it with a newly allocated piece of memory that contains some completely undefined content [including something that isn't a valid string].

Thread safety of multiple-reader/single-writer class

I am working on a set that is frequently read but rarely written.
#include <set>
#include <boost/shared_ptr.hpp>

class A {
    boost::shared_ptr<std::set<int> > _mySet;
public:
    A() : _mySet(new std::set<int>()) {} // start with an empty set so add() can copy it
    void add(int v) {
        boost::shared_ptr<std::set<int> > tmpSet(new std::set<int>(*_mySet));
        tmpSet->insert(v); // insert to tmpSet
        _mySet = tmpSet;   // swap _mySet
    }
    void check(int v) {
        boost::shared_ptr<std::set<int> > theSet = _mySet;
        if (theSet->find(v) != theSet->end()) {
            // do something irrelevant
        }
    }
};
In the class, add() is only called by one thread and check() is called by many threads. check() does not care whether _mySet is the latest or not. Is the class thread-safe? Is it possible that the thread executing check() would observe swap _mySet happening before insert to tmpSet?
This is an interesting use of shared_ptr to implement thread safety.
Whether it is OK depends on the thread-safety guarantees of boost::shared_ptr. In particular, does it establish some sort of fence or membar, so that you are guaranteed that all of the writes in the constructor and insert functions of the set occur before any modification of the pointer value becomes visible?
I can find no thread safety guarantees whatsoever in the Boost documentation of smart pointers. This surprises me, as I was sure that there were some. But a quick look at the sources for 1.47.0 shows none, and any use of boost::shared_ptr in a threaded environment will fail. (Could someone please point me to what I'm missing? I can't believe that boost::shared_ptr has ignored threading.)
Anyway, there are three possibilities: you can't use the shared pointer in a threaded environment (which seems to be the case), the shared pointer ensures its own internal consistency in a threaded environment but doesn't establish ordering with regard to other objects, or the shared pointer establishes full ordering. Only in the last case will your code be safe as is. In the first case, you'll need some form of lock around everything, and in the second, you'll need some sort of fence or membar to ensure that the necessary writes are actually done before publishing the new version, and that they will be seen before trying to read it.
You do need synchronization; it is not thread safe. Generally it doesn't matter how simple the operation is; even something like shared += value; is not thread safe.
Look here for an example regarding the thread safety of shared_ptr: Is boost shared_ptr <XXX> thread safe?
I would also question your allocation/swapping in add() and use of shared_ptr in check().
Update:
I went back and re-read the docs for shared_ptr... It is most likely thread-safe in your particular case, since the reference counting for shared_ptr is thread-safe. However, you are adding (IMHO) unnecessary complexity by not using a read/write lock.
Ultimately, for this code to be thread safe, the assignments should be:
atomic_store(&_mySet, tmpSet);
and
theSet = atomic_load(&_mySet);
(instead of simple assignments)
But I don't know the current status of atomicity support for shared_ptr.
Note that adding atomicity to shared_ptr in a lock-free manner is a really difficult thing; so even if atomicity is implemented, it may rely on mutexes or user-mode spinlocks and, therefore, may sometimes suffer from performance issues.
Edit: Perhaps a volatile qualifier for the _mySet member variable should also be added... but I'm not sure that it is strictly required by the semantics of atomic operations.
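For reference, a sketch of what that looks like with the C++11 free-function overloads for shared_ptr (my adaptation of the class above to std::shared_ptr; these overloads exist since C++11 and were later deprecated in C++20 in favour of std::atomic<std::shared_ptr<T>>):
#include <memory>
#include <set>

class A {
    std::shared_ptr<std::set<int> > _mySet;
public:
    A() : _mySet(std::make_shared<std::set<int> >()) {}
    void add(int v) {                                    // single writer
        auto tmpSet = std::make_shared<std::set<int> >(*std::atomic_load(&_mySet));
        tmpSet->insert(v);                               // fill the private copy
        std::atomic_store(&_mySet, tmpSet);              // publish the new set atomically
    }
    bool check(int v) const {                            // many concurrent readers
        auto theSet = std::atomic_load(&_mySet);         // grab a consistent snapshot
        return theSet->find(v) != theSet->end();
    }
};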

Thread safety of std::map for read-only operations

I have a std::map that I use to map values (field IDs) to human readable strings. This map is initialised once when my program starts, before any other threads are started, and after that it is never modified again. Right now, I give every thread its own copy of this (rather large) map, but this is obviously inefficient use of memory and it slows program startup. So I was thinking of giving each thread a pointer to the map, but that raises a thread-safety issue.
If all I'm doing is reading from the map using the following code:
std::string name;
// here N is the field id for which I want the human readable name
unsigned field_id = N;
std::map<unsigned, std::string>::const_iterator map_it;

// fields_p is a const std::map<unsigned, std::string>* to the map concerned.
// multiple threads will share this.
map_it = fields_p->find(field_id);
if (map_it != fields_p->end())
{
    name = map_it->second;
}
else
{
    name = "";
}
Will this work or are there issues with reading a std::map from multiple threads?
Note: I'm working with Visual Studio 2008 currently, but I'd like this to work across most mainstream STL implementations.
Update: Edited code sample for const correctness.
This will work from multiple threads as long as your map remains the same. The map you use is de facto immutable, so any find will actually do a find in a map which does not change.
Here is a relevant link: http://www.sgi.com/tech/stl/thread_safety.html
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
You fall into the "simultaneous read accesses to shared containers" category.
Note: this is true for the SGI implementation. You need to check if you use another implementation. Of the two implementations which seem widely used as alternatives, STLport has built-in thread safety as far as I know. I don't know about the Apache implementation though.
It should be fine.
You can use const references to it if you want to document/enforce read-only behaviour.
Note that correctness isn't guaranteed (in principle the map could choose to rebalance itself on a call to find), even if you only use const methods (a really perverse implementation could declare the tree mutable). However, this seems pretty unlikely in practice.
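For instance, a sketch of the sharing the question describes (the map contents, thread count, and use of C++11 std::thread are my own illustration; the original question targets Visual Studio 2008, where a platform thread API would be used instead):
#include <iostream>
#include <map>
#include <string>
#include <thread>
#include <vector>

int main()
{
    // Initialised once, before any other thread is started; never modified afterwards.
    const std::map<unsigned, std::string> fields = {{1, "Account"}, {2, "Amount"}};
    const std::map<unsigned, std::string>* fields_p = &fields;

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([fields_p] {
            auto it = fields_p->find(1u);                        // concurrent reads only
            std::cout << (it != fields_p->end() ? it->second : "") << '\n';
        });
    for (auto& t : workers) t.join();
}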
Yes it is.
See related post with same question about std::set:
Is the C++ std::set thread-safe?
For MS STL implementation
Thread Safety in the C++ Standard Library
The following thread safety rules apply to all classes in the C++ Standard Library—this includes shared_ptr, as described below. Stronger guarantees are sometimes provided—for example, the standard iostream objects, as described below, and types specifically intended for multithreading, like those in <atomic>.
An object is thread-safe for reading from multiple threads. For example, given an object A, it is safe to read A from thread 1 and from thread 2 simultaneously.