This isn't so much of a problem now, as I've implemented my own collection, but I'm still a little curious about this one.
I've got a singleton which provides access to various common components. It holds instances of these components keyed by thread ID, so each thread should (and does, I checked) have its own instance of a component such as an Oracle database access library.
When running the system (a C++ library called by a C# application) with multiple incoming requests, everything seems to run fine for a while, but then it crashes with an AccessViolation exception. Stepping through the debugger, the problem appears to be that when one thread finishes and clears out its session information (held in a std::map object), the session information held in the other thread's separate collection instance also appears to be cleared out.
Is this something anyone else has encountered or knows about? I've tried having a look around but can't find anything about this kind of problem.
Cheers
Standard C++ containers do not concern themselves much with thread safety. Your code sounds like it is modifying the map instance from two different threads, or modifying the map in one thread while reading from it in another. That is a data race, which is undefined behavior. Use locking primitives to synchronize access between the threads.
If all you want is a separate object for each thread, you might want to take a look at boost::thread_specific_ptr.
How do you manage giving each thread its own session information? Somewhere under there you have classes managing the lifetimes of these objects, and this is where it appears to be going wrong.
Hey
I'm using gRPC with the async API. That requires constructing reactors based on classes like ClientBidiReactor or ServerBidiReactor
If I understand correctly, gRPC works like this: it takes threads from some thread pool, and uses those threads to execute certain methods of the reactors that are in use.
The problem
Now, the problem is when the reactors become stateful. I know that the methods of a single reactor will most probably be executed sequentially, but they may be run from different threads, is this correct? If so, then is it possible that we may encounter a problem described for instance here?
Long story short, if we have an unsynchronized state in such circumstances, is it possible that one thread will update the state, then a next method from the reactor will be executed from a different thread and it will see the not-updated value because the state's new value has not been flushed to the main memory yet?
Honestly, I'm a little confused about this. In the grpc examples here and here this doesn't seem to be addressed (the mutex is for a different purpose there and the values are not atomic).
I used/linked examples for the bidi reactors but this refers to all types of reactors.
Conclusion / questions
There are basically a couple of questions from me at this point:
Are the concerns valid here and do I properly understand everything or did I miss something? Does the problem exist?
Do we need to manually synchronize the reactors' state, or is it handled by the library somehow (I mean, is flushing to main memory handled)?
Are the library authors aware of this? Did they keep this in mind while they were coding examples I linked?
Thank you in advance for any help, all the best!
You're right that the examples don't showcase this very well, there's some room for improvement. The operation-completion reaction methods (OnReadInitialMetadataDone, OnReadDone, OnWriteDone, ...) can be called concurrently from different threads owned by the gRPC library, so if your code accesses any shared state, you'll want to coordinate that yourself (via synchronization, lock-free types, etc). In practice, I'm not sure how often it happens, or which callbacks are more likely to overlap.
The original callback API spec says a bit more about this, under a "Thread safety" clause: L67: C++ callback-based asynchronous API. The same is reiterated a few places in the callback implementation code itself - client_callback.h#L234-236 for example.
So before I start working on a multi-threaded program that should interact with multiple objects that are part of a collection... I want to gain a clear understanding of the concepts involved...
My main concerns are things like deadlocks and such.
Say I have a collection of objects defined like so...
vector<MyObjects> m_objects;
Imagine it being populated with between 100-500 objects.
Now imagine that each of these objects needs to have the ability to communicate with all the other objects at some point. In English: they need to be able to both read from and write to all the other objects safely...
I know that in order to write to an object it needs to be locked... But can one object read from another object safely without extra functionality? If so, can an object read from a locked object that is being written to? (My first guess at that last question is no, as that would not make sense.)
If anyone has some easily understandable articles/reads on the subject I would love to dive into that for a bit...
You have to lock shared resources within an object. Each object should keep its own state; if that state is not part of the system's shared resources, you don't need to lock it.
The safest solution would be to lock the whole vector, so only one thread at a time can modify the vector and the objects it contains. However, the question is whether there would still be a point in multithreading then...
Trying to protect every object in the vector with its own lock could easily lead to circular dependencies and is potentially dangerous.
Imagine following scenario, using your Boxer metaphor from the comments above:
Boxer1 tries to hit Boxer2. If the impact of the punch is dependent on Boxer1's current health level, you would have to lock Boxer1 first, because you don't want anyone to change his health while performing the punch operation. At the same time, you would have to lock Boxer2, because you don't want anyone to increase his health during Boxer1's punch (maybe Boxer1 would have knocked Boxer2 out...).
Now if Boxer2 simultaneously tries to hit Boxer1, the other thread would likewise lock Boxer2 first and then Boxer1.
So if both of the threads performing operations on your boxers get to the part where they lock their own boxer at the same time, they would be waiting forever to lock the other boxer and you would have a deadlock.
To prevent such deadlocks, you would have to work out some kind of locking hierarchy.
While reading up on POSIX threading, I came across an example of thread-specific-data. I did have one area of confusion in my mind...
The thread-specific-data interface looks a little clunky, especially once you mix in having to use pthread_once, the various initializers, etc.
Is there any reason I can't just use a static std::map where the key is the pthread_self() id and the data value is held in the second part of the std::pair?
I can't think of a reason that this wouldn't work as long as it was wrapped in a mutex, but I see no suggestion of it or anything similar, which confuses me given it sounds much easier than the provided API. I know threading can have a lot of catch-22s, so I thought I'd ask and see if I was about to step in... something unpleasant? :)
I can't think of a reason that this wouldn't work as long as it was wrapped in a mutex
That in itself is a very good reason; implemented properly, you can access your thread-specific data without preventing other threads from simultaneously creating or accessing theirs.
There's also general efficiency (constant time access, versus logarithmic time if you use std::map), no guarantee that pthread_t has a suitable ordering defined, and automatic cleanup along with all the other thread resources.
You could use C++11's thread_local keyword, or boost::thread_specific_ptr, if you don't like the POSIX API.
pthread thread-specific data existed before the standard library containers did.
Thread-specific data avoids the need for locking and makes sure no other thread messes with the data.
The data is cleaned up automatically when the thread disappears
Having said that, nothing stops you from using your own solution. If you can be sure that the container is completely constructed before any threads are running (static threading model), you don't even need the mutex.
A few months back, I came across this interesting scenario posed by a guy (on Orkut). Though I've come up with a "non-portable" solution to this problem (tested with a small piece of code), I'd still like to know what you guys have to say and suggest.
Suppose I created a DLL written in C++, exporting some functionality, for a single-threaded client. This DLL declares lots of global variables; some may be const (read-only) while others are modifiable.
Anyway, later things changed and now I want the same DLL to work with a multi-threaded application (without modifying the DLL); that means several threads access the functions and global variables from the DLL, modify them, and so on. All this may cause the global variables to hold inconsistent values.
So the question is,
Can we do something in the client code to prevent multi-access of the DLL, and at the same time ensure that each thread runs in its own context (meaning, when it gets access to the DLL, the DLL's global values are the same as it left them)?
Sure, you can always create a wrapper-layer handling multi-threading specific tasks such as locking. You could even do so in a second DLL that links with the original one, and then have the final project link with that new DLL.
Be aware that no matter how you implement it, this won't be an easy task. You have to know exactly which thread is able to modify which value at what time, who is able to read what and when etc. unless you want to run into problems like deadlocks or race conditions.
If your solution allows it, it's often best to assign a single thread to modify any data, and have all others just read and never write, as concurrent reading access is always easier to implement than concurrent writing access (Boost provides all the basic functionality to do so, for example shared_mutex).
Can we do something in the client code to prevent multi-access of the DLL, and at the same time, ensuring that each thread runs in it's own context (meaning, when it gets access to the DLL, the DLL's global values are same as it was before)?
This is the hard part. I think the only way to do this would be to create a wrapper around the existing DLL. When it is called, it would restore the state (global variables) for the current thread, and save them when the call to the DLL returns. You would need to know all of the state variables in the DLL, and be able to read/write them.
If performance is not an issue, a single lock for the entire DLL would suffice, and be the easiest to implement correctly. That would ensure that only one thread was accessing (reading or writing) the DLL at one time.
I read at multiple places that Boost.Signals is not threadsafe but I haven't found much more details about it. This simple quote doesn't say really that much. Most applications nowadays have threads - even if they try to be single threaded, some of their libraries may use threads (for example libsdl).
I guess the implementation doesn't have problems with other threads as long as they don't access the slot. So it is at least thread-safe in that sense.
But what exactly works and what would not work? Would it work to use it from multiple threads as long as I don't ever access it at the same time? I.e. if I build my own mutexes around the slot?
Or am I forced to use the slot only in that thread where I created it? Or where I used it for the first time?
I don't think it's too clear either, and one of the library reviewers said here:
I also didn't like the fact that the word 'thread' was mentioned only three times. Boost.Signals2 wants to be a 'thread safe signals' library; therefore some more details, and especially more examples concerning that area, should be given to the user.
One way of figuring it out is to go to the source and see what they're using _mutex / lock() to protect. Then just imagine what would happen if those calls weren't there. :)
From what I can gather, it's ensuring simple things like "if one thread is doing connects or disconnects, that won't cause a different thread which is iterating through the slots attached to those signals to crash". Kind of like how using a thread-safe version of the C runtime library assures that if two threads make valid calls to printf at the same time then there won't be a crash. (Not to say the output you'll get will make any sense—you're still responsible for the higher order semantics.)
It doesn't seem to be like Qt, in which the thread a certain slot's code gets run on is based on the target slot's "thread affinity" (which means emitting a signal can trigger slots on many different threads to run in parallel.) But I guess not supporting that is why the boost::signal "combiners" can do things like this.
One problem I see is that one thread can connect or disconnect while another thread is signalling.
You can easily wrap your signal and connect calls with mutexes. However, it is non-trivial to wrap the connections. (connect returns connections which you can use to disconnect).