From what I understand, std::thread::swap swaps the thread ids of two threads. It doesn't change thread local storage or affect execution of either of the threads in any way. What is the purpose behind it? How/where/why is it used?
Don't think of it as "swapping the thread IDs of two threads", think of it as swapping two thread variables:
std::thread a(...); // starts thread 1
std::thread b(...); // starts thread 2
a.swap(b); // or std::swap(a, b);
// now variable a points to thread 2
// and variable b points to thread 1
Most variables can be swapped with std::swap, even the ones that can't be copied. I'm not sure why they also created a swap member function for std::thread.
This is useful because you can, for example, sort a vector of threads. The sort algorithm will use swap to rearrange the threads in the vector. Remember that objects can't be moved in C++, only their contents. swap, and move semantics, allow you to move the contents of most objects into different objects. For thread, it moves the "pointer" to the thread.
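Here is a minimal sketch of both uses, assuming a C++11 compiler. `a.swap(b)` exchanges which thread each variable refers to. `std::sort` can reorder a vector of threads because it moves and swaps the handles. The ordering by `get_id()` is an arbitrary choice for illustration, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Order thread handles by their id (an arbitrary but valid ordering).
bool by_id(const std::thread& x, const std::thread& y) {
    return x.get_id() < y.get_id();
}

// Swap two thread variables: only the handles change places; both threads
// keep running their own work and are joined normally afterwards.
bool swap_exchanges_handles() {
    std::thread a([] { /* work of thread 1 */ });
    std::thread b([] { /* work of thread 2 */ });
    const std::thread::id id1 = a.get_id();
    const std::thread::id id2 = b.get_id();

    a.swap(b);  // same effect as std::swap(a, b)

    assert(a.joinable() && b.joinable());  // neither thread was affected
    const bool swapped = (a.get_id() == id2) && (b.get_id() == id1);
    a.join();
    b.join();
    return swapped;
}

// std::sort rearranges the handles in the vector by moving/swapping them.
bool sort_threads_by_id(std::size_t n) {
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < n; ++i)
        pool.emplace_back([] { /* some work */ });
    std::sort(pool.begin(), pool.end(), by_id);
    const bool sorted = std::is_sorted(pool.begin(), pool.end(), by_id);
    for (std::thread& t : pool) t.join();
    return sorted;
}
```

The threads themselves never move. Only the `std::thread` objects that refer to them are rearranged.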
std::thread is internally a referential type, similar to, for example, std::vector. Internally, std::vector also simply refers to a memory buffer stored in dynamic memory.
What you describe is how swapping works with such types. Swapping two vectors merely swaps the pointers to their buffers (as well as some bookkeeping information); none of the stored objects are affected by the operation. These pointers are "handles" to a resource. Similarly, swapping threads swaps the internal thread id, which is a handle to the thread resource, without affecting the state of the referred threads.
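The vector half of this analogy can be checked directly. The sketch below (the function name is hypothetical) takes the address of an element, swaps the vectors, and confirms that the element did not move. The standard guarantees that references into a vector stay valid across `swap` and then refer into the other vector.

```cpp
#include <cassert>
#include <vector>

// Swapping two vectors exchanges their buffer pointers and bookkeeping
// (size, capacity); no element is copied, moved, or destroyed.
bool vector_swap_keeps_elements_in_place() {
    std::vector<int> v1{1, 2, 3};
    std::vector<int> v2{42};
    const int* first_of_v1 = &v1[0];  // address inside v1's buffer

    v1.swap(v2);
    assert(v1.size() == 1 && v2.size() == 3);

    // The same buffer, at the same address, now belongs to v2.
    return &v2[0] == first_of_v1 && *first_of_v1 == 1;
}
```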
As for the purpose of swapping: it's simply a useful thing to do. It is most notably used in sorting and other reordering algorithms. I don't know of a particular reason to reorder thread objects, but neither do I know of a reason why that shouldn't be possible.
I need to spawn 4 threads that basically do the same thing, but each with a different variable. So I call ::CreateThread 4 times, giving the same threadProc and 'this' as a parameter. Now in threadProc, I need to pick the right variable to work with. I have a vector of objects, and I push an object into it immediately after each CreateThread call.
// at this point myVec has say, 2 items
HANDLE hThread = ::CreateThread( NULL, 0, threadProc, (LPVOID)this, 0, NULL );
myVecObj.threadHandle = hThread;
myVec.push_back(myVecObj); // myVec.size() == 3 now
DWORD CALLBACK myClass::threadProc(LPVOID lpContext)
{
myClass *pMyClass = (myClass *)lpContext;
int vecCount = pMyClass->myVec.size(); // Is this 3??
char * whatINeed = (char*)pMyClass->myVec[vecCount-1].whatINeed;
return 0;
}
My doubt/question is how fast does the threadProc fire - could it beat the call to myVec.push_back()? Is this a race condition that I'm introducing here? I'm trying to make the assumption that when each threadProc starts (they start at different times, not one after the other), I can safely take the last object in the class' vector.
I need to spawn 4 threads, that basically do the same thing but with a different variable each. So I call ::CreateThread 4 times, giving the same threadProc and this as a parameter.
Now in threadProc, I need to pick the right variable to work with.
Why not pass the thread a pointer to the actual object it needs to act on?
I have a vector of objects, and I push an object into it immediately after each CreateThread call.
That is the wrong way to handle this. And yes, that is a race condition. Not only for the obvious reason - the thread might start running before the object is pushed - but also because any push into a vector can potentially reallocate the internal array of the vector, which would be very bad for threads that have already obtained a pointer to their data inside the vector. The data would move around in memory behind the thread's back.
To solve this, you need to either:
push all of the objects into the vector first, then start your threads. You can pass a pointer to each vector element to its respective thread. But this works only if you don't modify the vector anymore while any thread is running, for the reason stated above.
start the threads in a suspended state first, and then resume them after you have pushed all the objects into the vector. This also requires that you don't modify the vector anymore. It also means you will have to pass each thread an index to a vector element, rather than passing it a pointer to the element.
get rid of the vector altogether (or at least change it to hold object pointers instead of actual objects). Dynamically allocate your objects using new and pass those pointers to each thread (and optionally to the vector) as needed. Let each thread delete its object before exiting (and optionally remove it from the vector, with proper synchronization).
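Here is a sketch of the first option. It uses portable `std::thread` in place of `::CreateThread`, which is a substitution for illustration. `WorkItem`, `worker` and `run_workers` are hypothetical names. The vector is filled completely before any thread starts, and each thread receives a pointer to its own element.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-thread data, standing in for the question's vector
// elements (the member that holds whatINeed).
struct WorkItem {
    std::string whatINeed;
    std::size_t result;
};

// Hypothetical thread procedure: it is handed a pointer to its own element,
// so it never has to guess which one is "the last".
void worker(WorkItem* item) {
    item->result = item->whatINeed.size();
}

std::vector<std::size_t> run_workers(const std::vector<std::string>& inputs) {
    // 1. Fill the vector completely before starting any thread.
    std::vector<WorkItem> items;
    for (const std::string& s : inputs) items.push_back(WorkItem{s, 0});

    // 2. Start the threads. `items` is not modified from here on, so the
    //    element pointers stay valid (no reallocation can happen).
    std::vector<std::thread> threads;
    for (WorkItem& item : items) threads.emplace_back(worker, &item);
    assert(threads.size() == items.size());

    // 3. Join before reading the results; join() synchronizes with the
    //    end of each worker.
    for (std::thread& t : threads) t.join();

    std::vector<std::size_t> results;
    for (const WorkItem& item : items) results.push_back(item.result);
    return results;
}
```

Because each `push_back` happens before the corresponding thread is created, every worker is guaranteed to see its fully constructed element.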
My doubt/question is how fast does the threadProc fire
That is entirely up to the OS scheduler to decide.
could it beat the call to myVec.push_back()?
Yes, that is a possibility.
Is this a race condition that I'm introducing here?
Yes.
I'm trying to make the assumption
Don't make assumptions!
that when each threadProc starts (they start at different times, not one after the other), I can safely take the last object in the class' vector.
That is not a safe assumption to make.
There is no synchronisation between the modification of myVec, i.e., the myVec.push_back() call, and the read of its size in another thread. I do realise that you don't use standard threads, but applying the C++11 rules, there is a data race and the program has undefined behaviour.
Note that the data race isn't just theoretical: there is a fair chance that you see the modification happen after the read. Creating a thread may not be fast, but some implementations don't actually create OS-level threads; instead, they keep a pool of threads around that are used when a new thread is apparently spawned.
In similar contexts I have heard the excellent argument "... but it only happens once in a million times!" This particular issue would have happened on the 48-core machine about 10 times per second, assuming the estimate "once in a million" were correct.
Assume that I have code like:
void InitializeComplexClass(ComplexClass* c);
class Foo {
public:
Foo() {
i = 0;
InitializeComplexClass(&c);
}
private:
ComplexClass c;
int i;
};
If I now do something like Foo f; and hand a pointer to f over to another thread, what guarantees do I have that any stores done by InitializeComplexClass() will be visible to the CPU executing the other thread that accesses f? What about the store writing zero into i? Would I have to add a mutex to the class, take a writer lock on it in the constructor and take corresponding reader locks in any methods that access the member?
Update: Assume I hand a pointer over to a bunch of other threads once the constructor has returned. I'm not assuming that the code is running on x86; it could instead be running on something like PowerPC, which has a lot of freedom to reorder memory operations. I'm essentially interested in what sorts of memory barriers the compiler has to inject into the code when the constructor returns.
In order for the other thread to know about your new object, you have to hand the object over or signal the other thread somehow. To signal a thread, you write to memory. Both x86 and x64 perform all memory writes in order; the CPU does not reorder these operations with regard to each other. This is called "Total Store Ordering", so the CPU's write queue works as "first in, first out".
Given that you create an object first and then pass it on to another thread, these changes to memory will also occur in order, and the other thread will always see them in the same order. By the time the other thread learns about the new object, the contents of the object were guaranteed to be available to that thread even earlier (had it only known where to look).
In conclusion, you do not have to synchronise anything this time. Handing over the object after it has been initialised is all the synchronisation you need.
Update: On non-TSO architectures you do not have this TSO guarantee, so you need to synchronise. Use the MemoryBarrier() macro (or any interlocked operation), or some synchronisation API. Signalling the other thread through the corresponding API also synchronises; otherwise it would not be a synchronisation API.
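`MemoryBarrier()` is Win32-specific. A portable alternative, which is a substitution and not what this answer used, is to publish the pointer through a C++11 `std::atomic` with release/acquire ordering. That also constrains compiler reordering, not only the CPU's. `Foo`, `g_published` and the functions here are hypothetical stand-ins for the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the question's Foo: plain, non-atomic members
// initialised in the constructor.
struct Foo {
    int i;
    int complex_state[4];
    Foo() : i(0), complex_state{1, 2, 3, 4} {}
};

std::atomic<Foo*> g_published{nullptr};

// Reader: waits until the pointer appears. The acquire load pairs with the
// release store below, so every write made by Foo's constructor is visible.
int read_published() {
    Foo* f = nullptr;
    while ((f = g_published.load(std::memory_order_acquire)) == nullptr)
        std::this_thread::yield();
    return f->i + f->complex_state[0] + f->complex_state[3];
}

int publish_and_read() {
    int seen = -1;
    std::thread reader([&seen] { seen = read_published(); });

    Foo* f = new Foo();  // the constructor's writes are ordinary stores
    // Release store: orders the constructor's writes before the publication,
    // on x86 as well as on weakly ordered CPUs such as PowerPC.
    g_published.store(f, std::memory_order_release);

    reader.join();  // after join, reading `seen` is race-free
    assert(seen != -1);
    g_published.store(nullptr, std::memory_order_relaxed);
    delete f;
    return seen;
}
```

On x86 the release store compiles to a plain store. On PowerPC the compiler emits the barrier the question asks about.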
x86 and x64 CPUs may reorder writes past reads, but that is not relevant here. Just for better understanding: writes can be ordered after reads because writes to memory go through a write queue, and flushing that queue may take some time. On the other hand, the read cache is always consistent with the latest updates from other processors (that have gone through their own write queue).
This topic has been made unbelievably confusing for so many, but in the end there are only a few things an x86/x64 programmer has to worry about:
- First, the existence of the write queue (and one should not be worried about the read cache at all!).
- Second, concurrent writes and reads of the same variable from different threads when the variable's size cannot be accessed atomically, which may cause data tearing; in that case you need synchronisation mechanisms.
- And finally, concurrent updates to the same variable from multiple threads, for which we have interlocked operations, or again synchronisation mechanisms.
If you do:
Foo f;
// HERE: InitializeComplexClass() and "i" member init are guaranteed to be completed
passToOtherThread(&f);
/* From this point, you cannot guarantee the state/members
of 'f' since another thread can modify it */
If you're passing an instance pointer to another thread, you need to implement guards so that both threads can interact with the same instance safely. If you ONLY plan to use the instance on the other thread, you do not need to implement guards. However, do not pass a pointer to a stack object as in your example; pass a new instance instead:
passToOtherThread(new Foo());
And make sure to delete it when you are done with it.
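Here is a variant of the same advice, sketched under the assumption that C++11 is available. Instead of a raw `new` plus a manual `delete`, a `std::unique_ptr` can be moved into the thread, so the object is deleted automatically when the thread function returns. `Foo` and `run_with_owned_foo` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for the question's Foo.
struct Foo {
    int i = 0;
};

int run_with_owned_foo() {
    int result = -1;
    std::unique_ptr<Foo> foo(new Foo());
    foo->i = 7;  // fully initialised before the thread is created

    // The unique_ptr is moved into the thread; the Foo is deleted when
    // `owned` goes out of scope at the end of the thread function.
    std::thread worker([&result](std::unique_ptr<Foo> owned) {
        result = owned->i * 2;
    }, std::move(foo));
    assert(!foo);  // ownership has left this thread

    worker.join();  // after join, reading `result` is race-free
    return result;
}
```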
Qt containers are safe as read-only when used by multiple threads. But what about write access? Can I resize a QVector and use operator[] in different threads for writing? The size of the QVector will stay constant, each thread will write in different memory location (own index for each thread), so no same memory simultaneous access. The QVector will be a global variable.
The Qt documentation defines QVector's member functions as reentrant, which means that it's safe to call its methods from multiple threads as long as each thread uses its own QVector instance. This means that QVector isn't going to be thread-safe the way you intend to use it.
If you can guarantee that your writes to your QVector won't alter its length and won't overlap, you may find that you won't have problems. But if you know you're going to be writing to different areas of your vector, why not split the vector into subvectors and work on each subvector with a thread? This will allow you to make a guarantee that you won't have thread-related trouble. When your work is done, you can replace the vector as a single entity.
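As a point of comparison, here is a sketch that uses `std::vector` instead of `QVector`, since this is a substitution and Qt is not required. For `std::vector` (except `vector<bool>`), the C++ standard explicitly allows different threads to modify different elements concurrently, provided the vector is sized before the threads start and is never resized while they run. `fill_in_parallel` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Each thread writes only to its own slots [begin, end) of a vector that
// was sized before any thread started and is never resized afterwards.
std::vector<long> fill_in_parallel(std::size_t size, std::size_t n_threads) {
    assert(n_threads > 0);
    std::vector<long> data(size);  // resize up front, before the threads
    std::vector<std::thread> threads;
    const std::size_t chunk = (size + n_threads - 1) / n_threads;
    for (std::size_t t = 0; t < n_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(size, begin + chunk);
        if (begin >= end) break;
        threads.emplace_back([&data, begin, end] {
            for (std::size_t k = begin; k < end; ++k)
                data[k] = static_cast<long>(k * k);  // disjoint elements only
        });
    }
    for (std::thread& th : threads) th.join();  // join before reading
    return data;
}
```

Splitting the work into index ranges like this is the same idea as the subvector suggestion, without copying the data.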
When I'm iterating through an std::map, is there a possibility that by for example adding an element to the map in another thread, the objects in it will be removed causing the iteration to be corrupt? (As the iterator will be pointing to a non-existing variable as it's moved)
In theory when you add an element to an std::map, all the iterators in that map should stay valid. But the problem is that the operations are not atomic. If the OS suspends the inserting thread in the middle of the operation and gives control back to the iterating thread, the state of std::map might be invalid.
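The "in theory" part can be shown single-threaded. Inserting into a `std::map` does not invalidate existing iterators or references, as the hypothetical check below confirms. Doing the same insert concurrently with an iteration would still be a data race.

```cpp
#include <cassert>
#include <map>
#include <string>

// Single-threaded check: inserting into std::map leaves existing iterators
// and references valid. (Concurrent insert + iterate is still a data race.)
bool map_insert_keeps_iterators_valid() {
    std::map<int, std::string> m{{1, "one"}, {3, "three"}};
    std::map<int, std::string>::iterator it = m.find(3);
    const std::string* ref = &it->second;
    for (int k = 4; k < 100; ++k) m.emplace(k, "x");
    m.emplace(2, "two");
    assert(m.size() == 99);
    return it->first == 3 && &it->second == ref && *ref == "three";
}
```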
You need to synchronize access to the map via a mutex or something similar. Alternatively, you could use a concurrency-friendly collection from TBB or another similar library. TBB provides concurrent_unordered_map and concurrent_hash_map.
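Here is a minimal sketch of the mutex approach. `GuardedMap` and `demo_concurrent_use` are hypothetical names, not a library type. The essential point is that the whole iteration runs while the lock is held, so no insert can happen in the middle of it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical wrapper: every access to the map, including a full
// iteration, happens while holding the same mutex.
class GuardedMap {
public:
    void insert(int key, std::string value) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[key] = std::move(value);
    }
    // Iterate under the lock so no insert can run mid-iteration.
    std::size_t total_length() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t total = 0;
        for (const auto& kv : map_) total += kv.second.size();
        return total;
    }
private:
    mutable std::mutex mutex_;
    std::map<int, std::string> map_;
};

// Usage: one writer and one reader running concurrently.
std::size_t demo_concurrent_use() {
    GuardedMap m;
    std::thread writer([&m] { for (int i = 0; i < 100; ++i) m.insert(i, "ab"); });
    std::thread reader([&m] { for (int i = 0; i < 100; ++i) (void)m.total_length(); });
    writer.join();
    reader.join();
    const std::size_t total = m.total_length();
    assert(total == 200);  // 100 keys, 2 characters each
    return total;
}
```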
STL containers aren't thread safe. No guarantees at all. So you need to synchronize access to any standard container if they are used by different threads.
Yes--if another thread may be modifying the vector, you'll need to use something like a mutex to assure that only one thread has access to the vector at any given time.
With a map, the effects of a modification are much more limited -- rather than potentially moving the entire contents of a vector, a modification only affects an individual node in the map. Nonetheless, if one thread deletes a node just as another thread is trying to read that node, bad things will happen, so you still need a mutex to assure that only one thread is operating on the map at any given time.
I wrote a threaded Renderer for SFML which takes pointers to drawable objects and stores them in a vector to be drawn each frame. At first, adding objects to and removing objects from the vector would frequently cause segmentation faults (SIGSEGV). To try to combat this, I added objects that needed to be removed/added to a queue to be processed later (before drawing the frame). This seemed to fix it, but lately I have noticed that if I add many objects at once (or add/remove them fast enough) I get the same SIGSEGV.
Should I be using locks when I add/remove from the vector?
You need to understand the thread-safety guarantees that the C++ standard (and implementations of C++03 for possibly concurrent systems) give. The standard containers are thread-safe in the following sense:
It is OK to have multiple concurrent threads reading the same container.
If there is one thread modifying a container there shall be no concurrent threads reading or writing the same container.
Different containers are independent of each other.
Many people misunderstand thread-safety of container to mean that these rules are imposed by the container implementation: they are not! It is your responsibility to obey these rules.
The reason these rules aren't, and actually can't be, imposed by the containers is that the containers don't have an interface suitable for this. Consider, for example, the following trivial piece of code:
if (!c.empty()) {
auto value = c.back();
// do something with the read value
}
The container can control the access to the calls to empty() and back(). However, between these calls it necessarily needs to release any sort of synchronization facilities, i.e. by the time the thread tries to read c.back() the container may be empty again! There are essentially two ways to deal with this problem:
If a concurrent thread may be changing the container, you use external locking that spans the entire range of accesses which depend on each other in some form.
You change the interface of the containers to become monitors. However, the container interface isn't at all suitable to be changed in this direction because monitors essentially only support "fire and forget" style of interfaces.
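Here is a sketch of the first strategy applied to the `empty()`/`back()` snippet. `take_back()` and `drain_with_two_threads()` are hypothetical helpers, not standard functions. One mutex is held across `empty()`, `back()` and `pop_back()`, so another thread's modification cannot fall between the check and the read.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: the lock spans the whole interdependent sequence
// (check, read, remove), so the container cannot change in between.
bool take_back(std::vector<int>& c, std::mutex& m, int& out) {
    std::lock_guard<std::mutex> lock(m);
    if (!c.empty()) {
        out = c.back();
        c.pop_back();
        return true;
    }
    return false;
}

// Two consumers drain the same vector; each element is taken exactly once.
long drain_with_two_threads(int n) {
    std::vector<int> c;
    for (int i = 1; i <= n; ++i) c.push_back(i);
    std::mutex m;
    long sums[2] = {0, 0};  // one slot per thread, so no shared writes
    auto consume = [&](int slot) {
        int v;
        while (take_back(c, m, v)) sums[slot] += v;
    };
    std::thread t1(consume, 0), t2(consume, 1);
    t1.join();
    t2.join();
    assert(c.empty());
    return sums[0] + sums[1];
}
```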
Both strategies have their advantages, and the standard library containers clearly support the first style, i.e. they require external locking when used concurrently with at least one thread potentially modifying the container. They don't require any kind of locking (neither internal nor external) if only one thread ever uses them in the first place. This is actually the scenario they were designed for. The thread-safety guarantees given for them are in place to guarantee that they use no internal facilities which are not thread-safe, say a single per-object iterator object or a memory allocation facility shared by multiple threads without being thread-safe, etc.
To answer the original question: yes, you need to use external synchronization, e.g. in the form of mutex locks, if you modify the container in one thread and read it in another thread.
Should I be using locks when I add/remove from the vector?
Yes. If you're using the vector from two threads at the same time and it reallocates, the backing allocation may be swapped out and freed behind the other thread's back. The other thread would then be reading from or writing to freed memory, or to memory in use by another, unrelated allocation.
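The queueing idea from the question works only if the queue itself is locked. Here is a minimal sketch under that assumption. `Renderer` and `Drawable` are hypothetical stand-ins, not SFML types. Other threads append add/remove requests to a pending list under a mutex. The render thread swaps that list out under the same mutex and then applies the changes to `active_`, which only the render thread ever touches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an SFML drawable; not an SFML type.
struct Drawable {
    int id = 0;
};

class Renderer {
public:
    // Safe to call from any thread.
    void add(Drawable* d) { enqueue(d, true); }
    void remove(Drawable* d) { enqueue(d, false); }

    // Called on the render thread once per frame; returns how many objects
    // would be drawn. Callers must keep a Drawable alive until a frame
    // after its removal has been rendered.
    std::size_t render_frame() {
        std::vector<Change> changes;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            changes.swap(pending_);  // take all queued changes in one step
        }
        // Apply in the order they were queued; active_ is render-thread-only.
        for (const Change& c : changes) {
            if (c.add) {
                active_.push_back(c.d);
            } else {
                active_.erase(std::remove(active_.begin(), active_.end(), c.d),
                              active_.end());
            }
        }
        return active_.size();  // stand-in for drawing each object
    }

private:
    struct Change {
        Drawable* d;
        bool add;
    };
    void enqueue(Drawable* d, bool add) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(Change{d, add});
    }
    std::mutex mutex_;
    std::vector<Change> pending_;  // guarded by mutex_
    std::vector<Drawable*> active_;
};

// Usage: a producer thread adds and removes while this thread renders.
std::size_t demo_renderer() {
    std::vector<Drawable> objects(50);
    Renderer r;
    std::thread producer([&] {
        for (Drawable& o : objects) r.add(&o);
        r.remove(&objects[0]);
    });
    for (int frame = 0; frame < 10; ++frame) r.render_frame();
    producer.join();
    return r.render_frame();  // every queued change has now been applied
}
```

Swapping the pending list out keeps the critical section short. Drawing never holds the lock.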