Meaning of "slice" in the context of mutual exclusion problems - concurrency

What does "slice" mean in the following statement:
p0 tests lock (now, slice before actually setting lock)
The author is trying to show that mutual exclusion is not guaranteed by this program.
It's from http://www.mcs.csueastbay.edu/~billard/os/mutex.txt
Thanks.

From context, I believe it means to context switch (i.e. that there is a timeslice boundary at the current point). Thus,
p0 tests lock (now, slice before actually setting lock)
p1 tests lock (this makes p1 think it is still available)
means
p0 tests lock
timeslice ends -- context switch to p1
p1 tests lock
...
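To make that interleaving concrete, here is a minimal single-threaded C++ sketch that replays the schedule step by step instead of relying on a real scheduler. The names BrokenLock and replay_sliced_schedule are hypothetical, chosen only for illustration. The point is that testing and setting are two separate steps, so a time slice boundary between them lets both processes see the lock as free.

```cpp
#include <array>

// Hypothetical lock whose "acquire" is split into two separate steps,
// as in the p0/p1 trace: first test the lock, later set it.
struct BrokenLock {
    bool locked = false;
};

// Replays "p0 tests, slice, p1 tests, p1 sets, p0 sets" and reports
// which processes end up in the critical section.
std::array<bool, 2> replay_sliced_schedule() {
    BrokenLock lock;
    std::array<bool, 2> saw_free{};  // result of each process's test
    std::array<bool, 2> in_cs{};     // who entered the critical section

    saw_free[0] = !lock.locked;      // p0 tests lock: sees it free
    // -- time slice ends here: context switch to p1 --
    saw_free[1] = !lock.locked;      // p1 tests lock: still free
    if (saw_free[1]) { lock.locked = true; in_cs[1] = true; }  // p1 sets, enters
    // -- context switch back to p0, which acts on its stale test --
    if (saw_free[0]) { lock.locked = true; in_cs[0] = true; }  // p0 sets, enters

    return in_cs;  // both true: mutual exclusion is violated
}
```

An atomic test-and-set (for example std::atomic_flag::test_and_set) closes this window by making the test and the set one indivisible step.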

Related

C++: std::memory_order in std::atomic_flag::test_and_set to do some work only once by a set of threads

Could you please help me to understand what std::memory_order should be used in std::atomic_flag::test_and_set to do some work only once by a set of threads, and why? The work should be done by whatever thread gets to it first, and all other threads should just check as quickly as possible that someone is already doing the work and continue working on other tasks.
In my tests of the example below, any memory order works, but I think that it is just a coincidence. I suspect that Release-Acquire ordering is what I need, but, in my case, only one memory_order can be used in both threads (it is not the case that one thread can use memory_order_release and the other can use memory_order_acquire, since I do not know which thread will get to the work first).
#include <atomic>
#include <iostream>
#include <thread>
std::atomic_flag done = ATOMIC_FLAG_INIT;
const std::memory_order order = std::memory_order_seq_cst;
//const std::memory_order order = std::memory_order_acquire;
//const std::memory_order order = std::memory_order_relaxed;
void do_some_work_that_needs_to_be_done_only_once(void)
{ std::cout<<"Hello, my friend\n"; }
void run(void)
{
    if(not done.test_and_set(order))
        do_some_work_that_needs_to_be_done_only_once();
}
int main(void)
{
    std::thread a(run);
    std::thread b(run);
    a.join();
    b.join();
    // expected result:
    // * only one thread said hello
    // * all threads spent as little time as possible to check if any
    //   other thread said hello yet
    return 0;
}
Thank you very much for your help!
Following up on some things in the comments:
As has been discussed, there is a well-defined modification order M for done on any given run of the program. Every thread does one store to done, which means one entry in M. And by the nature of atomic read-modify-writes, the value returned by each thread's test_and_set is the value that immediately precedes its own store in the order M. That's promised in C++20 [atomics.order] p10, which is the critical clause for understanding atomic RMW in the C++ memory model.
Now there are a finite number of threads, each corresponding to one entry in M, which is a total order. Necessarily there is one such entry that precedes all the others. Call it m1. The test_and_set whose store is entry m1 in M must return the preceding value in M. That can only be the value 0 which initialized done. So the thread corresponding to m1 will see test_and_set return 0. Every other thread will see it return 1, because each of their modifications m2, ..., mN follows (in M) another modification, which must have been a test_and_set storing the value 1.
We may not be bothering to observe all of the total order M, but this program does determine which of its entries is first on this particular run. It's the unique one whose test_and_set returns 0. A thread that sees its test_and_set return 1 won't know whether it came 2nd or 8th or 96th in that order, but it does know that it wasn't first, and that's all that matters here.
Another way to think about it: suppose it were possible for two threads (tA, tB) both to load the value 0. Well, each one makes an entry in the modification order; call them mA and mB. M is a total order so one has to go before the other. And bearing in mind the all-important [atomics.order] p10, you will quickly find there is no legal way for you to fill out the rest of M.
All of this is promised by the standard without any reference to memory ordering, so it works even with std::memory_order_relaxed. The only effect of relaxed memory ordering is that we can't say much about how our load/store will become visible with respect to operations on other variables. That's irrelevant to the program at hand; it doesn't even have any other variables.
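A small experiment consistent with this argument: N threads race on one atomic_flag using std::memory_order_relaxed, and we count how many of them see test_and_set return false. count_winners is a hypothetical helper; a test run can only fail to find a bug, but by the modification-order argument above the count must be exactly 1 on every run.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Runs `n` threads that race on one atomic_flag with relaxed ordering and
// returns how many of them saw test_and_set return false (i.e. "won").
int count_winners(int n) {
    std::atomic_flag done = ATOMIC_FLAG_INIT;
    std::atomic<int> winners{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            // Relaxed is enough: the RMW still reads the value that
            // immediately precedes its own store in done's modification order.
            if (!done.test_and_set(std::memory_order_relaxed))
                winners.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    return winners.load();
}
```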
In the actual implementation, this means that an atomic RMW really has to exclusively own the variable for the duration of the operation. We must ensure that no other thread does a store to that variable, nor the load half of a read-modify-write, during that period. In a MESI-like coherent cache, this is done by temporarily locking the cache line while it is held in an exclusive-ownership (E or M) state; if the system makes it possible for us to lose that ownership (as on an LL/SC architecture), the operation aborts and starts again.
As to your comment about "a thread reading false from its own cache/buffer": the implementation mustn't allow that in an atomic RMW, not even with relaxed ordering. When you do an atomic RMW, you must read it while you hold the lock, and use that value in the RMW operation. You can't use some old value that happens to be in a buffer somewhere. Likewise, you have to complete the write while you still hold the lock; you can't stash it in a buffer and let it complete later.
relaxed is fine if you just need to determine the winner of the race to set the flag (see footnote 1), so one thread can start on the work and later threads can just continue on.
If the run_once work produces data that other threads need to be able to read, you'll need a release store after that, to let potential readers know that the work is finished, not just started. If it was instead just something like printing or writing to a file, and other threads don't care when that finishes, then yeah, you have no ordering requirements between threads beyond the modification order of done, which exists even with relaxed. An atomic RMW like test_and_set lets you determine which thread's modification was first.
BTW, you should check read-only before even trying to test-and-set, unless run() is only called very infrequently, like once per thread startup. For something like a static int foo = non_constant; local var, compilers use a guard variable that's loaded (with an acquire load) to see if init is already complete. If it's not, they branch to code that uses an atomic RMW to modify the guard variable, with one thread winning and the rest effectively waiting on a mutex for that thread to init.
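A sketch of the read-only check, using atomic<bool> so that a plain load is available; claimed and try_claim are hypothetical names. relaxed is enough here only under the assumption that no data needs to be published; otherwise, pair it with a release store after the work and acquire loads in the readers.

```cpp
#include <atomic>

std::atomic<bool> claimed{false};

// Returns true for exactly one caller. The cheap read-only load lets later
// callers skip the RMW (which needs exclusive ownership of the cache line)
// once the flag is already set. exchange(true) on an atomic<bool> does the
// same job as test_and_set on an atomic_flag.
bool try_claim() {
    if (claimed.load(std::memory_order_relaxed))
        return false;  // someone already claimed it
    return !claimed.exchange(true, std::memory_order_relaxed);
}
```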
You might want something like that if you have data that all threads should read. Or just use a static int foo = something_to_run_once(), or some type other than int, if you actually have some data to init.
Or perhaps use C++11 std::call_once to solve this problem for you.
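A minimal sketch of both standard alternatives (init_flag, call_once_runs, do_work_once and shared_config are hypothetical names). One behavioral difference from the question's requirements: while the work is in progress, other callers of std::call_once (and other threads reaching the static's initialization) block until it finishes instead of moving on, so this fits best when the work produces something those threads need.

```cpp
#include <mutex>
#include <string>

std::once_flag init_flag;
int call_once_runs = 0;  // counts executions, for illustration only

void do_work_once() {
    // std::call_once runs the callable exactly once. Callers that arrive while
    // it is running block until it completes; later callers return right away.
    std::call_once(init_flag, [] { ++call_once_runs; });
}

// A function-local static gets thread-safe initialization since C++11,
// implemented by the compiler with a guard variable as described above.
const std::string& shared_config() {
    static const std::string config = "initialized once";  // hypothetical data
    return config;
}
```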
On normal systems, atomic_flag has no advantage over atomic_bool. done.exchange(true) on a bool is equivalent to test_and_set on a flag. But atomic_bool is more flexible in terms of the operations it supports, like a plain read that isn't part of an RMW test-and-set.
C++20 does add a test() method for atomic_flag. ISO C++ guarantees that atomic_flag is lock-free, but in practice so is std::atomic<bool> on all real-world systems.
Footnote 1: why relaxed guarantees a single winner
The memory_order parameter only governs ordering wrt. operations on other variables by the same thread.
Does calling test_and_set by a thread force somehow synchronization of the flag with values written by other threads?
It's not a pure write, it's an atomic read-modify-write, so the result of the one that went first is guaranteed to be visible to the one that happens to be second. That's the whole point of test-and-set as a primitive building block for mutual exclusion.
If two TAS operations could both load the original value (false) and then both store true, they would not be atomic: they'd have overlapped with each other.
Two atomic RMWs on the same atomic object must happen in some order, the modification order of that object. (Because they're not read-only: an RMW includes a modification. But it also includes a read, so you can see what the value was immediately before the new value; that read is tied to the modification order, unlike a plain read.)
Every atomic object separately has a modification order that all threads can agree on; this is guaranteed by ISO C++. (With orders weaker than seq_cst, the ordering of operations on different objects can differ from source order, and it's not even guaranteed that all threads agree on which store happened first: the IRIW problem.)
Being an atomic RMW guarantees that exactly one test_and_set will return false in thread A or B. Same for fetch_add with multiple threads incrementing a counter: the increments have to happen in some order (i.e. serialized with each other), and whatever that order is becomes the modification-order of that atomic object.
Atomic RMWs have to work this way so as not to lose counts, i.e. to actually be atomic.
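To illustrate the counter case: in the sketch below (collect_tickets is a hypothetical helper), several threads do relaxed fetch_add(1) calls and record the value each call returned. Since every RMW returns the value immediately preceding its own modification, the returned values should be exactly 0 through N-1, with no duplicates and no gaps.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Each of `threads` threads performs `per_thread` relaxed fetch_add calls and
// records what each call returned. Returns all recorded values, sorted.
std::vector<int> collect_tickets(int threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::vector<int>> seen(threads);  // one vector per thread: no sharing
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i)
                seen[t].push_back(counter.fetch_add(1, std::memory_order_relaxed));
        });
    }
    for (auto& th : pool) th.join();
    std::vector<int> all;
    for (auto& v : seen) all.insert(all.end(), v.begin(), v.end());
    std::sort(all.begin(), all.end());
    return all;  // expected: 0, 1, ..., threads * per_thread - 1
}
```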

Acquire/release semantics with 4 threads

I am currently reading C++ Concurrency in Action by Anthony Williams. One of his listings shows this code, and he states that the assertion that z != 0 can fire.
#include <atomic>
#include <thread>
#include <assert.h>
std::atomic<bool> x,y;
std::atomic<int> z;
void write_x()
{
    x.store(true,std::memory_order_release);
}
void write_y()
{
    y.store(true,std::memory_order_release);
}
void read_x_then_y()
{
    while(!x.load(std::memory_order_acquire));
    if(y.load(std::memory_order_acquire))
        ++z;
}
void read_y_then_x()
{
    while(!y.load(std::memory_order_acquire));
    if(x.load(std::memory_order_acquire))
        ++z;
}
int main()
{
    x=false;
    y=false;
    z=0;
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join();
    b.join();
    c.join();
    d.join();
    assert(z.load()!=0);
}
So the different execution paths that I can think of are these:
1)
Thread a (x is now true)
Thread c (fails to increment z)
Thread b (y is now true)
Thread d (increments z) assertion cannot fire
2)
Thread b (y is now true)
Thread d (fails to increment z)
Thread a (x is now true)
Thread c (increments z) assertion cannot fire
3)
Thread a (x is true)
Thread b (y is true)
Thread c (z is incremented) assertion cannot fire
Thread d (z is incremented)
Could someone explain to me how this assertion can fire?
He shows this little graphic: [diagram not reproduced]
Shouldn't the store to y also sync with the load in read_x_then_y, and the store to x sync with the load in read_y_then_x? I'm very confused.
EDIT:
Thank you for your responses, I understand how atomics work and how to use Acquire/Release. I just don't get this specific example. I was trying to figure out IF the assertion fires, then what did each thread do? And why does the assertion never fire if we use sequential consistency.
The way I am reasoning about this is that if thread a (write_x) stores to x, then all the work it has done so far is synced with any other thread that reads x with acquire ordering. Once read_x_then_y sees this, it breaks out of the loop and reads y. Now, two things could happen. In one case, write_y has already written to y, meaning this release will sync with the if statement (load), so z is incremented and the assertion cannot fire. In the other case, write_y hasn't run yet, so the if condition fails and z isn't incremented; in this scenario, only x is true and y is still false. Once write_y runs, read_y_then_x breaks out of its loop, but by then both x and y are true, so z is incremented and the assertion does not fire. I can't think of any 'run' or memory ordering where z is never incremented. Can someone explain where my reasoning is flawed?
Also, I know the loop read will always be before the if statement read, because the acquire prevents this reordering.
You are thinking in terms of sequential consistency, the strongest (and default) memory order. If this memory order is used, all accesses to atomic variables constitute a total order, and the assertion indeed cannot be triggered.
However, in this program, a weaker memory order is used (release stores and acquire loads). This means, by definition that you cannot assume a total order of operations. In particular, you cannot assume that changes become visible to other threads in the same order. (Only a total order on each individual variable is guaranteed for any atomic memory order, including memory_order_relaxed.)
The stores to x and y occur on different threads, with no synchronization between them. The loads of x and y occur on different threads, with no synchronization between them. This means it is entirely allowed that thread c sees x && ! y and thread d sees y && ! x. (I'm just abbreviating the acquire-loads here, don't take this syntax to mean sequentially consistent loads.)
Bottom line: Once you use a weaker memory order than sequentially consistent, you can kiss your notion of a global state of all atomics, that is consistent between all threads, goodbye. Which is exactly why so many people recommend sticking with sequential consistency unless you need the performance (BTW, remember to measure if it's even faster!) and are certain of what you are doing. Also, get a second opinion.
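For comparison, here is a sketch of the same program with every atomic access left at the default std::memory_order_seq_cst, wrapped in a hypothetical run_seq_cst_once() helper so it can be run repeatedly. With a single total order over all seq_cst operations, whichever store comes first in that order, the reader spinning on the other variable must then see both stores, so z ends up 1 or 2, never 0.

```cpp
#include <atomic>
#include <thread>

// Same four threads as the book's listing, but every atomic access uses the
// default std::memory_order_seq_cst. Returns the final value of z.
int run_seq_cst_once() {
    std::atomic<bool> x{false}, y{false};
    std::atomic<int> z{0};
    std::thread a([&] { x.store(true); });  // write_x
    std::thread b([&] { y.store(true); });  // write_y
    std::thread c([&] {                     // read_x_then_y
        while (!x.load()) {}
        if (y.load()) ++z;
    });
    std::thread d([&] {                     // read_y_then_x
        while (!y.load()) {}
        if (x.load()) ++z;
    });
    a.join(); b.join(); c.join(); d.join();
    return z.load();  // 1 or 2 under seq_cst
}
```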
Now, whether you will get burned by this, is a different question. The standard simply allows a scenario where the assertion fails, based on the abstract machine that is used to describe the standard requirements. However, your compiler and/or CPU may not exploit this allowance for one reason or another. So it is possible that for a given compiler and CPU, you may never see that the assertion is triggered, in practice. Keep in mind that a compiler or CPU may always use a stricter memory order than the one you asked for, because this can never introduce violations of the minimum requirements from the standard. It may only cost you some performance – but that is not covered by the standard anyway.
UPDATE in response to comment: The standard defines no hard upper limit on how long it takes for one thread to see changes to an atomic by another thread. There is a recommendation to implementers that values should become visible eventually.
There are sequencing guarantees, but the ones pertinent to your example do not prevent the assertion from firing. The basic acquire-release guarantee is that if:
Thread e performs a release-store to an atomic variable x
Thread f performs an acquire-load from the same atomic variable
Then if the value read by f is the one that was stored by e, the store in e synchronizes-with the load in f. This means that any (atomic and non-atomic) store in e that was, in this thread, sequenced before the given store to x, is visible to any operation in f that is, in this thread, sequenced after the given load. [Note that there are no guarantees given regarding threads other than these two!]
So, there is no guarantee that f will read the value stored by e, as opposed to e.g. some older value of x. If it doesn't read the updated value, then also the load does not synchronize with the store, and there are no sequencing guarantees for any of the dependent operations mentioned above.
I liken atomics with lesser memory order than sequentially consistent to the Theory of Relativity, where there is no global notion of simultaneousness.
PS: That said, an atomic load cannot just read an arbitrary older value. For example, if one thread performs periodic increments (e.g. with release order) of an atomic<unsigned> variable, initialized to 0, and another thread periodically loads from this variable (e.g. with acquire order), then, except for eventual wrapping, the values seen by the latter thread must be monotonically non-decreasing. But this follows from the given sequencing rules: once the latter thread reads a 5, anything that happened before the increment from 4 to 5 is in the relative past of anything that follows the read of 5. In fact, a decrease other than wrapping is not even allowed for memory_order_relaxed, but this memory order does not make any promises about the relative sequencing (if any) of accesses to other variables.
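A sketch of the monotonicity point, using memory_order_relaxed on both sides to match the final remark (reads_never_decrease is a hypothetical helper). The reader stops once it has seen the final value, which relies on the writer's stores eventually becoming visible, as the standard recommends.

```cpp
#include <atomic>
#include <thread>

// One thread increments a counter; the calling thread repeatedly loads it.
// Even with relaxed ordering, a later load may not return a value that is
// earlier in the counter's modification order, so the sequence never decreases.
bool reads_never_decrease(unsigned increments) {
    std::atomic<unsigned> counter{0};
    std::thread writer([&] {
        for (unsigned i = 0; i < increments; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    });
    bool ok = true;
    unsigned last = 0;
    for (unsigned seen = 0; seen < increments; ) {  // stop at the final value
        seen = counter.load(std::memory_order_relaxed);
        if (seen < last) ok = false;  // would violate read-read coherence
        last = seen;
    }
    writer.join();
    return ok;
}
```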
The release-acquire synchronization has (at least) this guarantee: side-effects before a release on a memory location are visible after an acquire on this memory location.
There is no such guarantee if the memory location is not the same. More importantly, there's no total (think global) ordering guarantee.
Looking at the example, thread A makes thread C come out of its loop, and thread B makes thread D come out of its loop.
However, the way a release may "publish" to an acquire (or the way an acquire may "observe" a release) on the same memory location doesn't require total ordering. It's possible for thread C to observe A's release and thread D to observe B's release, and only somewhere in the future for C to observe B's release and for D to observe A's release.
The example has 4 threads because that's the minimum number of threads with which you can force such non-intuitive behavior. If any of the atomic operations were done in the same thread, there would be an ordering you couldn't violate.
For instance, if write_x and write_y happened on the same thread, it would require that whatever thread observed a change in y would have to observe a change in x.
Similarly, if read_x_then_y and read_y_then_x happened on the same thread, you would observe the changes to both x and y, at least in read_y_then_x.
Having write_x and read_x_then_y in the same thread would be pointless for the exercise, as it would become obvious it's not synchronizing correctly, as would be having write_x and read_y_then_x, which would always read the latest x.
EDIT:
The way, I am reasoning about this is that if thread a (write_x) stores to x then all the work it has done so far is synced with any other thread that reads x with acquire ordering.
(...) I can't think of any 'run' or memory ordering where z is never incremented. Can someone explain where my reasoning is flawed?
Also, I know The loop read will always be before the if statement read because the acquire prevents this reordering.
That's sequentially consistent order, which imposes a total order. That is, it imposes that write_x and write_y both be visible to all threads one after the other; either x then y or y then x, but the same order for all threads.
With release-acquire, there is no total order. The effects of a release are only guaranteed to be visible to a corresponding acquire on the same memory location. With release-acquire, the effects of write_x are guaranteed to be visible to whoever notices x has changed.
This noticing something changed is very important. If you don't notice a change, you're not synchronizing. As such, thread C is not synchronizing on y and thread D is not synchronizing on x.
Essentially, it's way easier to think of release-acquire as a change notification system that only works if you synchronize properly. If you don't synchronize, you may or may not observe side-effects.
Hardware architectures with a strong memory model and cache coherence (even across NUMA nodes), or languages/frameworks that synchronize in terms of a total order, make it difficult to think in these terms, because there it's practically impossible to observe this effect.
Let's walk through the parallel code:
void write_x()
{
    x.store(true,std::memory_order_release);
}
void write_y()
{
    y.store(true,std::memory_order_release);
}
There is nothing before these instructions (they are at the start of the parallel section; everything that happened before them also happened before the other threads), so they are not meaningfully releasing: they are effectively relaxed operations.
Let's walk through the parallel code again, noting that these two previous operations are not effective releases:
void read_x_then_y()
{
    while(!x.load(std::memory_order_acquire)); // acquire what state?
    if(y.load(std::memory_order_acquire))
        ++z;
}
void read_y_then_x()
{
    while(!y.load(std::memory_order_acquire));
    if(x.load(std::memory_order_acquire))
        ++z;
}
Note that all the loads refer to variables for which nothing is effectively released ever, so nothing is effectively acquired here: we re-acquire the visibility over the previous operations in main that are visible already.
So you see that all operations are effectively relaxed: they provide no visibility (over what was already visible). It's like doing an acquire fence just after an acquire fence, it's redundant. Nothing new is implied that wasn't already implied.
So now that everything is relaxed, all bets are off.
Another way to view that is to notice that an atomic load is not an RMW operation that leaves the value unchanged, as an RMW can be a release and a load cannot.
Just like all atomic stores are part of the modification order of an atomic variable even if the variable is effectively a constant (that is, a non-const variable whose value is always the same), an atomic RMW operation is somewhere in the modification order of an atomic variable, even if there was no change of value (and there cannot be a change of value, because the code always compares and copies the exact same bit pattern).
In the modification order you can have release semantics (even if there was no modification).
If you protect a variable with a mutex, you get release semantics (even if you just read the variable).
If you make all your loads (at least in functions that do more than one operation) release-modification-loads, with:
either a mutex protecting the atomic object (then drop the atomic as it's now redundant!)
or a RMW with acq_rel order,
the previous proof that all operations are effectively relaxed doesn't work anymore, and some atomic operation in at least one of the read_A_then_B functions will have to be ordered before some operation in the other, as they operate on the same objects. If they are in the modification order of a variable and you use acq_rel, then you have a happens-before relation between one of these (which one happens before which is, obviously, nondeterministic).
Either way, execution is now sequential, as all operations are effectively acquires and releases, that is, operative acquires and releases (even those that are effectively relaxed!).
If we change the two if statements to while loops, the code becomes correct and z is guaranteed to end up equal to 2.
void read_x_then_y()
{
    while(!x.load(std::memory_order_acquire));
    while(!y.load(std::memory_order_acquire));
    ++z;
}
void read_y_then_x()
{
    while(!y.load(std::memory_order_acquire));
    while(!x.load(std::memory_order_acquire));
    ++z;
}
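To check this, here is a hypothetical harness, run_while_version(), that runs both writers and these two readers with the same release/acquire orders as the book's listing. Each reader now waits until it has seen both stores, so both increment z. Termination relies on stores eventually becoming visible, which the standard recommends but does not bound.

```cpp
#include <atomic>
#include <thread>

// write_x / write_y plus the "while" variants of the two readers.
int run_while_version() {
    std::atomic<bool> x{false}, y{false};
    std::atomic<int> z{0};
    std::thread a([&] { x.store(true, std::memory_order_release); });
    std::thread b([&] { y.store(true, std::memory_order_release); });
    std::thread c([&] {  // read_x_then_y
        while (!x.load(std::memory_order_acquire)) {}
        while (!y.load(std::memory_order_acquire)) {}
        ++z;
    });
    std::thread d([&] {  // read_y_then_x
        while (!y.load(std::memory_order_acquire)) {}
        while (!x.load(std::memory_order_acquire)) {}
        ++z;
    });
    a.join(); b.join(); c.join(); d.join();
    return z.load();
}
```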

Why do condition variables sometimes erroneously wake up?

I've known for eons that the way you use a condition variable is
lock
while not task_done
    wait on condition variable
unlock
Because sometimes condition variables will spontaneously wake. But I've never understood why that's the case. In the past I've read it's expensive to make a condition variable that doesn't have that behavior, but nothing more than that.
So... why do you need to worry about falsely being woken up when waiting on a condition variable?
The main issue isn't that the condition variable wakes up erroneously; in the common case, a waiting thread wakes up because another thread signalled it. However, by the time the woken thread has been rescheduled for execution, some other thread may already have managed to nab the resource it was waiting for, so it is necessary to double-check. For example, suppose a group of threads x, y, z are waiting on some resource R that w was previously holding, and x, y, z, w communicate through a condition variable. When w is done with R, it signals x, y and z, so all three are taken off the wait queue and placed in the run queue to be scheduled for execution. Suppose x is scheduled first: it acquires R, and then it might be put to sleep. Then y is scheduled, but the resource R it was waiting on is still not available, so y has to go back to sleep. Then z wakes up and also finds that R is still in use, so z has to go back to sleep too, and so on.
If you have exactly two threads, and the condition variable is shared between just the two of them, there are sometimes situations where it is OK to skip that check. However, if you want your application to be dynamic and capable of scaling up to an arbitrary number of threads, it's good to be in the habit of doing that extra check (not to mention it's much simpler and less worrisome), as it is required in most situations.
Threads can wake up without being signaled. This is called a spurious wakeup. However, just precisely why they occur is a question that seems to be mired in superstition and uncertainty. Reasons I have seen include being a side effect of the way threading implementations work, or being intentionally added to force programmers to properly use loops instead of conditionals around wait.
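In C++ terms, the loop from the question looks like the sketch below (the names m, cv, task_done, wait_for_task and finish_task are illustrative). The while loop covers both cases discussed here: a genuine spurious wakeup, and a wakeup whose condition another thread already consumed. std::condition_variable::wait also has an overload taking a predicate that runs the same loop for you.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool task_done = false;  // the predicate, always read and written under m

void wait_for_task() {
    std::unique_lock<std::mutex> lock(m);
    // Re-check the predicate after every wakeup, real or spurious.
    while (!task_done)
        cv.wait(lock);
    // Equivalent shorthand: cv.wait(lock, [] { return task_done; });
}

void finish_task() {
    {
        std::lock_guard<std::mutex> lock(m);
        task_done = true;  // change the predicate under the mutex...
    }
    cv.notify_all();       // ...then wake the waiters
}
```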

Design options for a C++ thread-safe object cache

I'm in the process of writing a template library for data caching in C++ where concurrent reads and concurrent writes are allowed, but not for the same key. The pattern can be explained with the following environment:
A mutex for the cache write.
A mutex for each key in the cache.
This way, if a thread requests a key that is not present in the cache, it can start a locked calculation for that unique key. In the meantime, other threads can retrieve or calculate data for other keys, but a thread that tries to access the first key is blocked until that calculation finishes.
The main constraints are:
Never calculate the value for the same key twice at the same time.
Calculating the value for 2 different keys can be done concurrently.
Data-retrieval must not lock other threads from retrieve data from other keys.
My other constraints but already resolved are:
fixed (known at compile time) maximum cache size with MRU-based (most recently used) thrashing.
retrieval by reference (implies mutex-protected shared reference counting)
I'm not sure using 1 mutex for each key is the right way to implement this, but I didn't find any other substantially different way.
Do you know of other patterns to implement this, or do you find this a suitable solution? I don't like the idea of having about 100 mutexes (the cache size is around 100 keys).
You want to lock and you want to wait. Thus there shall be "conditions" somewhere (as pthread_cond_t on Unix-like systems).
I suggest the following:
There is a global mutex which is used only to add or remove keys in the map.
The map maps keys to values, where values are wrappers. Each wrapper contains a condition and potentially a value. The condition is signaled when the value is set.
When a thread wishes to obtain a value from the cache, it first acquires the global mutex. It then looks in the map:
If there is a wrapper for that key, and that wrapper contains a value, then the thread has its value and may release the global mutex.
If there is a wrapper for that key but no value yet, then this means that some other thread is currently busy computing the value. The thread then blocks on the condition, to be awakened by the other thread when it has finished.
If there is no wrapper, then the thread registers a new wrapper in the map, and then proceeds to computing the value. When the value is computed, it sets the value and signals the condition.
In pseudo code this looks like this:
mutex_t global_mutex
hashmap_t map
lock(global_mutex)
w = map.get(key)
if (w == NULL) {
    w = new Wrapper
    map.put(key, w)
    unlock(global_mutex)
    v = compute_value()
    lock(global_mutex)
    w.set(v)
    broadcast(w.cond)   // wake every thread waiting for this key
    unlock(global_mutex)
    return v
} else {
    v = w.get()
    while (v == NULL) {
        unlock-and-wait(global_mutex, w.cond)
        v = w.get()
    }
    unlock(global_mutex)
    return v
}
In pthreads terms, lock is pthread_mutex_lock(), unlock is pthread_mutex_unlock(), unlock-and-wait is pthread_cond_wait() and broadcast is pthread_cond_broadcast() (pthread_cond_signal() would wake only one of the possibly several threads waiting on the same wrapper). unlock-and-wait atomically releases the mutex and marks the thread as waiting on the condition; when the thread is awakened, the mutex is automatically reacquired.
This means that each wrapper will have to contain a condition. This embodies your various requirements:
No thread holds a mutex for a long period of time (either blocking or computing a value).
When a value is to be computed, only one thread does it, the other threads which wish to access the value just wait for it to be available.
Note that when a thread wishes to get a value and finds out that some other thread is already busy computing it, the thread ends up locking the global mutex twice: once in the beginning, and once when the value is available. A more complex solution, with one mutex per wrapper, may avoid the second locking, but unless contention is very high, I doubt that it is worth the effort.
About having many mutexes: mutexes are cheap. An uncontended mutex costs little more than the bytes of RAM used to store it (for example, pthread_mutex_t is 40 bytes on x86-64 Linux). Beware of Windows terminology: in Win32, what I call a mutex here is a "critical section"; what Win32 creates when CreateMutex() is called is something quite different, which is accessible from several distinct processes, and is much more expensive since it involves round trips to the kernel. Note that in Java, every single object instance contains a mutex, and Java developers do not seem to be overly grumpy on that subject.
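A minimal C++ sketch of the scheme above, assuming C++17 (for std::optional) and using std::mutex and std::condition_variable in place of the raw pthreads calls. ComputeOnceCache and its get() signature are hypothetical. It uses notify_all (the analogue of pthread_cond_broadcast) because several threads may be waiting on the same key. Eviction, and what happens if compute_value throws (waiters would currently wait forever), are left out.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// One global mutex guards the map; each wrapper carries a condition
// variable that is notified once its value has been set.
template <class Key, class Value>
class ComputeOnceCache {
    struct Wrapper {
        std::optional<Value> value;
        std::condition_variable cond;
    };
    std::mutex global_mutex_;
    std::unordered_map<Key, std::shared_ptr<Wrapper>> map_;

public:
    template <class Compute>
    Value get(const Key& key, Compute compute_value) {
        std::unique_lock<std::mutex> lock(global_mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) {
            auto w = std::make_shared<Wrapper>();
            map_.emplace(key, w);          // later callers will wait on w
            lock.unlock();                 // compute without holding the mutex
            Value v = compute_value(key);
            lock.lock();
            w->value = v;
            w->cond.notify_all();          // wake every waiter for this key
            return v;
        }
        std::shared_ptr<Wrapper> w = it->second;
        // The predicate form loops, so spurious wakeups are handled.
        w->cond.wait(lock, [&] { return w->value.has_value(); });
        return *w->value;
    }
};
```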
You could use a mutex pool instead of allocating one mutex per resource. As reads are requested, first check the slot in question. If it already has a mutex tagged to it, block on that mutex. If not, assign a mutex to that slot and signal it, taking the mutex out of the pool. Once the mutex is unsignaled, clear the slot and return the mutex to the pool.
One possibility that would be a much simpler solution would be to use a single reader/writer lock on the entire cache. Given that you know there is a maximum number of entries (and it is relatively small), it sounds like adding new keys to the cache is a "rare" event. The general logic would be:
acquire read lock
search for key
if found
    use the key
else
    release read lock
    acquire write lock
    search for key again (another thread may have added it in the meantime)
    if still not found
        add key
    endif
    release write lock
    // acquire the read lock again and use it (probably encapsulate in a method)
endif
Not knowing more about the usage patterns, I can't say for sure if this is a good solution. It is very simple, though, and if the usage is predominantly reads, then it is very inexpensive in terms of locking.
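A sketch of that outline with C++17's std::shared_mutex (RwCache and its get() are hypothetical names, with int keys and std::string values for simplicity). It repeats the search after taking the write lock, because another thread may have added the key in between. It also computes the value while holding the write lock, which blocks all readers for that time and is only acceptable if adding keys really is rare, and it returns a copy instead of re-acquiring the read lock.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

class RwCache {
    std::shared_mutex rw_;
    std::map<int, std::string> entries_;

public:
    template <class Compute>
    std::string get(int key, Compute compute_value) {
        {
            std::shared_lock<std::shared_mutex> read(rw_);  // many readers at once
            auto it = entries_.find(key);
            if (it != entries_.end())
                return it->second;
        }  // read lock released here
        std::unique_lock<std::shared_mutex> write(rw_);     // exclusive
        auto it = entries_.find(key);                       // search again
        if (it == entries_.end())
            it = entries_.emplace(key, compute_value(key)).first;
        return it->second;
    }
};
```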

When is it necessary to implement locking when using pthreads in C++?

After posting my solution to my own problem regarding memory issues, nusi suggested that my solution lacks locking.
The following pseudo code vaguely represents my solution in a very simple way.
#include <iostream>
#include <map>
#include <string>
struct MyType1 { std::string Test; }; // minimal stand-in for the real type
std::map<int, MyType1> myMap;
void firstFunctionRunFromThread1()
{
    MyType1 mt1;
    mt1.Test = "Test 1";
    myMap[0] = mt1;
}
void onlyFunctionRunFromThread2()
{
    MyType1 &mt1 = myMap[0];
    std::cout << mt1.Test << std::endl; // Prints "Test 1"
    mt1.Test = "Test 2";
}
void secondFunctionFromThread1()
{
    MyType1 mt1 = myMap[0];
    std::cout << mt1.Test << std::endl; // Prints "Test 2"
}
I'm not sure at all how to go about implementing locking, and I'm not even sure why I should do it (note the actual solution is much more complex). Could someone please explain how and why I should implement locking in this scenario?
At least one function (i.e. thread) modifies the map while another accesses it (and operator[] can itself insert an element). Therefore a read could overlap with a write or vice versa; in both cases the map will probably be corrupted. You need locks.
Actually, it's not even just locking that is the issue...
If you really want thread two to ALWAYS print "Test 1", then you need a condition variable.
The reason is that there is a race condition. Regardless of whether or not you create thread 1 before thread 2, it is possible that thread 2's code executes before thread 1's, and so the map will not be initialized properly. To ensure that no one reads from the map until it has been initialized, you need a condition variable that thread 1 signals once it has written the entry.
You also should use a lock with the map, as others have mentioned, because you want threads to access the map as though they are the only ones using it, and the map needs to be in a consistent state.
Here is a conceptual example to help you think about it:
Suppose you have a linked list that 2 threads are accessing. In thread 1, you ask to remove the first element from the list (at the head of the list). In thread 2, you try to read the second element of the list.
Suppose that the delete method is implemented in the following way: make a temporary pointer to the second element in the list, set the head's next pointer to null, then make the temporary pointer the new head...
What if the following sequence of events occurs:
- T1 sets the head's next pointer (which pointed to the second element) to null
- T2 tries to read the second element, BUT there is no second element because the head's next pointer was modified
- T1 completes removing the head and makes the second element the new head
The read by T2 failed because T1 didn't use a lock to make the delete from the linked list atomic!
That is a contrived example, and isn't necessarily how you would even implement the delete operation; however, it shows why locking is necessary: it is necessary so that operations performed on data are atomic. You do not want other threads using something that is in an inconsistent state.
Hope this helps.
In general, threads might be running on different CPUs/cores, with different memory caches. They might be running on the same core, with one interrupting ("pre-empting") the other. This has two consequences:
1) You have no way of knowing whether one thread will be interrupted by another in the middle of doing something. So in your example, there's no way to be sure that thread1 won't try to read the string value before thread2 has written it, or even that when thread1 reads it, it is in a "consistent state". If it is not in a consistent state, then using it might do anything.
2) When you write to memory in one thread, there is no telling if or when code running in another thread will see that change. The compiler may keep the value in a register or reorder the write, and the hardware may hold it in a store buffer or cache on the writer's core before it becomes visible to other cores. Part of the change might make it through, and part of it not.
In general, without locks (or other synchronization mechanisms such as semaphores) you have no way of saying whether something that happens in thread A will occur "before" or "after" something that happens in thread B. You also have no way of saying whether or when changes made in thread A will be "visible" in thread B.
Correct use of locking guarantees visibility: everything a thread wrote before unlocking a mutex is visible to the next thread that locks the same mutex, so code sees memory in the state you think it should see. It also allows you to control whether particular bits of code can run simultaneously and/or interrupt each other.
In this case, looking at your code above, the minimum locking you need is to have a synchronisation primitive which is released/posted by the second thread (the writer) after it has written the string, and acquired/waited on by the first thread (the reader) before using that string. This would then guarantee that the first thread sees any changes made by the second thread.
That's assuming the second thread isn't started until after firstFunctionRunFromThread1 has been called. If that might not be the case, then you need the same deal with thread1 writing and thread2 reading.
The simplest way to actually do this is to have a mutex which "protects" your data. You decide what data you're protecting, and any code which reads or writes the data must be holding the mutex while it does so. So first you lock, then read and/or write the data, then unlock. This ensures consistent state, but on its own it does not ensure that thread2 will get a chance to do anything at all in between thread1's two different functions.
Any kind of message-passing mechanism will also include the necessary memory barriers, so if you send a message from the writer thread to the reader thread, meaning "I've finished writing, you can read now", then that will be true.
There can be more efficient ways of doing certain things, if those prove too slow.
The whole idea is to prevent the program from going into an indeterminate/unsafe state due to multiple threads accessing the same resource(s) and/or updating/modifying the resource so that the subsequent state becomes undefined. Read up on Mutexes and Locking (with examples).
The instructions generated by compiling the two threads' code can be interleaved in any order. This can yield unpredictable and undesired results. For example, if thread1 runs both of its functions before thread2 is scheduled, your output may look like:
Test 1
Test 1
Worse yet, one thread may get pre-empted in the middle of an assignment, if assignment is not an atomic operation. Here, think of "atomic" as the smallest unit of work, one that cannot be split any further.
The way to create a logically atomic set of instructions (even if in reality they compile to multiple machine instructions) is to use a lock or mutex. Mutex stands for "mutual exclusion" because that's exactly what it provides: exclusive access to certain objects or critical sections of code.
One of the major challenges in dealing with multiprogramming is identifying critical sections. In this case, you have two critical sections: where you assign to myMap, and where you change myMap[0]. Since you don't want to read myMap before writing to it, that is also a critical section.
The simplest answer is: you have to lock whenever you access a shared resource that is not atomic. In your case myMap is a shared resource, so you have to lock all read and write operations on it.