access order of std::atomic bool variable - c++

I have a function that accesses (reads and writes) a std::atomic<bool> variable. I'm trying to understand the order of execution of instructions so as to decide whether an atomic will suffice or whether I'll have to use a mutex here. The function is given below -
// somewhere member var 'executing' is defined as std::atomic<bool>
int A::something(){
    int result = 0;
    // my intention is only one thread should enter next block
    // others should just return 0
    if(!executing){
        executing = true;
        ...
        // do some really long processing
        ...
        result = processed;
        executing = false;
    }
    return result;
}
I've read this page on cppreference which mentions -
Each instantiation and full specialization of the std::atomic template defines an atomic type. If one thread writes to an atomic object while another thread reads from it, the behavior is well-defined (see memory model for details on data races)
and on Memory model page the following is mentioned -
When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless either
both conflicting evaluations are atomic operations (see std::atomic)
one of the conflicting evaluations happens-before another (see std::memory_order)
If a data race occurs, the behavior of the program is undefined.
and slightly below that it reads -
When a thread reads a value from a memory location, it may see the initial value, the value written in the same thread, or the value written in another thread. See std::memory_order for details on the order in which writes made from threads become visible to other threads.
This is slightly confusing to me; which one of the above three statements actually applies here?
When I perform if(!executing){, is this operation atomic here? And more important - is it guaranteed that no other thread will enter that if body once one thread has, since the first one will set executing to true?
And if something's wrong with the mentioned code, how should I rewrite it so that it reflects the original intention?

If I understand correctly, you are trying to ensure that only one thread will ever execute a stretch of code at the same time. This is exactly what a mutex does. Since you mentioned that you don't want threads to block if the mutex is not available, you probably want to take a look at the try_lock() method of std::mutex. See the documentation of std::mutex.
Now to why your code does not work as intended: Simplifying a little, std::atomic guarantees that there will be no data races when accessing the variable concurrently, i.e. there is a well-defined read-write order.
This doesn't suffice for what you are trying to do. Just imagine the if branch:
if(!executing) {
    executing = true;
Remember, only the read-write operations on executing are atomic. This leaves at least the negation ! and the if itself unsynchronized. With two threads, the execution order could be like this:
Thread 1 reads executing (atomically), value is false
Thread 1 negates the value read from executing, value = true
Thread 1 evaluates the condition and enters the branch
Thread 2 reads executing (atomically), value is false
Thread 1 sets executing to true
Thread 2 negates the value it read earlier (false), getting true, even though executing is by now true
Thread 2 enters the branch...
Now both threads have entered the branch.
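For completeness, the test and the set can also be fused into a single atomic read-modify-write, so that no two threads can both observe false. A minimal sketch using std::atomic::exchange (an alternative to the mutex suggested below; processed stands in for the question's elided processing result):

int A::something(){
    int result = 0;
    // exchange() atomically stores true and returns the previous value,
    // so exactly one thread observes false and enters the block.
    if(!executing.exchange(true)){
        // ... do some really long processing ...
        result = processed;
        executing = false; // let another thread in again
    }
    return result;
}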
I would suggest something along these lines:
std::mutex myMutex;

int A::something(){
    int result = 0;
    // my intention is only one thread should enter next block
    // others should just return 0
    if(myMutex.try_lock()){
        ...
        // do some really long processing
        ...
        result = processed;
        myMutex.unlock();
    }
    return result;
}
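Note that if the long processing can throw, the bare unlock() call would be skipped and the mutex would stay locked forever. A sketch of an RAII variant using std::unique_lock with std::try_to_lock, assuming the same surrounding class (processed is again the question's placeholder):

#include <mutex>

std::mutex myMutex;

int A::something(){
    int result = 0;
    // try_to_lock attempts the lock without blocking; the lock is
    // released automatically when 'lock' goes out of scope.
    std::unique_lock<std::mutex> lock(myMutex, std::try_to_lock);
    if(lock.owns_lock()){
        // ... do some really long processing ...
        result = processed;
    }
    return result;
}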


Why my std::atomic<int> variable isn't thread-safe?

I don't know why my code isn't thread-safe, as it outputs some inconsistent results.
value 48
value 49
value 50
value 54
value 51
value 52
value 53
My understanding of an atomic object is that it prevents its intermediate state from being exposed, so it should solve the problem when one thread is reading it and another thread is writing it.
I used to think I could use std::atomic without a mutex to solve the multi-threading counter increment problem, but it doesn't seem to be the case.
I probably misunderstood what an atomic object is. Can someone explain?
#include <atomic>
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>

void
inc(std::atomic<int>& a)
{
    while (true) {
        a = a + 1;
        printf("value %d\n", a.load());
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    }
}

int
main()
{
    std::atomic<int> a(0);
    std::thread t1(inc, std::ref(a));
    std::thread t2(inc, std::ref(a));
    std::thread t3(inc, std::ref(a));
    std::thread t4(inc, std::ref(a));
    std::thread t5(inc, std::ref(a));
    std::thread t6(inc, std::ref(a));
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    t5.join();
    t6.join();
    return 0;
}
I used to think I could use std::atomic without a mutex to solve the multi-threading counter increment problem, but it doesn't seem to be the case.
You can, just not the way you have coded it. You have to think about where the atomic accesses occur. Consider this line of code …
a = a + 1;
1. First the value of a is fetched atomically. Let's say the value fetched is 50.
2. We add one to that value, getting 51.
3. Finally we atomically store that value into a using the = operator.
4. a ends up being 51.
5. We atomically load the value of a by calling a.load().
6. We print the value we just loaded by calling printf().
So far so good. But between steps 1 and 3 some other threads may have changed the value of a - for example to the value 54. So, when step 3 stores 51 into a it overwrites the value 54 giving you the output you see.
As #Sopel and #Shawn suggest in the comments, you can atomically increment the value in a using one of the appropriate functions (like fetch_add) or operator overloads (like operator++ or operator+=). See the std::atomic documentation for details.
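A minimal sketch of that fix, capturing the incremented value instead of reloading it from the shared variable (this would replace the body of the question's inc loop):

void inc(std::atomic<int>& a)
{
    // fetch_add performs the read-modify-write as one atomic operation,
    // so no increment can be lost; it returns the *previous* value.
    int result = a.fetch_add(1) + 1;
    printf("value %d\n", result); // print what we stored, not a fresh load
}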
Update
I added steps 5 and 6 above. Those steps can also lead to results that may not look correct.
Between the store at step 3 and the call to a.load() at step 5, other threads can modify the contents of a. After our thread stores 51 in a at step 3, it may find that a.load() returns some different number at step 5. Thus the thread that set a to the value 51 may not pass the value 51 to printf().
Another source of problems is that nothing coordinates the execution of steps 5. and 6. between two threads. So, for example, imagine two threads X and Y running on a single processor. One possible execution order might be this …
Thread X executes steps 1 through 5 above incrementing a from 50 to 51 and getting the value 51 back from a.load()
Thread Y executes steps 1 through 5 above incrementing a from 51 to 52 and getting the value 52 back from a.load()
Thread Y executes printf() sending 52 to the console
Thread X executes printf() sending 51 to the console
We've now printed 52 on the console, followed by 51.
Finally, there's another problem lurking at step 6. because printf() doesn't make any promises about what happens if two threads call printf() at the same time (at least I don't think it does).
On a multiprocessor system threads X and Y above might call printf() at exactly the same moment (or within a few ticks of exactly the same moment) on two different processors. We can't make any prediction about which printf() output will appear first on the console.
Note: The documentation for printf mentions a lock introduced in C++17, "… used to prevent data races when multiple threads read, write, position, or query the position of a stream." In the case of two threads simultaneously contending for that lock, we still can't tell which one will win.
Besides the increment of a being done non-atomically, the fetch of the value to display after the increment is non-atomic with respect to the increment. It is possible that one of the other threads increments a after the current thread has incremented it but before the fetch of the value to display. This would possibly result in the same value being shown twice, with the previous value skipped.
Another issue here is that the threads do not necessarily run in the order they were created. Thread 6 could execute its output before threads 3, 4, and 5, but after all four of those threads have incremented a. Since the thread that did the last increment displays its output earlier, you end up with the output not being sequential. This is more likely to happen on a system with fewer than six hardware threads available to run on.
Adding a small sleep between the various thread creations (e.g., sleep_for(10)) would make this less likely to occur, but would still not eliminate the possibility. The only sure way to keep the output ordered is to use some sort of exclusion (like a mutex) to ensure only one thread has access to the increment and output code, and to treat both the increment and the output as a single transaction that must run together before another thread tries to do an increment.
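A sketch of that exclusion approach, assuming a shared mutex (io_mutex is a name introduced here for illustration):

#include <atomic>
#include <cstdio>
#include <mutex>

std::mutex io_mutex; // guards the increment + print transaction

void inc(std::atomic<int>& a)
{
    std::lock_guard<std::mutex> guard(io_mutex);
    int result = ++a;             // increment...
    printf("value %d\n", result); // ...and print, as one transaction
}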
The other answers point out the non-atomic increment and various problems. I mostly want to point out some interesting practical details about exactly what we see when running this code on a real system. (x86-64 Arch Linux, gcc9.1 -O3, i7-6700k 4c8t Skylake).
It can be useful to understand why certain bugs or design choices lead to certain behaviours, for troubleshooting / debugging.
Use int tmp = ++a; to capture the fetch_add result in a local variable instead of reloading it from the shared variable. (And as 1202ProgramAlarm says, you might want to treat the whole increment and print as an atomic transaction if you insist on having your counts printed in order as well as being done properly.)
Or you might want to have each thread record the values it saw in a private data structure to be printed later, instead of also serializing threads with printf during the increments. (In practice all trying to increment the same atomic variable will serialize them waiting for access to the cache line; ++a will go in order so you can tell from the modification order which thread went in which order.)
Fun fact: a.store(1 + a.load(std::memory_order_relaxed), std::memory_order_release) is what you might do for a variable that is only written by one thread, but read by multiple threads. You don't need an atomic RMW because no other thread ever modifies it. You just need a thread-safe way to publish updates. (Or better, in a loop keep a local counter and just .store() it without loading from the shared variable.)
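A sketch of that single-writer pattern (the producer name and loop bound are illustrative):

#include <atomic>

// Only this thread ever writes 'counter', so no atomic RMW is needed;
// a release store is enough to publish each update to readers.
void producer(std::atomic<int>& counter)
{
    for (int local = 1; local <= 1000; ++local)
        counter.store(local, std::memory_order_release);
}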
If you used the default a = ... for a sequentially-consistent store, you might as well have done an atomic RMW on x86: one good way to compile such a store is with an atomic xchg, and the alternative, mov + mfence, is as expensive (or more).
What's interesting is that despite the massive problems with your code, no counts were lost or stepped on (no duplicate counts), merely printing reordered. So in practice the danger wasn't encountered because of other effects going on.
I tried it on my own machine and did lose some counts. But after removing the sleep, I just got reordering. (I copy-pasted about 1000 lines of the output into a file, and sort -u to uniquify the output didn't change the line count. It did move some late prints around though; presumably one thread got stalled for a while.) My testing didn't check for the possibility of lost counts, skipped by not saving the value being stored into a, and instead reloading it. I'm not sure there's a plausible way for that to happen here without multiple threads reading the same count, which would be detected.
Store + reload, even a seq-cst store which has to flush the store buffer before it can reload, is very fast compared to printf making a write() system call. (The format string includes a newline and I didn't redirect output to a file so stdout is line-buffered and can't just append the string to a buffer.)
(write() system calls on the same file descriptor are serializing in POSIX: write(2) is atomic. Also, printf(3) itself is thread-safe on GNU/Linux, as required by C++17, and probably by POSIX long before that.)
Stdio locking in printf happens to be enough serialization in almost all cases: the thread that just unlocked stdout and left printf can do the atomic increment and then try to take the stdout lock again.
The other threads were all blocked trying to take the lock on stdout. One (other?) thread can wake up and take the lock on stdout, but for its increment to race with the other thread it would have to enter and leave printf and load a the first time before that other thread commits its a = ... seq-cst store.
This does not mean it's actually safe
Just that testing this specific version of the program (at least on x86) doesn't easily reveal the lack of safety. Interrupts or scheduling variations, including competition from other things running on the same machine, certainly could block a thread at just the wrong time.
My desktop has 8 logical cores so there were enough for every thread to get one, not having to get descheduled. (Although normally that would tend to happen on I/O or when waiting on a lock anyway).
With the sleep there, it is not unlikely for multiple threads to wake up at nearly the same time and race with each other in practice on real x86 hardware. It's so long that timer granularity becomes a factor, I think. Or something like that.
Redirecting output to a file
With stdout open on a non-TTY file, it's full-buffered instead of line-buffered, and doesn't always make a system call while holding the stdout lock.
(I got a 17MiB file in /tmp from hitting control-C a fraction of a second after running ./a.out > output.)
This makes it fast enough for threads to actually race with each other in practice, showing the expected bugs of duplicate values. (A thread reads a but loses ownership of the cache line before it stores (tmp)+1, resulting in two or more threads doing the same increment. And/or multiple threads reading the same value when they reload a after flushing their store buffer.)
1228589 unique lines (sort -u | wc) out of 1291035 total lines, so ~5% of the output lines were duplicates.
I didn't check if it was usually one value duplicated multiple times or if it was usually only one duplicate. Or how far backward the value ever jumped. If a thread happened to be stalled by an interrupt handler after loading but before storing val+1, it could be quite far. Or if it actually slept or blocked for some reason, it could rewind indefinitely far.

Correctness of 'Concurrency in Action' atomic operation example

I've been studying 'Concurrency in Action' for some time and I have a problem with understanding the following example of code (Listing 5.2):
#include <vector>
#include <atomic>
#include <iostream>
#include <thread>
#include <chrono>

std::vector<int> data;
std::atomic<bool> data_ready(false);

void reader_thread()
{
    while(!data_ready.load())
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::cout << "The answer=" << data[0] << "\n";
}

void writer_thread()
{
    data.push_back(42); // write of data
    data_ready = true;  // write to data_ready flag
}
The book explains:
(...) The write of the data happens-before the write to the data_ready
flag (...)
My concern is that this sentence does not cover out-of-order execution. From my understanding, out-of-order execution may happen when at least two instructions do not have dependent operands. Taking this into account:
data_ready=true
does not need anything from
data.push_back(42)
to be executed. As a result of that it is not guaranteed that:
The write of the data happens-before the write to the data_ready flag
Is my understanding correct, or is there something about out-of-order execution that I don't understand which is causing me to misread the given example?
EDIT
Thank you for the answers; they were helpful. My misunderstanding was a result of not knowing that atomic types not only prevent a variable from being partially changed, but also act as a memory barrier.
For example, the following code may be reordered in many combinations by either the compiler or the processor:
d = 0;
b = 5;
a = 10;
c = 1;
Resulting in the following order (one of many possibilities):
b = 5;
a = 10;
c = 1;
d = 0;
This is not a problem in single-threaded code, since none of the expressions have operands that depend on the others, but in a multithreaded application it may result in undefined behaviour. Take, for example, the following code (initial values: x=0 and y=0):
Thread 1:          Thread 2:
x=10;              while(y!=15);
y=15;              assert(x==10);
Without reordering of the code by the compiler or reordering of execution by the processor, we could say: "Since the assignment y=15 always happens after the assignment x=10, and the assert happens after the while loop, the assert will never fail." But it's not true. The real execution order may be as below (one of many possible combinations):
Thread 1:          Thread 2:
x=10; (4)          while(y!=15); (3)
y=15; (1)          assert(x==10); (2)
By default an atomic variable ensures sequentially consistent ordering. If y in the example above were atomic with the default memory_order_seq_cst parameter, the following statements would be true:
- whatever happens before the write y=15 in thread 1 (i.e. x=10) is also visible to thread 2 as having happened before it
- whatever happens after while(y!=15) in thread 2 is also visible to thread 1 as happening after it
As a result, the assert will never fail.
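A sketch of the fixed example, with y made atomic (seq_cst by default; thread1/thread2 are illustrative names):

#include <atomic>
#include <cassert>

int x = 0;
std::atomic<int> y{0};

void thread1()
{
    x = 10; // ordinary write, published by the atomic store below
    y = 15; // seq_cst atomic store
}

void thread2()
{
    while (y != 15)
        ;            // seq_cst atomic load
    assert(x == 10); // guaranteed: the store to y synchronizes-with the load
}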
Some sources that may help with understanding:
Memory model synchronization modes - GCC
CppCon 2015: Michael Wong “C++11/14/17 atomics and memory
model..."
Memory barriers in C
I understand your concerns, but the code from the book is fine. Every operation on atomics is by default memory_order_seq_cst, which means that everything that happened before the write in one thread happens before the read in the others. You can imagine atomic operations with this std::memory_order like this:
std::atomic<bool> a;
//equivalent of a = true
a.assign_and_make_changes_from_thread_visible(true);
//equivalent of a.load()
a.get_value_and_changes_from_threads();
From Effective Modern C++, Item 40: "std::atomics impose restrictions on how code can be reordered, and one such restriction is that no code that, in the source code, precedes a write of a std::atomic variable may take place afterwards." Note that this holds when using sequential consistency, which is a fair assumption here.
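For illustration, the same happens-before edge can also be spelled out with explicit (weaker) orderings; a sketch, not the book's code:

#include <atomic>
#include <vector>

std::vector<int> data;
std::atomic<bool> data_ready(false);

void writer_thread()
{
    data.push_back(42);                                // plain write
    data_ready.store(true, std::memory_order_release); // publish it
}

void reader_thread()
{
    while (!data_ready.load(std::memory_order_acquire))
        ; // spin until the release store is seen
    // The release/acquire pair guarantees push_back happened-before this read.
    int answer = data[0];
    (void)answer;
}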

C++ : std::atomic<bool> and volatile bool

I'm just reading the C++ concurrency in action book by Anthony Williams.
There is this classic example with two threads, one producing data, the other one consuming the data, and A.W. wrote that code pretty clearly:
#include <vector>
#include <atomic>
#include <iostream>
#include <thread>
#include <chrono>

std::vector<int> data;
std::atomic<bool> data_ready(false);

void reader_thread()
{
    while(!data_ready.load())
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::cout << "The answer=" << data[0] << "\n";
}

void writer_thread()
{
    data.push_back(42);
    data_ready = true;
}
And I really don't understand why this code differs from one where I'd use a classic volatile bool instead of the atomic one.
If someone could open my mind on the subject, I'd be grateful.
Thanks.
A "classic" bool, as you put it, would not work reliably (if at all). One reason for this is that the compiler could (and most likely does, at least with optimizations enabled) load data_ready only once from memory, because there is no indication that it ever changes in the context of reader_thread.
You could work around this problem by using volatile bool to enforce loading it every time (which would probably seem to work) but this would still be undefined behavior regarding the C++ standard because the access to the variable is neither synchronized nor atomic.
You could enforce synchronization using the locking facilities from the mutex header, but this would introduce (in your example) unnecessary overhead (hence std::atomic).
The problem with volatile is that it only guarantees that instructions are not omitted and that instruction ordering is preserved. volatile does not guarantee a memory barrier to enforce cache coherence. What this means is that writer_thread on processor A can write the value to its cache (and maybe even to main memory) without reader_thread on processor B seeing it, because the cache of processor B is not consistent with the cache of processor A. For a more thorough explanation, see memory barrier and cache coherence on Wikipedia.
There can be additional problems with more complex expressions than x = y (i.e. x += y) that would require synchronization through a lock (or in this simple case an atomic +=) to ensure the value of x does not change during processing.
x += y for example is actually:
read x
compute x + y
write result back to x
If a context switch to another thread occurs during the computation, this can result in something like this (2 threads, both doing x += 2; assuming x = 0):

Thread A                    Thread B
------------------------    ------------------------
read x (0)
compute x (0) + 2
<context switch>
                            read x (0)
                            compute x (0) + 2
                            write x (2)
<context switch>
write x (2)
Now x = 2 even though there were two += 2 computations. This effect is known as a lost update (distinct from tearing, where a single read or write is split into parts).
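With std::atomic<int>, the same x += 2 is a single atomic read-modify-write, so the interleaving above cannot lose an update; a minimal sketch:

#include <atomic>

std::atomic<int> x{0};

void add_two()
{
    x += 2; // one atomic fetch_add, not separate read/compute/write steps
}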
The big difference is that this code is correct, while the version with bool instead of atomic<bool> has undefined behavior.
These two lines of code create a race condition (formally, a conflict) because they read from and write to the same variable:
Reader
while (!data_ready)
And writer
data_ready = true;
And a race condition on a normal variable causes undefined behavior, according to the C++11 memory model.
The rules are found in section 1.10 of the Standard, the most relevant being:
Two actions are potentially concurrent if
they are performed by different threads, or
they are unsequenced, and at least one is performed by a signal handler.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
You can see that whether the variable is atomic<bool> makes a very big difference to this rule.
Ben Voigt's answer is completely correct, still a little theoretical, and as I've been asked by a colleague "what does this mean for me", I decided to try my luck with a little more practical answer.
With your sample, the "simplest" optimization problem that could occur is the following:
According to the Standard, an optimized execution order may not change the functionality of a program. Problem is, this is only true for single threaded programs, or single threads in multithreaded programs.
So, for writer_thread and a (volatile) bool
data.push_back(42);
data_ready = true;
and
data_ready = true;
data.push_back(42);
are equivalent.
The result is, that
std::cout << "The answer=" << data[0] << "\n";
can be executed without having pushed any value into data.
An atomic bool does prevent this kind of optimization, since by definition it may not be reordered. There are memory-order arguments for atomic operations which allow statements to be moved in front of the operation but not behind it, and vice versa, but using those requires really advanced knowledge of your program's structure and of the problems weaker orderings can cause...

C++11 atomic: why does this code work?

Let's take this struct:
#include <atomic>

struct entry {
    std::atomic<bool> valid;
    std::atomic_flag writing;
    char payload[128];
};
Two threads A and B concurrently access this struct in this way (let e be an instance of entry):
if (e.valid) {
    // do something with e.payload...
} else {
    while (e.writing.test_and_set(std::memory_order_acquire));
    if (!e.valid) {
        // write e.payload one byte at a time
        // (the payload written by A may be different from the payload written by B)
        e.valid = true;
        e.writing.clear(std::memory_order_release);
    }
}
I guess that this code is correct and does not present issues, but I want to understand why it works.
Quoting the C++ standard (29.3.13):
Implementations should make atomic stores visible to atomic loads
within a reasonable amount of time.
Now, bearing this in mind, imagine that both thread A and B enter the else block. Is this interleave possible?
Both A and B enter the else branch, because valid is false
A sets the writing flag
B starts to spin lock on the writing flag
A reads the valid flag (which is false) and enters the if block
A writes the payload
A writes true on the valid flag; obviously, if A reads valid again, it would read true
A clears the writing flag
B sets the writing flag
B reads a stale value of the valid flag (false) and enters the if block
B writes its payload
B writes true on the valid flag
B clears the writing flag
I hope this is not possible, but when it comes to actually answering the question "why is it not possible?", I'm not sure of the answer. Here is my idea.
Quoting from the standard again (29.3.12):
Atomic read-modify-write operations shall always read the last value
(in the modification order) written before the write associated with
the read-modify-write operation.
atomic_flag::test_and_set() is an atomic read-modify-write operation, as stated in 29.7.5.
Since atomic_flag::test_and_set() always reads a "fresh value", and I'm calling it with the std::memory_order_acquire memory ordering, then I cannot read a stale value of the valid flag, because I must see all the side-effects caused by A before the atomic_flag::clear() call (which uses std::memory_order_release).
Am I correct?
Clarification. My whole reasoning (wrong or correct) relies on 29.3.12. For what I understood so far, if we ignore the atomic_flag, reading stale data from valid is possible even if it's atomic. atomic doesn't seem to mean "always immediately visible" to every thread. The maximum guarantee you can ask for is a consistent order in the values you read, but you can still read stale data before getting the fresh one. Fortunately, atomic_flag::test_and_set() and every exchange operation have this crucial feature: they always read fresh data. So, only if you acquire/release on the writing flag (not only on valid), then you get the expected behavior. Do you see my point (correct or not)?
EDIT: my original question included the following few lines that gained too much attention if compared to the core of the question. I leave them for consistency with the answers that have been already given, but please ignore them if you are reading the question right now.
Is there any point in valid being an atomic<bool> and
not a plain bool? Moreover, if it should be an atomic<bool>,
what is its 'minimum' memory ordering constraint that will not present
issues?
Inside the else branch, valid should be protected by the acquire/release semantics imposed by the operations on writing. However, this does not obviate the need to make valid atomic:
You forgot to include the first line (if (e.valid)) in your analysis. If valid were a plain bool instead of atomic<bool>, this access would be completely unprotected. Therefore you could have the situation where a change of valid becomes visible to other threads before the payload is completely written/visible. This means that a thread B could evaluate e.valid to true and enter the do something with e.payload branch while the payload isn't completely written yet.
Other than that, your analysis seems somewhat reasonable but not entirely correct to me. The thing to remember with memory ordering is that acquire and release semantics pair up: everything written before a release operation can safely be read after an acquire operation on the same variable reads the modified value. With that in mind, the release semantics on writing.clear(...) ensure that the write to valid must be visible when the loop on writing.test_and_set(...) exits, since the latter reads the change to writing (the write done in writing.clear(...)) with acquire semantics and doesn't exit before that change is visible.
Regarding §29.3.12: it is relevant to the correctness of your code, but unrelated to reading a stale valid flag. You can't set the flag before the clear, so acquire-release semantics will ensure correctness there. §29.3.12 protects you from the following scenario:
Both A and B enter the else branch, because valid is false
A sets the writing flag
B sees a stale value for writing and also sets it
Both A and B read the valid flag (which is false), enter the if block and write the payload creating a race condition
Edit: For the minimal ordering constraints: acquire for the loads and release for the stores should do the job; however, depending on your target hardware, you might as well stay with sequential consistency. For the difference between those semantics, look here.
Section 29.3.12 has nothing to do with why this code is correct or incorrect. The section you want (in the draft version of the standard available online) is Section 1.10: "Multi-threaded executions and data races." Section 1.10 defines a happens-before relation on atomic operations, and on non-atomic operations with respect to atomic operations.
Section 1.10 says that if there are two non-atomic operations where you can not determine the happens-before relationship then you have a data-race. It further declares (Paragraph 21) that any program with a data-race has undefined behavior.
If e.valid is not atomic then you have a data race between the first line of code and the line e.valid=true. So all of your reasoning about the behavior in the else clause is incorrect (the program has no defined behavior so there is nothing to reason about.)
On the other hand, if all of your accesses to e.valid were protected by atomic operations on e.writing (as if the else clause were your whole program), then your reasoning would be correct: event 9 in your list could not happen. But the reason is not Section 29.3.12; it is again Section 1.10, which says that your non-atomic operations will appear to be sequentially consistent if there are no data races.
The pattern you are using is called double-checked locking. Before C++11 it was impossible to implement double-checked locking portably. In C++11 you can make double-checked locking work correctly and portably. The way you do it is by declaring valid to be atomic.
If valid is not atomic then the initial read of e.valid on the first line conflicts with the assignment to e.valid.
There is no guarantee both threads have already done that read before one of them gets the spinlock, i.e. steps 1 and 6 are not ordered.
The store to e.valid needs to be a release and the load in the condition needs to be an acquire. Otherwise, the compiler/processor are free to reorder the setting of e.valid above the writing of the payload.
There is an open-source tool, CDSChecker, for verifying code like this against the C/C++11 memory model.
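Pulling the answers together, a sketch of the pattern with the suggested orderings; access is a name introduced here, the clear() is moved after the inner if so the flag is always released, and the payload writes are elided as in the question:

#include <atomic>

struct entry {
    std::atomic<bool> valid{false};
    std::atomic_flag writing = ATOMIC_FLAG_INIT;
    char payload[128];
};

void access(entry& e)
{
    if (e.valid.load(std::memory_order_acquire)) {
        // read e.payload: the acquire pairs with the release store below
    } else {
        while (e.writing.test_and_set(std::memory_order_acquire))
            ; // spin until we own the writing flag
        if (!e.valid.load(std::memory_order_acquire)) {
            // write e.payload ...
            e.valid.store(true, std::memory_order_release);
        }
        e.writing.clear(std::memory_order_release); // always release the flag
    }
}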

Is read/write of a bool value guaranteed to be one instruction in C/C++

I can't imagine an architecture would design an access to its smallest data type in multiple instructions, but maybe there is some problem with pipelining that I am not considering?
Whether a bool object is read and written in a single operation is not guaranteed by the C++ standard, because that would put constraints on the underlying hardware, which C and C++ try to minimize.
However, note that in multi-threading scenarios the question whether reading/writing a data type is atomic is only one half of the problem. The other half is whether changes to some address are reflected in all caches (i.e. those local to different cores), and whether they are reflected across all threads in the same order. For that you will need memory barriers.
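For what it's worth, C++11's std::atomic<bool> gives you both halves: atomic access and ordering guarantees. Whether the implementation can do it with lock-free hardware operations can be queried; a minimal sketch:

#include <atomic>

std::atomic<bool> flag{false};

// true if this type uses lock-free hardware operations
// rather than an internal lock
bool lock_free = flag.is_lock_free();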
No it is not guaranteed.
C89 and C99 have no means to express atomicity. C11 has atomic objects.
Compilers usually provide extensions for atomicity, e.g. for gcc:
http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
Better still is to use the primitives of the pthreads library.
For example, suppose you have two threads which use the same data.
Your thread 1 must be as in the following. Let's name it "i":
while (true) {
    flag[i] = TRUE;
    turn = j;
    while ( flag[j] && turn == j);
    CRITICAL SECTION
    flag[i] = FALSE;
    REMAINDER SECTION
}
Your thread 2 must be as in the following. Let's name it "j":
while (true) {
    flag[j] = TRUE;
    turn = i;
    while ( flag[i] && turn == i);
    CRITICAL SECTION
    flag[j] = FALSE;
    REMAINDER SECTION
}
The flag variables control entrance to the critical section for each thread.
The code runs like this:
1- Each thread signals that it wants to enter the critical section by setting its flag to true.
2- Each thread then gives the pass to the other by setting turn; for example, thread "i" gives its pass to thread "j". The turn variable stores which thread has the pass to the critical section.
3- Since the turn variable can hold only one value, it is guaranteed that only one thread can enter the critical section at a time; no other thread can enter while one is inside.
4- Thread "j" sees that the pass is its own and wants to enter, so it enters the critical section while thread "i" waits.
5- After thread "j" has run, it sets its flag variable to false, indicating that it no longer wants to enter the critical section.
6- Thread "i" was held up at the beginning of its inner while loop.
7- As soon as thread "j" gives the turn to the other thread, thread "i" enters the critical section.
This code satisfies the mutual exclusion, progress, and bounded waiting conditions.
This code can run in any environment that supports threading and can be used with any C-based language.
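As the earlier answers in this collection point out, concurrent access to plain shared variables is a data race under the C++11 memory model, so a C++ rendering of this algorithm (Peterson's) would make the flags and turn atomic; the default seq_cst ordering is what keeps the spin test sound. A sketch for two threads (0 and 1), with lock/unlock as illustrative names:

#include <atomic>

std::atomic<bool> flag[2] = {false, false};
std::atomic<int> turn{0};

void lock(int i)          // i is this thread's index, 0 or 1
{
    int j = 1 - i;        // the other thread
    flag[i] = true;       // I want to enter (seq_cst by default)
    turn = j;             // but let the other thread go first
    while (flag[j] && turn == j)
        ;                 // spin while the other thread has priority
}

void unlock(int i)
{
    flag[i] = false;      // no longer interested
}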