Why does this program terminate on my system but not on playground? - concurrency

Consider this program:
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    x := 0
    go func() {
        time.Sleep(500 * time.Millisecond)
        x = 1
    }()
    for x == 0 {
        runtime.Gosched()
    }
    fmt.Println("it works!")
}
Why does it terminate locally but not on the Playground? Does the termination of my program rely on undefined behavior?

The Go Playground uses a special implementation of time.Sleep designed to prevent individual programs from monopolising the back-end resources of the website.
As described in this article about how the Playground is implemented, goroutines that call time.Sleep() are put to sleep. The Playground back end waits until all other goroutines are blocked (what would otherwise be a deadlock), and then wakes the goroutine with the shortest timeout.
In your program there are two goroutines: the main one, and the one that calls time.Sleep. Since the main goroutine never blocks (runtime.Gosched yields the processor but does not block), the time.Sleep call never returns. The program runs until it exceeds the CPU time allocated to it and is then terminated.

The Go Memory Model does not guarantee that the value written to x in the goroutine will ever be observed by the main goroutine. A similarly erroneous program is given as an example in the section on goroutine destruction. The Go Memory Model also specifically calls out busy waiting without synchronization as an incorrect idiom in that section.
You need to do some kind of synchronization in the goroutine in order to guarantee that x=1 happens before one of the iterations of the for loop in main.
Here is a version of the program that is guaranteed to work as intended.
http://play.golang.org/p/s3t5_-Q73W
package main

import (
    "fmt"
    "time"
)

func main() {
    c := make(chan bool)
    x := 0
    go func() {
        time.Sleep(500 * time.Millisecond)
        x = 1
        close(c) // 1
    }()
    for x == 0 {
        <-c // 2
    }
    fmt.Println("it works!")
}
The Go Memory Model guarantees that the line marked with // 1 happens before the line marked with // 2. As a result, the for loop is guaranteed to terminate before its second iteration.
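If you prefer the busy-wait shape, a variant using sync/atomic also works (a minimal sketch, assuming Go 1.19+ for the atomic.Int32 type): the atomic store synchronizes with the atomic load, so the loop is data-race-free and will observe the write. Note that on the Playground this version would still spin until it is killed, for the time.Sleep reason described above.

package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
    "time"
)

func main() {
    var x atomic.Int32
    go func() {
        time.Sleep(500 * time.Millisecond)
        x.Store(1) // synchronizes with the Load below
    }()
    for x.Load() == 0 {
        runtime.Gosched() // yield while waiting
    }
    fmt.Println("it works!")
}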

That code doesn't offer many guarantees. It relies almost entirely on implementation details around what is effectively undefined behavior.
In most multi-threaded systems, there is no guarantee that a change made in one thread without a memory barrier will be seen in another. Here you have a goroutine, possibly running on another processor altogether, writing a value to a variable that nobody is ever guaranteed to read.
The for x == 0 { loop could quite legally be compiled as for { (an infinite loop), since the compiler is never required to notice changes made to that variable by another goroutine.
The race detector will also probably report this issue. You should really not expect this to work. If what you want is the behavior of a sync.WaitGroup, just use one; it coordinates across threads properly.
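For instance, a minimal sync.WaitGroup sketch of the same program might look like this (illustrative; Wait returns only after the goroutine's Done, which also makes the write to x visible):

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    x := 0
    wg.Add(1)
    go func() {
        defer wg.Done()
        time.Sleep(500 * time.Millisecond)
        x = 1
    }()
    wg.Wait() // happens after the goroutine's Done, so x = 1 is visible here
    fmt.Println("it works!", x)
}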

Related

C++ atomics reading stale value

I'm reading the C++ Concurrency in Action book and I'm having trouble understanding the visibility of writes to atomic variables.
Let's say we have a
std::atomic<int> x = 0;
and we read/write with sequentially consistent ordering:
1. ++x;
// <-- thread 2
2. if (x == 1) {
// <-- thread 1
}
Suppose two threads execute the code above.
Is it possible that thread 1 arrives at line 2 and reads x == 1 after thread 2 has already executed line 1?
So does the sequentially consistent ++x of thread 2 get propagated to thread 1 instantly, or is it possible that thread 1 reads a stale value x == 1?
I think with relaxed ordering or acquire/release the situation above is possible, but what about sequentially consistent ordering?
If you're thinking that multiple atomic operations are somehow safely grouped, you're wrong. They'll always occur in order within that thread, and they'll be visible in that order, but there is no guarantee that two separate operations will occur in one thread before either occurs in the other.
So for your specific question "Is it possible that thread 1 arrives at line 2. and reads x == 1, after thread 2 already executed line 1.?", the answer is yes, thread 1 could reach the x == 1 test after thread 2 has incremented x as well, so x would already be 2 and neither thread would see x == 1 as true.
The simplest way to think about this is to imagine a single processor system, and consider what happens if the running thread is switched out at any time aside from the middle of a single atomic operation.
So in this case, the operations (inc1 and test1 for thread 1 and inc2 and test2 for thread 2) could occur in any of the following orders:
inc1 test1 inc2 test2
inc1 inc2 test1 test2
inc1 inc2 test2 test1
inc2 inc1 test1 test2
inc2 inc1 test2 test1
inc2 test2 inc1 test1
As you can see, there is no possibility of either test occurring before either increment, nor can both tests pass (the only way a test passes is if the increment on that thread has occurred but the increment on the other thread has not). There is, however, no guarantee that any test passes: both increments could precede both tests, causing both tests to compare against the value 2 and fail. The race window is narrow, so most of the time you'd probably see exactly one test pass, but it's entirely possible to get unlucky and have neither pass.
If you want to make this work reliably, you need to make sure you both modify and test in a single operation, so exactly one thread will see the value as being 1:
if (++x == 1) { // the first thread to get here will do the stuff
    // Do stuff
}
In this case, the increment and read are a single atomic operation, so the first thread to get to that line (which might be thread 1 or thread 2, no guarantees) will perform the first increment with ++x atomically returning the new value which is tested. Two threads can't both see x become 1, because we kept both increment and test as one operation.
That said, if you're relying on the body of that if being completed before any thread executes the code after the if, that won't work. The first thread could enter the if while the second thread arrives nanoseconds later, sees it wasn't first, skips the if, and immediately begins executing the code after it, even though the first thread hasn't finished. Simple use of atomics like this is not suited to the "run only once" scenario people often write this code for, where the "run only once" code must complete exactly once before dependent code executes.
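For what it's worth, the same increment-and-test idiom translates directly to Go's sync/atomic (a sketch, assuming Go 1.19+ for atomic.Int32). The caveat above applies here too: the loser may run past the if before the winner finishes, and for strict run-exactly-once semantics Go offers sync.Once instead.

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var x atomic.Int32
    var wg sync.WaitGroup
    for id := 1; id <= 2; id++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            if x.Add(1) == 1 { // increment and test as one atomic RMW
                fmt.Printf("goroutine %d was first\n", id)
            }
        }(id)
    }
    wg.Wait()
}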
Let's simplify your question.
When two threads execute func():
std::atomic<int> x = 0;

void func()
{
    ++x;
    std::cout << x;
}
is the following result possible?
11
The answer is NO! Only "12", "21", or "22" is possible (the increment and the print are two separate atomic operations, so both threads can increment before either prints), but "11" is not.
Sequential consistency on an atomic variable works as you expect in this simple case.
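A Go rendering of the same experiment, for intuition (a sketch; Go's sync/atomic operations are sequentially consistent under the Go memory model, and atomic.Int32 needs Go 1.19+): the printed pair can be 12, 21, or 22, but never 11.

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var x atomic.Int32
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            x.Add(1)            // like ++x
            fmt.Print(x.Load()) // a separate atomic load, like std::cout << x
        }()
    }
    wg.Wait()
    fmt.Println()
}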

output 10 with memory_order_seq_cst

When I run this program I get the output 10, which seems impossible to me. I'm running this on x86_64 (Core i3, Ubuntu).
If the output is 10, then the 1 must have come from either c or d.
Also, in thread t[0] we assign c as 1. Now a is 1, since the assignment to a occurs before c = 1. c is equal to b, which was set to 1 by thread 1. So when we store d it should be 1, since a = 1.
Can the output 10 happen with memory_order_seq_cst? I tried inserting an atomic_thread_fence(memory_order_seq_cst) in both threads between the first line (the store of 1) and the second line (the store of the loaded value), but it still didn't work.
Uncommenting both fences doesn't help.
I tried running this with both g++ and clang++; both give the same result.
#include <thread>
#include <unistd.h>
#include <cstdio>
#include <atomic>
using namespace std;

atomic<int> a, b, c, d;

void foo() {
    a.store(1, memory_order_seq_cst);
    // atomic_thread_fence(memory_order_seq_cst);
    c.store(b, memory_order_seq_cst);
}

void bar() {
    b.store(1, memory_order_seq_cst);
    // atomic_thread_fence(memory_order_seq_cst);
    d.store(a, memory_order_seq_cst);
}

int main() {
    thread t[2];
    t[0] = thread(foo); t[1] = thread(bar);
    t[0].join(); t[1].join();
    printf("%d%d\n", c.load(memory_order_seq_cst), d.load(memory_order_seq_cst));
}
bash$ while [ true ]; do ./a.out | grep "10" ; done
10
10
10
10
10 (c=1, d=0) is easily explained: bar happened to run first, and finished before foo read b.
Quirks of the inter-core communication needed to get threads started on different cores mean it's easily possible for this to happen even though thread(foo) ran first in the main thread. For example, maybe an interrupt arrived at the core the OS chose for foo, delaying it from actually getting into that code (see footnote 1).
Remember that seq_cst only guarantees that some total order exists over all seq_cst operations that is compatible with the sequenced-before order within each thread (and with any other happens-before relationships established by other factors). So the following order of atomic operations is possible, without even breaking the a.load in bar (see footnote 2) out separately from the d.store of the resulting int temporary.
b.store(1,memory_order_seq_cst); // bar1. b=1
d.store(a,memory_order_seq_cst); // bar2. a.load reads 0, d=0
a.store(1,memory_order_seq_cst); // foo1
c.store(b,memory_order_seq_cst); // foo2. b.load reads 1, c=1
// final: c=1, d=0
atomic_thread_fence(seq_cst) has no impact anywhere because all your operations are already seq_cst. A fence basically just stops reordering of this thread's operations; it doesn't wait for or sync with fences in other threads.
(Only a load that sees a value stored by another thread can create synchronization. But such a load doesn't wait for the other store; it has no way of knowing there is another store. If you want to keep loading until you see the value you expect, you have to write a spin-wait loop.)
Footnote 1:
Since all your atomic vars are probably in the same cache line, even if execution did reach the top of foo and bar at the same time on two different cores, false sharing would likely let both operations from one thread happen while the other core is still waiting to get exclusive ownership of the line. (Although seq_cst stores are slow enough, on x86 at least, that hardware fairness mechanisms might relinquish exclusive ownership after committing the first store of 1.) Either way, there are lots of ways for both operations in one thread to happen before the other thread's, giving 10 or 01. It's even possible to get 11, if b = 1 and then a = 1 both happen before either load; using seq_cst stops the hardware from performing each load early (before its own store is globally visible), which makes that interleaving quite possible.
Footnote 2: The lvalue-to-rvalue evaluation of bare a uses the overloaded (int) conversion which is equivalent to a.load(seq_cst). The operations from foo could happen between that load and the d.store that gets a temporary value from it. d.store(a) is not an atomic copy; it's equivalent to int tmp = a; d.store(tmp);. That isn't necessary to explain your observations.
The printf output is unsynchronized, so an output of 10 can also just be a reordered 01.
01 happens when the functions run serially before the printf.

Odd thread behaviors

The following code normally prints BA, but sometimes it prints BBAA, BAAB, and so on. How is it possible to get two A's or two B's in a row? This code never seems to print three of the same letter in a row, though. Both functions (produce and consume) are run by many threads. Many thanks in advance.
#include <cstdio>
#include <pthread.h>

int permission;
pthread_mutex_t mr1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mr2 = PTHREAD_MUTEX_INITIALIZER;

void set_permission(int v) {
    permission = v;
    printf("%c", v + 'A'); fflush(stdout);
}

void* produce(void*) {
    for (;;) {
        pthread_mutex_lock(&mr1);
        set_permission(1);
        while (permission == 1);
        pthread_mutex_unlock(&mr1);
    }
}

void* consume(void*) {
    for (;;) {
        pthread_mutex_lock(&mr2);
        while (permission == 0);
        set_permission(0);
        pthread_mutex_unlock(&mr2);
    }
}
Your threads are not synchronized, because they are not using the same mutex.
By chance, the other thread may manage to set permission to 1 or 0 but not yet produce its output, in which case it appears as if the first thread ran two full rounds.
The write by the corresponding thread can also get lost entirely when the memory content is synchronized between cores after both threads wrote. A shared mutex would prevent this from happening, because it establishes a strict memory-access order which, to put it simply, guarantees that everything that happened under the protection of one mutex is fully visible to the next user of the same mutex.
Printing the same character three or more times would be very unlikely, as there is at most one write happening in between, so at most one lost write or one out-of-order output. That's not guaranteed, though.
If you are working on a system with no implicit memory synchronisation at all, your code could also just outright deadlock, as the writes done under one mutex would never propagate to the users of the other one. (That doesn't actually happen here, because there is still some synchronisation introduced by the I/O operations.)
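For contrast, here's a minimal Go sketch of the intended strict BA alternation, using one shared mutex plus a condition variable so that both sides synchronize through the same lock (names and the iteration count are illustrative):

package main

import (
    "fmt"
    "sync"
)

func main() {
    var mu sync.Mutex
    cond := sync.NewCond(&mu)
    permission := 0

    var wg sync.WaitGroup
    wg.Add(2)
    go func() { // produce: grant permission and print B
        defer wg.Done()
        for n := 0; n < 5; n++ {
            mu.Lock()
            for permission == 1 {
                cond.Wait() // sleep until the consumer clears permission
            }
            permission = 1
            fmt.Print("B")
            cond.Signal()
            mu.Unlock()
        }
    }()
    go func() { // consume: wait for permission, clear it, print A
        defer wg.Done()
        for n := 0; n < 5; n++ {
            mu.Lock()
            for permission == 0 {
                cond.Wait() // sleep until the producer grants permission
            }
            permission = 0
            fmt.Print("A")
            cond.Signal()
            mu.Unlock()
        }
    }()
    wg.Wait()
    fmt.Println() // prints BABABABABA
}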

Strange behavior of goroutines

I just tried the following code, but the result seems a little strange. It prints all the odd numbers first and then the even numbers. I'm really confused by this. I had hoped it would output odd and even numbers alternately, like 1, 2, 3, 4, .... Can anyone help me?
package main

import (
    "fmt"
    "time"
)

func main() {
    go sheep(1)
    go sheep(2)
    time.Sleep(100000)
}

func sheep(i int) {
    for ; ; i += 2 {
        fmt.Println(i, "sheeps")
    }
}
More than likely you are only running with one CPU thread, so it runs the first goroutine and then the second. If you tell Go it can run on multiple threads, both will be able to run simultaneously, provided the OS has spare time on a CPU to do so. You can demonstrate this by setting GOMAXPROCS=2 before running your binary. Or you could try adding a runtime.Gosched() call in your sheep function and see if that triggers the runtime to let the other goroutine run.
In general though it's better not to assume ordering semantics between operations in two goroutines unless you specify specific synchronization points using a sync.Mutex or communicating between them on channels.
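For example, the Gosched variant might look like this (a sketch; the sleep is given an explicit duration here, and the interleaving is still not guaranteed):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    go sheep(1)
    go sheep(2)
    time.Sleep(100 * time.Millisecond) // give the goroutines time to run
}

func sheep(i int) {
    for ; ; i += 2 {
        fmt.Println(i, "sheeps")
        runtime.Gosched() // yield so the other goroutine gets a turn
    }
}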
Unsynchronized goroutines execute in a completely undefined order. If you want to print out something like
1 sheeps
2 sheeps
3 sheeps
....
in that exact order, then goroutines are the wrong way to do it. Concurrency works well when you don't care so much about the order in which things occur.
You could impose an order in your program through synchronization (locking a mutex around the fmt.Println calls or using a channel), but it's pointless since you could more easily just write code that uses a single goroutine.
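Purely for illustration, here's a sketch of forcing 1, 2, 3, 4, ... with two goroutines passing a turn token over channels (buffered so the final hand-off doesn't block); as noted above, a single goroutine would be the sensible way to do this:

package main

import (
    "fmt"
    "sync"
)

func main() {
    odd := make(chan struct{}, 1)  // buffered so the last hand-off never blocks
    even := make(chan struct{}, 1)
    var wg sync.WaitGroup

    sheep := func(i int, mine, other chan struct{}) {
        defer wg.Done()
        for ; i <= 8; i += 2 {
            <-mine // wait for our turn
            fmt.Println(i, "sheeps")
            other <- struct{}{} // pass the turn
        }
    }

    wg.Add(2)
    go sheep(1, odd, even)
    go sheep(2, even, odd)
    odd <- struct{}{} // the odd goroutine goes first
    wg.Wait()
}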

pthreads - previously created thread uses new value (updated after thread creation)

So here's my scenario. First, I have a structure -
struct interval
{
    double lower;
    double higher;
};
Now my thread function -
void* thread_function(void* i)
{
    interval* in = (interval*)i;
    double a = in->lower;
    std::cout << a;
    pthread_exit(NULL);
}
In main, let's say I create these 2 threads -
pthread_t one, two;
interval i;
i.lower = 0; i.higher = 5;
pthread_create(&one, NULL, thread_function, &i);
i.lower = 10; i.higher = 20;
pthread_create(&two, NULL, thread_function, &i);
pthread_join(one, NULL);
pthread_join(two, NULL);
Here's the problem. Ideally, thread "one" should print out 0 and thread "two" should print out 10. However, this doesn't happen. Occasionally, I end up getting two 10s.
Is this by design? In other words, is it that by the time the thread actually runs, the value in i.lower has already been changed in main, and therefore both threads end up using the same value?
Is this by design?
Yes. It's unspecified when exactly the threads start and when they will access that value. You need to give each one of them their own copy of the data.
Your application is non-deterministic.
There is no telling when a thread will be scheduled to run.
Note: creating a thread does not mean it will start executing immediately (or even first). The second thread created may actually start running before the first (it all depends on the OS and the hardware).
To get deterministic behavior, each thread must be given its own data (data that is not modified by the main thread).
pthread_t one, two;
interval oneData, twoData;
oneData.lower = 0; oneData.higher = 5;
pthread_create(&one, NULL, thread_function, &oneData);
twoData.lower = 10; twoData.higher = 20;
pthread_create(&two, NULL, thread_function, &twoData);
pthread_join(one, NULL);
pthread_join(two, NULL);
I would not call it by design.
I would rather refer to it as a side-effect of scheduling policy. But the observed behavior is what I would expect.
This is the classic 'race condition'; where the results vary depending on which thread wins the 'race'. You have no way of knowing which thread will 'win' each time.
Your analysis of the problem is correct; you simply have no guarantee that the first thread created will read i.lower before the data is changed on the next line of your main function. This is, in some sense, the heart of why multithreaded programming can be hard to reason about at first.
The straightforward solution to your immediate problem is to keep different intervals with different data and pass a separate one to each thread, i.e.
interval i, j;
i.lower = 0; j.lower = 10;
pthread_create(&one, NULL, thread_function, &i);
pthread_create(&two, NULL, thread_function, &j);
This will of course solve your immediate problem. But soon you'll probably wonder what to do if you want multiple threads actually sharing the same data. What if thread 1 wants to make changes to i and thread 2 wants to take these into account? There would hardly be any point in multithreaded programming if each thread had to keep its memory separate from the others (leaving message passing out of the picture for now). Enter mutex locks! I thought I'd give you a heads-up that you'll want to look into this topic sooner rather than later, as it will also help you understand the basics of threads in general and the change in mentality that multithreaded programming requires.
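To make that concrete in Go terms (a hypothetical sketch mirroring the interval struct; the same idea applies with pthread_mutex_t in C):

package main

import (
    "fmt"
    "sync"
)

// shared data guarded by a mutex, mirroring the interval struct above
type interval struct {
    mu     sync.Mutex
    lower  float64
    higher float64
}

func main() {
    iv := &interval{lower: 0, higher: 5}
    var wg sync.WaitGroup

    wg.Add(2)
    go func() { // thread 1: changes the interval
        defer wg.Done()
        iv.mu.Lock()
        iv.lower, iv.higher = 10, 20
        iv.mu.Unlock()
    }()
    go func() { // thread 2: reads a consistent snapshot
        defer wg.Done()
        iv.mu.Lock()
        fmt.Println(iv.lower, iv.higher) // prints 0 5 or 10 20, never a mix
        iv.mu.Unlock()
    }()
    wg.Wait()
}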
I seem to recall that this is a decent short introduction to pthreads, including getting started with understanding locking etc.