What exactly does runtime.Gosched do? - concurrency

In a version of the Tour of Go website from before the Go 1.5 release, there's a piece of code that looks like this:
package main
import (
	"fmt"
	"runtime"
)
func say(s string) {
	for i := 0; i < 5; i++ {
		runtime.Gosched()
		fmt.Println(s)
	}
}
func main() {
	go say("world")
	say("hello")
}
Its output alternates between "hello" and "world" lines.
What is bothering me is that when runtime.Gosched() is removed, the program no longer prints "world".
Why is that so? How does runtime.Gosched() affect the execution?

As of Go 1.5, GOMAXPROCS defaults to the number of CPU cores available (see golang.org/doc/go1.5#runtime). The original answer, written before 1.5, follows.
When you run a Go program without setting the GOMAXPROCS environment variable, goroutines are scheduled for execution on a single OS thread. However, to make the program appear multithreaded (that's what goroutines are for, aren't they?), the Go scheduler must sometimes switch the execution context, so that each goroutine can do its piece of work.
As I said, when GOMAXPROCS is not set, the Go runtime is only allowed to use one thread, so it is impossible to switch execution contexts while a goroutine is doing conventional work, like computations or even I/O (which is mapped to plain C functions). The context can be switched only when Go concurrency primitives are used, e.g. when you select on several channels, or (this is your case) when you explicitly tell the scheduler to switch contexts; this is what runtime.Gosched is for.
So, in short, when execution in one goroutine reaches the Gosched call, the scheduler is instructed to switch execution to another goroutine. In your case there are two goroutines: main (which represents the 'main' thread of the program) and the additional one you created with go say. If you remove the Gosched call, execution is never transferred from the first goroutine to the second before main returns (and returning from main ends the program), hence no 'world' for you. When Gosched is present, the scheduler transfers execution on each loop iteration from the first goroutine to the second and vice versa, so you get 'hello' and 'world' interleaved.
FYI, this is called 'cooperative multitasking': goroutines must explicitly yield control to other goroutines. The approach used in most contemporary OSes is called 'preemptive multitasking': execution threads are not concerned with transferring control; instead, the scheduler switches execution contexts transparently to them. The cooperative approach is frequently used to implement 'green threads', that is, logical concurrent coroutines that do not map 1:1 to OS threads; this is how the Go runtime and its goroutines are implemented.
I've mentioned the GOMAXPROCS environment variable but didn't explain what it is. It's time to fix this.
When this variable is set to a positive number N, the Go runtime can run goroutines on up to N native threads simultaneously, and all green threads are scheduled onto these. A native thread is a thread created by the operating system (Windows threads, pthreads, etc.). This means that if N is greater than 1, goroutines may be scheduled on different native threads and, consequently, run in parallel (at least, up to your computer's capabilities: on a multicore processor these threads are likely to run truly in parallel; on a single-core processor, the preemptive multitasking of OS threads creates the appearance of parallel execution).
It is also possible to set GOMAXPROCS with the runtime.GOMAXPROCS() function instead of presetting the environment variable. Use something like this in your program instead of the current main:
func main() {
	runtime.GOMAXPROCS(2) // any N > 1 lets goroutines run on separate OS threads
	go say("world")
	say("hello")
}
In this case you can observe interesting results: the 'hello' and 'world' lines may be printed unevenly interleaved rather than strictly alternating.
This can happen if the goroutines are scheduled on separate OS threads. This is in fact how preemptive multitasking works (or parallel processing on multicore systems): threads run in parallel, and their combined output is nondeterministic. By the way, you can keep or remove the Gosched call; it seems to have no effect when GOMAXPROCS is greater than 1.
I ran the program with the runtime.GOMAXPROCS call several times (go run test.go), and the order of the lines differed from run to run. See, sometimes the output is pretty, sometimes not. Nondeterminism in action :)
Another update
It looks like in newer versions of Go the runtime makes goroutines yield not only when concurrency primitives are used, but also on OS system calls. This means that execution can also switch between goroutines on I/O function calls. Consequently, with recent Go versions it is possible to observe nondeterministic behavior even when GOMAXPROCS is unset or set to 1. (Since Go 1.14 the runtime also preempts long-running goroutines asynchronously, so a goroutine spinning in a tight loop no longer starves the others indefinitely.)

Cooperative scheduling is the culprit. Without yielding, the other (say("world")) goroutine may legally get zero chances to execute before main returns, and per the spec, when main returns the program exits without waiting for other goroutines, i.e. the whole process terminates.


C++ - Execute function every X milliseconds

I can't seem to find a good answer to this:
I'm making a game, and I want the logic loop to be separate from the graphics loop. In other words I want the game to go through a loop every X milliseconds regardless of how many frames/second it is displaying.
Obviously they will both be sharing a lot of variables, so I can't have a thread/timer passing one variable back and forth... I'm basically just looking for a way to have a timer in the background that every X milliseconds sends out a flag to execute the logic loop, regardless of where the graphics loop is.
I'm open to any suggestions. It seems like the best option is to have 2 threads, but I'm not sure what the best way to communicate between them is, without constantly synchronizing large amounts of data.
You can do multithreading quite well by exchanging your "world view" every tick. Here is how it works:
1. Your current world view is pointed to by a single smart pointer and is read-only, so no locking is necessary.
2. Your logic creates your (first) world view, publishes it, and schedules the renderer.
3. Your renderer grabs a copy of the pointer to your world view and renders it (remember, read-only).
4. In the meantime, your logic creates a new, slightly different world view.
5. When it's done, it exchanges the pointer to the current world view, publishing the new one as current.
6. Even if the renderer is still busy with the old world view, no locking is necessary.
7. Eventually the renderer finishes rendering the (old) world. It grabs the new world view and starts another run.
8. In the meantime, ... (goto step 4)
The only locking you need is when you publish or grab the pointer to the world. Alternatively, you can use an atomic exchange, but then you have to make sure your smart pointers support atomic operations.
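A minimal C++11 sketch of the pointer exchange from the steps above. WorldView is a hypothetical stand-in for your game state, and the mutex guards only the pointer swap, never the rendering itself. For the atomic variant, std::atomic_load/std::atomic_store on std::shared_ptr (C++11) or std::atomic<std::shared_ptr<T>> (C++20) could replace the mutex.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical immutable snapshot of the game state.
struct WorldView {
    int tick = 0;
    std::vector<float> positions;
};

class WorldExchange {
public:
    // Logic thread: publish a fully built view. A renderer still holding
    // the old pointer keeps the old view alive until it lets go.
    void publish(std::shared_ptr<const WorldView> next) {
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = std::move(next);
    }

    // Renderer thread: grab the current view; it is read-only.
    std::shared_ptr<const WorldView> grab() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

private:
    mutable std::mutex mutex_;  // held only while copying/swapping the pointer
    std::shared_ptr<const WorldView> current_;
};
```

The logic thread would build the next view from a copy of the old one and call publish(); the renderer calls grab() at the start of each frame. An old view is freed automatically once neither side holds it.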
Most toolkits have an event loop built on top of a multiplexing system call such as poll(2) (or the older select(2)). For example, GTK applications run g_application_run or the older gtk_main, both of which are built on the GLib main event loop (which in fact does a poll or something similar). Likewise, Qt has QApplication and its exec method.
Very often, you can register timers with the event loop. For GTK, use g_timeout_add, g_timeout_add_seconds, etc. For Qt, learn about QTimer.
Very often, you can also register idle or background processing: one of your functions that the event loop calls after other events and timeouts have been processed. Your idle function is expected to run quickly (usually it does a small step of some computation in a few milliseconds, to keep the GUI responsive). For GTK, use g_idle_add etc. IIRC, in Qt you can use a timer with a 0 delay.
So you could even write a (conceptually) single-threaded application, using timeouts and idle processing.
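A single-threaded sketch of running the logic at a fixed interval regardless of frame rate. It uses a plain std::chrono loop instead of a toolkit timer such as g_timeout_add or QTimer; update and render are hypothetical callbacks, and the iterations parameter only bounds the loop so the example terminates.

```cpp
#include <chrono>
#include <functional>

// Runs `update` at a fixed logical rate (`step`, must be > 0) and `render`
// once per iteration, all on one thread. Returns how many updates ran.
int run_fixed_timestep(std::chrono::milliseconds step, int iterations,
                       const std::function<void()>& update,
                       const std::function<void()>& render) {
    using clock = std::chrono::steady_clock;
    auto previous = clock::now();
    clock::duration lag{0};  // elapsed time not yet consumed by update()
    int updates = 0;
    for (int i = 0; i < iterations; ++i) {
        auto now = clock::now();
        lag += now - previous;
        previous = now;
        // Run as many fixed-size logic steps as the elapsed time requires.
        while (lag >= step) {
            update();
            lag -= step;
            ++updates;
        }
        render();  // drawing happens as often as the loop spins
    }
    return updates;
}
```

If rendering is slow, the inner loop runs several logic steps to catch up, so game time keeps advancing at the fixed rate.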
Of course, you could use multithreading: generally the main thread runs the event loop, and other threads do other things. You then have synchronization issues. On POSIX systems, a nice synchronization trick is to use a pipe(7) to self: you set up a pipe before running the event loop, your computation threads write a few bytes to it, and the main event loop "listens" on it (with GTK, using g_source_add_poll, async I/O, GUnixInputStream, etc.; with Qt, using QSocketNotifier, etc.). Then, in the input handler running in the main loop for that pipe, you can access traditional global data with mutexes etc.
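A POSIX-only sketch of the pipe-to-self trick, assuming C++11. SelfPipe is a hypothetical class: post() would be called from computation threads, and wait_and_drain() stands in for the input handler that GTK or Qt would invoke when the read end becomes readable (here it calls poll(2) directly). The handler must actually read the wake-up bytes; otherwise poll keeps reporting the descriptor as readable.

```cpp
#include <mutex>
#include <vector>
#include <poll.h>
#include <unistd.h>

// Workers queue results under a mutex and write one byte to the pipe;
// the event loop waits on the read end, reads the bytes, drains the queue.
class SelfPipe {
public:
    SelfPipe() {
        if (pipe(fds_) != 0) fds_[0] = fds_[1] = -1;
    }
    ~SelfPipe() {
        if (fds_[0] >= 0) { close(fds_[0]); close(fds_[1]); }
    }
    SelfPipe(const SelfPipe&) = delete;
    SelfPipe& operator=(const SelfPipe&) = delete;

    bool ok() const { return fds_[0] >= 0; }

    // Called from a computation thread.
    void post(int result) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(result);
        }
        char byte = 1;
        ssize_t n = write(fds_[1], &byte, 1);  // wake up the event loop
        (void)n;
    }

    // Called from the event loop: wait up to timeout_ms, then drain.
    std::vector<int> wait_and_drain(int timeout_ms) {
        pollfd pfd{fds_[0], POLLIN, 0};
        if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN)) return {};
        char buf[64];
        ssize_t n = read(fds_[0], buf, sizeof buf);  // consume wake-up bytes
        (void)n;
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<int> out;
        out.swap(pending_);
        return out;
    }

private:
    int fds_[2];
    std::mutex mutex_;
    std::vector<int> pending_;
};
```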
Conceptually, read about continuations. It is a relevant notion.
You could have draw and update methods on all your game components. That way, while your game is running, you can call update and skip draw, or use any combination of the two. It also has the benefit of keeping logic and graphics completely separate.
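A minimal C++14 sketch of the draw/update split. GameComponent and Ball are hypothetical names, not taken from any engine: update() changes state and draw() only reads it, so the game loop can call them at different rates.

```cpp
#include <memory>
#include <vector>

// Hypothetical component interface: logic and drawing are separate calls.
struct GameComponent {
    virtual ~GameComponent() = default;
    virtual void update(double dt_seconds) = 0;  // game logic only
    virtual void draw() const = 0;               // rendering only
};

struct Ball : GameComponent {
    double x = 0.0, velocity = 2.0;
    mutable int draws = 0;  // counted only to show that draw() ran
    void update(double dt) override { x += velocity * dt; }
    void draw() const override { ++draws; }
};

void update_all(std::vector<std::unique_ptr<GameComponent>>& cs, double dt) {
    for (auto& c : cs) c->update(dt);
}

void draw_all(const std::vector<std::unique_ptr<GameComponent>>& cs) {
    for (const auto& c : cs) c->draw();
}
```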
Couldn't you just have a draw method for each object that needs to be drawn and make the objects globals? Then just run your rendering thread with a sleep delay in it. As long as your rendering thread doesn't write to the globals, only its reads need care: they must still be synchronized with the logic thread's writes, since an unsynchronized concurrent read and write is a data race in C++. Look up SFML to see an example of this in action.
If you are running on a Unix system you could use usleep(); it is not available on Windows (which has Sleep()), and since C++11 std::this_thread::sleep_for is a portable alternative.
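A sketch of the background-timer idea from the question, using std::this_thread::sleep_until (C++11) so it works on both Unix and Windows. Ticker is a hypothetical class: a helper thread raises an atomic flag every period, and the graphics loop calls consume_tick() and runs one logic step whenever it returns true.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

class Ticker {
public:
    explicit Ticker(std::chrono::milliseconds period)
        : worker_([this, period] {
              auto next = std::chrono::steady_clock::now() + period;
              while (running_.load()) {
                  std::this_thread::sleep_until(next);  // absolute deadline: no drift
                  next += period;
                  tick_pending_.store(true);
              }
          }) {}

    ~Ticker() {
        running_.store(false);
        worker_.join();  // may wait up to one period
    }

    Ticker(const Ticker&) = delete;
    Ticker& operator=(const Ticker&) = delete;

    // Graphics loop: returns true once per raised flag, then clears it.
    bool consume_tick() { return tick_pending_.exchange(false); }

private:
    std::atomic<bool> running_{true};
    std::atomic<bool> tick_pending_{false};
    std::thread worker_;  // declared last so the flags exist before it starts
};
```

Because it is a single flag, several missed ticks collapse into one; a counter would be needed if the logic must catch up on every missed tick.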

g_main_loop uses 100% CPU

I have built my first application using glibmm. It uses a lot of threads because it does heavy processing. I have tried to follow the multithreading guidelines, i.e. not doing any GUI updates from threads other than the one where g_main_loop is running.
I do a lot of graphics rendering in worker threads, but I only ever update a Pixbuf, which is later drawn by the widget's on_draw() from the main loop.
All was fine as long as the data I rendered was read from files. The problems started when I began streaming data from a server and rendering it at regular intervals.
Every now and then, especially when running multiple instances of my application simultaneously, the main thread takes 100% CPU time. Running strace on the process shows that g_main_loop has ended up in an endless loop calling poll:
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN}, {fd=8, events=POLLIN}], 4, 100) = 1 ([{fd=10, revents=POLLIN}])
In /proc, file descriptor 10 shows up as: 10 -> socket:[1132750]
The poll always returns immediately because file descriptor 10 has data available. This goes on forever, so I assume the file descriptor is never read. The odd thing is that running 5 instances almost always leads to all 5 ending up in the infinite poll loop after just a couple of minutes, while running only one instance usually works for more than 30 minutes.
Why is this happening and is there any way to debug this?
My mistake was calling queue_draw() from one of my worker threads. Given that the function is called "queue", I assumed it would queue a redraw that would later be executed by the g_main_loop. As it turned out, this was what broke the g_main_loop. I wish the gtkmm reference manual gave a little more detail about these multithreading restrictions.
My solution was to add a Glib::Dispatcher named queueRedraw to my widget and connect it to queue_draw():
queueRedraw.connect(sigc::mem_fun(*this, &MyWidgetClass::queue_draw));
Calling queueRedraw() from a worker thread signals the main thread to call queue_draw().
I don't know if this is the best approach, but it solves the problem.
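To illustrate the principle behind this fix without gtkmm, here is a minimal, hypothetical MainThreadQueue in standard C++ (it is not the Glib::Dispatcher API). Worker threads only post requests; the main loop runs them, so widget methods such as queue_draw() would only ever be called from the main thread. In gtkmm itself, Glib::Dispatcher already provides this cross-thread hand-off.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class MainThreadQueue {
public:
    // Safe to call from any thread.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Call only from the main loop; returns how many tasks ran.
    std::size_t run_pending() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(tasks_);
        }
        for (auto& task : batch) task();  // runs outside the lock
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> tasks_;
};
```

You would call run_pending() from the main loop, for instance from an idle or timeout handler.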

Multiple CUDA streams crashing GPU

This is a continuation of this post.
It seems as though a special case has been solved by adding volatile, but now something else has broken. If I add anything between the two kernel calls, the system reverts to the old behavior, namely freezing and then printing everything at once. This behavior is shown by adding sleep(2); between set_flag and read_flag. Also, when put in another program, this causes the GPU to lock up. What am I doing wrong now?
Thanks again.
There is an interaction between X and the display driver, as well as between the standard output queue and the graphical display driver.
A few experiments you can try (with the sleep(2); added between the set_flag and read_flag kernels):
- Log into your machine over the network via ssh from another machine. I think your program will work. (X is not involved in the display in this case.)
- Comment out the line that prints "Starting...". I think your program will then work. (This avoids the display driver/print queue deadlock; see below.)
- Add a sleep(2); between the "Starting..." print line and the first kernel. I think your program will then work. (This allows the display driver to fully service the first printout before the first kernel is launched, so there is no CPU thread stall.)
- Stop X and run from a console. I think your program will work.
When the GPU is both hosting an X display and also running CUDA tasks, it has to switch between the two. For the duration of the CUDA task, ordinary display processing is suspended. You can read more about this here.
The problem here is that when running X, the first printout is sent to the print queue but not actually displayed before the first kernel is launched. This is evident because you don't see the printout before the display freezes. After that, the CPU thread stalls waiting for the text to be displayed, so the second kernel does not start. The intervening sleep(2); and its interaction with the OS are enough for this stall to occur. Meanwhile, the executing first kernel has the display driver "stopped" for ordinary display tasks, so the OS never gets past its stall, the second kernel is never launched, and the program appears to hang.
Note that options 1,2, or 3 in the linked custhelp article would be effective in your case. Option 4 would not.

Python C API - Stopping Execution (and continuing it later)

1) I would like to use the profiling functions in the Python C API to catch the Python interpreter when it returns from specific functions.
2) I would like to pause the Python interpreter, send execution back to the function that called the interpreter in my C++ program, and finally return execution to the Python interpreter, resuming it at the line of code after where it stopped. I would like to keep both globals and locals between the times when execution belongs to Python.
Part 1 is done. Part 2 is my question: I don't know what to save so I can resume execution, or how to resume execution from that saved data.
From what I could get from the Python API docs, I will have to save some part of the executing frame, but I haven't found anything. Some additional questions:
What exactly does a PyFrameObject contain? Surprisingly, the Python API docs never explain that.
If I understand your problem, you have a C++ program that calls into Python. When Python finishes executing a function, you want to pause the interpreter and pick up where the C++ code left off. Some time later your C++ program needs to call back into Python and have the interpreter pick up where it left off.
I don't think you can do this very easily with one thread. Before you pause the interpreter the stack looks like this:
[ top of stack ]
[ some interpreter frames ]
[ some c++ frames ]
To pause the interpreter, you need to save off the interpreter frames, and jump back to the top-most C++ frame. Then to unpause, you need to restore the interpreter frames, and jump up the stack to where you left off. Jumping is doable (see http://en.wikipedia.org/wiki/Setjmp.h), but saving and restoring the stack is harder. I don't know of an API to do this.
However, you could do this with two threads. The thread created at the start of your C++ program (call it thread 1) runs the C++ code, and it creates thread 2 to run the Python interpreter.
Initially (while we're running C++ code), thread 1 is executing and thread 2 is blocked (say, on a condition variable; see https://computing.llnl.gov/tutorials/pthreads/). When you run or unpause the interpreter, thread 1 signals the condition variable and waits on it. This wakes up thread 2 (which runs the interpreter) and blocks thread 1. When the interpreter needs to pause, thread 2 signals the condition variable and waits on it (so thread 2 blocks and thread 1 wakes up). You can bounce back and forth between the threads to your heart's content. Hope this helps.
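A sketch of this hand-off using C++11 std::thread and std::condition_variable in place of the raw pthreads calls. One detail matters in practice: the condition variable needs a shared "whose turn is it" flag that every wait re-checks, because condition variables can wake spuriously and a signal sent before the other thread starts waiting would otherwise be lost. Baton and demo() are hypothetical names; the log entries stand in for running and pausing the Python interpreter.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Baton {
public:
    enum class Owner { Cpp, Interpreter };

    // Wait until it is `me`'s turn (used once, when a thread starts).
    void wait_for_turn(Owner me) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return turn_ == me; });
    }
    // Give control to `next`, then block until control returns to `me`.
    void hand_off(Owner me, Owner next) {
        std::unique_lock<std::mutex> lock(mutex_);
        turn_ = next;
        cv_.notify_all();
        cv_.wait(lock, [&] { return turn_ == me; });
    }
    // Give control to `next` without waiting (used when a thread is done).
    void release(Owner next) {
        std::lock_guard<std::mutex> lock(mutex_);
        turn_ = next;
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    Owner turn_ = Owner::Cpp;  // the C++ side runs first
};

// Thread 1 (this function) runs C++ code; thread 2 stands in for the
// interpreter. Returns the order in which the steps ran.
std::vector<int> demo() {
    Baton baton;
    std::vector<int> log;  // only touched by whichever thread holds the turn
    std::thread interpreter([&] {
        baton.wait_for_turn(Baton::Owner::Interpreter);
        log.push_back(1);  // run Python until it wants to pause
        baton.hand_off(Baton::Owner::Interpreter, Baton::Owner::Cpp);
        log.push_back(3);  // resumed exactly where it paused
        baton.release(Baton::Owner::Cpp);
    });
    log.push_back(0);      // C++ work before starting the interpreter
    baton.hand_off(Baton::Owner::Cpp, Baton::Owner::Interpreter);
    log.push_back(2);      // C++ work while the interpreter is paused
    baton.hand_off(Baton::Owner::Cpp, Baton::Owner::Interpreter);
    interpreter.join();
    return log;
}
```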

Strange issue running infinite while loop in EXE

I am facing a strange issue on Windows CE.
I am running three EXEs:
1) The first EXE does some work every 8 minutes unless the exit event is signaled.
2) The second EXE does some work every 5 minutes unless the exit event is signaled.
3) The third EXE runs a while loop that does some work at random times. This loop continues until the exit event is signaled.
The exit event is a global event and can be signaled by any process.
The problem:
When I run the first EXE, it works fine. When I run the second EXE, it works fine. When I run the third EXE, it works fine.
When I run all the EXEs, only the third one runs, and no instructions get executed in the first and second.
As soon as the third EXE terminates, the first and second start processing.
Can it be that the while loop in the third EXE is taking all the CPU cycles?
I haven't tried putting in a Sleep, but I think that could do the trick.
But the OS should give CPU time to all processes...
Any thoughts?
Make the while loop in the third EXE call Sleep each time through the loop and see what happens. Even if it doesn't fix this particular problem, polling with a while loop is never good practice, and even using Sleep inside a loop is a poor substitute for a proper timer.
On MSDN I also read that CE allows at most 32 processes to run simultaneously (although context switches are lightning fast), and some of those are already taken by system services.
(From Memory) Processes in Windows CE run until completion if there are no higher priority processes running, or they run for their time slice (100ms) if there are other processes of equal priority running. I'm not sure if Windows CE gives the process with the active/foreground window a small priority boost (just like desktop Windows), or not.
In your situation the first two processes are starved of processor time so they never run until the third process exits. Some ways to solve this are:
Make the third process wait/block on a multi-process primitive (mutex, semaphore, etc.) with a short timeout, using WaitForSingleObject/WaitForMultipleObjects etc.
Make the third process wait using a call to Sleep every time around the processing loop.
Boost the priority of the other processes so when they need to run they will interrupt the third process and actually run. I would probably make the least often called process have the highest priority of the three processes.
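On Windows CE the first suggestion typically takes the shape while (WaitForSingleObject(hExitEvent, timeout) == WAIT_TIMEOUT) { ... }, where hExitEvent is a hypothetical handle to the shared exit event. To keep the example portable, the sketch below mimics that loop with a hypothetical ExitEvent class built on std::condition_variable (not the Win32 API): the loop blocks between work items instead of spinning, so it leaves CPU time for the other processes and still exits as soon as the event is signaled.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Portable stand-in for a Win32 event: a flag plus a condition variable.
class ExitEvent {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
    // True if signaled within `timeout` (like WAIT_OBJECT_0),
    // false on timeout (like WAIT_TIMEOUT).
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// The third EXE's loop, rewritten to block between work items.
// Returns how many times `do_work` ran before exit was signaled.
int worker_loop(ExitEvent& exit_event, std::chrono::milliseconds interval,
                const std::function<void()>& do_work) {
    int runs = 0;
    while (!exit_event.wait_for(interval)) {  // sleeps, no busy-waiting
        do_work();
        ++runs;
    }
    return runs;
}
```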
The other thing to check is that the third process does actually complete its tasks in time, and does not peg the CPU trying to do its thing normally.
Yeah, I think that is not a good solution. I may try using a timer and see the results.