Issue with Mutual Exclusion of Concurrent Go Routines - concurrency

In my code there are three concurrent routines. Here is a brief overview of my code:

Routine 1 {
    do something
    *Send int to Routine 2
    Send int to Routine 3
    Print Something
    Print Something*
    do something
}

Routine 2 {
    do something
    *Send int to Routine 1
    Send int to Routine 3
    Print Something
    Print Something*
    do something
}

Routine 3 {
    do something
    *Send int to Routine 1
    Send int to Routine 2
    Print Something
    Print Something*
    do something
}

main {
    routine1
    routine2
    routine3
}
I want that, while the code between the two stars (the sending and printing events) is executing, the flow of control must not pass to the other goroutines. For example, when routine 1 is executing the events between the two stars, routines 2 and 3 must be blocked (that is, the flow of execution does not pass from routine 1 to routine 2 or 3). Only after the last print event completes may execution pass to routine 2 or 3. Can anybody help me by specifying how I can achieve this? Is it possible to implement the above specification with a WaitGroup? Can anybody show me, with a simple example, how to implement it using a WaitGroup? Thanks.
NB: Maybe this is a repeat of this question. I tried using that sync-lock mechanism; however, perhaps because I have a large code base, I could not place the lock/unlock calls properly, and it creates a deadlock (or maybe my method is error-prone). Can anybody help me with a simple procedure by which I can achieve this? I give a simple example of my code here, where I want to put the two prints and the sending events inside a mutex (for routine 1) so that routine 2 can't interrupt them. Can you help me with how this is possible? One possible solution was given at http://play.golang.org/p/-uoQSqBJKS, which gives an error.

Why do you want to do this?
The deadlock problem is this: if you don't allow other goroutines to be scheduled, then your channel sends can't proceed unless there's buffering. Go's channels have finite buffering, so a send on a full channel blocks until some other goroutine drains it, which is exactly what you've forbidden. You could introduce unbounded buffering, or put each send in its own goroutine, but it again comes down to: why are you trying to do this; what are you trying to achieve?
Another thing: if you only want to ensure mutual exclusion of the three sets of code between the *s, then yes, you can use mutexes. If you want to ensure that no code interrupts your block regardless of where it was suspended, then you might need to use runtime.LockOSThread and runtime.UnlockOSThread. These are fairly low level, you need to know what you're doing, and they're rarely needed. If you want there to be no other goroutines running at all, you'll have to set runtime.GOMAXPROCS(1), which is currently the default.
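To make the mutex route concrete, here is a minimal runnable sketch, assuming the only hard requirement is that the three send-and-print blocks never interleave. The buffered channels matter: a send inside the critical section must not block, because its receiver may itself be waiting on the mutex. All names and buffer sizes are illustrative:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var mu sync.Mutex
    var wg sync.WaitGroup

    // One channel per routine. Capacity 2 because each routine receives
    // exactly one int from each of the other two routines; a send inside
    // the critical section must never block.
    chans := [3]chan int{
        make(chan int, 2),
        make(chan int, 2),
        make(chan int, 2),
    }

    routine := func(id, other1, other2 int) {
        defer wg.Done()
        // do something ...
        mu.Lock() // nothing below interleaves with the other routines' blocks
        chans[other1] <- id
        chans[other2] <- id
        fmt.Printf("routine %d: sent to %d and %d\n", id, other1, other2)
        fmt.Printf("routine %d: leaving critical section\n", id)
        mu.Unlock()
        // do something: here, drain the two ints addressed to us.
        fmt.Printf("routine %d: received %d and %d\n", id, <-chans[id], <-chans[id])
    }

    wg.Add(3)
    go routine(0, 1, 2)
    go routine(1, 0, 2)
    go routine(2, 0, 1)
    wg.Wait()
}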

The problem in answering your question is that it seems no one understands what your problem really is. I see you're asking repeatedly about roughly the same thing, yet no progress has been made. No offense intended in saying this; it's an attempt to help you, by suggesting that you reformulate your problem in a way comprehensible to others. As a nice side effect, some problems solve themselves while being explained to others in an understandable way. I've experienced that many times myself.
Another hint is the suspicious mix of explicit syncing and channel communication. That doesn't mean the design is necessarily broken; it just doesn't happen in a typical/simple case. Then again, your problem might be atypical/non-trivial.
Perhaps it's somehow possible to redesign your problem using only channels. Actually, I believe every problem involving explicit synchronization (in Go) can be coded using only channels. That said, it's true that some problems are written with explicit synchronization very easily. Also, channel communication, as cheap as it is, is not as cheap as most synchronization primitives. But that can be looked at later, once the code works. If a "pattern" for, say, a sync.Mutex visibly emerges in the code, it should be possible to switch to it, and that is much easier to do once the code already works and hopefully has tests to watch your steps while making the adjustments.
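As a taste of such a redesign, mutual exclusion itself can be expressed with a one-slot "token" channel instead of a sync.Mutex: a buffered channel of capacity one gives exactly the lock semantics, since at most one goroutine can hold the token at a time. A sketch, with illustrative names:

token := make(chan struct{}, 1)
token <- struct{}{} // the token starts out available

// In each goroutine, instead of Lock/Unlock around the critical section:
<-token             // acquire: take the token, blocking until it is free
// ... exclusive sends and prints ...
token <- struct{}{} // release: hand the token back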
Try to think about your goroutines like independently acting agents which:
- Exclusively own the data received from a channel. The language will not enforce this; you must apply that discipline yourself.
- No longer touch data they have sent to a channel. This follows from the first rule, but it is important enough to state explicitly.
- Interact with other agents (goroutines) via data types which encapsulate a whole unit of workflow/computation (a sketch follows below). This eliminates, for example, your earlier struggle with getting the right number of channel messages before the "unit" is complete.
- For every channel they use, it must be absolutely clear beforehand whether the channel must be unbuffered, must be buffered for a fixed number of items, or may be unbounded.
- Don't have to think (or know) about what other agents are doing, beyond getting a message from them when that is needed for the agent to do its own task - its part of the bigger picture.
Using even such a few rules of thumb should hopefully produce code which is easier to reason about and which usually doesn't require any other synchronization. (I'm intentionally ignoring the performance concerns of mission-critical applications here.)
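To make the third rule concrete, here is a small sketch (all names hypothetical) of a message type that carries a complete unit of work, including its reply channel, so that a single message is a whole workflow step:

package main

import (
    "fmt"
    "strings"
)

// job carries everything one step of work needs, including where to reply.
type job struct {
    input string
    reply chan string // created by the sender; the worker only sends on it
}

func worker(jobs <-chan job) {
    for j := range jobs {
        // The worker owns j until it replies; nobody else touches it.
        j.reply <- strings.ToUpper(j.input)
    }
}

func main() {
    jobs := make(chan job)
    go worker(jobs)

    j := job{input: "hello", reply: make(chan string, 1)}
    jobs <- j              // hand over ownership of the job
    fmt.Println(<-j.reply) // HELLO
}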

Related

C++ Concurrency, Coroutines & Job Scheduling?

I'm trying to get my head around multithreading in C++, to come up with a general-purpose implementation that suits me. Everyone has a different implementation; Awesome CPP lists 39 libraries. It seems to me, though, that this is a logistical problem of the same ilk as any logistical scheduling problem in any field.
In my head, there are two obvious ways to repeatedly perform the job abc:
Split abc into 3 separate tasks: a, b & c. Spawn x threads. Have a queue. Jobs coming in get added to the queue. Each thread grabs the next task from the queue, and at the end of the task puts it back into the queue for the next task. They can either access the queue directly, or they can all communicate with a central 'manager' or 'scheduler' thread that serves them with their tasks.
Perform abc sequentially on x separate threads independently (parallelism).
(1) has the problem that there is potentially a lot of overhead in keeping a queue and dealing with race conditions on it. (1) is otherwise intuitive and makes sense to me. It's what I would do in real life with a real life problem. It's literally how companies work in the real world.
(2) has the problem that any blocking causes the whole thread to block, idling the CPU thread. And (2) is far less flexible, applicable in fewer use cases. On the plus side, it has no overhead between tasks.
Question 1: Doesn't (1) also have the same blocking problem? If a thread reads from a file, it'll have to wait for the disk. How is that usually addressed? Is there some way to yield back temporarily while it's doing something like reading or writing from disk, or is this usually addressed simply by having more threads running than there are CPU threads and hoping not too many block at once?
It seems to me that (1) is clearly the better solution, except that it restricts the tasks to medium- or large-scale tasks. It would be pointless to use it for something like parallelizing straightforward math (just an example), because handling the queue would take longer than the actual processing of the task. Hence the value of (1) for any given task is inversely proportional to the difference between the overhead of the storage mechanism (the queue) and the size of the task. This sounds fine on the surface, until you realize that the efficiency of splitting into tasks is itself proportional to the size of the task. To put it simply: you want each task to be small for overall efficiency in theory, but in practice you want each task to be larger so as to minimize the overhead of the queue.
It's obvious that some storage mechanism is required, because you can't keep track of something without a recording mechanism. It doesn't have to be strictly a queue, just some form of recording each task in memory while it waits to be picked up. The optimization of the queue (I'm using the word loosely, not strictly the queue type) is then the #1 important factor here. The cheaper a task can receive its payload, the better.
Which leads me to Question 2: is this what C++20 coroutines are useful for? I've spent hours reading tutorials on coroutines, but it's still unclear what they're useful for. I think I get what they do. If I have it right, they allow a special type of function (a coroutine) to pause itself in the middle, yield its processing back to the caller along with a payload, and the caller can later resume it. But why would I want to do that? And can't I do that just by splitting the function in two?
Question 3: Are coroutines meant to be used by a task scheduler thread to somehow optimize the queuing? Or is the point just to allow you to write code linearly and then put those yields in it to break it up? In which case it wouldn't be useful for me if I already had my jobs split up into separate tasks by design?
Question 4: Am I trying to reinvent the wheel here? Has this problem already been solved? And if so, why are there so many different implementations?
Q1: No, it more likely has a different blocking problem.
Q2: Co-routines have many applications; try substituting for X in "is this what X is for?" X = { while, if, return, pointer, ... }. Don't look to standards bodies (particularly that one) for insight; they are best at punctuation and spell checking.
Q3: Co-routines can be used to optimise various constructions, but the real goal of using such a formalism is to make your program as natural an expression of the problem as possible. One of the better examples of how co-routines can be intelligently used is the goroutines of Go.
Q4: Probably; almost definitely; because many of the solutions are inadequate.
Q1+Q4. There is no single blocking problem, some that come to mind are: Deadlock, Livelock, unnecessarily sequential, non-Scalable, Slow. Some structures {{ threads, coroutines, threads + coroutines } * { locks, conditions, message passing }} help solve some of these problems, but induce others. My favourite is { (threads + coroutines) * (message passing) }, which is typically good for everything but Slow.
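Since the Q3 answer points at Go's goroutines, here is the classic coroutine-as-generator shape (the thing Questions 2 and 3 circle around) written with a goroutine and a channel. The send is the "yield": it suspends the producer until the consumer receives, which is the pause/resume handshake the question describes. Names are illustrative:

package main

import "fmt"

// fibs behaves like a generator coroutine: each send is a "yield" that
// suspends this goroutine until the caller receives the value.
func fibs(n int) <-chan int {
    out := make(chan int)
    go func() {
        a, b := 0, 1
        for i := 0; i < n; i++ {
            out <- a // suspend here until the consumer is ready
            a, b = b, a+b
        }
        close(out) // the "coroutine" has returned
    }()
    return out
}

func main() {
    for v := range fibs(10) {
        fmt.Println(v) // each receive "resumes" the generator
    }
}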

When should I use concurrency in Go?

So besides handling multiple server requests is there any other time that concurrency is relevant? I ask because it's so built into the language that I feel wasteful if I don't use it but I can barely find a use for it.
Not an expert in Go (yet) but I'd say:
Whenever it is easiest to do so.
The beauty of the concurrency model in Go is that it is not fundamentally a multi-core architecture with checks and balances where things usually break - it is a multi-threaded paradigm that not only fits well into a multi-core architecture, it also fits well into a distributed system architecture.
You do not have to make special arrangements for multiple goroutines to work together harmoniously - they just do!
Here's an example of a naturally concurrent algorithm - I want to merge multiple channels into one. Once all of the input channels are exhausted I want to close the output channel.
It is just simpler to use concurrency - in fact it doesn't even look like concurrency - it looks almost procedural.
/*
Multiplex a number of channels into one.
*/
func Mux(channels []chan big.Int) chan big.Int {
    // Count down as each channel closes. When it hits zero - close ch.
    var wg sync.WaitGroup
    wg.Add(len(channels))
    // The channel to output to.
    ch := make(chan big.Int, len(channels))
    // Make one goroutine per channel.
    for _, c := range channels {
        go func(c <-chan big.Int) {
            // Pump it.
            for x := range c {
                ch <- x
            }
            // It closed.
            wg.Done()
        }(c)
    }
    // Close the channel when the pumping is finished.
    go func() {
        // Wait for everyone to be done.
        wg.Wait()
        // Close.
        close(ch)
    }()
    return ch
}
The only concession I have to make to concurrency here is to use a sync.WaitGroup as a concurrency-safe counter of the input channels that are still open.
Note that this is not purely my own work - I had a great deal of help with this here.
Here is a good example from one of Go's inventors, Rob Pike, of using concurrency because it is an easier way to express the solution to a problem:
Lexical Scanning in Go
Generalizing on that a bit, any producer-consumer problem is a natural fit for 2 goroutines using a channel to pass outputs from the producer to the consumer.
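A minimal sketch of that producer-consumer shape, with hypothetical names:

package main

import "fmt"

// produce is the producer: it owns the channel's send side and closes it
// when there is nothing more to send.
func produce(out chan<- int) {
    for i := 1; i <= 5; i++ {
        out <- i * i
    }
    close(out)
}

func main() {
    ch := make(chan int)
    go produce(ch)
    // The consumer runs here in main's goroutine; range ends when the
    // producer closes the channel.
    for v := range ch {
        fmt.Println("consumed", v)
    }
}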
Another good use for concurrency is interacting with multiple input/output sources (disks, network, terminal, etc.). Your program should be able to wake up and do some work whenever a result comes from any of these sources. It is possible to do this with one thread and a system call like poll(2) or select(2). When your thread wakes up, it must figure out which result came in, find where it left off in the relevant task, and pick up from there. That's a lot of code you need to write.
Writing that code is much easier using one goroutine per task. Then the state of that task is captured implicitly in the goroutine, and picking up where it left off is as simple as waking up and running.
My 2 cents... If you think about channels/goroutines only in the context of concurrency, you are missing the boat.
While Go is not an object-oriented language or strictly a functional language, it does allow you to take design features from both and apply them.
One of the basic tenets of object-oriented design is the Single Responsibility Principle. Applying this principle forces you to think about design in terms of messages, rather than complex object behavior. These same design constraints can be used in Go, letting you start thinking about "messages on channels" connecting single-purpose functions.
This is just one example, but if you start thinking this way, you'll see many more.
Also not an expert at Go, so some of my approaches may be non-canonical, but here are some ways I've found concurrency useful so far:
Doing operations while waiting for a network request, disk I/O, or database query to finish
Executing divide-and-conquer algorithms more quickly
Since goroutines are functions, and functions are first-class citizens in Go, you can pass them around as variables. This is convenient when your program has many autonomous pieces. (For example, I'm playing around with simulating a city's traffic system. Each vehicle is its own goroutine, and they communicate with intersections and other vehicles by using channels. Each one does its own thing.)
Simultaneous I/O operations across different devices
I used concurrency to run Dijkstra's algorithm on a set of points in an image to draw "intelligent scissors" lines -- one goroutine per point made this implementation significantly faster.
GoConvey uses concurrency to run tests across packages at the same time, to respond faster to changes while debugging with the web UI. (As an inherent bonus, this adds some pseudo-randomness to the testing sequence, so you can check that your test results are truly consistent.)
Concurrency could be (read: "is sometimes, but not necessarily always") useful when you have operations that could run independently of each other but would otherwise run sequentially. Even if those operations depend on data or some sort of signal from other goroutines at certain points, you can communicate that all with channels.
For some inspiration and an important distinction in these matters, and for some funny gopher pictures, see Concurrency is not Parallelism.
Any time an item of data doesn't depend on the previous one.
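A sketch of that last point: when items are independent, fanning out one goroutine per item is mechanical. The function f and the names here are illustrative:

package main

import (
    "fmt"
    "sync"
)

// processAll applies f to every item in parallel. Each goroutine writes
// only its own slot of results, so no further synchronization is needed.
func processAll(items []int, f func(int) int) []int {
    results := make([]int, len(items))
    var wg sync.WaitGroup
    for i, item := range items {
        wg.Add(1)
        go func(i, item int) {
            defer wg.Done()
            results[i] = f(item)
        }(i, item)
    }
    wg.Wait()
    return results
}

func main() {
    fmt.Println(processAll([]int{1, 2, 3, 4}, func(x int) int { return x * x }))
}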

Multithreading: a blocking wait with timeout

I'm using TinyThread++ to get clean and simple platform-independent control over threading features in my project. I've just come upon a situation where I'd like responsive, synchronized message passing without pegging the CPU, while still allowing a thread to do a bit of work on the side while it is idle. Sure, I could simply spawn a third thread to do this "other work", but all I'm missing is a condition-variable wait(int ms) type function rather than the wait() that already works great. The idea is that it would block only for up to ms milliseconds, so it can time out and perform some actions periodically (during which the thread is not actively waiting on the condition variable). Even though it's nice to have the thread sitting there waiting to pounce on any incoming messages, if I give it a side task that takes only 50 microseconds to execute and only needs to run once every second, that definitely shouldn't push me to make yet another thread (and message queue and other resources) to get it done.
Does any of this make sense? I'm looking for suggestions on how I might go about implementing this. I'm hoping that adding a couple of lines to the TinyThread code can provide me with this functionality.
Well, the source code for the wait function isn't very complicated, so making the required modifications looks simple enough:
- The Linux implementation relies on the pthread_cond_wait function, which can trivially be changed to the pthread_cond_timedwait function. Do read the documentation carefully, in case I've forgotten any minutiae.
- On the Windows side of things, it's a little more complicated, and I'm no expert on multithreading on Windows. That being said, if there's a timed version of the _wait function (I'm pretty sure there is), changing to that should work just fine. Again, read the documentation carefully before making any modifications.
Now, before you go off and make these modifications, I don't think what you're trying to do is a good idea. The main advantage of using threads is to conceptually separate different tasks. Trying to do multiple things in a single thread is a bit like trying to do multiple things in a single function: it complicates the design and makes things harder to debug. So unless the overhead of creating a new thread is provably too great, or unless the resulting code remains simple and easy to understand, I'd split the work into multiple threads.
Finally, I get the feeling that you might not be aware that condition variables can wake spuriously (return without anybody having signalled, or when the condition is still false). So, just in case, I'd suggest reviewing the usage examples and making sure you understand why those wait loops are there.
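For contrast with the pthread route, here is what the same wait-with-timeout loop looks like in Go, the language of the earlier sections: select blocks on the message channel but gives up after the timeout and runs the periodic side work. messages, handle and doPeriodicSideWork are hypothetical stand-ins:

for {
    select {
    case msg := <-messages:
        handle(msg) // a message arrived; pounce on it
    case <-time.After(time.Second):
        doPeriodicSideWork() // the wait timed out; do the side task
    }
}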

Testing concurrent data structure

What are some methods for testing concurrent data structures, to make sure the data structures behave correctly when accessed from multiple threads?
All of the other answers have focused on actually testing the code: putting it through its paces and running it in one form or another, or politely saying "don't do it yourself, use an existing library".
This is great and all, but IMO the most important test (practical tests are important too) is to look at the code line by line and, for every line, ask "what happens if I get interrupted by another thread here?" Imagine another thread running just about any of the other lines/functions during this interruption. Do things still stay consistent? When competing for resources, do the other threads block or spin?
This is what we did in school when learning about concurrency and it is a surprisingly effective approach. Bottom line, I feel that taking the time to prove to yourself that things are consistent and work as expected in all states is the first technique you should use when dealing with this stuff.
Concurrent systems are probabilistic and errors are often difficult to replicate. Therefore you need to run various input/output cases, each tested over time (hours, days, etc) in order to detect possible errors.
Testing a concurrent data structure involves examining the container's state before and after expected events such as insert and delete.
Use a pre-existing, pre-tested library that meets your needs if possible.
Make sure that the code has appropriate self-consistency checks (preferably fast sanity checks), and run your code on as many different types of hardware as possible to help narrow down interesting timing problems.
Have multiple people peer review the code, preferably without a pre-explanation of how it's supposed to work. That way they have to grok the code which should help catch more bugs.
Set up a bunch of threads that do nothing but random operations on the data structures and check for consistency at some rate.
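A minimal Go sketch of that idea, for a hypothetical mutex-protected counter; with a real structure you would mix random operations, and you would also run the test under the race detector (go test -race):

package counter

import (
    "sync"
    "testing"
)

// SafeCounter is a stand-in for the structure under test.
type SafeCounter struct {
    mu sync.Mutex
    n  int
}

func (c *SafeCounter) Inc() { c.mu.Lock(); c.n++; c.mu.Unlock() }

func (c *SafeCounter) Value() int {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.n
}

// TestCounterConcurrent hammers the counter from several goroutines and
// then checks the one invariant we can state up front: no lost updates.
func TestCounterConcurrent(t *testing.T) {
    const workers, perWorker = 8, 1000
    var c SafeCounter
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := 0; i < perWorker; i++ {
                c.Inc()
            }
        }()
    }
    wg.Wait()
    if got := c.Value(); got != workers*perWorker {
        t.Fatalf("lost updates: got %d, want %d", got, workers*perWorker)
    }
}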
Start with the assumption that your calls to access/modify data are not thread safe and use locks to ensure only a single thread can access/modify any part of the data at a time. Only after you can prove to yourself that a specific type of access is safe outside of the lock by multiple threads at once should you move that code outside of the lock.
Assume worst case scenarios, e.g. that your code will stop right in the middle of some pointer manipulation or another critical point, and that another thread will encounter that data in mid-transition. If that would have a bad result, leave it within the lock.
I normally test these kinds of things by interjecting sleep() calls at appropriate places in the distributed threads/processes.
For instance, to test a lock, put sleep(2) in all your threads at the point of contention, and spawn two threads roughly 1 second apart. The first one should obtain the lock, and the second should have to wait for it.
Most race conditions can be tested by extending this method, but if your system has too many components it may be difficult or impossible to know every possible condition that needs to be tested.
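The sleep-injection experiment above is easy to sketch in Go terms (illustrative names; the sleeps are the injected delays the answer describes):

package main

import (
    "fmt"
    "sync"
    "time"
)

var mu sync.Mutex

// worker sleeps inside the critical section so the contention is observable.
func worker(id int) {
    mu.Lock()
    fmt.Printf("worker %d acquired the lock at %s\n", id, time.Now().Format("15:04:05.000"))
    time.Sleep(2 * time.Second) // injected sleep at the point of contention
    mu.Unlock()
    fmt.Printf("worker %d released the lock\n", id)
}

func main() {
    go worker(1)
    time.Sleep(time.Second) // spawn the second worker ~1s later
    go worker(2)            // should be seen waiting for worker 1
    time.Sleep(4 * time.Second)
}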
Run your concurrent threads for one or a few days and look at what happens. (Sounds strange, but tracking down race conditions is such a complex topic that simply trying it is often the best approach.)

How to keep asynchronous parallel program code manageable (for example in C++)

I am currently working on a server application that needs to control a collection of devices over a network. Because of this, we need to do a lot of parallel programming. Over time, I have learned that there are three approaches to communication between processing entities (threads/processes/applications). Regrettably, all three approaches have their disadvantages.
A) You can make a synchronous request (a synchronous function call). In this case, the caller waits until the function is processed and the response has been received. For example:
const bool convertedSuccessfully = Sync_ConvertMovie(params);
The problem is that the caller is idling. Sometimes this is just not an option. For example, if the call was made by the user interface thread, it will seem like the application has blocked until the response arrives, which can take a long time.
B) You can make an asynchronous request and wait for a callback to be made. The client code can continue with whatever needs to be done.
Async_ConvertMovie(params, TheFunctionToCallWhenTheResponseArrives);
This solution has the big disadvantage that the callback function necessarily runs in a separate thread. The problem is now that it is hard to get the response back to the caller. For example, you have clicked a button in a dialog, which called a service asynchronously, but the dialog has long been closed when the callback arrives.
void TheFunctionToCallWhenTheResponseArrives()
{
    // Difficulty 1: how to get to the dialog instance?
    // Difficulty 2: how to guarantee in a thread-safe manner that
    //               the dialog instance is still valid?
}
This in itself is not that big a problem. However, when you want to make more than one such call, and each depends on the response of the previous one, this becomes, in my experience, unmanageably complex.
C) The last option I see is to make an asynchronous request and keep polling until the response has arrived. In between the has-the-response-arrived-yet checks, you can do something useful. This is the best solution I know of to solve the case in which there is a sequence of asynchronous function calls to make. This is because it has the big advantage that you still have the whole caller context around when the response arrives. Also, the logical sequence of the calls remains reasonably clear. For example:
const CallHandle c1 = Async_ConvertMovie(sourceFile, destFile);
while (!c1.ResponseHasArrived())
{
    //... do something in the meanwhile
}
if (!c1.IsSuccessful())
    return;

const CallHandle c2 = Async_CopyFile(destFile, otherLocation);
while (!c2.ResponseHasArrived())
{
    //... do something in the meanwhile
}
if (c2.IsSuccessful())
    //show a success dialog
The problem with this third solution is that you cannot return from the caller's function. This makes it unsuitable if the work you want to do in the meanwhile has nothing at all to do with the work you are getting done asynchronously. For a long time I have wondered whether there is some other way to call functions asynchronously, one that doesn't have the downsides of the options listed above. Does anyone have an idea, some clever trick perhaps?
Note: the example given is C++-like pseudocode. However, I think this question equally applies to C# and Java, and probably a lot of other languages.
You could consider an explicit "event loop" or "message loop", not too different from classic approaches such as a select loop for asynchronous network tasks or a message loop for a windowing system. Events that arrive may be dispatched to a callback when appropriate, as in your example B, but they may also in some cases be tracked differently, for example to cause transitions in a finite state machine. An FSM is a fine way to manage the complexity of an interaction along a protocol that requires many steps, after all!
One approach to systematizing these considerations starts with the Reactor design pattern.
Schmidt's ACE body of work is a good starting point for these issues, if you come from a C++ background; Twisted is also quite worthwhile, from a Python background; and I'm sure that similar frameworks and sets of whitepapers exist for, as you say, "a lot of other languages" (the Wikipedia URL I gave does point at Reactor implementations for other languages, besides ACE and Twisted).
I tend to go with B, but instead of calling back and forth, I'd do the entire processing, including follow-ups, on a separate thread. The main thread can meanwhile update the GUI and either actively wait for the thread to complete (i.e. show a dialog with a progress bar), or just let it do its thing in the background and pick up the notification when it's done. No complexity problems so far, since the entire processing is actually synchronous from the processing thread's point of view. From the GUI's point of view, it's asynchronous.
Adding to that, in .NET it's no problem to switch to the GUI thread. The BackgroundWorker class and the ThreadPool make this easy as well (I used the ThreadPool, if I remember correctly). In Qt, for example, to stay with C++, it's quite easy as well.
I used this approach on our last major application and am very pleased with it.
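Since the question says this applies beyond C++, here is a sketch of that approach in Go terms: the whole dependent sequence stays linear inside one worker goroutine, and the caller waits on (or polls) a single completion channel. convertMovie and copyFile are hypothetical stand-ins for the calls in the question:

package main

import "fmt"

// Hypothetical stand-ins for the calls in the question.
func convertMovie(src, dst string) error { fmt.Println("converting", src, "->", dst); return nil }
func copyFile(src, dst string) error     { fmt.Println("copying", src, "->", dst); return nil }

// convertThenCopy runs the dependent sequence synchronously on one
// goroutine, so the logic reads linearly; the caller gets one channel.
func convertThenCopy(src, dst, other string) <-chan error {
    done := make(chan error, 1) // buffered so the worker never blocks on send
    go func() {
        if err := convertMovie(src, dst); err != nil {
            done <- err
            return
        }
        done <- copyFile(dst, other)
    }()
    return done
}

func main() {
    done := convertThenCopy("in.avi", "out.mp4", "backup/out.mp4")
    // ... keep the UI responsive here ...
    if err := <-done; err != nil {
        fmt.Println("failed:", err)
    } else {
        fmt.Println("success")
    }
}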
Like Alex said, look at Proactor and Reactor as documented by Doug Schmidt in Pattern-Oriented Software Architecture.
There are concrete implementations of these for different platforms in ACE.