anonymous struct and empty struct - concurrency

http://play.golang.org/p/vhaKi5uVmm
package main
import "fmt"
var battle = make(chan string)
func warrior(name string, done chan struct{}) {
    select {
    case opponent := <-battle:
        fmt.Printf("%s beat %s\n", name, opponent)
    case battle <- name:
        // I lost :-(
    }
    done <- struct{}{}
}
func main() {
    done := make(chan struct{})
    langs := []string{"Go", "C", "C++", "Java", "Perl", "Python"}
    for _, l := range langs { go warrior(l, done) }
    for _ = range langs { <-done }
}
[1st Question]
done <- struct{}{}
How and why do we need this weird-looking struct? Is it an empty struct or an anonymous struct? I googled it but couldn't find an answer or documentation that explains it.
The original source is from Andrew Gerrand's talk
http://nf.wh3rd.net/10things/#10
Here
make(chan struct{})
done is a channel of type struct{}
So I tried with
done <- struct{}
But it does not work. Why do I need the extra pair of braces on this line?
done <- struct{}{}
[2nd Question]
for _ = range langs { <-done }
Why do I need this line? I know it is necessary, because without it there is no output. But what does this line do, and what makes it necessary in this code? I know that <-done receives values from the channel done and discards them. But why do I need to do this?

Note that one interesting aspect of using struct{} for the type pushed to a channel (as opposed to int or bool), is that the size of an empty struct is... 0!
See the recent article "The empty struct" (March 2014) by Dave Cheney.
You can create as many struct{} as you want (struct{}{}) to push them to your channel: your memory won't be affected.
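A quick way to check the size claim is unsafe.Sizeof. The sketch below is only illustrative; the helper name sizes is made up for this example and is not part of the original program.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sizes reports the size in bytes of an empty struct value, a bool,
// and an array of one thousand empty structs.
// (Hypothetical helper, introduced only for this illustration.)
func sizes() (empty, boolean, manyEmpty uintptr) {
	var e struct{}
	var b bool
	var many [1000]struct{}
	return unsafe.Sizeof(e), unsafe.Sizeof(b), unsafe.Sizeof(many)
}

func main() {
	e, b, m := sizes()
	// The first and third values are 0; the bool takes one byte.
	fmt.Println(e, b, m)
}
```

Even a thousand empty structs occupy no storage, which is why struct{} is a natural element type for a channel used purely as a signal.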
It is also useful for signalling between goroutines, as illustrated in "Curious Channels".
finish := make(chan struct{})
As the behaviour of the close(finish) relies on signalling the close of the channel, not the value sent or received, declaring finish to be of type chan struct{} says that the channel contains no value; we’re only interested in its closed property.
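The quoted point about close can be sketched roughly as follows. The helper broadcast is hypothetical: it parks n goroutines on <-finish and then closes the channel once, which releases all of them without a single value being sent.

```go
package main

import (
	"fmt"
	"sync"
)

// broadcast starts n goroutines that all block on finish, closes the
// channel once, and returns how many goroutines were released.
// (Hypothetical helper written for this illustration.)
func broadcast(n int) int {
	finish := make(chan struct{})
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		released int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-finish // unblocks as soon as the channel is closed
			mu.Lock()
			released++
			mu.Unlock()
		}()
	}
	close(finish) // no value is sent; the close itself is the signal
	wg.Wait()
	return released
}

func main() {
	fmt.Println(broadcast(5))
}
```

A send of struct{}{} wakes exactly one receiver, whereas close wakes every receiver, so the two are not interchangeable.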
And you retain all the other advantages linked to a struct:
you can define methods on it (that type can be a method receiver)
you can implement an interface (with the methods you just defined on your empty struct)
you can use it as a singleton: in Go you can use an empty struct and store all your data in global variables. There will only be one instance of the type, since all empty structs are interchangeable.
See for instance the global var errServerKeyExchange in the file where the empty struct rsaKeyAgreement is defined.
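As a small sketch of the points above (methods, interfaces, interchangeable values), assuming a made-up Greeter interface and english type that do not come from the Go source tree:

```go
package main

import "fmt"

// Greeter is a hypothetical interface used only for this example.
type Greeter interface {
	Greet(name string) string
}

// english carries no data, but it can still have methods
// and therefore satisfy Greeter.
type english struct{}

func (english) Greet(name string) string {
	return "Hello, " + name
}

func main() {
	var g Greeter = english{}
	fmt.Println(g.Greet("gopher"))

	// Any two values of an empty struct type compare equal.
	same := english{} == english{}
	fmt.Println(same)
}
```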

Composite literals
Composite literals construct values for structs, arrays, slices, and
maps and create a new value each time they are evaluated. They consist
of the type of the value followed by a brace-bound list of composite
elements. An element may be a single expression or a key-value pair.
struct{}{} is a composite literal of type struct{}, the type of the value followed by a brace-bound list of composite elements.
for _ = range langs { <-done } is waiting until all the goroutines for all the langs have sent done messages.

struct{} is a type (in particular, a structure with no members). If you have a type Foo, you can create a value of that type in an expression with Foo{field values, ...}. Putting this together, struct{}{} is a value of the type struct{}, which is what the channel expects.
The main function spawns warrior goroutines, which will write to the done channel when they have finished. The last for block reads from this channel, ensuring that main won't return until all the goroutines have finished. This is important because the program will exit when main completes, irrespective of whether there are other goroutines running.
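To make the type/value distinction concrete, here is a minimal sketch. The named type token and the function fill are introduced purely for illustration.

```go
package main

import "fmt"

// token is a named type whose underlying type is struct{}.
// (Hypothetical name, chosen for this example.)
type token struct{}

// fill sends two values on a channel of struct{} and reports how many
// are queued afterwards.
func fill() int {
	done := make(chan struct{}, 2) // buffered so the sends below do not block

	done <- struct{}{} // struct{} is the type, the trailing {} makes a value
	done <- token{}    // assignable, because the underlying types are identical
	// done <- struct{} // would not compile: struct{} alone is a type, not a value

	return len(done)
}

func main() {
	fmt.Println(fill())
}
```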

Good questions,
The whole point of the struct channel in this scenario is simply to signal that something useful has completed. The channel's element type doesn't really matter; he could have used an int or a bool to the same effect. What's important is that his code executes in a synchronized fashion, doing the necessary bookkeeping to signal and move on at key points.
I agree the syntax of struct{}{} looks odd at first. In this example he is declaring an anonymous struct type and creating a value of it in-line, hence the second set of braces.
If you had a pre-existing type like:
type Book struct{
}
You could create a value of it like so: b := Book{}. You only need one set of braces because the Book struct has already been declared.

The done channel is used to receive notifications from the warrior function, indicating that the worker is done processing. So the channel can carry any type, for example:
func warrior(name string, done chan bool) {
    select {
    case opponent := <-battle:
        fmt.Printf("%s beat %s\n", name, opponent)
    case battle <- name:
        // I lost :-(
    }
    done <- true
}
func main() {
    done := make(chan bool)
    langs := []string{"Go", "C", "C++", "Java", "Perl", "Python"}
    for _, l := range langs { go warrior(l, done) }
    for _ = range langs { <-done }
}
We declare done := make(chan bool) as a channel that carries bool values, and send true at the end of warrior instead. This works! You could also give the done channel any other element type; it won't matter.
1. So what is with the weird done <- struct{}{}?
It is just another type of value that can be passed over the channel. This is an empty struct. You may be familiar with a declaration like the following:
type User struct {
    Name string
    Email string
}
struct{} is no different, except that it contains no fields, and struct{}{} is just a value of that type. The best feature is that it takes up no memory!
2. for loop usage
We create 6 goroutines to run in the background with this line:
for _, l := range langs { go warrior(l, done) }
We use for _ = range langs { <-done } because the main goroutine (where the main function runs) does not wait for other goroutines to finish.
If we do not include the last for line, chances are we will see no output, because the main goroutine may quit before any child goroutine executes its fmt.Printf call. When the main goroutine quits, all other goroutines are terminated with it and get no further chance to run.
So we wait for all goroutines to finish (each runs to the end and sends a message to the done channel), then exit. The done channel here is unbuffered, which means <-done blocks until a message is received from the channel.
We have 6 goroutines in the background, and with the for loop we wait until every one of them has sent a message, which means it has finished running (because done <- struct{}{} is at the end of the function).
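The effect of the wait loop can be observed with a variant of the program that returns the winners' lines instead of printing them. This is only a sketch: runBattle and the buffered results channel are additions for illustration, and it assumes an even number of names (with an odd number the last warrior would never find an opponent and the loop would block forever).

```go
package main

import "fmt"

// runBattle is a hypothetical variant of the program above. It pairs the
// warriors off, waits for all of them, and returns one line per winner.
// It assumes len(names) is even.
func runBattle(names []string) []string {
	battle := make(chan string)
	done := make(chan struct{})
	results := make(chan string, len(names)) // buffered so winners never block

	for _, n := range names {
		go func(name string) {
			select {
			case opponent := <-battle:
				results <- fmt.Sprintf("%s beat %s", name, opponent)
			case battle <- name:
				// lost: nothing to report
			}
			done <- struct{}{}
		}(n)
	}

	// One receive per goroutine: this blocks until every warrior has finished.
	for range names {
		<-done
	}

	close(results)
	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	langs := []string{"Go", "C", "C++", "Java", "Perl", "Python"}
	for _, line := range runBattle(langs) {
		fmt.Println(line)
	}
}
```

Six warriors form three pairs, so three lines come back; which language beats which varies from run to run.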

Related

How to check if sync.WaitGroup.Done() is called in unit test

Let's say I have a function f that is executed asynchronously as a goroutine:
func f(wg *sync.WaitGroup) {
    defer wg.Done()
    // Do sth
}
func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    go f(&wg)
    wg.Wait() // Wait until f is done
    // ...
}
How would I create a unit test for f that makes sure wg.Done() is called?
One option is to call wg.Done() in the test directly after f is called. If f fails to call wg.Done(), the test will panic, which is not nice.
Another option would be to create an interface for sync.WaitGroup but that seems a bit weird.
Something like this:
func TestF(t *testing.T) {
    wg := &sync.WaitGroup{}
    wg.Add(1)
    // run the task asynchronously
    go f(wg)
    // wait for the WaitGroup to be done, or timeout
    select {
    case <-wrapWait(wg):
        // all good
    case <-time.NewTimer(500 * time.Millisecond).C:
        t.Fail()
    }
}
// helper function to allow using WaitGroup in a select
func wrapWait(wg *sync.WaitGroup) <-chan struct{} {
    out := make(chan struct{})
    go func() {
        wg.Wait()
        out <- struct{}{}
    }()
    return out
}
You don't inspect the WaitGroup directly, which you can't do anyway. Instead you assert that the function behaves as expected, given the expected input.
In this case, the expected input is the WaitGroup argument and the expected behavior is that wg.Done() gets called eventually. What does that mean, in practice? It means that if the function is successful a WaitGroup with count 1 will reach 0 and allow wg.Wait() to proceed.
The statement defer wg.Done() at the beginning of f already makes sure that the test is resilient to errors or crashes. The addition of a timeout is simply to make sure the test will complete within a reasonable time, i.e. that it doesn't stall your test suite for too long. Personally, I prefer using explicit timeouts, either with timers or with contexts, to 1) avoid problems if someone forgets to set timeouts at the CI level, 2) make the time ceiling available to anyone who checks out the repo and runs the test suite, i.e. avoid dependencies on IDE configs or whatnot.

Rust async-await: check if any future in a list resolves to true concurrently?

I'm trying to run a list of futures concurrently (instead of in sequence) in Rust async-await (being stabilized soon), until any of them resolves to true.
Imagine having a Vec<File>, and a future to run for each file yielding a bool (may be unordered). Here would be a simple sequenced implementation.
async fn my_function(files: Vec<File>) -> bool {
    // Run the future on each file, return early if we received true
    for file in files {
        if long_future(file).await {
            return true;
        }
    }
    false
}
async fn long_future(file: File) -> bool {
    // Some long-running task here...
}
This works, but I'd like to run a few of these futures concurrently to speed up the process. I came across buffer_unordered() (on Stream), but couldn't figure out how to implement this.
As I understand it, something like join can be used as well to run futures concurrently, provided you have a multithreaded pool. But I don't see how that could be used efficiently here.
I attempted something like this, but couldn't get it to work:
let any_true = futures::stream::iter(files)
    .buffer_unordered(4) // Run up to 4 concurrently
    .map(|file| long_future(file).await)
    .filter(|stop| stop) // Only propagate true values
    .next() // Return early on first true
    .is_some();
Along with that, I'm looking for something like any as used in iterators, to replace the if-statement or the filter().next().is_some() combination.
How would I go about this?
I think that you should be able to use select_ok, as mentioned by Some Guy. An example, in which I've replaced the files with a bunch of u32 for illustration:
use futures::future::FutureExt;
async fn long_future(file: u32) -> bool {
    true
}
async fn handle_file(file: u32) -> Result<(), ()> {
    let should_stop = long_future(file).await;
    // Would be better if there were something more descriptive here
    if should_stop {
        Ok(())
    } else {
        Err(())
    }
}
async fn tims_answer(files: Vec<u32>) -> bool {
    let waits = files.into_iter().map(|f| handle_file(f).boxed());
    let any_true = futures::future::select_ok(waits).await.is_ok();
    any_true
}

How to monitor changes with Condvar and Mutex

I have a shared Vec<CacheChange>. Whenever a new CacheChange is written I want to wake up readers. I recall that a Condvar is good for signaling when a predicate/situation is ready, namely, when the Vec is modified.
So I spent some time creating a Monitor abstraction to own the Vec and provide wait and lock semantics.
The problem now is that I don't know when to reset the Condvar. What is a good way to give readers a reasonable amount of time to hit the predicate and work their way to holding the lock before closing the Condvar? Am I approaching Condvars the wrong way?
This is Rust code, but it is more a question of fundamentals for exact concurrent access/notification between multiple readers.
pub struct Monitor<T>(
    sync::Arc<MonitorInner<T>>
);
struct MonitorInner<T> {
    data: sync::Mutex<T>,
    predicate: (sync::Mutex<bool>, sync::Condvar)
}
impl<T> Monitor<T> {
    pub fn wait(&self) -> Result<(),sync::PoisonError<sync::MutexGuard<bool>>> {
        let mut open = try!(self.0.predicate.0.lock());
        while !*open {
            open = try!(self.0.predicate.1.wait(open));
        }
        Ok(())
    }
    pub fn lock(&self) -> Result<sync::MutexGuard<T>, sync::PoisonError<sync::MutexGuard<T>>> {
        self.0.data.lock()
    }
    pub fn reset(&mut self) -> Result<(),sync::PoisonError<sync::MutexGuard<bool>>> {
        let mut open = try!(self.0.predicate.0.lock());
        *open = false;
        Ok(())
    }
    pub fn wakeup_all(&mut self) -> Result<(),sync::PoisonError<sync::MutexGuard<bool>>> {
        let mut open = try!(self.0.predicate.0.lock());
        *open = true;
        self.0.predicate.1.notify_all();
        Ok(())
    }
}
After the first wakeup call, my readers can miss reads, probably because they are still holding the data lock while the predicate has been toggled again. I've seen this in my test code with just one reader and one writer.
Then there's the complication of when to reset the Monitor. Ideally it would be locked after all readers have had their chance to look at the data. This could cause deadlock issues if readers ignore their monitors (there is no guarantee they will service every wakeup call).
Do I need to use some kind of reader tracking system with timeouts and track when new data arrives while monitor reads are still being serviced? Is there an existing paradigm I should be aware of?
The simplest solution is to use a counter instead of a boolean.
struct MonitorInner<T> {
    data: sync::Mutex<T>,
    signal: sync::Condvar,
    counter: sync::atomic::AtomicUsize,
}
Then, every time an update is done, the counter is incremented. It is never reset, so there is no question about when to reset.
Of course, it means that readers should remember the value of the counter the last time they were woken up.
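The counter idea is not specific to Rust. As a rough illustration, here it is sketched in Go with sync.Cond instead of Condvar. All names (Monitor, Push, WaitNewer, version) are made up for this sketch, and the counter lives under the same mutex as the data rather than in an atomic, which is a simplification of the struct shown above.

```go
package main

import (
	"fmt"
	"sync"
)

// Monitor guards data with a mutex and counts updates in version.
// (Hypothetical Go rendering of the counter idea; names are illustrative.)
type Monitor struct {
	mu      sync.Mutex
	cond    *sync.Cond
	data    []int
	version uint64 // incremented on every write, never reset
}

func NewMonitor() *Monitor {
	m := &Monitor{}
	m.cond = sync.NewCond(&m.mu)
	return m
}

// Push appends a value, bumps the version and wakes all waiting readers.
func (m *Monitor) Push(v int) {
	m.mu.Lock()
	m.data = append(m.data, v)
	m.version++
	m.mu.Unlock()
	m.cond.Broadcast()
}

// WaitNewer blocks until the version exceeds seen, then returns a copy of
// the data together with the version the reader should remember next time.
func (m *Monitor) WaitNewer(seen uint64) ([]int, uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for m.version <= seen {
		m.cond.Wait()
	}
	return append([]int(nil), m.data...), m.version
}

func main() {
	m := NewMonitor()
	go func() {
		for i := 1; i <= 3; i++ {
			m.Push(i)
		}
	}()

	var seen uint64
	var snapshot []int
	for seen < 3 {
		snapshot, seen = m.WaitNewer(seen)
	}
	fmt.Println(snapshot, seen)
}
```

Because each reader compares the shared counter with the value it last saw, a reader that was busy during a wakeup simply notices the higher counter on its next call, and nothing ever needs to be reset.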

RxCpp: observer's lifetime if using observe_on(rxcpp::observe_on_new_thread())

What is the proper way to wait until all the observers on_completed are called if the observers are using observe_on(rxcpp::observe_on_new_thread()):
For example:
{
    Foo foo;
    auto generator = [&](rxcpp::subscriber<int> s)
    {
        s.on_next(1);
        // ...
        s.on_completed();
    };
    auto values = rxcpp::observable<>::create<int>(generator).publish();
    auto s1 = values.observe_on(rxcpp::observe_on_new_thread())
                    .subscribe([&](int) { slow_function(foo); });
    auto lifetime = rxcpp::composite_subscription();
    lifetime.add([&](){ wrapper.log("unsubscribe"); });
    auto s2 = values.ref_count().as_blocking().subscribe(lifetime);
    // hope to call something here to wait for the completion of
    // s1's on_completed function
}
// the program usually crashes here when foo goes out of scope because
// the slow_function(foo) is still working on foo. I also noticed that
// s1's on_completed never got called.
My question is how to wait until s1's on_completed is finished without having to set and poll some variables.
The motivation of using observe_on() is because there are usually multiple observers on values, and I would like each observer to run concurrently. Perhaps there are different ways to achieve the same goal, I am open to all your suggestions.
Merging the two will allow a single blocking subscribe to wait for both to finish.
{
    Foo foo;
    auto generator = [&](rxcpp::subscriber<int> s)
    {
        s.on_next(1);
        s.on_next(2);
        // ...
        s.on_completed();
    };
    auto values = rxcpp::observable<>::create<int>(generator).publish();
    auto work = values.
        observe_on(rxcpp::observe_on_new_thread()).
        tap([&](int c) {
            slow_function(foo);
        }).
        finally([](){printf("s1 completed\n");}).
        as_dynamic();
    auto start = values.
        ref_count().
        finally([](){printf("s2 completed\n");}).
        as_dynamic();
    // wait for all to finish
    rxcpp::observable<>::from(work, start).
        merge(rxcpp::observe_on_new_thread()).
        as_blocking().subscribe();
}
A few points.
The streams must have the same type for merge to work. If combining streams of different types, use combine_latest instead.
The order of the observables in observable<>::from() is important. The start stream has ref_count, so it must come last so that the merge will have subscribed to work before the generator starts.
The merge has two threads calling it, which requires a thread-safe coordination. rxcpp is pay-for-use: by default the operators assume that all calls come from the same thread. Any operator that gets calls from multiple threads needs to be given a thread-safe coordination, which the operator uses to impose thread-safe state management and output calls.
If desired, the same coordinator instance could be used for both.

Pipe Future result to self

Is it safe to pipe a Future's result directly to 'self'?
Within an actor:
Future(hardWork()).pipeTo(self)
Or must we assign to a val:
val me = self
Future(hardWork()).pipeTo(me)
Everything in your code is safe, because you are not closing over anything. You are just calling a regular method pipeTo and passing in a regular parameter. Only closing over something (like you did in your own answer) might be dangerous, but in the case of self there is no danger because self is not mutable.
Apparently 'self' is safe, so no need for the val me = self.
// Completely safe, "self" is OK to close over
// and it's an ActorRef, which is thread-safe
Future { expensiveCalculation() } onComplete { f => self ! f.value.get }
http://doc.akka.io/docs/akka/2.2.3/general/jmm.html#Actors_and_shared_mutable_state