Related
My application is futures-based with async/await, and has the following structure within one of its components:
a "manager", which is responsible for starting/stopping/restarting "workers", based both on external input and on the current state of "workers";
a dynamic set of "workers", which perform some continuous work, but may fail or be stopped externally.
A worker is just a spawned task which does some I/O work. Internally it is a loop which is intended to be infinite, but it may exit early due to errors or other reasons, and in this case the worker must be restarted from scratch by the manager.
The manager is implemented as a loop which awaits on several channels, including one returned by async_std::stream::interval, which essentially makes the manager into a poller - and indeed, I need this because I do need to poll some Mutex-protected external state. Based on this state, the manager, among everything else, creates or destroys its workers.
Additionally, the manager stores a set of async_std::task::JoinHandles representing live workers, and it uses these handles to check whether any workers has exited, restarting them if so. (BTW, I do this currently using select(handle, future::ready()), which is totally suboptimal because it relies on the select implementation detail, specifically that it polls the left future first. I couldn't find a better way of doing it; something like race() would make more sense, but race() consumes both futures, which won't work for me because I don't want to lose the JoinHandle if it is not ready. This is a matter for another question, though.)
You can see that in this design workers can only be restarted when the next poll "tick" in the manager occurs. However, I don't want to use a too small interval for polling, because in most cases polling just wastes CPU cycles. Large intervals, however, can delay restarting a failed/canceled worker by too much, leading to undesired latencies. Therefore, I though I'd set up another channel of ()s back from each worker to the manager, which I'd add to the main manager loop, so when a worker stops due to an error or otherwise, it will first send a message to its channel, resulting in the manager being woken up earlier than the next poll in order to restart the worker right away.
Unfortunately, with any kinds of channels this might result in more polls than needed, in case two or more workers stop at approximately the same time (which due to the nature of my application, is somewhat likely to happen). In such case it would make sense to only run the manager loop once, handling all of the stopped workers, but with channels it will necessarily result in the number of polls equal to the number of stopped workers, even if additional polls don't do anything.
Therefore, my question is: how do I notify the manager from its workers that they are finished, without resulting in extra polls in the manager? I've tried the following things:
As explained above, regular unbounded channels just won't work.
I thought that maybe bounded channels could work - if I used a channel with capacity 0, and there was a way to try and send a message into it but just drop the message if the channel is full (like the offer() method on Java's BlockingQueue), this seemingly would solve the problem. Unfortunately, the channels API, while providing such a method (try_send() seems to be like it), also has this property of having capacity larger than or equal to the number of senders, which means it can't really be used for such notifications.
Some kind of atomic or a mutex-protected boolean flag also look as if it could work, but there is no atomic or mutex API which would provide a future to wait on, and would also require polling.
Restructure the manager implementation to include JoinHandles into the main select somehow. It might do the trick, but it would result in large refactoring which I'm unwilling to make at this point. If there is a way to do what I want without this refactoring, I'd like to use that first.
I guess some kind of combination of atomics and channels might work, something like setting an atomic flag and sending a message, and then skipping any extra notifications in the manager based on the flag (which is flipped back to off after processing one notification), but this also seems like a complex approach, and I wonder if anything simpler is possible.
I recommend using the FuturesUnordered type from the futures crate. This collection allows you to push many futures of the same type into a collection and wait for any one of them to complete at once.
It implements Stream, so if you import StreamExt, you can use unordered.next() to obtain a future that completes once any future in the collection completes.
If you also need to wait for a timeout or mutex etc., you can use select to create a future that completes once either the timeout or one of the join handles completes. The future returned by next() implements Unpin, so it is usable with select without problems.
I have written some actor classes and I find that I have to get a handle into the lifecycle of these entities. For example whenever my actor is initialized I would like a method to be called so that I can setup some listeners on message queues (or open db connections etc).
Is there an equivalent of this? The equivalent I can think of is Spring's InitialisingBean and DisposableBean
This is a typical scenario where you would override methods like preStart(), postStop(), etc. I don't see anything wrong with this.
Of course you have to be aware of the details - for example postStop() is called asynchronously after actor.stop() is invoked while preStart() is called when an Actor is started. This means that potentially slow/blocking things like DB interaction should be kept to a minimum.
You can also use the Actor's constructor for initialization of data.
As Matthew mentioned, supervision plays a big part in Akka - so you can instruct the supervisor to perform specific stuff on events. For example the so-called DeathWatch - you can be notified if one of the actors "you are watching upon" dies:
context.watch(child)
...
def receive = {
case Terminated(`child`) => lastSender ! "finished"
}
An Actor is basically two methods -- a constructor, and onMessage(Object): void.
There's nothing in its lifecycle that naturally provides for "wiring" behavior, which leaves you with a few options.
Use a Supervisor actor to create your other actors. A Supervisor is responsible for watching, starting and restarting Actors on failure -- and therefore it is often valuable to have a Supervisor that understands the state of integrated systems to avoid continously restarting. This Supervisor would create and manage Service objects (possibly via Spring) and pass them to Actors.
Use your preferred Initialization technique at the time of Actor construction. It's tricky but you can certainly combine Spring with Actors. Just be aware that should a Supervisor restart your actor, you'll need to be able to resurrect its desired state from whatever content you placed in the Props object you used to start it in the first place.
Wire everything on-demand. Open connections on demand when an Actor starts (and cache them as necessary). I find I do this fairly often -- and I let the Actor fail when its connections no longer work. The supervisor will restart the Actor, which will recreate all connections.
Something important things to remember:
The intent of Actor model is that Actors don't run continuously -- they only run when there are messages provided to them. If you add a message listener to an Actor, you are essentially adding new threads that can access that actor. This can be a problem if you use supervision -- a restarted actor may leak that thread and this may in turn cause the actor not to be garbage collected. It can also be a problem because it introduces a race condition, and part of the value of actors is avoiding that.
An Actor that does I/O is, from the perspective of the actor system, blocking. If you have too many Actors doing I/O at the same time, you will exhaust your Dispatcher's thread pool and lock up the system.
A given Actor instance can operate on many different threads over its lifetime, but will only operate on one thread at a time. This can be confusing to some messaging systems -- for example, JMS' Spec asserts that a Session not be used on multiple threads, and many JMS interpret this as "can only run on the thread on which it was started." You may see warnings, or even exceptions, resulting from this.
For these reasons, I prefer to use non-actor code to do some of my I/O. For example, I'll have an incoming message listener object whose responsibility is to take JMS messages off a queue, use them to create POJO messages, and send tells to the Actor system. Alternately, I'll use an Actor, but place that actor on a custom Dispatcher that has thread pinning enabled. This assures that that Actor will only run on a specific thread and won't block up the system that other non-I/O actors are using.
If say the code that my actor uses (a code I have no control over) throws an unhandled exception, could that result into the whole actor system process to crash or each actor is running in some kind of special container?
To clarify more, in my use case, I want each actor to load (at run time) some user written code/lib and call some interface methods on them. These libs maybe buggy and can potentially result in my actor system os process to die or halt or something like that. I mean what if the code that actor calls does something that halt (like accessing a remote resource by a buggy client or a dead loop) or even call Enviroment.exit() or something of bad nature.
I mean if my requirement is to allow each actor to load code that I do not have control over, how can I guard my actor system against them? Do I even have to do this?
One way that I can think the whole actor system OS process guard itself against these third party code is to run each actor inside some kind of a container or event have one actor system per actor on the local machine that my actor controls? Do I have to go this far or akka already takes care of this for me and any failure at actor level would not jeopardize the whole actor system and its process??
If the JVM process dies, the JVM process dies. You get around that by using Akka Cluster so you can observe and react to node failures.
I am trying to understand how Ember RunLoop works and what makes it tick. I have looked at the documentation, but still have many questions about it. I am interested in understanding better how RunLoop works so I can choose appropriate method within its name space, when I have to defer execution of some code for later time.
When does Ember RunLoop start. Is it dependant on Router or Views or Controllers or something else?
how long does it approximately take (I know this is rather silly to asks and dependant on many things but I am looking for a general idea, or maybe if there is a minimum or maximum time a runloop may take)
Is RunLoop being executed at all times, or is it just indicating a period of time from beginning to end of execution and may not run for some time.
If a view is created from within one RunLoop, is it guaranteed that all its content will make it into the DOM by the time the loop ends?
Forgive me if these are very basic questions, I think understanding these will help noobs like me use Ember better.
Update 10/9/2013: Check out this interactive visualization of the run loop: https://machty.s3.amazonaws.com/ember-run-loop-visual/index.html
Update 5/9/2013: all the basic concepts below are still up to date, but as of this commit, the Ember Run Loop implementation has been split off into a separate library called backburner.js, with some very minor API differences.
First off, read these:
http://blog.sproutcore.com/the-run-loop-part-1/
http://blog.sproutcore.com/the-run-loop-part-2/
They're not 100% accurate to Ember, but the core concepts and motivation behind the RunLoop still generally apply to Ember; only some implementation details differ. But, on to your questions:
When does Ember RunLoop start. Is it dependant on Router or Views or Controllers or something else?
All of the basic user events (e.g. keyboard events, mouse events, etc) will fire up the run loop. This guarantees that whatever changes made to bound properties by the captured (mouse/keyboard/timer/etc) event are fully propagated throughout Ember's data-binding system before returning control back to the system. So, moving your mouse, pressing a key, clicking a button, etc., all launch the run loop.
how long does it approximately take (I know this is rather silly to asks and dependant on many things but I am looking for a general idea, or maybe if there is a minimum or maximum time a runloop may take)
At no point will the RunLoop ever keep track of how much time it's taking to propagate all the changes through the system and then halt the RunLoop after reaching a maximum time limit; rather, the RunLoop will always run to completion, and won't stop until all the expired timers have been called, bindings propagated, and perhaps their bindings propagated, and so on. Obviously, the more changes that need to be propagated from a single event, the longer the RunLoop will take to finish. Here's a (pretty unfair) example of how the RunLoop can get bogged down with propagating changes compared to another framework (Backbone) that doesn't have a run loop: http://jsfiddle.net/jashkenas/CGSd5/ . Moral of the story: the RunLoop's really fast for most things you'd ever want to do in Ember, and it's where much of Ember's power lies, but if you find yourself wanting to animate 30 circles with Javascript at 60 frames per second, there might be better ways to go about it than relying on Ember's RunLoop.
Is RunLoop being executed at all times, or is it just indicating a period of time from beginning to end of execution and may not run for some time.
It is not executed at all times -- it has to return control back to the system at some point or else your app would hang -- it's different from, say, a run loop on a server that has a while(true) and goes on for infinity until the server gets a signal to shut down... the Ember RunLoop has no such while(true) but is only spun up in response to user/timer events.
If a view is created from within one RunLoop, is it guaranteed that all its content will make it into the DOM by the time the loop ends?
Let's see if we can figure that out. One of the big changes from SC to Ember RunLoop is that, instead of looping back and forth between invokeOnce and invokeLast (which you see in the diagram in the first link about SproutCore's RL), Ember provides you a list of 'queues' that, in the course of a run loop, you can schedule actions (functions to be called during the run loop) to by specifying which queue the action belongs in (example from the source: Ember.run.scheduleOnce('render', bindView, 'rerender');).
If you look at run_loop.js in the source code, you see Ember.run.queues = ['sync', 'actions', 'destroy', 'timers'];, yet if you open your JavaScript debugger in the browser in an Ember app and evaluate Ember.run.queues, you get a fuller list of queues: ["sync", "actions", "render", "afterRender", "destroy", "timers"]. Ember keeps their codebase pretty modular, and they make it possible for your code, as well as its own code in a separate part of the library, to insert more queues. In this case, the Ember Views library inserts render and afterRender queues, specifically after the actions queue. I'll get to why that might be in a second. First, the RunLoop algorithm:
The RunLoop algorithm is pretty much the same as described in the SC run loop articles above:
You run your code between RunLoop .begin() and .end(), only in Ember you'll want to instead run your code within Ember.run, which will internally call begin and end for you. (Only internal run loop code in the Ember code base still uses begin and end, so you should just stick with Ember.run)
After end() is called, the RunLoop then kicks into gear to propagate every single change made by the chunk of code passed to the Ember.run function. This includes propagating the values of bound properties, rendering view changes to the DOM, etc etc. The order in which these actions (binding, rendering DOM elements, etc) are performed is determined by the Ember.run.queues array described above:
The run loop will start off on the first queue, which is sync. It'll run all of the actions that were scheduled into the sync queue by the Ember.run code. These actions may themselves also schedule more actions to be performed during this same RunLoop, and it's up to the RunLoop to make sure it performs every action until all the queues are flushed. The way it does this is, at the end of every queue, the RunLoop will look through all the previously flushed queues and see if any new actions have been scheduled. If so, it has to start at the beginning of the earliest queue with unperformed scheduled actions and flush out the queue, continuing to trace its steps and start over when necessary until all of the queues are completely empty.
That's the essence of the algorithm. That's how bound data gets propagated through the app. You can expect that once a RunLoop runs to completion, all of the bound data will be fully propagated. So, what about DOM elements?
The order of the queues, including the ones added in by the Ember Views library, is important here. Notice that render and afterRender come after sync, and action. The sync queue contains all the actions for propagating bound data. (action, after that, is only sparsely used in the Ember source). Based on the above algorithm, it is guaranteed that by the time the RunLoop gets to the render queue, all of the data-bindings will have finished synchronizing. This is by design: you wouldn't want to perform the expensive task of rendering DOM elements before sync'ing the data-bindings, since that would likely require re-rendering DOM elements with updated data -- obviously a very inefficient and error-prone way of emptying all of the RunLoop queues. So Ember intelligently blasts through all the data-binding work it can before rendering the DOM elements in the render queue.
So, finally, to answer your question, yes, you can expect that any necessary DOM renderings will have taken place by the time Ember.run finishes. Here's a jsFiddle to demonstrate: http://jsfiddle.net/machty/6p6XJ/328/
Other things to know about the RunLoop
Observers vs. Bindings
It's important to note that Observers and Bindings, while having the similar functionality of responding to changes in a "watched" property, behave totally differently in the context of a RunLoop. Binding propagation, as we've seen, gets scheduled into the sync queue to eventually be executed by the RunLoop. Observers, on the other hand, fire immediately when the watched property changes without having to be first scheduled into a RunLoop queue. If an Observer and a binding all "watch" the same property, the observer will always be called 100% of the time earlier than the binding will be updated.
scheduleOnce and Ember.run.once
One of the big efficiency boosts in Ember's auto-updating templates is based on the fact that, thanks to the RunLoop, multiple identical RunLoop actions can be coalesced ("debounced", if you will) into a single action. If you look into the run_loop.js internals, you'll see the functions that facilitate this behavior are the related functions scheduleOnce and Em.run.once. The difference between them isn't so important as knowing they exist, and how they can discard duplicate actions in queue to prevent a lot of bloated, wasteful calculation during the run loop.
What about timers?
Even though 'timers' is one of the default queues listed above, Ember only makes reference to the queue in their RunLoop test cases. It seems that such a queue would have been used in the SproutCore days based on some of the descriptions from the above articles about timers being the last thing to fire. In Ember, the timers queue isn't used. Instead, the RunLoop can be spun up by an internally managed setTimeout event (see the invokeLaterTimers function), which is intelligent enough to loop through all the existing timers, fire all the ones that have expired, determine the earliest future timer, and set an internal setTimeout for that event only, which will spin up the RunLoop again when it fires. This approach is more efficient than having each timer call setTimeout and wake itself up, since in this case, only one setTimeout call needs to be made, and the RunLoop is smart enough to fire all the different timers that might be going off at the same time.
Further debouncing with the sync queue
Here's a snippet from the run loop, in the middle of a loop through all the queues in the run loop. Note the special case for the sync queue: because sync is a particularly volatile queue, in which data is being propagated in every direction, Ember.beginPropertyChanges() is called to prevent any observers from being fired, followed by a call to Ember.endPropertyChanges. This is wise: if in the course of flushing the sync queue, it's entirely possible that a property on an object will change multiple times before resting on its final value, and you wouldn't want to waste resources by immediately firing observers per every single change.
if (queueName === 'sync')
{
log = Ember.LOG_BINDINGS;
if (log)
{
Ember.Logger.log('Begin: Flush Sync Queue');
}
Ember.beginPropertyChanges();
Ember.tryFinally(tryable, Ember.endPropertyChanges);
if (log)
{
Ember.Logger.log('End: Flush Sync Queue');
}
}
else
{
forEach.call(queue, iter);
}
Well my problem is the following. I have a piece of code that runs on several virtual machines, and each virtual machine has N interfaces(a thread per each). The problem itself is receiving a message on one interface and redirect it through another interface in the fastest possible manner.
What I'm doing is, when I receive a message on one interface(Unicast), calculate which interface I want to redirect it through, save all the information about the message(Datagram, and all the extra info I want) with a function I made. Then on the next iteration, the program checks if there are new messages to redirect and if it is the correct interface reading it. And so on... But this makes the program exchange information very slowly...
Is there any mechanism that can speed things up?
Somebody has already invented this particular wheel - it's called MPI
Take a look at either openMPI or MPICH
Why don't you use queuing? As the messages come in, put them on a queue and notify each processing module to pick them up from the queue.
For example:
MSG comes in
Module 1 puts it on queue
Module 2,3 get notified
Module 2 picks it up from queue and saved it in the database
In parallel, Module 3 picks it up from queue and processes it
The key is "in parallel". Since these modules are different threads, while Module 2 is saving to the db, Module 3 can massage your message.
You could use JMS or MQ or make your own queue.
It sounds like you're trying to do parallel computing across multiple "machines" (even if virtual). You may want to look at existing protocols, such as MPI - Message Passing Interface to handle this domain, as they have quite a few features that help in this type of scenario