Can I force a task/actor to run on a specific node using Ray?

I'm checking whether Ray fits my use case/business.
I know that a group of tasks will connect to the same actor(s), and this will cause a lot of IO between the actor(s) and the tasks.
I want to know whether there is a way to force the actor(s) and the tasks to run on the same node, to optimize the IO.

As of Ray 1.13, you can use ray.util.scheduling_strategies.NodeAffinitySchedulingStrategy(node_id, soft: bool).
Here is a simple example:
import ray
from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

@ray.remote
class Actor:
    pass

# The "DEFAULT" scheduling strategy is used (packed onto nodes until reaching a threshold and then spread).
a1 = Actor.remote()

# Zero-CPU (and no other resources) actors are randomly assigned to nodes.
a2 = Actor.options(num_cpus=0).remote()

# Only run the actor on the local node.
a3 = Actor.options(
    scheduling_strategy=NodeAffinitySchedulingStrategy(
        node_id=ray.get_runtime_context().node_id,
        soft=False,
    )
).remote()

Related

Distribute work stored in table to multiple processes

I have a database table where each row represents a piece of work to be done. This table is filled up with / receives work through a REST API. Apart from the REST service taking in the work, I have another service which uses actors to process this work.
I need suggestions on distributing this work evenly across these workers. The work is not one-time; it is done at an interval until the user deletes it.
Therefore I need a mechanism where:
the work is distributed evenly as it comes in;
if the second service (the work consumer) fails, it can boot up again with all the records in the table and redistribute the work.
Each actor represents one row of the work table.
class WorkActor(workId: String)(implicit system: ActorSystem, materializer: ActorMaterializer) extends Actor {

  // Read the record from the table, or wherever you want to read it from.
  override def preStart(): Unit = {
    logger.info("WorkActor start ===> " + self)
  }

  override def receive: Receive = {
    case _ => {}
  }
}
Create an Akka Cluster Sharding region to dispatch requests from the REST API to the corresponding actor. Call the startShardingRegion function below to obtain an ActorRef. You can then send messages to this sharding ActorRef from the REST API, and the corresponding entity actor will handle them.
final case class CommandEnvelope(id: String, payload: Any)

def startShardingRegion(role: String)(implicit system: ActorSystem) = {
  ClusterSharding(system).start(
    typeName = role,
    entityProps = Props(classOf[WorkActor]),
    settings = ClusterShardingSettings(system),
    extractEntityId = ClusterConfig.extractEntityId,
    extractShardId = ClusterConfig.extractShardId
  )
}
// sharding key
object ClusterConfig {
  private val numberOfShards = 100

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case CommandEnvelope(id, payload) => (id, payload)
  }

  val extractShardId: ShardRegion.ExtractShardId = {
    case CommandEnvelope(id, _)      => (id.hashCode % numberOfShards).toString
    case ShardRegion.StartEntity(id) => (id.hashCode % numberOfShards).toString
  }
}
Read or recover the data in the actor's preStart function. There are many choices: you may read the uncompleted work from an MQ (Kafka), Akka Persistence (RDS, Cassandra), etc.
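For illustration, here is a minimal sketch of that recovery step, assuming a hypothetical WorkRepository abstraction over the work table (back it with Kafka, Slick/RDS, Cassandra, or Akka Persistence as appropriate); the constructor is simplified compared to the WorkActor above:
import akka.actor.Actor

final case class WorkItem(id: String, payload: String)

// Hypothetical data-access trait; back it with Kafka, Slick/RDS, Cassandra, etc.
trait WorkRepository {
  def findUncompleted(workId: String): Option[WorkItem]
}

class RecoveringWorkActor(workId: String, repo: WorkRepository) extends Actor {

  // On (re)start, reload this entity's row and resume its recurring work.
  override def preStart(): Unit =
    repo.findUncompleted(workId).foreach(item => self ! item)

  override def receive: Receive = {
    case WorkItem(_, _) =>
      // schedule / perform the recurring work for this row here
      ()
    case _ => ()
  }
}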
For split-brain resolution (SBR) there are open-source solutions, for example https://github.com/TanUkkii007/akka-cluster-custom-downing. That is a more advanced topic to tackle once your business logic works.
The general outline of a solution is to use Akka Cluster, Cluster Sharding, and Akka Cluster Singleton. When the cluster is considered formed (generally when some minimum number of members have joined the cluster), you start the Cluster Sharding system (sharding work items by the DB's primary key) and then a Cluster Singleton will read the DB table and send work items to Cluster Sharding for distribution among the nodes of the cluster. Akka Streams, and particularly Alpakka's Slick JDBC integration, may prove useful within the singleton. Another cluster singleton to periodically check on jobs may also be useful to recover from cluster node failures (but see below for something to consider there).
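As a rough, hedged sketch of that wiring (classic Akka APIs; WorkDistributor and loadAllWorkIds are illustrative names, while CommandEnvelope and the shard region are the ones defined in the previous answer):
import akka.actor.{Actor, ActorRef, ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings}

// Singleton that reads the work table and feeds every row to Cluster Sharding.
class WorkDistributor(shardRegion: ActorRef) extends Actor {

  // Hypothetical DB call; in practice this could be Alpakka Slick, plain JDBC, etc.
  private def loadAllWorkIds(): Seq[String] = Seq.empty

  override def preStart(): Unit =
    loadAllWorkIds().foreach(id => shardRegion ! CommandEnvelope(id, "start"))

  override def receive: Receive = { case _ => () }
}

object WorkDistributor {
  // Runs exactly one WorkDistributor in the whole cluster.
  def startSingleton(shardRegion: ActorRef)(implicit system: ActorSystem): ActorRef =
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps     = Props(new WorkDistributor(shardRegion)),
        terminationMessage = PoisonPill,
        settings           = ClusterSingletonManagerSettings(system)
      ),
      name = "work-distributor"
    )
}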
Two notes:
If using Cluster Sharding and Cluster Singleton, you probably want to consider what happens in a split-brain situation: this is a distributed system and the probability of a split-brain eventually happening can be presumed to be 100%. In the split-brain scenario, you will very likely have the same jobs being performed simultaneously by different sides of the split, so you need to ask if that's acceptable in your use-case.
If not, then you will need a component which monitors the communications between nodes in your cluster to detect a split-brain and takes steps to resolve the condition: Lightbend's Split Brain Resolver is a good choice if you aren't interested in implementing this yourself.
In a related vein, if the jobs consist of many steps which must be performed, a question to ask is, if a cluster or node fails after completing, say, eight of ten steps, is it acceptable to redo steps 1-8 vs. starting with step 9? If the answer to this is "no", then you'll need to persist the intermediate state of the job. Akka Persistence is a great choice here, though you may want to read up on event sourcing. If using Persistence with Cluster Sharding and Cluster Singleton, it should be noted, you will almost certainly need to handle split-brains (see previous item).
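For example, a minimal, illustrative Akka Persistence sketch that records each completed step so a restarted job can resume at step 9 rather than step 1 might look like this (classic PersistentActor API; the event and actor names are hypothetical):
import akka.persistence.PersistentActor

final case class StepCompleted(step: Int)

// Persists each completed step so a restarted job resumes where it left off.
class PersistentJobActor(jobId: String) extends PersistentActor {
  override def persistenceId: String = s"job-$jobId"

  private var lastCompletedStep = 0

  // Replayed on restart: rebuild the in-memory state from persisted events.
  override def receiveRecover: Receive = {
    case StepCompleted(step) => lastCompletedStep = step
  }

  override def receiveCommand: Receive = {
    case StepCompleted(step) =>
      persist(StepCompleted(step)) { evt =>
        lastCompletedStep = evt.step
        // kick off step lastCompletedStep + 1 here
      }
  }
}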

Is it possible to assign a task to a specific worker in Ray?

Specifically, I'd like my parameter store worker to always be invoked on the head node, and not on any of the workers. That way I can optimize the resource configuration. Currently the parameter store task seems to get started on a random server, even if it is called first, and even if it is followed by a ray.get().
Maybe it's possible to do something like:
ps = ParameterStore.remote(onHead=True)?
You can start the "head" node with an extra custom resource and then you can make the parameter store actor require that custom resource. For example, start the head node with:
ray start --head --resources='{"PSResource": 1}'
Then you can declare the parameter store actor class with
@ray.remote(resources={"PSResource": 1})
class ParameterStore(object):
    pass

ps = ParameterStore.remote()
You can also declare the parameter store actor regularly and change the way you invoke it. E.g.,
@ray.remote
class ParameterStore(object):
    pass

ps = ParameterStore._remote(args=[], resources={"PSResource": 1})
You can read more about resources in Ray at https://ray.readthedocs.io/en/latest/resources.html.

Alternative to cloning tokio channel's sender for futures' closures

I'm working with tokio and hyper to spawn several tasks.
// Defining the task
let task = self.some_future_using
    .map(move |resp| println!("OK: {}", resp))
    .map_err(move |e| println!("Error: {}", e));
// Spawning the task
tokio::spawn(task);
Instead of simply logging the results, I would like to send the result over a bounded tokio channel.
// Defines the channel
let (tx, rx) = mpsc::channel(10);
// Defining the task
let task = self.some_future_using
    .map(|resp| /* Send Ok(resp) to tx */ )
    .map_err(|e| /* Send Err(e) to tx */);
// Spawning the task
tokio::spawn(task);
As both closures may outlive the scope where tx is defined, we need to clone and move tx for both closures:
// Defines the channel
let (tx, rx) = mpsc::channel(10);
let mut tx_ok = tx.clone();
let mut tx_err = tx.clone();
// Defining the task
let task = self.some_future_using
    .map(move |resp| tx_ok.try_send(Ok(resp)).unwrap())
    .map_err(move |e| tx_err.try_send(Err(e)).unwrap());
// Spawning the task
tokio::spawn(task);
If more logic is added using combinators (map, and_then, etc.), every closure would require its own cloned version of tx in order to use it.
Is cloning the only solution? Could we achieve the same without cloning the channel's sender for each closure that uses it?
Could we achieve the same without cloning the channel's sender for each declared closure that uses it?
No. This is how a Sender is shared, and there isn't another safe way to do it.
The channel manages shared resources by wrapping them in Arcs, so they can be shared safely between threads. There is a bit of logic involved in the Sender's clone method, but ultimately it is about cloning those Arcs - which is how Arcs are shared.
Cloning an Arc is cheap, and probably not something you should worry about, unless you are cloning them in a tight loop. Once they are cloned, there is very little overhead to an Arc - each clone is essentially a pointer.

Prioritize some workflow executions over others

I've been using the Flow framework for Amazon SWF and I want to be able to run priority workflow executions and normal workflow executions. If there are priority tasks, then activities should pick up the priority tasks ahead of normal-priority tasks. What is the best way to accomplish this?
I'm thinking that the following might work but I wonder if there's a better/recommended approach.
I'll define two Activity Workers and two activity lists for the activity. One priority list and one normal list. Each worker will be using the same activity class.
Both workers will be run on the same host (ec2 instance).
On the workflow, I'll define two methods: startNormalWorkflow and startHighWorkflow. In the startHighWorkflow method, I can use ActivitySchedulingOptions to put the task on the high priority list.
The problem with this approach is that there is no guarantee that high-priority tasks are scheduled before normal tasks.
It's a good question; it had me scratching my head for a while.
Of course, there is more than one way to skin this cat and there exist a number of valid solutions. I focused here on the simplest one I could conceive of, namely executing tasks in order of priority within a single workflow.
The scenario goes as follows: I define one activity worker serving two task lists, default_tasks and urgent_tasks, with trivial logic:
If there are pending tasks on the urgent_tasks list, pick one from there;
otherwise, pick a task from default_tasks.
Execute the selected task.
The question is how to check whether any high-priority tasks are pending. The CountPendingActivityTasks API comes to the rescue!
I know you use Flow for development. My example is written using boto.swf.layer2 as Python is so much easier for prototyping - but the idea remains the same and can be extended to a more complex scenario with high and low priority workflow executions.
So, to accomplish the above using boto.swf follow these steps:
Export credentials to the environment
$ export AWS_ACCESS_KEY_ID=<your access key>
$ export AWS_SECRET_ACCESS_KEY=<your secret key>
Get the code snippets
For convenience, you can fork it from github:
$ git clone git@github.com:oozie/stackoverflow.git
$ cd stackoverflow/amazon-swf/priority_tasks/
To bootstrap the domain and the workflow:
# domain_setup.py
import boto.swf.layer2 as swf
DOMAIN = 'stackoverflow'
VERSION = '1.0'
swf.Domain(name=DOMAIN).register()
swf.ActivityType(domain=DOMAIN, name='SomeActivity', version=VERSION, task_list='default_tasks').register()
swf.WorkflowType(domain=DOMAIN, name='MyWorkflow', version=VERSION, task_list='default_tasks').register()
Decider implementation:
# decider.py
import boto.swf.layer2 as swf

DOMAIN = 'stackoverflow'
ACTIVITY = 'SomeActivity'
VERSION = '1.0'

class MyWorkflowDecider(swf.Decider):
    domain = DOMAIN
    task_list = 'default_tasks'
    version = VERSION

    def run(self):
        history = self.poll()
        print history
        if 'events' in history:
            # Get a list of non-decision events to see what event came in last.
            workflow_events = [e for e in history['events']
                               if not e['eventType'].startswith('Decision')]
            decisions = swf.Layer1Decisions()
            last_event = workflow_events[-1]
            last_event_type = last_event['eventType']
            if last_event_type == 'WorkflowExecutionStarted':
                # At the start, get the worker to fetch the first assignment.
                decisions.schedule_activity_task(ACTIVITY+'1', ACTIVITY, VERSION, task_list='default_tasks')
                decisions.schedule_activity_task(ACTIVITY+'2', ACTIVITY, VERSION, task_list='urgent_tasks')
                decisions.schedule_activity_task(ACTIVITY+'3', ACTIVITY, VERSION, task_list='default_tasks')
                decisions.schedule_activity_task(ACTIVITY+'4', ACTIVITY, VERSION, task_list='urgent_tasks')
                decisions.schedule_activity_task(ACTIVITY+'5', ACTIVITY, VERSION, task_list='default_tasks')
            elif last_event_type == 'ActivityTaskCompleted':
                # Complete the workflow execution after 5 completed activities.
                closed_activity_count = sum(1 for wf_event in workflow_events
                                            if wf_event.get('eventType') == 'ActivityTaskCompleted')
                if closed_activity_count == 5:
                    decisions.complete_workflow_execution()
            self.complete(decisions=decisions)
        return True
Prioritizing worker implementation:
# worker.py
import boto.swf.layer2 as swf

DOMAIN = 'stackoverflow'
VERSION = '1.0'

class PrioritizingWorker(swf.ActivityWorker):
    domain = DOMAIN
    version = VERSION

    def run(self):
        urgent_task_count = swf.Domain(name=DOMAIN).count_pending_activity_tasks('urgent_tasks').get('count', 0)
        if urgent_task_count > 0:
            self.task_list = 'urgent_tasks'
        else:
            self.task_list = 'default_tasks'
        activity_task = self.poll()
        if 'activityId' in activity_task:
            print urgent_task_count, 'urgent tasks in the queue. Executing ' + activity_task.get('activityId')
            self.complete()
        return True
Run the workflow from three instances of an interactive Python shell
Run the decider:
$ python -i decider.py
>>> while MyWorkflowDecider().run(): pass
...
Start an execution:
$ python -i decider.py
>>> swf.WorkflowType(domain='stackoverflow', name='MyWorkflow', version='1.0', task_list='default_tasks').start()
Finally, kick off the worker and watch the tasks as they're getting executed:
$ python -i worker.py
>>> while PrioritizingWorker().run(): pass
...
2 urgent tasks in the queue. Executing SomeActivity2
1 urgent tasks in the queue. Executing SomeActivity4
0 urgent tasks in the queue. Executing SomeActivity5
0 urgent tasks in the queue. Executing SomeActivity1
0 urgent tasks in the queue. Executing SomeActivity3
It turns out that using a separate task list that you have to check first doesn't work well.
There are a couple of problems.
First, the count API doesn't update reliably. So you may get 0 tasks even when there are urgent tasks in the queue.
Second, the call that polls for tasks blocks if there are no tasks available, so a poll on the non-urgent list will "stick" for up to two minutes, or until a non-urgent task arrives.
So this can cause all kinds of problems in your workflow.
For this to work, SWF would have to implement a polling API that could return the first task from a list of task lists. Then it would be much easier.

Akka supervisor managing supervisors

I reckon there might be a broader question of application design with Akka hidden in here, but I'll ask it anyway: how does one set up a supervision tree where a "kernel" or "top" supervisor supervises children that are themselves supervisors of workers?
You might start with a declarative supervisor at the top level:
val kernel = Supervisor(
  SupervisorConfig(
    OneForOneStrategy(List(classOf[Exception]), 3, 1000),
    Supervise(
      actorOf[Module1],
      Permanent) ::
    Supervise(
      actorOf[Module2],
      Permanent) ::
    Supervise(
      actorOf[Module3],
      Permanent) ::
    Nil
  )
)
where Module1 to Module3 represent the next level of your application architecture. Each module is itself a supervisor for all of its child actors and is implemented like, for example:
class Module1 extends Actor {
  self.faultHandler = OneForOneStrategy(List(classOf[Throwable]), 5, 5000)

  def receive = {
    case Register(actor) =>
      self.link(actor)
  }
}
You might replace the Register(actor) message with something more suitable for your application, and you might want to add a further, programmatically created supervisor layer, but basically that is the approach to follow.
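As a small, hedged sketch in the same (older) Akka API style as the answer, assuming a Register case class and a hypothetical Worker actor, registration could look like this:
import akka.actor.ActorRef
import akka.actor.Actor._

final case class Register(actor: ActorRef)

// Attach a worker to Module1 programmatically so Module1 becomes its supervisor.
val module1 = actorOf[Module1].start()
val worker  = actorOf[Worker].start()  // Worker is a hypothetical child actor
module1 ! Register(worker)             // Module1 links it via self.link(actor)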
You'll find further details (as Viktor already commented) in the official Akka documentation on fault tolerance with Scala or fault tolerance with Java.