Is it possible to reliably assign one worker to one host in the cluster? - ray

I want each worker to be allocated to a different host in the cluster. For example, I have a cluster with 3 hosts, with IP addresses 192.168.0.100, 192.168.0.101, and 192.168.0.102 respectively. I want to create 3 workers and assign each worker's task to a different host. Is this possible?

Yeah - you can do so with custom resources. The main idea would be for each node to have a specific custom resource, and then your application actors can be specified to require one of those custom resources. See: https://ray.readthedocs.io/en/latest/resources.html#specifying-a-node-s-resource-requirements
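A minimal sketch of that idea (the resource names node_100/node_101/node_102 are made up for this example, and the exact flags and APIs can differ slightly between Ray versions; .options() needs a reasonably recent Ray, older versions set resources in the @ray.remote decorator instead):

```python
# Start Ray on each host with a node-specific custom resource (placeholders):
#   192.168.0.100:  ray start --head --resources='{"node_100": 1}'
#   192.168.0.101:  ray start --address=192.168.0.100:6379 --resources='{"node_101": 1}'
#   192.168.0.102:  ray start --address=192.168.0.100:6379 --resources='{"node_102": 1}'

import ray

ray.init(address="auto")  # connect the driver to the running cluster

@ray.remote
class Worker:
    def where_am_i(self):
        # Report which host this actor landed on.
        import socket
        return socket.gethostname()

# Requiring one unit of a node-specific resource pins each actor to that host.
workers = [
    Worker.options(resources={"node_100": 1}).remote(),
    Worker.options(resources={"node_101": 1}).remote(),
    Worker.options(resources={"node_102": 1}).remote(),
]
print(ray.get([w.where_am_i.remote() for w in workers]))
```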

Related

AWS ECS: Load Balancing multiple containers under a single service

Suppose I have a task definition with 2 nginx containers. Now when I make a service (EC2 launch type) using this task definition and try to add an ALB to it, I am only able to choose one "container to load balance". How can I set it up so I am able to load balance the other container too using the same ALB?
What I have tried and understood is: suppose I am running 2 tasks using the above definition under a single service, so there will be 4 containers running (2 each of container_1 and container_2). When I choose container_1 to load balance during service ALB creation, ECS will create a target group and put the instance and the dynamic ports of container_1 as targets. This target group will then be mapped to the ALB's rules. This works for me.
But now if I want to have a rule set up for my other set of containers, the only way I see is to make the target group myself by hardcoding the dynamic ports of container_2's containers, which is less than ideal.
A use case that I can think of right now is: suppose one container is running the frontend and another is running the backend. So I want URLs like /my-app/pages/* to go to frontend containers and /my-app/apis/* to go to backend containers. I'm sure there are better examples/use cases, but this is all I can think of right now.
So, how would I go about setting this up ?
Thanks!
Let's follow the documentation and recommendations from AWS on why more than one container is allowed in a task. From the documentation [1]:
When the following conditions are required, we recommend that you deploy your containers in a single task definition:
- Your containers share a common lifecycle (that is, they are launched and terminated together).
- Your containers must run on the same underlying host (that is, one container references the other on a localhost port).
- You require that your containers share resources.
- Your containers share data volumes.
Otherwise, you should define your containers in separate task definitions so that you can scale, provision, and deprovision them separately.
[1] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/application_architecture.html
For your use case of frontend and backend applications, two different services are the right way: 2 ECS services, each with 1 task definition and 1 container.
How the two should integrate with each other, whether via URLs or service names (a microservice mesh), is another topic and discussion.
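As a rough sketch of what that setup could look like with boto3 (the original answer doesn't prescribe any tool; every name and ARN below is a placeholder, and the same thing can be done in the console, CloudFormation, or Terraform):

```python
import boto3

elbv2 = boto3.client("elbv2")
ecs = boto3.client("ecs")

# One target group per service; the shared ALB routes to each by path.
frontend_tg = elbv2.create_target_group(
    Name="frontend-tg", Protocol="HTTP", Port=80,
    VpcId="vpc-0123456789abcdef0", TargetType="instance",
)["TargetGroups"][0]["TargetGroupArn"]

backend_tg = elbv2.create_target_group(
    Name="backend-tg", Protocol="HTTP", Port=80,
    VpcId="vpc-0123456789abcdef0", TargetType="instance",
)["TargetGroups"][0]["TargetGroupArn"]

listener_arn = "PLACEHOLDER_ALB_LISTENER_ARN"

# Path-based rules on the shared ALB listener.
elbv2.create_rule(
    ListenerArn=listener_arn, Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/my-app/pages/*"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": frontend_tg}],
)
elbv2.create_rule(
    ListenerArn=listener_arn, Priority=20,
    Conditions=[{"Field": "path-pattern", "Values": ["/my-app/apis/*"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": backend_tg}],
)

# Two ECS services, each from its own single-container task definition;
# ECS registers each container's dynamic port in its own target group.
for name, taskdef, container, tg in [
    ("frontend-svc", "frontend-taskdef", "frontend", frontend_tg),
    ("backend-svc", "backend-taskdef", "backend", backend_tg),
]:
    ecs.create_service(
        cluster="my-cluster", serviceName=name, taskDefinition=taskdef,
        desiredCount=2,
        loadBalancers=[{"targetGroupArn": tg, "containerName": container,
                        "containerPort": 80}],
    )
```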

EKS: How to reduce data transfer between AZs?

I am using EKS and I have 3 nodes, each one in a separate availability zone. I have 2 Kubernetes deployments of 2 different apps (let's name them app A and app B) with 3 replicas each, and those 2 applications have to communicate with each other. They communicate through a ClusterIP Kubernetes service. So, each node has 2 pods: 1 pod from app A and 1 pod from app B.
What I would like to achieve is for the communication between app A and app B to never leave the node, as the pods are already on the same node. Is there any way to achieve this?
I have seen on CloudWatch that communication leaves the nodes.
Run both services as containers within the same pod, and have them communicate with one another locally.
I don't believe there's a way to use affinities or taints to achieve this reliably.
edit: Inaccurate, see docs linked in the comment on the OP's question. Left up for completeness.

Automated setup for multi-server RethinkDB cluster via an ECS service

I'm attempting to set up a RethinkDB cluster with 3 servers total, spread evenly across 3 private subnets, each in a different AZ in a single region.
Ideally, I'd like to deploy the DB software via ECS and provision the EC2 instances with auto scaling, but I'm having trouble trying to figure out how to instruct the RethinkDB instances to join a RethinkDB cluster.
To create/join a cluster in RethinkDB, when you start up a new instance of RethinkDB you specify the host:port combination of one of the other machines in the cluster. This is where I'm running into problems. The Auto Scaling service is creating new primary ENIs for my EC2 instances and using a random IP in my subnet's range, so I can't know the IP of the EC2 instance ahead of time. On top of that, I'm using awsvpc task networking, so ECS is creating new secondary ENIs dedicated to each Docker container and attaching them to the instances when it deploys them, and those are also getting new IPs, which I don't know ahead of time.
So far I've worked out one possible solution, which is to not use an auto scaling group, but instead to manually deploy 3 EC2 instances across the private subnets, which would let me assign my own, predetermined, private IPs. As I understand it, this still doesn't help me if I'm using awsvpc task networking, though, because each container running on my instances will get its own dedicated secondary ENI and I won't know the IP of that secondary ENI ahead of time. I think I can switch my task networking to bridge mode to get around this. That way I can use the predetermined IP of the EC2 instances (the primary ENI) in the RethinkDB join command.
So, in conclusion, the only way I can figure out to achieve this is to not use Auto Scaling or awsvpc task networking, both of which would otherwise be very desirable features. Can anyone think of a better way to do this?
As mentioned in the comments, this is more of an issue around the fact that you need to start a single RethinkDB instance once to bootstrap the cluster and then handle discovery of the existing cluster members when joining new members to the cluster.
I would have thought RethinkDB would have published a good pattern for this in their docs, because it's going to be pretty common when setting up clusters, but I couldn't see anything useful there. If someone does know of an official recommendation then you should definitely use that rather than what I'm about to propose, especially as I have no experience with running RethinkDB.
This is more just spit-balling and will be completely untested (at least for now), but the principle is: start a single, one-off instance of RethinkDB to bootstrap the cluster, have more cluster members join it, and then ditch the special-case bootstrap member that didn't attempt to join a cluster, leaving the remaining cluster members running.
The bootstrap instance is easy enough to consider. You just need a RethinkDB container image and an ECS task that runs it in stand-alone mode, with the ECS service only running one instance of the task. To let the second set of cluster members easily discover existing cluster members, including this bootstrapping instance, it's probably easiest to use a service discovery mechanism such as the one offered by ECS, which uses Route 53 records under the covers. The ECS service should register the service in the RethinkDB namespace.
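A hedged boto3 sketch of registering that bootstrap service with ECS service discovery (AWS Cloud Map); the namespace, cluster, task definition, VPC, and subnet names are all placeholders:

```python
import boto3

sd = boto3.client("servicediscovery")
ecs = boto3.client("ecs")

# 1) A private DNS namespace for the cluster (resolvable inside the VPC).
#    This call is asynchronous; the namespace ID becomes available once the
#    returned operation completes.
sd.create_private_dns_namespace(
    Name="rethinkdb.local",          # placeholder namespace
    Vpc="vpc-0123456789abcdef0",     # placeholder VPC
)

# 2) A discovery service whose A records point at the task ENIs (awsvpc mode).
#    "ns-..." is the namespace ID from step 1.
registry_arn = sd.create_service(
    Name="rethinkdb",
    NamespaceId="ns-xxxxxxxxxxxxxxxx",
    DnsConfig={"DnsRecords": [{"Type": "A", "TTL": 10}]},
)["Service"]["Arn"]

# 3) The bootstrap ECS service: one stand-alone RethinkDB task, registered
#    in the namespace so later members can find it by DNS.
ecs.create_service(
    cluster="my-cluster",
    serviceName="rethinkdb-bootstrap",
    taskDefinition="rethinkdb-standalone",  # placeholder task definition
    desiredCount=1,
    serviceRegistries=[{"registryArn": registry_arn}],
    networkConfiguration={"awsvpcConfiguration": {
        "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnets
    }},
)
```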
Then you should create another ECS service that's basically the same as the first, but whose entrypoint script lists the services in the RethinkDB namespace, resolves them, discards the container's own IP address, and then uses a discovered host with --join when starting RethinkDB in the container.
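A rough, untested sketch (very much in the spirit of the answer) of what that entrypoint logic could look like in Python; the discovery name and port are placeholders carried over from the bootstrap sketch above:

```python
# entrypoint.py - discover existing cluster members via service discovery DNS
# and exec RethinkDB, joining them if any were found.
import os
import socket

DISCOVERY_NAME = "rethinkdb.rethinkdb.local"  # placeholder service discovery name
CLUSTER_PORT = 29015                           # RethinkDB's default intracluster port

def my_ip():
    # The container's own address, so we don't try to join ourselves.
    return socket.gethostbyname(socket.gethostname())

def discovered_peers():
    try:
        infos = socket.getaddrinfo(DISCOVERY_NAME, CLUSTER_PORT, socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos} - {my_ip()})

cmd = ["rethinkdb", "--bind", "all"]
for peer in discovered_peers():
    cmd += ["--join", f"{peer}:{CLUSTER_PORT}"]

# Replace this process with RethinkDB (joining the cluster if peers were found,
# otherwise starting stand-alone).
os.execvp(cmd[0], cmd)
```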
I'd then set the non-bootstrap ECS service to just 1 task at first so it can discover the bootstrap instance, and then you should be able to keep adding tasks to the service one at a time until you're happy with the size of the non-bootstrap cluster, leaving you with n + 1 instances in the cluster, including the original bootstrap instance.
After that I'd remove the bootstrap ECS service entirely.
If a task in the non-bootstrap ECS service dies for whatever reason, its replacement should be able to rejoin automatically without any issue, as it will just find a running RethinkDB task and join that.
You could probably expand the check for which cluster member to join by verifying that the RethinkDB cluster port is open and responding before using that host as the join target, so the logic handles multiple tasks starting at the same time (with my original suggestion, a new task could find another task that is itself still trying to join the cluster and attempt to join that first, with all of them potentially deadlocking if none happened to pick an existing cluster member by chance).
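That extra check could be as simple as keeping only the peers that accept a TCP connection on the cluster port, reusing discovered_peers from the sketch above:

```python
import socket

def is_listening(host, port=29015, timeout=2.0):
    # Only treat a peer as joinable if RethinkDB's cluster port answers.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

joinable = [p for p in discovered_peers() if is_listening(p)]
```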
As mentioned, this answer comes with a big caveat: I don't have any experience running RethinkDB, and I've only played with the service discovery mechanism that was recently released for ECS, so I might be missing something here, but the general principles should hold fine.

AWS AutoScaling with Static IPs

Is it possible to do AutoScaling with static IPs in AWS? The newly created instances should either have a pre-defined IP or pick from a pool of pre-defined IPs.
We are trying to set up ZooKeeper in production with 5 ZooKeeper instances. Each one should have a static IP that is hard-coded in the Kafka AMI/databag that we use. It should also support Auto Scaling, so that if one of the ZooKeeper nodes goes down, a new one is spawned with the same IP or with one from a pool of IPs. For this we have decided to go with 1 ZooKeeper instance per Auto Scaling group, but the problem is with the IP.
If this is the wrong way, please suggest the right way. Thanks in advance!
One method would be to maintain a user data script on each instance, and have each instance assign itself an Elastic IP from a set of EIPs reserved for this purpose. This user data script would be referenced in the ASG's launch configuration and would run on launch.
Say the user data script is called "/scripts/assignEIP.sh"; using the AWS CLI, you would have it consult the pool to see which EIPs are available and which are not (already in use). Then it would assign itself one of the available EIPs.
For ease of IP management, you could keep the pool of IPs in a simple text properties file on S3, and have the instance download and consult that list when the instance starts.
Keep in mind that each instance will need to be assigned an IAM instance profile that allows it to consult and associate EIPs with itself.
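The answer describes a shell script using the AWS CLI; here is the same idea sketched in Python with boto3 instead (the bucket, key, and pool file format are placeholders of my own, and error handling/retries are omitted):

```python
import boto3
import urllib.request

# Which instance am I? (EC2 instance metadata service; IMDSv1 shown here,
# IMDSv2 additionally requires a session token.)
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

# Pool of pre-allocated EIP allocation IDs, kept as a text file on S3
# (one allocation ID per line). Bucket and key are placeholders.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-config-bucket", Key="eip-pool.txt")
pool = obj["Body"].read().decode().split()

ec2 = boto3.client("ec2")
addresses = ec2.describe_addresses(AllocationIds=pool)["Addresses"]

# Pick the first EIP in the pool that isn't already associated with an instance,
# and attach it to this instance.
free = next(a for a in addresses if "AssociationId" not in a)
ec2.associate_address(AllocationId=free["AllocationId"], InstanceId=instance_id)
```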

AWS EC2 Autoscaling: Defining a master instance, which is never terminated

I am using EC2 with autoscaling and load balancing to host my webapp. To guarantee consistency between the EC2 instances, I only want to allow access to the administration interface from one instance, so all write operations are executed on this instance. The other instances then periodically download copies of the changed files.
So here's my question:
Can I have a designated "Master" instance in my autoscaling group which is slightly different (runs a script for uploading files that were written to)? Of course, this instance should never be shut down, no matter what. All the other "Slave" instances are identical and can be created and terminated on demand. Is there some sort of configuration option for this, or can I do this with a policy?
My suggestion would be one of two things: either have two autoscaling groups - one for the read-only instances (i.e. the non-masters), and a second ASG for the master instance(s). Even if there is only one master instance at any time, you can still benefit from including it in its own autoscaling group by taking advantage of the ASG's ability to detect when it has failed and spin up a single new instance to replace it.
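For the first option, the master's group is just an ordinary ASG pinned to a size of one. A minimal boto3 sketch (group, launch configuration, and subnet names are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# A one-instance ASG for the master: if the instance fails its health check,
# the ASG terminates it and launches a replacement automatically.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="webapp-master",
    LaunchConfigurationName="webapp-master-lc",  # placeholder launch configuration
    MinSize=1,
    MaxSize=1,
    DesiredCapacity=1,
    VPCZoneIdentifier="subnet-0123456789abcdef0",
)
```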
Alternatively, leave the master instance out of the auto-scaling altogether, and just run it as a reserved instance - let the rest of the RO instances scale up and down as necessary.