Understanding AWS EC2 Cluster Usage - amazon-web-services

I have a question about cluster usage in AWS. If I have 10 instances running, is one of them a master instance? And when I run a threaded application on one instance, can that application use all the instances, the way it would use multiple cores on a single machine?
I have seen the tutorials on the website, but I can't figure out how these clusters work. If I run one application, it counts as one job even if it is threaded, right? So will only one instance be used?
Thank you in advance.

In AWS, you have one master instance and one or more core instances.
The master instance schedules and coordinates the job, while the core instances do the actual work.
There is also the option of task instances, which only process tasks.
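Assuming the question is about Amazon EMR (where the master/core/task terminology comes from), here is a minimal boto3 sketch of launching a cluster with one master and nine core instances. The cluster name, region, instance types, release label, and IAM roles are all illustrative assumptions:

```python
# Minimal sketch: an EMR cluster with 1 master + 9 core instances.
# All names, types, and roles below are placeholders, not recommendations.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

response = emr.run_job_flow(
    Name="example-cluster",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "InstanceGroups": [
            # The master node schedules and coordinates work.
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
             "InstanceCount": 1},
            # Core nodes execute the tasks (and hold HDFS data).
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
             "InstanceCount": 9},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",   # default EMR instance role
    ServiceRole="EMR_DefaultRole",       # default EMR service role
)
print("Cluster id:", response["JobFlowId"])
```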
Cheers,

Related

Cannot run more than two tasks in Amazon Web Services

I have two clusters in my Amazon Elastic Container Service, one for production and one as a testing environment.
Each cluster has three different services with one task each, so there should be 6 tasks running.
To update a task, I always pushed my new Docker image to the Elastic Container Registry and restarted the service with the new image.
For about two weeks now, I have only been able to start 2 tasks in total. It doesn't depend on the cluster; it's just 2 tasks in general.
It looks like the tasks that should start are stuck in the "In Progress" rollout state.
Has anybody had a similar problem, or does anyone know how to fix this?
I wrote to support about this issue and got the following reply:
"After a review, I have noticed that the XXXXXXX region has not yet been activated. In order to activate the region you will have to launch an instance, I recommended a Free Tier EC2 instance. After the EC2 instance has been launched you can terminate it thereafter."
I don't know why, but it's working.
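If it's useful, the workaround from the support reply can be scripted. A hedged boto3 sketch; the region and the AMI ID are placeholders you would substitute:

```python
# Sketch of the support workaround: launch one Free Tier instance in the
# inactive region, then terminate it immediately.
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")  # placeholder region

run = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder: any Amazon Linux AMI
    InstanceType="t2.micro",          # Free Tier eligible
    MinCount=1,
    MaxCount=1,
)
instance_id = run["Instances"][0]["InstanceId"]

# Launching once is apparently enough to activate the region, so the
# instance can be terminated right away.
ec2.terminate_instances(InstanceIds=[instance_id])
```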

Automated setup for multi-server RethinkDB cluster via an ECS service

I'm attempting to set up a RethinkDB cluster with 3 servers total, spread evenly across 3 private subnets, each in a different AZ in a single region.
Ideally, I'd like to deploy the DB software via ECS and provision the EC2 instances with Auto Scaling, but I'm having trouble figuring out how to instruct the RethinkDB instances to join a RethinkDB cluster.
To create/join a cluster in RethinkDB, when you start up a new RethinkDB instance you specify the host:port combination of one of the other machines in the cluster. This is where I'm running into problems. The Auto Scaling service creates new primary ENIs for my EC2 instances with random IPs from my subnet's range, so I can't know the IP of an EC2 instance ahead of time. On top of that, I'm using awsvpc task networking, so ECS creates a new secondary ENI dedicated to each Docker container and attaches it to the instance at deploy time, and those also get new IPs that I don't know ahead of time.
So far I've worked out one possible solution: don't use an Auto Scaling group, and instead manually deploy 3 EC2 instances across the private subnets, which would let me assign my own predetermined private IPs. As I understand it, this still doesn't help if I'm using awsvpc task networking, because each container running on my instances will get its own dedicated secondary ENI and I won't know the IP of that secondary ENI ahead of time. I think I can switch my task networking to bridge mode to get around this; that way I can use the predetermined IP of the EC2 instance (the primary ENI) in the RethinkDB join command.
So, in conclusion, the only way I can figure out to achieve this is to give up either Auto Scaling or awsvpc task networking, both of which would otherwise be very desirable features. Can anyone think of a better way to do this?
As mentioned in the comments, this is more of an issue around the fact that you need to start a single RethinkDB instance once to bootstrap the cluster, and then handle discovery of the existing cluster members when joining new members to it.
I would have thought RethinkDB would publish a good pattern for this in their docs, because it's going to be a pretty common need when setting up clusters, but I couldn't see anything useful there. If someone does know of an official recommendation, you should definitely use that rather than what I'm about to propose, especially as I have no experience with running RethinkDB.
This is just spit-balling and completely untested (at least for now), but the principle is this: start a single, one-off instance of RethinkDB to bootstrap the cluster; have more cluster members join it; then ditch the special-case bootstrap member, the one that never attempted to join a cluster, and leave the remaining members to do the work.
The bootstrap instance is easy enough: you just need a RethinkDB container image and an ECS task that runs it in stand-alone mode, with the ECS service running only one instance of that task. To let the second set of cluster members easily discover cluster members, including this bootstrap instance, it's probably easiest to use a service discovery mechanism such as the one offered by ECS, which uses Route53 records under the covers. The ECS service should register itself in a RethinkDB namespace.
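Completely untested, but a boto3 sketch of that bootstrap service might look like the following; the cluster, task definition, subnets, and the service discovery registry ARN are all placeholder assumptions (the Cloud Map service would have to be created beforehand):

```python
# Sketch: one-task bootstrap service registered with service discovery,
# so its task becomes resolvable under the RethinkDB namespace.
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="rethinkdb-demo",               # placeholder cluster name
    serviceName="rethinkdb-bootstrap",
    taskDefinition="rethinkdb-standalone",  # task runs rethinkdb, no --join
    desiredCount=1,                         # exactly one bootstrap task
    launchType="EC2",
    networkConfiguration={
        "awsvpcConfiguration": {
            # Placeholder private subnets, one per AZ.
            "subnets": ["subnet-aaaa", "subnet-bbbb", "subnet-cccc"],
        }
    },
    serviceRegistries=[{
        # Placeholder ARN of a pre-created Cloud Map (Route53) service.
        "registryArn": "arn:aws:servicediscovery:region:account:service/srv-xxxx",
    }],
)
```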
Then create another ECS service that's basically the same as the first, except that its entrypoint script lists the services in the RethinkDB namespace, resolves them, discards the container's own IP address, and passes a discovered host to --join when starting RethinkDB in the container.
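In the same untested spirit, the entrypoint could be something like this sketch. The DNS name rethinkdb.local is an assumed service discovery name, and 29015 is RethinkDB's default intracluster port:

```python
#!/usr/bin/env python3
# Hypothetical container entrypoint: discover peers via the service
# discovery DNS name, drop our own IP, and join one remaining member.
import os
import socket

SERVICE_DNS = "rethinkdb.local"  # assumed Cloud Map/Route53 name
CLUSTER_PORT = 29015             # RethinkDB default intracluster port

# Every registered task gets an A record, so one lookup returns the IPs
# of all running RethinkDB tasks in the namespace.
_, _, peer_ips = socket.gethostbyname_ex(SERVICE_DNS)

# Discard this container's own address so we never try to join ourselves.
own_ip = socket.gethostbyname(socket.gethostname())
peers = [ip for ip in peer_ips if ip != own_ip]

args = ["rethinkdb", "--bind", "all"]
if peers:
    # Join an existing member (the bootstrap task or any later one).
    args += ["--join", f"{peers[0]}:{CLUSTER_PORT}"]

# Replace this process with RethinkDB so it receives signals directly.
os.execvp("rethinkdb", args)
```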
I'd then set the non-bootstrap ECS service to just 1 task at first, to let it discover the bootstrap instance, and then keep adding tasks to the service one at a time until you're happy with the size of the non-bootstrapped cluster, leaving you with n + 1 members in the cluster including the original bootstrap instance.
After that, I'd remove the bootstrap ECS service entirely.
If a task in the non-bootstrap ECS service dies for whatever reason, it should be able to rejoin automatically without any issue, because it will simply find a running RethinkDB task and join that.
You could probably improve the check for which cluster member to join by verifying that the RethinkDB port is open and responding before using it as the join target. That would handle multiple tasks being started at the same time: with my original suggestion, a new task could find another task that is itself still trying to join the cluster and attempt to join that one first, with all of them potentially deadlocking if none happened to pick an existing cluster member by chance.
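That check could be a small addition to the entrypoint sketch above, for example:

```python
# Hypothetical helper: treat a peer as joinable only if its intracluster
# port is actually accepting connections.
import socket

def is_joinable(ip: str, port: int = 29015, timeout: float = 2.0) -> bool:
    """Return True if the RethinkDB cluster port accepts a TCP connection."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

# In the entrypoint: peers = [ip for ip in peers if is_joinable(ip)]
```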
As mentioned, this answer comes with a big caveat: I don't have any experience running RethinkDB, and I've only played with the service discovery mechanism that was recently released for ECS, so I might be missing something here, but the general principles should hold fine.

AWS ECS running a task that requires many cores

I am conceptually trying to understand how to use AWS ECS to run my "cluster" jobs.
I have some scientific software inside a Docker container, that natively takes advantage of as many cores as the underlying instance has to offer.
My question in this case is: can I use AWS ECS to "increase" the number of "visible" cores to the task running inside my Docker container? For instance, is my "cluster" limited to only a single instance, or is a "cluster" expandable to multiple instances?
I haven't been able to find any answers by looking through the AWS docs.
A cluster is just some EC2 instances that are ECS-enabled (running the special agent software) and grouped together. Tasks that you run on this cluster are spread across those instances. Each task can involve multiple containers. However, each container stays within its instance's boundaries, hardware-wise: it is allocated a number of "CPU units" and shares them with the other containers running on the same instance.
From my understanding, running a process spanning multiple cores in a container does not quite fit the ECS architecture idea; it seems like trying to do part of the ECS scheduler's job.
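To make the "CPU units" point concrete, here is a hedged boto3 sketch of a task definition that allocates 1024 units (roughly one core) to a container; the family name and image are placeholders:

```python
# Sketch: a task definition allocating 1024 CPU units to one container.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")  # assumed region

ecs.register_task_definition(
    family="scientific-job",  # placeholder task family
    containerDefinitions=[
        {
            "name": "worker",
            "image": "example/worker:latest",  # placeholder image
            "cpu": 1024,      # CPU units; 1024 units ~ one core
            "memory": 2048,   # hard memory limit in MiB
            "essential": True,
        }
    ],
)
```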
I found these resources useful when I was reading about it:
My notes on Amazon's ECS post by Jérôme Petazzoni
Application Architecture in ECS docs
Task Definition Parameters in ECS docs
I had a similar situation moving a Python app that used a script to spawn copies of itself based on the number of cores. The answer to this isn't so much an ECS problem as it is a Docker best practice: you should strive to use one process per container (see https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/).
I ended up implementing this with a Dockerfile per process and marking the containers essential in the ECS task, so a process reloads itself if its task dies.
Your cluster is a collection of EC2 instances with the ECS service running. Each instance has a certain number of CPU 'units' (typically 1024 units == 1 core) and a certain amount of RAM. I profiled my app at peak load and tweaked the mix until I got it where I liked it. If your app can use more CPU than that, try giving it 2048 CPU units, or some other amount, and see how it performs. I used Meros (https://meros.io/) to profile my app.
Hope this helps!
"increase" the number of "visible" cores to the task running inside my Docker container
Containers and clusters are different things. You can run a lot of containers on one instance, but you can't run one container across multiple instances.
A cluster is the set of EC2 instances that your Docker containers run on.
is my "cluster" limited to only a single instance?
No, you may choose the number of instances in the cluster.

AWS Elastic Beanstalk - why would I use leader_only for a command?

I am writing a Django app which I plan to deploy to AWS via Elastic Beanstalk. I am trying to understand why I would need to specify 'leader_only' for a container command I want to run for my app. More details about this can be found here.
It says:
Additionally, you can use leader_only. One instance is chosen to be the leader in an Auto Scaling group. If the leader_only value is set to true, the command runs only on the instance that is marked as the leader.
If I have several instances running my app because I want to scale it, wouldn't using 'leader_only' run the command on only one instance and leave the rest unaffected? I am probably misunderstanding its purpose, but that seems non-ideal, because the environment on the leader may differ from the other instances, and the end user may get different results depending on which instance they happen to connect to.
From a technical point of view, an Elastic Beanstalk environment is an Auto Scaling group, and when you deploy something you need to assume that your commands can potentially be executed simultaneously on several EC2 instances.
The main goal of the leader_only option is to make sure that your commands are executed on only one EC2 instance. It is useful for tasks such as database migration scripts or database creation that should be executed just once, on one instance. So leader_only is just a marker that some commands will be executed on this instance only.
However, keep in mind that the leader attribute is set once, when your environment is created. If the leader dies and is replaced by a new instance, you can end up with no leader at all in the Auto Scaling group.
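On newer Amazon Linux 2 platforms there is also an EB_IS_COMMAND_LEADER environment variable (see the next answer for caveats about relying on leader detection). As a hedged sketch, a one-off task such as a Django migration might be guarded like this:

```python
# Sketch: run a migration only on the instance marked as command leader.
# Assumes an Amazon Linux 2 Elastic Beanstalk platform, where this
# variable is set during deployments.
import os
import subprocess

if os.environ.get("EB_IS_COMMAND_LEADER", "false") == "true":
    # Executes on at most one instance per deployment.
    subprocess.run(["python", "manage.py", "migrate"], check=True)
else:
    print("Not the command leader; skipping migration.")
```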
I've done considerable testing of this recently, covering both leader_only and EB_IS_COMMAND_LEADER, on both Apache 1 and Apache 2 setups.
The two named values above come up in many discussions, guides, and documents, but the situation is basically this:
You cannot reliably detect a leader in a multi-instance environment, except during deployment and scale-up.
That means you cannot test either of the values above to confirm that a command will run on exactly one instance (not zero, not 2+) as part of a cron job or scheduled task.
Recent improvements and changes to the way leader status is managed may well mean that a leader is always available during deployments and scale-up, but at other times, including after instance replacement, there may be no leader instance to be found.
There are two main options if you really need to run a scheduled task exactly once while managing multiple instances:
Use a worker environment specifically for scheduled tasks, or another external service such as Lambda with EventBridge (CloudWatch Events).
Set up crons to run across all instances in the deployment configs, and include a small amount of code before the cron body runs that connects to the AWS API, gets the list of current instances, and checks the ID of the first returned against its own ID to decide whether it should run; the sketch below illustrates this.
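A hedged sketch of that second option; the Auto Scaling group name is a placeholder, and the metadata call assumes IMDSv1 is available (IMDSv2 needs a session token first):

```python
# Sketch: poor-man's leader election before running a cron job body.
import boto3
import urllib.request

# This instance's ID, from the EC2 instance metadata service (IMDSv1).
own_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

autoscaling = boto3.client("autoscaling")
group = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["my-eb-asg"]  # placeholder group name
)["AutoScalingGroups"][0]

# Sort so every instance agrees on which ID counts as "first".
in_service = sorted(
    i["InstanceId"]
    for i in group["Instances"]
    if i["LifecycleState"] == "InService"
)

if in_service and in_service[0] == own_id:
    print("This instance runs the scheduled task.")
    # ... actual cron job body goes here ...
else:
    print("Another instance will run it; exiting.")
```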

Is it possible to auto scale with amazon web services, with ever changing AMI's?

Curious if this is possible:
We have a web application that, at MOST times, works just fine on our single small instance. However, when we get multiple customers running intense queries simultaneously (we are a cloud scheduling service), our instance bogs way down to near 80% CPU load and becomes pretty unresponsive.
Is there a way to have AWS fire up another small instance (or a few), quickly, only for the times it's operating under this intense load? But the real question is: how does this work when we make very frequent programming updates to our application? Do we have to manually create a new image every time we upload a code change?
Thanks
You should never run anything important on a single EC2 instance. Instances can, and do, go offline randomly. Always use an Auto Scaling (AS) group that spans multiple Availability Zones. An AS group will automatically bring new instances online when you hit a certain trigger (in your case, CPU utilization), and then scale the instances back down when traffic subsides. Autoscaling is the heart and soul of AWS, and if you're not using it, you might as well be using a cheaper (and more durable) VPS host.
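As one hedged example of such a trigger, a target-tracking policy can keep the group's average CPU near a chosen value; the group and policy names below are placeholders:

```python
# Sketch: scale the ASG out when average CPU rises above the target,
# and back in when it falls.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",      # placeholder ASG name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,  # aim to keep average CPU near 60%
    },
)
```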
No, you don't want to create a new AMI for each code release. Ideally you should use a base AMI (like one of Amazon's official ones) and have it auto-provision at boot. You can use the "user data" field when you launch an instance to bootstrap this process. It can be as simple as a bash script that pulls from your Git repo, or as sophisticated as Puppet or Chef.
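For instance, a hedged sketch of launching a stock AMI with a bash bootstrap script in the user data field; the AMI ID and repo URL are placeholders:

```python
# Sketch: pass a first-boot bootstrap script via the "user data" field.
import boto3

USER_DATA = """#!/bin/bash
# Runs once at first boot: fetch the latest application code and start it.
yum install -y git
git clone https://github.com/example/app.git /opt/app  # placeholder repo
/opt/app/start.sh
"""

ec2 = boto3.client("ec2")
ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder: a stock Amazon Linux AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    UserData=USER_DATA,  # boto3 base64-encodes this for you
)
```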
The only time I create custom AMIs is when the provisioning process just takes too long. However, that can almost always be solved by storing the needed files in S3.