Suppose I have a task definition with 2 nginx containers. When I create a service (EC2 launch type) from this task definition and try to add an ALB to it, I am only able to choose one "container to load balance". How can I set it up so that I can load balance the other container too, using the same ALB?
What I have tried and understood is: suppose I am running 2 tasks using the above definition under a single service, so there will be 4 containers running (2 each of container_1 and container_2). When I choose container_1 to load balance during the service's ALB creation, ECS will create a target group and register the instances with the dynamic ports of container_1 as targets. This target group will then be mapped to the ALB's rules. This works for me.
But now, if I want a rule set up for my other set of containers, the only way I see is to create the target group myself by hardcoding the dynamic ports of container_2's instances, which is less than ideal.
A use case that I can think of right now: suppose one container is running the frontend and another is running the backend, so I want URLs like /my-app/pages/* to go to the frontend containers and /my-app/apis/* to go to the backend containers. I'm sure there are better examples/use cases, but this is all I can think of right now.
So, how would I go about setting this up?
Thanks!
Let's follow the documentation and recommendations from AWS on why more than one container is allowed in a task. From the documentation [1]:
When the following conditions are required, we recommend that you deploy your containers in a single task definition:
Your containers share a common lifecycle (that is, they are launched and terminated together).
Your containers must run on the same underlying host (that is, one container references the other on a localhost port).
You require that your containers share resources.
Your containers share data volumes.
Otherwise, you should define your containers in separate task definitions so that you can scale, provision, and deprovision them separately.
[1] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/application_architecture.html
For your use case of frontend and backend applications, two different services is the right way: 2 ECS services, each with 1 task definition containing 1 container.
How the two should integrate with each other, whether via URLs or service names (microservice mesh), is another topic and discussion.
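As a rough sketch of the resulting setup (the cluster, service, and target group names and the $...ARN placeholders below are assumptions, not values from your account), each service registers its single container with its own target group, and the ALB listener routes by path:
$ aws elbv2 create-rule --listener-arn $LISTENER_ARN --priority 10 \
    --conditions Field=path-pattern,Values='/my-app/pages/*' \
    --actions Type=forward,TargetGroupArn=$FRONTEND_TG_ARN
$ aws elbv2 create-rule --listener-arn $LISTENER_ARN --priority 20 \
    --conditions Field=path-pattern,Values='/my-app/apis/*' \
    --actions Type=forward,TargetGroupArn=$BACKEND_TG_ARN
$ aws ecs create-service --cluster my-cluster --service-name frontend \
    --task-definition frontend-taskdef --desired-count 2 \
    --load-balancers targetGroupArn=$FRONTEND_TG_ARN,containerName=frontend,containerPort=80
$ aws ecs create-service --cluster my-cluster --service-name backend \
    --task-definition backend-taskdef --desired-count 2 \
    --load-balancers targetGroupArn=$BACKEND_TG_ARN,containerName=backend,containerPort=80
With dynamic host ports, ECS keeps each target group's registered ports up to date for you, which avoids the hardcoding problem from the question.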
So I want to launch a web application, and run it on containers in AWS.
I want to give users access to the tool through a log in page.
I don't understand how AWS manages the relationship of containers and the instances backing them.
My main questions are:
Will multiple containers run on a single EC2 instance?
If the compute power required by a container exceeds the processing power of a single instance, and I have auto-scaling enabled, will it launch multiple instances to support a single container, or will I need to go in and upgrade my EC2 instance type?
Finally, when users log in to the app, will AWS deploy a new container for each user, and subsequently a new instance to run it on, or can one container support multiple users?
Also a link to a page where I can find this information would be tremendously helpful.
I will try to answer your questions, but as @Ermiya Eskandary said, the documentation will answer all the questions about containers in AWS.
Yes. If you have, for example, an EC2 instance with 2 GB of memory and 1 vCPU, and your container needs 500 MB of memory and 0.25 vCPU, you can run many containers inside that EC2 instance. You can set a task placement strategy to tell AWS how to place containers onto EC2 instances (see the sketch at the end of this answer): https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement.html
No. If your container's requirements exceed a single EC2 instance, it is impossible to share the resources of multiple EC2 instances to hold one single container. If you are using ECS on EC2, the EC2 instance always needs to be bigger than the container.
No. One container will serve multiple clients. If you are running out of resources, auto scaling will increase the number of tasks running, placed according to the strategy from the first point.
To finish, based on my experience: if your use case doesn't require working at the machine level, with a custom AMI or anything else at the infrastructure level (Linux/Windows), I would use Fargate. Fargate has less operational overhead, since with ECS on EC2 you need to orchestrate auto scaling both for the EC2 instances and for your tasks.
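For reference, a minimal sketch of setting a placement strategy when creating a service (the cluster, service, and task definition names here are hypothetical):
$ aws ecs create-service --cluster my-cluster --service-name my-svc \
    --task-definition my-taskdef --desired-count 4 \
    --placement-strategy type=spread,field=attribute:ecs.availability-zone type=binpack,field=memory
This spreads tasks across Availability Zones first, then bin-packs by memory within each zone.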
I am having trouble understanding the point of AWS implementation of service discovery in ECS when using bridge mode, and in general a path forward to (relatively basic) container networking, despite the numerous AWS blog posts on the subject.
Service discovery seems to me to be about making dynamically generated containers (in tasks) accessible, so that, similar to Docker user-defined networks, I can access different tasks on a cluster with a predefined canonical hostname.namespace, within a VPC.
I've made sure in the VPC that:
DNS hostnames: Enabled
DNS resolution: Enabled
When service discovery is defined while using bridge mode:
1. It still tacks on a dynamic portion to the name I did not specify:
{
    "Name": "my-service.my-namespace.",
    "Type": "SRV",
    "SetIdentifier": "4b46cb82ba434dasdb163c1f06ca5c083",
    "MultiValueAnswer": true,
    "TTL": 60,
    "ResourceRecords": [
        {
            "Value": "1 1 27017 4b46cb82ba434dasdb163c1f06ca5c083.my-service.my-namespace."
        }
    ],
    "HealthCheckId": "862bd287-2b41-43ac-8442-a3d27042482b"
},
So I need to manually look up the record each time a service is created or updated. I cannot dig my-service.my-namespace, for example; that record does not exist.
And:
2. Every time the service is updated, the record is regenerated...
To get here I need to run:
$ aws servicediscovery list-namespaces
$ aws route53 list-resource-record-sets --hosted-zone-id $ZONE_ID --region us-east-1
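(and, for the dynamic SRV lookup mentioned below, presumably something like the following from inside the VPC, querying the SRV type explicitly rather than dig's default A type; the name is taken from the record above:)
$ dig +short SRV my-service.my-namespace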
My application currently accesses task hosts via injected environment variables, but if the record refreshes on every service update, this is a non-starter. All documentation/forums I've come across seem to say either to create some kind of dynamic SRV lookup workaround (seems hackish?) or to just switch to awsvpc mode, but then why is service discovery available at all under bridge/host?
Clearly I'm missing something fundamental.
In addition, I'm using dynamic port mapping. If I don't, things like rolling updates fail with "port already in use" errors. Similarly, attempting to run a new instance of a task via scheduling produces the same error.
I can connect from within a given Docker container in a task using the internal private DNS of the instance, i.e. ip-172-31-52-141.ec2.internal, but here I'm outside of the VPC (?), i.e. I now need to specify the dynamically mapped port. So this is a non-starter as well.
All of this sits behind a public ALB (for dynamic port resolution etc.), and this has been working fine; requests from outside AWS resolve correctly to the target groups / targeted services.
If I switch to awsvpc mode, and enable service discovery, I can have multiple tasks/services communicate privately.
However, what if I want to have multiple services communicate, but additionally a single service/task might house multiple Docker containers (e.g. a localized Redis cache)? I cannot specify the 'link' for these containers without the network mode being 'bridge' again.
Here's the TL;DR question:
I have 2 tasks and a service associated with each task. There may be multiple instances of each task, therefore ports need to be dynamic. In each task I have 2 containers.
What is the general approach here for allowing different services to communicate via a predefined host.namespace dns resolution, and have the containers inside each task communicate with each other?
Apologies for the long post, but as a novice to ECS/AWS, I'm really struggling here ;)
Any feedback or advice is really appreciated.
I have a container that is part of an ECS task definition, which I have marked as essential=false, because if this container goes down, I do not want the ECS agent to take down the other containers in the task. Making the container "non-essential" has achieved the desired result in my case: that container crashes, and the other containers on the task do not get taken down or restarted.
However, I do want this non-essential container to be independently restarted. Is there any built-in way to accomplish this? Basically, if the container exits, run docker start or docker restart on that container (which we are currently having to do manually). I have not had any luck so far with the documentation or from exploring the AWS console.
Docker provides a restart policy that would be useful in your case (--restart always); however, based on this thread, ECS does not support restarting existing containers.
The suggested and accepted workaround was:
ECS supports this use-case through the concept of a "service". Services work to continuously make the reality (known state) match the desired state, including the desired number of running tasks you specify. If a task started by a service stops, the service will create a new task to replace it. Services help you manage the number of copies you want running, deployments, binding to and unbinding from load balancers, respond to load balancer health checks, and integrate with auto scaling so your service can scale in or out automatically.
You can check out the documentation for more detail.
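Applying that here, one hedged sketch (all names below are hypothetical) is to move the non-essential container into its own task definition and run it as a single-task service, so ECS replaces it independently whenever it exits (note this only works if the container doesn't rely on links or localhost access to the other containers in the original task):
$ aws ecs register-task-definition --family my-sidecar \
    --container-definitions '[{"name": "sidecar", "image": "my-sidecar:latest", "memory": 256, "essential": true}]'
$ aws ecs create-service --cluster my-cluster --service-name my-sidecar \
    --task-definition my-sidecar --desired-count 1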
I'm attempting to set up a RethinkDB cluster with 3 servers total, spread evenly across 3 private subnets, each in a different AZ in a single region.
Ideally, I'd like to deploy the DB software via ECS and provision the EC2 instances with auto scaling, but I'm having trouble trying to figure out how to instruct the RethinkDB instances to join a RethinkDB cluster.
To create/join a cluster in RethinkDB, when you start up a new instance of RethinkDB, you specify the host:port combination of one of the other machines in the cluster. This is where I'm running into problems. The Auto Scaling service is creating new primary ENIs for my EC2 instances and using random IPs in my subnet's range, so I can't know the IP of an EC2 instance ahead of time. On top of that, I'm using awsvpc task networking, so ECS is creating new secondary ENIs dedicated to each Docker container and attaching them to the instances when it deploys them, and those are also getting new IPs, which I don't know ahead of time.
So far I've worked out one possible solution, which is to not use an Auto Scaling group, but instead to manually deploy 3 EC2 instances across the private subnets, which would let me assign my own, predetermined, private IPs. As I understand it, this still doesn't help me if I'm using awsvpc task networking, though, because each container running on my instances will get its own dedicated secondary ENI and I won't know the IP of that secondary ENI ahead of time. I think I can switch my task networking to bridge mode to get around this. That way I can use the predetermined IP of the EC2 instance (the primary ENI) in the RethinkDB join command.
So, in conclusion, the only way I can figure out to achieve this is to not use Auto Scaling or awsvpc task networking, both of which would otherwise be very desirable features. Can anyone think of a better way to do this?
As mentioned in the comments, this is more of an issue around the fact that you need to start a single RethinkDB instance one time to bootstrap the cluster, and then handle discovery of the existing cluster members when joining new members to the cluster.
I would have thought RethinkDB would have published a good pattern for this, because it's going to be pretty common when setting up clusters, but I couldn't see anything useful in their docs. If someone does know of an official recommendation, then you should definitely use that rather than what I'm about to propose, especially as I have no experience with running RethinkDB.
This is more just spit-balling here and completely untested (at least for now), but the principle is going to be: start a single, one-off instance of RethinkDB to bootstrap the cluster, then have more cluster members join, and then ditch the special-case bootstrap member that didn't attempt to join a cluster, leaving the remaining cluster members to work.
The bootstrap instance is easy enough to consider. You just need a RethinkDB container image and an ECS task that runs it in stand-alone mode, with the ECS service only running one instance of the task. To enable the second set of cluster members to easily discover existing cluster members, including this bootstrapping instance, it's probably easiest to use a service discovery mechanism such as the one offered by ECS, which uses Route 53 records under the covers. The ECS service should register the service in the RethinkDB namespace.
Then you should create another ECS service that's basically the same as the first, but whose entrypoint script lists the services in the RethinkDB namespace, resolves them, discards the container's own IP address, and then uses a discovered host to join via --join when starting RethinkDB in the container.
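A rough, untested sketch of such an entrypoint script might look like this (the namespace name and the intracluster port 29015 are assumptions):
#!/bin/sh
# Discover an existing cluster member via the service discovery SRV record
# and join it; if none is found, start stand-alone (the bootstrap case).
SELF_IP=$(hostname -i)
for target in $(dig +short SRV rethinkdb.my-namespace | awk '{print $4}'); do
    ip=$(dig +short "$target")
    if [ -n "$ip" ] && [ "$ip" != "$SELF_IP" ]; then
        exec rethinkdb --bind all --join "$ip:29015"
    fi
done
exec rethinkdb --bind all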
I'd then set the non-bootstrap ECS service to just 1 task at first, to allow it to discover the bootstrap instance, and then you should be able to keep adding tasks to the service one at a time until you're happy with the size of the non-bootstrapped cluster, leaving you with n + 1 instances in the cluster including the original bootstrap instance.
After that I'd remove the bootstrap ECS service entirely.
If an ECS task in the non-bootstrap ECS service dies for whatever reason, its replacement should be able to auto-rejoin without any issue, as it will just find a running RethinkDB task and join that.
You could probably expand the checks for which cluster member to join by verifying that the RethinkDB port is open and responding before using that member to join, so that multiple tasks being started at the same time are handled (with my original suggestion, a task could potentially find another task that is also looking to join the cluster and try to join that first, with them all potentially deadlocking if none of them happened to pick an existing cluster member by chance).
As mentioned, this answer comes with the big caveat that I haven't got any experience running RethinkDB, and I've only played with the service discovery mechanism that was recently released for ECS, so I might be missing something here, but the general principles should hold fine.
It appears that one can either run a Task or a Service based on a Task Definition. What are the differences and similarities between Task and Service? Is there a clue in the fact that one can specify "Task Group" when creating Task but not Service? Are Task and Service hierarchically equal instantiations of Task Definition, or is Service composed of Tasks?
A Task Definition is a collection of 1 or more container configurations. Some Tasks may need only one container, while other Tasks may need 2 or more potentially linked containers running concurrently. The Task definition allows you to specify which Docker image to use, which ports to expose, how much CPU and memory to allot, how to collect logs, and define environment variables.
A Task is created when you run a Task directly; this launches the container(s) defined in the task definition, which run until they are stopped or exit on their own, at which point they are not replaced automatically. Running Tasks directly is ideal for short-running jobs, for example the kinds of things that used to be accomplished via cron.
A Service is used to guarantee that you always have some number of Tasks running at all times. If a Task's container exits due to an error, or the underlying EC2 instance fails and is replaced, the ECS Service will replace the failed Task. This is why we create Clusters so that the Service has plenty of resources in terms of CPU, Memory and Network ports to use. To us it doesn't really matter which instance Tasks run on so long as they run. A Service configuration references a Task definition. A Service is responsible for creating Tasks.
Services are typically used for long-running applications like web servers. For example, if I deployed my website powered by Node.js in Oregon (us-west-2), I would want, say, at least three Tasks running across the three Availability Zones (AZs) for the sake of high availability; if one fails, I have another two, and the failed one will be replaced (read that as self-healing!). Creating a Service is the way to do this. If I had 6 EC2 instances in my cluster, 2 per AZ, the Service would automatically balance Tasks across zones as best it can while also considering CPU, memory, and network resources.
UPDATE:
I'm not sure it helps to think of these things hierarchically.
Another very important point is that a Service can be configured to use a load balancer, so that as it creates the Tasks (that is, as it launches the containers defined in the Task Definition), the Service will automatically register each container's EC2 instance with the load balancer. Tasks cannot be configured to use a load balancer; only Services can.
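As a small illustration of the difference (the cluster, task definition, and service names below are hypothetical):
# Run a one-off Task directly; if it stops, it is not replaced:
$ aws ecs run-task --cluster my-cluster --task-definition my-taskdef --count 1
# Create a Service that keeps 3 Tasks running and replaces any that fail:
$ aws ecs create-service --cluster my-cluster --service-name my-web \
    --task-definition my-taskdef --desired-count 3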
Beautifully explained in words by @talentedmrjones. The picture below will help you visualize it easily :)
Task Definition:
This is the blueprint describing which Docker containers to run, and it represents your application. It can include one or more container definitions.
Service:
A Service runs and maintains instances of a Task Definition. It also defines the minimum and maximum number of Tasks from one Task Definition to run at any given time, as well as auto scaling and load balancing.
ECS Container Instance:
This is an EC2 instance that has Docker and the ECS Container Agent running on it. The Agent takes care of the communication between ECS and the instance, providing the status of running containers and managing the launch of new ones.
Relationship:
Task Definition (it is a configuration):
A task definition is a blueprint for your application and describes one or more containers through attributes. Some attributes are configured at the task level, but the majority of attributes are configured per container.
You define your containers and how to launch them via task definitions. You describe how containers should be provisioned (links to container images saved in ECR, CPU units, memory, container ports to expose, network type).
Task definitions specify the container information for your application (web), such as how many containers are part of your task, what resources they will use, how they interact with each other, and which host ports they will use. A task definition can target the Fargate or EC2 launch type.
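For illustration, a minimal task definition sketch (the family, image, and port values are hypothetical; "hostPort": 0 requests dynamic port mapping in bridge mode on the EC2 launch type):
{
    "family": "web",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",
            "cpu": 256,
            "memory": 512,
            "essential": true,
            "portMappings": [
                { "containerPort": 80, "hostPort": 0 }
            ]
        }
    ]
}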