Persistence in AWS Fargate Containers

I have 2 containers in a Fargate task definition. One of the containers is a database server, and I want to persist its data directory. However, Fargate doesn't support the Source Path field when setting up a volume in the task definition. Does anyone know how to set up persistence in Fargate?

AWS Fargate is, at this moment, targeted at stateless container solutions only, but we never know; maybe AWS is already working on a solution for it.
Remember that you are sharing the same host with other AWS customers. Your task could be terminated and restarted on another host at any time, and you can also scale out your service at any time.
You can use any of the options below:
use RDS for general-purpose databases
if your database engine is not available on RDS, start a new EC2 instance and install the database there
continue to use Fargate for the other, stateless services

AWS Fargate supports EFS volumes, at last!
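A minimal sketch of what that looks like, wiring an EFS file system into a Fargate task definition via the AWS CLI; the file system ID, family, image, and mount path below are placeholder assumptions, not values from the question, and EFS on Fargate needs platform version 1.4.0 or later:
#!/bin/bash
# Register a Fargate task definition with an EFS-backed volume.
# fs-12345678 and every name below are hypothetical placeholders.
aws ecs register-task-definition --cli-input-json '{
  "family": "db-task",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "volumes": [{
    "name": "db-data",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678",
      "transitEncryption": "ENABLED"
    }
  }],
  "containerDefinitions": [{
    "name": "db",
    "image": "postgres:15",
    "essential": true,
    "mountPoints": [{
      "sourceVolume": "db-data",
      "containerPath": "/var/lib/postgresql/data"
    }]
  }]
}'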

I can think of 3 ways to do this:
use a storage solution designed for container workloads (Longhorn or Portworx are good calls)
use RDS
use a distributed database that keeps multiple copies of its data (but you will have to handle the case where all the copies are shut down)

[Fargate] [Volumes]: Allow at least EFS mounts to Fargate Containers.
This is something you can track:
https://github.com/aws/containers-roadmap/issues/53
Until then you can:
Generate a dump of the database periodically within the container.
Upload it to S3 with the help of the AWS CLI/SDK.
Use the dump to recover whenever required.
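A rough sketch of that dump-and-upload routine, assuming a PostgreSQL database for illustration (the database, user, and bucket names are made up); run it from cron or a sidecar container:
#!/bin/bash
# Dump the database, compress it, and ship it to S3.
# mydb, postgres, and my-db-backups are hypothetical placeholders.
set -euo pipefail
STAMP=$(date +%Y%m%d-%H%M%S)
pg_dump -U postgres -d mydb | gzip > "/tmp/mydb-${STAMP}.sql.gz"
aws s3 cp "/tmp/mydb-${STAMP}.sql.gz" "s3://my-db-backups/mydb-${STAMP}.sql.gz"
rm "/tmp/mydb-${STAMP}.sql.gz"
# Restore later with:
#   aws s3 cp "s3://my-db-backups/mydb-<stamp>.sql.gz" - | gunzip | psql -U postgres -d mydb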

Related

How to set up a ReactJS, NodeJS, Redis application on AWS

I am a newbie in AWS and totally confused about the deployment. Here I have React for the front end, Node.js for the API, MongoDB for the database, and Redis for the session store.
Can I use 1 EC2 instance for every service? Or
divide every service onto a different EC2 instance?
Can I use an Elastic Beanstalk environment?
Which is the better option for scaling and updating without downtime in the future?
Can I use 1 EC2 for every service?
It depends on your case, but the best approach to utilizing the underlying EC2 instance is to run multiple services on a single EC2 instance, i.e. the Node.js API and the front-end app together, since a container-based Node.js application takes maximum advantage of this setup. In this case, ECS blue/green deployment with dynamic container ports can help you scale with zero downtime, as sketched below.
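A minimal sketch of the dynamic-port idea: with the EC2 launch type and the default bridge network mode, a hostPort of 0 lets ECS assign a free host port, so several copies of the same container can run on one instance behind an ALB (the family and image names are placeholders):
#!/bin/bash
# Register a task definition whose container gets a dynamically mapped host port.
# webapp-task and mynode/app are hypothetical placeholders.
aws ecs register-task-definition \
  --family webapp-task \
  --container-definitions '[{
    "name": "webapp",
    "image": "mynode/app:latest",
    "memory": 512,
    "essential": true,
    "portMappings": [{ "containerPort": 3000, "hostPort": 0 }]
  }]'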
Divide every service as different EC2
For a Node.js-based application this approach does not help you a lot, whereas for Redis and Mongo it makes sense if you are planning for clustering and replicas. These applications also need persistent storage, so storage will be kept on each instance. My suggestion is therefore to run Redis and MongoDB in daemon mode and the application in replica mode, as it is the application that will go through blue/green deployments, not Redis or the DB.
AWS provides two service scheduling strategies to deal with such cases:
REPLICA —
The replica scheduling strategy places and maintains the desired number of tasks across your cluster. By default, the service scheduler spreads tasks across Availability Zones. You can use task placement strategies and constraints to customize task placement decisions. For more information, see Replica.
DAEMON —
The daemon scheduling strategy deploys exactly one task on each active container instance that meets all of the task placement constraints that you specify in your cluster. When using this strategy, there is no need to specify a desired number of tasks, a task placement strategy, or use Service Auto Scaling policies. For more information, see ecs_services.
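To make the distinction concrete, a sketch of creating one service of each type with the AWS CLI; the cluster, service, and task definition names are placeholders, and note that a DAEMON service takes no desired count:
#!/bin/bash
# Application tier: REPLICA strategy, the scheduler maintains the desired count.
# my-cluster, webapp-task, and mongo-task are hypothetical placeholders.
aws ecs create-service \
  --cluster my-cluster \
  --service-name webapp \
  --task-definition webapp-task \
  --scheduling-strategy REPLICA \
  --desired-count 2
# Data tier: DAEMON strategy, exactly one task per container instance.
aws ecs create-service \
  --cluster my-cluster \
  --service-name mongo \
  --task-definition mongo-task \
  --scheduling-strategy DAEMON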

Deploying MEAN app on AWS ECS

I have successfully deployed a MEAN app on AWS ECS, but there are a couple things I don't have set-up properly.
1) If I spin up a new task, the Mongo data does not persist between the containers
2) Should my Mongo container and my frontend container be in the same task definition? This seems wrong because I feel like they should be able to scale independently of each other. But if they should be in separate task definitions, do I link them the same way?
Current Architecture:
1 Task Definition
contains frontend container and mongo container which are linked
I did not define any mounts or volumes (which I assume is why data isn't persisting, but I am struggling to figure out how to properly set this up)
1 Cluster
1 service
contains load balancer and auto-scaling group (when this auto-scaling group creates a new task, I run into the issue of not having data persistence)
I guess what you assume is correct: since you are not defining any mounts, the data is not persistent. I recommend using Amazon EFS to persist data from Amazon ECS containers. You can find a step-by-step guide below to achieve the same:
Using Amazon EFS to Persist Data from Amazon ECS Containers
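In outline, that guide mounts the file system on every container instance and then points a task-definition host volume at the mount; a sketch with placeholder IDs (fs-12345678, us-east-1) looks like this:
#!/bin/bash
# On each ECS container instance: mount the shared EFS file system.
# fs-12345678 and the region are hypothetical placeholders.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
# Then declare a host volume in the task definition:
#   "volumes": [{ "name": "mongo-data", "host": { "sourcePath": "/mnt/efs/mongo" } }]
# and mount it into the Mongo container:
#   "mountPoints": [{ "sourceVolume": "mongo-data", "containerPath": "/data/db" }]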

Dealing with AWS Elastic Beanstalk Multi-container databases and persistent storage

I'm new to Elastic Beanstalk, EC2, and Docker, and I've spent the last couple of weeks researching and playing around with them. I have a few questions that I'm finding difficult to find answers to elsewhere.
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
2) Is it better to use AWS RDS in production and then have an external database container locally?
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
I don't know if this is stated anywhere, but I am fairly sure AWS does not intend for you to use EB's multi-container support to run databases or anything else that should run only once in your system. As their examples show, it is there to give you better control over what the front-end server will be.
If you want to run databases or store files, you will either move to AWS ECS, where you can better control this, or use multiple EB environments (e.g. create a worker-tier, single-instance environment for running the database).
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
I have not used eb local run; instead I use docker-compose, which allows me to run a proper environment locally, including my databases. Yes, you may need to duplicate some information between the docker-compose file and the Dockerrun file, but once you set it up, you will see how powerful it is. Because you are still sharing the Dockerfiles, you can still assume things will run in a similar enough way once deployed.
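A bare-bones local setup in that spirit, written as a script for convenience; the images, ports, and service names are illustrative assumptions, not taken from the question:
#!/bin/bash
# Write a minimal docker-compose file and bring the stack up locally.
# Images, ports, and service names are hypothetical placeholders.
cat > docker-compose.yml <<'EOF'
services:
  web:
    build: .
    ports: ["3000:3000"]
    depends_on: [mongo]
  mongo:
    image: mongo:6
    volumes: ["mongo-data:/data/db"]
volumes:
  mongo-data:
EOF
docker-compose up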
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
Yes, I think that is correct. EB assumes you will use RDS or DynamoDB or something else that is already centralized and managed.
2) Is it better to use AWS RDS in production and then have an external database container locally?
Yes, and by the way, rather than having EB manage the creation of the database, I find it a better practice for you to manually instantiate it so that it stays persistent after you kill your EB environments.
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
Yes, using S3 is the way to go, for multiple reasons, but mostly because AWS manages it and it scales without you having to worry about it. In fact, you want your client to get or even post the files directly on S3, so your server does not have to do any work (note the server may need to sign the URL, but that is about it).
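For the download side, signing is a one-liner with the CLI (the bucket and key are placeholders; signed uploads need a pre-signed PUT or POST generated with one of the SDKs):
#!/bin/bash
# Generate a time-limited URL the client can use to GET the object directly.
# my-user-files and the key are hypothetical placeholders.
aws s3 presign "s3://my-user-files/uploads/avatar.png" --expires-in 3600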
If you really have an issue with S3 (for whatever reason), then you can also (as with the database) create a second, single-instance EB environment with EBS to ensure you have a single instance. But compared to the S3 solution it won't scale very far, and it will in fact be much more expensive than using S3.

How to understand Amazon ECS cluster

I recently tried to deploy Docker containers using AWS task definitions. Along the way, I came across the following questions.
How do I add an instance to a cluster? When creating a new cluster using the Amazon ECS console, how do I add a new EC2 instance to it? In other words, when launching a new EC2 instance, what configuration is needed in order to allocate it to a user-created cluster under Amazon ECS?
How many ECS instances are needed in a cluster, and what are the factors?
If I have two instances (ins1, ins2) in a cluster, and my webapp and db containers are running on ins1: after I update the running service (via http://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html), I can see the newly created service running on ins2 before the old service on ins1 is drained. My question is that after my webapp container is allocated to another instance, the access IP address becomes that other instance's IP. How do I prevent this, or what is the solution for keeping the same IP address for accessing the webapp? And not only the IP: what about the data after changing to a new instance?
These are really three fairly different questions, so it might be best to split them into separate questions here accordingly. I'll try to provide an answer regardless:
Amazon ECS Container Instances are added indirectly; it's the job of the Amazon ECS Container Agent on each instance to register itself with the cluster created and named by you (see concepts and lifecycle for details). For this to work, you need to follow the steps outlined in Launching an Amazon ECS Container Instance, be it manually or via automation. Be aware of step 10:
By default, your container instance launches into your default cluster. If you want to launch into your own cluster instead of the default, choose the Advanced Details list and paste the following script into the User data field, replacing your_cluster_name with the name of your cluster.
#!/bin/bash
echo ECS_CLUSTER=your_cluster_name >> /etc/ecs/ecs.config
You only need a single instance for ECS to work as such, because the cluster itself is managed by AWS on your behalf. This wouldn't be sufficient for high availability scenarios though:
Because the container hosts are just regular Amazon EC2 instances, you would need to follow AWS best practices and spread them over two or three Availability Zones (AZ) so that a (rare) outage of an AZ doesn't impact your cluster, because ECS can migrate your containers to a different host instance (provided your cluster has sufficient spare capacity).
Many advanced clustering technologies that facilitate containers have their own service orchestration layers and usually require an uneven number >= 3 (service) instances for a high availability setup. You can read more about this in section Optimal Cluster Size within Administration for example (see also Running CoreOS with AWS EC2 Container Service).
This refers back to the high-availability and service orchestration topics mentioned in 2. already; more precisely, you are facing the problem of service discovery, which becomes ever more prevalent when using container technologies in general and microservices in particular:
To get familiar with this, I recommend Jeff Lindsay's Understanding Modern Service Discovery with Docker for an excellent overview specifically focused on your use case.
Jeff also maintains a containerized version of the increasingly popular Consul, which makes it simple for services to register themselves and to discover other services via a DNS or HTTP interface (see Running Consul in Docker and gliderlabs/docker-consul).

Bootstrapping AWS auto-scale instances

We are discussing at a client how to bootstrap auto-scaled AWS instances. Essentially, an instance comes up with hardly anything on it. It has a generic startup script that asks somewhere, "What am I supposed to do next?"
I'm thinking we can use Amazon tags and have the instance itself ask AWS, using the awscli tool set, to find out its role. This could give Puppet info, environment info (dev/stage/prod, for example), and so on. This should be doable with just the DescribeTags privilege. I'm facing resistance, however.
I am looking for suggestions on how a fresh AWS instance can find out about its own purpose, whether from AWS or perhaps from a service broker of some sort.
EC2 instances offer a feature called User Data meant to solve this problem. User Data executes a shell script to perform provisioning functions on new instances. A typical pattern is to use the User Data to download or clone a configuration management source repository, such as Chef, Puppet, or Ansible, and run it locally on the box to perform more complete provisioning.
As @e-j-brennan states, it's also common to prebundle an AMI that has already been provisioned. This approach is faster, since no provisioning needs to happen at boot time, but it is perhaps less flexible since the instance isn't customized.
You may also be interested in instance metadata, which exposes some data such as network details and tags via a URL path accessible only to the instance itself.
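Putting user data, the metadata service, and tags together, a sketch of a startup script that discovers the instance's role from a tag; it assumes an instance profile that allows ec2:DescribeTags and a "Role" tag set by the Auto Scaling group (the tag key and the final handoff are illustrative):
#!/bin/bash
# Ask the instance metadata service who we are, then look up our Role tag.
# The "Role" tag key and the provisioning handoff are hypothetical placeholders.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region)
ROLE=$(aws ec2 describe-tags --region "$REGION" \
  --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=Role" \
  --query 'Tags[0].Value' --output text)
echo "Provisioning instance $INSTANCE_ID as role: $ROLE"
# e.g. hand off to Puppet/Chef/Ansible based on "$ROLE"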
An instance doesn't have to come up with 'hardly anything on it', though. You can/should build your own custom AMI (Amazon Machine Image) with any and all software you need to have running on it, and when you need to auto-scale an instance, you boot it from the AMI you previously created and saved.
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/getting-started-create-custom-ami.html
I would recommend using AWS Elastic Beanstalk for creating specific instances; this makes it easier, since it will create the Auto Scaling groups and Launch Configurations (boot-up code), which you can edit later. Also, you only pay for the EC2 instances, and you can manage most things from the Beanstalk console.