How is the /tmp folder managed when using ECS Fargate?

I'm currently running some containers in production on AWS Fargate. I'm running an application that from time to time writes some files to the /tmp folder.
That said, I want to know what happens to this /tmp folder. Is it managed by Fargate (by the ECS container agent, for example), or is it something I need to manage myself (with a cron job that clears the files, for example)?
NOTE 1: One way to handle this kind of behavior is to use S3; however, the question is about how Fargate treats the /tmp folder.
NOTE 2: I don't need the files in the /tmp folder; they just happen to appear there, and I want to know whether I need to remove them or whether ECS will do that for me.
I couldn't find anything about this in the documentation. If someone can point to where the docs cover it, I would be happy to accept that answer.

If I understand your question correctly, it sounds like you want more precise control over temporary storage within your container.
I don't think ECS or Fargate does anything special with the /tmp folder on the filesystem inside the container.
However, Docker does have the notion of a tmpfs mount. This lets you designate a path whose contents are held in memory only and never written to the container's writable layer or the host's disk.
https://docs.docker.com/storage/tmpfs/
ECS recently added support for the shm-size and tmpfs parameters (check the current docs, though, as Fargate tasks may not support them):
https://aws.amazon.com/about-aws/whats-new/2018/03/amazon-ecs-adds-support-for-shm-size-and-tmpfs-parameters/
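As a rough illustration only, here is what declaring a tmpfs mount in a task definition could look like with boto3; the family name, image URI, and sizes below are placeholders, and whether your launch type accepts the parameter should be verified against the current docs:

    # Hypothetical sketch: a task definition with an in-memory tmpfs mount on /tmp.
    # The family name, image URI, and sizes are placeholders.
    import boto3

    ecs = boto3.client("ecs")

    ecs.register_task_definition(
        family="my-app",
        containerDefinitions=[
            {
                "name": "app",
                "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
                "memory": 512,
                "linuxParameters": {
                    # Mount an in-memory filesystem at /tmp (capped at 256 MiB) so
                    # temporary files never land on the task's ephemeral disk.
                    "tmpfs": [
                        {
                            "containerPath": "/tmp",
                            "size": 256,
                            "mountOptions": ["rw", "noexec"],
                        }
                    ]
                },
            }
        ],
    )

With a mount like this, anything written to /tmp lives only in memory and disappears with the container, so there is nothing to clean up afterwards.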

If I understand correctly, once your Fargate task stops running, all of its storage goes away.
According to the AWS documentation, a Fargate task receives some storage when it is provisioned, but that storage is ephemeral.
So unless you are running a long-lived task, you don't need to deal with temporary files; they will be gone along with the storage.
I hope this helps.

Related

Backup and restore files in ECS Tasks in AWS

Description
I have an ECS cluster that has multiple tasks. Every task is a WordPress website. These tasks automatically start and stop based on some Lambda functions. To persist the files when a task goes down for some reason, I tried using EFS, but that was very slow once the burst credits ran out.
Now I use the Bind Mount volume type (just the normal filesystem, nothing fancy). The websites are a lot faster but no longer persisted. When an instance goes down, the files of that website are gone. ECS starts the task again, but without the files the website breaks.
First solution
My first solution is to run an extra container in the task that makes a backup once a day and stores it in S3. All files are automatically packed into a .tar.gz and uploaded to S3. This all works fine, but I don't have a way to restore these backups yet. These things should be considered:
When a new task starts: check whether the current task/website already has a backup.
If the latest backup should be restored: download the .tar.gz from S3 and unpack it.
To realize this, I think it should be a bash script (or something like it) that runs on startup of a task?
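A bash script would certainly work; purely as a sketch of the idea (the bucket name, key prefix, and web-root path below are assumptions, not anything from your setup), a restore-on-startup step could look like this in Python with boto3:

    # Hypothetical restore-on-startup sketch. Bucket name, key prefix, and the
    # web-root path are placeholders for your own setup.
    import os
    import tarfile
    import boto3

    BUCKET = "my-wordpress-backups"      # assumed bucket name
    PREFIX = "site-a/"                   # assumed per-site prefix
    TARGET = "/var/www/html"             # assumed web root on the bind mount

    s3 = boto3.client("s3")

    def latest_backup_key():
        """Return the key of the newest backup under PREFIX, or None."""
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
        objects = resp.get("Contents", [])
        if not objects:
            return None
        return max(objects, key=lambda o: o["LastModified"])["Key"]

    def restore_if_empty():
        # Only restore when the web root looks empty (fresh task).
        if os.path.isdir(TARGET) and os.listdir(TARGET):
            return
        key = latest_backup_key()
        if key is None:
            return
        s3.download_file(BUCKET, key, "/tmp/backup.tar.gz")
        with tarfile.open("/tmp/backup.tar.gz") as tar:
            tar.extractall(TARGET)

    if __name__ == "__main__":
        restore_if_empty()

Something like this could run as the container's entrypoint before the web server starts, so a fresh task restores itself and an already-populated one is left alone.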
Possible second solution
Another solution I thought about, and one I think is a lot cleaner, is to drop the extra container doing daily backups and instead mount EFS to each task and have it sync data between the Bind Mount and EFS. This way EFS becomes a backup storage location instead of the working filesystem for my websites. Other pros: the tasks/websites would have more recent backups, and I would have more CPU and memory left on the EC2 instances in my ECS cluster for other tasks.
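Purely as a sketch of that sync idea (both paths and the interval below are assumptions, and rsync must be available in the image), the backup loop could be as simple as:

    # Hypothetical sketch of "EFS as backup target": periodically mirror the
    # bind-mounted web root onto an EFS mount. Paths and interval are placeholders.
    import subprocess
    import time

    SOURCE = "/var/www/html/"        # assumed bind-mount path (working copy)
    DESTINATION = "/mnt/efs/site-a/" # assumed EFS mount path (backup copy)
    INTERVAL_SECONDS = 15 * 60       # sync every 15 minutes

    def sync_once():
        # --delete keeps the backup an exact mirror of the working copy.
        subprocess.run(
            ["rsync", "-a", "--delete", SOURCE, DESTINATION],
            check=True,
        )

    if __name__ == "__main__":
        while True:
            sync_once()
            time.sleep(INTERVAL_SECONDS)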
Help?
I would like some opinions on the solutions above, and maybe some advice on whether the second solution is any good plus some tips on how to implement it. Any other advice would be helpful too!

Persistence in AWS Fargate Containers

I have 2 containers in a Fargate task definition. One of the containers is a database server, and I want to persist its data directory. However, Fargate doesn't support the Source Path field when setting up a volume in the task definition. Does anyone know how to set up persistence in Fargate?
At this moment AWS Fargate targets stateless container workloads only, but you never know; maybe AWS is already working on a solution for this.
Remember that you are sharing the same host with other AWS customers. Your task could be terminated and restarted on another host at any time, and you can also scale your service out at any time.
You can use any of the options below:
Use RDS for general-purpose databases.
If your database engine is not available on RDS, start a new EC2 instance and install the database yourself.
Continue to use Fargate for the other services.
AWS Fargate supports EFS volumes, at last!
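As a hedged sketch of what that looks like (the family name, image, filesystem ID, and paths below are placeholders, and EFS on Fargate also requires a sufficiently recent platform version), the EFS volume is declared at the task-definition level and then mounted by the container:

    # Hypothetical sketch: attaching an EFS volume to a Fargate task definition.
    # The family, image, filesystem ID, and paths are placeholders.
    import boto3

    ecs = boto3.client("ecs")

    ecs.register_task_definition(
        family="my-db",
        requiresCompatibilities=["FARGATE"],
        networkMode="awsvpc",
        cpu="256",
        memory="512",
        volumes=[
            {
                "name": "data",
                "efsVolumeConfiguration": {
                    "fileSystemId": "fs-12345678",   # assumed EFS filesystem
                    "rootDirectory": "/",
                    "transitEncryption": "ENABLED",
                },
            }
        ],
        containerDefinitions=[
            {
                "name": "db",
                "image": "postgres:15",
                "essential": True,
                "mountPoints": [
                    {"sourceVolume": "data", "containerPath": "/var/lib/postgresql/data"}
                ],
            }
        ],
    )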
I can think of 3 ways to do this:
Use a storage solution built for container workloads (Longhorn or Portworx are good calls).
Use RDS.
Use a distributed database that keeps multiple copies of its data (but you will have to handle the case where all the copies are shut down).
This is something you can follow on the AWS containers roadmap: [Fargate] [Volumes]: Allow at least EFS mounts to Fargate Containers.
https://github.com/aws/containers-roadmap/issues/53
Until then you can:
Generate a dump of the database periodically from within the container.
Upload it to S3 with the help of the AWS CLI or SDK.
Use the dump to recover whenever required.
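As a rough sketch of the first two steps (the bucket name, dump command, and database name are assumptions; adapt them to your engine):

    # Hypothetical dump-and-upload sketch. The bucket name, database name, and
    # pg_dump invocation are placeholders.
    import datetime
    import subprocess
    import boto3

    BUCKET = "my-db-dumps"          # assumed bucket name
    DUMP_PATH = "/tmp/dump.sql"

    def dump_and_upload():
        # Write a plain-text dump with the engine's own tooling.
        with open(DUMP_PATH, "w") as out:
            subprocess.run(["pg_dump", "mydatabase"], stdout=out, check=True)

        # Key the object by timestamp so older dumps are kept for recovery.
        key = "dumps/%s.sql" % datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
        boto3.client("s3").upload_file(DUMP_PATH, BUCKET, key)

    if __name__ == "__main__":
        dump_and_upload()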

Docker Volume vs AWS S3

Probably I am completely off in my assumptions, but I am pretty new to both Docker and AWS. We have two applications that are Dockerized containers working under the same docker-compose network bridge.
Now, we have been looking for a way for these two containers to share some files. Since we are in the cloud, one suggestion was an Amazon S3 bucket, which is great. But my question is: since we are in a Docker environment, does it not make more sense to share those files in a Docker volume? I thought that's exactly what a Docker volume is: a mounted virtual place where files can be shared. At least that is my shallow and simplistic understanding after reading about Docker volumes.
So I do have some questions:
1. Is my assumption correct that an AWS S3 bucket and Docker volumes provide similar functionality, i.e. are we comparing apples to apples?
2. If so, would a Docker volume qualify to be called an object store?
3. If it does qualify to be called an object store, would it be wise to use a Docker volume as a replacement for AWS S3?
4. If not, why?
1. Not really; they are different and even complementary. There's a plugin for Docker volumes on AWS here:
https://github.com/joeduffy/blocker
2. I wouldn't use the term object store; a Docker volume is implemented as a filesystem mounted into the container.
3. No...
4. ...for the reason stated in (1).
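To make the "complementary" point concrete, here is a minimal sketch of sharing a file through S3 instead of a volume; the bucket and key names are made up:

    # Hypothetical illustration: one service uploads a file to S3, the other
    # downloads it. Bucket and key are placeholders.
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-shared-bucket"   # assumed bucket name

    # In the producing container:
    s3.upload_file("/tmp/report.csv", BUCKET, "shared/report.csv")

    # In the consuming container:
    s3.download_file(BUCKET, "shared/report.csv", "/tmp/report.csv")

A Docker volume would instead be mounted into both containers and accessed as ordinary files, which only works while they share a host (or a volume plugin).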

Dealing with AWS Elastic Beanstalk Multi-container databases and persistent storage

I'm new to Elastic Beanstalk, EC2, and Docker, and I have spent the last couple of weeks researching and playing around with them. I have a few questions that I'm finding difficult to find answers to elsewhere.
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
2) Is it better to use AWS RDS in production and then have an external database container locally?
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
I don't know if this is stated anywhere, but I am fairly sure AWS does not intend for you to use EB's multi-container setup to run databases or anything that should run only once in your system. As their examples show, it is meant to give you better control over what the front-end server will be.
If you want to run databases or store files, you will either move to AWS ECS, where you can better control this, or use multiple EB environments (e.g. create a worker-tier, single-instance environment for running the database).
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
I have not used eb local run and instead use docker-compose, which allows me to run a proper environment locally, including my databases. Yes, you may need to duplicate some information between the docker-compose file and the Dockerrun file, but once you set it up, you will see how powerful it is. Because you are still sharing the Dockerfiles, you can assume things will run in a similar enough way once deployed.
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
Yes, I think that is correct. EB assumes you will use RDS or DynamoDB or something else that is already centralized and managed.
2) Is it better to use AWS RDS in production and then have an external database container locally?
Yes. And by the way, rather than having EB manage the creation of the database, I find it better practice to instantiate it manually so that it stays persistent after you kill your EB environments.
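As a hedged sketch of that "create it yourself" step (all identifiers and credentials below are placeholders; in practice you would pull the password from a secrets store):

    # Hypothetical sketch: create the database outside Elastic Beanstalk so it
    # outlives any EB environment. Identifier, class, and credentials are placeholders.
    import boto3

    rds = boto3.client("rds")

    rds.create_db_instance(
        DBInstanceIdentifier="my-app-db",
        DBInstanceClass="db.t3.micro",
        Engine="postgres",
        AllocatedStorage=20,
        MasterUsername="appuser",
        MasterUserPassword="change-me",   # use a secrets manager in practice
    )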
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
Yes, using S3 is the way to go for multiple reasons, but mostly because AWS manages it and it scales without you having to worry about it. In fact, you want your client to get or even post the files directly to S3, so your server does not have to do any work (note that the server may need to sign the URL, but that is about it).
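As a minimal sketch of that signing step (the bucket and key below are made up), the server can hand the client a short-lived presigned POST and stay out of the upload path entirely:

    # Hypothetical sketch: generate a presigned POST so the browser uploads
    # straight to S3. Bucket and key are placeholders.
    import boto3

    s3 = boto3.client("s3")

    presigned = s3.generate_presigned_post(
        Bucket="my-user-uploads",          # assumed bucket name
        Key="uploads/user-123/avatar.png", # assumed object key
        ExpiresIn=300,                     # URL valid for 5 minutes
    )

    # `presigned` contains the target URL and the form fields the client must
    # include in its multipart/form-data POST; the file never passes through
    # your server.
    print(presigned["url"], presigned["fields"])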
If you really have something against S3 (for whatever reason), then you can also (as with the database) create a second, single-instance EB environment with EBS to ensure you have exactly one instance. But compared to the S3 solution it won't scale very far and will in fact be much more expensive than using S3.

Is s3cmd a safe option for syncing EC2 instances?

I have the following problem: we are working on a project on AWS that will use auto-scaling, so the EC2 instances will start and die very often. Freezing images, updating the launch configurations, auto-scaling groups, alarms, etc. takes a while, and several things can go wrong.
I just want new instances to sync the most recent code, so I was thinking about fetching it from S3 using s3cmd once the instance finishes booting, and manually uploading new code whenever we have something to deploy. So my doubts are:
Is it too risky to store the code on S3? How secure are the files there? With the s3cmd encryption password, is it unlikely that someone will be able to decrypt them?
What other options would be good for this? I was thinking about rsync, but then I think I would need to store the servers' private key on them, which I don't think is a good idea.
Thanks for any advice.
You might be a candidate for Elastic Beanstalk - using a plain vanilla AMI.
Then package your application and use AWS's ebextensions tool to customize the instance as it is spun up. ebextensions will allow you to do anything you like to the image, in place, as it is deploying: change .htaccess, erase a file, place a cron job, whatever.
When you have code updates, package them, upload and do a rolling update.
All instances will use your latest code, including auto-scaled ones.
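As a rough sketch of such an update with boto3 (the application name, environment name, bucket, and version label are placeholders), the flow is: upload the bundle, register it as an application version, then point the environment at it and let EB roll it out:

    # Hypothetical sketch of a code update via boto3. Application, environment,
    # bucket, and version label are placeholders.
    import boto3

    s3 = boto3.client("s3")
    eb = boto3.client("elasticbeanstalk")

    BUCKET = "my-eb-artifacts"   # assumed artifact bucket
    KEY = "my-app/v42.zip"       # assumed bundle key

    # 1. Upload the packaged application bundle.
    s3.upload_file("build/my-app-v42.zip", BUCKET, KEY)

    # 2. Register it as a new application version.
    eb.create_application_version(
        ApplicationName="my-app",
        VersionLabel="v42",
        SourceBundle={"S3Bucket": BUCKET, "S3Key": KEY},
    )

    # 3. Point the environment at the new version; EB performs the rolling update.
    eb.update_environment(
        ApplicationName="my-app",
        EnvironmentName="my-app-prod",
        VersionLabel="v42",
    )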
The key concept here is to never have your real data in the instance, where it might go away if an instance dies or is shut down.
Elastic Beanstalk will allow you to set up the load balancing, auto-scaling, monitoring, etc.