Motivation for putting Docker containers inside an AWS EC2 instance

There seems to be a growing trend of developers setting up multiple Docker containers inside their AWS EC2 instances. This seems counterintuitive to me. Why put a virtual machine inside another virtual machine? Why not just use smaller EC2 instances (i.e., make the instance the same size as your container and then not use Docker at all)? One argument I've heard is that Docker containers make it easy to bring your development environment exactly as-is to prod. Can't the same be done with Amazon Machine Images (AMIs)? Is there some sort of marginal cost savings? Is it so that developers can be cloud-agnostic and avoid vendor lock-in? I understand the benefits of Docker on physical hardware, just not on EC2.

The main advantage of Docker related to your question is the concept of images. These are lightweight and easier to configure than AMIs. Also note that Docker still needs a host to run on; in this case that host is an EC2 VM.
More info here - How is Docker different from a normal virtual machine?
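To make the image-vs-AMI contrast concrete, here is a minimal sketch, assuming the Docker SDK for Python (`pip install docker`) and boto3 are available; the image tag and the AMI/instance parameters are placeholders, not anything from the original question.

```python
import boto3
import docker

# Shipping software as a Docker image: pull a tagged image (typically tens
# to hundreds of MB) onto any host that runs a Docker daemon.
client = docker.from_env()
image = client.images.pull("nginx", tag="1.25")  # placeholder image/tag
print(image.id)

# Shipping software as an AMI: you must launch an entire VM from the image.
ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
```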

setting up multiple Docker containers inside of their AWS EC2 instances. This seems counterintuitive to me. Why put a virtual machine inside another virtual machine?
Err... a Docker container is not a VM. It is a way to isolate part of the resources (filesystem, CPU, memory) of your host (here, an EC2 VM).
An AMI (Amazon Machine Image) is just one kind of EC2 resource.
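A small sketch of that point, using the Docker SDK for Python: the container below shares the host's kernel and simply gets a capped slice of its CPU and memory, which is what "isolation, not virtualization" means here.

```python
import docker

client = docker.from_env()

# Run a throwaway container with hard resource caps. These are cgroup
# limits on the host's resources, not a second virtual machine.
output = client.containers.run(
    "alpine:3.19",
    command=["sh", "-c", "head -1 /proc/meminfo"],
    mem_limit="256m",       # cap the container's memory slice
    nano_cpus=500_000_000,  # cap at 0.5 of one host CPU
    remove=True,
)
print(output.decode())  # the container still sees the *host's* /proc/meminfo
```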

Related

AWS EC2 instance vs Docker?

What is the difference between an AWS EC2 instance and a Docker container? When should I use one over the other?
When you launch an EC2 instance, it provides the base installation of the chosen operating system with some additional AWS packages installed, such as the SSM Agent.
There are also AMIs prepared for specific use cases, such as SQL Server, or in this case pre-configured for the AWS orchestration services (ECS or EKS) with the relevant software installed.
If you're not familiar with Docker, I would suggest running it in your local environment first so that you can become familiar with it (a minimal sketch follows below). Yes, people have been moving towards containers and serverless, but you need to ensure you are able to support this in production.
When deploying containers you will need to understand the orchestration layer you're using. It's very easy to see containers as an alternative to a virtualisation layer, but there are many differences in how they operate.
Take a look at the What is Docker? page for further explanations.
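As a first local experiment, something like this sketch (Docker SDK for Python; any small public image works) is enough to verify your Docker installation before you go near ECS or EKS:

```python
import docker

# Connect to the local Docker daemon and run a throwaway container.
client = docker.from_env()
logs = client.containers.run("hello-world", remove=True)
print(logs.decode())
```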

Can I create a share between an EC2 instance and my local machine?

Just to give you some context... I'm new to the AWS world and all the services it provides.
I have a legacy application that needs to share some binaries with a client, and I was trying to use an EC2 instance (Amazon Linux AMI) running Samba, mapped as a drive on a local Windows machine.
I was able to establish a connection with another EC2 instance (same VPC), just as a test. But I wasn't able to do so from my Windows machine, or even from a Linux VM I have.
The inbound rules for this proof-of-concept EC2 instance were fully open (all traffic allowed).
Main question
Is this possible? Can I share a file system between an EC2 instance and a local machine over the internet?
Just saying:
S3 storage isn't an option.
And in my region FSx isn't available yet, and for latency reasons it's a no-go.
Please ask as many questions as you want, I'll try to answer them as fast as I can.
Kind regards.
TL;DR - it's possible, but there's no 'simple' solution (in my opinion).
I thought of two possible solutions that you can implement, here we go ...
1: AWS EFS, AWS Direct Connect and Docker
A possible solution would be using AWS Elastic File System (EFS), AWS Direct Connect and a Docker Linux container.
Drawbacks
If this is the first time you've encountered the above AWS services or Docker, then it's going to be a bit of a journey to learn them
EFS pricing - it's not so cheap, and you also need to consider the inbound and outbound traffic; it's best to use the calculator on the pricing page
EFS performance - if you only share files then it should be okay, but don't expect EBS-like speeds; higher throughput costs more money
AWS Direct Connect pricing - you also need to take that into consideration
Security - I'm not sure how sensitive your data is, but you need to make sure you create a very strict VPC, with security group and network ACL rules - read about the VPC security best practices
Steps to implement the solution
1) Follow the Walkthrough: Create and Mount a File System On-Premises with AWS Direct Connect and VPN; the steps below show how to combine it with Docker
2) (Optional) To make it a bit easier - for Windows to "support" a Linux file system, you can use Windows Git Bash. If you're not sure how to install 3rd-party apps in Windows Git Bash (like aws-vault), then read this blog post
3) Create an EFS in AWS and mount it to your EC2 instance; read more about it here
4) Use AWS Direct Connect to connect to your VPC from your local Windows machine
5) Install Docker for Windows on your local machine
6) Create a Docker volume and mount the same EFS to that volume - a good example for this step, and see the sketch after these steps
7) Test it - SSH to your EC2 instance, create a file on the EFS volume, and then check in your local Docker Linux container that the file appears on the EFS volume
I omitted the security steps because it's up to you how strict you want your solution to be.
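For step 6, a sketch of an NFS-backed Docker volume pointing at EFS, written with the Docker SDK for Python; the file-system DNS name is a placeholder (EFS exposes <fs-id>.efs.<region>.amazonaws.com over NFSv4.1), and network access from the client to the mount target is assumed to already be in place:

```python
import docker

EFS_DNS = "fs-0123456789abcdef0.efs.eu-west-1.amazonaws.com"  # placeholder

client = docker.from_env()

# Create a volume whose "local" driver mounts the EFS NFS export.
client.volumes.create(
    name="shared-efs",
    driver="local",
    driver_opts={
        "type": "nfs",
        "o": f"addr={EFS_DNS},nfsvers=4.1,rw,hard,timeo=600",
        "device": ":/",  # mount the root of the file system
    },
)

# Any container mounting the volume now reads/writes the shared EFS.
client.containers.run(
    "alpine:3.19",
    command=["sh", "-c", "echo hello > /mnt/efs/test.txt"],
    volumes={"shared-efs": {"bind": "/mnt/efs", "mode": "rw"}},
    remove=True,
)
```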
2: Using S3 as a shared file-system
You can try out the tool s3fs-fuse, but you'll still need to use a Docker Linux container since you're on Windows. I haven't tested it, but it looks promising. You can read this blog post - it's a step-by-step tutorial on how to do it, and it also shares some other possible solutions.
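If you go that route, the mount itself reduces to a single s3fs invocation. This is an untested sketch (matching the caveat above), wrapped in Python for consistency; the bucket name and mount point are placeholders, s3fs must already be installed, and credentials are assumed to come from an instance role:

```python
import subprocess

# Mount the bucket as a directory; the mount point must already exist.
# Credentials here come from the attached IAM role (iam_role=auto).
subprocess.run(
    ["s3fs", "my-shared-bucket", "/mnt/s3",
     "-o", "iam_role=auto",
     "-o", "allow_other"],
    check=True,
)
# /mnt/s3 now behaves like a (comparatively slow) network file system.
```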

Docker Volume vs AWS S3

Probably I am completely off in my assumptions, but I am pretty new to both Docker and AWS. We have two applications running as Dockerized containers on the same docker-compose network bridge.
Now, we have been looking for a way for these two containers to share some files. Since we are in the cloud, one suggestion was an Amazon S3 bucket, which is great. But my question is: since we are in a Docker environment, does it not make more sense to share those files in a Docker volume? I thought that's exactly what a Docker volume is - a mounted virtual place where files can be shared. At least that is my shallow and simplistic understanding after reading about Docker volumes.
So I do have some questions:
1) Is my assumption correct that an AWS S3 bucket and a Docker volume provide similar functionality, i.e. is this comparing apples to apples?
2) If my assumption is correct, would a Docker volume qualify to be called an object store?
3) If it does qualify to be called an object store, would it be wise to use a Docker volume as a replacement for AWS S3?
4) If not, why?
1) Not really - they are different and even complementary. There's a plugin for Docker volumes on AWS here:
https://github.com/joeduffy/blocker
2) I wouldn't use the term object store. A volume is implemented as a filesystem mounted on the container (see the sketch below).
3) No...
4) ... for the reason stated in (1).
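A short sketch of point (2), using the Docker SDK for Python with arbitrary names: two containers on the same host share files through a named volume exactly as they would through any mounted filesystem, with no object-store API involved.

```python
import docker

client = docker.from_env()
client.volumes.create(name="shared-data")

# One container writes a file into the volume ...
client.containers.run(
    "alpine:3.19",
    command=["sh", "-c", "echo report-v1 > /data/report.txt"],
    volumes={"shared-data": {"bind": "/data", "mode": "rw"}},
    remove=True,
)

# ... and a second container reads it back: plain filesystem semantics.
out = client.containers.run(
    "alpine:3.19",
    command=["cat", "/data/report.txt"],
    volumes={"shared-data": {"bind": "/data", "mode": "ro"}},
    remove=True,
)
print(out.decode())  # -> report-v1
```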

Difference between Docker and AMI

In the context of AWS:
AMI is used to package software and can be deployed on EC2.
Docker can also be used to package software and can also be deployed to EC2.
What's the difference between both and how do I choose between them?
An AMI is an image - a whole machine that you can start new instances from. A Docker container is more lightweight and portable: it should be transportable between providers, while an AMI is not (easily).
AMIs are basically VM images.
Docker containers are packaged mini-images that run on some VM in an isolated environment.
Even though this doesn't answer the question directly, it gives some background on how they are used.
One approach is to launch EC2 instances from Amazon AMIs (or any AMI) and then run Docker containers (with all dependencies) on top. With this approach, the Docker image tends to bloat over time and configuration drift creeps in. The time for the application to be up and running is also longer, since the EC2 instance has to boot and Docker then has to bring up your app server.
Another approach is "immutable EC2 instances". Here you use an Amazon AMI as the base, install all the dependencies (using shell scripts or Ansible), and bake them into the AMI. We use HashiCorp Packer, which is an amazing tool. The time for the application to be up and running is greatly reduced because all the dependencies (Java 8, Tomcat, the WAR file, etc.) are already installed in the AMI.
For production use cases, use Packer to create the AMI and Terraform to launch the cloud resources that use it, then tie it all together in a Jenkins pipeline.
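For a rough idea of what the "bake" step boils down to, here is a sketch of the underlying EC2 API call with boto3; Packer adds provisioning, waiting, and cleanup on top of this, and the instance ID below is a placeholder.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Snapshot a fully provisioned instance (Java, Tomcat, WAR already installed)
# into a reusable AMI that Terraform can later reference by ID.
resp = ec2.create_image(
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Name="app-server-baked-20240101",
    Description="Immutable image with all app dependencies baked in",
    NoReboot=False,  # reboot for a consistent filesystem snapshot
)
print(resp["ImageId"])
```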
This link has details about the differences between Docker and AMIs:
https://forums.docker.com/t/how-would-you-differentiate-between-docker-vs-ec2-image/1235/2

Dealing with AWS Elastic Beanstalk Multi-container databases and persistent storage

I'm new to Elastic Beanstalk, EC2 and Docker, and have spent the last couple of weeks researching and playing around with them. I have a few questions that I'm finding difficult to find answers to elsewhere.
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
1) As far as I understand, Elastic Beanstalk spawns instances running the containers inside, which could result in multiple databases if Elastic Beanstalk spawns multiple instances. Is this correct?
2) Is it better to use AWS RDS in production and then have an external database container locally?
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
I don't know if this is stated anywhere, but I am fairly sure AWS does not intend for you to use EB's multi-container platform to run databases or anything else that should run only once in your system. As their examples show, it is there to give you better control over what the front-end server will be.
If you want to run databases or store files, you should either move to AWS ECS, where you can better control this, or use multiple EB environments (e.g. create a single-instance, worker-tier environment for running the database).
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
I have not used eb local run; I use docker-compose instead, which allows me to run a proper environment locally, including my databases (a rough sketch of the equivalent setup follows below). Yes, you may need to duplicate some information between the docker-compose file and the Dockerrun file, but once you set it up, you will see how powerful it is. Because you are sharing the Dockerfiles, you can assume things will run in a similar enough way once deployed.
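For a rough idea of what docker-compose does for such a setup, here is a sketch against the Docker SDK for Python: one bridge network, a database container, and an app container that reaches the database by name. The image names and environment variables are illustrative only.

```python
import docker

client = docker.from_env()

# One user-defined bridge network, just like a compose project creates.
client.networks.create("local-dev", driver="bridge")

# The local stand-in for RDS: a throwaway Postgres container.
client.containers.run(
    "postgres:16",
    name="db",
    environment={"POSTGRES_PASSWORD": "dev-only"},
    network="local-dev",
    detach=True,
)

# The application container resolves the database by its container name.
client.containers.run(
    "my-app:latest",  # hypothetical application image
    environment={"DATABASE_URL": "postgresql://postgres:dev-only@db:5432/postgres"},
    network="local-dev",
    detach=True,
)
```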
1) As far as I understand, Elastic Beanstalk spawns instances running the containers inside, which could result in multiple databases if Elastic Beanstalk spawns multiple instances. Is this correct?
Yes, I think that is correct. EB assumes you will use RDS or DynamoDB or something else that is already centralized and managed.
2) Is it better to use AWS RDS in production and then have an external database container locally?
Yes. And by the way, rather than having EB manage the creation of the database, I find it better practice to instantiate it manually so that it persists after you kill your EB environments.
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
Yes, using S3 is the way to go, for multiple reasons, but mostly because AWS manages it and it scales without you having to worry about it. In fact, you want your client to GET or even POST the files directly on S3, so your server does not have to do any work (the server may need to sign the URL, but that is about it - see the sketch below).
If you really have an issue with S3 (for whatever reason), then you would also (as with the database) create a second, single-instance EB environment with EBS to ensure you have exactly one instance. But compared to the S3 solution it won't scale very far, and it will in fact be much more expensive than using S3.
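A minimal sketch of the "server only signs the URL" idea with boto3; the bucket and key are placeholders, and the client then uploads or downloads directly against S3 with the returned URLs.

```python
import boto3

s3 = boto3.client("s3")

# Presigned PUT: the client uploads straight to S3 within 5 minutes.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-user-files", "Key": "uploads/avatar.png"},
    ExpiresIn=300,
)

# Presigned GET: the client downloads straight from S3, no app-server I/O.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-user-files", "Key": "uploads/avatar.png"},
    ExpiresIn=300,
)
print(upload_url, download_url, sep="\n")
```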