Dealing with AWS Elastic Beanstalk Multi-container databases and persistent storage - amazon-web-services

I'm new to Elastic Beanstalk, EC2 and Docker, and I have spent the last couple of weeks researching and playing around with them. I have a few questions that I can't find answers to elsewhere.
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
2) Is it better to use AWS RDS in production and then have an external database container locally?
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?

I don't know if this is stated anywhere, but I am fairly sure AWS does not intend for you to use EB's multi-container setup to run databases or anything else that should run only once in your system. As their examples show, it is meant to give you better control over what the front-end server will be.
If you want to run databases or store files, you will either move to AWS ECS, where you can better control this, or use multiple EB environments (e.g. create a worker-tier, single-instance environment for running the database).
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
I have not used eb local run and instead use docker-compose, which lets me run a complete environment locally, including my databases. Yes, you may need to duplicate some information between the docker-compose file and the Dockerrun file, but once you set it up, you will see how powerful it is. Because you are still sharing the Dockerfiles, you can still assume things will run in a similar enough way once deployed.
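To keep that duplication manageable, one option is to derive the shared part of the compose definition from the Dockerrun file. The following is only a rough sketch: the inlined Dockerrun content, the image names and the extra "db" service are all made up for illustration, and only a few common fields (image, portMappings, environment, links) are mapped.

```python
import json

# Hypothetical Dockerrun.aws.json (version 2) content, inlined for illustration.
DOCKERRUN = """
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "web",
      "image": "example/web:latest",
      "essential": true,
      "memory": 128,
      "portMappings": [{"hostPort": 80, "containerPort": 5000}],
      "environment": [{"name": "APP_ENV", "value": "local"}]
    }
  ]
}
"""


def dockerrun_to_compose(dockerrun: dict) -> dict:
    """Map Dockerrun v2 container definitions onto a compose-style dict."""
    services = {}
    for container in dockerrun.get("containerDefinitions", []):
        service = {"image": container["image"]}
        # Compose expects "host:container" strings for published ports.
        ports = [
            "{}:{}".format(p["hostPort"], p["containerPort"])
            for p in container.get("portMappings", [])
        ]
        if ports:
            service["ports"] = ports
        # Dockerrun lists name/value pairs; compose accepts a plain mapping.
        env = {e["name"]: e["value"] for e in container.get("environment", [])}
        if env:
            service["environment"] = env
        if container.get("links"):
            service["links"] = list(container["links"])
        services[container["name"]] = service
    return {"services": services}


compose = dockerrun_to_compose(json.loads(DOCKERRUN))
# Local-only services, such as a database container, are added by hand.
compose["services"]["db"] = {"image": "postgres:9.4"}  # assumed image tag
print(json.dumps(compose, indent=2))
```

The database service exists only in the local definition, which matches the idea of using a managed database once deployed.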
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
Yes, I think that is correct. EB assumes you will use RDS, DynamoDB or something else that is already centralized and managed.
2) Is it better to use AWS RDS in production and then have an external database container locally?
Yes. By the way, rather than having EB manage the creation of the database, I find it better practice to create it yourself so that it persists after you terminate your EB environments.
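With a database that lives outside the environment, the application needs to be told where it is. A minimal sketch, assuming you pass the connection details as environment variables: the names DB_HOST, DB_PORT, DB_NAME, DB_USER and DB_PASSWORD, the hostname and the local defaults are all hypothetical. The same code can then point at RDS in production and at a database container locally.

```python
import os
from urllib.parse import quote


def database_url(env=os.environ) -> str:
    """Build a PostgreSQL URL from environment variables.

    Falls back to defaults that suit a local database container
    (assumed values, adjust to your own setup).
    """
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "app_dev")
    # Percent-encode credentials so characters like "@" or "/" stay valid.
    user = quote(env.get("DB_USER", "postgres"), safe="")
    password = quote(env.get("DB_PASSWORD", "postgres"), safe="")
    return "postgresql://{}:{}@{}:{}/{}".format(user, password, host, port, name)


# Local run: nothing is set, so the defaults apply.
local_url = database_url({})

# Production-like run: values you would set as environment properties.
prod_url = database_url({
    "DB_HOST": "mydb.example.us-east-1.rds.amazonaws.com",  # hypothetical
    "DB_NAME": "app",
    "DB_USER": "dbroot",
    "DB_PASSWORD": "p@ss/word",
})
print(local_url)
print(prod_url)
```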
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
Yes, using S3 is the way to go for multiple reasons, but mostly because AWS manages it and you can scale without having to worry about it. In fact, you want your client to fetch files directly from S3, or even upload them directly to it, so your server does not have to do any work (note the server may need to sign the URL, but that is about it).
If you really have an issue with S3 (for whatever reason), then you will (as with the database) create a second, single-instance EB environment with EBS to ensure you have a single instance. But compared to the S3 solution it won't scale very far, and it will in fact be much more expensive.
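The signing step could look roughly like the sketch below. It assumes the boto3 package, whose S3 client offers generate_presigned_url. The helper names and the bucket and key in the usage comment are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
def make_s3_client():
    """Create a real S3 client; needs the boto3 package and AWS credentials."""
    import boto3  # imported lazily so this sketch loads without boto3 installed
    return boto3.client("s3")


def presigned_upload_url(s3_client, bucket: str, key: str, expires_in: int = 300) -> str:
    """Ask the S3 client to sign a short-lived PUT URL for a single object.

    The browser can then upload straight to S3 with an HTTP PUT,
    so the file itself never passes through the application server.
    """
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the signed URL in seconds
    )


# Usage on the server (bucket and key are made-up examples):
#   url = presigned_upload_url(make_s3_client(), "example-user-files", "uploads/42/avatar.png")
#   ...return `url` to the client, which uploads the file to it directly.
```

Downloads work the same way with "get_object" in place of "put_object".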

Related

Programmatically configuring an RDS instance for Elastic Beanstalk

I am trying to automate the creation of a temporary environment, but I am struggling with the creation of a related RDS instance inside Elastic Beanstalk.
I would like eb create envName to also spawn an RDS database for the environment.
One solution would be to do it manually. Another solution seems to involve .ebextensions (see "Using Elastic Beanstalk .ebextensions to specify an RDS database"). However, .ebextensions are run at each deployment. This could work through a specific hook, but I would rather have my config in .elasticbeanstalk/config.yml, because this is the file that seems to hold the preconfiguration used when I call eb commands.
Another reason not to put it in .ebextensions is that, in the project configuration, those files are not environment-specific; they describe the requirements for the app in any environment.
I personally believe it is a bad idea to create the database as part of the environment, as it makes stopping and recreating the environment a lot harder if you ever need to. I believe it is a lot easier and safer to keep the database running even if you want to completely destroy the environment and start all over. I believe most experienced users will recommend that you manage the RDS instance on your own (outside EB). It is really easy anyway.
That said (and given you didn't ask for that), you can simply create a database as part of the eb create command itself:
eb create -db -db.engine postgres -db.version 9.4 -db.user dbroot -db.i ...
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb3-create.html

How to create machine images that are identical between staging and production, but need different files (SSL certificates)?

I'm using Packer, and I'm new to creating machine images, although I've created and deployed Docker containers before.
One concept I'd like to apply to the machine image building that I've found useful with Docker images is using the same exact image for staging testing that gets deployed to production. The different environments behave differently due to different environment variable values being passed in on startup, which in the case of Docker containers is often handled by a startup script ("entrypoint" in Docker terminology).
This has worked fine for me, but now I need to handle SSL certificates (actual files) being different between staging and production. In the case of Docker containers, you could just mount different volumes to the container. But I can't do that with machine images.
So how do people handle this scenario with machine images? Should I store these important files encrypted externally, and curl them in a startup script?
You could consider using a configuration management tool such as Ansible or Puppet to do any environment- or host-specific configuration you need once Packer has built the bulk of the VM.
Alternatively you could do as you mentioned and simply have a startup script curl the appropriate SSL certs (or any other environment specific files/config) that are needed from some location. Considering you've tagged your question with amazon-web-services you could use separate, private S3 buckets for testing or production and only allow certain instances access to the relevant buckets via IAM roles, protecting that data from being viewed by others or the wrong environment but also reducing the need to encrypt the data and then manage keys as well.
When you launch EC2 instances using your AMI, you can specify tags. Inside the instances you can use the AWS CLI to read these tags, so you can write a script that runs when the system starts and loads whatever external files you want based on the tag values (from a private S3 bucket, as @ydaetskcoR suggested).
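The tag-based selection could be sketched as follows. This assumes the tags were fetched with aws ec2 describe-tags, which prints JSON containing a "Tags" list of Key/Value entries. The tag key "Environment", the bucket names and the object key are all hypothetical.

```python
import json

# Hypothetical mapping from environment tag value to private certificate bucket.
CERT_BUCKETS = {
    "staging": "example-staging-certs",
    "production": "example-production-certs",
}
CERT_KEY = "tls/server.pem"  # hypothetical object key inside each bucket


def tags_from_describe_output(json_text: str) -> dict:
    """Turn the JSON printed by `aws ec2 describe-tags` into a {key: value} dict."""
    data = json.loads(json_text)
    return {tag["Key"]: tag["Value"] for tag in data.get("Tags", [])}


def cert_location(tags: dict, tag_key: str = "Environment") -> tuple:
    """Pick the (bucket, key) pair holding the certificate for this instance."""
    env = tags.get(tag_key)
    if env not in CERT_BUCKETS:
        # Fail loudly rather than guessing which certificate to install.
        raise ValueError("unknown or missing {} tag: {!r}".format(tag_key, env))
    return CERT_BUCKETS[env], CERT_KEY


# Example with a made-up CLI response for one instance.
sample = '{"Tags": [{"Key": "Environment", "Value": "staging", "ResourceId": "i-0123456789abcdef0", "ResourceType": "instance"}]}'
bucket, key = cert_location(tags_from_describe_output(sample))
print(bucket, key)
```

A startup script would then download that object, with the instance's IAM role limiting which bucket it is allowed to read.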
This is also useful: Find out the instance id from within an ec2 machine

Deploying Flask app that uses Celery and Redis to AWS: Elastic Beanstalk or EC2 directly?

I'm new to web development and I wrote a small Flask API that uses Celery as the task queue and Redis as the broker. I start Redis with redis-server and Celery with celery -A application.celery worker --loglevel=info on my local machine, and the app runs with no problem.
However, I couldn't get it to work on AWS. Right now I'm deploying the app by following the docs, but when I send requests to my API I get internal server errors, which are probably related to Redis and Celery not working. I SSH'ed into the EC2 instance but, since I'm new, couldn't work out what to do to get the app working.
My questions are:
1) What do I do to start my application, Redis and Celery after deploying it to AWS? Does Elastic Beanstalk do it automatically or do I need to do some stuff?
2) Where do I find my app files? I think I'll need to install all the requirements manually from requirements.txt, and set up a virtualenv in the EC2 instance, is that right?
3) If I set up and install all the requirements in a virtualenv, will they persist if the EC2 instance changes? The command line tool for Elastic Beanstalk deployed the application automatically and created a Load Balancer and an Auto Scaling Group. Will the installations I make over SSH be available when new instances are created, or do I need to do that manually every time, or is there some other way?
4) I heard some people say that creating an EC2 instance and deploying manually is better than using Elastic Beanstalk. What does Elastic Beanstalk do for me? Is it better if I use Elastic Beanstalk or deploy manually?
Thanks for any help!
For the past week I have been trying to do the exact same thing, so I thought I'd share everything I've learned. These answers are spread across Stack Overflow and Google, but I can help all the same.
1) Getting a Flask app running is easy; you can use the Elastic Beanstalk CLI. Generally, just follow the AWS documentation; it's fairly straightforward. With Redis and Celery, you start to have multiple things going on. Before your initial deploy, you'll probably want to set up the Celery worker; you can use this Stack Overflow answer on how to set up Celery as a daemon. Be sure to read the script, as you'll need to set your application name properly. The thing to note when you deploy to production via Elastic Beanstalk is that your application will be hosted by Apache, meaning some strange things will happen if you call your tasks via "some_task.delay", as the name of the Celery app will be unknown. As far as I know, the only way to work around this properly is to use:
my_celery_object.send_task("flask_application_name.the_task", [param1, param2], ...)
Wherever you need to call tasks.
You can now prepare your Redis cache. You can use anything; here I'll just assume you want to use AWS ElastiCache. In the ElastiCache console, deploy a cache cluster with however many nodes you want. Once deployed, you'll see it in the list under "Cache Clusters". Next, click the "X node" link in the table and copy the endpoint URL (and port!) into your Celery application's broker configuration.
So now that you have everything ready to deploy, you'll be sad to hear that a security group issue will cause your application to fail on any task requests, as the ElastiCache cluster will initially be part of the wrong security group. Go ahead and deploy; this will create the security group you need along with your app and everything else. You can then find that security group in the EC2 dashboard, under Network & Security -> Security Groups. The name of the group should be the name of your environment; something like "myapp-env" is the default. Now modify the inbound rules, add a custom TCP rule setting the port number to your Redis port and the source to "Anywhere", and save. At this point, note the group name and go to ElastiCache. Click Cache Clusters on the left, modify the CACHE CLUSTER (not the node) for the app, update the VPC security group to the one you just noted, and save.
Celery should now connect to the Redis cache automatically, as it keeps attempting to make connections for a while. Otherwise you can always redeploy.
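As a rough illustration of wiring the copied endpoint into Celery, the helper below turns a "host:port" string into a redis:// broker URL. The function name and the endpoint shown are made up, and the Celery line is left as a comment because it depends on your own application object.

```python
def broker_url_from_endpoint(endpoint: str, db: int = 0) -> str:
    """Turn a 'host:port' endpoint copied from the console into a Redis broker URL."""
    host, sep, port = endpoint.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        # The port is easy to forget when copying, so reject it early.
        raise ValueError("expected 'host:port', got {!r}".format(endpoint))
    return "redis://{}:{}/{}".format(host, int(port), db)


# Hypothetical endpoint; use the one shown for your own cache node.
BROKER_URL = broker_url_from_endpoint("my-cache.example.cache.amazonaws.com:6379")
print(BROKER_URL)

# In the Flask/Celery module this would be used along the lines of:
#   celery = Celery("flask_application_name", broker=BROKER_URL)
```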
Hopefully you now have a functioning Flask/Celery app utilizing redis.
2) You shouldn't need to know the location of your app files on the Elastic Beanstalk EC2 instance, as it will automatically use a virtual environment and requirements.txt, assuming you followed the AWS instructions. However, at the time of writing, you can always SSH into your EC2 instance and find your application files at:
/opt/python/current/app
3) I'm not sure what you mean by "If I setup and install all the requirements in a virtualenv, will they persist if the EC2 instance changes?" As I said previously, if you followed the instructions on how to deploy an Elastic Beanstalk environment for Flask, then new instances will automatically set up their environment based on your requirements.txt.
4) This is a matter of opinion. I have definitely heard that not using Elastic Beanstalk could be a better way to go; I personally have no opinion here, as I have only used Elastic Beanstalk. There have been some severe struggles (including trying to set up this application). What I hear some people do is deploy via Elastic Beanstalk to get a pre-configured, ready-to-go EC2 machine, make an AMI from that machine, tear the Elastic Beanstalk environment down, and then launch an EC2 instance from the AMI. Whichever route you take, if you are planning to have a database-backed server, I have learned (the hard way) that you shouldn't have Elastic Beanstalk automatically attach the RDS instance. The RDS instance is then tied to the Elastic Beanstalk application, so if you have to replace the resources, terminate the environment, etc., you'll lose the database (you can work around this of course, it's just a pain).

Shared File Systems between multiple AWS EC2 instances

I have a couple of windows server instances running on Amazon EC2 and would like to make them a bit more fault tolerant by running a duplicate instance with load balancers.
The problem is the specific data, as an example it does no good to fail over from one web server to another web server if the contents of the document root i.e. C:/htdocs/ (Apache) or C:/Repositories (VisualSvn Server) are not identical.
Is there a way to share a volume across two or more instances?
My idea is to share a folder between EC2 instances:
I read it's not possible to attach the same EBS volume to multiple instances. I believe also AWS is not NFS friendly either in case I want to mount them across NFS.
And finally, I've also checked S3 bucket mounted with s3fs but I found out it's not a good option too.
Can anyone help point me in the right direction?
You are right, at the moment it is not possible to attach an EBS volume to multiple instances. To create common storage for all instances, there are options like NFS, mounting S3 buckets or using a distributed cluster filesystem like GlusterFS.
However, in most cases you can simplify your setup. Try to offload static assets to another (static) domain or even host them in a website-enabled S3 bucket. This way you only have to care about the dynamic application logic or scripts on your app servers.
Also try to use some automated deployment and/or configuration management tools. With these you can for example create new machines easily, or you can use them to deploy the latest code on your machines.

Is s3cmd a safe option for syncing EC2 instances?

I have the following problem: we are working on a project on AWS which will use autoscaling, so the EC2 instances will start and die very often. Freezing images and updating the launch configurations, Auto Scaling groups, alarms, etc. takes a while, and several things can go wrong.
I just want the new instances to sync the most recent code, so I was thinking about fetching it from S3 using s3cmd once the instance finishes booting, and manually updating it every time we have new code to upload. So my questions are:
Is it too risky to store the code on S3? How secure are the files there? If I use the s3cmd encryption password, is it unlikely that someone will be able to decrypt them?
What other options would be good for this? I was thinking about rsync, but then I think I would need to store the private key for the servers on the instances, which I don't think is a good idea.
Thanks for any advice
You might be a candidate for Elastic Beanstalk - using a plain vanilla AMI.
Then package your application and use AWS's ebextensions mechanism to customize the instance when it is spun up. ebextensions will allow you to do anything you like to the instance, in place, as it is deploying: change .htaccess, erase a file, place a cron job, whatever.
When you have code updates, package them, upload and do a rolling update.
All instances will use your latest code, including auto-scaled ones.
The key concept here is to never have your real data in the instance, where it might go away if an instance dies or is shut down.
Elastic Beanstalk will allow you to set up the load balancing, auto-scaling, monitoring, etc.