Programmatically configuring an RDS instance for Elastic Beanstalk

I am trying to automate the creation of a temporary environment, and I am struggling with the creation of a related RDS instance inside Elastic Beanstalk.
I would like the environment to also spawn an RDS database when I call eb create envName.
One solution would be to do it manually. Another seems to involve .ebextensions (see Using Elastic Beanstalk .ebextensions to specify an RDS database). However, .ebextensions run at each deployment; this could fit through a specific hook, but I would like to have my config in .elasticbeanstalk/config.yml, because that is the file that holds the preconfiguration used when I call eb commands.
One of the reasons not to put it in .ebextensions is also that, in the project configuration, those files are not environment specific: they describe the requirements for the app in any environment.

I personally believe it is a bad idea to create the database as part of the environment, as it makes stopping and recreating the environment much harder to deal with if you need to. It is a lot easier and safer to keep the database running even if you want to completely destroy the environment and start all over. I believe most experienced users will recommend managing the RDS instance on your own (outside EB). It is really easy anyway.
That said (and given you didn't ask for that), you can simply create a database as part of the eb create command itself:
eb create -db -db.engine postgres -db.version 9.4 -db.user dbroot -db.i ...
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb3-create.html
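Spelled out a bit further, such a command might look like the following (a sketch; the environment name, credentials, and instance class are placeholders, the full list of --database.* options is in the linked docs, and the CLI prompts for credentials you omit):

eb create my-env -db -db.engine postgres -db.version 9.4 -db.user dbroot -db.pass mysecretpassword -db.i db.t2.micro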

Related

AWS beanstalk database choice

I tried to deploy my Django project to Beanstalk (no Docker). It seems to me that AWS RDS is my only database choice. Could I install PostgreSQL on the same Beanstalk instance?
If so, how could I install PostgreSQL myself? Using RDS is just an additional cost for me, so I am looking for a cheap solution. Possibly SQLite3 is a solution? But I hope to use PostgreSQL.
Could I install PostgreSQL on the same Beanstalk instance?
Yes, you could. But this will require a bit of manual setup, and it will be neither scalable nor really fault tolerant. With RDS you pay a premium, but you get a fully managed, highly scalable, and reliable database.
Of course, not all use cases require RDS. In that case you could install PostgreSQL on your EB instance (I assume a single-instance EB environment). For this you would need to set up a number of configuration options in .ebextensions. However, this process is not that easy, and you would tightly couple your application deployments with the DB.
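If you do go the on-instance route, a minimal .ebextensions sketch could look like this (assuming a yum-based Amazon Linux platform; the package, service, and file names vary by platform version, so treat them as placeholders):

# .ebextensions/01-postgres.config (hypothetical file name)
packages:
  yum:
    postgresql-server: []
commands:
  01_initdb:
    # initialize the data directory; on newer platforms this is
    # "postgresql-setup initdb" rather than "service postgresql initdb"
    command: service postgresql initdb
    ignoreErrors: true
services:
  sysvinit:
    postgresql:
      enabled: true
      ensureRunning: true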
As a middle ground, I think it would be better to install PostgreSQL on a separate, dedicated instance. This way your EB instance and the DB are decoupled, and easier to manage, update, back up, and scale.

Changes made via SSH WILL BE LOST if the instance is replaced by auto scaling

I need to install a gRPC PHP extension on my elastic beanstalk created EC2 instances. I have auto-scaling enabled, and when a new EC2 instance is kicked in, I lose all my installations.
From the documentation, I found two ways to fix this:
Create an instance and download everything required and take an image of that instance. And add the image id (AMI ID) in the Elastic Beanstalk environment (Under Configuration -> Instances). And every new instance created by auto-scaling will be from the image I provide. This approach never worked for me. Am I missing something here?
Write a config file in the .ebextensions to automatically install all the required extensions whenever a new instance is kicked in. And for this, we need to create a yaml/json file as per the documentation in cloud.google.com/php/grpc.
Can someone guide which approach should be taken? And help me create yaml/json file to automate the process for all the instances in auto scaling?
As per the AWS documentation here, to customize your Elastic Beanstalk environment you should use .ebextensions configuration files.
Creating .ebextensions provides the ability to completely customize the instances and environment that your application is running on/in, and makes upgrades, changes and/or additions to your instances and environment straightforward and efficient.
As a side note, SSHing into Elastic Beanstalk instances and making on-instance changes should be avoided. The autoscaling issue you are facing is one reason; the other major reason is that making changes on the instance itself causes the instance's state to drift out of sync with the state EB is expecting. If the state is out of sync, subsequent deployments can fail because the application version EB expects has drifted. Managing your application and environment through code and .ebextensions eliminates this issue.
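For the gRPC case specifically, a minimal .ebextensions sketch might look like this (assuming a PHP platform where pecl is on the PATH; the .ini path and file name are placeholders that vary by platform version, so verify them on an instance first):

# .ebextensions/grpc.config (hypothetical file name)
commands:
  01_install_grpc:
    # only run the install when "pecl info grpc" fails, i.e. the extension is absent
    command: /usr/bin/pecl install grpc
    test: '! /usr/bin/pecl info grpc'
files:
  "/etc/php.d/99-grpc.ini":
    mode: "000644"
    owner: root
    group: root
    content: |
      extension=grpc.so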

Dealing with AWS Elastic Beanstalk Multi-container databases and persistent storage

I'm new to Elastic Beanstalk, EC2, and Docker and have spent the last couple of weeks researching and playing around with them. I have a few questions that I'm finding difficult to find answers to elsewhere.
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
2) Is it better to use AWS RDS in production and then have an external database container locally?
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
I don't know if this is stated anywhere, but I am fairly sure AWS does not intend for you to use EB's multi-container setup to run databases or anything else that should run only once on your system. As their examples show, it is there to give you better control over what the front-end server will be.
If you want to run databases or store files, you will either move to AWS ECS, where you can better control this, or use multiple EB environments (e.g., create a worker-tier, single-instance environment for running the database), as sketched below.
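For instance, the eb CLI can stand up a separate single-instance, worker-tier environment directly (a sketch; db-env is a placeholder name):

eb create db-env --single --tier worker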
One thing I like is that I am able to run eb local run to boot a local environment of what will be running in production. This seems to work well until it comes to databases.
I have not used eb local run; instead I use docker-compose, which allows me to run a proper environment locally, including my databases. Yes, you may need to duplicate some information between the docker-compose file and the Dockerrun file, but once you set it up, you will see how powerful it is. Because you are sharing the Dockerfiles, you can still assume things will run in a similar enough way once deployed.
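To illustrate, a minimal docker-compose.yml along these lines pairs the app container with a local Postgres container (a sketch; the service names, image tag, port mapping, and the DATABASE_URL variable are all placeholders to adapt to your app):

version: "2"
services:
  web:
    build: .
    ports:
      - "80:80"
    environment:
      # hypothetical variable your app reads; match it to your own settings
      DATABASE_URL: postgres://postgres:postgres@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:9.4
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - dbdata:/var/lib/postgresql/data
volumes:
  dbdata: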
1) As far as I understand Elastic Beanstalk spawns instances running the containers inside, which could result in having multiple databases if Elastic Beanstalk spawns multiple instances? Is this correct?
Yes, I think that is correct. EB assumes you will use RDS, DynamoDB, or something else that is already centralized and managed.
2) Is it better to use AWS RDS in production and then have an external database container locally?
Yes, and by the way, rather than having EB manage the creation of the database, I find it a better practice to instantiate it manually, so that it persists even after you kill your EB environments.
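Creating it yourself is a single AWS CLI call (a sketch; the identifiers, credentials, and sizing here are placeholders):

aws rds create-db-instance \
    --db-instance-identifier mydb \
    --db-instance-class db.t2.micro \
    --engine postgres \
    --allocated-storage 20 \
    --master-username dbroot \
    --master-user-password mysecretpassword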
3) In terms of persisting data, I read that EBS can only mount to one EC2 instance, how do people handle storing user files, or do they have their application push to a service such as S3 directly?
Yes, using S3 is the way to go for multiple reasons, but mostly because AWS manages it and it scales without you having to worry about it. In fact, you want your client to get, or even post, the files directly to S3, so your server does not have to do any work (note the server may need to sign the URL, but that is about it).
If you really have an issue with S3 (for whatever reason), then you can also (as with the database) create a second, single-instance EB environment with an EBS volume to ensure you have a single instance. But compared to the S3 solution it won't scale very far, and it will in fact be much more expensive than using S3.
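To illustrate the signing note above, the AWS CLI can mint a time-limited download URL (a sketch; the bucket and key are placeholders, and presigned uploads need a PUT/POST URL generated through an SDK instead):

aws s3 presign s3://my-bucket/uploads/avatar.png --expires-in 3600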

Deploying Flask app that uses Celery and Redis to AWS: Elastic Beanstalk or EC2 directly?

I'm new to web development and I wrote a small Flask API that uses Celery for the message queue and Redis as the broker. I start Redis with redis-server and Celery with celery -A application.celery worker --loglevel=info on my local machine, and the app runs with no problem.
However, I couldn't get it to work on AWS. Right now I'm deploying the app following the docs, but when I try to send requests to my API I get internal server errors, which are probably related to Redis and Celery not working. I SSHed into the EC2 instance but, since I'm new, couldn't figure out what to do to get the app working.
My questions are:
1) What do I do to start my application, Redis, and Celery after deploying to AWS? Does Elastic Beanstalk do it automatically, or do I need to do some stuff?
2) Where do I find my app files? I think I'll need to install all the requirements manually from requirements.txt and set up a virtualenv on the EC2 instance; is that right?
3) If I set up and install all the requirements in a virtualenv, will they persist if the EC2 instance changes? The command line tool for Elastic Beanstalk deployed the application automatically and created a Load Balancer and Auto Scaling Group. Will the installations I make through SSH be available when new instances are created, or do I need to do that manually every time, or is there some other way?
4) I heard some people say that creating an EC2 instance and deploying manually is better than using Elastic Beanstalk. What does Elastic Beanstalk do for me? Is it better to use Elastic Beanstalk or deploy manually?
Thanks for any help!
For the past week I was trying to do the exact same thing, so I thought I'd share everything I've learned. These answers are spread around Stack Overflow and Google, but I can help all the same.
1) Getting a Flask app running is easy; you can use the Elastic Beanstalk CLI. Generally, just follow the AWS documentation here; it's fairly straightforward. In terms of Redis/Celery, you start to get multiple things going on. Before you do your initial deploy, you'll probably want to set up the Celery worker; you can use this Stack Overflow answer on how to set up Celery as a daemon. Be sure you read the script: you'll need to set your application name properly. The thing to note when you deploy to production via EB is that your application will be hosted by Apache, meaning some strange things will happen if you call your tasks via "some_task.delay", as the name of the Celery app will be unknown. As far as I know, the only way to work around this properly is to use:
my_celery_object.send_task("flask_application_name.the_task", args=[param1, param2], ...)
Wherever you need to call tasks.
You can now prepare your Redis cache. You can use anything; here I'll just assume you want to use AWS ElastiCache. In ElastiCache, you'll need to deploy a cache cluster with however many nodes you want. Once deployed, you'll see it in the list under "Cache Clusters". Next, click the "X node" link in the table; you'll need to copy the endpoint URL (and port!) into your Celery application, which you can learn about here.
So now that you have everything ready to deploy, you'll be sad to hear that the security issue I mentioned earlier will cause your application to fail on any task requests, as the ElastiCache cluster will initially be part of the wrong security group. Go ahead and deploy; this will create the security group you need along with your app and everything else. You can then find that security group in the EC2 dashboard, under Network & Security -> Security Groups. The name of the group should be the name of your environment, something like "myapp-env" by default. Now modify the inbound rules: add a custom TCP rule setting the port number to your Redis port and the source to "Anywhere", and save. At this point, note the group name and go to your ElastiCache cluster. Click "Cache Clusters" on the left, modify the CACHE CLUSTER (not the node) for the app, update the VPC security group to the one you just noted, and save.
Now Celery will automatically connect to the Redis cache, as it will keep attempting to make connections for a while. Otherwise you can always redeploy.
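Rather than hard-coding the endpoint, you could also pass it to the app as an environment variable (a sketch; CELERY_BROKER_URL and the endpoint value are placeholders, with the endpoint copied from the ElastiCache console):

eb setenv CELERY_BROKER_URL="redis://my-cluster.abc123.0001.use1.cache.amazonaws.com:6379/0"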
Hopefully you now have a functioning Flask/Celery app utilizing redis.
2) You shouldn't need to know the location of your app files on the EB EC2 instance, as it will automatically use a virtual environment and requirements.txt, assuming you followed the instructions found here. However, at the time of writing, you can always SSH into your EC2 instance and find your application files at:
/opt/python/current/app
3) I'm not sure what you mean by "If I set up and install all the requirements in a virtualenv, will they persist if the EC2 instance changes?" As I said previously, if you followed the instructions on how to deploy an EB environment for Flask, then new instances that are deployed will automatically update their environment based on your requirements.txt.
4) This is a matter of opinion. I have definitely heard that not using EB could be a better way to go; I personally have no opinion here, as I have only used EB, though there have been some severe struggles (including trying to set up this application). What I hear some people do is deploy via EB so that they get a pre-configured, ready-to-go EC2 machine, make an AMI from that machine, tear the EB environment down, and then launch an EC2 instance from the AMI. Regardless of the route you go, if you are planning to have a database-backed server, I have learned (the hard way) that you shouldn't have EB automatically attach the RDS instance. This is because the RDS instance is then associated with the EB application, so if you have to replace the resources, terminate the environment, etc., you'll lose the RDS instance (you can work around this of course; it's just a pain is all).

AWS Elastic Beanstalk Backup & Recovery

I am new to AWS EB and I am trying to figure out how to back up and restore an entire EB environment. I created an AMI based on the EC2 instance generated by EB, and took a snapshot of the RDS instance, also created by EB.
The problem I have is: how do I restore it, assuming this is the correct approach to backup? Also, I am doing it manually; shouldn't there be an automated way of doing this within EB? By the way, when I created the AMI, it destroyed the source instance, and EB just created a new EC2 instance without all my changes.
How do I save & restore configuration changes to my application that impact both filesystem and database?
Unfortunately, AWS Elastic Beanstalk (EB) does not support restoring databases that contain live data if those databases were created with EB. If you reload (AKA AWS "deploy") the saved EB configuration, you get a blank database!
I called them, and they told me to create the RDS DB separately and update the application code to connect to the DB once you know its name. If you restore the RDS DB, it will have a new name too! So you have to update your code again to connect to it.
Also, if your code and environment are fine but you want to restore your database, it will again have a new name, and you will need to change your code.
How to change your code easily and automatically deploy it is a whole other question for which I don't have an answer yet.
So basically the RDS DB provisioning within Elastic Beanstalk has very limited uses (maybe coding, debugging, and testing), but it is not suitable for live production use. :(
This is as of Jan 2015.
First go into your EB environment and save the current config. Then go to a running EC2 instance created by EB and make an image (AMI) of it. Then use that new AMI ID by going to the EB configuration and setting it. This will rebuild the environment, tearing down all running instances and creating new ones.
For your RDS instance, you should make a backup and restore to a new instance name, as the docs say you will lose it if the environment is destroyed. You should probably just manually set the environment variables that the RDS integration normally sets, and set up the proper security groups between RDS and EC2.
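With the AWS CLI, that snapshot-and-restore round trip looks roughly like this (a sketch; the instance and snapshot identifiers are placeholders):

aws rds create-db-snapshot --db-instance-identifier mydb --db-snapshot-identifier mydb-backup
aws rds restore-db-instance-from-db-snapshot --db-instance-identifier mydb-restored --db-snapshot-identifier mydb-backup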
One option I think could work: rename the RDS instance (at which point the environment seems to break), destroy the environment, create a new one with an attached RDS instance, then destroy that new database and rename the old one to take the new one's name. That may work.
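The rename step itself can be done from the CLI (a sketch; the identifiers are placeholders, and note the instance reboots when its identifier changes):

aws rds modify-db-instance --db-instance-identifier mydb-old --new-db-instance-identifier mydb --apply-immediately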
As always make proper backups before proceeding with any of these ideas.