I have a Django app that runs on the AWS EB environment. With recent updates, I had to integrate django-rq and rqscheduler for some queue-based background tasks. This all works fine on localhost with commands rqworker and rqscheduler. But I am having real trouble finding a way to make it run on the AWS EB environment. My analysis says the only way to go is to use ElastiCache. Can anyone guide me in the right direction or any blog posts that could help me with this?
Yeah! So you're probably going to want to separate your persistent store (Redis) from your workers. This is really well abstracted in Heroku (not saying you should necessarily use them, but their UI reflects reality very well) with Resources (not restarted between deploys) and Dynos (restarted between deploys).
You should likely have an ElastiCache (or self-hosted Redis) instance for each of your deployed environments (production, staging, etc.) with any URLs/credentials via YAML. That way, you won't lose jobs when your service is rebooted (because Redis will still be alive) but you can deploy new code whenever you want!
Related
I'm new to web development and i wrote a small Flask API that uses Celery for message queue and Redis as the broker. I start redis with redis-server and Celery with celery -A application.celery worker --loglevel=info on my local machine and the app runs with no problem.
However i couldn't get it to work on AWS. Right now I'm deploying the app following the docs but when I try to send requests to my API I get internal server errors, which are probably related to Redis and Celery not working. I SSH'ed into the EC2 instance but since I'm new, couldn't find what to do to get the app working.
My questions are:
1) What do i do to start my application, Redis and Celery after deploying it to AWS? Does Elastic Beanstalk do it automatically or do I need to do some stuff?
2) Where do I find my app files? I think I'll need to install all the requirements manually from requirements.txt, and set up a virtualenv in the EC2 instance, is that right?
3) If I setup and install all the requirements in a virtualenv, will they persist if the EC2 instance changes? The command line tool for Elastic Beanstalk deployed the application automatically and created Load Balancer and Auto Scaling Group. Will the installations I make through the SSH be available when new instances are created, or do I need to manually do that everytime, or is there some other way?
4) I heard some people say that creating an EC2 instance and deploying manually is better than using Elastic Beanstalk. What does Elastic Beanstalk do for me? Is it better if I use Elastic Beanstalk or deploy manually?
Thanks for any help!
For the past week I was trying to do the exact same thing, so I'd thought I'd share everything I've learned. Although these answers are spread about stackoverflow/google, but I can help all the same.
1) To get a flask app running is easy, you can use the elastic beanstalk CLI. Generally, just follow the AWS documentation here, it's fairly straightforward. In terms of Redis/Celery, you start to get multiple things going on. Before you do your initial deploy, you'll probably want to setup the celery worker, you can use this stackoverflow answer on how to setup celery as a daemon. Be sure you read the script, you'll need to set your application name properly. The thing to note when you deploy to production via EBS is that your application will be hosted by apache, meaning some strange things will happen if you call your tasks via "some_task.delay", as the name of the celery app will be unknown. As far as I know, the only way to work around this properly is to use:
my_celery_object.send_task("flask_application_name.the_task", [param1, param2], ...)
Wherever you need to call tasks.
You can now prepare your redis cache. You can use anything, for this I'll just assume you want to use AWS ElasticCache (EC). Going to EC, you'll need to deploy a cache cluster with however many nodes you want. Once deployed you'll see it on the list under "Cache Clusters". Next, click the "X node" link that's in the table, you'll need to copy the endpoint url (and port!) to your celery application which you can learn about here.
So now that you have everything ready to deploy, you'll be sad to hear that the security thing I mentioned earlier will cause your application to fail on any task requests as the elastic cache cluster will be part of the wrong security group initially. Go ahead and deploy, this will create the security group you need along with your app and everything else. You can then find that security group under the EC2 dashboard, under Network & Security -> Security Groups. The name of the group should be the name of your environment, something like "myapp-env" is the default. Now modify the inbound rules and add a custom TCP rule setting the port number to your redis port and the source to "Anywhere", save. At this point, note the group name and go to your elastic cache. Click the Cach Clusters on the left, modify the CACHE CLUSTER (not the node) for the app, and update the VPC security group to the one you just noted and save.
Now celery will automatically connect to the redis cache as it will keep attempting to make connections for awhile. Otherwise you can always redeploy.
Hopefully you now have a functioning Flask/Celery app utilizing redis.
2) You shouldn't need to know the location of your app files on the EBS EC2 instance as it will automatically use a virtual environment and requirements.txt assuming you followed the instructions found here. However, at the time of writing this, you can always ssh to your EC2 instance at find your application files at:
/opt/python/current/app
3) I don't know how you mean "If I setup and install all the requirements in a virtualenv, will they persist if the EC2 instance changes?" As I said previously, if you followed the instructions on how to deploy an EBS environment for flask, then new instances that are deployed will automatically update their environment based on your requirements.txt
4) This is a matter of opinion. I have definitely heard not using EBS could be a better way to go, I personally have no opinion here as I have only used EBS. There have been some severe struggles (including trying to setup this application). What I hear some people do is deploy via EBS so that they can get a pre-configured and ready to go EC2 machine and then make an AMI from that machine, tear the EBS down, and then make an EC2 with the AMI. Regardless of the route you go, if you are planning to have a database backed server, I have learned (the hard way) that you shouldn't have EBS automatically attach the RDS. This is due to the fact that the RDS is then associated with the EBS application, so if you have to replace the resources, terminate it, etc., you'll lose the RDS (you can work around this of course, it's just a pain is all).
I've been looking into Mesos, Marathon and Chronos combo to host a large number of websites. In my head I should be able to type a few commands into my laptop, and wait about 30 minutes for the thing to build and deploy.
My only issue, is that my resources are scattered across multiple data centers, numerous cloud accounts, and about 6 on premises places. I see no reason why I can't control them all from my laptop -- (I have serious power and control issues when it comes to my hardware!)
I'm thinking that my best approach is to build the brains in the cloud, (zoo keeper and at least one master), and then add on the separate data centers, but I am yet to see any examples of a distributed cluster, where not all the nodes can talk to each other.
Can anyone recommend a way of doing this?
I've got a setup like this, that i'd like to recommend:
Source code, deployment scripts and dockerfiles in GIT
Each webservice has its own directory and comes together with a dockerfile to containerize it
A build script (shell script running docker builds) builds all the docker containers, of which all images are pushed to a docker image repository
A ansible deploy deploys all the containers remotely to a set of VPSes. (You use your own deployment procedure, that fits mesos/marathon)
As part of the process, a activeMQ broker is deployed to the cloud (yep, in a container). While deploying, it supplies each node with the URL of the broker they need to connect to. In your setup you could instead use ZooKeeper or etcd for example.
I am also using jenkins to do automatic rebuilds and to run deploys whenever there has been GIT commits, but they can also be done manually.
Rebuilds are lightning fast, and deploys dont take much time either. I can replicate everything I have in my repository endlessly and have zero configuration.
To be able to do a new deploy, all I need is a set of VPSs with docker daemons, and some datastores for persistence. Im not sure if this is something that you can replace with mesos, but ansible will definitely be able to install a mesos cloud for you onto your hardware.
All logging is being done with logstash, to a central logging server.
i have setup a 3 master, 5 slave, 1 gateway mesos/marathon/docker setup and documented here
https://github.com/debianmaster/Notes/wiki/Mesos-marathon-Docker-cluster-setup-on-RHEL-7-with-three-master
this may help you in understanding the load balancing / scaling across different machines in your data center
1) masters can also be used as slaves
2) mesos haproxy bridge script can be used for service discovery of the newly created services in the cluster
3) gateway haproxy is updated every min with new services that are created
This documentation has
1) master/slave setup
2) setting up haproxy that automatically reloads
3) setting up dockers
4) example service program
You should use Terraform to orchestrate your infrastructure as code.
Terraform has a lot of providers that allows you to manage different resources accross multiples clouds services and/or bare-metal resources such as vSphere.
You can start with the Getting Started Guide.
Does anybody know how can I sync the plugins of a Wordpress deployed to AWS Elastic Beanstalk between all deployed instances?
I can easily deploy a Wordpress to EBeanstalk (even AWS give us a template for this) but if I install a plugin, it is installed in only one instance. There's no guarantee that any other user (or even myself) will access that instance again with the autoscaling happening. I already tested it and I could not access it in another instances.
I know the uploads folder will have the same problem, but there are already plugins to sync it with S3.
Thank you!
Research "stateless wordpress". I found a good starting point on Mike Otreva's site. In short, the solution involves building an example AWS Elastic Beanstalk environment and then using GIT to push your WP build. You'll need to set-up up a local Xamp environment in order to be able to GIT push your build. Once you do it this way, you'll never update plugins from the dashboard, instead you'll update your local build and then push the updates to your environment. You might even want to install a plugin that stops all automatic updates. I hope that helps some.
As a sysadmin, i'm looking for an efficient way or best practices that you do on managing an ec2 instances with autoscaling.
How you manage automate this following scenario: (our environment is running with autoscaling, Elastic Load Balancing and cloudwatch)
patching the latest version of the rpm packages of the server for security reasons? like (yup update/upgrade)
making a configuration change of the Apache server like a change of the httpd.conf and apply it to all instances in the auto-scaling group?
how do you deploy the latest codes to your app to the server with less disruption in production?
how do you use puppet or chef to automate your admin task?
I would really appreciate if you have anything to share on how you automate your administration task with aws
Check out Amazon OpsWorks, the new Chef based DevOps tool for Amazon Web Services.
It gives you the ability to run custom Chef recipes on your instances in the different layers (Load Balancer, App servers, DB...), as well as to manage the deployment of your app from various source repositories (Git, Subversion..).
It supports auto-scaling based on load (like the auto-scaling that you are already using), as well as auto-scaling based on time, which is more complex to achieve with standard EC2 auto-scaling.
This is relatively a young service and not all functionality is available already, but it might be useful for your.
patching the latest version of the rpm packages of the server for
security reasons? like (yup update/upgrade)
You can use puppet or chef to create a cron job that takes care of this for you (the cron would in its most basic form download and or install updates via a bash script). You may want to automatically upgrade, or simply notify an admin via email so you can evaluate before apply updates.
making a configuration change of the Apache server like a change of
the httpd.conf and apply it to all instances in the auto-scaling
group?
I usually handle all of my configuration files through my Puppet manifest. You could setup each EC2 instance to pull updates from a Puppet Server, then you can roll out changes on demand. Part of this process should be updating the AMI stored in your AutoScale group (this is done with the Amazon Command Line tools).
how do you deploy the latest codes to your app to the server with less
disruption in production?
Test it in staging first! Also a neat trick is to versioned deployments, so each time you do a deployment it gets its own folder (/var/www/v1 /var/www/v2 etc) and once you have verified the deployment was successful you simply update a symlink to point to the lastest version (/var/www/current points to /var/www/v2).
OpsWorks handles all this sort of stuff for you so you can look into that if you don't want to do it all yourself.
how do you use puppet or chef to automate your admin task?
You can use Chef or Puppet to do all sorts of things, and anything they can't (or you don't know how to) do can be done via a bash/python script that you invoke from Chef or Puppet.
I normally do things like install packages, build custom packages, set permissions, download things, start services, manage configuration files, setup cron jobs etc
I would really appreciate if you have anything to share on how you automate your administration task with aws
Look into CloudFormation. This can help you setup all your servers and related services (think EC2, LBS, CloudWatch) through configuration files, thus helping you to automate your entire stack (not just the EC2's Operating System).
There are quite a few resources on deployments of AMI's on EC2. But are there any solutions to incremental code updates to a PHP/Java based website?
Suppose I have 10 EC2 instances all running PHP / Java based websites with docroots local to the instance. I may want to do numerous code deployments to it through out the day.
I don't want to create a new AMI copy and scale that up to new instances each time I have a code update.
Any leads on how to best do this would be greatly appreciated. We use subversion as our main code repository and in the past we've simply done an SVN update/co when we were on one to two servers.
Thanks.
You should check out Elastic Beanstalk. Essentially you just package up your WAR or other code file, upload it to a bucket via AWS's command line/Eclipse integration and the deployment is performed automatically.
http://aws.amazon.com/elasticbeanstalk/
Elastic Beanstalk is exactly designed to do this for you. We use the Elastic Beanstalk java/tomcat flavor but it also has support for php, ruby, python environment. It has web console that allows you to deploy code (it even keeps history of it), it also has git tool to deploy code from command line.
It also has monitoring, load balancer, auto scaling all built in. Only a few web form entries to control all these.
Have you considered using a tool designed to manage this sort of thing for you, Puppet is well regarded in this area.
Have a look here:
https://puppetlabs.com/puppet/what-is-puppet/
(No I am not a Puppet Labs employee :))
Capistrano is a great tool for deploying code to multiple servers at once. Chef and Puppet are great tools for setting up those servers with databases, webservers, etc.
Go for a Capistrano . Its a good way to deploy your code on multiple servers .
As already mentioned Elastic Beanstalk is a good option if you just want a webserver and don't want to worry about the details.
Also, take a look at AWS CodeDeploy. You can have much more control over the lifecycle of your instance and you'd be looking at something very similar to what you have now (a set of EC2 instances that you setup). You can even get automatic deployments on instance launch with Auto Scaling.
You can either use Capsitrano or TravisCI.