I have a Django application. I am using Celery to run long-running processes in the background. Both the application and the Celery workers run on the same machine.
Now we are moving our servers to AWS. On AWS, we want to create a setup like the following:
We have n EC2 instances that run the app servers, and we have m EC2 instances as workers. When we need to run a long-running process, an app server sends the job to a worker, and the worker processes it. But the job depends on Django models and the database.
How can we set up the workers so that they can run these Django-model-dependent jobs?
This is not AWS specific.
You have to:
make sure every server runs the same version of the app code
all workers, across all servers, use the same task broker and result backend
workers can connect to your DB (if needed)
More detailed configuration advice would need additional info, but a minimal shared setup might look like the sketch below :)
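A minimal sketch, assuming Redis as the broker/result backend and a standard Django + Celery layout; the hostnames, project name, and credentials are hypothetical placeholders for your shared infrastructure:

```python
# settings.py -- shared by every app server and worker; hostnames are
# hypothetical placeholders for your shared broker and database.
CELERY_BROKER_URL = "redis://broker.internal:6379/0"      # same broker everywhere
CELERY_RESULT_BACKEND = "redis://broker.internal:6379/1"  # same result backend

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "db.internal",  # e.g. an RDS endpoint reachable from the workers
        "NAME": "myapp",
        "USER": "myapp",
        "PASSWORD": "...",
    }
}

# proj/celery.py -- the usual django/celery bootstrap, so tasks can use the ORM.
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()
```

Each worker instance then runs `celery -A proj worker` against the same code checkout, so tasks can import and use your Django models directly.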
Another approach would be to use EC2 Container Service (ECS) with two different Docker containers, one for the app and one for the worker.
I have a Node.js app that allows the user to run Octave code through an API by integrating with the Octave CLI. The app has the following components:
API - used for uploading the .m (Octave) file and emitting a message to the SQS queue with a new task (to run the .m file).
SQS Queue - used for publishing and subscribing messages between the API and the Worker.
Worker - responsible for processing tasks from the SQS queue. For every message, it uses the Docker API to execute a new Octave Docker instance (see the sketch after this list).
Octave Instance Overseer - an EC2 machine with the docker.sock exposed to the Worker, allowing the Worker to execute an Octave instance with the given parameters and wait for its output.
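For concreteness, the Worker's poll-and-run loop currently looks roughly like this (sketched in Python for brevity; the actual app is Node.js, and the queue URL and image name are hypothetical):

```python
# Sketch of the Worker: poll SQS, run one short-lived Octave container per
# message via the Docker API (docker.sock). Names are hypothetical.
import boto3
import docker

sqs = boto3.client("sqs")
dock = docker.from_env()  # talks to the docker.sock bound into the Worker
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/octave-tasks"

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        # msg["Body"] carries the path of the uploaded .m file to execute.
        output = dock.containers.run("octave-runner",
                                     command=["octave", msg["Body"]],
                                     remove=True)  # returns the run's stdout
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])
```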
Currently, the API, the Worker, and the Octave Docker instance run on a single EC2 machine, inside Docker, and I want to scale this solution to handle multiple Workers and Octave Docker instances.
Using docker.sock made sense when the project was created, as everything was running on the same machine and docker.sock was already exposed to the Worker just by binding the volume, but this doesn't seem like the right approach for scaling the application.
Using AWS Fargate to abstract away the Octave Instance Overseer doesn't seem to be a good idea, as Fargate bills a minimum of one minute and most of the executions last only a couple of seconds. Another approach would be to have an EC2 machine running only the Octave Instance Overseer, and scale the number of such EC2 machines as needed.
What architecture or service do you recommend to efficiently run short-lived Octave Docker instances while scaling the solution?
I am a newbie with AWS and totally confused about deployment. Here I have:
React for the front-end, Node.js for the API, MongoDB for the database, and Redis for the session store.
Can I use one EC2 instance for every service? Or
divide every service onto a different EC2 instance?
Can I use an Elastic Beanstalk environment?
Which is the better option for scaling and updating without downtime in the future?
Can I use 1 EC2 for every service?
It depends on your case, but the best way to utilize the underlying EC2 instance is to run multiple services on a single instance, e.g. the Node.js API and the front-end app together, since container-based Node.js applications take maximum advantage of this setup. In that case, an ECS blue/green deployment with dynamic container ports can help you scale with zero downtime.
Divide every service onto a different EC2 instance?
For the Node.js application this approach does not help you a lot, whereas for Redis and MongoDB it makes sense if you are planning for clustering and replicas; these services also need persistent storage, so you would keep storage on each instance. My suggestion is to run Redis and MongoDB with the daemon strategy and the application with the replica strategy, since it is the application that goes through blue/green deployments, not Redis or the DB.
AWS provides two task scheduling strategies to deal with such cases:
REPLICA — The replica scheduling strategy places and maintains the desired number of tasks across your cluster. By default, the service scheduler spreads tasks across Availability Zones. You can use task placement strategies and constraints to customize task placement decisions. For more information, see Replica.
DAEMON — The daemon scheduling strategy deploys exactly one task on each active container instance that meets all of the task placement constraints that you specify in your cluster. When using this strategy, there is no need to specify a desired number of tasks, a task placement strategy, or to use Service Auto Scaling policies. For more information, see ecs_services.
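As an illustration, here is a hedged boto3 sketch of creating one service of each type (the cluster, service, and task definition names are assumptions):

```python
import boto3

ecs = boto3.client("ecs")

# DAEMON: exactly one task per container instance; no desiredCount needed.
ecs.create_service(
    cluster="my-cluster",          # hypothetical names throughout
    serviceName="redis-daemon",
    taskDefinition="redis:1",
    schedulingStrategy="DAEMON",
)

# REPLICA: the scheduler maintains desiredCount tasks across the cluster.
ecs.create_service(
    cluster="my-cluster",
    serviceName="api",
    taskDefinition="api:1",
    desiredCount=3,
    schedulingStrategy="REPLICA",
)
```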
I am using Amazon Web Services ECS (Elastic Container Service).
My task definition contains the application + Redis + Celery, and these containers are defined in the task definition. Automatic scaling is set up, so at the moment there are three instances with the same mirrored infrastructure. However, there is a demand for scheduled tasks, so Celery Beat would be a great tool, since Celery is already in my infrastructure.
But here is the problem: if I add a Celery Beat container alongside the other containers (add it to the task definition), it will be mirrored, and multiple instances will execute the same scheduled tasks at the same moment. What would be a solution to this infrastructure problem? Should I create a separate service?
We use single-beat to solve this problem and it works like a charm:
Single-beat is a nice little application that ensures only one instance of your process runs across your servers.
Something like celerybeat (or some kind of daily mail sender, orphan file cleaner etc...) needs to be running on only one server, but if that server goes down, well, you go and start it on another server etc.
You should still set the number of desired tasks for the service to 1.
You can use an ECS task placement strategy to place your Celery Beat task and choose "One Task Per Host". Make sure to set the desired count to 1. This way, your Celery Beat task will run in only one container in your cluster.
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html
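For illustration, a boto3 sketch of such a service (names are hypothetical): a desired count of 1 plus the distinctInstance placement constraint, which is what the console's "One Task Per Host" maps to:

```python
import boto3

ecs = boto3.client("ecs")

# Celery Beat as its own service: exactly one copy, never two on the same host.
ecs.create_service(
    cluster="my-cluster",            # hypothetical names
    serviceName="celery-beat",
    taskDefinition="celery-beat:1",
    desiredCount=1,                  # only one beat task in the cluster
    placementConstraints=[{"type": "distinctInstance"}],  # "One Task Per Host"
)
```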
The desired count is the number of tasks you want to run in the cluster. You may set "Number of tasks" while configuring the service or in the run-task section. You may refer to the links below for reference.
Configuring a service:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service.html
Run task:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html
Let me know if you find any issue with it.
I'm planning on transferring an application from Heroku to AWS Elastic Beanstalk. On Heroku, I have two different applications, one for staging and the other for production, and both have their own web and worker dynos.
I'd like to set up something like that on AWS EB. I've read about the difference between the Web Tier and the Worker Tier, but here are some questions:
Do I set up two different applications for production and staging, or the same application with two different environments? If the latter, would I have to create 4 environments, two for the production web/worker and two for the staging web/worker? What's the correct structure? I'll use the same Rails application for web and worker; in that case, will I have to deploy them separately, or is there a command to deploy both environments together?
I'll use the same Rails application for web and worker.
This tells me that you should have a single application. Applications manage application versions, which are basically just deployment history.
You will want to create 4 environments. This allows you to "promote to prod" by CNAME swapping, or by deploying a previously deployed version.
You will have to deploy your web/worker environments separately, but you could very easily create a script that deploys to both at the same time.
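For example, a tiny helper along these lines (the environment names are assumptions) deploys the same version to both with the EB CLI:

```python
#!/usr/bin/env python3
# Hypothetical helper: deploy the current app version to both the web and
# worker environments via the EB CLI. Environment names are assumptions.
import subprocess

for env in ("myapp-prod-web", "myapp-prod-worker"):
    subprocess.run(["eb", "deploy", env], check=True)
```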
For future reference, AWS Elastic Beanstalk later added a solution for this called Environment Links:
https://aws.amazon.com/about-aws/whats-new/2015/11/aws-elastic-beanstalk-adds-support-for-environment-links/
With that feature, we're now able to link both environments to the same code (so we deploy it only once instead of twice). To make the worker environment run the worker process and the web environment run the web server, you can set different environment variables and customize the EB initialization scripts to check for those variables and run the appropriate process.
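For example, a startup dispatch along these lines could be invoked from the EB init scripts (PROCESS_TYPE and the commands are assumptions, not EB built-ins):

```python
# Hypothetical dispatch: each EB environment sets PROCESS_TYPE, and the same
# codebase starts either the web server or the worker accordingly.
import os
import subprocess

if os.environ.get("PROCESS_TYPE") == "worker":
    subprocess.run(["bundle", "exec", "sidekiq"], check=True)  # worker process
else:
    subprocess.run(["bundle", "exec", "puma"], check=True)     # web server
```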
I have deployed my application using Elastic Beanstalk, since this gives me a very easy deployment flow to multiple instances at once using "git aws.push".
I'd like to add background-processing support to my application. The background worker will use the same codebase and simply start a long-lived PHP script that continuously looks for tasks to execute. Which AWS service should I use to create such a worker instance?
Should I use EB for this as well, or should I try to set up a standard EC2 instance (since I don't need it to be publicly available)? I guess that's the right way of doing it, and then create a deployment flow that makes it easy to deploy both to my EC2 worker instances and to the Elastic Beanstalk app? Or is there a better way of doing this?
AWS EB now supports Worker environments. They're just a different kind of environment, with these two differences:
They don't have a CNAME prefix (whatever.elasticbeanstalk.com)
Instead, they have an SQS queue bound to them
On each instance, they run a daemon called sqsd which basically polls the environment's SQS queue and forwards each message to the local HTTP server.
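To give an idea of the contract: sqsd POSTs each queue message to your app, and a 200 response acknowledges it (sketched in Python/Flask for brevity; the same contract applies to a PHP app):

```python
# Minimal sketch of the endpoint sqsd forwards SQS messages to. Returning 200
# tells sqsd the message was handled; other statuses make it retry the message.
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle_task():
    job = request.get_json(force=True)  # the SQS message body
    # ... do the long-running work here ...
    return "", 200
```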
I believe it is worth a try.
If the worker is just polling a queue for jobs and does not require an ELB, then all you need to work with is EC2, SQS, and probably S3. You can start EC2 instances as part of an Auto Scaling group that, for example, is configured to scale as a function of the depth of the SQS queue. When there is no work to do, you can run a minimum number of EC2 instances, but if the queue gets deep, auto scaling will spin up more.
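As a sketch of the queue-depth trigger (the queue name, threshold, and scaling policy ARN are assumptions):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm: when the backlog stays above 100 visible messages for
# two minutes, fire the Auto Scaling scale-out policy.
cloudwatch.put_metric_alarm(
    AlarmName="worker-queue-deep",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "worker-jobs"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:..."],  # your scale-out policy ARN
)
```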