How to set leader attribute to an AWS Beanstalk instance?

I have the following configuration in AWS Elastic Beanstalk:
Environment type: Load balanced, auto scaling
Number of instances: 1 - 4
When a new instance is created, a crontab is added for it too, so duplicate crons end up executing. How can I set the crontab to run on only one instance?
I am using .ebextensions in my project.

You can't specify which instance is assigned the leader flag; there is an election process that determines which instance "wins".
That being said, you can use the leader_only flag in your .ebextensions/crontab.config file when you create the crontab. It might look something like this:
container_commands:
  01_create_crontab:
    command: "cat .ebextensions/mycrontab > /etc/cron.d/mycrontab && chmod 644 /etc/cron.d/mycrontab"
    leader_only: true
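For reference, the source file copied above (.ebextensions/mycrontab, a name assumed for this example) would be a standard cron.d-style file; note that files under /etc/cron.d must include a user field:

# .ebextensions/mycrontab (hypothetical example)
# m h dom mon dow user command
*/10 * * * * webapp /usr/bin/php /var/app/current/cron/run.php >> /var/log/cron.log 2>&1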

Rather than relying on setting up cron jobs at deploy time, which is unreliable (AWS may reassign the leader after deployment), you should think about checking or setting the leader instance in the actual job code.
See https://github.com/dignoe/whenever-elasticbeanstalk/blob/master/bin/ensure_one_cron_leader (it's Ruby, not PHP, but the code is readable).
Deploy the crontab definition to all instances, but when the job is triggered:
* use a distributed mutex lock (either Redis/ElastiCache or database-backed) - only one worker will be able to acquire it and proceed (see the sketch below);
* OR, possibly more complicated to code: check on all instances whether they are the leader; if there is no leader, elect one, and then continue the job only on the leader.
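As a rough illustration of the mutex approach, here is a minimal Python sketch using Redis's atomic SET NX EX (the endpoint, key name, and TTL are assumptions; the same idea ports directly to PHP or to a database row):

import socket
import redis

r = redis.Redis(host="my-elasticache-endpoint", port=6379)  # assumed endpoint

def run_once_across_fleet(job):
    # SET key value NX EX: exactly one instance wins; the lock expires after 300s
    token = socket.gethostname()
    if r.set("cron-lock:nightly-report", token, nx=True, ex=300):
        job()  # we hold the lock, so do the work
    # otherwise another instance won the lock and this one exits quietly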
Alternatively, you could switch to the Worker Tier or an SNS/SQS-based setup. Set up the schedule in cron.yaml and define a POST endpoint that accepts requests from localhost only (there is a local daemon process on the worker server, sqsd, that listens for SQS messages and triggers jobs via POST). Not an ideal solution, but it's a trade-off, and there are other benefits to the AWS EB platform.
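A minimal cron.yaml for such a worker environment might look like this (the job name, URL, and schedule are placeholders):

version: 1
cron:
 - name: "nightly-report"            # unique job name
   url: "/scheduled/nightly-report"  # endpoint that sqsd POSTs to
   schedule: "0 3 * * *"             # standard cron syntax; here 03:00 UTC daily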

Related

How to run cron job only on single instance in AWS AutoScaling?

I have scheduled 2 cron jobs for my application.
My application server is in an Auto Scaling group, and I keep a minimum of 2 instances for high availability. Everything works fine, except that each cron job runs multiple times because of the 2 instances in the group.
I cannot limit the group to a single instance because the application is already in production and I prefer to keep it highly available.
How can I limit the cron job to execute on a single instance? Or should I use other services such as AWS Lambda or AWS Elastic Beanstalk?
Firstly, you should consider whether running the crons on these instances is suitable at all. If you're trying to keep the fleet highly available and it is interacted with directly by customers, what will the performance impact of the crons be?
Perhaps consider using a separate Auto Scaling group or instance, with a total of one instance, to run these crons. You could launch the instance or update the Auto Scaling group just before the cron needs to run, and then automate the shutdown after it has completed.
Otherwise you would need a locking mechanism in your script: the script writes a lock to record that it is in progress, and at the beginning of each run it checks whether a lock is already held. To further reduce the chance of a collision between multiple servers, consider adding jitter (a random number of seconds of sleep) to the start of your script.
Suitable technologies for writing a lock include:
* DynamoDB, using strongly consistent reads (see the sketch below).
* EFS for a Linux application, or FSx for a Windows application.
* S3, using strong consistency.
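As a minimal sketch of the DynamoDB option, assuming a table named cron-locks with partition key lock_id (both names invented for this example), a conditional write guarantees that only one instance acquires the lock:

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("cron-locks")  # assumed table name

def try_acquire_lock(lock_id):
    try:
        # Conditional put: fails atomically if another instance already wrote this item
        table.put_item(
            Item={"lock_id": lock_id},
            ConditionExpression="attribute_not_exists(lock_id)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another server holds the lock
        raise

if try_acquire_lock("nightly-job-2024-01-01"):
    run_job()  # hypothetical job function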
The solutions suggested by Chris Williams sound reasonable if using a Lambda function is not an option.
One way to simulate a cron job is by using CloudWatch Events (now known as EventBridge) in conjunction with AWS Lambda.
First you need to write a Lambda function with the code that needs to be executed on a schedule.
You can then use a schedule expression with EventBridge/CloudWatch Events in the same way as a crontab entry (cron syntax is supported) and set the Lambda function as the target.
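As a sketch of that wiring with boto3 (the rule name, function name, and ARN are placeholders):

import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-scheduled-job"  # placeholder

# AWS cron expressions use six fields; this one fires daily at 03:00 UTC
rule = events.put_rule(Name="nightly-job", ScheduleExpression="cron(0 3 * * ? *)")

# Allow EventBridge to invoke the function, then attach it as the rule's target
lambda_client.add_permission(
    FunctionName="my-scheduled-job",
    StatementId="allow-eventbridge",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)
events.put_targets(Rule="nightly-job", Targets=[{"Id": "1", "Arn": LAMBDA_ARN}])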
You can enable termination protection on one of the instances and attach the necessary role and permissions for Systems Manager. Once the instance shows up as a managed instance in Systems Manager, you can create a scheduled event in CloudWatch to run SSM documents. If you are running a bash script, convert it to an SSM document and set that document as the target, or use the AWS-RunShellScript document to run the commands directly.
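For illustration, invoking the stock AWS-RunShellScript document against a single, fixed instance might look like this (the instance ID and script path are placeholders):

import boto3

ssm = boto3.client("ssm")

# Target exactly one (termination-protected) instance so the job runs once
ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],  # placeholder instance ID
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["/opt/scripts/nightly-job.sh"]},  # placeholder script
)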

What is the easiest way to launch parallel jobs on AWS?

My use case is as follows:
I have a python script which:
1. reads a file from S3
2. processes the file and outputs a new file
3. saves the output file to S3 (or maybe a database)
The python script has some dependencies which are managed via virtualenv.
What is the recommended/easiest way of running these scripts in parallel on AWS?
I see the following options:
AWS Batch: looks really complicated - I have to build my own Docker container, set up 3 different users, and it's not easy to debug.
AWS Lambda: a bit easier to set up, but I still have to wrap my script up into a Lambda function, and debugging doesn't seem too straightforward.
Slurm on manually spun-up EC2 instances - from a user perspective this is ideal: all I would have to do is create a jobs.sbatch file which loads the virtualenv and runs the script. The main drawback is that I have to install and configure Slurm.
What is the recommended way of handling this workflow?
Lambda will be suitable for you because you won't have to worry about scaling or setting everything up. As for debugging, you can easily do it using sls wsgi serve.
You can use a publish/subscribe mechanism with an SQS queue containing the object key to work on. Then you can have a group of EC2 instances or ECS tasks, each subscribing to the queue and performing the single operation. The queue ensures that each process works on a single instance of the problem. In ECS you can create an Auto Scaling group and change the number of machines to tune performance against cost.
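A minimal consumer loop for this pattern might look like the following sketch (the queue URL and the process_object function are assumptions):

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

while True:
    # Long-poll for up to 20 seconds; each message body carries an S3 object key
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        process_object(msg["Body"])  # hypothetical: read from S3, process, save output
        # Delete only after successful processing, so failed work is retried
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])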

Celery Beat on Amazon ECS

I am using Amazon Web Services ECS (Elastic Container Service).
My task definition contains the application + Redis + Celery, and these containers are defined in the task definition. Automatic scaling is set up, so at the moment there are three instances with the same mirrored infrastructure. However, there is a demand for scheduled tasks, so Celery Beat would be a great tool, since Celery is already in my infrastructure.
But here is the problem: if I add a Celery Beat container alongside the other containers (i.e. add it to the task definition), it will be mirrored and multiple instances will execute the same scheduled tasks at the same moment. What would be a solution to this infrastructure problem? Should I create a separate service?
We use single-beat to solve this problem and it works like a charm:
Single-beat is a nice little application that ensures only one instance of your process runs across your servers. Such as celerybeat (or some kind of daily mail sender, orphan file cleaner etc...) needs to be running only on one server, but if that server gets down, well, you go and start it at another server etc.
You should still set the number of desired tasks for the service to 1.
You can use an ECS task placement constraint to place your Celery Beat task and choose "One Task Per Host" (the distinctInstance constraint). Make sure to set the desired count to 1. This way, your Celery Beat task will run in only one container in your cluster.
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html
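A sketch of that configuration with boto3 (the cluster, service, and task definition names are placeholders):

import boto3

ecs = boto3.client("ecs")

# One task in total, and never more than one per container instance
ecs.create_service(
    cluster="my-cluster",            # placeholder
    serviceName="celery-beat",       # placeholder
    taskDefinition="celery-beat:1",  # placeholder task definition
    desiredCount=1,
    placementConstraints=[{"type": "distinctInstance"}],
)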
The desired count is the number of tasks you want to run in the cluster. You can set "Number of tasks" while configuring the service or in the Run Task section. You may refer to the links below.
Configuring service:
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service.html
Run Task:
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html
Let me know if you find any issue with it.

Autoscaling EC2: Launch Webserver on Spun-up Instance

I seem to be missing one central point of AWS Auto Scaling:
I created an AMI of an (Ubuntu) EC2 instance with my web server installed, and I use this AMI in the launch configuration for my Auto Scaling group.
But when Auto Scaling decides to spin up a new instance, how am I supposed to launch my web server on that instance? Am I supposed to write some start-up scripts, or what is the best practice for starting a process on a newly spun-up instance from Auto Scaling?
When I deploy an application (PostgreSQL, Elasticsearch, whatever) to an EC2 instance, I usually do it intending to be able to repeat the process. So my first step is to create an initial deployment script which does as much of the install and setup process as possible without needing to know the IP address, hostname, amount of memory, number of processors, etc. - basically, as much as I can without knowing anything that can change from one instance to the next or across a shutdown / restart.
Once that is stable, I create an AMI of it.
I then create an initialization script, which I reference in the launch configuration and which executes against the previously created AMI.
That's for highly configured applications. If you're just going with default settings (e.g. IP address = 0.0.0.0), then yes, I would simply run 'sudo update-rc.d <> defaults 95 10' so that the web server runs on startup.
Then create the AMI. When you create a new instance from that AMI, the web server should start by default. If it doesn't, I would check whether you really set the init.d script to do so.
Launching a new instance from an AMI should be no different from booting up a previously shut-down instance.
By the way, as a matter of practice when creating these scripts, I also do a few things to keep them much cleaner:
1) Create modules in separate bash scripts (e.g. creating user accounts, setting environment variables, etc.) for repeatability.
2) Each deployment script starts by downloading and installing the AWS CLI.
3) Every EC2 instance is launched with an IAM role that has S3 read access, IAM SSH describe rights, EC2 address allocation/association, etc.
4) Load all of the scripts onto S3, then have the deployment / initialization scripts download the necessary bash module scripts, chmod +x, and execute them (see the sketch after this list). It's as close to OOP as I can get without overdoing it, and it creates really clean bash scripts. The top-level launch / initialization scripts mostly just download individual scripts from S3 and execute them.
5) I source all of the modules instead of simply executing them, so that bash shares variables between them.
6) Make the Linux account creation part of the initialization script (not the AMI). Using the CLI, you can query for users, grep for their public SSH keys requested from AWS, create their accounts, and have everything ready for them to log in automagically.
This way, when you need to change something (i.e. the version of an application, a configuration value, etc.), you simply modify the module script; if the change affects the AMI, re-launch and re-AMI, otherwise just launch the AMI with the new initialization script.
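The answer describes bash modules, but purely to illustrate the download-and-execute step in point 4, here is a small Python sketch (the bucket and module names are placeholders):

import os
import subprocess
import boto3

s3 = boto3.client("s3")
BUCKET = "my-deploy-scripts"                 # placeholder bucket
MODULES = ["create_users.sh", "set_env.sh"]  # placeholder module scripts

for name in MODULES:
    path = "/tmp/" + name
    s3.download_file(BUCKET, name, path)        # pull the module from S3
    os.chmod(path, 0o755)                       # chmod +x
    subprocess.run(["bash", path], check=True)  # execute; sourcing (point 5) needs bash itself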
Hope that helps...

PHP AWS Elastic Beanstalk background workers

I have deployed my application using Elastic Beanstalk, since this gives me a very easy deployment flow to multiple instances at once using "git aws.push".
I'd like to add background-processing support to my application. The background worker will use the same codebase and simply start a long-lived PHP script that continuously looks for tasks to execute. Which AWS service should I use to create such a worker instance?
Should I use EB for this as well, or should I try to set up a standard EC2 instance (since I don't need it to be publicly available)? I guess that's the right way of doing it, and then create a deployment flow that makes it easy to deploy both to my EC2 worker instances and to the Elastic Beanstalk app? Or is there a better way of doing this?
AWS EB now adds support for Worker environments. They're just a different kind of environment, with these two differences:
* They don't have a cnamePrefix (whatever.elasticbeanstalk.com).
* Instead, they have an SQS queue bound to them.
On each instance, they run a daemon called sqsd which basically polls the environment's SQS queue and forwards each message as a POST to the local HTTP server.
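To make the mechanism concrete, the application only needs to expose a local HTTP endpoint for sqsd to POST to. A minimal Python sketch (the question uses PHP, but the shape is the same; the port and handler are assumptions):

from http.server import BaseHTTPRequestHandler, HTTPServer

class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # sqsd POSTs each SQS message body to this endpoint
        body = self.rfile.read(int(self.headers["Content-Length"]))
        handle_task(body)        # hypothetical task handler
        self.send_response(200)  # a 200 response tells sqsd to delete the message
        self.end_headers()

# sqsd talks to localhost; the HTTP port is configurable in the worker environment
HTTPServer(("127.0.0.1", 8080), WorkerHandler).serve_forever()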
I believe it is worth a try.
If the worker just polls a queue for jobs and does not require an ELB, then all you need is EC2, SQS, and probably S3. You can start EC2 instances as part of an Auto Scaling group that, for example, is configured to scale as a function of the depth of the SQS queue. When there is no work to do you can keep the minimum number of EC2 instances, but if the queue gets deep, auto scaling will spin up more.
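A sketch of that queue-depth trigger with boto3 (the group, policy, and queue names are placeholders):

import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Add one instance each time the alarm fires
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="worker-asg",  # placeholder group
    PolicyName="scale-out-on-backlog",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
)

# Fire when the backlog stays above 100 visible messages for two minutes
cloudwatch.put_metric_alarm(
    AlarmName="worker-queue-deep",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "jobs"}],  # placeholder queue
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)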