What is the recommend way/pattern for setting up a High Availability (multiple Availability Zones) periodic task (probably triggered using Cron) on AWS?
I would like to have the software installed on multiple EC2 instances in multiple Availability Zones but only have the task run on a single instance at a time. It doesn't matter which instance.
Before moving to AWS, we used to use database locking in a MySQL instance - only the instance that successfully creates a lock would run.
But there must be a better way on AWS? Particularly if there is no requirement for a database.
Thanks!
nick.
Since I first asked this question, it is now possible to use CloudWatch Events to schedule periodic events. Events can be:
A fixed rate of N minutes/hours/days
Cron expression
The targets can be:
An AWS Lambda function
Post to an SNS topic
Write to an SQS queue
Write to a Kinesis stream
Perform an action on EC2/EBS instance
SQS could then be used to notify a single instance in a cluster of machines in multiple availability zones to perform an action.
There is more information here:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/ScheduledEvents.html
Although it does not include a resilience/availability statement of what happens if an Availability Zone goes down.
One suggested solution that has been suggested to me is to use an Auto Scaling Group, with the maximum and minimum number of instances set to 1. This means that if an availability zone goes offline, the ASG will cause a new instance in another zone to be launched.
This technique was briefly covered on the Architecting on AWS training course, but I don't know how reliable it is.
Amazon just released the solution to your problem: Beanstalk worker tier periodic tasks:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks
It basically relies on a yaml file listing cron schedules calling the API you want:
To invoke periodic tasks, your application source bundle must include a cron.yaml file at the root level. The file must contain information about the periodic tasks you want to schedule. Specify this information using standard crontab syntax.
It really depends on your requirements.
You could post your tasks to a SQS queue and have your instances (possibly an autoscaling group spread across different zones) poll that queue. SQS' at least (and generally only) once semantics could be a problem here if it is critical for you that tasks get executed only once. If that's the case, you could easily use a DynamoDB table and conditional writes. Or, if you're more after a full-fledged fault-tolerant solution, you might give airbnb chronos a try.
Related
I have a (golang web server) service running on AWS on a EC2 (no auto scaling). This service has a few cron jobs that runs throughout the day and these jobs starts when the service starts.
I would like to take advantage of auto scaling in some form on AWS. Been looking at ECS and Beanstalk.
When I add auto scaling I need the cron job to only execute on one of the scaled services due to rate limits on external APIs. Right now the cron job is tightly coupled within the service and I am looking for an option that does not require moving the cron job to its own service.
How can I achieve this in a good way using AWS?
You're going to get this problem as a general issue in any scalable application where crons cannot / should not run multiple times. It's not really AWS specific. I'm not sure to what extent you want to keep things coupled or how your crons are currently run but here are a few suggestions that might work for you:
Create a "cron runner" instance with a limit to run crons on
You could create a separate ECS service which has no autoscaling and a fixed value of 1 instance. This instance would run the same copy of your code as your "normal" instances and would run crons. You would turn crons off on your "normal" instances. You might find that this can be a very small instance since it doesn't handle any web traffic.
Create a "cron trigger" instance which fires off crons remotely
Here you create one "trigger" instance which sends a request to your normal instances through an ALB. Because your ALB will route the request to 1 of the servers behind it the cron only gets run once. One watch out with this is that if your cron is long running, you may need to consider your request timeouts. You'll also have to think about retries etc but I assume you already have a process that can be adapted for that.
The above solutions can be adapted with message queues etc but the basis of both is that there is another instance of some kind which starts the cron and is separate from your normal servers. Depending on when your cron runs, you may only need to run this cron instance for a few hours per day so it can be cost efficient to do things like this.
Personally I have used both methods in a multi-tenant application and I had to go with the option of running the cron like this due to the number of tenants and the time / resource it took to run the crons for all of them at once:
Cloudwatch schedule triggers a lambda which sends a message to SQS to queue a cron for each tenant individually.
Cron servers (totally separate from main web servers but running same / similar code) pull messages and run the cron for each tenant individually. Stores a key in redis for crons which are vital to only run once to stop issues with "at least once" delivery so crons don't run twice.
This can also help handle failures with retry policies and deadletter queues managed in SQS.
Ultimately you need to kick off these crons from one place. If possible, change up your crons so it doesn't matter if they run twice. It makes it easier to deal with retries and things like that.
I am studying AWS, per the illustration in AWS here:
For a min/max=1 case, what does it implicit to? Seems no scaling to me as min = max
Thank you for your kind enlightening.
UPDATE:
so here is an example use case:
http://www.briefmenow.org/amazon/how-can-you-implement-the-order-fulfillment-process-while-making-sure-that-the-emails-are-delivered-reliably/
Your startup wants to implement an order fulfillment process for
selling a personalized gadget that needs an average of 3-4 days to
produce with some orders taking up to 6 months you expect 10 orders
per day on your first day. 1000 orders per day after 6 months and
10,000 orders after 12 months. Orders coming in are checked for
consistency men dispatched to your manufacturing plant for production
quality control packaging shipment and payment processing If the
product does not meet the quality standards at any stage of the
process employees may force the process to repeat a step Customers are
notified via email about order status and any critical issues with
their orders such as payment failure. Your case architecture includes
AWS Elastic Beanstalk for your website with an RDS MySQL instance for
customer data and orders. How can you implement the order fulfillment
process while making sure that the emails are delivered reliably?
Options:
A.
Add a business process management application to your Elastic Beanstalk app servers and re-use the ROS
database for tracking order status use one of the Elastic Beanstalk instances to send emails to customers.
B.
Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group
with min/max=1 Use the decider instance to send emails to customers.
C.
Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group
with min/max=1 use SES to send emails to customers.
D.
Use an SQS queue to manage all process tasks Use an Auto Scaling group of EC2 Instances that poll the tasks
and execute them. Use SES to send emails to customers.
The voted answer is C.
Can anyone kindly share the understanding? Thank you very much.
Correct, there will be no scaling outward or inward when min/max=1. Or when min=max. This situation is generally used for keeping a service available in case of failures.
Consider the alternative; you launch with an EC2 instance that's been bootstrapped with some user data script. If the instance has issues, you'll need to stop it and begin another.
Instead, you launch using an AutoScaling Group with a Launch Configuration that takes care of bootstrapping instances. If your application server begins to fail, you can just de-register it from the AutoScaling Group. AWS will take care of bringing up another instance while you triage the defective one.
Another situation you might consider is when you want the option to deploy a new version of an application with the same AutoScaling Group. In this case, create a new Launch Configuration and register it with the ASG. Increase max and desired by 1 temporarily. AWS will launch the instance for you and if it succeeds, you can then reduce Max and Desired back down to 1. By default, AWS will remove the oldest server but you can guarantee that the new one stays up by using termination protection.
I am trying to create a certain kind of networking infrastructure, and have been looking at Amazon ECS and Kubernetes. However I am not quite sure if these systems do what I am actually seeking, or if I am contorting them to something else. If I could describe my task at hand, could someone please verify if Amazon ECS or Kubernetes actually will aid me in this effort, and this is the right way to think about it?
What I am trying to do is on-demand single-task processing on an AWS instance. What I mean by this is, I have a resource heavy application which I want to run in the cloud and have process a chunk of data submitted by a user. I want to submit a this data to be processed on the application, have an EC2 instance spin up, process the data, upload the results to S3, and then shutdown the EC2 instance.
I have already put together a functioning solution for this using Simple Queue Service, EC2 and Lambda. But I am wondering would ECS or Kubernetes make this simpler? I have been going through the ECS documenation and it seems like it is not very concerned with starting up and shutting down instances. It seems like it wants to have an instance that is constantly running, then docker images are fed to it as task to run. Can Amazon ECS be configured so if there are no task running it automatically shuts down all instances?
Also I am not understanding how exactly I would submit a specific chunk of data to be processed. It seems like "Tasks" as defined in Amazon ECS really correspond to a single Docker container, not so much what kind of data that Docker container will process. Is that correct? So would I still need to feed the data-to-be-processed into the instances via simple queue service, or other? Then use Lambda to poll those queues to see if they should submit tasks to ECS?
This is my naive understanding of this right now, if anyone could help me understand the things I've described better, or point me to better ways of thinking about this it would be appreciated.
This is a complex subject and many details for a good answer depend on the exact requirements of your domain / system. So the following information is based on the very high level description you gave.
A lot of the features of ECS, kubernetes etc. are geared towards allowing a distributed application that acts as a single service and is horizontally scalable, upgradeable and maintanable. This means it helps with unifying service interfacing, load balancing, service reliability, zero-downtime-maintenance, scaling the number of worker nodes up/down based on demand (or other metrics), etc.
The following describes a high level idea for a solution for your use case with kubernetes (which is a bit more versatile than AWS ECS).
So for your use case you could set up a kubernetes cluster that runs a distributed event queue, for example an Apache Pulsar cluster, as well as an application cluster that is being sent queue events for processing. Your application cluster size could scale automatically with the number of unprocessed events in the queue (custom pod autoscaler). The cluster infrastructure would be configured to scale automatically based on the number of scheduled pods (pods reserve capacity on the infrastructure).
You would have to make sure your application can run in a stateless form in a container.
The main benefit I see over your current solution would be cloud provider independence as well as some general benefits from running a containerized system: 1. not having to worry about the exact setup of your EC2-Instances in terms of operating system dependencies of your workload. 2. being able to address the processing application as a single service. 3. Potentially increased reliability, for example in case of errors.
Regarding your exact questions:
Can Amazon ECS be configured so if there are no task running it
automatically shuts down all instances?
The keyword here is autoscaling. Note that there are two levels of scaling: 1. Infrastructure scaling (number of EC2 instances) and application service scaling (number of application containers/tasks deployed). ECS infrastructure scaling works based on EC2 autoscaling groups. For more info see this link . For application service scaling and serverless ECS (Fargate) see this link.
Also I am not understanding how exactly I would submit a specific
chunk of data to be processed. It seems like "Tasks" as defined in
Amazon ECS really correspond to a single Docker container, not so much
what kind of data that Docker container will process. Is that correct?
A "Task Definition" in ECS is describing how one or multiple docker containers can be deployed for a purpose and what its environment / limits should be. A task is a single instance that is run in a "Service" which itself can deploy a single or multiple tasks. Similar concepts are Pod and Service/Deployment in kubernetes.
So would I still need to feed the data-to-be-processed into the
instances via simple queue service, or other? Then use Lambda to poll
those queues to see if they should submit tasks to ECS?
A queue is always helpful in decoupling the service requests from processing and to make sure you don't lose requests. It is not required if your application service cluster can offer a service interface and process incoming requests directly in a reliable fashion. But if your application cluster has to scale up/down frequently that may impact its ability to reliably process.
I want to create a web app for my organization where users can schedule in advance at what times they'd like their EC2 instances to start and stop (like creating events in a calendar), and those instances will be automatically started or stopped at those times. I've come across four different options:
AWS Datapipeline
Cron running on EC2 instance
Scheduled scaling of Auto Scaling Group
AWS Lambda scheduled events
It seems to me that I'll need a database to store the user's scheduled times for autostarting and autostopping an instance, and that I'll have to pull that data from the database regularly (to make sure that's the latest updated schedule). Which would be the best of the four above options for my use case?
Edit: Auto Scaling only seems to be for launching and terminating instances, so I can rule that out.
Simple!
Ask users to add a tag to their instance(s) indicating when they should start and stop (figure out some format so they can easily specify Mon-Fri or Every Day)
Create an AWS Lambda function that scans instances for their tags and starts/stops them based upon the tag content
Create an Amazon CloudWatch Event rule that triggers the Lambda function every 15 minutes (or whatever resolution you want)
You can probably find some sample code if you search for AWS Stopinator.
Take a look at ParkMyCloud if you're looking for an external SaaS app that can help your users easily schedule (or override that schedule) your EC2, RDS, and ASG instances. It also connects to SSO, provides an API, and shows you all of your resources across regions/accounts/clouds. There's a free trial available if you want to test it out.
Disclosure: I work for ParkMyCloud.
On an autoscaled environment running a periodic task, if the environment is scaled up, do the periodic tasks get run on each instance? Or more specifically, does each instance then post to the queue leading to multiple "periodic tasks" running?
Yes. If there's some periodic task that should only be triggered once, you should have a separate auto scale environment of minimum 1 maximum one instance to either perform the task or trigger it on one of your servers (maybe make a request to your load balancer and one of your instances will perform the task)
Yes, behind the screen it's just a cron job on all your instances. The default scenario for using periodic tasks is to read the tasks from the SQS queue on the worker nodes.
So yes, if you doing some kind of posting what has to happen only once, then you either need to put some logic between or use a different solution.
(For example generating some kind of time based ID which identifies the cycle of the cron job. So messages from the same cycle are having the same id, easy to filter them/ ignore everything after the firs.