Failed cron job handling with elastic beanstalk and SQS

I have two Elastic Beanstalk environments.
One is the 'primary' web server environment and the other is a worker environment that handles cron jobs.
I have 12 cron jobs, set up via a cron.yaml file, that all point at API endpoints on the primary web server.
Previously my cron jobs were all running on the web server environment, but of course this created duplicate cron jobs when the environment scaled up.
My new implementation works nicely, but when a cron job fails to run as expected, it is repeated, generally within a minute or so.
I would rather avoid this behaviour and just attempt to run the cron job again at the next scheduled interval.
Is there a way to configure the worker environment/SQS so that failed jobs do not repeat?

Simply configure a CloudWatch Events rule to take over your cron schedule, and have it create an SQS message (either directly or via a Lambda function).
Your workers then only have to handle SQS jobs, and you can scale the workers as well if needed.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html
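As a rough illustration of the "via a Lambda function" route, here is a minimal sketch of a handler that turns each scheduled tick into one SQS message. The queue URL, the job name and the message layout are made up for the example; `send_message` is the standard boto3 SQS client call, and the optional `sqs` argument exists only so the function can be exercised without AWS credentials.

```python
import json

# Hypothetical queue URL; replace with the queue your workers read from.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/worker-queue"


def build_job_message(event, job_name):
    """Turn a scheduled event into the JSON body the worker will receive."""
    return json.dumps({
        "job": job_name,
        # Scheduled events carry the trigger time in their "time" field.
        "scheduled_at": event.get("time"),
    })


def scheduled_handler(event, context, sqs=None):
    """Lambda entry point: enqueue one message per scheduled tick."""
    if sqs is None:
        import boto3  # only needed when running against real AWS
        sqs = boto3.client("sqs")
    body = build_job_message(event, job_name="nightly-report")  # hypothetical job name
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
    return body
```

Because the schedule only enqueues work, a failed job is governed by the queue's own retry settings rather than by the scheduler.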

Yes, you can set the Max retries parameter in the Elastic Beanstalk environment and the Maximum Receives parameter in the SQS queue to 1. This ensures that the message is attempted once and, if it fails, is sent to the dead letter queue.
With this approach, your environment may turn yellow if there are any failed jobs, because the messages end up in the dead letter queue. You can simply observe and ignore this, but it may be annoying if you want all environments to stay green. You can set the Message Retention Period parameter for the dead letter queue to something short so that the messages go away sooner, though.
An alternative approach, if you're interested, is to return a status 200 OK in your code regardless of how the job ran. This will ensure that the SQS daemon deletes the message in the queue, so that it won't get picked up again.
Of course, the downside is that you would have to modify your code, but I can see how this would make sense if you don't care about the result.
Here's a link to AWS documentation that explains all of the parameters.
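The "always return 200 OK" alternative could look roughly like the sketch below. It is written as a plain WSGI app only to stay framework-neutral; `run_job` and the `/jobs/...` paths are hypothetical stand-ins for whatever your cron endpoints actually do.

```python
import logging

log = logging.getLogger("cron")


def run_job(path):
    """Hypothetical job dispatcher; raises when the job fails."""
    if path == "/jobs/fail":
        raise RuntimeError("job failed")


def app(environ, start_response):
    """WSGI app that acknowledges every job, whether or not it succeeded."""
    path = environ.get("PATH_INFO", "/")
    try:
        run_job(path)
        body = b"ok"
    except Exception:
        # Record the failure, but still answer 200 so the message is deleted
        # and the job is not attempted again until its next scheduled run.
        log.exception("job %s failed; not retrying", path)
        body = b"failed (acknowledged)"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

The trade-off is that failures become visible only in your logs, so it is worth alerting on the logged exception if you go this way.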

Related

AWS Eventbridge: how do I run a scheduled rule manually (in order to test it)?

In Amazon Web Services (AWS) EventBridge, I can create cron-style scheduled rules to fire an event regularly.
When I'm creating or editing these, I often want to test that they work immediately (rather than waiting until the next scheduled execution). For testing purposes, triggering the rule's target manually is not always equivalent to the rule running (perhaps because a template is used to customise the event JSON).
Is there an easy way of triggering an AWS EventBridge scheduled job to run immediately, via the user interface or via the command line?
I generally do this by modifying the cron schedule to two minutes in the future, then reverting it, but this is tedious and error-prone. Perhaps there's an obvious button I've failed to see, or else a CLI command that I haven't found (e.g. at https://awscli.amazonaws.com/v2/documentation/api/latest/reference/events/index.html#cli-aws-events).

I think you are looking for a one-time schedule. For that, AWS recently (10 Nov 2022) launched a new service called EventBridge Scheduler. It also supports recurring schedules, but in your case I think you need a one-time schedule, which lets you trigger any target at a time of your choosing.
Hope this fulfills your need.
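One-time schedules in EventBridge Scheduler are written as `at(yyyy-mm-ddThh:mm:ss)`. A small sketch of building such an expression for "a couple of minutes from now" follows; the helper names are invented for the example, and the resulting string would be passed as the `ScheduleExpression` when creating the schedule.

```python
from datetime import datetime, timedelta, timezone


def one_time_expression(when):
    """Format a datetime as an EventBridge Scheduler one-time expression."""
    return when.strftime("at(%Y-%m-%dT%H:%M:%S)")


def fire_soon(minutes=2, now=None):
    """Expression for a single run a few minutes from now (UTC by default)."""
    now = now or datetime.now(timezone.utc)
    return one_time_expression(now + timedelta(minutes=minutes))
```

This does not re-run an existing rule; it creates a separate one-off schedule, so it is only an equivalent test if it points at the same target with the same input.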

AWS ECS Task single instance

In my architecture, when I receive a new file in an S3 bucket, a Lambda function triggers an ECS task.
The problem occurs when I receive multiple files at the same time: the Lambda function triggers multiple instances of the same ECS task, which all act on the same shared resources.
I want to ensure that only one instance of a specific ECS task is running. How can I do that?
Is there a specific setting that can ensure it?
I tried to query the ECS cluster before running a new instance of the ECS task, but (using the AWS Python SDK) I didn't receive any information while the task was in PROVISIONING status; the SDK only returns data when the task is in PENDING or RUNNING.
Thank you

I don't think you can control that, because every S3 event will trigger a new task. Checking whether the task is already running is more difficult, and you might miss executions if you receive a lot of files.
You should approach this differently. If you want only one task doing the processing, forget about triggering the ECS task from the S3 event. It might work better if you implement a queue: your S3 event should add the information (via Lambda, maybe?) to an SQS queue.
From there you can have an ECS service doing SQS long polling and processing one message at a time.
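A minimal sketch of that polling step, assuming `sqs` is a boto3 SQS client (`boto3.client("sqs")`) and `handle` is your own processing function; both are passed in so the logic can be tried with a stub. `receive_message` and `delete_message` are the standard client calls.

```python
def process_one(sqs, queue_url, handle):
    """Receive at most one message, process it, then delete it.

    Returns True if a message was processed, False if the poll came back empty.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,  # one file at a time
        WaitTimeSeconds=20,     # long polling; 20 seconds is the SQS maximum
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False
    msg = messages[0]
    handle(msg["Body"])
    # Delete only after successful processing; if handle() raises, the message
    # stays in the queue and reappears after its visibility timeout.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return True
```

Running this in a `while True:` loop inside a service with a desired count of 1 gives you a single consumer working through the files in turn.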

How to see why a long-running AWS Step Function failed

I have an AWS Step Function with many state transitions that can run for a half hour or more.
There are only a few states, and the application loops through them until it runs out of items to process.
I have a run that failed after about half an hour. I can look at the logging under the "Execution event history". However, since this logs every transition and state, there are thousands of events. I cannot page down to show enough events (clicking the "Load More" button) without hanging my browser window.
There is no way to sort or filter this list that I can see.
How can I find the cause of the failure? Is there a way to export the Execution event history somewhere? Or send it to CloudWatch?

You can use the AWS CLI command aws stepfunctions get-execution-history with the --reverse-order flag in order to get the logs from the most recent (where the errors will be) first.
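If you would rather do the same from Python, boto3's Step Functions client has `get_execution_history(executionArn=..., reverseOrder=True)`, which returns a dict with an `events` list. The helper below is a sketch that scans such a list for the first failure-like event; it assumes that failure event types end in `Failed`, `TimedOut` or `Aborted` and that the error and cause sit under a key ending in `EventDetails`, which matches the event shapes I have seen but is worth checking against your own history.

```python
def find_failure(events):
    """Return (type, error, cause) for the first failure-like event, or None."""
    for event in events:
        etype = event.get("type", "")
        if not etype.endswith(("Failed", "TimedOut", "Aborted")):
            continue
        # Details live under a key such as "executionFailedEventDetails".
        for key, details in event.items():
            if key.endswith("EventDetails") and isinstance(details, dict):
                return etype, details.get("error"), details.get("cause")
        return etype, None, None
    return None
```

With reverse ordering the failure is normally within the first few events, so a single page of results is usually enough.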

How do you process your steps? Docker containers on ECS or Fargate? Give us some details on that.
Your tasks should be sending out logs to CloudWatch as they execute.
You can also look at the Docker logs themselves on the host machine if you run Docker on a machine you can SSH to.

What to use AWS Fargate or AWS Beanstalk

I have a Java application that reads from an SQS queue, does some business processing and finally writes the result to a datastore. As the SQS queue grows I want to be able to scale out to read and process more messages. Each SQS message takes about 15 to 20 minutes to process. I was looking at a service like AWS Fargate or AWS Beanstalk to deploy my application. Money is not a concern but usability is. What would be the best platform?

Fargate would be an ideal solution, as it has the following advantages over Beanstalk:
It's serverless.
It gives more fine-grained control for custom application architectures.
There is no need to write EB extensions.
You can build and test the image locally and promote the same image to Fargate.
With application autoscaling, you can scale on the go.
Pricing is per second with a 1-minute minimum.
FAQ:
https://aws.amazon.com/fargate/faqs/
Pricing:
https://aws.amazon.com/fargate/pricing/

I've had a very similar use case to this and I used Batch (which was not available in 2014 when the question was asked).
https://aws.amazon.com/batch/
In my case I was processing audio and video files from the queue.
You can set a lambda to fire on the SQS queue and have that drop the job onto batch for processing.
If you have the minimum cluster size set to zero then you will have no servers running when there is no work to do, but you can have them autoscale up to process as much work as you require when the jobs come in.
The advantage compared to lambda is that the code that executes can be any container with as much resource as you want to throw at it.
For your use case it will be perfect, but for anything that completes in a few seconds or a minute it's worth making each job process more than one task per execution; otherwise most of the time will be spent firing up and shutting down containers.
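The Lambda-to-Batch hand-off described above might be sketched like this. The job queue and job definition names are placeholders, and passing the message body through an environment variable is just one possible convention; `submit_job` is the standard boto3 Batch client call, and the `batch` argument is there so the handler can be run against a stub.

```python
# Hypothetical names; use your own Batch job queue and job definition.
JOB_QUEUE = "media-processing-queue"
JOB_DEFINITION = "media-processing-job"


def sqs_to_batch_handler(event, context, batch=None):
    """Lambda entry point for an SQS trigger: submit one Batch job per message."""
    if batch is None:
        import boto3  # only needed when running against real AWS
        batch = boto3.client("batch")
    job_ids = []
    for record in event.get("Records", []):
        resp = batch.submit_job(
            jobName="process-" + record["messageId"],
            jobQueue=JOB_QUEUE,
            jobDefinition=JOB_DEFINITION,
            containerOverrides={
                # The container reads its work item from this variable.
                "environment": [{"name": "MESSAGE_BODY", "value": record["body"]}],
            },
        )
        job_ids.append(resp["jobId"])
    return job_ids
```

To batch several short tasks into one job, as suggested above, you could submit a single job per invocation and pass all the record bodies to it instead.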

run scheduled task in AWS without cron

Currently I have a single server in amazon where I put all my cronjobs. I want to eliminate this single point of failure, and expose all my tasks as web services. I'd like to expose the services behind a VPC ELB to a few servers that will run the tasks when called.
Is there some service that Amazon (AWS) offers that can run a recurring job (really call a web service) at scheduled intervals? I'd really like to be able to keep the cron functionality in terms of time/day specification, but farm out the HA of the driver (the thing that calls endpoints at the right time) to AWS.
I like how SQS offers web endpoint(s), but from what I can tell you can't schedule them. SWF doesn't seem to be a good fit either.

AWS announced support for scheduled functions in Lambda at its 2015 re:Invent conference. With this feature users can execute Lambda functions on a scheduled basis using a cron-like syntax. The Lambda docs show an example of using Python to perform scheduled events.
Currently, the minimum resolution that a scheduled lambda can run at is 1 minute (the same as cron, but not as fine grained as systemd timers).
The Lambder project helps to simplify the use of scheduled functions on Lambda.
λ Gordon's cron example has perhaps the simplest interface for deploying scheduled lambda functions.
Original answer, saved for posterity.
As Eric Hammond and others have stated, there is no native AWS service for scheduled tasks. There are only workarounds and half solutions as mentioned in other answers.
To recap the current options:
The single-instance autoscale group that starts and stops on a schedule, as described by Eric Hammond.
Using a Simple Workflow Service timer, which is not at all intuitive. This case study mentions that JPL used SWF to build a distributed cron, but there are no implementation details. There is also a reference to a code example buried in the SWF code samples.
Run it yourself using something like cronlock.
Use something like the Unreliable Town Clock (UTC) to run Lambda functions on a schedule. Remember that Lambda cannot currently access resources within a VPC
Hopefully a better solution will come along soon.

Introducing Events in AWS CloudWatch
You can schedule by minute, hour or day, or with a cron expression, from the console and without Lambda or any programming.
I just scheduled my ASP.NET Web API (HTTP POST) using an SNS HTTP endpoint to execute every minute and it's working perfectly.

Is there some service that Amazon (AWS) offers that can run a reoccurring job at scheduled intervals?
This is one of a few single points of failure that people (including me) keep mentioning when designing architectures with AWS. Until Amazon solves it with a service, here's a hack I've published which is actively used by some companies.
AWS Auto Scaling can run and terminate instances using a recurring schedule specified in the cron format.
http://docs.amazonwebservices.com/AutoScaling/latest/APIReference/API_PutScheduledUpdateGroupAction.html
You can have the instance automatically run a process on startup.
If you don't know how long the job will last, you can set things up so that your job terminates the instance when it has completed.
Here's an article I wrote that walks through exact commands needed to set this up:
Running EC2 Instances on a Recurring Schedule with Auto Scaling
http://alestic.com/2011/11/ec2-schedule-instance
Starting a whole instance just to kick off a set of jobs seems a bit like overkill, but if it's a t1.micro, then it only costs a couple pennies.
That t1.micro doesn't have to do the actual work either. Your instance could inject messages into SQS or through SNS so that the other redundant servers pick up the tasks.

This is a hosted third-party site that can regularly call scheduled scripts on your domain.
This will not work if you need your script to run in the shell, and not as Apache.

Sounds like this might be useful to you:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-using-task-runner.html
Task Runner is a task agent application that polls AWS Data Pipeline for scheduled tasks and executes them on Amazon EC2 instances, Amazon EMR clusters, or other computational resources, reporting status as it does so. Depending on your application, you may choose to:
Allow AWS Data Pipeline to install and manage one or more Task Runner applications for you on computational resources that it manages automatically. In this case, you do not need to install or configure Task Runner as described in this section. This is the recommended configuration.
Manually install and configure Task Runner on a computational resource such as a long-running EC2 instance or a physical server. To do so, use the procedures in this section.
Develop and install a custom task agent instead of Task Runner. The procedures for doing so will depend on the implementation of the custom task agent.

Amazon introduced Lambda last year for Node.js; yesterday Amazon added the features Scheduled Functions, VPC Support, and Python Support.
By leveraging Scheduled Functions, a proper replacement for cron can be attained.
More Info - http://aws.amazon.com/lambda/details/

As of August 2020, Amazon has moved the Lambda/CloudWatch events to a service called EventBridge (https://aws.amazon.com/eventbridge/). It was launched in July 2019, after most of the answers to this question.

Looks like this is a relatively new option from AWS Elastic Beanstalk:
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks
Basically, periodic tasks act like regular SQS receivers, but they're called on a cron schedule instead of in response to an SQS message.

SWF is a Web service from AWS that can be used to schedule tasks. Most of the work goes into specifying what a task and a schedule is.
http://milindparikh.blogspot.com/2015/07/introducing-diksha-aws-lambda-function.html is a scalable scheduler written against SWF.

CloudWatch Events are great, but there is a limit on their number. If you need scale and are willing to sacrifice precision, you could use DynamoDB's TTL as a timer.
The idea is to put items into a DynamoDB table with a TTL set to the time you need to run a task. DynamoDB will delete those items somewhere around the specified time (within 48 hours of expiration). The deleted items will appear in the DynamoDB stream associated with the table. A Lambda function could listen to the stream and take appropriate action upon the deletions.
Read more in "DynamoDB TTL as an ad-hoc scheduling mechanism" by theburningmonk.com.
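A rough sketch of both halves of that idea: building the timer item, and picking TTL deletions out of a stream event. The attribute names (`task_id`, `expires_at`, `payload`) are invented for the example, and it assumes the stream is configured to include old images. TTL deletions are distinguished from ordinary deletes by the `dynamodb.amazonaws.com` service principal in the record's `userIdentity`.

```python
def timer_item(task_id, run_at_epoch, payload):
    """Item to put in the table; 'expires_at' is the assumed TTL attribute."""
    return {
        "task_id": {"S": task_id},
        "expires_at": {"N": str(int(run_at_epoch))},  # TTL expects epoch seconds
        "payload": {"S": payload},
    }


def expired_tasks(stream_event):
    """Yield (task_id, payload) for items removed by the TTL process."""
    for record in stream_event.get("Records", []):
        if record.get("eventName") != "REMOVE":
            continue
        identity = record.get("userIdentity") or {}
        # Manual deletes carry no service identity and are skipped here.
        if identity.get("principalId") != "dynamodb.amazonaws.com":
            continue
        old = record["dynamodb"].get("OldImage", {})
        yield old["task_id"]["S"], old["payload"]["S"]
```

A Lambda function subscribed to the stream would iterate over `expired_tasks(event)` and run whatever each payload describes.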

The AWS Elastic Load Balancers will ping your instances to check that they're healthy. You can add your cron-like tasks to the script that the ELB is pinging, and it will execute very regularly.
You'd want to add some logic so that each task is executed the right number of times and at the right interval, but this could be accomplished with a database table that tracks executions. Each time the ELB pings your server, your server would check the database to see if any job is pending, and then execute that job.
The ELB will timeout if the script takes too long to execute, so it's important to not create a situation where your ELB health check will take many seconds to process the cron tasks. To overcome this, you can employ the AWS Simple Notification Service. Your ELB health check script can simply publish a message to an SNS topic, and then that topic can deliver the message via an HTTP request to your web server.
In other words:
ELB pings your EC2 instance...
EC2 instance checks for pending jobs and sends a message to SNS if any are found...
SNS notifies your app via HTTP...
The HTTP call from SNS is what actually processes the cron job
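The bookkeeping behind those steps can be sketched as follows. Here an in-memory dict stands in for the database table of last-run times, and `publish` stands in for the SNS publish call; both are assumptions made to keep the example self-contained.

```python
def due_jobs(intervals, last_run, now):
    """Names of jobs whose interval has elapsed since their last recorded run."""
    return sorted(
        name for name, every in intervals.items()
        if now - last_run.get(name, 0) >= every
    )


def health_check(intervals, last_run, now, publish):
    """Called on every ELB ping: hand off due jobs, then answer quickly."""
    for name in due_jobs(intervals, last_run, now):
        last_run[name] = now  # record first so the next ping does not repeat it
        publish(name)         # e.g. publish to an SNS topic; must return fast
    return 200
```

With more than one instance behind the load balancer, the "record first" step would need to be an atomic update in a shared database, otherwise two instances could claim the same job.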
