AWS Autoscaling and CloudWatch with SQS - amazon-web-services

I have an application that performs long-running tasks, so I decided to use AWS SQS with an Auto Scaling policy and CloudWatch.
I read that Amazon SQS queues send metrics to CloudWatch every five minutes. I know that a single task takes 10 seconds, so one worker can handle 30 tasks in five minutes. I would like messages to live in SQS for as short a time as possible. For example:
if 30 messages are added to SQS I would like to have one worker,
if 60 messages are added to SQS I would like to have two workers,
if 90 messages are added to SQS I would like to have three workers,
etc.
According to the documentation I created an Auto Scaling policy (that adds 1 instance) and a CloudWatch alarm (that fires this policy when ApproximateNumberOfMessagesVisible is greater than 30). So should I add a second CloudWatch alarm for more than 60 messages? And a third CloudWatch alarm for more than 90 messages?

No. A single alarm is enough: as long as the alarm remains in the ALARM state, your policy will keep adding instances (one per cooldown period) until the metric falls below 30.
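For reference, that single-alarm setup could be sketched with boto3 roughly as follows; the queue name, Auto Scaling group name, and policy name below are placeholders, not values from the question:

```python
def backlog_alarm(queue_name: str) -> dict:
    """CloudWatch alarm parameters: fire when the queue's visible
    backlog exceeds 30 messages (SQS publishes metrics every 5 minutes)."""
    return {
        "AlarmName": f"{queue_name}-backlog-over-30",  # placeholder naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 300,            # one 5-minute SQS metric interval
        "EvaluationPeriods": 1,
        "Threshold": 30,
        "ComparisonOperator": "GreaterThanThreshold",
    }

if __name__ == "__main__":
    import boto3  # only needed when actually creating the resources
    policy = boto3.client("autoscaling").put_scaling_policy(
        AutoScalingGroupName="worker-asg",   # placeholder ASG name
        PolicyName="add-one-worker",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,                 # the "+1 instance" policy
        Cooldown=300,
    )
    boto3.client("cloudwatch").put_metric_alarm(
        AlarmActions=[policy["PolicyARN"]], **backlog_alarm("task-queue")
    )
```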

Related

Is there a way to send an alert if an AWS Lambda function has not run (or been triggered) for a specified period of time (say, 2 hours)?

In our team's infrastructure we have a Databricks job that sends data to an SQS queue, which triggers a Lambda function. The Databricks job runs once every 30 minutes. A week ago the Databricks job was failing continuously, so it was not sending data and therefore the Lambda function was not being triggered. Is there any way to set up an alert so that I get notified if the Lambda function is not triggered for a period of 2 hours?
When I searched for a solution I only found ways to get an alert when a Lambda fails, or when a specific log pattern appears in its CloudWatch logs, but nothing for the scenario above.
You can create a CloudWatch alarm on the Invocations metric for that Lambda; configure the alarm so that if there are no invocations over a span of two hours, it goes into the ALARM state.
If you wish to be notified, you can also configure the CloudWatch alarm to send a message to an SNS topic, which can then be configured to trigger SES so that it sends you an email (for example).
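A minimal boto3 sketch of such an alarm, assuming the function and the notification SNS topic already exist (the names and ARN below are placeholders). The key detail is TreatMissingData: a function that is never invoked emits no Invocations datapoints at all, so missing data must count as breaching:

```python
def no_invocation_alarm(function_name: str, alert_topic_arn: str) -> dict:
    """Alarm parameters: zero invocations for two consecutive hours.
    TreatMissingData=breaching is essential, because a function that is
    never triggered produces no Invocations datapoints at all."""
    return {
        "AlarmName": f"{function_name}-no-invocations-2h",  # placeholder naming
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 3600,
        "EvaluationPeriods": 2,      # 2 x 1 hour = the 2-hour window
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",
        "AlarmActions": [alert_topic_arn],   # SNS topic that notifies you
    }

if __name__ == "__main__":
    import boto3  # only needed when actually creating the alarm
    boto3.client("cloudwatch").put_metric_alarm(
        **no_invocation_alarm("my-function", "arn:aws:sns:eu-west-1:111122223333:alerts")
    )
```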

Alarm when SQS Queue is not Empty After Certain Time?

Use Case
We have one SQS queue that is consumed by a Lambda, and we would like to know whether all messages in the queue have been consumed by a certain time.
For example, some other system starts sending messages to the queue at 6 AM, and it takes about 4 hours for the Lambda to process them all. We would like to know, at 10 AM, whether all messages in the queue have been consumed.
We only need to check the depth of the queue once a day.
Questions.
Is there an easy way to set up a CloudWatch alarm for this use case?
There are potential workarounds, such as using a CloudWatch rule to trigger a Lambda that checks the queue size and publishes a custom metric we could alarm on, but that seems like heavy lifting.
Any feedback is welcome! Thanks.
Continual checking
You can create an Amazon CloudWatch alarm on the ApproximateNumberOfMessagesVisible metric for the queue.
For example, if you wish to be notified when the queue has not been empty over the past hour, you can create an alarm when the Min of ApproximateNumberOfMessagesVisible > 0 for 60 minutes. This is saying that the smallest number of messages in the queue for the past hour was above zero.
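Sketched with boto3 (the queue name is a placeholder), that alarm could look something like this:

```python
def queue_not_empty_alarm(queue_name: str) -> dict:
    """Alarm parameters: the smallest visible-message count over the
    past hour stayed above zero, i.e. the queue was never empty."""
    return {
        "AlarmName": f"{queue_name}-not-empty-1h",  # placeholder naming
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Minimum",      # the "Min" from the answer above
        "Period": 300,               # SQS metrics arrive in 5-minute intervals
        "EvaluationPeriods": 12,     # 12 x 5 min = 60 minutes
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
    }

if __name__ == "__main__":
    import boto3  # only needed when actually creating the alarm
    boto3.client("cloudwatch").put_metric_alarm(**queue_not_empty_alarm("my-queue"))
```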
Checking at a specific time
If you want to check the queue length at a particular time, you will need to use Amazon CloudWatch Events to trigger an AWS Lambda function at a given time.
The Lambda function can call get_queue_attributes() to retrieve ApproximateNumberOfMessages. If this is bigger than desired, the Lambda function could send a message to an Amazon SNS topic. Users can subscribe to the topic to receive an email or SMS notification.
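A hedged sketch of such a scheduled Lambda handler; QUEUE_URL and TOPIC_ARN are hypothetical environment variables you would set on the function:

```python
import json
import os

def remaining_messages(response: dict) -> int:
    """Extract ApproximateNumberOfMessages from a get_queue_attributes response."""
    return int(response["Attributes"]["ApproximateNumberOfMessages"])

def handler(event, context):
    import boto3  # bundled in the Lambda runtime
    sqs = boto3.client("sqs")
    response = sqs.get_queue_attributes(
        QueueUrl=os.environ["QUEUE_URL"],            # hypothetical env var
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    n = remaining_messages(response)
    if n > 0:  # queue not drained by the check time: notify subscribers
        boto3.client("sns").publish(
            TopicArn=os.environ["TOPIC_ARN"],        # hypothetical env var
            Subject="SQS queue not drained",
            Message=json.dumps({"messages_remaining": n}),
        )
    return {"messages_remaining": n}
```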

AWS Lambda Triggered by SQS increases SQS request count

I have an AWS Lambda function that is triggered by SQS. The function is invoked approximately 100 times daily, but the request count to the SQS queue is approximately 20,000 per day. I don't understand why the number of requests made to SQS is so high. My expectation was that the number of SQS requests would match the number of Lambda invocations.
I have only one Lambda function and one SQS queue in my account.
Could this be related to polling of the SQS queue? I tried to change the polling interval in the queue configuration, but nothing changed. Another possibility would be to change a polling interval in the Lambda function configuration; however, I cannot find any such parameter.
In short, I want to reduce the number of SQS requests. How can I do that while still invoking the Lambda function from SQS?
When using SQS as an event source for AWS Lambda, the Lambda service regularly polls the configured SQS queue to fetch new messages. While the official documentation isn't really clear about this, the blog post announcing the feature goes into the details:
When an SQS event source mapping is initially created and enabled, or when messages first appear after a period with no traffic, then the Lambda service will begin polling the SQS queue using five parallel long-polling connections.
According to the AWS documentation, the default duration for a long poll from AWS Lambda to SQS is 20 seconds.
That results in five requests to SQS every 20 seconds for an AWS Lambda function without significant load, which adds up to roughly 21,600 requests per day, close to the 20,000 you're seeing.
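The arithmetic behind that estimate:

```python
# Five parallel long-poll connections, each re-issued every 20 seconds,
# running around the clock:
CONNECTIONS = 5
POLL_SECONDS = 20
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = CONNECTIONS * (SECONDS_PER_DAY // POLL_SECONDS)
print(requests_per_day)  # 21600
```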
While increasing the long-poll duration seems like an easy way to decrease the number of requests, that isn't possible: the 20 seconds AWS Lambda uses by default is already the maximum long-poll duration for an SQS queue. I'm afraid there is no easy way to reduce the requests to SQS when using it as an event source for AWS Lambda. Depending on your use case, it could be worth evaluating whether another event source, such as SNS, would fit as well.
Here is how we originally implemented this before the SQS trigger existed:
Create an SNS trigger from the SQS CloudWatch metric ApproximateNumberOfMessagesVisible > 0.
Trigger a Lambda from SNS; it reads the messages from SQS and delivers them to whatever needs them.
Alternatively, you can use Kinesis to deliver them to a Lambda:
SQS --> CloudWatch (trigger Lambda) --> Lambda (reads messages) --> Kinesis (set batch size) --> Lambda (handles actual message)
You can also use Kinesis directly, but it has no delayed delivery.
Hope it helps.

AWS Alert to monitor that a key is periodically created in a bucket

I'm using an AWS Lambda (triggered hourly by a CloudWatch rule) to create an EMR cluster that executes a job. Once the EMR cluster finishes its steps, it writes a result file to an S3 bucket. The key path contains the hour of the day:
/bucket/2017/04/28/00/result.txt
/bucket/2017/04/28/01/result.txt
..
/bucket/2017/04/28/23/result.txt
I want to set up an alert in case, for some reason, the EMR job fails to create result.txt for a given hour.
I have already put alerts on the Lambda invocation count and error count, but I haven't found an appropriate alert to verify that the EMR job actually finishes correctly.
Note that the Lambda is triggered 3 minutes past the hour and takes about 15 minutes to complete. Would a good solution be to create another Lambda, triggered 30 minutes past the hour, that checks that the correct key is present in the bucket and, if not, writes logs to CloudWatch that I could monitor and alarm on?
What other way could I achieve this alerting?
S3 offers free metrics on object count per bucket, but they aren't published often enough for your use case.
CloudWatch Alarm on S3 Request Metrics
For a cost, you can enable CloudWatch request metrics for S3, which publish data in 1-minute periods. You could, for example, create an alarm on one of the following S3 CloudWatch metrics:
PutRequests sum <= 0 over each hour
4xxErrors sum >= 1 over 1 minute
5xxErrors sum >= 1 over 1 minute
The HTTP status code alarms, on much shorter intervals (down to 1 minute), will give feedback closer to when these failures occur.
CloudWatch Alarm on Put Events
If you don't want to incur the cost of S3 request metrics, you can instead configure an S3 event notification that publishes a message to an SNS topic on each put. You can then create a CloudWatch alarm on the number of messages published to that topic (or the lack thereof):
Dimensions: TopicName = YOURSNSTOPIC
Namespace: AWS/SNS
Metric Name: NumberOfMessagesPublished
Threshold: NumberOfMessagesPublished <= 0 for 60 minutes (4 periods)
Statistic: Sum
Period: 15 minutes
Treat missing data as: breaching
Actions: Send notification to another, separate SNS topic that sends you an email/sms, or otherwise publishes to some alerting service.
Discussion
Note that both CloudWatch solutions have the caveat that they won't fire alerts exactly at 30 minutes past the hour, but they will cover your entire monitoring period.
You may be able to refine these base examples by adjusting the period or how CloudWatch treats missing data.
A Lambda triggered 30 minutes past the hour (via cron-style scheduling) that checks the S3 request metrics or the SNS topic's NumberOfMessagesPublished metric could also accomplish this, instead of relying on CloudWatch alarms. This may be the better alternative if firing exactly 30 minutes past the hour matters, as a CloudWatch alarm's firing time is not that precise.
Further Reading
AWS Documentation - Configuring Amazon S3 Event Notifications
AWS Documentation - SNS CloudWatch Metrics
AWS Documentation - S3 CloudWatch Metrics

Dynamic auto scaling AWS based on messages in SQS

Use case:
Every morning the SQS queue is populated (only once, and the number of messages can vary drastically), and I want to spawn new instances according to the number of messages in the queue.
E.g.: for 200,000 messages, 4 instances; for 400,000, 8 instances.
Is there a way by which we can achieve this?
You can set up a cron job on a server, or a time-triggered Lambda, to query SQS for the number of visible messages in the queue. With the AWS CLI you would run aws sqs get-queue-attributes and read the ApproximateNumberOfMessages field of the response. You would then use that number to calculate the number of instances and call aws ec2 run-instances --count 4 (plus the rest of the parameters). Once everything is done, you would terminate the instances.
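The same flow sketched in Python with boto3; the queue URL, AMI ID, instance type, and the 50,000-messages-per-instance ratio (implied by the example: 200,000 messages, 4 instances) are all assumptions:

```python
import math

def instances_needed(messages: int, per_instance: int = 50_000) -> int:
    """One instance per 50,000 messages, rounded up (the ratio implied
    by the example: 200,000 messages -> 4 instances); zero when empty."""
    return math.ceil(messages / per_instance) if messages > 0 else 0

if __name__ == "__main__":
    import boto3  # only needed when actually talking to AWS
    sqs = boto3.client("sqs")
    response = sqs.get_queue_attributes(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/111122223333/jobs",  # placeholder
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    count = instances_needed(int(response["Attributes"]["ApproximateNumberOfMessages"]))
    if count:
        boto3.client("ec2").run_instances(
            ImageId="ami-0123456789abcdef0",  # placeholder AMI
            InstanceType="t3.medium",         # placeholder instance type
            MinCount=count,
            MaxCount=count,
        )
```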
Another way to do this would be to use Auto Scaling and alarms. You can set up a scale-out policy that adds 1 server to your Auto Scaling group, triggered by a CloudWatch alarm on SQS ApproximateNumberOfMessagesVisible >= some threshold. This option wouldn't wait for morning to process the queue; it would run all the time. You could also have a scale-in policy that reduces the desired capacity (number of servers) of the Auto Scaling group when ApproximateNumberOfMessagesVisible <= some threshold.