Change AWS Lambda Kinesis stream polling frequency - amazon-web-services

I want to change the Kinesis stream polling frequency of an AWS Lambda function. I was going through this article:
https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
but had no luck.
The only information it conveys is that "AWS Lambda then polls the stream periodically (once per second) for new records."
I was also looking for answers in this thread, but no luck there either:
https://forums.aws.amazon.com/thread.jspa?threadID=229037
There is another option, though, which could be used if a specific frequency is required:
https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
So, my question is: can we decrease AWS Lambda's polling frequency to, let's say, 1-2 minutes? Or do we have to go with AWS Lambda with Scheduled Events?

As far as I know, there is no way to decrease the polling frequency if you are using an event source mapping.
These are all the settings you can set (source: https://docs.aws.amazon.com/de_de/lambda/latest/dg/API_CreateEventSourceMapping.html):
{
  "BatchSize": number,
  "Enabled": boolean,
  "EventSourceArn": "string",
  "FunctionName": "string",
  "StartingPosition": "string",
  "StartingPositionTimestamp": number
}
So going with a scheduled event seems to be the only feasible option.
An alternative would be to let the Lambda function sleep before exiting, so it only polls again after the desired time. But of course that means you are paying for the sleep time, so it is probably not what you want.

I haven't seen a way to decrease the polling frequency itself, but you can get the same effect as a lower polling frequency by increasing the MaximumBatchingWindowInSeconds parameter.
Reference: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-kinesis.html#sam-function-kinesis-maximumbatchingwindowinseconds
Let's say you have new records arriving at an average rate of 1 record/s. Regardless of BatchSize, your Lambda might be triggered every second, since the stream is polled once per second. But if you increase BatchSize to, say, 60 and MaximumBatchingWindowInSeconds to 60, then your Lambda is invoked on average only once per minute, as if you had changed the polling frequency to once per minute.
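For illustration, a minimal boto3 sketch of that change (the mapping UUID and the exact values are placeholders); it updates an existing Kinesis event source mapping so records are buffered for up to a minute before the function is invoked:
import boto3

lambda_client = boto3.client("lambda")

# "mapping-uuid" is a placeholder for the UUID of your existing
# Kinesis event source mapping (see list_event_source_mappings).
lambda_client.update_event_source_mapping(
    UUID="mapping-uuid",
    BatchSize=60,
    MaximumBatchingWindowInSeconds=60,
)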

Related

How to lower Lambda polling on SQS? High invocation usage

I have about 4 Lambdas that are triggered by 4 individual SQS queues in a 1:1 mapping. They are triggered via an event source mapping.
I checked my billing today and I'm over the 1M free invocation limit.
I checked the monitoring of a single SQS queue, and it looks like the number of empty receives is pretty high.
I googled around, and I believe this means the event source is polling the queue to see if it can invoke a Lambda function, and because the queue is empty, it results in an "empty receive".
I'm not sure if there's a way to increase the polling wait time or reduce the number of times the polling can happen, but it looks like I'm hitting the 1M limit pretty quickly.
In this scenario, what I would suggest is to increase MaximumBatchingWindowInSeconds; this results in long polling and fewer wasted polling cycles.
You can refer to this: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#events-sqs-eventsource
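As a rough sketch with boto3 (the function name is a placeholder, and it assumes each function has a single SQS event source mapping), you could look up the mapping and raise its batching window so the function is invoked less often:
import boto3

lambda_client = boto3.client("lambda")

# "my-sqs-consumer" is a placeholder function name.
mapping = lambda_client.list_event_source_mappings(
    FunctionName="my-sqs-consumer")["EventSourceMappings"][0]
lambda_client.update_event_source_mapping(
    UUID=mapping["UUID"],
    MaximumBatchingWindowInSeconds=60,  # SQS mappings allow up to 300 seconds
)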

Recursive AWS lambda with updating state

I have an AWS Lambda function that polls an external server for new events every 6 hours. On every call, if there are any new events, it publishes the updated total number of events polled to an SNS topic. So I essentially need to call the Lambda at fixed intervals but also pass a counter state across calls.
I'm currently considering the following options:
Store the counter somewhere on EFS/S3, but that seems like overkill for a single number.
EventBridge, which would be fine for scheduling the execution, but doesn't store state across calls.
A Step Function with a loop + wait on the Lambda would do it, but it doesn't seem to be the most efficient/cost-effective approach.
Use SQS with a delay so that the Lambda essentially triggers itself, passing the updated state. Again, I don't think this is the most effective option, and to actually get to the 6-hour delay I would have to implement some checks/delays within the Lambda, as the max delay for SQS is 15 minutes.
What would be the best way to do it?
For scheduling a Lambda at intervals, you can use CloudWatch Events. Scheduling a Lambda using the Serverless Framework is a breeze: a cron-type statement can schedule your Lambda invocation. Here's a guide on scheduling: https://www.serverless.com/framework/docs/providers/aws/events/schedule
As for saving the counter, you can use AWS Systems Manager Parameter Store. It's simple key-value storage, well suited to such a small amount of data.
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html
Or you can save it in DynamoDB. Since the data is small and the frequency is low, you won't be charged much, and there's no hassle of reading files or parsing.
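A minimal sketch of the Parameter Store approach, assuming placeholder names for the parameter, the SNS topic, and the external polling helper:
import boto3

ssm = boto3.client("ssm")
sns = boto3.client("sns")
PARAM = "/my-app/event-counter"   # placeholder; seed it with "0" once
TOPIC_ARN = "arn:aws:sns:..."     # placeholder topic ARN

def poll_external_server():
    # Stand-in for the real call to the external server.
    return []

def handler(event, context):
    # Read the previous total, add any new events, persist, and publish.
    total = int(ssm.get_parameter(Name=PARAM)["Parameter"]["Value"])
    new_events = poll_external_server()
    if new_events:
        total += len(new_events)
        ssm.put_parameter(Name=PARAM, Value=str(total),
                          Type="String", Overwrite=True)
        sns.publish(TopicArn=TOPIC_ARN, Message=str(total))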

AWS SQS - how to limit processing queue per minute

In AWS SQS, how can we limit processing of a queue per minute?
We have a scenario where we need to limit calls to a 3rd-party API to a maximum of 10 calls per minute. Our solution is to make those calls asynchronous using AWS SQS and a Lambda function. We know that we can apply a delay to the queue, but is there a way to limit consumption to a maximum of 10 receives per minute?
If you review the trigger detail for your Lambda function, you will see two very interesting values: Batch size and Batch window. By default, the batch size is 10. To set different values, you must remove the trigger and add it again for the SQS queue. You could perhaps adjust the batch window so that it does what you want.
https://aws.amazon.com/es/about-aws/whats-new/2020/11/aws-lambda-now-supports-batch-windows-of-up-to-5-minutes-for-functions/
As the source indicates, Lambda will wait for the time you specify; adjusting that value could, in theory, make it fit within the minute. I hope this helps.

Is there a way to set a walltime on AWS Batch jobs?

Is there a way to set a maximum running time for AWS Batch jobs (or queues)? This is a standard setting in most batch managers, which avoids wasting resources when a job hangs for whatever reason.
As of April 2018, AWS Batch supports setting a job timeout when submitting a job, or in the job definition.
https://aws.amazon.com/about-aws/whats-new/2018/04/aws-batch-adds-support-for-automatic-termination-with-job-execution-timeout/
You specify an attemptDurationSeconds parameter, which must be at least 60 seconds, either in your job definition, or when you submit the job. When this number of seconds has passed following the job attempt's startedAt timestamp, AWS Batch terminates the job. On the compute resource, your job's container receives a SIGTERM signal to give your application a chance to shut down gracefully; if the container is still running after 30 seconds, a SIGKILL signal is sent to forcefully shut down the container.
Source: https://docs.aws.amazon.com/batch/latest/userguide/job_timeouts.html
POST /v1/submitjob HTTP/1.1
Content-type: application/json

{
  ...
  "timeout": {
    "attemptDurationSeconds": number
  }
}
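The same request from Python with boto3, as a sketch with placeholder job name, queue, and definition:
import boto3

batch = boto3.client("batch")

# attemptDurationSeconds must be at least 60.
batch.submit_job(
    jobName="my-job",                    # placeholder
    jobQueue="my-queue",                 # placeholder
    jobDefinition="my-job-definition",   # placeholder
    timeout={"attemptDurationSeconds": 3600},
)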
AFAIK there is no feature to do this. However, a workaround was suggested in the forum for a similar question.
One idea is to call Batch as an Activity from Step Functions, pinging back on a schedule (e.g. every minute) from that job. If it stops responding, you can detect that situation as a Timeout in the activity and act accordingly (terminate the job etc.). Not an ideal solution (especially if the job continues to ping back as a "zombie"), but it's a start. You'd also likely have to store activity tokens in a database to trace them to a Batch job id.
Alternatively, you split that setup into 2 steps: schedule a Batch job from a Lambda in the first state, then pass the Batch job id to the second step, which polls Batch (from another Lambda) for its state with Retry and IntervalSeconds (e.g. once every minute, or even with exponential backoff), and MaxAttempts calculated based on your timeout. This way, you don't need any external state storage mechanism, long polling, or even a "ping back" from the job (it CAN be a zombie), but the downside is more steps.
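For the second alternative, the polling Lambda can be tiny. A sketch, assuming the job id is passed in from the previous state and that the state machine retries while the job is still running:
import boto3

batch = boto3.client("batch")

def handler(event, context):
    # event["jobId"] is assumed to be passed in from the previous state.
    job = batch.describe_jobs(jobs=[event["jobId"]])["jobs"][0]
    if job["status"] not in ("SUCCEEDED", "FAILED"):
        # Raising lets Step Functions Retry with IntervalSeconds/MaxAttempts.
        raise RuntimeError("Job still running")
    return job["status"]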
There is no option to set a timeout on a Batch job, but you can set up a Lambda function that triggers every hour or so and terminates jobs created more than, say, 24 hours earlier.
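A rough sketch of such a cleanup function, assuming a placeholder queue name, a 24-hour cut-off, and ignoring pagination:
import time
import boto3

batch = boto3.client("batch")
MAX_AGE_MS = 24 * 3600 * 1000  # assumed cut-off

def handler(event, context):
    now_ms = int(time.time() * 1000)
    for status in ("STARTING", "RUNNING"):
        summaries = batch.list_jobs(jobQueue="my-queue",  # placeholder
                                    jobStatus=status)["jobSummaryList"]
        for job in summaries:
            if now_ms - job["createdAt"] > MAX_AGE_MS:
                batch.terminate_job(jobId=job["jobId"],
                                    reason="Exceeded 24h walltime")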
I have been working with AWS for some time now and could not find a way to set a maximum running time for Batch jobs.
However, there are some alternative approaches you could utilize.
AWS Forum
Sadly, there is no way to set an execution time limit on AWS Batch.
One solution may be to edit the Docker image's entry point so that it enforces the execution time limit itself.
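For example, a hypothetical Python wrapper used as the entry point could run the real command and kill it once an assumed walltime is exceeded:
import subprocess
import sys

WALLTIME = 3600  # assumed limit in seconds

try:
    # The real job command is passed as the container's CMD arguments.
    result = subprocess.run(sys.argv[1:], timeout=WALLTIME)
    sys.exit(result.returncode)
except subprocess.TimeoutExpired:
    print("Job exceeded walltime, terminating", file=sys.stderr)
    sys.exit(1)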

Scheduled AWS Lambda Task at less than 1 minute frequency

We are trying to develop a true Lambda-based application in which certain tasks need to be performed on schedules of variable frequency. They are actually polling for data, and at certain times of the day this polling can be as slow as once every hour, while at other times it has to be once every second. I have looked at the options for scheduling (e.g. Using AWS Lambda with Scheduled Events and AWS re:Invent 2015 | (CMP407) Lambda as Cron: Scheduling Invocations in AWS Lambda), but it seems that short of spinning up an EC2 instance or a long-running Lambda, there's no built-in way of firing Lambdas at intervals shorter than one minute. The Lambda rate expression doesn't have a place for seconds. Is there a way to do this without an EC2 instance or long-running Lambda? Ideally, something that can be done without incurring additional cost for scheduling.
You theoretically can wire up a high-frequency task-trigger without an EC2 instance or long-running lambda using an AWS Step Function executed by a CloudWatch Events scheduled event (see Emanuele Menga's blog post for an example), but that approach would actually be more expensive than the other two options, according to my current (as of May 2019) estimates:
(assuming 1 year = 31536000 seconds)
Step Function:
$0.0250 per 1000 state transitions
2 state transitions per tick (Wait, Task) + at least 1 per minute (for setup/configuration)
31536000 * (2 + 1/60) * 0.0250 / 1000 = $1589.94/year, or as low as $65.70/year for lower frequency trigger (2 ticks per minute)
Lambda Function:
$0.000000208 per 100ms (for smallest 128mb function)
31536000 * 0.000000208 * 10 = $65.595488/year
EC2 Instance:
t3a.nano is $0.0047 per hour on-demand, or as low as $0.0014 using Spot instances
31536000 * 0.0047 / 3600 = $41.172/year, or $12.264/year using Spot instances
So there will be some scheduling cost, but as you can see the cost is not that much.
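The estimates above can be reproduced (and adjusted to your own tick rate) with a few lines of arithmetic:
SECONDS_PER_YEAR = 31536000

# Step Function: 2 transitions per tick plus ~1/minute overhead, $0.025 per 1000.
step_fn = SECONDS_PER_YEAR * (2 + 1 / 60) * 0.0250 / 1000   # ~$1589.94
# Lambda: 128 MB function running continuously, $0.000000208 per 100 ms.
lam = SECONDS_PER_YEAR * 10 * 0.000000208                    # ~$65.60
# EC2: t3a.nano on-demand at $0.0047 per hour.
ec2 = SECONDS_PER_YEAR / 3600 * 0.0047                       # ~$41.17
print(round(step_fn, 2), round(lam, 2), round(ec2, 2))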
Currently, Lambda functions can be invoked from CloudWatch scheduled events at most once every minute.
A solution that might work would be the following:
Set up an EC2 instance and, from a program that you run as a background job, use the AWS SDK to invoke your Lambda function. Example:
import time, boto3
lambda_client = boto3.client("lambda")
while True:
    # "my-function" is a placeholder; invoke asynchronously, then wait x seconds
    lambda_client.invoke(FunctionName="my-function", InvocationType="Event")
    time.sleep(5)
At this point in time AWS Lambda allows functions to be scheduled to run every 5 minutes with a maximum execution time of 5 minutes.
This means that if you want to run an AWS Lambda function at intervals of less than 5 minutes without using EC2, you can take a two-phased approach: create an AWS Lambda function that runs every 5 minutes and have it actually run for the entire 5 minutes, calling other AWS Lambda functions asynchronously at the appropriate time intervals.
This approach will cost you more since the first AWS Lambda function that runs for the entire 5 minutes will essentially be running continuously, but you can reduce the cost by having the smallest amount of RAM allocated.
UPDATE
CloudWatch Events now allow for schedules at a frequency of 1 minute. This means you can now schedule a Lambda function to run every minute and have it run for up to a minute to achieve sub-minute accuracy. It is important to note that scheduled events do not fire at the beginning of every minute so achieving exact sub-minute timing is still a bit tricky.
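A sketch of that pattern, assuming a placeholder worker function name and a 10-second sub-interval: the dispatcher is scheduled once per minute and fans out asynchronous invocations until its minute is nearly used up.
import time
import boto3

lambda_client = boto3.client("lambda")
INTERVAL = 10  # seconds between worker invocations (assumption)

def handler(event, context):
    # Keep going while enough time is left in this one-minute execution.
    while context.get_remaining_time_in_millis() > INTERVAL * 1000:
        lambda_client.invoke(FunctionName="worker-function",  # placeholder
                             InvocationType="Event")          # async invoke
        time.sleep(INTERVAL)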
Don't poll with Lambda; you are better off using EC2 if you expect to spend more time polling than performing work. Lambda charges for execution time, and polling is costly.
You really need an event-driven system with Lambda. What determines when you poll?
There is a simple hack, though, where you could use setTimeout or setInterval, e.g.:
'use strict';

// Runs the task immediately and once more after TIMEOUT ms,
// keeping the function alive in between (you pay for that idle time).
async function Task() {
  console.log('Ran at ' + new Date());
}

const TIMEOUT = 30000; // 30 seconds between the two runs

exports.handler = (event, context) => {
  return Promise.all([
    Task(), // first run at invocation time
    new Promise(function (resolve, reject) {
      setTimeout(function () {
        // second run after TIMEOUT; resolve so the handler can finish
        Task().then(resolve, reject);
      }, TIMEOUT);
    })
  ]);
};