Scheduled AWS Lambda Task at less than 1 minute frequency

Scheduled AWS Lambda Task at less than 1 minute frequency - amazon-web-services

We are trying to develop a true lambda-based application in which certain tasks need to be performed at schedules of variable frequencies. They are actually polling for data, and at certain times of the day, this polling can be as slow as once every hour, while at other times, it has to be once every second. I have looked at the options for scheduling (e.g. Using AWS Lambda with Scheduled Events and AWS re:Invent 2015 | (CMP407) Lambda as Cron: Scheduling Invocations in AWS Lambda), but it seems that short of spinning up an EC2 instance or a long-running lambda, there's no built-in way of firing up lambdas at a frequency of less than one minute. The lambda rate expression doesn't have a place for seconds. Is there a way to do this without an EC2 instance or long-running lambda? Ideally, something that can be done without incurring additional cost for scheduling.

You theoretically can wire up a high-frequency task-trigger without an EC2 instance or long-running lambda using an AWS Step Function executed by a CloudWatch Events scheduled event (see Emanuele Menga's blog post for an example), but that approach would actually be more expensive than the other two options, according to my current (as of May 2019) estimates:
(assuming 1 year = 31536000 seconds)
Step Function:
$0.0250 per 1000 state transitions
2 state transitions per tick (Wait, Task) + at least 1 per minute (for setup/configuration)
31536000 * (2 + 1/60) * 0.0250 / 1000 = $1589.94/year, or as low as $65.70/year for lower frequency trigger (2 ticks per minute)
Lambda Function:
$0.000000208 per 100ms (for smallest 128mb function)
31536000 * 0.000000208 * 10 = $65.595488/year
EC2 Instance:
t3a.nano is $0.0047 per hour on-demand, or as low as $0.0014 using Spot instances
31536000 * 0.0047 / 3600 = $41.172/year, or $12.264/year using Spot instances
So there will be some scheduling cost, but as you can see the cost is not that much.

Currently lambda functions are invoked, at the very least, every 1 minute from Cloudwatch schedule events.
A solution that might work would be the following:
Setup an EC2 instance and from a program, that you will run as a background job, use the aws sdk to invoke your lambda function. Example:
while True:
invoke lambda function via aws sdk
sleep for x seconds

At this point in time AWS Lambda allows functions to be scheduled to run every 5 minutes with a maximum execution time of 5 minutes.
This means if you want to run an AWS Lambda function at intervals less than every 5 minutes while not using EC2 you can take a two phased approach. Create an AWS Lambda function to run every 5 minutes and have it actually run for the entire 5 minutes calling other AWS Lambda functions asynchronously at the appropriate time intervals.
This approach will cost you more since the first AWS Lambda function that runs for the entire 5 minutes will essentially be running continuously, but you can reduce the cost by having the smallest amount of RAM allocated.
UPDATE
CloudWatch Events now allow for schedules at a frequency of 1 minute. This means you can now schedule a Lambda function to run every minute and have it run for up to a minute to achieve sub-minute accuracy. It is important to note that scheduled events do not fire at the beginning of every minute so achieving exact sub-minute timing is still a bit tricky.

Don't poll with lambda or you are better off using EC2 if you expect to spend more time polling than performing work. Lambda charges by execution time and polling is costly.
You really need an event driven system with Lambda. What determines when you poll?

There is a simple hack though where you could use a setTimeout or setInterval.
i.e:
'use strict';
async function Task() {
console.log('Ran at'+new Date())
}
const TIMEOUT = 30000;
exports.handler = (event, context) => {
return Promise.all([
Task(),
new Promise(function(resolve, reject) {
let timeout = setTimeout(function() {
Task().then(() => {
clearTimeout(timeout)
})
}, TIMEOUT);
})
])
}

Related

AWS Lambda start with higher concurrency?

I have a lambda function with an average run time of 0.8 seconds and call it about 200 times (Invocations). By the end, it has a concurrency of 3, but it creates a small bottleneck in my event-driven process. Is there anyway I can start this process off with more than 1 lambda running as a standard, instead of the relative slow scaling up in this instance?

Recursive AWS lambda with updating state

I have an AWS Lambda that polls from an external server for new events every 6 hours. On every call, if there are any new events, it publishes the updated total number of events polled to a SNS. So I essentially need to call the lambda on fixed intervals but also pass a counter state across calls.
I'm currently considering the following options:
Store the counter somewhere on a EFS/S3, but it seems an
overkill for a simple number
EventBridge, which would be ok to schedule the execution, but doesn't store state across calls
A step function with a loop + wait on the the lambda would do it, but it doesn't seem to be the most efficient/cost effective way to do it
use a SQS with a delay so that the lambda essentially
triggers itself, passing the updated state. Again I don't think
this is the most effective, and to actually get to the 6 hours delay
I would have to implement some checks/delays within the lambda, as the max delay for SQS is 15 minutes
What would be the best way to do it?

For scheduling Lambda at intervals, you can use CloudWatch Events. Scheduling Lambda using Serverless framework is a breeze. A cronjob type statement can schedule your lambda call. Here's a guide on scheduling: https://www.serverless.com/framework/docs/providers/aws/events/schedule
As for saving data, you can use AWS Systems Manager Parameter Store. It's a simple Key value pair storate for such small amount of data.
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html
OR you can also save it in DynamoDB. Since the data is small and frequency is less, you wont be charged much and there's no hassle of reading files or parsing.

Change AWS Lambda Kinesis stream polling frequency

I want to change Kinesis stream polling frequency of AWS Lambda function. I was going through this article:
https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
but, no luck.
The only information it conveys is AWS Lambda then polls the stream periodically (once per second) for new records.
I was also looking for answers in threads, but no luck:
https://forums.aws.amazon.com/thread.jspa?threadID=229037
There is another option though, which can be used if desired frequency is required:
https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
So, my question is, can we decrease AWS Lambda's polling frequency to, lets say 1-2 mins? Or do we have to go with AWS Lambda with Scheduled Events?

As far as I know there is now way to decrease the polling frequency if you are using an event source mapping.
These are all the settings you can set (source: https://docs.aws.amazon.com/de_de/lambda/latest/dg/API_CreateEventSourceMapping.html):
{
"BatchSize": number,
"Enabled": boolean,
"EventSourceArn": "string",
"FunctionName": "string",
"StartingPosition": "string",
"StartingPositionTimestamp": number
}
So going with a scheduled event seems to be the only feasible option.
An alternative would be to let the lambda function sleep before exiting so it will only poll again after a desired time. But of course this means you are paying for that.. So this is probably not desired.

I haven't seen a way to decrease the polling frequency, but you can have the same effect as if polling frequency decreased by increasing the MaximumBatchingWindowInSeconds parameter.
Reference: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-kinesis.html#sam-function-kinesis-maximumbatchingwindowinseconds
Let's say you have new records arriving on average at 1 record/s. Regardless of BatchSize, your lambda might trigger every second as it polls once every second. But if you increase your BatchSize to let's say, 60 and the MaximumBatchingWindowInSeconds to 60, then your lambda only invokes on average once every minute, as if you've changed polling frequency to once per minute.

Where and how to set up a function which is doing GET request every second?

I am trying to setup a function which will be working somewhere on the server. It is a simple GET request and I want to trigger it every second.
I tried google cloud functions and AWS. Both of them don't have a straightforward solution to run it every second. (every 1 minute only)
Could you please suggest me a service, or combination of services that will allow me to do it. (preferably not costly)

Here are some options on AWS ...
Launch a t2.nano EC2 instance to run a script that issues GET, then sleeps for 1 second, and repeats. You can't use cron (doesn't support every second). This costs about 13 cents per day.
If you are going to do this for months/years then reduce the cost by using Reserved Instances.
If you can tolerate periods where the GET requests don't happen then reduce the cost even further by using Spot instances.
That said, why do you need to issue a GET request every second? Perhaps there is a better solution here.

You can create a AWS Lambda function, which simply loops and issues the GET request every second, and exits after 240 requets (i.e. 4 minutes). Then create a CloudWatch event that fires every 4 minutes calling the Lambda function.
Every 4 minutes because the maximum timeout you can set for a Lambda function is 5 minutes.
This setup will likely incur only some trivial cost:
At 1 event per 4 minutes, it's $1/month for the CloudWatch events generated.
At 1 call per 4 minutes to a minimally configured (128MB) Lambda function, it's 324,000 GB-second worth of execution per month, just within the free tier of 400,000 GB-second.
Since network transfer into AWS is free, the response size of your GET request is irrelevant. And the first 1GB of transfer out to the Internet is free, which should cover all the GET requests themselves.

Can I schedule a lambda function execution with a lambda function?

I'm looking for the ability to programmatically schedule a lambda function to run a single time with another lambda function. For example, I made a request to myFirstFunction with date and time parameters, and then at that date and time, have mySecondFunction execute. Is that possible only with stateless AWS services? I'm trying to avoid an always-on ec2 instance.
Most of the results I'm finding for scheduling a lambda functions have to do with cloudwatch and regularly scheduled events, not ad-hoc events.

This is a perfect use case for aws step functions.
Use Wait state with SecondsPath or TimestampPath to add the required delay before executing the Next State.

What you're tring to do (schedule Lambda from Lambda) it's not possible with the current AWS services.
So, in order to avoid an always-on ec2 instance, there are other options:
1) Use AWS default or custom metrics. You can use, for example, ApproximateNumberOfMessagesVisible or CPUUtilization (if your app fires a big CPU utilization when process a request). You can also create a custom metric and fire it when your instance is idle (depending on the app that's running in your instance).
The problem with this option is that you'll waste already paid minutes (AWS always charge a full-hour, no matter if you used your instance for 15 minutes).
2) A better option, in my opinion, would be to run a Lambda function once per minute to check if your instances are idle and shut them down only if they are close to the full hour.
import boto3
from datetime import datetime
def lambda_handler(event, context):
print('ManageInstances function executed.')
environments = [['instance-id-1', 'SQS-queue-url-1'], ['instance-id-2', 'SQS-queue-url-2'], ...]
ec2_client = boto3.client('ec2')
for environment in environments:
instance_id = environment[0]
queue_url = environment[1]
print 'Instance:', instance_id
print 'Queue:', queue_url
rsp = ec2_client.describe_instances(InstanceIds=[instance_id])
if rsp:
status = rsp['Reservations'][0]['Instances'][0]
if status['State']['Name'] == 'running':
current_time = datetime.now()
diff = current_time - status['LaunchTime'].replace(tzinfo=None)
total_minutes = divmod(diff.total_seconds(), 60)[0]
minutes_to_complete_hour = 60 - divmod(total_minutes, 60)[1]
print 'Started time:', status['LaunchTime']
print 'Current time:', str(current_time)
print 'Minutes passed:', total_minutes
print 'Minutes to reach a full hour:', minutes_to_complete_hour
if(minutes_to_complete_hour <= 2):
sqs_client = boto3.client('sqs')
response = sqs_client.get_queue_attributes(QueueUrl=queue_url, AttributeNames=['All'])
messages_in_flight = int(response['Attributes']['ApproximateNumberOfMessagesNotVisible'])
messages_available = int(response['Attributes']['ApproximateNumberOfMessages'])
print 'Messages in flight:', messages_in_flight
print 'Messages available:', messages_available
if(messages_in_flight + messages_available == 0):
ec2_resource = boto3.resource('ec2')
instance = ec2_resource.Instance(instance_id)
instance.stop()
print('Stopping instance.')
else:
print('Status was not running. Nothing is done.')
else:
print('Problem while describing instance.')

UPDATE - I wouldn't recommend using this approach. Things changed in when TTL deletions happen and they are not close to TTL time. The only guarantee is that the item will be deleted after the TTL. Thanks #Mentor for highlighting this.
2 months ago AWS announced DynamoDB item TTL, which allows you to insert an item and mark when you wish for it to be deleted. It will be deleted automatically when the time comes.
You can use this feature in conjunction with DynamoDB Streams to achieve your goal - your first function inserts an item to a DynamoDB table. The record TTL should be when you want the second lambda triggered. Setup a stream that triggers your second lambda. In this lambda you will identify deletion events and if that's a delete then run your logic.
Bonus point - you can use the table item as a mechanism for the first lambda to pass parameters to the second lambda.
About DynamoDB TTL:
https://aws.amazon.com/blogs/aws/new-manage-dynamodb-items-using-time-to-live-ttl/

It does depend on your use case, but the idea that you want to trigger something at a later date is a common pattern. The way I do it serverless is I have a react application that triggers an action to store a date in the future. I take the date format like 24-12-2020 and then convert it using date(), having researched that the date format mentioned is correct, so I might try 12-24-2020 and see what I get(!). When I am happy I convert it to a Unix number in javascript React I use this code:
new Date(action.data).getTime() / 1000
where action.data is the date and maybe the time for the action.
I run React in Amplify (serverless), I store that to dynamodb (serverless). I then run a Lambda function (serverless) to check my dynamodb for any dates (I actually use the Unix time for now) and compare the two Unix dates now and then (stored) which are both numbers, so the comparison is easy. This seems to me to be super easy and very reliable.
I just set the crontab on the Lambda to whatever is needed depending on the approximate frequency required, in most cases running a lambda every five minutes is pretty good, although if I was only operating this in a certain time zone for a business weekday application I would control the Lambda a little more. Lambda is free for the first 1m functions per month and running it every few minutes will cost nothing. Obviously things change, so you will need to look that up in your area.
You will never get perfect timing in this scenario. It will, however, for the vast majority of use cases be close enough according to the timing settings of the Lambda function, you could set it up to check every minute or just once per day, it all depends on your application.
Alternatively, If I wanted an instant reaction to an event I might use SMS, SQS, or Kinesis to instantly stream a message, it all depends on your use case.

I'd opt for enqueuing deferred work to SQS using message timers in myFirstFunction.
Currently, you can't use SQS as a Lambda event source, but you can either periodically schedule mySecondFunction to check the queue via scheduled CloudWatch Events (somewhat of a variant of the other options you've found) or use a CloudWatch alarm on the ApproximateNumberOfMessagesVisible to fire an SNS message to a Lambda and avoid constant polling for queues that are frequently inactive for long periods.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js