I have an event scheduler in AWS CloudWatch that runs my Lambda function every 2 minutes. I want to store a variable from the last Lambda call which is needed for processing in the next Lambda call.
Is there any small storage option for this, or do I have to go for DynamoDB-type storage? Thanks.
You will have to use external storage like S3 or DynamoDB.
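For the S3 option, here is a minimal sketch of persisting a small piece of state between invocations; the bucket name and object key are hypothetical placeholders:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Hypothetical bucket and key, used only for illustration
const Bucket = 'my-lambda-state-bucket';
const Key = 'scheduler-state.json';

exports.handler = async () => {
  // Load the state left behind by the previous invocation (if any)
  let state = {};
  try {
    const obj = await s3.getObject({ Bucket, Key }).promise();
    state = JSON.parse(obj.Body.toString());
  } catch (e) {
    if (e.code !== 'NoSuchKey') throw e; // first run: no state stored yet
  }

  // ... do your processing using `state` ...

  // Persist whatever the next invocation needs
  state.lastRun = new Date().toISOString();
  await s3.putObject({ Bucket, Key, Body: JSON.stringify(state) }).promise();
};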
You can use the Targets[].Input field to attach data to the next CloudWatch event. You can do this easily from the Lambda using the aws-sdk of your language, for example in Node.js.
-- Updated with an example
const AWS = require('aws-sdk');
const events = new AWS.CloudWatchEvents();

// Name of the CloudWatch Events rule that triggers this Lambda
// (assumed here to be supplied, e.g. through an environment variable)
const eventName = process.env.EVENT_RULE_NAME;

/**
 * Lambda entry point
 */
exports.dostuff = async (request) => {
  let stateData;
  // do your stuff, populating stateData with whatever the next run needs
  await updateIteration(stateData);
};

function getRuleTargets() {
  return events.listTargetsByRule({ Rule: eventName }).promise();
}

function updateTargetInput(target, newData) {
  const Input = JSON.stringify(newData);
  const params = {
    Rule: eventName,
    Targets: [
      {
        Arn: target.Arn,
        Id: target.Id,
        Input
      }
    ]
  };
  return events.putTargets(params).promise();
}

function updateIteration(data) {
  return getRuleTargets().then(({ Targets }) => {
    // Usually there is only one target but just in case
    const target = Targets.find(...); // pick out the target for this Lambda
    return updateTargetInput(target, data);
  });
}
The advantage of this approach is that you don't need to set up extra infrastructure or make a separate call to fetch the data. It would not be a good approach if the event is being used in other places.
You can use AWS Systems Manager Parameter Store.
This allows a string of up to 4 KB to be stored/retrieved, so you can store a comma-delimited list or whatever you wish, e.g. JSON.
The Lambda execution environment potentially remains loaded after an invocation finishes, so in principle you can cache some data in a global variable (initialised to None/NULL etc.) and check that value first, but you must still update the parameter store in case the cache is empty next time!
Look up "lambda warm start".
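A rough sketch of that pattern in Node.js; the parameter name and the shape of the stored value are placeholders:
const AWS = require('aws-sdk');
const ssm = new AWS.SSM();

// Hypothetical parameter name, used only for illustration
const PARAM_NAME = '/my-app/state';

// Module-level cache: survives between invocations while the execution
// environment stays warm, but may be empty again after a cold start
let cachedState = null;

exports.handler = async () => {
  if (cachedState === null) {
    const { Parameter } = await ssm.getParameter({ Name: PARAM_NAME }).promise();
    cachedState = JSON.parse(Parameter.Value);
  }

  // ... use and modify cachedState ...

  // Always write back, in case the next invocation starts cold
  await ssm.putParameter({
    Name: PARAM_NAME,
    Value: JSON.stringify(cachedState),
    Type: 'String',
    Overwrite: true
  }).promise();
};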
So AWS announced Lambda SnapStart very recently, and I tried to give it a go since my application has a cold start time of ~4s.
I was able to do this by adding the following under resources:
extensions:
  NodeLambdaFunction:
    Properties:
      SnapStart:
        ApplyOn: PublishedVersions
Now, when I actually go to the said Lambda, this is what I see:
So far so good!
But the issue is that when I check my CloudWatch logs, there's no trace of a restore duration, just the good old Init Duration for cold starts, which means SnapStart isn't working properly.
I dug deeper: SnapStart only works for versioned ARNs. But the thing is, Serverless already claims that:
By default, the framework creates function versions for every deploy.
And on checking the logs, I see that the log streams have the prefix 2022/11/30/[$LATEST].
When I check the Versions tab in the console, I see version number 240. So I would expect that 240 is the latest version of this Lambda function and that this is the function version being invoked every time.
However, clicking on the version number opens a Lambda function with 240 attached to its ARN, and testing that function with SnapStart works perfectly fine.
So I am confused: are the $LATEST version and version number 240 (in my case) different?
If they are not, then why isn't SnapStart automatically activated for $LATEST?
If they are, how do I make sure they are the same?
SnapStart is only available for published versions of a Lambda function. It cannot be used with $LATEST.
Using Versions is pretty hard for Serverless Framework, SAM, CDK, and basically any other IaC tool today, because by default they will all use $LATEST to integrate with API Gateway, SNS, SQS, DynamoDB, EventBridge, etc.
You need to update the integration with API Gateway (or whatever service you're using) to point to the Lambda Version you publish, after that Lambda deployment has completed. This isn't easy to do using Serverless Framework (and other tools). You may be able to achieve this using this traffic-shifting plugin.
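One workaround, sketched below, assumes you route the integration through a Lambda alias instead of $LATEST: after each deploy you publish a version and move the alias to it. The function and alias names are placeholders:
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

// Hypothetical names, used only for illustration
const FunctionName = 'my-service-dev-myFunction';
const AliasName = 'live';

async function promoteLatestDeploy() {
  // Publish an immutable version from the code currently at $LATEST;
  // SnapStart snapshots are only taken for published versions
  const { Version } = await lambda.publishVersion({ FunctionName }).promise();

  // Move the alias (which API Gateway etc. should reference) to the new version
  await lambda.updateAlias({
    FunctionName,
    Name: AliasName,
    FunctionVersion: Version
  }).promise();

  return Version;
}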
In case you use Step Functions to call your Lambda function, you can set useExactVersion: true in the state machine:
stepFunctions:
  stateMachines:
    yourStateMachine:
      useExactVersion: true
      ...
      definition:
        ...
This will reference the latest version of the function you just deployed.
This has got to be one of the worst feature launches that I have seen in a long time. How the AWS team could put in all the time and effort required to bring this feature to market while simultaneously rendering it useless, because we can't script the thing, is beyond me.
We were ready to jump on this and start migrating apps to lambda, but now we are back in limbo. Even knowing there was a fix coming down the line would be something. Hopefully somebody from the AWS lambda team can provide some insights...
Here is a working POC of a Serverless plugin that updates the Lambda references to use the most recent version. This fixed the resulting CloudFormation code and was tested with both SQS and API Gateway.
'use strict'

class SetCycle {
  constructor (serverless, options) {
    this.hooks = {
      // this is where we declare the hook we want our code to run
      'before:package:finalize': function () { snapShotIt(serverless) }
    }
  }
}

function traverse(jsonObj, functionVersionMap) {
  if (jsonObj !== null && typeof jsonObj == "object") {
    Object.entries(jsonObj).forEach(([key, value]) => {
      if (key === 'Fn::GetAtt' && value.hasOwnProperty('length') && value.length === 2 && value[1] === "Arn" && functionVersionMap.get(value[0])) {
        console.log(jsonObj);
        let newVersionedMethod = functionVersionMap.get(value[0]);
        // swap the Fn::GetAtt on the function for a Ref to its versioned resource
        delete jsonObj[key];
        jsonObj.Ref = newVersionedMethod;
        console.log('--becomes');
        console.log(jsonObj);
      } else {
        // key is either an array index or object key
        traverse(value, functionVersionMap);
      }
    });
  } else {
    // jsonObj is a number or string
  }
}

function snapShotIt(serverless) {
  resetLambdaReferencesToVersionedVariant(serverless);
}

function resetLambdaReferencesToVersionedVariant (serverless) {
  const functionVersionMap = new Map();
  let rsrc = serverless.service.provider.compiledCloudFormationTemplate.Resources;
  // build a map of all the lambda functions and their associated versioned resources
  for (let key in rsrc) {
    if (rsrc[key].Type === 'AWS::Lambda::Version') {
      functionVersionMap.set(rsrc[key].Properties.FunctionName.Ref, key);
    }
  }
  // loop through all the resources and replace the non-versioned with the versioned lambda arn reference
  for (let key in rsrc) {
    if (!(rsrc[key].Type === 'AWS::Lambda::Version' || rsrc[key].Type === 'AWS::Lambda::Function')) {
      console.log("--" + key);
      traverse(rsrc[key], functionVersionMap);
    }
  }
  // add the SnapStart syntax to every function
  for (let key in rsrc) {
    if (rsrc[key].Type === 'AWS::Lambda::Function') {
      console.log(rsrc[key].Properties);
      rsrc[key].Properties.SnapStart = { "ApplyOn": "PublishedVersions" };
      console.log("--becomes");
      console.log(rsrc[key].Properties);
    }
  }
  // prints the method map
  // for (let [key, value] of functionVersionMap) {
  //   console.log(key + " : " + value);
  // }
}

// now we need to make our plugin object available to the framework to execute
module.exports = SetCycle
I was able to achieve this by updating my Serverless version to 3.26.0 and adding the property snapStart: true to the functions that I have created. Currently, Serverless creates version numbers, and as soon as the new version is published, SnapStart gets enabled for that latest version.
ApiName:
  handler: org.springframework.cloud.function.adapter.aws.SpringBootApiGatewayRequestHandler
  events:
    - httpApi:
        path: /end/point
        method: post
  environment:
    FUNCTION_NAME: ApiName
  runtime: java11
  memorySize: 4096
  snapStart: true
I have a DynamoDB stream which is triggering a Lambda handler that looks like this:
let failedRequestId: string
await asyncForEachSerial(event.Records, async (record) => {
  try {
    await handle(record.dynamodb.OldImage, record.dynamodb.NewImage, record, context)
    return true
  } catch (e) {
    failedRequestId = record.dynamodb.SequenceNumber
  }
  return false //break;
})
return {
  batchItemFailures: [{ itemIdentifier: failedRequestId }]
}
I have my Lambda set up with a DestinationConfig.onFailure pointing to a DLQ I configured in SQS. The idea behind the handler is to process a batch of events and interrupt at the first failure. It then reports the most recent failure in batchItemFailures, which tells the stream to continue at that record on the next try. (I pulled the idea from this article)
My current issue is that if there is a genuine failure of my handle() function on one of those records, then my exit code will mark that record as the checkpoint for the next handler call. However, the DLQ condition never triggers, and I end up processing that record over and over again. I should also note that I am trying to avoid reprocessing records multiple times, since handle() is not idempotent.
How can I elegantly handle errors while maintaining batching, but without triggering my handle() function more than once for well-behaved stream records?
I'm not sure if you have found the answer you were looking for. I'll respond in case someone else comes across this issue.
There are two other parameters you'd want to use to avoid that issue. Quoting the documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html):
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
Basically, you'll have to specify how many times the failures should be retried and how far back in the events Lambda should be looking.
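For illustration, a sketch of setting those two parameters on an existing DynamoDB event source mapping with the Node.js SDK; the mapping UUID and the chosen values are placeholders, and the same settings can be applied through your IaC tool instead:
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

// Hypothetical event source mapping UUID, e.g. taken from listEventSourceMappings()
const UUID = '00000000-0000-0000-0000-000000000000';

async function limitStreamRetries() {
  await lambda.updateEventSourceMapping({
    UUID,
    MaximumRetryAttempts: 2,         // stop retrying a failing batch after 2 retries
    MaximumRecordAgeInSeconds: 3600  // give up on records older than one hour
  }).promise();
}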
I have a scenario: query the list of students in a school, by year, and then use that information to do some other tasks, let's say printing a certificate for each student.
I'm using the Serverless Framework to deal with that scenario with this Lambda:
const queryStudent = async (_school_id, _year) => {
  const params = {
    TableName: 'schoolTable',
    KeyConditionExpression: 'partition_key = :school_id AND begins_with(sort_key, :year)',
    ExpressionAttributeValues: {
      ':school_id': _school_id,
      ':year': _year,
    },
  };
  try {
    let _students = [];
    let items;
    do {
      items = await dynamoClient.query(params).promise();
      // accumulate every page of results
      _students = _students.concat(items.Items);
      params.ExclusiveStartKey = items.LastEvaluatedKey;
    } while (typeof items.LastEvaluatedKey != 'undefined');
    return _students;
  } catch (e) {
    console.log('Error: ', e);
  }
};
const mainHandler = async (event, context) => {
  …
  let students = await queryStudent(body.school_id, body.year);
  await printCertificate(students);
  …
};
So far, it's working well with about 5k students (just sample data).
My concern: is this a scalable solution for querying large amounts of data in DynamoDB?
As I know, Lambda has a limited execution time; if the number of students goes up to a million, does the above solution still work?
Any best-practice approach for this scenario is very much appreciated and welcome.
If you think about scaling, there are multiple potential bottlenecks here, which you could address:
Hot Partition: right now you store all students of a single school in a single item collection. That means they will be stored on a single storage node under the hood. If you run many queries against this, you might run into throughput limitations. You can use things like read/write sharding here, e.g. add a suffix to the partition key and do scatter-gather with the data.
Lambda: Query: If you want to query a million records, this is going to take time. Lambda might not be able to do that (and the processing) in 15 minutes, and if it fails before it's completely through, you lose the information about how far you've gotten. You could do checkpointing for this, i.e. save the LastEvaluatedKey somewhere else, check whether it exists on new Lambda invocations, and start from there.
Lambda: Processing: You seem to be creating a certificate for each student in a year in the same Lambda function that does the querying. This won't scale if it's a synchronous process and you have a million students. If stuff fails, you also have to consider retries and build that logic into your code.
If you want this to scale to a million students per school, I'd probably change the architecture to something like this:
You have a Step Function that you invoke when you want to print the certificates. This Step Function has a single Lambda function. The Lambda function queries the table across the sharded partition keys and writes each student into an SQS queue of certificate-printing tasks. If the Lambda notices it's close to the runtime limit, it returns the LastEvaluatedKey, and the Step Function recognizes that and starts the function again with this offset. The SQS queue can invoke Lambda functions to actually create the certificates, possibly in batches.
This way you decouple query from processing and also have built-in retry logic for failed tasks in the form of the SQS/Lambda integration. You also include the checkpointing for the query across many items.
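To make the checkpointing part of that query Lambda concrete, here is a minimal sketch; the table name, queue URL, and the fields passed between Step Function iterations are assumptions for illustration only:
const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();
const sqs = new AWS.SQS();

// Hypothetical resource names, used only for illustration
const TABLE_NAME = 'schoolTable';
const QUEUE_URL = 'https://sqs.eu-west-1.amazonaws.com/123456789012/certificate-tasks';

exports.handler = async (event, context) => {
  let lastEvaluatedKey = event.lastEvaluatedKey; // undefined on the first iteration

  do {
    const params = {
      TableName: TABLE_NAME,
      KeyConditionExpression: 'partition_key = :school',
      ExpressionAttributeValues: { ':school': event.school_id },
    };
    if (lastEvaluatedKey) params.ExclusiveStartKey = lastEvaluatedKey;

    const page = await dynamo.query(params).promise();

    // enqueue one certificate-printing task per student
    for (const student of page.Items) {
      await sqs.sendMessage({
        QueueUrl: QUEUE_URL,
        MessageBody: JSON.stringify(student),
      }).promise();
    }

    lastEvaluatedKey = page.LastEvaluatedKey;

    // stop early and let the Step Function re-invoke us with the checkpoint
    if (context.getRemainingTimeInMillis() < 60000) break;
  } while (lastEvaluatedKey);

  // the Step Function loops on this output until done is true
  return { done: !lastEvaluatedKey, lastEvaluatedKey, school_id: event.school_id };
};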
Implementing this requires more effort, so I'd first figure out whether a million students per school per year is a realistic number :-)
I need to call my Lambda A from Lambda B. I have done the required code, but I need a few clarifications.
AWSLambda lambdaClient = AWSLambdaClientBuilder.standard().withRegion(region)
.withCredentials(new DefaultAWSCredentialsProviderChain()).build();
InvokeRequest request = new InvokeRequest().withClientContext(clientContext).withFunctionName(functionName)
.withQualifier(alias).withPayload(payload).withInvocationType(InvocationType.Event);
InvokeResult response = lambdaClient.invoke(request);
I have N table names which I need to pass from Lambda B to Lambda A one by one, so that it can do the required work on each DynamoDB table.
The problem is that withPayload takes a JSON payload which is passed to the Lambda function.
I will pass the payload to Lambda B, but then the code calls lambdaClient.invoke(request), which will contain all the table names, and it will call our Lambda A. But the handler function in Lambda A expects a single table name.
I am not sure how to do this.
Also, do I need to run this in a loop so that every time it takes a new payload value and then calls lambdaClient.invoke(request), or does it happen automatically?
You should create a payload with a single table name. In this case, you will have many payloads, so you can have an array of them and process them in a loop.
For Example:
InvokeRequest request = null;
InvokeResult response = null;

for (String payload : payloads) {
    request = new InvokeRequest().withClientContext(clientContext).withFunctionName(functionName)
            .withQualifier(alias).withPayload(payload).withInvocationType(InvocationType.Event);
    response = lambdaClient.invoke(request);
}
In which case will you get that payload? Lambda normally runs in response to some event, and a Lambda function cannot run for more than 15 minutes (that is a hard limit).
Consider the following AWS Lambda function:
var i = 0;

exports.handler = function (event, context) {
  context.succeed(++i);
};
Executing this function multiple times, I end up with output similar to the following:
> 0
> 1
> 2
> 0
> 1
> 0
> 3
> 2
> 4
> 1
> 2
As you can see, it seems like there are 3 singletons of the script, and I am randomly ending up in one of them when I execute the function.
Is this expected behaviour? I couldn't find any related information in the documentation.
I'm asking this because I intend to connect to MySQL and keep a connection pool:
var MySQL = require('mysql');

var connectionPool = MySQL.createPool({
  connectionLimit: 10,
  host: '*****',
  user: '*****',
  password: '*****',
  database: '*****'
});

function logError (err, callback) {
  console.error(err);
  callback('Unable to perform operation');
}

exports.handler = function (event, context) {
  connectionPool.getConnection(function (err, connection) {
    err && logError(err, context.fail);
    connection.query('CALL someSP(?)', [event.user_id], function (err, data) {
      err && logError(err, context.fail);
      context.succeed(data[0]);
      connection.release();
    });
  });
};
The connection pool needs to be disposed of using connectionPool.end(), but where shall I execute this?
If I add it at the end of the script (after the handler), then the connection pool will be closed immediately when the Lambda function first executes.
If I dispose of the connection pool inside the handler, then the connection pool will be closed for future requests.
Furthermore, should I dispose of it at all? If I don't, the connections will be kept in the pool and in memory, but as you have seen in the first code sample, AWS keeps ~3 singletons of my module, which would mean that I'd end up with 3 different connection pools with 10 connections each.
Unless I am badly misunderstanding your question, this is well documented and expected behavior for lambda. See here: https://aws.amazon.com/lambda/faqs/
Lambda spins up instances of your container to match the usage pattern of your Lambda function. If it is not being used at the moment, it will spin them down; if it is being used heavily, more containers will be created. You should never depend on persistent state in a Lambda function. It is OK to use state if it is scoped to the lifecycle of your function, or if you are optimizing something.
As far as I know, you cannot control the number of function instances in memory at any given time, so if you are worried about using up your MySQL connections, you should design accordingly.
From the documentation:
"AWS Lambda can start as many copies of your function as needed without lengthy deployment and configuration delays. There are no fundamental limits to scaling a function. AWS Lambda will dynamically allocate capacity to match the rate of incoming events."
As this applies directly to your MySQL question, I would always return your connection to the pool when you are finished using it. Then I would do some calculations on how many concurrent requests you expect to have and plan your MySQL server configuration accordingly.
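As an illustration of that advice, here is a minimal sketch of the usual pattern: keep the pool at module level so warm containers reuse it, release each connection inside the invocation, and don't call end() on the normal path. The pool size and the environment-variable configuration are placeholders:
const MySQL = require('mysql');

// Created once per container and reused across warm invocations.
// With N concurrent containers you can end up with up to
// N * connectionLimit connections, so size connectionLimit accordingly.
const pool = MySQL.createPool({
  connectionLimit: 2,          // keep this small per container
  host: process.env.DB_HOST,   // hypothetical configuration via environment variables
  user: process.env.DB_USER,
  password: process.env.DB_PASS,
  database: process.env.DB_NAME
});

exports.handler = function (event, context, callback) {
  pool.getConnection(function (err, connection) {
    if (err) return callback(err);
    connection.query('CALL someSP(?)', [event.user_id], function (err, data) {
      // always give the connection back to the pool, even on error
      connection.release();
      if (err) return callback(err);
      callback(null, data[0]);
    });
  });
};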