I have a Lambda function in an EU region and another one in us-east-1 that is used via CloudFront triggers.
CloudFront -> Lambda@Edge function -> Lambda function
Sometimes the second Lambda invocation takes a while to finish, which hits the Lambda@Edge limits. That would be fine if the invocation happened asynchronously, but I don't see any results when I run it async. Here is the code:
"use strict";
const AWS = require("aws-sdk");
AWS.config.update({
region: "eu-west-1",
});
const querystring = require("querystring");
exports.handler = async (event, context) => {
  let request = event.Records[0].cf.request;
  let params = querystring.parse(request.querystring);
  if (params.key) {
    const payload = {
      /* my payload */
    };
    const lambda_params = {
      FunctionName: "lambda-func-name",
      Payload: JSON.stringify(payload),
    };
    const lambda = new AWS.Lambda();
    const resp = await lambda.invoke(lambda_params);
    console.log("Finished");
  } else {
    // allow the response to pass through
    return {
      "status": 404,
      "body": "an error"
    };
  }
};
The second Lambda function processes some images and puts the results in S3, but when I call it async I don't see any results. Am I missing something?
First of all, a lengthy runtime sounds like a very bad use case for Lambda@Edge; you may want to redesign your system to handle that.
Please note that Lambda@Edge has a function timeout limit of at most 30 seconds for origin events and only 5 seconds for viewer events. Moreover, there is a compute utilization limit:
CloudFront Functions have a limit on the time they can take to run, measured as compute utilization. Compute utilization is a number between 0 and 100 that indicates the amount of time that the function took to run as a percentage of the maximum allowed time. For example, a compute utilization of 35 means that the function completed in 35% of the maximum allowed time.
With respect to your current design, in order to invoke the Lambda asynchronously, the invoke call should specify the correct invocation type. So instead of
const lambda_params = {
  FunctionName: "lambda-func-name",
  Payload: JSON.stringify(payload),
};
you should use
const lambda_params = {
  FunctionName: "lambda-func-name",
  InvocationType: "Event", // <= you are missing this parameter
  Payload: JSON.stringify(payload),
};
By default, InvocationType is set to RequestResponse, which means synchronous invocation. See the docs for reference.
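For completeness, here is a minimal sketch of the whole async call with the v2 SDK (the function name is a placeholder). Note that invoke() returns an AWS.Request, so you need .promise() if you want to await the call:

const lambda = new AWS.Lambda();

const resp = await lambda
  .invoke({
    FunctionName: "lambda-func-name",  // placeholder name
    InvocationType: "Event",           // fire-and-forget; Lambda queues the event and returns immediately
    Payload: JSON.stringify(payload),
  })
  .promise();

console.log("Async invoke accepted, status code:", resp.StatusCode); // 202 for Event invocations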
I really tried everything; surprisingly, Google doesn't have many answers when it comes to this.
When a certain .csv file is uploaded to an S3 bucket, I want to parse it and place the data into an RDS database.
My goal is to learn the Lambda serverless technology; this is essentially an exercise. Thus, I over-engineered the hell out of it.
Here is how it goes:
S3 Trigger when the .csv is uploaded -> call lambda (this part fully works)
AAA_Thomas_DailyOverframeS3CsvToAnalytics_DownloadCsv downloads the csv from S3 and finishes with essentially the plaintext of the file. It is then supposed to pass it to the next lambda. The way I am trying to do this is by configuring the second Lambda as a destination. The function works, but the second Lambda is never called and I don't know why.
AAA_Thomas_DailyOverframeS3CsvToAnalytics_ParseCsv gets the plaintext as input and returns a javascript object with the parsed data.
AAA_Thomas_DailyOverframeS3CsvToAnalytics_DecryptRDSPass only connects to KMS, gets the encrypted RDS password, and passes it along with the data it received as input to the last lambda.
AAA_Thomas_DailyOverframeS3CsvToAnalytics_PutDataInRds then finally puts the data in RDS.
I created a custom VPC with custom subnets, route tables, gateways, peering connections, etc. I don't know if this is relevant, but function 2 only has access to the S3 endpoint, 3 does not have any internet access whatsoever, 4 is the only one that has normal internet access (it's the only way to connect to KMS), and 5 only has access to the peered VPC which hosts the RDS.
This is the code of the first lambda:
// dependencies
const AWS = require('aws-sdk');
const util = require('util');
const s3 = new AWS.S3();
let region = process.env;
exports.handler = async (event, context, callback) =>
{
    var checkDates = process.env.CheckDates == "false" ? false : true;
    var ret = [];
    var checkFileDate = function(actualFileName)
    {
        if (!checkDates)
            return true;
        var d = new Date();
        var expectedFileName = 'Overframe_-_Analytics_by_Day_Device_' + d.getUTCFullYear() + '-' + (d.getUTCMonth().toString().length == 1 ? "0" + d.getUTCMonth() : d.getUTCMonth()) + '-' + (d.getUTCDate().toString().length == 1 ? "0" + d.getUTCDate() : d.getUTCDate());
        return expectedFileName == actualFileName.substr(0, expectedFileName.length);
    };
    for (var i = 0; i < event.Records.length; ++i)
    {
        var record = event.Records[i];
        try {
            if (record.s3.bucket.name != process.env.S3BucketName)
            {
                console.error('Unexpected notification, unknown bucket: ' + record.s3.bucket.name);
                continue;
            }
            if (!checkFileDate(record.s3.object.key))
            {
                console.error('Unexpected file, or date is not today\'s: ' + record.s3.object.key);
                continue;
            }
            const params = {
                Bucket: record.s3.bucket.name,
                Key: record.s3.object.key
            };
            var csvFile = await s3.getObject(params).promise();
            var allText = csvFile.Body.toString('utf-8');
            console.log('Loaded data:', {Bucket: params.Bucket, Filename: params.Key, Text: allText});
            ret.push(allText);
        } catch (error) {
            console.log("Couldn't download CSV from S3", error);
            return { statusCode: 500, body: error };
        }
    }
    // I've been randomly trying different ways to return the data, none works. The data itself is correct, I checked with console.log()
    const response = {
        statusCode: 200,
        body: { "Records": ret }
    };
    return ret;
};
[Screenshot: how the Lambda was set up, especially its destination]
I haven't posted on Stackoverflow in 7 years. That's how desperate I am. Thanks for the help.
Rather than having each Lambda call the next one, take a look at AWS Step Functions, the managed state machine service, which can handle this workflow for you.
By defining inputs and outputs you can pass the output of one function to the next, with retry logic built in.
If you don't have much experience, AWS has a tutorial on setting up a Step Function by chaining Lambdas.
By using this you also will not need to account for configuration issues such as Lambda timeouts. In addition, it makes your code more modular, which improves testing of the individual functionality while also isolating issues.
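For illustration, here is a minimal Amazon States Language sketch that chains your four steps as Task states; the state names are shortened and the ARNs are placeholders. Each state's output becomes the next state's input:

{
  "Comment": "Sketch: chain the CSV pipeline Lambdas (placeholder ARNs)",
  "StartAt": "DownloadCsv",
  "States": {
    "DownloadCsv": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:DownloadCsv",
      "Next": "ParseCsv"
    },
    "ParseCsv": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ParseCsv",
      "Next": "DecryptRDSPass"
    },
    "DecryptRDSPass": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:DecryptRDSPass",
      "Next": "PutDataInRds"
    },
    "PutDataInRds": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:PutDataInRds",
      "Retry": [
        { "ErrorEquals": ["States.ALL"], "MaxAttempts": 2 }
      ],
      "End": true
    }
  }
}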
The execution role of every Lambda function whose destinations include other Lambda functions must have the lambda:InvokeFunction IAM permission in one of its attached IAM policies.
Here's a snippet from Lambda documentation:
To send events to a destination, your function needs additional permissions. Add a policy with the required permissions to your function's execution role. Each destination service requires a different permission, as follows:
Amazon SQS – sqs:SendMessage
Amazon SNS – sns:Publish
Lambda – lambda:InvokeFunction
EventBridge – events:PutEvents
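As an example, a minimal identity-based policy statement for a Lambda destination might look like the following sketch (the target function ARN is a placeholder you would replace with your second function's ARN):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:AAA_Thomas_DailyOverframeS3CsvToAnalytics_ParseCsv"
    }
  ]
}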
I want to run some code only on the first execution of a Lambda version. (NB: I'm not referring to a cold start scenario as this will occur more than once).
computeInstanceInvocationCount is unfortunately reset to 0 upon every cold start.
functionVersion is an available property but unless I store this in memory outside the lambda I cannot calculate if it is indeed the first execution.
Is it possible to deduce this based on runtime values in event or context? Or is there any other way?
There is no way of knowing if this is the first time that a Lambda has ever run from any information passed into the Lambda.
You would have to add functionality that checks elsewhere, by setting a flag or parameter there. Remember, though, that multiple copies of the Lambda could be invoked at the same time, so any data store used for this would presumably need to be transactional to ensure that the initialization occurs only once. A sketch of one such approach is shown below.
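One way to get that transactional "only once" behaviour is a conditional write, for example a DynamoDB PutItem that only succeeds if the item does not already exist. A minimal sketch, assuming a hypothetical table and key scheme:

const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

// Returns true exactly once per function version, across all concurrent instances.
async function isFirstExecution(functionVersion) {
  try {
    await documentClient.put({
      TableName: 'lambda-first-run-flags',             // placeholder table
      Item: { pk: `my-function#${functionVersion}` },  // placeholder key scheme
      ConditionExpression: 'attribute_not_exists(pk)', // fail if the flag already exists
    }).promise();
    return true;    // we won the race: this is the first execution
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') {
      return false; // another invocation already claimed the first run
    }
    throw err;
  }
}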
One way that you can try is to use AWS Secrets Manager (the SSM Parameter Store would work in a similar way).
On every deployment, update the secret value with
{"version":"latest","is_firsttime":true}
i.e. run the command below after each deployment:
aws secretsmanager update-secret --secret-id demo --secret-string '{"version":"latest","is_firsttime":true}'
This is something that has to happen as part of every deployment.
Now we can add the logic inside the Lambda; in this demo we will only look at is_firsttime.
var AWS = require('aws-sdk'),
    region = "us-west-2",
    secretName = "demo",
    secret,
    decodedBinarySecret;

var client = new AWS.SecretsManager({
    region: region
});

client.getSecretValue({SecretId: secretName}, function(err, data) {
    if (err) throw err; // handle lookup errors before using the value
    secret = data.SecretString;
    secret = JSON.parse(secret);
    if (secret.is_firsttime == true)
    {
        console.log("lambda is running first time");
        // any init operation here
        // init completed, now we are good to set `is_firsttime` back to false
        var params = {
            Description: "Init completed, updating value at date or anything",
            SecretId: "demo",
            SecretString: '{ "version": "latest", "is_firsttime": false }'
        };
        client.updateSecret(params, function(err, data) {
            if (err) console.log(err, err.stack); // an error occurred
            else console.log(data);               // successful response
        });
    }
    else {
        console.log("init already completed");
        // rest of the logic in case it is not the first time
    }
});
This is just demo code that will work in a non-Lambda environment; adjust it accordingly (an async/await sketch for a Lambda handler follows the sample output below).
Expected response for first time
{
ARN: 'arn:aws:secretsmanager:us-west-2:12345:secret:demo-0Nlyli',
Name: 'demo',
VersionId: '3ae6623a-1111-4a41-88e5-12345'
}
Second time
init already completed
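As noted above, the demo is callback-based; a rough async/await sketch of the same idea inside a Lambda handler, still using the hypothetical demo secret, might look like this:

const AWS = require('aws-sdk');
const client = new AWS.SecretsManager({ region: 'us-west-2' });

exports.handler = async (event) => {
    // Read the flag that the deployment step wrote to Secrets Manager.
    const data = await client.getSecretValue({ SecretId: 'demo' }).promise();
    const secret = JSON.parse(data.SecretString);

    if (secret.is_firsttime === true) {
        console.log('lambda is running for the first time');
        // ... any init operation here ...

        // Flip the flag so subsequent invocations skip the init path.
        await client.updateSecret({
            SecretId: 'demo',
            SecretString: JSON.stringify({ version: 'latest', is_firsttime: false }),
        }).promise();
    } else {
        console.log('init already completed');
        // ... rest of the logic ...
    }
};

Note that, as the previous answer points out, this read-then-write is not transactional, so two concurrent invocations could still both see is_firsttime as true.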
I have an AWS Lambda that is used to encrypt PII (Personal Identifying Information) using the AWS Encryption SDK before storing it in DynamoDB.
When retrieving the data from DynamoDB using a different Lambda to display to end users, the average time for each call to KMS is 9.48 sec. This is averaged across roughly 2k requests, with individual requests ranging from ~5.1 to ~14.5 seconds. The calls to KMS are being made asynchronously.
The total time from the first KMS call to the last is ~20 seconds
We have considered using data key caching and read this AWS blog post about when to use it.
The input of our data may not be frequent enough to take full advantage of caching, and I am trying to find other ways to improve the performance.
Decrypt code snippet:
async function decryptWithKeyring(keyring: KmsKeyringNode, ciphertext: string, context: {}) {
  const b: Buffer = Buffer.from(ciphertext, 'base64');
  const { plaintext, messageHeader } = await decrypt(keyring, b);
  const { encryptionContext } = messageHeader;
  Object.entries(context).forEach(([key, value]) => {
    if (encryptionContext[key] !== value) {
      throw new Error('Encryption Context does not match expected values');
    }
  });
  return plaintext.toString();
}
Encrypt Snippet:
async function encryptWithKeyring(keyring: KmsKeyringNode, value: any, context: any) {
  const { result } = await encrypt(keyring, value, { encryptionContext: context });
  return result.toString('base64');
}
The conversion to base64 was to facilitate storing in DynamoDB.
[Screenshot: X-Ray trace map]
[Screenshot: sampling of KMS trace data]
I'm getting a "Task timed out after 3.01 seconds" message in CloudWatch when trying to read a large JSON file (66 MB) from an S3 bucket and write the data to DynamoDB.
Smaller JSON files are read and written to my DynamoDB table, but when the JSON file contains a larger number of objects (4000 objects, 66 MB file in this instance), the Lambda function just returns "Task timed out after 3.01 seconds".
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const documentClient = new AWS.DynamoDB.DocumentClient({
    convertEmptyValues: true
});

exports.handler = async (event) => {
    const { name } = event.Records[0].s3.bucket;
    const { key } = event.Records[0].s3.object;
    const params = {
        Bucket: name,
        Key: key
    };
    try {
        const data = await s3.getObject(params).promise();
        const carsStr = data.Body.toString();
        const usersJSON = JSON.parse(carsStr);
        console.log(`USERS ::: ${carsStr}`);
        for (var i = 0; i < usersJSON.length; i++) {
            var record = usersJSON[i];
            console.log("Inserting record: " + record);
            var putParams = {
                Item: record,
                ReturnConsumedCapacity: "TOTAL",
                TableName: "cars"
            };
            await documentClient.put(putParams).promise();
        }
    } catch (err) {
        console.log(err);
    }
};
The AWS Lambda default timeout is 3 seconds, which is why you see a timeout error in the logs. You can increase it as needed, up to a maximum of 900 seconds.
As per the official documentation:
Timeout – The amount of time that Lambda allows a function to run before stopping it. The default is 3 seconds. The maximum allowed value is 900 seconds.
Note: increasing the timeout is certainly a solution for a task that requires a longer execution time, but always consider code optimization before increasing the timeout; a sketch of one such optimization follows.
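For example, rather than issuing one put per item, you could batch the writes with the DocumentClient's batchWrite, which accepts up to 25 items per request. A rough sketch, assuming the same "cars" table and usersJSON array from the question:

// Write the parsed records in chunks of 25 (the BatchWriteItem limit).
for (let i = 0; i < usersJSON.length; i += 25) {
    const chunk = usersJSON.slice(i, i + 25);
    const result = await documentClient.batchWrite({
        RequestItems: {
            cars: chunk.map((record) => ({ PutRequest: { Item: record } })),
        },
    }).promise();
    // Items that were throttled come back in UnprocessedItems and should be retried.
    if (Object.keys(result.UnprocessedItems).length > 0) {
        console.log("Some items were not processed:", result.UnprocessedItems);
    }
}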
Instead of using the Lambda to write to DynamoDB, populate your table on your development machine.
If you expect large JSON files to land in your S3 bucket often and trigger the Lambda, you need to increase the default timeout of your function. The default timeout is 3 seconds and the maximum is 15 minutes.
You can also increase the number of DynamoDB write capacity units on your table. A write capacity unit represents one write per second for an item up to 1 KB in size; for example, a table with 10 write capacity units can perform 10 writes per second for items up to 1 KB.
Documentation: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ProvisionedThroughput.html#ProvisionedThroughput.CapacityUnits.Write
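If you prefer the CLI, the timeout can be raised with something like the following (the function name and value are placeholders):

aws lambda update-function-configuration --function-name my-json-loader --timeout 900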
When using a Lambda Function to trigger a Step Function, how do you get the output of the function that triggered it?
Ok, so if you want to pass an input to a Step Function execution (or, more exactly, your state machine's execution), you just need to set said input via the input property when calling StartExecution (see the AWS documentation: StartExecution).
In your case, it would most likely be your Lambda's last step before calling its callback.
If it's a Node.js Lambda, this is what it would look like:
const AWS = require("aws-sdk");
const stepfunctions = new AWS.StepFunctions();
exports.myHandler = function(event, context, callback) {
... your function's code
const params = {
stateMachineArn: 'YOUR_STATE_MACHINE_ARN', /* required */
input: 'STRINGIFIED INPUT',
name: 'AN EXECUTION NAME (such as an uuid or whatever)'
};
stepfunctions.startExecution(params, function(err, data) {
if (err) callback(err); // an error occurred
else callback(null, "some success message"); // successful response
});
}
Alternatively, if your payload is too big, you could store the data in S3 or DynamoDB and pass a reference to it as the input of your state machine's execution.
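For instance, a rough async/await sketch of the S3 variant (the bucket, key scheme, and state machine ARN are placeholders) could look like this:

const AWS = require("aws-sdk");
const s3 = new AWS.S3();
const stepfunctions = new AWS.StepFunctions();

exports.myHandler = async (event) => {
    const payload = { /* the large output you want to hand to the state machine */ };

    // Store the large payload in S3...
    const bucket = "my-payload-bucket";          // placeholder bucket
    const key = `executions/${Date.now()}.json`; // placeholder key scheme
    await s3.putObject({
        Bucket: bucket,
        Key: key,
        Body: JSON.stringify(payload),
    }).promise();

    // ...and pass only the reference as the execution input.
    await stepfunctions.startExecution({
        stateMachineArn: "YOUR_STATE_MACHINE_ARN",
        input: JSON.stringify({ payloadBucket: bucket, payloadKey: key }),
    }).promise();

    return "execution started";
};

The first state of the state machine would then read the object back from S3 using payloadBucket and payloadKey.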