We're trying to develop a self-invoking lambda to process S3 files in chunks. The lambda role has the policies needed for the invocation attached.
Here's the code for the self-invoking lambda:
export const processFileHandler: Handler = async (
event: S3CreateEvent,
context: Context,
callback: Callback,
) => {
let bucket = loGet(event, 'Records[0].s3.bucket.name');
let key = loGet(event, 'Records[0].s3.object.key');
let totalFileSize = loGet(event, 'Records[0].s3.object.size');
const lastPosition = loGet(event, 'position', 0);
const nextRange = getNextSizeRange(lastPosition, totalFileSize);
context.callbackWaitsForEmptyEventLoop = false;
let data = await loadDataFromS3ByRange(bucket, key, nextRange);
await database.connect();
log.debug(`Successfully connected to the database`);
const docs = await getParsedDocs(data, lastPosition);
log.debug(`upserting ${docs.length} records to database`);
if (docs.length) {
try {
// upserting logic
log.debug(`total documents added: ${docs.length}`);
} catch (err) {
await recurse(nextRange.end, event, context);
log.debug(`error inserting docs: ${JSON.stringify(err)}`);
}
}
if (nextRange.end < totalFileSize) {
log.debug(`Last ${context.getRemainingTimeInMillis()} milliseconds left`);
if (context.getRemainingTimeInMillis() < 6000) {
log.debug(`Less than 6000 milliseconds left`);
log.debug(`Invoking next iteration`);
await recurse(nextRange.end, event, context);
callback(null, {
message: `Lambda timed out processing file, please continue from LAST_POSITION: ${nextRange.start}`,
});
}
} else {
callback(null, { message: `Successfully completed the chunk processing task` });
}
};
Here, recurse invokes the same Lambda again. Everything else works as expected; it just times out whenever execution reaches this invocation request:
const recurse = async (position: number, event: S3CreateEvent, context: Context) => {
let newEvent = Object.assign(event, { position });
let request = {
FunctionName: context.invokedFunctionArn,
InvocationType: 'Event',
Payload: JSON.stringify(newEvent),
};
let resp = await lambda.invoke(request).promise();
console.log('Invocation complete', resp);
return resp;
};
This is the stack trace logged to CloudWatch:
{
"errorMessage": "connect ETIMEDOUT 63.32.72.196:443",
"errorType": "NetworkingError",
"stackTrace": [
"Object._errnoException (util.js:1022:11)",
"_exceptionWithHostPort (util.js:1044:20)",
"TCPConnectWrap.afterConnect [as oncomplete] (net.js:1198:14)"
]
}
It is not a good idea to create a self-invoking Lambda function. In case of an error (which could also be a bad handler call on the AWS side), a Lambda function might re-run several times, which is very hard to monitor and debug.
I would suggest using Step Functions instead. I believe this tutorial can help: Iterating a Loop Using Lambda.
Off the top of my head, if you prefer not to deal with Step Functions, you could create a Lambda trigger for an SQS queue. Then you pass a message to the queue whenever you want to run the Lambda function another time.
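For illustration, here is a minimal sketch of that SQS approach, assuming a hypothetical CONTINUATION_QUEUE_URL environment variable and reusing the position/bucket/key fields from the question; the queue itself would be configured as an event source for the same Lambda:
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

// Hypothetical queue that is configured as an event source for this same Lambda.
const CONTINUATION_QUEUE_URL = process.env.CONTINUATION_QUEUE_URL;

// Instead of invoking the Lambda directly, put the continuation state on the queue;
// SQS then triggers the next execution with this message as its event.
const enqueueNextChunk = (position, bucket, key) =>
  sqs.sendMessage({
    QueueUrl: CONTINUATION_QUEUE_URL,
    MessageBody: JSON.stringify({ position, bucket, key }),
  }).promise();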
I am trying to invoke Lambda B from another Lambda A. The call to Lambda A is triggered via an API Gateway endpoint. Using curl, the request is made as below:
curl "$#" -L --cookie ~/.midway/cookie --cookie-jar ~/.midway/cookie -X GET -H "Content-Type: application/json" -s https://us-west-2.beta.api.ihmsignage.jihmcdo.com/api/getSignInstances
The above invokes Lambda A, which handles the request and calls the main handler. The logic for the main handler:
const main = (event: any, context: any, lambdaCallback: Function) => {
console.log(JSON.stringify(event, null, 2));
console.log(JSON.stringify(process.env, null, 2));
if (event.path.startsWith('/getUserInfo')) {
const alias = event.headers['X-FORWARDED-USER'];
const userData = JSON.stringify({ alias });
console.info('UserData: ', userData);
return sendResponse(200, userData, lambdaCallback); //This works perfectly fine with api gateway returning proper response
} else if (event.path.startsWith('/api')) {
console.info('Invoke lambda initiate');
invokeLambda(event, context, lambdaCallback); // This somehow invokes lambda B twice
} else {
return sendResponse(404, '{"message": "Resource not found"}', lambdaCallback);
}
};
There is also a wrapper to make sure a proper response is sent back to API Gateway:
export const handler = (event: any, context: any, lambdaCallback: Function) => {
const wrappedCallback = (error: any, success: any) => {
success.headers['Access-Control-Allow-Origin'] = getAllowedOrigin(event);
success.headers['Access-Control-Allow-Credentials'] = true;
success.headers['Access-Control-Allow-Methods'] = 'GET,PUT,DELETE,HEAD,POST,OPTIONS';
success.headers['Access-Control-Allow-Headers'] =
'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token,Access-Control-Allow-Origin,Access-Control-Allow-Methods,X-PINGOVER';
success.headers['Vary'] = 'Accept-Encoding, Origin';
console.info('Logging success--', success);
return lambdaCallback(error, success);
};
// Append headers
return main(event, context, wrappedCallback);
};
And finally, this is how Lambda B is invoked from within Lambda A:
const invokeLambda = async (event: any, context: any, lambdaCallback: Function) => {
context.callbackWaitsForEmptyEventLoop = false;
if (!process.env.INVOKE_ARN) {
console.error('Missing environment variable INVOKE_ARN');
return sendResponse(500, '{"message":"internal server error"}', lambdaCallback);
}
const params = {
FunctionName: process.env.INVOKE_ARN,
InvocationType: 'RequestResponse',
Payload: JSON.stringify(event),
};
event.headers = event.headers || [];
const username = event.headers['X-FORWARDED-USER'];
const token = event.headers['X-CLIENT-VERIFY'];
if (!username || !token) {
console.log('No username or token was found');
return sendResponse(401, '{"message":"You shall not pass"}', lambdaCallback);
}
try {
const data = await lambda.invoke(params).promise();
console.info('Got Request router lambda data: ', data);
const invocationResponse = data?.Payload;
console.info('Got invocationResponse: ', invocationResponse);
return lambdaCallback(null, JSON.parse(invocationResponse as string));
} catch (err) {
console.error('Error while running starlet: ', err);
return sendResponse(500, '{"message":"internal server error"}', lambdaCallback);
}
};
Lambda B:
const main = async (event: any = {}) => {
// Log details
console.log('Request router lambda invoked');
console.log(JSON.stringify(event, null, 2));
return {
statusCode: 200,
body: JSON.stringify({ message: 'Hello from RequestRouter Lambda!' }),
headers: {
'Content-Type': 'application/json',
},
isBase64Encoded: false,
};
};
export const handler = main;
All of the above works fine (no error logs in CloudWatch for either Lambda); however, it seems that Lambda A's handler is invoked but it never invokes Lambda B's handler, ultimately returning a response to API Gateway that doesn't have the proper headers.
Any pointers are highly appreciated!! Thank you :)
AWS recommends that you don't orchestrate your lambda functions in the code (one function calling another function).
For that use case, you can use AWS Step Functions.
You can create a state machine, define API Gateway as the trigger, and pass the result from one Lambda function to the next Lambda function.
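As an illustration only, here is a minimal sketch of an Amazon States Language definition (expressed as a plain JavaScript object, with placeholder ARNs that are not taken from the question) chaining two Lambda tasks so the output of the first becomes the input of the second:
// Placeholder ARNs; by default a Task state's output is passed as the next state's input.
const definition = {
  StartAt: 'CallLambdaA',
  States: {
    CallLambdaA: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-west-2:111111111111:function:lambdaA',
      Next: 'CallLambdaB',
    },
    CallLambdaB: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-west-2:111111111111:function:lambdaB',
      End: true,
    },
  },
};
console.log(JSON.stringify(definition, null, 2)); // paste into the state machine definition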
If I use SQS long polling and set the "WaitTimeSeconds" to say 10 seconds and "MaxNumberOfMessages" to 1, and a single message is delivered to the queue after say 0.1 seconds, will the call to sqs.receiveMessage() return immediately at that point, or should it not return until the 10 seconds of "WaitTimeSeconds" have elapsed?
In my testing the call to sqs.receiveMessage() seems to not return until the full duration of "WaitTimeSeconds" has elapsed.
Here is the code:
// Load the AWS SDK for Node.js
var AWS = require("aws-sdk");
const fmReqQ = "https://sqs.ap-southeast-2.amazonaws.com/myactid/fmReqQ";
const fmRspQ = "https://sqs.ap-southeast-2.amazonaws.com/myactid/fmRspQ";
const SydneyRegion = "ap-southeast-2";
var credentials = new AWS.SharedIniFileCredentials({ profile: "myprofile" });
AWS.config.credentials = credentials;
// Set the region
AWS.config.update({ region: SydneyRegion });
// Create an SQS service object
var sqs = new AWS.SQS({ apiVersion: "2012-11-05" });
async function sendRequest() {
var sendParams = {
MessageBody: "Information of 12/11/2016.",
QueueUrl: fmReqQ,
};
try {
const data = await sqs.sendMessage(sendParams).promise();
console.log("Success, request MessageId: ", data.MessageId);
} catch (err) {
console.log("Error", err);
}
}
async function doModelling() {
console.time("modelling");
await sendRequest();
await receiveResponse();
console.timeEnd("modelling");
}
async function receiveResponse() {
var receiveParams = {
AttributeNames: ["SentTimestamp"],
MaxNumberOfMessages: 1,
MessageAttributeNames: ["All"],
QueueUrl: fmRspQ,
WaitTimeSeconds: 1,
};
let data = null;
try {
data = await sqs.receiveMessage(receiveParams).promise();
console.log("Success, response MessageId: ", data);
} catch (err) {
console.log("Error", err);
}
}
doModelling();
When I set "WaitTimeSeconds: 3" I get output:
Success, request MessageId: e5079c2a-050f-4681-aa8c-77b05ac7da7f
Success, response MessageId: {
ResponseMetadata: { RequestId: '1b4d6a6b-eaa2-59ea-a2c3-3d9b6fadbb3f' }
}
modelling: 3.268s
When I set "WaitTimeSeconds: 10" I get output:
Success, request MessageId: bbf0a429-b2f7-46f2-b9dd-38833b0c462a
Success, response MessageId: {
ResponseMetadata: { RequestId: '64bded2d-5398-5ca2-86f8-baddd6d4300a' }
}
modelling: 10.324s
Notice how the elapsed time durations match the WaitTimeSeconds.
From reading about AWS SQS long polling, the documentation says it will "Return messages as soon as they become available."
I don't seem to be seeing the messages "as soon as they become available"; instead, the sqs.receiveMessage() call always seems to take the full duration set in WaitTimeSeconds.
As you can see in the sample code, I have set MaxNumberOfMessages to 1.
Using ReceiveMessage() with Long Polling will return as soon as there is at least one message in the queue.
I'm not a Node person, but here's how I tested it:
Created an Amazon SQS queue
In one window, I ran:
aws sqs receive-message --queue-url https://sqs.ap-southeast-2.amazonaws.com/123/foo --visibility-timeout 1 --wait-time-seconds 10
Then, in another window, I ran:
aws sqs send-message --queue-url https://sqs.ap-southeast-2.amazonaws.com/123/foo --message-body bar
The receive-message command returned very quickly after I used the send-message command.
It is possible that your tests are impacted by messages being 'received' but marked as 'invisible', and remaining invisible for later tests since your code does not call DeleteMessage(). I avoided this by specifically stating --visibility-timeout 1, which made the message immediately reappear on the queue for the next test.
The number of messages being requested (--max-number-of-messages) does not impact this result. It returns as soon as there is at least one message available.
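Translating that back to the question's Node code, here is a hedged sketch (reusing the sqs client and fmRspQ constant defined above) of receiving a message and then explicitly deleting it, so it does not stay invisible for later tests:
async function receiveAndDeleteResponse() {
  const data = await sqs.receiveMessage({
    QueueUrl: fmRspQ,          // response queue from the question's code
    MaxNumberOfMessages: 1,
    WaitTimeSeconds: 10,       // long polling: returns as soon as a message arrives
  }).promise();
  if (data.Messages && data.Messages.length) {
    const msg = data.Messages[0];
    console.log('Received:', msg.Body);
    // Delete the message so it does not reappear (or stay invisible) on the next test.
    await sqs.deleteMessage({
      QueueUrl: fmRspQ,
      ReceiptHandle: msg.ReceiptHandle,
    }).promise();
  }
}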
I set the "WaitTimeSeconds" to 0, I seem to get the behaviour that I am after now:
Success, request MessageId: 9f286e22-1a08-4532-88ba-06c88be3dbc3
Success, response MessageId: {
ResponseMetadata: { RequestId: '5c264e27-0788-5772-a990-19d78c8b2565' }
}
modelling: 307.884ms
The value I was specifying for "WaitTimeSeconds" determined the duration of the sqs.receiveMessage() call because there were no messages on my queue; with an empty queue, sqs.receiveMessage() waits for the full duration specified by "WaitTimeSeconds".
I have an AWS Lambda Function A that calls another Lambda Function B
For sake of discussion I want to invoke it synchronously - wait for results
and process them.
I want to do something like this in Lambda A:
let results = lambda.invoke(LambdaB);
// Do some stuff with results
The issue is that when I use the SDK API to invoke Lambda B, I pass in a function that processes the results of that invocation. The processing function gets invoked and processes the results, but I'm noticing that the
// Do some other stuff with results
line executes before that processing function completes, so the results are not available yet. I'm still somewhat new to the NodeJS way of doing things, so what is the paradigm to ensure that I have the results I want before I move on to more processing in Lambda A? Here is what I have in a nutshell:
// Lambda A code
let payLoad = undefined;
let functionName = "LambdaB";
let params = {
FunctionName: functionName,
InvocationType: "RequestResponse",
LogType: "Tail",
Payload: '{name: "Fred"}'
};
lambda.invoke(params, function(err,data) {
if (err) {
// process error
} else {
payLoad = JSON.parse(data.Payload);
// payLoad is set properly here.
}
});
console.log("Do stuff with results');
console.log("PAYLOAD=" + payLoad);
// payLoad is undefined here
// How can I ensure it is set by the time I get here
// or do I need to use a different paradigm?
You will need to use async/await.
So your code will look like this:
exports.handler = async () => {
// Lambda A code
let functionName = "LambdaB";
let result = undefined;
let payload = undefined;
let params = {
FunctionName: functionName,
InvocationType: "RequestResponse",
LogType: "Tail",
Payload: '{name: "Fred"}'
};
try {
result = await lambda.invoke(params).promise();
payload = JSON.parse(result.Payload);
} catch (err) {
console.log(err);
}
console.log("Do stuff with results');
console.log(payload);
return;
}
Watch out: the Lambda handler has to be an async function to use await inside it!
I am starting a step function from a Lambda, and the Lambda function is tied to an API Gateway. For some reason, when I test the Lambda function, I see hundreds of executions failing and running in a loop, even though I only triggered the step function once. I must be missing something here. Can you please advise?
const AWS = require("aws-sdk");
const uuidv4 = require("uuid/v4");
/*----------------------------------------------------------------------- */
/* Implementation */
/*----------------------------------------------------------------------- */
exports.handler = async event => {
var _dt = await ExecuteStepFunction()
return _dt;
}
function ExecuteStepFunction() {
const stepFunctions = new AWS.StepFunctions();
return new Promise((res, rej) => {
var params = {
stateMachineArn: 'arn:aws:states:us-east-1:xxxxxxxxxxxxx:stateMachine:xxTestSateMachine',
input: JSON.stringify(''),
name: uuidv4()
};
stepFunctions.startExecution(params, function (err, data) {
if (err) {
rej(err);
}
else {
res(data);
}
});
});
}
I tried the approach provided in this link (https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-api-gateway.html), where API Gateway directly triggers the step function, but I received the following error. After trying to fix this, I moved to the above option of starting the state machine from the Lambda function.
{
"__type": "com.amazon.coral.service#UnrecognizedClientException",
"message": "The security token included in the request is invalid"
}
I have a very simple Lambda function (Node.js) which puts the received event into a Kinesis stream. Here is the source code:
'use strict';
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis({apiVersion: '2013-12-02'});
exports.handler = async (event, context, callback) => {
let body = JSON.parse(event.body);
let receptionDate = new Date().toISOString();
let partitionKey = "pKey-" + Math.floor(Math.random() * 10);
// Response format needed for API Gateway
const formatResponse = (status, responseBody) => {
return {
statusCode: status,
headers: { "Content-Type": "application/json" },
body: JSON.stringify(responseBody)
}
}
// body.events is an array of events. Just add the reception date in each events.
for(let e of body.events) {
e.reception_date = receptionDate;
}
console.log("put In kinesis stream");
let kinesisParams = {
Data: new Buffer(JSON.stringify(body) + "\n"),
PartitionKey: partitionKey,
StreamName: 'event_test'
};
kinesis.putRecord(kinesisParams, (err, res) => {
console.log("Kinesis.putRecord DONE");
if(err) {
console.log("putRecord Error:", JSON.stringify(err));
callback(null, formatResponse(500, "Internal Error: " + JSON.stringify(err)));
} else {
console.log("putRecord Success:", JSON.stringify(res));
callback(null, formatResponse(200));
}
});
};
When this code is executed, here are the logs in cloudwatch:
START RequestId: 5d4d7526-1a40-401f-8417-06435f0e5408 Version: $LATEST
2019-01-11T09:39:11.925Z 5d4d7526-1a40-401f-8417-06435f0e5408 put In kinesis stream
END RequestId: 5d4d7526-1a40-401f-8417-06435f0e5408
REPORT RequestId: 5d4d7526-1a40-401f-8417-06435f0e5408 Duration: 519.65 ms Billed Duration: 600 ms Memory Size: 128 MB Max Memory Used: 28 MB
It seems that kinesis.putRecord is not called... I don't see anything in the Kinesis stream logs. I'm certainly wrong somewhere, but I don't know where!
kinesis.putRecord is an asynchronous operation, which calls the callback (the second parameter) when it finishes, whether successfully or with an error.
An async function is a function that returns a promise. Lambda finishes its execution when this promise resolves, even if other asynchronous operations are not done yet.
Since your function returns nothing, the promise resolves as soon as the function body ends, and therefore the execution finishes immediately, without waiting for your asynchronous kinesis.putRecord task.
When using an async handler, you don't need to call callback. Instead, you return whatever you want, or throw an error, and Lambda will respond accordingly.
So you have 2 options here:
Since you don't have any await in your code, just remove the async. In this case Lambda waits for the event loop to be empty (unless you explicitly change context.callbackWaitsForEmptyEventLoop).
Change the kinesis.putRecord to something like:
let result;
try {
result = await kinesis.putRecord(kinesisParams).promise();
} catch (err) {
console.log("putRecord Error:", JSON.stringify(err));
throw new Error(formatResponse(500, "Internal Error: " + JSON.stringify(err)));
}
console.log("putRecord Success:", JSON.stringify(result));
return formatResponse(200);
In the second option, the lambda will keep running until kinesis.putRecord is finished.
For more information about Lambda behavior in this case, you can see the main code which executes your handler under /var/runtime/node_modules/awslambda/index.js in the Lambda container.
@ttulka could you explain a bit more? Give advice or code samples? – Adagyo
It's about the evolution of asynchronous processing in JavaScript.
First, everything was done with callbacks; it's the oldest approach. Using callbacks everywhere leads to "Callback Hell" (http://callbackhell.com).
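For example (a generic Node sketch, not part of the original answer), three sequential steps written with callbacks end up nested inside each other:
const fs = require('fs');

// Each step can only run inside the previous step's callback ("callback hell").
// Assumes an input.json file exists in the working directory.
fs.readFile('input.json', 'utf8', (err, raw) => {
  if (err) return console.error(err);
  fs.writeFile('copy.json', raw, (err2) => {
    if (err2) return console.error(err2);
    fs.readFile('copy.json', 'utf8', (err3, copied) => {
      if (err3) return console.error(err3);
      console.log('done, copied bytes:', copied.length);
    });
  });
});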
Then Promises were introduced. Working with Promises looks a bit like working with monads: everything is packed into a "box" (the Promise), so you have to chain all your calls:
thisCallReturnsPromise(...)
.then(data => ...)
.then(data => ...)
.then(data => ...)
.catch(err => ...)
This is a bit unnatural for humans, so ECMAScript 2017 introduced syntactic sugar in the form of async functions (async/await): https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function
The async/await syntax lets you work with promises as if they were normal synchronous code:
const data = await thisCallReturnsPromise(...)
Don't forget, the await call must be inside an async function:
async () => {
const data = await thisCallReturnsPromise(...)
return await processDataAsynchronouslyInPromise(data)
}
AWS Lambda supports Node.js v8.10, which fully implements this syntax.
Just found the solution: removing the "async" keyword makes it work!
exports.handler = (event, context, callback) => { ... }
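For completeness, a simplified, self-contained sketch of that fix (payload handling trimmed down, so treat it as an approximation rather than the poster's exact code): with no async keyword, Lambda waits for the event loop to empty, so the putRecord callback runs before the invocation ends.
'use strict';
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis({ apiVersion: '2013-12-02' });

exports.handler = (event, context, callback) => {
  const kinesisParams = {
    Data: Buffer.from(JSON.stringify({ body: event.body }) + '\n'),
    PartitionKey: 'pKey-' + Math.floor(Math.random() * 10),
    StreamName: 'event_test',
  };
  // The non-async handler only completes once this callback has fired.
  kinesis.putRecord(kinesisParams, (err, res) => {
    if (err) {
      return callback(null, { statusCode: 500, body: JSON.stringify(err) });
    }
    return callback(null, { statusCode: 200, body: '{}' });
  });
};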