Lambda trigger is not working as intended with bulk data

I'm using Lambda triggers to detect an insertion into a DynamoDB table (Tweets). Once triggered, I want to take the message in the event and get its sentiment using Comprehend. I then want to update a second DynamoDB table (SentimentAnalysis), where I ADD 1 to a value depending on the sentiment.
This works fine if I manually insert a single item, but I want to be able to use the Twitter API to insert bulk data into my DynamoDB table and have every tweet analysed for its sentiment. The Lambda function works fine if the count specified in the Twitter params is <= 5, but anything above that causes an issue with the update to the SentimentAnalysis table; instead, the trigger keeps repeating itself with no sign of progress or of stopping.
This is my lambda code:
let AWS = require("aws-sdk");
let comprehend = new AWS.Comprehend();
let documentClient = new AWS.DynamoDB.DocumentClient();

exports.handler = (event, context) => {
  event.Records.forEach(record => {
    if (record.eventName == "INSERT") {
      //console.log(JSON.stringify(record.dynamodb.NewImage.tweet.S));
      let params = {
        LanguageCode: "en",
        Text: JSON.stringify(record.dynamodb.NewImage.tweet.S)
      };
      comprehend.detectSentiment(params, (err, data) => {
        if (err) {
          console.log("\nError with call to Comprehend:\n " + JSON.stringify(err));
        } else {
          console.log("\nSuccessful call to Comprehend:\n " + data.Sentiment);
          //when comprehend is successful, update the sentiment analysis data
          //we can use the ADD expression to increment the value of a number
          let sentimentParams = {
            TableName: "SentimentAnalysis",
            Key: {
              city: record.dynamodb.NewImage.city.S,
            },
            UpdateExpression: "ADD " + data.Sentiment.toLowerCase() + " :pr",
            ExpressionAttributeValues: {
              ":pr": 1
            }
          };
          documentClient.update(sentimentParams, (err, data) => {
            if (err) {
              console.error("Unable to read item " + JSON.stringify(sentimentParams.TableName));
            } else {
              console.log("Successful Update: " + JSON.stringify(data));
            }
          });
        }
      });
    }
  });
};
[Image: a successful call; it works with the first few tweets.]
[Image: the unsuccessful call right after the first one; the request always times out.]

The timeout is why it's happening repeatedly: if the Lambda times out or otherwise errors, the whole batch is reprocessed. You need to handle this, because stream delivery is "at least once". You also need to figure out the cause of the timeout. The fix might be as simple as smaller batches, or you might just be able to increase the timeout on the Lambda; a more complex solution could use Step Functions. Note that the handler above fires off callbacks inside a forEach and returns immediately, so the function ends (or times out waiting) before the Comprehend calls and table updates have finished.
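A minimal sketch of how the handler could be restructured so it waits for all the work before returning. Table, key, and attribute names are taken from the question; passing the raw tweet string instead of JSON.stringify-ing it is my assumption about the intent:

let AWS = require("aws-sdk");
let comprehend = new AWS.Comprehend();
let documentClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  // Wait for every record to finish before the handler returns, so the
  // batch is only marked processed once all updates have completed.
  await Promise.all(event.Records
    .filter(record => record.eventName === "INSERT")
    .map(async (record) => {
      const data = await comprehend.detectSentiment({
        LanguageCode: "en",
        Text: record.dynamodb.NewImage.tweet.S // raw string, not JSON.stringify'd
      }).promise();

      await documentClient.update({
        TableName: "SentimentAnalysis",
        Key: { city: record.dynamodb.NewImage.city.S },
        UpdateExpression: "ADD " + data.Sentiment.toLowerCase() + " :pr",
        ExpressionAttributeValues: { ":pr": 1 }
      }).promise();
    }));
};

Because delivery is at least once, the ADD update can still be applied twice for the same tweet if a batch is retried; if the counts need to be exact, the update would have to be made idempotent (for example, by recording processed tweet IDs).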

Related

Trigger DynamoDB Stream for Large Data Set

I have a lambda which gets triggered by DynamoDB Streams.
I would like to trigger that lambda for each item in the table (more than 100,000 items in one table; each item is around 500 bytes).
How could I achieve that?
I have created a lambda which queries the DynamoDB table, gets each item, and sends a message to EventBridge containing the ID, so that another Lambda (triggered by that ID) can update the item with a 'createdAt' field set to the current epoch time.
After about 300 items queried, I receive a timeout because the Lambda's memory (256 MB) is exceeded. This is unfortunately not a good solution.
My code looks something like this
updateByMId: async (mId, fn) => {
  const paginator = paginateQuery(paginatorConfig, {
    TableName: tableName,
    KeyConditionExpression: '#pk = :pk',
    ExpressionAttributeNames: { '#pk': 'mId' },
    ExpressionAttributeValues: marshall({ ':pk': mId }, { removeUndefinedValues: true }),
  })

  const promises: Promise<void>[] = []
  for await (const page of paginator) {
    const records = (page.Items || []).map(item => unmarshall(item) as Record)
    for (const record of records) {
      promises.push(fn(record))
    }
  }
  await Promise.all(promises)
},
the function which is being passed to this method is:
putEvent: async (message) => {
  const output = await client.send(new eventbridge.PutEventsCommand({
    Entries: [{
      EventBusName: eventBusName,
      Source: 'foo',
      DetailType: 'bar',
      Detail: JSON.stringify(message),
    }],
  }))
  if (output.FailedEntryCount !== undefined && output.FailedEntryCount > 0) {
    throw new Error(`Error putting event on bus! ${JSON.stringify(output.Entries)}`)
  }
  logger.info(`Successfully put event on bus`, { message })
},
Here's one way to do it that's reasonably tolerant of failures:

- scan the table
- write the item keys to an SQS queue
- configure a Lambda function on that SQS queue to process the queue in batches of, say, 10 messages
- have the Lambda function write a date attribute to the item associated with each key, as needed

Personally, I would not use a Lambda function to perform that initial scan.
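A rough sketch of the first two steps, in the same AWS SDK v3 style as the question's code. The queue URL is a placeholder, the mId key name is taken from the question, and it assumes the queue already exists:

// Scan out just the keys and push them to SQS in batches of 10
// (the SQS batch maximum).
const { DynamoDBClient, paginateScan } = require('@aws-sdk/client-dynamodb')
const { SQSClient, SendMessageBatchCommand } = require('@aws-sdk/client-sqs')

const ddb = new DynamoDBClient({})
const sqs = new SQSClient({})

async function enqueueAllKeys(tableName, queueUrl) {
  const paginator = paginateScan({ client: ddb }, {
    TableName: tableName,
    ProjectionExpression: 'mId', // fetch keys only, to keep pages small
  })
  for await (const page of paginator) {
    const items = page.Items || []
    for (let i = 0; i < items.length; i += 10) {
      const Entries = items.slice(i, i + 10).map((item, j) => ({
        Id: String(j),
        // Still in AttributeValue form; unmarshall in the consumer.
        MessageBody: JSON.stringify(item),
      }))
      await sqs.send(new SendMessageBatchCommand({ QueueUrl: queueUrl, Entries }))
    }
  }
}

Run from somewhere with more headroom than a 256 MB Lambda (a container, a local machine, or a Lambda with a bigger allocation), since the scan itself is the memory-hungry part.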

Lambda not deleting DynamoDB records when triggered with CloudWatch events

I am trying to delete items in my DynamoDB table using a CloudWatch-event-triggered Lambda. This Lambda scans the table and deletes all expired items. My code seems to work when I test it using the test event in the console (i.e. it deletes all the expired items), but when the Lambda is triggered automatically by the CloudWatch event it does not delete anything, even though I can see that the Lambda is being triggered.
exports.handler = async function () {
  var params = {
    TableName: TABLE_NAME
  }
  try {
    const data = await docClient.scan(params).promise();
    const items = data.Items;
    if (items.length != 0) {
      Promise.all(items.map(async (item) => {
        const expirationDT = new Date(item.ExpiresAt);
        const now = new Date();
        if (now > expirationDT) {
          console.log("Deleting item with otc: " + item.Otc + " and name: " + item.SecretName);
          const deleteParams = {
            TableName: TABLE_NAME,
            Key: {
              "Otc": item.Otc,
              "SecretName": item.SecretName,
            },
          };
          try {
            await docClient.delete(deleteParams).promise();
          } catch (err) {
            console.log("The Secret was not deleted due to: ", err.message);
          }
        }
      }))
    }
  } catch (err) {
    console.log("The items were not able to be scanned due to : ", err.message)
  }
}
I know using DynamoDB TTL is an option, but I need these deletions to be somewhat precise, and TTL can sometimes take up to 48 hours; I am aware I can use a filter when retrieving records to counteract that. I'm just wondering what's wrong with my code here.
You need to await the Promise.all, or your Lambda will end execution before it resolves:
await Promise.all(items.map(async (item) => {
  const expirationDT = new Date(item.ExpiresAt);
  const now = new Date();
  // ...

Dropzone.js + AWS S3 stalling queue

I'm trying to implement a dropzone.js uploader to Amazon S3 using aws-sdk.js for the browser, but when I exceed the 'parallelUploads' maximum in the settings, the queue never completes. I'm using the approach in the following link:
amazon upload
relevant parts of my code:
var dz = new Dropzone("#DZContainer", {
  acceptedFiles: "image/*,.jpg,.jpeg,.png,.gif",
  autoQueue: true,
  autoProcessQueue: true,
  parallelUploads: 10,
  clickable: [".uploadButton"],
  accept: function(file, done) {
    let params = {
      "Bucket": "upload-bucket",
      "Key": getFullKey(file.name),
      Body: file,
      Region: "us-east-1",
      ContentType: file.type
    }
    file.s3upload = new AWS.S3.ManagedUpload(params);
    if (typeof(done) === 'function') done();
  },
  canceled: function(file) {
    if (file.s3upload) file.s3upload.abort();
  },
  init: function () {
    this.on('removedfile', function (file) {
      if (file.s3upload) file.s3upload.abort();
    });
  }
});
dz.uploadFiles = function (files) {
  for (var j = 0; j < files.length; j++) {
    var file = files[j];
    dz.SendFile(file);
  }
};

dz.SendFile = function(file) {
  file.s3upload.send(function (err, data) {
    if (err) {
      console.error(err);
      dz.emit("error", file, err.message);
    } else {
      dz.emit("complete", file);
    }
  });
};
If I drag in (or use the clickable for) more than 10 files, the first 10 complete but the rest of the queue is never processed. What am I missing? All help is appreciated.
EDIT: With a little more digging into Dropzone, it looks as though the file status is never getting set to complete. I see a function called _finished() in the Dropzone code, but I'm having a hard time figuring out what specifically is supposed to trigger that function. I have tried dz.emit("complete", file) as listed below, as well as adding dz.emit("success", file), but my breakpoint at the first line of _finished() never triggers, so file.status never gets set to completed.
Does anyone know when/what/how _finished() is supposed to be run?
As mentioned in the edit, I was able to track down where the .status was not properly getting set; this happens in a private Dropzone function called _finished().
On further examination, I noticed that _finished() also calls emit("complete", file) after setting file.status to Dropzone.SUCCESS and emitting "success". It then checks whether autoProcessQueue is set and, if it is, returns the result of a processQueue() call.
I had a hard time figuring out what triggered this function: it hangs off an onload event that I eventually realized is tied to an XMLHttpRequest object used by the internal uploader (which is being overridden by the S3 uploader).
So I modified the function to emulate what Dropzone._finished() does, and it's behaving as expected:
dz.SendFile = function(file) {
  file.s3upload.send(function (err, data) {
    if (err) {
      console.error(err);
      dz.emit("error", file, err.message);
    } else {
      file.status = Dropzone.SUCCESS;
      dz.emit("success", file, data, err);
      dz.emit("complete", file);
      if (dz.options.autoProcessQueue)
        dz.processQueue();
    }
  });
};

Can we use recursion with an AWS Lambda function so that it stops execution within the 15-minute time limit and saves the point at which it stopped?

Track the time elapsed in the Lambda function and stop it at 9-10 minutes.
Save the point at which it stopped, and continue until the task is completed.
Using a Lambda function is compulsory.
I support John Rotenstein's response: if you're using Lambda this way, you probably shouldn't be using Lambda. Out of my own curiosity, though, I think the code you'd be looking for is something along the following lines (written in Node.js):
let AWS = require('aws-sdk');
let functionName = 'YOUR_LAMBDA_FUNCTION_NAME'
let timeThresholdInMillis = 5000 // 5 seconds

exports.handler = async (event, context, callback) => {
  let input = JSON.parse(event['foo']); // input from previous lambda, omitted try-catch for brevity
  let done // set this variable to true when your code is done

  let interval = setInterval(() => {
    if (done) callback(null, true)

    let remainingTime = context.getRemainingTimeInMillis();
    if (remainingTime < timeThresholdInMillis) {
      // Going to restart
      clearInterval(interval);
      // save your progress (local or remote) to the `input` variable
      let lambda = new AWS.Lambda();
      return lambda.invoke({
        FunctionName: functionName,
        Payload: JSON.stringify({ 'foo': input }) // Pass your progress here
      }, (err, data) => {
        // kill container
        if (err) callback(err)
        else callback(null, data)
      });
    }
  }, 1000);
}
Edit: an example
This is to clarify how 'passing progress' would work in recursive lambdas.
Let's say you want to increment a variable (+1) every second and you want to do this in Lambda.
Givens:
- We will increment the variable by 1 once every 1000 ms.
- Lambda will run until the remaining time is < 5000 ms.
- Lambda execution timeout is 60,000 ms (1 minute).
Lambda function pseudo code:
function async (event, context) {
  let counter = event.counter || 0
  setInterval(() => {
    if (context.getRemainingTimeInMillis() < 5000) {
      // start new lambda and pass `counter` variable
      // so it continues counting and we don't lose progress
      lambda.invoke({ payload: { counter } })
    } else {
      // Add 1 to the counter variable
      counter++
    }
  }, 1000)
}
Not sure what you are trying out, but have a look at AWS Step Functions to better orchestrate your serverless recursion fun.
Also be aware of costs.
Getting Started:
https://aws.amazon.com/getting-started/hands-on/create-a-serverless-workflow-step-functions-lambda/
Example:
https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-transfer-data-sqs.html
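For the recursion pattern specifically, the state machine can be as simple as a Task that loops through a Choice state until the worker reports it is finished. A minimal sketch in the same state-machine notation as below; the $.done flag is a hypothetical field your Lambda would return, and the resource ARN is a placeholder:

{
  "StartAt": "DoChunk",
  "States": {
    "DoChunk": {
      "Type": "Task",
      "Resource": "YOUR_LAMBDA_ARN",
      "Next": "IsDone"
    },
    "IsDone": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.done", "BooleanEquals": true, "Next": "Finished" }
      ],
      "Default": "DoChunk"
    },
    "Finished": { "Type": "Succeed" }
  }
}

This keeps the progress in the state machine's input/output instead of having the Lambda re-invoke itself.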

Can you trigger an AWS Lambda on a dynamic timer?

Is there a way to trigger an AWS Lambda on a dynamic timer? Currently, I am utilizing scheduled-events to trigger the lambda, but this is a set timer. Is there a way to dynamically set a time for the Lambda to be triggered from within the Lambda?
The idea here is that this Lambda does specific checks and executes code to know when it should run next (because I only want this lambda to run when it needs to). I want to 1) determine the next time it needs to run and 2) set the time from within the Lambda code.
I see there are a lot of resources that are used for triggering Lambda functions (SNS, Kinesis, etc.), but I can't seem to find a good way to dynamically kick one off.
This can be accomplished by setting a CloudWatch event rule to trigger your Lambda function. On each invocation of your Lambda function, the function will need to determine its next run time and modify the event rule appropriately.
var AWS = require("aws-sdk");

exports.handler = function(event, context) {
  var cloudwatchevents = new AWS.CloudWatchEvents();
  var intervals = Array(3, 5, 7);
  var nextInterval = intervals[Math.floor(Math.random() * intervals.length)];
  var currentTime = new Date().getTime(); // UTC Time
  var nextTime = dateAdd(currentTime, "minute", nextInterval);
  var nextMinutes = nextTime.getMinutes();
  var nextHours = nextTime.getHours();

  // =================================
  // DO YOUR WORK HERE
  // =================================

  var scheduleExpression = "cron(" + nextMinutes + " " + nextHours + " * * ? *)";
  var params = {
    Name: "YOUR CLOUDWATCH EVENT RULE NAME",
    ScheduleExpression: scheduleExpression
  };
  cloudwatchevents.putRule(params, function(err, data) {
    if (err) {
      console.log(err, err.stack);
    } else {
      console.log(data);
    }
  })
};
var dateAdd = function(date, interval, units) {
  var ret = new Date(date); // don't change original date
  switch (interval.toLowerCase()) {
    case 'year'   : ret.setFullYear(ret.getFullYear() + units); break;
    case 'quarter': ret.setMonth(ret.getMonth() + 3 * units); break;
    case 'month'  : ret.setMonth(ret.getMonth() + units); break;
    case 'week'   : ret.setDate(ret.getDate() + 7 * units); break;
    case 'day'    : ret.setDate(ret.getDate() + units); break;
    case 'hour'   : ret.setTime(ret.getTime() + units * 3600000); break;
    case 'minute' : ret.setTime(ret.getTime() + units * 60000); break;
    case 'second' : ret.setTime(ret.getTime() + units * 1000); break;
    default       : ret = undefined; break;
  }
  return ret;
}
You should be able to swap my random determination with your own scheduling logic and insert whatever work you need in place of my comment.
You will need to substitute your event rule's name for "YOUR CLOUDWATCH EVENT RULE NAME" in my snippet.
Great question for a blog: AWS Lambda Functions That Dynamically Schedule Their Next Runtime
This can now be accomplished without polling by using a Step Function. You can find more information in the AWS documentation, but basically you would define a state machine that uses a Wait state with the TimestampPath field. Your state machine might end up looking something like:
{
  "StartAt": "WaitState",
  "States": {
    "WaitState": {
      "Type": "Wait",
      "TimestampPath": "$.timestamp",
      "Next": "ExecuteLambda"
    },
    "ExecuteLambda": {
      "Type": "Task",
      "Resource": "lambda-arn",
      "End": true,
      "Retry": [
        { ... },
        ...
      ]
    }
  }
}
Assuming you're using Node, you could then invoke the step function with the following code:
const AWS = require('aws-sdk');
const stepFunctions = new AWS.StepFunctions();
await stepFunctions.startExecution({
stateMachineArn: process.env.STATE_MACHINE_ARN,
name: "unique-name",
input: JSON.stringify({
timestamp: (new Date(/*Date you want to start execution*/)).toISOString(),
// Any extra context you want to pass to the step function.
}),
}).promise();
You could create a CloudWatch rule to run at a particular time. CloudWatch rules can run periodically or with a cron-like syntax that lets you specify a single run time. You'll likely need to clean these rules up later, though.
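A sketch of what that cleanup might look like once a one-off rule has fired, in the same SDK v2 callback style as the answer above; the rule name and target ID are placeholders. A rule's targets must be removed before the rule itself can be deleted:

var AWS = require("aws-sdk");
var cloudwatchevents = new AWS.CloudWatchEvents();

// Remove the rule's targets first, then the rule itself.
cloudwatchevents.removeTargets({
  Rule: "YOUR CLOUDWATCH EVENT RULE NAME",
  Ids: ["YOUR TARGET ID"]
}, function(err) {
  if (err) return console.log(err, err.stack);
  cloudwatchevents.deleteRule({
    Name: "YOUR CLOUDWATCH EVENT RULE NAME"
  }, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log(data);
  });
});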
You can keep executing the Lambda on a schedule but persist a value in a data store such as DynamoDB or S3 that tells the Lambda when it should run next.
The Lambda will keep executing periodically, but you can control when it actually does what it's intended for. There are billing considerations here (the Lambda will keep consuming a minimal amount of resources in the background), but hopefully things shouldn't get too out of hand. This is probably simpler than trying to manage the triggers from within the same Lambda.
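A minimal sketch of that pattern, assuming a hypothetical LambdaSchedule table with a single item holding the next run time (the table and attribute names are made up for this example):

var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async function () {
  // Look up the next scheduled run time.
  var data = await docClient.get({
    TableName: "LambdaSchedule",
    Key: { id: "next-run" }
  }).promise();

  var nextRunAt = (data.Item && data.Item.nextRunAt) || 0;
  if (Date.now() < nextRunAt) {
    return; // not due yet, exit cheaply
  }

  // =================================
  // DO YOUR WORK HERE
  // =================================

  // Decide when the next real run should happen and persist it
  // for the following scheduled invocation to check.
  await docClient.put({
    TableName: "LambdaSchedule",
    Item: { id: "next-run", nextRunAt: Date.now() + 15 * 60 * 1000 } // e.g. 15 minutes
  }).promise();
};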