Is it possible to invoke lambda only on item expiry from Dynamo - amazon-web-services

I have set up Dynamo table and have enabled stream and also enabled TTL (timetolive) on one of the columns. I also have one lambda which will pull entry from Dynamo Stream.
Now either I add, delete, or edit, Or TTL gets expired - all this will cause the lambda invocation.
I am not interested in add or edit event, I only want stream to receive the deleted, TTL expired entries, is this possible?
Also, I can definitely put a check in my lambda code and process only when event type of "delete", but still lambda invocation for add, edit will take place regardless. Kindly guide

You can't control DynamoDB stream, it will always post events for any changes happened to your table, however you can control the lambda invocation by adding property "FilterCriteria" in your "EventSourceMapping"
https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventfiltering.html
FilterCriteria: {"Filters": [{"Pattern": "{ \"userIdentity\": { \"type\": [ \"Service\" ] ,\"principalId\": [\"dynamodb.amazonaws.com\"] }}"}]}
using above filter your lambda will be only invoked if TTL expiry event posted in the DymnamoDB stream.

Sadly, you can't make DynamoDB stream, to stream only deletion or expiration of items. Everything is streamed, and its up to your lambda function to filter the events of interests.
For TTL expired items, your function needs to check:
"userIdentity":{
"type":"Service",
"principalId":"dynamodb.amazonaws.com"
}
An alternative way, is to have second table, only with TTL markers. This could be useful, if your primary table experiences a lot of updates and modifications. This way, the stream on your second table would only invoke your function twice for each item, i.e. creation and TTL expiration, rather then for all the updates you are not interested in.

Haven't found a way to trigger lambdas only on TTL expiracy, but you can recon TTL from the event record payload you receive, checking the properties eventName + userIdentity as said above
Records: [
{
eventID: '36df5f15e7429cc986999f68349e6fef',
eventName: 'REMOVE',
eventVersion: '1.1',
eventSource: 'aws:dynamodb',
awsRegion: 'us-west-2',
dynamodb: [Object],
userIdentity: [Object],
eventSourceARN: 'arn:aws:dynamodb:us-west-2:XXXXX:table/table-with-ttl/stream/2021-09-27T17:03:42.812'
}
]
if (record.eventName === 'REMOVE') {
// Check if deletion was done manually or triggered by the TTL timer.
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-streams.html
if (
record.userIdentity &&
record.userIdentity.type === 'Service' &&
record.userIdentity.principalId === 'dynamodb.amazonaws.com'
) {
const itemThatWasRemoved = unmarshall(record.dynamodb.OldImage)
// Your code that only runs for TTL removals here
}

Related

Lambda event filtering for DynamoDB trigger

Here is a modified version of an Event type I am receiving in my handler for a lambda function with a DynamoDB someTableName table trigger that I logged using cargo lambda.
Event {
records: [
EventRecord {
change: StreamRecord {
approximate_creation_date_time: ___
keys: {"id": String("___")},
new_image: {
....
"valid": Boolean(true),
},
...
},
...
event_name: "INSERT",
event_source: Some("aws:dynamodb"),
table_name: None
}
]
}
Goal: Correctly filter with event_name=INSERT && valid=false
I have tried a number of options, for example;
{"eventName": ["INSERT"]}
While the filter is added correctly, it does not trigger the lambda on item inserted.
Q1) What am I doing incorrectly here?
Q2) Why is table_name returning None? The lambda function is created with a specific table name as trigger. The returned fields are returning an option (Some(_)) so I'm asssuming it returns None if the table name is specified on lambda creation, but seems odd to me?
Q3) From AWS Management Console > Lambda > ... > Trigger Detail, I see the following (which is slightly different from my code mentioned above), where does "key" come from and what does it represent in the original Event?
Filters must follow the documented syntax for filtering in the Event Source Mapping between Lambda and DynamoDB Streams.
If you are entering the filter in the Lambda console:
{ "eventName": ["INSERT"], "dynamodb": { "NewImage": {"valid": { "BOOL" : [false]}} } }
The attribute name is actually eventName, so your filter should look like this:
{"eventName": ["INSERT"]}

How to identify the first execution of a Lambda version at runtime?

I want to run some code only on the first execution of a Lambda version. (NB: I'm not referring to a cold start scenario as this will occur more than once).
computeInstanceInvocationCount is unfortunately reset to 0 upon every cold start.
functionVersion is an available property but unless I store this in memory outside the lambda I cannot calculate if it is indeed the first execution.
Is it possible to deduce this based on runtime values in event or context? Or is there any other way?
There is no way of knowing if this is the first time that a Lambda has ever run from any information passed into the Lambda.
You would have to include functionality to check elsewhere by setting a flag or parameter there, remember though that multiple copies of the Lambda could be invoked at the same time so any data store for this would presumably need to be transactional to ensure that it occurs only once.
One way that you can try is to use AWS parameter store.
On every deployment update the parameter store value with
{"version":"latest","is_firsttime":true}
So run the below command after deployment
aws secretsmanager update-secret --secret-id demo --secret-string '{"version":"latest","is_firsttime":true}'
So this is something that we need to make sure before deployment.
Now we can set logic inside the lambda, in the demo we will look into is_firsttime only.
var AWS = require('aws-sdk'),
region = "us-west-2",
secretName = "demo",
secret,
decodedBinarySecret;
var client = new AWS.SecretsManager({
region: region
});
client.getSecretValue({SecretId: secretName}, function(err, data) {
secret = data.SecretString;
secret=JSON.parse(secret)
if ( secret.is_firsttime == true)
{
console.log("lambda is running first time")
// any init operation here
// init completed, now we are good to set back it `is_firsttime` to false
var params = {
Description: "Init completeed, updating value at date or anythign",
SecretId: "demo",
SecretString : '[{ "version" : "latest", "is_firsttime": false}]'
};
client.updateSecret(params, function(err, data) {
if (err) console.log(err, err.stack); // an error occurred
else console.log(data); // successful response
});
}
else{
console.log("init already completed");
// rest of logic incase of not first time
}
})
This is just a demo code that will work in a non-lambda environment, adjust it accordingly.
Expected response for first time
{
ARN: 'arn:aws:secretsmanager:us-west-2:12345:secret:demo-0Nlyli',
Name: 'demo',
VersionId: '3ae6623a-1111-4a41-88e5-12345'
}
Second time
init already completed

AWS AppSync & DynamoDB Single Table Design: PutItem based on another item's field value

I'm new to AWS AppSync and I have adopted the single table pattern in DynamoDB. Now I am trying to create an item based on a particular field value in the existing item in the same table. For example, I have a table called transaction which holds 2 types of records.
Request
Response
As you can see the above table, I can insert (PutItem) multiple responses for a particular request. Before I insert a new response, I need to validate whether the request (RequestID) is already exists. Is there any way to do via conditional expression in the resolver? Below is my current request resolver code which is not working as expected.
#set( $Id = $util.autoId() )
{
"version" : "2017-02-28",
"operation" : "PutItem",
"key" : {
"PK": $util.dynamodb.toDynamoDBJson("USER#$ctx.args.input.UserId"),
"SK": $util.dynamodb.toDynamoDBJson("RESPONSE#$Id"),
},
"attributeValues" : $util.dynamodb.toMapValuesJson($ctx.args.input),
"condition": {
"expression": "SK = :SK",
"expressionValues" : {
":SK" : {
"S" : "REQUEST#${ctx.args.input.RequestId}"
}
}
}
}
You could do this in one request mapping template by using DynamoDB transactions, see (https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-dynamodb-transact.html).
In one of the "transactWriteItems", you update an arbitrary value (or perhaps, number of responses) in the request item with a condition that checks if the request item exists with that requestId in the SK. If the conditions succeeds, the request item is updated.
Also make sure you have the response item in your "transactWriteItems" so that the response item is also written if the request condition passes.
You won't be able to do a conditional PutItem based on another entry no. In this case you'd want to use pipeline resolvers. In your first function you'd fetch the request item and in the second you can then do your PutItem condition - the previous GetItem result will be available as $ctx.prev.result.

Not able to solve throttlingException in DynamoDB

I have a lambda function which does a transaction in DynamoDB similar to this.
try {
const reservationId = genId();
await transactionFn();
return {
statusCode: 200,
body: JSON.stringify({id: reservationId})
};
async function transactionFn() {
try {
await docClient.transactWrite({
TransactItems: [
{
Put: {
TableName: ReservationTable,
Item: {
reservationId,
userId,
retryCount: Number(retryCount),
}
}
},
{
Update: {
TableName: EventDetailsTable,
Key: {eventId},
ConditionExpression: 'available >= :minValue',
UpdateExpression: `set available = available - :val, attendees= attendees + :val, lastUpdatedDate = :updatedAt`,
ExpressionAttributeValues: {
":val": 1,
":updatedAt": currentTime,
":minValue": 1
}
}
}
]
}).promise();
return true
} catch (e) {
const transactionConflictError = e.message.search("TransactionConflict") !== -1;
// const throttlingException = e.code === 'ThrottlingException';
console.log("transactionFn:transactionConflictError:", transactionConflictError);
if (transactionConflictError) {
retryCount += 1;
await transactionFn();
return;
}
// if(throttlingException){
//
// }
console.log("transactionFn:e.code:", JSON.stringify(e));
throw e
}
}
It just updating 2 tables on api call. If it encounter a transaction conflict error, it simply retry the transaction by recursively calling the function.
The eventDetails table is getting too much db updates. ( checked it with aws Contributor Insights) so, made provisioned unit to a higher value than earlier.
For reservationTable Provisioned capacity is on Demand.
When I do load test over this api with 400 (or more) users using JMeter (master slave configuration) I am getting Throttled error for some api calls and some api took more than 20 sec to respond.
When I checked X-Ray for this api found that, DynamoDB is taking too much time for this transasction for the slower api calls.
Even with much fixed provisioning ( I tried on demand scaling too ) , I am getting throttled exception for api calls.
ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded.
Consider increasing your provisioning level with the UpdateTable API.
UPDATE
And one more thing. When I do the load testing, I am always uses the same eventId. It means, I am always updating the same row for all the api requests. I have found this article, which says that, a single partition can only have upto 1000 WCU. Since I am always updating the same row in the eventDetails table during load testing, is that causing this issue ?
I had this exact error and it helped me to change the On Demand to Provisioned under Read/write capacity mode. Try to change that, if that doesn't help, we'll go from there.
From the link you cite in your update, also described in an AWS help article here, it sounds like the issue is that all of your load testers are writing to the same entry in the table, which is going to be in the same partition, subject to the hard limit of 1,000 WCU.
Have you tried repeating this experiment with the load testers writing to different partitions?

Grabbing items from an S3 bucket

So I want to get a list of all the objects in my S3 bucket. I've just put it in an express route application I quickly setup (doesn't really matter it's in express just what i'm comfortable with).
So i'm doing :
var allObjs = [];
s3.listObjects({Bucket: 'myBucket'}, function(err, data) {
var stringifiedObjs = JSON.stringify(allObjs);
fs.writeFile("test", stringifiedObjs, function(err) {})
}
Which grabs my objects, stringifys them and writes them to a file called test. The issue i'm having is that it's only getting 1,000 results.
I read somewhere (I can't find where) that AWS limits you to 1,000 results per call.
How can I rerun this and grab the next 1,000? But so make sure that it's the next incremented 1,000 not still the first one?
In short, how can I get every object in my S3 bucket? I've been getting lost in the documentation.
Thank you!
EDIT
This is my object I get back :
{ Key: 'bucket_path/e11_19_9a31mv3ot51tm384grjd6rdj51boxx_q_q112.png',
LastModified: Sat Apr 23 2016 09:16:23 GMT+0100 (BST),
ETag: '"7d50fsdfsd4sda159b32cf85c683c5924"',
Size: 704222,
StorageClass: 'STANDARD',
Owner:
{ DisplayName: 'servers',
ID: '58af203151c51eddf2sdfs411e0b91d274a8fda23f58280f9b06371e436f7' } },
You need to set the marker property as the last element of the previous get
check the documentation as reference (as you already did :) )
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
When you receive your response from the listObjects call, your response should include 2 very important fields in the data property:
IsTruncated - True if there are more keys to return. False otherwise.
NextMarker - The value to use for the Marker property in the next call to listObjects.
So after you call listObjects, you need to check the IsTruncated field to see if it's True. If it is, then feed the value from NextMarker into the value for Marker and call listObjects again.
Update:
It would appear that AWS.Request object has an .eachPage method which can be used to automatically make multiple calls. So there is a magical function to do this work for you.
var pages = 1;
s3.listObjects().eachPage(function(err, data) {
if (err) return;
console.log("Page", pages++);
console.log(data);
});
Source: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Request.html