Trigger DynamoDB Stream for Large Data Set

I have a lambda which gets triggered by DynamoDB Streams.
I would like to trigger that lambda for each item in the table (more than 100,000 items in one table; one item is around 500 bytes).
How could I achieve that?
I have created a lambda which queries the DynamoDB table, gets each item, and sends a message to EventBridge containing the ID, so that another Lambda (triggered by that ID) can update the item with a 'createdAt' field set to the epoch time.
After around 300 items are queried, I receive a timeout because the lambda's memory (256 MB) is exceeded. This is unfortunately not a good solution.
My code looks something like this:
updateByMId: async (mId, fn) => {
    const paginator = paginateQuery(paginatorConfig, {
        TableName: tableName,
        KeyConditionExpression: '#pk = :pk',
        ExpressionAttributeNames: { '#pk': 'mId' },
        ExpressionAttributeValues: marshall({ ':pk': mId }, { removeUndefinedValues: true }),
    })

    const promises: Promise<void>[] = []
    for await (const page of paginator) {
        const records = (page.Items || []).map(item => unmarshall(item) as Record)
        for (const record of records) {
            promises.push(fn(record))
        }
    }
    await Promise.all(promises)
},
The function which is being passed to this method is:
putEvent: async (message) => {
    const output = await client.send(new eventbridge.PutEventsCommand({
        Entries: [{
            EventBusName: eventBusName,
            Source: 'foo',
            DetailType: 'bar',
            Detail: JSON.stringify(message),
        }],
    }))
    if (output.FailedEntryCount !== undefined && output.FailedEntryCount > 0) {
        throw new Error(`Error putting event on bus! ${JSON.stringify(output.Entries)}`)
    }
    logger.info(`Successfully put event on bus`, { message })
},

Here's one way to do it that's reasonably tolerant of failures:
1. Scan the table.
2. Write the item keys to an SQS queue.
3. Configure a Lambda function on that SQS queue to process the queue in batches of, say, 10 messages.
4. Have the Lambda function write a date attribute to the item associated with each key, as needed.
Personally, I would not use a Lambda function to perform that initial scan.
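A minimal sketch of that fan-out, assuming AWS SDK v3 and a standard SQS queue; the environment variable names, the enqueueAllKeys helper, and the single mId key attribute are illustrative, and the enqueue step is meant to run from a container or local script rather than a Lambda:

// Sketch only: scan the table, push item keys to SQS in batches of 10,
// then let an SQS-triggered Lambda stamp each item with createdAt.
// TABLE_NAME, QUEUE_URL and the key attribute 'mId' are assumptions.
import { DynamoDBClient, paginateScan, UpdateItemCommand } from '@aws-sdk/client-dynamodb'
import { SQSClient, SendMessageBatchCommand } from '@aws-sdk/client-sqs'
import type { SQSEvent } from 'aws-lambda'

const ddb = new DynamoDBClient({})
const sqs = new SQSClient({})
const tableName = process.env.TABLE_NAME!
const queueUrl = process.env.QUEUE_URL!

// Steps 1 + 2: run this outside Lambda (container, EC2, local machine).
export const enqueueAllKeys = async (): Promise<void> => {
    const paginator = paginateScan(
        { client: ddb },
        { TableName: tableName, ProjectionExpression: 'mId' }, // project only the key attributes; adjust if the table also has a sort key
    )
    let entries: { Id: string; MessageBody: string }[] = []
    let n = 0
    for await (const page of paginator) {
        for (const item of page.Items ?? []) {
            entries.push({ Id: String(n++), MessageBody: JSON.stringify(item) }) // marshalled key
            if (entries.length === 10) { // SQS batch limit is 10 messages
                await sqs.send(new SendMessageBatchCommand({ QueueUrl: queueUrl, Entries: entries }))
                entries = []
            }
        }
    }
    if (entries.length > 0) {
        await sqs.send(new SendMessageBatchCommand({ QueueUrl: queueUrl, Entries: entries }))
    }
}

// Steps 3 + 4: Lambda subscribed to the queue (batch size 10) stamps each item.
export const handler = async (event: SQSEvent): Promise<void> => {
    for (const record of event.Records) {
        const key = JSON.parse(record.body) // the marshalled key written above
        await ddb.send(new UpdateItemCommand({
            TableName: tableName,
            Key: key,
            UpdateExpression: 'SET createdAt = :now',
            ExpressionAttributeValues: { ':now': { N: Date.now().toString() } },
        }))
    }
}

If processing fails, only the affected messages return to the queue and can be retried or sent to a dead-letter queue, which is what makes this approach tolerant of failures.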

Related

DynamoDB - The provided key element does not match the schema

I am trying to delete a user's reservation at a specific date and time but I encountered this error and am not sure how to resolve it. Any advice will be appreciated.
const AWS = require("aws-sdk");
const dynamo = new AWS.DynamoDB.DocumentClient();

exports.handler = (event, context, callback) => {
    let body;
    let response;
    switch (event.routeKey) {
        case 'DELETE /bookings/{user_name}/{restaurant_name}/{time}/{date}':
            //have to specify date and time because user might make reservation on same date, same restaurant, at 2 different timings
            var params = {
                TableName: 'hearty_eats_bookings',
                Key: {
                    'user_name': event.pathParameters.user_name,
                    'restaurant_name': event.pathParameters.restaurant_name,
                    'time': event.pathParameters.time,
                    'date': event.pathParameters.date
                },
            };
            dynamo.delete(params, function(err, result) {
                if (err) throw err;
                return callback(null, { "message": "booking cancelled" });
            });
            break;
        default:
            throw new Error("Unsupported route: " + event.routeKey);
    }
}
(Screenshots: event JSON, error message, and DynamoDB table details.)
DynamoDB's DeleteItem API only takes the keys of the item as a parameter; however, you have included much more than the keys in your request:
Key: {
    'user_name': event.pathParameters.user_name,
    'restaurant_name': event.pathParameters.restaurant_name,
    'time': event.pathParameters.time,
    'date': event.pathParameters.date
},
If you need to manage an item at the time and date level, then you should include that as part of your key, for example:
PK      | SK                                 | Data
User123 | GreatChinese#2022-12-10T18:00:000Z | Table for 2
User789 | GreatIndian#2022-12-09T19:00:000Z  | Table for 4
Key: {
    'PK': event.pathParameters.user_name,
    'SK': `${event.pathParameters.restaurant_name}#${event.pathParameters.date}`
},
If you wish to continue with your current approach, then use the following as the Key:
Key: {
    'user_name': event.pathParameters.user_name
},
In summary, you must specify only the table's partition key and sort key in the Key parameter of the DeleteItem request.
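For completeness, a minimal sketch of the full DeleteItem call under the composite-key design above; the PK/SK names mirror the example table and are illustrative only:

// Sketch only: only the key attributes go into Key; everything else stays out.
// Append the time to the SK as well if two bookings can share the same date.
var params = {
    TableName: 'hearty_eats_bookings',
    Key: {
        'PK': event.pathParameters.user_name,
        'SK': `${event.pathParameters.restaurant_name}#${event.pathParameters.date}`
    }
};

dynamo.delete(params, function(err) {
    if (err) return callback(err);
    return callback(null, { "message": "booking cancelled" });
});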

Lambda not deleting DynamoDB records when triggered with Cloud Watch events

I am trying to delete items in my DynamoDB table using a CloudWatch-event-triggered Lambda. This lambda scans the dynamo table and deletes all expired items. My code seems to be working when I test it using the test event in the console (i.e. it deletes all the expired items), but when the lambda gets triggered automatically by the CloudWatch event it does not delete anything, even though I see that the lambda is being triggered.
exports.handler = async function () {
    var params = {
        TableName: TABLE_NAME
    }
    try {
        const data = await docClient.scan(params).promise();
        const items = data.Items;
        if (items.length != 0) {
            Promise.all(items.map(async (item) => {
                const expirationDT = new Date(item.ExpiresAt);
                const now = new Date();
                if (now > expirationDT) {
                    console.log("Deleting item with otc: " + item.Otc + " and name: " + item.SecretName);
                    const deleteParams = {
                        TableName: TABLE_NAME,
                        Key: {
                            "Otc": item.Otc,
                            "SecretName": item.SecretName,
                        },
                    };
                    try {
                        await docClient.delete(deleteParams).promise();
                    } catch (err) {
                        console.log("The Secret was not deleted due to: ", err.message);
                    }
                }
            }))
        }
    } catch (err) {
        console.log("The items were not able to be scanned due to : ", err.message)
    }
}
I know using DynamoDB TTL is an option, but I need these deletions to be somewhat precise, and TTL can sometimes take up to 48 hours; I am aware I can use a filter when retrieving records to counteract that. I'm just wondering what's wrong with my code here.
You need to await Promise.all, or your Lambda will end execution before it resolves:
await Promise.all(items.map(async (item) => {
    const expirationDT = new Date(item.ExpiresAt);
    const now = new Date();
    // ...

Does `appsync` support `withFilter` in subscription?

I have a GraphQL API written in Node.js with Apollo Server. Below is the subscription code. As you can see, it uses withFilter, which takes two function parameters.
In the first function, it takes the arguments and calls pubSub.subscribe('TRANSACTION_REQUEST' + args.transactionId) to subscribe to a topic. Note that the topic name is dynamic and includes the transaction ID from the user request.
In the second function, it filters out unmatched userIds.
So my question is: how can I implement these two functions in AppSync?
const resolvers = {
    ...
    Subscription: {
        requestTransaction: {
            subscribe: withFilter(
                (rootValue: any, args: any, context: any, info: any) => {
                    console.log('req txn with filter args', args);
                    return pubSub.subscribe('TRANSACTION_REQUEST' + args.transactionId)(
                        rootValue,
                        args,
                        context,
                        info,
                    );
                },
                (transactionResponse: any, transactionRequest: any) => {
                    console.log('with filter transaction');
                    console.log('subscribe:', transactionResponse, transactionRequest);
                    return (
                        transactionResponse.userId ===
                        transactionRequest.transactionInput.userId
                    );
                },
            ),
        },
    },
    ...
In AppSync you won't be able to log as you incrementally filter the subscription events, but you can have the user supply the attributes to filter by so that the resulting subscription events are the same.
Here Event is just the type of the object that your mutation returns:
type Subscription {
    subscribeTransaction(topic: String, userId: String): Event
        @aws_subscribe(mutations: ["fooMutation"])
}
to start a subscription:
subscription onTransact {
    subscribeTransaction(topic: "TRANSACTION_REQUEST" + args.transactionId, userId: args.userId) {
        id
        foo
        bar
    }
}
Note:
- the name onTransact is arbitrary
- assumes transactionId and userId were passed inside args
- id, foo and bar will only be returned if the mutation also requested these attributes
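Note also that GraphQL itself has no string concatenation, so in practice the topic is built client-side and passed as a variable. A minimal sketch (the transactionId and userId values are hypothetical):

// Sketch only: compute the topic in application code, then send it and the
// userId as GraphQL variables; the literal values below are hypothetical.
const transactionId = 'txn-123'
const userId = 'user-456'
const topic = 'TRANSACTION_REQUEST' + transactionId

const onTransact = /* GraphQL */ `
    subscription OnTransact($topic: String, $userId: String) {
        subscribeTransaction(topic: $topic, userId: $userId) {
            id
            foo
            bar
        }
    }
`

// Pass { topic, userId } as the variables object with whichever GraphQL client
// you use against AppSync (Amplify, aws-appsync, or a raw WebSocket client).
const variables = { topic, userId }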

Lambda trigger is not working as intended with bulk data

I'm using lambda triggers to detect an insertion into a DynamoDB table (Tweets). Once triggered, I want to take the message in the event, and get the sentiment for it using Comprehend. I then want to update a second DynamoDB table (SentimentAnalysis) where I ADD + 1 to a value depending on the sentiment.
This works fine if I manually insert a single item, but I want to be able to use the Twitter API to insert bulk data into my DynamoDB table and have every tweet analysed for its sentiment. The lambda function works fine if the count specified in the Twitter params is <= 5, but anything above that causes an issue with the update to the SentimentAnalysis table; instead, the trigger keeps repeating itself with no sign of progress or stopping.
This is my lambda code:
let AWS = require("aws-sdk");
let comprehend = new AWS.Comprehend();
let documentClient = new AWS.DynamoDB.DocumentClient();

exports.handler = (event, context) => {
    event.Records.forEach(record => {
        if (record.eventName == "INSERT") {
            //console.log(JSON.stringify(record.dynamodb.NewImage.tweet.S));
            let params = {
                LanguageCode: "en",
                Text: JSON.stringify(record.dynamodb.NewImage.tweet.S)
            };
            comprehend.detectSentiment(params, (err, data) => {
                if (err) {
                    console.log("\nError with call to Comprehend:\n " + JSON.stringify(err));
                } else {
                    console.log("\nSuccessful call to Comprehend:\n " + data.Sentiment);
                    //when comprehend is successful, update the sentiment analysis data
                    //we can use the ADD expression to increment the value of a number
                    let sentimentParams = {
                        TableName: "SentimentAnalysis",
                        Key: {
                            city: record.dynamodb.NewImage.city.S,
                        },
                        UpdateExpression: "ADD " + data.Sentiment.toLowerCase() + " :pr",
                        ExpressionAttributeValues: {
                            ":pr": 1
                        }
                    };
                    documentClient.update(sentimentParams, (err, data) => {
                        if (err) {
                            console.error("Unable to read item " + JSON.stringify(sentimentParams.TableName));
                        } else {
                            console.log("Successful Update: " + JSON.stringify(data));
                        }
                    });
                }
            });
        }
    });
};
(Screenshot: a successful call; it works with the first few tweets.)
(Screenshot: the unsuccessful call right after; the request always times out.)
The timeout is why it's happening repeatedly. If the Lambda times out or otherwise errors, the batch will be reprocessed. You need to handle this because the delivery is "at least once". You also need to figure out the cause of the timeout. It might be as simple as smaller batches, or a more complex solution using Step Functions. You might just be able to increase the timeout on the Lambda.
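As an illustration of the batching point, here is a minimal sketch of the same handler with async/await and .promise() (AWS SDK v2), so every Comprehend call and DynamoDB update finishes, or fails visibly, before the invocation ends; pair it with a smaller stream batch size and/or a longer function timeout:

// Sketch only: same logic as the handler above, but the invocation now waits
// for all Comprehend calls and DynamoDB updates and surfaces any failure.
const AWS = require("aws-sdk");
const comprehend = new AWS.Comprehend();
const documentClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
    const inserts = event.Records.filter(r => r.eventName === "INSERT");

    await Promise.all(inserts.map(async (record) => {
        const data = await comprehend.detectSentiment({
            LanguageCode: "en",
            Text: record.dynamodb.NewImage.tweet.S // plain string, no JSON.stringify needed
        }).promise();

        await documentClient.update({
            TableName: "SentimentAnalysis",
            Key: { city: record.dynamodb.NewImage.city.S },
            UpdateExpression: "ADD " + data.Sentiment.toLowerCase() + " :pr",
            ExpressionAttributeValues: { ":pr": 1 }
        }).promise();
    }));
};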

AWS javascript SDK request.js send request function execution time gradually increases

I am using aws-sdk to push data to a Kinesis stream.
I am using PutRecord to achieve real-time data push.
I am observing the same delay with PutRecords as well, in the case of batch writes.
I have tried this out with 4 records, where I am not crossing any shard limit.
Below is my Node.js HTTP agent configuration. The default maxSockets value is set to Infinity.
Agent {
  domain: null,
  _events: { free: [Function] },
  _eventsCount: 1,
  _maxListeners: undefined,
  defaultPort: 80,
  protocol: 'http:',
  options: { path: null },
  requests: {},
  sockets: {},
  freeSockets: {},
  keepAliveMsecs: 1000,
  keepAlive: false,
  maxSockets: Infinity,
  maxFreeSockets: 256 }
Below is my code. I am using the following to trigger the putRecord call:
event.Records.forEach(function(record) {
    var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
    // put record request
    evt = transformEvent(payload);
    promises.push(writeRecordToKinesis(kinesis, streamName, evt));
});
The event structure is:
evt = {
    Data: new Buffer(JSON.stringify(payload)),
    PartitionKey: payload.PartitionKey,
    StreamName: streamName,
    SequenceNumberForOrdering: dateInMillis.toString()
};
This event is used in the put request.
function writeRecordToKinesis(kinesis, streamName, evt) {
    console.time('WRITE_TO_KINESIS_EXECUTION_TIME');
    var deferred = Q.defer();
    try {
        kinesis.putRecord(evt, function(err, data) {
            if (err) {
                console.warn('Kinesis putRecord %j', err);
                deferred.reject(err);
            } else {
                console.log(data);
                deferred.resolve(data);
            }
            console.timeEnd('WRITE_TO_KINESIS_EXECUTION_TIME');
        });
    } catch (e) {
        console.error('Error occured while writing data to Kinesis' + e);
        deferred.reject(e);
    }
    return deferred.promise;
}
Below is the output for 3 messages.
WRITE_TO_KINESIS_EXECUTION_TIME: 2026ms
WRITE_TO_KINESIS_EXECUTION_TIME: 2971ms
WRITE_TO_KINESIS_EXECUTION_TIME: 3458ms
Here we can see a gradual increase in response time and function execution time.
I have added counters in the aws-sdk request.js class. I can see the same pattern there as well.
Below is the code snippet from the aws-sdk request.js class which executes the put request.
send: function send(callback) {
    console.time('SEND_REQUEST_TO_KINESIS_EXECUTION_TIME');
    if (callback) {
        this.on('complete', function (resp) {
            console.timeEnd('SEND_REQUEST_TO_KINESIS_EXECUTION_TIME');
            callback.call(resp, resp.error, resp.data);
        });
    }
    this.runTo();
    return this.response;
},
Output for send request:
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 1751ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 1816ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 2761ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 3248ms
Here you can see it is increasing gradually.
Can anyone please suggest how I can reduce this delay? 3 seconds to push a single record to Kinesis is not at all acceptable.