Hold DynamoDB Stream Events in Lambda for a Small Period

We're planning to make a few updates to a Lambda function that has been set as the trigger for DynamoDB new-item events:
DynamoDB --> DynamoDB Streams --> AWS Lambda
While performing the update, we need to make sure no events are received by the Lambda. Are there any easy-to-implement methods to do this?

Option 1: Streams -> Lambda (with enable/disable)
DynamoDB Streams can send events directly to Lambda using the Event Source Mapping integration. An Event Source Mapping is an AWS-managed poller resource that pulls events for you. If you remove EventBridge from the equation and use an Event Source Mapping, you can pause the flow of messages to the Lambda consumer with its enabled property:
From the docs: "Set to true to enable the event source mapping. Set to false to stop processing records. Lambda keeps track of the last record processed and resumes processing from that point when the mapping is reenabled."
The UpdateEventSourceMapping API sets the enabled property:
aws lambda update-event-source-mapping --uuid <mapping-uuid> --enabled false
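
For scripted deployments, the same pause/resume flow in boto3 might look like the sketch below; the mapping UUID is a placeholder you would look up (e.g. via list_event_source_mappings):

import boto3

lambda_client = boto3.client("lambda")

# Placeholder UUID; find yours with list_event_source_mappings()
MAPPING_UUID = "00000000-0000-0000-0000-000000000000"

def set_stream_consumption(enabled):
    # Pauses (enabled=False) or resumes (enabled=True) the DynamoDB
    # Streams -> Lambda event source mapping; Lambda resumes from the
    # last processed record when re-enabled.
    lambda_client.update_event_source_mapping(
        UUID=MAPPING_UUID,
        Enabled=enabled,
    )

set_stream_consumption(False)   # pause before deploying the Lambda update
# ... deploy the update ...
set_stream_consumption(True)    # resume; note stream retention is 24 hours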
Option 2: Streams -> EventBridge -> Lambda (with archive and replay)
If you really need the EventBridge integration, you can temporarily disable the rule that triggers your Lambda, then use the EventBridge archive and replay functionality to catch up when ready. You take responsibility for determining which events need reprocessing. This will be easier if your application is idempotent.
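As a rough boto3 sketch of that flow (rule name, archive ARN, bus ARN, and the time window are placeholders, and the archive must already be capturing events before the deployment window):

import boto3
from datetime import datetime, timezone

events = boto3.client("events")

RULE = "my-ddb-rule"  # placeholder rule that targets the Lambda
ARCHIVE_ARN = "arn:aws:events:us-east-1:123456789012:archive/my-archive"
BUS_ARN = "arn:aws:events:us-east-1:123456789012:event-bus/default"

events.disable_rule(Name=RULE)      # stop invoking the Lambda
# ... deploy the Lambda update ...
events.enable_rule(Name=RULE)

# Replay the archived events from the deployment window back onto the bus.
events.start_replay(
    ReplayName="catch-up-after-deploy",
    EventSourceArn=ARCHIVE_ARN,
    EventStartTime=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    EventEndTime=datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc),
    Destination={"Arn": BUS_ARN},
)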

First of all, why are you using EventBridge?
The path should be DynamoDB -> DynamoDB Streams -> Lambda.
To ensure the Lambda only receives new items, you can use a Lambda event filter:
{
    "filters": [
        {
            "pattern": "{\"eventName\" : [\"INSERT\"] }"
        }
    ]
}
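
Applied via boto3, the equivalent filter might look like this (note the API spells the keys Filters/Pattern; the mapping UUID is a placeholder):

import boto3

lambda_client = boto3.client("lambda")

# Attach the INSERT-only filter to the DynamoDB Streams -> Lambda mapping
lambda_client.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",  # placeholder
    FilterCriteria={
        "Filters": [
            {"Pattern": '{"eventName": ["INSERT"]}'}
        ]
    },
)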

Related

aws transcribe callback function

I want to call AWS Transcribe from an AWS Lambda.
In that Lambda handler, I want to start the transcription job but not wait for it to finish in a while loop, since that would not be cost-efficient. I don't see any way for the finished transcription job to call another Lambda, or something like that, to store the transcription information in an S3 bucket, for example.
Any idea how to solve this?
See Using Amazon EventBridge with Amazon Transcribe.
With Amazon EventBridge, you can respond to state changes in your Amazon Transcribe jobs by initiating events in other AWS services. When a transcription job changes state, EventBridge automatically sends an event to an event stream. You create rules that define the events that you want to monitor in the event stream and the action that EventBridge should take when those events occur. For example, routing the event to another service (or target), which can then take an action. You could, for example, configure a rule to route an event to an AWS Lambda function when a transcription job has completed successfully.
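As an illustrative sketch, a rule matching successfully completed jobs could be created like this (names and ARNs are placeholders; the pattern fields follow the documented Transcribe event shape):

import json
import boto3

events = boto3.client("events")

# Rule matching Transcribe jobs that finished successfully
events.put_rule(
    Name="transcribe-job-complete",
    EventPattern=json.dumps({
        "source": ["aws.transcribe"],
        "detail-type": ["Transcribe Job State Change"],
        "detail": {"TranscriptionJobStatus": ["COMPLETED"]},
    }),
)
events.put_targets(
    Rule="transcribe-job-complete",
    Targets=[{
        "Id": "handler",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:on-transcribe-done",
    }],
)
# The target Lambda also needs a resource-based permission allowing
# events.amazonaws.com to invoke it (lambda add_permission).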
Another alternative is: when you call StartTranscriptionJob, you supply an S3 bucket name and S3 object key that will receive the transcribed results. You can then use the Amazon S3 Event Notifications feature to notify you or to automatically trigger a Lambda function.
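
A minimal boto3 sketch of that alternative (job, bucket, and key names are placeholders):

import boto3

transcribe = boto3.client("transcribe")

transcribe.start_transcription_job(
    TranscriptionJobName="my-job",                       # placeholder
    Media={"MediaFileUri": "s3://my-bucket/input.mp3"},  # placeholder
    MediaFormat="mp3",
    LanguageCode="en-US",
    # The transcript is written here, so an S3 event notification on this
    # bucket/prefix can trigger the follow-up Lambda.
    OutputBucketName="my-results-bucket",
    OutputKey="transcripts/my-job.json",
)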

EventBridge and Lambda different inputs

I need to run a Lambda function with a thousand different inputs. I did it with an event bus and another Lambda that sends the events, reading the inputs from DynamoDB.
Is that the best way to do this? The Lambda that sends the events to the event bus takes too much time, and I need a loop that sends 10 entries at a time because of the event bus limits in boto3.
You can still trigger a Step Functions state machine from EventBridge and perform a parallel scan on DynamoDB to produce the input for the Lambda.
Alternatively, you can update every DynamoDB item with a Lambda or Step Functions and use DynamoDB Streams to trigger your Lambda.
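For the first suggestion, a parallel scan splits the table into disjoint segments that can be read concurrently; a sketch, assuming a table named inputs (each segment could equally be one branch of a Step Functions Map state):

import boto3
from concurrent.futures import ThreadPoolExecutor

dynamodb = boto3.client("dynamodb")
TOTAL_SEGMENTS = 4  # tune to table size and read capacity

def scan_segment(segment):
    # Each worker scans its own disjoint slice of the table.
    items, start_key = [], None
    while True:
        kwargs = {"TableName": "inputs", "Segment": segment,
                  "TotalSegments": TOTAL_SEGMENTS}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        page = dynamodb.scan(**kwargs)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items

with ThreadPoolExecutor(TOTAL_SEGMENTS) as pool:
    results = [item for seg in pool.map(scan_segment, range(TOTAL_SEGMENTS))
               for item in seg]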

S3 Event -> Lambda vs S3->SNS->Lambda

I'm trying to understand the behavior of the S3 Event Notification trigger. I have S3 events triggering a Lambda; the Lambda captures the event and file metadata to DynamoDB. There would be around 50k event triggers in short bursts across the day. If I added SNS to the workflow and had SNS trigger the Lambda, what are the advantages of SNS vs. S3 directly invoking Lambda?
There is no gained advantage: both S3 and SNS events are asynchronous event sources and behave the same way. See Lambda supported event sources, and Lambda Retry on Errors (the asynchronous invocation part), which highlights nicely the Lambda behavior with specific types of event sources.
Simply doing S3 -> Lambda is sufficient.
The advantage is flexibility for the future. If you use SNS in the middle, you can easily send (fan-out) the notifications to multiple destinations with more SNS topic subscriptions -- another Lambda function, an SQS Queue, an HTTPS endpoint, or even email, which can be very useful for non-intrusive observation, testing, troubleshooting, and developing new capabilities that need the same notification.
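As a hedged sketch of that fan-out (topic and endpoint ARNs are placeholders), each subscribe call adds one more consumer of the same S3 notification:

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:s3-uploads"  # placeholder

# Fan the same S3 notification out to several destinations
sns.subscribe(TopicArn=TOPIC_ARN, Protocol="lambda",
              Endpoint="arn:aws:lambda:us-east-1:123456789012:function:indexer")
sns.subscribe(TopicArn=TOPIC_ARN, Protocol="sqs",
              Endpoint="arn:aws:sqs:us-east-1:123456789012:replay-queue")
sns.subscribe(TopicArn=TOPIC_ARN, Protocol="email",
              Endpoint="ops@example.com")  # handy for non-intrusive observation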

Trigger AWS lambda for specific Kinesis event push

I am new to the AWS Lambda area. I am creating a function which will consume Kinesis events, but I want to trigger my Lambda function only when a specific event gets pushed to Kinesis (not for every event pushed to Kinesis). Is there a way I can configure a filter upfront, or does my function need to implement that filter after consuming all events?
One way of doing it is to split out the events you are interested in onto a separate stream, either by:
using Amazon Kinesis Analytics to copy records to an "events of interest" stream, or
triggering another AWS Lambda function to copy records to an "events of interest" stream (sketched below).
Both of these sit in front of the Lambda you currently have; you then connect that Lambda to the new stream.
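A sketch of that copying Lambda; the target stream name and the matching predicate are placeholders:

import base64
import boto3

kinesis = boto3.client("kinesis")
TARGET_STREAM = "events-of-interest"  # placeholder

def handler(event, context):
    # Kinesis -> Lambda delivers record payloads base64-encoded.
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        if b'"type": "interesting"' in payload:  # placeholder predicate
            kinesis.put_record(
                StreamName=TARGET_STREAM,
                Data=payload,
                PartitionKey=record["kinesis"]["partitionKey"],
            )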

how should i architect aws lambda to support parallel process in batch model?

I have an AWS Lambda function to do some statistics on over 1k stock tickers after market close. I have an option like the one below:
set up a cron job on an EC2 instance that submits 1k HTTP requests asynchronously (e.g. http://xxxxx.lambdafunction.xxxx?ticker= ) to trigger the AWS Lambda function, or submit 1k requests to SNS and let the Lambda pick them up.
I think it should run fine, but I would much appreciate it if there is any serverless/PaaS approach to trigger the task.
Off the top of my head, here are a few ways to achieve what you need:
Option 1: [Cost-Effective]
Post all the tickers to an AWS SQS FIFO queue.
Define a trigger on this queue to invoke the Lambda function.
Result: Since you are posting all the events to a FIFO queue that maintains order, the events will be polled sequentially. Moreover, the SQS-to-Lambda trigger will help you scale automatically based on the number of messages in the queue.
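A minimal boto3 sketch of the Option 1 enqueue (queue URL is a placeholder; FIFO queues require a MessageGroupId):

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tickers.fifo"  # placeholder

def enqueue(tickers):
    # SendMessageBatch accepts at most 10 entries per call.
    for i in range(0, len(tickers), 10):
        sqs.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[{
                "Id": str(j),
                "MessageBody": ticker,
                "MessageGroupId": "tickers",       # required on FIFO queues
                "MessageDeduplicationId": ticker,  # or enable content-based dedup
            } for j, ticker in enumerate(tickers[i:i + 10])],
        )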
Option 2: [Costly, and can easily scale for real-time processing]
Same as above, but instead of posting to a FIFO queue, post to a Kinesis stream.
Enable the Kinesis stream to trigger the Lambda function.
Result: Kinesis will ensure the order of events arriving in the stream, and Lambda invocations will scale based on the number of shards in the stream. This implementation scales significantly. If you have any future use case for real-time processing of tickers, this could be a great solution.
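Correspondingly, a sketch of the Option 2 publish step (stream name is a placeholder; PutRecords accepts up to 500 records per call):

import boto3

kinesis = boto3.client("kinesis")

def publish(tickers):
    # PutRecords accepts up to 500 records per call.
    for i in range(0, len(tickers), 500):
        kinesis.put_records(
            StreamName="tickers",  # placeholder
            Records=[{"Data": t.encode(), "PartitionKey": t}
                     for t in tickers[i:i + 500]],
        )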
Option 3: [Cost-Effective, alternative to Option 1]
Collect all the ticker events (1k or whatever) and put them into a file.
Upload this file to an AWS S3 bucket.
Enable S3 event notifications to trigger a proxy Lambda function.
This proxy Lambda function reads the S3 file and, based on the total number of events in the file, spawns n parallel actor Lambda functions.
The actor Lambda function will process each event.
Result: Easy to implement, cost-effective, and provides easy scaling based on your custom algorithm for distributing the load in the proxy Lambda function.
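A sketch of that proxy Lambda (bucket and function names are placeholders; InvocationType="Event" makes the actor invocations asynchronous):

import json
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

def handler(event, context):
    # Triggered by the S3 event notification for the uploaded ticker file.
    rec = event["Records"][0]["s3"]
    body = s3.get_object(Bucket=rec["bucket"]["name"],
                         Key=rec["object"]["key"])["Body"].read()
    for ticker in body.decode().splitlines():
        # Fire-and-forget async invoke of one actor Lambda per event.
        lambda_client.invoke(
            FunctionName="actor-fn",  # placeholder
            InvocationType="Event",
            Payload=json.dumps({"ticker": ticker}),
        )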
Option 4: [All-serverless]
Write a Lambda function that gets the list of tickers from some web server.
Define an AWS CloudWatch Events rule that generates events on a cron schedule.
Add a trigger from this CloudWatch rule to invoke the proxy Lambda function.
The proxy Lambda function then uses any combination of the options above [1, 2, or 3] to trigger the actor Lambda function for processing the records.
Result: Everything can be configured via the AWS console and is easy to use. Alternatively, you can write an AWS CloudFormation template to generate all the required resources in a single go.
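A sketch of the Option 4 schedule (names and ARNs are placeholders; the cron expression below fires at 21:30 UTC on weekdays, shortly after US market close):

import boto3

events = boto3.client("events")

events.put_rule(
    Name="after-market-close",
    ScheduleExpression="cron(30 21 ? * MON-FRI *)",  # 21:30 UTC, weekdays
)
events.put_targets(
    Rule="after-market-close",
    Targets=[{"Id": "proxy",
              "Arn": "arn:aws:lambda:us-east-1:123456789012:function:proxy-fn"}],
)
# The proxy Lambda also needs a permission allowing events.amazonaws.com
# to invoke it (lambda add_permission).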
Having said that, I will now leave it up to you to choose the right solution based on your business/cost requirements.
You can use the Lambda fan-out option.
You can follow these steps to process 1k or more tickers using a serverless approach:
1. Store all the stock tickers in an S3 file.
2. Create a master Lambda which will read the S3 file and split the stocks into groups of 10.
3. Create a child Lambda which will make the async call to the external HTTP service and fetch the details.
4. In the master Lambda, loop through these groups and invoke 100 child Lambdas, passing in one group each, with the results returned to the master Lambda.
5. Collect all the information returned from the child Lambdas and continue with your processing there.
Now you can trigger this master Lambda at the end of the markets every day using a CloudWatch time-based rule scheduler.
This is a completely serverless approach.
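A sketch of the master Lambda's fan-out/collect step, assuming the child function (name is a placeholder) returns JSON; synchronous invokes run in threads so the groups are processed in parallel:

import json
import boto3
from concurrent.futures import ThreadPoolExecutor

lambda_client = boto3.client("lambda")

def process_group(group):
    # Synchronous (RequestResponse) invoke so the child's result comes back.
    resp = lambda_client.invoke(
        FunctionName="child-fn",  # placeholder
        InvocationType="RequestResponse",
        Payload=json.dumps({"tickers": group}),
    )
    return json.loads(resp["Payload"].read())

def fan_out(tickers):
    groups = [tickers[i:i + 10] for i in range(0, len(tickers), 10)]
    with ThreadPoolExecutor(max_workers=100) as pool:
        return list(pool.map(process_group, groups))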