Kinesis Batch size and Lambda - amazon-web-services

I'm confused about Kinesis batch size and Lambda.
Suppose our batch size is 100 and Kinesis receives 200 notifications. Will it then trigger 2 Lambda threads?

It does not work like this by default. By default:
Lambda invokes a function with one batch of data records from one shard at a time.
To process multiple batches from the same shard at the same time, you have to set up:
Concurrent batches per shard – Process multiple batches from the same shard concurrently.
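
As a minimal sketch of that setting, assuming the trigger is created with boto3: the console option "Concurrent batches per shard" corresponds to the `ParallelizationFactor` parameter of `create_event_source_mapping`. The stream ARN and function name below are placeholders, and the helper `build_mapping_kwargs` is hypothetical.

```python
def build_mapping_kwargs(stream_arn, function_name, batch_size=100, concurrent_batches=1):
    """Build keyword arguments for boto3's Lambda create_event_source_mapping call."""
    # Lambda accepts between 1 and 10 concurrent batches per shard.
    if not 1 <= concurrent_batches <= 10:
        raise ValueError("concurrent_batches must be between 1 and 10")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "BatchSize": batch_size,
        "ParallelizationFactor": concurrent_batches,
    }


# Placeholder ARN and function name, for illustration only.
kwargs = build_mapping_kwargs(
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "example-function",
    batch_size=100,
    concurrent_batches=2,
)

# To actually create the trigger (requires boto3 and AWS credentials):
# import boto3
# boto3.client("lambda").create_event_source_mapping(**kwargs)
```

With `concurrent_batches=1` (the default here), batches from a shard are handed to the function one at a time, which matches the default behaviour described above.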

Related

How to guarantee that a Kinesis event stream is processed serially when using parallelization factor?

The Kinesis stream has only 1 shard, and when creating the Lambda, concurrent batches per shard for the Kinesis stream source was set to 10. When there is a spike in stream data, concurrency will increase to 10, which means we will have 10 Lambdas working in parallel. My question is: how can we guarantee that the event stream is processed serially in this case? It seems impossible to me because we can't control the concurrency. Does anyone have an idea? I can't get my head around it.
AWS Lambda supports concurrent batch processing per shard and serial event processing, as long as all events in the Kinesis stream have the same partition key.
From AWS documentation:
You can also increase concurrency by processing multiple batches from each shard in parallel. Lambda can process up to 10 batches in each shard simultaneously. If you increase the number of concurrent batches per shard, Lambda still ensures in-order processing at the partition-key level.
References:
Using AWS Lambda with Amazon Kinesis (AWS)
Partition Key (Amazon Kinesis Data Streams Terminology and Concepts)
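
To see why per-key ordering can survive concurrency, here is a toy model, not Lambda's actual implementation: each record is routed to one of N lanes by hashing its partition key, so records that share a key always land in the same lane and keep their relative order. The function name `assign_lanes` and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict


def assign_lanes(records, lanes):
    """Route (partition_key, payload) records to lanes by hashing the key."""
    result = defaultdict(list)
    for key, payload in records:
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        lane = int(digest, 16) % lanes
        # Appending preserves arrival order within each lane.
        result[lane].append((key, payload))
    return dict(result)


records = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]
lanes = assign_lanes(records, lanes=10)

for lane_id, items in sorted(lanes.items()):
    print(lane_id, items)
```

In this model, if every record uses the same partition key, all of them fall into a single lane, which is the fully serial case described in the answer.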

Unexpected DynamoDB streams batching behavior

I have created a DynamoDB table with a stream enabled and set batch window = 60 sec and batch size = 100, so that Lambda will wait 60 seconds and then trigger with all items in a single invocation.
I added 7 items to DynamoDB, and Lambda triggered only after 60 seconds, but at that point it triggered three invocations with 2 items, 2 items, and 4 items in their respective events.
What could be the issue?
Presumably you have multiple partitions in DynamoDB. Each will initiate its own lambda invocations.
If your table is in On Demand mode then it starts with four partitions. Your updates touched three of them.
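
One way to check this is to log how many records each invocation receives. A minimal sketch of such a handler, assuming the standard DynamoDB Streams event shape with a top-level `Records` list; the sample event is hand-written and trimmed to the fields used here:

```python
def handler(event, context):
    """Log and return how many stream records this invocation received."""
    records = event.get("Records", [])
    keys = [record["dynamodb"]["Keys"] for record in records]
    print(f"received {len(records)} record(s): {keys}")
    return len(records)


# Hand-written sample event with two records, trimmed for illustration.
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "1"}}}},
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "2"}}}},
    ]
}
count = handler(sample_event, None)
```

Comparing the logged keys across the invocations that fire at the end of one batch window shows how the items were split.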

How can I increase the frequency at which Lambda polls a DynamoDB stream?

My lambda is triggered by a dynamodb table stream. Based on the doc: https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html, Lambda polls shards in your DynamoDB stream for records at a base rate of 4 times per second. When records are available, Lambda invokes your function and waits for the result. If processing succeeds, Lambda resumes polling until it receives more records.
This means I will get about 250 milliseconds of latency before my Lambda is triggered when an update happens on DynamoDB. Is there a way to improve this poll rate?
You cannot change the polling interval; you can only change things like the batch size or the parallelization factor.
The configuration options available when invoking a Lambda through DynamoDB Streams are listed here:
https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-dynamodb.html

Does AWS Lambda process DynamoDB stream events strictly in order?

I'm in the process of writing a Lambda function that processes items from a DynamoDB stream.
I thought part of the point behind Lambda was that if I have a large burst of events, it'll spin up enough instances to get through them concurrently, rather than feeding them sequentially through a single instance. As long as two events have a different key, I am fine with them being processed out of order.
However, I just read this page on Understanding Retry Behavior, which says:
For stream-based event sources (Amazon Kinesis Data Streams and DynamoDB streams), AWS Lambda polls your stream and invokes your Lambda function. Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires, which can be up to seven days for Amazon Kinesis Data Streams. The exception is treated as blocking, and AWS Lambda will not read any new records from the stream until the failed batch of records either expires or processed successfully. This ensures that AWS Lambda processes the stream events in order.
Does "AWS Lambda processes the stream events in order" mean Lambda cannot process multiple events concurrently? Is there any way to have it process events from distinct keys concurrently?
Since the announcement "AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sources", order is still guaranteed for each partition key, but not necessarily within each shard when Concurrent batches per shard is set to a value greater than 1. Therefore the accepted answer needs to be revised.
Stream records are organized into groups, or shards.
According to the Lambda documentation, concurrency is achieved at the shard level. Within each shard, the stream events are processed in order.
Stream-based event sources: for Lambda functions that process Kinesis or DynamoDB streams the number of shards is the unit of concurrency. If your stream has 100 active shards, there will be at most 100 Lambda function invocations running concurrently. This is because Lambda processes each shard’s events in sequence.
And according to Limits in DynamoDB,
Do not allow more than two processes to read from the same DynamoDB Streams shard at the same time. Exceeding this limit can result in request throttling.
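
Putting the two points together, the upper bound on concurrent invocations for one event source is the number of shards multiplied by the concurrent batches per shard. A small sketch of that arithmetic (the function name is hypothetical, and actual concurrency can be lower):

```python
def max_concurrent_invocations(shards, concurrent_batches_per_shard=1):
    """Upper bound on concurrent invocations for one stream event source."""
    if shards < 0:
        raise ValueError("shards must be non-negative")
    # Lambda accepts between 1 and 10 concurrent batches per shard.
    if not 1 <= concurrent_batches_per_shard <= 10:
        raise ValueError("concurrent_batches_per_shard must be between 1 and 10")
    return shards * concurrent_batches_per_shard


default_bound = max_concurrent_invocations(100)        # one batch per shard
parallel_bound = max_concurrent_invocations(1, 10)     # single shard, factor 10
```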

AWS Lambda Limits when processing Kinesis Stream

Can someone explain what happens to events when a Lambda is subscribed to Kinesis item create events? There is a limit of 100 concurrent requests for an account in AWS, so if 1,000,000 items are added to Kinesis, how are the events handled? Are they queued up for the next available concurrent Lambda?
From the FAQ http://aws.amazon.com/lambda/faqs/
"Q: How does AWS Lambda process data from Amazon Kinesis streams and Amazon DynamoDB Streams?
The Amazon Kinesis and DynamoDB Streams records sent to your AWS Lambda function are strictly serialized, per shard. This means that if you put two records in the same shard, Lambda guarantees that your Lambda function will be successfully invoked with the first record before it is invoked with the second record. If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record. The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel."
What this means is that if you have 1M items added to Kinesis but only one shard, the throttle doesn't matter: you will only have one Lambda function instance reading off that shard serially, based on the batch size you specified. The more shards you have, the more concurrent invocations your function will see. If you have a stream with more than 100 shards, the account limit you mention can easily be increased to whatever you need through AWS customer support. More details here: http://docs.aws.amazon.com/lambda/latest/dg/limits.html
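
A rough model of the single-shard case: records are cut into consecutive batches and handed to the function one batch after another. This is only an illustration with a hypothetical helper `iter_batches`; real batches can be smaller than the configured batch size, so the count below is a lower bound on the number of invocations.

```python
def iter_batches(records, batch_size):
    """Yield consecutive batches of at most batch_size records, in order."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


records = list(range(1_000_000))   # stand-in for 1M stream records
batches = list(iter_batches(records, 100))
invocations = len(batches)         # sequential invocations for one shard
print(invocations)
```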
Hope that helps!