How to monitor ProvisionedThroughputExceededException in an AWS Kinesis stream? - amazon-web-services

From the Kinesis Stream side, is there a way to see how many times, or how many records, were thrown away because of ProvisionedThroughputExceededException, i.e. the quota that allows no more than 1,000 records or 1 MB of data to be written per second to a shard?
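For reference, Kinesis publishes a per-stream CloudWatch metric, WriteProvisionedThroughputExceeded, that counts records rejected due to throttling. A minimal sketch of reading it with boto3 (the region and stream name below are placeholders):

import boto3  # assumed: AWS SDK for Python
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # region assumed

# Sum of throttled write attempts over the last hour, in 5-minute buckets.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="WriteProvisionedThroughputExceeded",
    Dimensions=[{"Name": "StreamName", "Value": "my_stream"}],  # stream name assumed
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])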

Related

AWS CloudWatch Logs Data limitation

Is there any limit on the amount of log data that can be sent to AWS CloudWatch Logs? In my case the application produces about 6 million (60 lakh) log records every 3 days. Will CloudWatch Logs be able to handle that much data?
Check out the AWS service quotas page. Not sure what you mean by "60lac", but the limits on CloudWatch Logs are more than adequate for the majority of use cases.
There is no published limit on the overall data volume held. There will be a practical limit somewhere, but it won't be hit by a single AWS customer. If you're using the PutLogEvents API you could be constrained by the limit of 5 requests per second per log stream; in that case, consider using more log streams or larger batches of events (up to 1 MB per request).
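A sketch of that batching with boto3 (the group and stream names are assumptions, and both must already exist):

import time
import boto3  # assumed: AWS SDK for Python

logs = boto3.client("logs", region_name="us-east-1")  # region assumed

# One PutLogEvents call can carry many events (up to 1 MB or 10,000 events),
# which is far cheaper than one call per record.
events = [
    {"timestamp": int(time.time() * 1000), "message": "record %d" % i}
    for i in range(1000)
]
logs.put_log_events(
    logGroupName="my-app",      # assumed name
    logStreamName="my-stream",  # assumed name
    logEvents=events,           # must be in chronological order
)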

Kinesis Firehose delivers data from a DynamoDB Stream to S3: why does the number of JSON objects per file differ?

I'm new to AWS, and I'm working on archiving data from DynamoDB to S3. This is my solution, and I have built the pipeline:
DynamoDB -> DynamoDB TTL + DynamoDB Stream -> Lambda -> Kinesis Firehose -> S3
But I found that the files in S3 have different numbers of JSON objects. Some files have 7 JSON objects, some have 6 or 4. I do the ETL in Lambda: S3 only stores REMOVE items, and the JSON has been unmarshalled.
I expected one JSON object per file, since the TTL value is different for each item, and the Lambda delivers each item as soon as it is deleted by TTL.
Is this because Kinesis Firehose batches the items? (It would wait for some time, collecting more items, before saving them to a file.) Or is there another reason? Can I estimate how many files it will create if one DynamoDB item is deleted by TTL every 5 minutes?
Thank you in advance.
Kinesis Firehose splits your data based on buffer size or buffer interval, whichever is reached first.
Say you have a buffer size of 1 MB and an interval of 1 minute. If less than 1 MB arrives within the 1-minute interval, Firehose still creates a batch file out of whatever it received, even though it is under 1 MB.
This is likely what happens when little data is arriving. You can adjust the buffer size and interval to your needs, e.g. increase the interval to collect more items into a single batch.
You can choose a buffer size of 1–128 MiB and a buffer interval of 60–900 seconds. The condition that is satisfied first triggers data delivery to Amazon S3.
From the AWS Kinesis Firehose Docs: https://docs.aws.amazon.com/firehose/latest/dev/create-configure.html
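The buffering hints are set on the delivery stream itself; a boto3 sketch (the stream name and ARNs are placeholders):

import boto3  # assumed: AWS SDK for Python

firehose = boto3.client("firehose", region_name="us-east-1")  # region assumed

# Larger buffers mean fewer, bigger S3 objects; the first limit reached wins.
firehose.create_delivery_stream(
    DeliveryStreamName="ddb-archive",  # assumed name
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-archive-bucket",              # placeholder
        "BufferingHints": {
            "SizeInMBs": 64,           # 1-128
            "IntervalInSeconds": 900,  # 60-900
        },
    },
)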

What is the maximum amount of data that can be stored in a Kinesis stream?

An AWS Kinesis stream retains records for 24 hours up to 7 days. It has a maximum record size of 1 megabyte, but the docs don't say how much data a stream can hold in total. I wonder: if I put a large volume of data into a stream, will it run out of space?
You can put in as much data as you want; it may just require you to create more shards. Per the documentation, there is no upper limit on the number of shards you can have in a stream or account. However, shards cost money.
Data is kept for your configured retention period and expires after that.
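Adding capacity is a matter of raising the shard count; a boto3 sketch (stream name and target count assumed):

import boto3  # assumed: AWS SDK for Python

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region assumed

# Each extra shard adds 1 MB/s (1,000 records/s) of write capacity,
# so total stream capacity scales with the shard count.
kinesis.update_shard_count(
    StreamName="my_stream",         # assumed name
    TargetShardCount=4,             # assumed target
    ScalingType="UNIFORM_SCALING",  # the only supported scaling type
)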

AWS Lambda Limits when processing Kinesis Stream

Can someone explain what happens to events when a Lambda function is subscribed to Kinesis item-create events? There is a limit of 100 concurrent executions per account in AWS, so if 1,000,000 items are added to Kinesis, how are the events handled? Are they queued up for the next available concurrent Lambda?
From the FAQ http://aws.amazon.com/lambda/faqs/
"Q: How does AWS Lambda process data from Amazon Kinesis streams and Amazon DynamoDB Streams?
The Amazon Kinesis and DynamoDB Streams records sent to your AWS Lambda function are strictly serialized, per shard. This means that if you put two records in the same shard, Lambda guarantees that your Lambda function will be successfully invoked with the first record before it is invoked with the second record. If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record. The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel."
What this means is that if you have 1M items added to Kinesis but only one shard, the throttle doesn't matter: you will only have one Lambda function instance reading off that shard serially, based on the batch size you specified. The more shards you have, the more concurrent invocations your function will see. If you have a stream with more than 100 shards, the account limit you mention can easily be increased to whatever you need through AWS customer support. More details here: http://docs.aws.amazon.com/lambda/latest/dg/limits.html
Hope that helps!
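The per-shard batch size mentioned above is set on the event source mapping; a boto3 sketch (the stream ARN and function name are placeholders):

import boto3  # assumed: AWS SDK for Python

lambda_client = boto3.client("lambda", region_name="us-east-1")  # region assumed

# Lambda polls each shard of the stream and invokes the function serially
# per shard, with up to BatchSize records per invocation.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my_stream",  # placeholder
    FunctionName="my-processor",      # assumed name
    BatchSize=100,                    # records per invocation, per shard
    StartingPosition="TRIM_HORIZON",  # read from the oldest available record
)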

Low Throughput of AWS Kinesis

I use Python's boto.kinesis module to write records to AWS Kinesis. The maximum throughput I reach is about 40 puts/sec. However, according to the Kinesis FAQ:
Each shard can support up to 1000 PUT records per second.
So my current approach reaches only 4% of what is theoretically possible, which seems terribly low.
Does anyone have an idea how the throughput can be improved?
Setup: The Kinesis stream has a single shard. The producer runs on a dedicated AWS EC2 instance (t3.medium) in the same region as the Kinesis stream. It creates strings of about 20 characters in length and sends them to the Kinesis stream via boto.kinesis.Connection.put_record("my_stream", my_message).
Simplified code:
from boto import kinesis
import time

connection = kinesis.connect_to_region(REGION)
stream = connection.create_stream("my_stream", shard_count=1)
time.sleep(60)  # wait a minute until the stream is ACTIVE

for i in range(NUM_MESSAGES):
    my_message = "This is message %d" % i
    # boto's put_record takes (stream_name, data, partition_key)
    connection.put_record("my_stream", my_message, "partition_key")
http://docs.aws.amazon.com/kinesis/latest/dev/service-sizes-and-limits.html
The limit is for records per second, but each synchronous put_record call also costs a full network round trip, so a single thread sending one record at a time is bound by latency, not by the shard limit.
You should use the PutRecords API to improve write throughput: place multiple records inside the same call, i.e. keep appending records to a batch and then send them all in one request.
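A sketch of that batching with boto3 (boto itself is long deprecated; the stream name and message count are assumptions):

import boto3  # assumed: AWS SDK for Python, successor to boto

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region assumed
NUM_MESSAGES = 10000  # assumed

batch = []
for i in range(NUM_MESSAGES):
    batch.append({
        "Data": ("This is message %d" % i).encode(),
        "PartitionKey": "partition_key",
    })
    if len(batch) == 500:  # PutRecords accepts at most 500 records per call
        kinesis.put_records(StreamName="my_stream", Records=batch)
        batch = []
if batch:
    kinesis.put_records(StreamName="my_stream", Records=batch)

# A real producer should also check FailedRecordCount in each response
# and retry the records that were throttled.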
Also take a look at: https://github.com/awslabs/kinesis-poster-worker