Scaling my Kinesis consumers when consumption is slow

Assume I have an AWS Kinesis stream with 2 shards.
Therefore I have two consumers, one consuming from each shard.
There are a large number of entries in the queues and my consumers are consuming them slowly.
To solve this I could take a Kafka consumer-group-like approach: create two consumer applications and consume the records.
But I want to know whether I can reshard my stream (which will distribute the records across shards) and add consumers for those shards,
i.e. after resharding my stream will have 4 shards and hence 4 consumers.
This will also increase the consumption rate and solve my problem.
What are the pros and cons of the second approach, given that it is generally suggested when the queue has an ingestion issue?

Adding shards to increase consuming capacity is exactly what Kinesis is about. It will increase the parallelism of your "consumer application" and should be pretty seamless.
Note that AWS recently introduced a serverless Kinesis model (on-demand capacity mode) where you don't have to care about the shard count anymore. It is pretty much equivalent to letting AWS manage the number of shards for you.
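To make the effect of resharding concrete, here is a minimal sketch in plain Python (no AWS calls). Kinesis picks a shard by taking the MD5 hash of the partition key and finding the shard whose hash key range contains it. The evenly split ranges, the helper names, and the `device-N` keys below are assumptions made up for illustration. One thing worth knowing: a reshard only changes where newly written records land, while records already in the stream stay in the parent shards until they are read or expire.

```python
import hashlib

HASH_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit integers


def even_shard_ranges(shard_count):
    """Split the hash key space into equal, contiguous (start, end) ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains MD5(partition_key)."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside every range")


def records_per_shard(keys, shard_count):
    """Count how many of the given partition keys land on each shard."""
    ranges = even_shard_ranges(shard_count)
    counts = [0] * shard_count
    for key in keys:
        counts[shard_for(key, ranges)] += 1
    return counts


# Hypothetical partition keys, one record per key.
keys = [f"device-{n}" for n in range(10_000)]
print(records_per_shard(keys, 2))  # load per shard before the reshard
print(records_per_shard(keys, 4))  # load per shard after splitting each shard
```

With evenly split ranges each of the 4 shards should receive roughly a quarter of the keys, so each consumer handles about half of what it handled with 2 shards. This only helps if the partition keys are varied enough; a single hot key still lands on one shard.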

Related

Concurrency of Lambda not increasing as expected

We've configured an MSK (Kafka) event source as the trigger for our Lambda function. Even though the offset lag is increasing, the Lambda concurrency stays limited to 4-5 almost all the time, as can be seen in the graph below. The configuration used for the MSK event source is:
Batch Size: 50
Batch window: 30 seconds
Number of partitions in the Kafka topic: 10
I made sure that the load is distributed equally across all the partitions. Is there anything I'm missing here which is causing the concurrency issue? Any solution is appreciated. Thanks in advance.

I think you are hitting the same limitation we found some months ago. This link pointed us in the right direction (towards a workaround, in our case):
AWS MSK lambda concurrent consumers
It honestly makes sense that the partitions are not being used to their full capacity, because the jump from the MSK EC2 setup to the Lambda runtime is not trivial. Maybe you can try other connectors:
https://docs.confluent.io/kafka-connectors/aws-lambda/current/overview.html#multiple-tasks
It also makes sense that bridging through Kinesis you would not have these specific issues as it is all Amazon native stuff.

Ideally, concurrency should be a whole number that matches the number of consumers.
When you initially create an Apache Kafka event source, Lambda allocates one consumer to process all partitions in the Kafka topic. Each consumer has multiple processors running in parallel to handle increased workloads. Additionally, Lambda automatically scales up or down the number of consumers, based on workload. To preserve message ordering in each partition, the maximum number of consumers is one consumer per partition in the topic.
source: https://docs.amazonaws.cn/en_us/lambda/latest/dg/with-kafka.html#services-kafka-scaling
The offset lag indicates a performance issue. This blog post gives a better explanation: Offset lag metric

multiple nodes for message processing Kafka

We have a Spring Boot app deployed on Kubernetes that processes messages: it reads from a Kafka topic, does some mappings, and finally writes to Kafka topics.
To achieve higher performance, we need to process the messages faster, so we are introducing multiple nodes of this Spring Boot app.
But I believe this would lead to a problem because:
The messages should be processed in order
The messages contain state
Is there any solution that keeps the messages in order, guarantees that a message already processed by one node won't be processed by another, and resolves any other issues caused by processing on multiple nodes?
Please feel free to address all possible solutions because we are building a POC.
Would using Apache Flink or spring-cloud-stream help with this?

When consuming messages from Kafka it is important to keep the concept of a Consumer Group in mind. This concept ensures that nodes that read from a Kafka topic and share the same Consumer Group will not interfere with each other. Whatever has been read by one of the consumers within the Consumer Group will not be read again by another consumer of the same Consumer Group.
In addition, applications reading and writing to Kafka scale with the number of partitions in a Kafka topic.
Having multiple nodes consume a topic with only one partition would not make any difference, as one partition can only be read by a single consumer within a Consumer Group. You will find more information in the Kafka documentation on Consumers.
When you have a topic with more than one partition, ordering might become an issue. Kafka only guarantees the order within a partition.
Here is an excerpt of the Kafka documentation describing the interaction between consumer group and partitions:
By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.
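As a rough illustration of the last sentence of that excerpt, here is a small sketch of how partitions get spread over the members of one consumer group. It uses a plain round-robin rule; Kafka's real assignors are more elaborate, and the function name and node names are made up for the example.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Fewer consumers than partitions: every consumer has work.
few = assign_partitions(partitions, ["node-a", "node-b"])
print(few)

# More consumers than partitions: the extra consumers own nothing.
many = assign_partitions(
    partitions, ["node-a", "node-b", "node-c", "node-d", "node-e"]
)
idle = [consumer for consumer, owned in many.items() if not owned]
print(idle)
```

So adding Spring Boot nodes only speeds things up until the node count reaches the partition count; beyond that you need more partitions.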

The limit to scaling up with Flink will be the number of partitions in your Kafka topic -- in other words, each instance of Flink's Kafka consumer will connect to and read from one or more partitions. With Flink, the ordering will be preserved unless you re-partition the data. Flink does provide exactly-once guarantees.
A quick way to experience Flink and Kafka in action together is to explore Flink's operations playground. This dockerized playground is set up to let you explore rescaling, failure recovery, etc., and should make all this much more concrete.

You can run several consumer threads in a single application, or even run several applications with several consumer threads each. When all consumers belong to the same group and the Kafka topic has enough partitions, Kafka will balance the topic partitions among them.
Messages in one partition are always ordered, but to keep the order per message key you should set max.in.flight.requests.per.connection=1 on the producer. Messages with the same key are always written to the same partition (unless you change the number of partitions), so all messages with the same key will be ordered.
A partition is read by only one consumer in a group, so the only way another consumer can receive already-processed messages is a partition rebalance that happens before the message has been acknowledged. You can set ack-mode=MANUAL_IMMEDIATE and acknowledge a message immediately after processing it, or use other acknowledgement methods.
I'd recommend reading this article: https://medium.com/#felipedutratine/kafka-ordering-guarantees-99320db8f87f
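Here is a small sketch of why keying messages preserves per-key order. Everything in it is illustrative: the topic is just a dict of lists, and `zlib.crc32` is a stand-in hash (Kafka's default partitioner uses murmur2), chosen only because it is deterministic and in the standard library.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick a partition from the key. crc32 is a stand-in, not Kafka's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events, num_partitions):
    """Append each (key, value) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def history(key, partitions, num_partitions):
    """Read one key's values back in the order its partition stored them."""
    owned = partitions[partition_for(key, num_partitions)]
    return [value for k, value in owned if k == key]


events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]
topic = produce(events, num_partitions=3)
print(history("order-1", topic, 3))
print(history("order-2", topic, 3))
```

Because a key always hashes to the same partition, whichever node owns that partition sees the key's events in the order they were written. Order across different keys is not guaranteed, and changing the partition count changes the mapping.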

When should I use auto-scaling and when to use SQS?

I was studying DynamoDB and got stuck on a question for which I can't find a common solution.
My question is: suppose I have an application using DynamoDB as its database, with an initial write capacity of 100 writes per second, and there is heavy load during peak hours, say 300 writes per second. Which service should I use to reduce the load on the database?
My take is that we should go for auto scaling, but I read somewhere that we can use SQS to queue the data, or Kinesis if the order of the data matters.

In the old days, before DynamoDB Auto-Scaling, a common use pattern was:
The application attempts to write to DynamoDB
If the request is throttled, the application stores the information in an Amazon SQS queue
A separate process regularly checks the SQS queue and attempts to write the data to DynamoDB. If successful, it removes the message from SQS
This allowed DynamoDB to be provisioned for average workload rather than peak workload. However, it has more parts that need to be managed.
These days, DynamoDB can use adaptive capacity and burst capacity to handle temporary changes. For larger changes, you can implement DynamoDB Auto Scaling, which is probably easier to implement than the SQS method.
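For anyone who wants to see the old overflow pattern spelled out, here is a minimal in-memory sketch. `FakeTable` and `ThrottledError` are hypothetical stand-ins for DynamoDB and its throttling error, a `deque` plays the role of the SQS queue, and one "tick" represents one second. No AWS SDK calls are involved.

```python
from collections import deque


class ThrottledError(Exception):
    """Raised by the fake table when its per-tick write limit is used up."""


class FakeTable:
    """In-memory stand-in for a table with a fixed write capacity per tick."""

    def __init__(self, writes_per_tick):
        self.writes_per_tick = writes_per_tick
        self.used = 0
        self.items = []

    def new_tick(self):
        self.used = 0  # capacity resets every tick

    def put(self, item):
        if self.used >= self.writes_per_tick:
            raise ThrottledError(item)
        self.used += 1
        self.items.append(item)


def write_or_buffer(table, overflow, item):
    """Steps 1 and 2: try the table first, park the item if throttled."""
    try:
        table.put(item)
    except ThrottledError:
        overflow.append(item)


def drain(table, overflow):
    """Step 3: retry buffered items, removing each only after it is stored."""
    while overflow:
        try:
            table.put(overflow[0])
        except ThrottledError:
            return  # out of capacity for this tick, try again later
        overflow.popleft()


table = FakeTable(writes_per_tick=100)
overflow = deque()

for n in range(300):  # one peak second with 300 write attempts
    write_or_buffer(table, overflow, n)
print(len(table.items), len(overflow))  # 100 200

for _ in range(2):  # two quiet seconds, used only for draining
    table.new_tick()
    drain(table, overflow)
print(len(table.items), len(overflow))  # 300 0
```

Note that the item is removed from the buffer only after the write succeeds, which mirrors deleting the SQS message only after DynamoDB accepted it.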

The best solution depends on the characteristics of your application. Can it tolerate asynchronous database writes? Can it tolerate any throttling on database writes?
If you can handle some throttling from DynamoDB when there’s a sudden increase in traffic, you should use DynamoDB autoscaling.
If throttling is not okay, but asynchronous writes are okay, then you could use SQS in front of DynamoDB to manage bursts in traffic. In this case, you should still have autoscaling enabled to ensure that your queue workers have enough throughput available to them.
If you must have synchronous writes and you cannot tolerate any throttling from DynamoDB, you should use DynamoDB’s on demand mode. (However, do note that there can still be throttling if you exceed 1k WCU or 3k RCU for a single partition key.)
Of course cost is also a consideration. Using DynamoDB with autoscaling will be the most cost effective method. I’m not sure how On Demand compares to the cost of using SQS.

Avoid throttle dynamoDB

I am new to cloud computing, and I have a question about whether a mechanism like the one I am about to describe exists or can be built.
DynamoDB has provisioned throughput (e.g. 100 writes/second). Of course, in a real-world application the actual throughput is very dynamic and will almost never match the provisioned 100 writes/second. I was thinking that some type of queue for DynamoDB would be great. For example, during peak hours my DynamoDB table may receive 500 write requests per second (5 times what I have allocated) and would return errors. Is there some queue I can put between the client and the database, so that client requests go to the queue, the client gets an acknowledgement that the request has been handled, and the queue then feeds the requests to DynamoDB at exactly 100 writes per second? That way no errors are returned and I don't need to raise the throughput, which would raise my costs.

Putting AWS SQS in front of DynamoDB would solve this problem for you, and is not an uncommon design pattern. SQS is already well suited to scale as big as it needs to, and to ingest a large number of messages with unpredictable flow patterns.
You could either put all the messages into SQS first, or use SQS as an overflow buffer when you exceed the design throughput of your DynamoDB database.
One or more worker instances can then read messages from the SQS queue and put them into DynamoDB at exactly the pace you decide.
If the order of the incoming messages is extremely important, Kinesis is another option for ingesting the incoming messages and then inserting them into DynamoDB, in the same order they arrived, at a pace you define.
IMO, SQS will be easier to work with, but Kinesis will give you more flexibility if your needs are more complicated.
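One thing to check before adopting this: the queue only smooths out peaks, so the average arrival rate still has to stay below the rate at which the workers write. A back-of-the-envelope sketch, using the 500 and 100 writes per second from the question plus an assumed quiet rate of 20 per second (that quiet rate and the function name are made up for the example):

```python
def backlog_over_time(arrivals_per_second, drain_rate):
    """Return the queue length at the end of each second."""
    backlog = 0
    history = []
    for arrived in arrivals_per_second:
        backlog += arrived
        backlog -= min(backlog, drain_rate)  # workers write at most drain_rate
        history.append(backlog)
    return history


# Three peak seconds at 500 writes/s, then twenty quiet seconds at 20 writes/s.
arrivals = [500] * 3 + [20] * 20
history = backlog_over_time(arrivals, drain_rate=100)

print(history[:3])   # [400, 800, 1200]
print(max(history))  # 1200
print(history.index(0) + 1)  # 18, the second in which the queue first empties
```

A three second peak takes fifteen further seconds to work off here, so the clients get their acknowledgement immediately but the data becomes visible in DynamoDB later. If that delay is not acceptable, raising the capacity is the better fix.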

This cannot be accomplished using DynamoDB alone. DynamoDB is designed for uniform, scalable, predictable workloads. If you want to put a queue in front of DynamoDB you have to do that yourself.
DynamoDB does have a little tolerance for burst capacity, but that is not for sustained use. You should read the best practices section Consider Workload Uniformity When Adjusting Provisioned Throughput, but here are a few paragraphs that I think are important, with a few things emphasized by me:
For applications that are designed for use with uniform workloads, DynamoDB's partition allocation activity is not noticeable. A temporary non-uniformity in a workload can generally be absorbed by the bursting allowance, as described in Use Burst Capacity Sparingly. However, if your application must accommodate non-uniform workloads on a regular basis, you should design your table with DynamoDB's partitioning behavior in mind (see Understand Partition Behavior), and be mindful when increasing and decreasing provisioned throughput on that table.
If you reduce the amount of provisioned throughput for your table, DynamoDB will not decrease the number of partitions. Suppose that you created a table with a much larger amount of provisioned throughput than your application actually needed, and then decreased the provisioned throughput later. In this scenario, the provisioned throughput per partition would be less than it would have been if you had initially created the table with less throughput.
There are tools that help with auto-scaling DynamoDB, such as sebdah/dynamic-dynamodb which may be worth looking into.

An update for those seeing this recently: to handle bursts, DynamoDB launched the On Demand capacity mode in 2018.
You don't need to decide on the capacity beforehand, it will scale read and write capacity following the demand.
See:
https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing/

Why should I use Amazon Kinesis and not SNS-SQS?

I have a use case where a stream of data will be coming in and I cannot consume it at the same pace, so I need a buffer. This can be solved using an SNS-SQS queue. I learned that Kinesis serves the same purpose, so what is the difference? Why should I prefer (or not prefer) Kinesis?

Keep in mind this answer was correct for Jun 2015
After studying the issue for a while, having the same question in mind, I found that SQS (with SNS) is preferred for most use cases unless the order of the messages is important to you (SQS doesn't guarantee FIFO on messages).
There are 2 main advantages for Kinesis:
you can read the same message from several applications
you can re-read messages in case you need to.
Both advantages can be achieved by using SNS as a fan-out to SQS. That means the producer sends only one message to SNS, and SNS then fans the message out to multiple SQS queues, one for each consumer application. In this way you can have as many consumers as you want without thinking about sharding capacity.
Moreover, we added one more SQS queue, subscribed to the SNS topic, that holds messages for 14 days. Normally no one reads from this queue, but in case of a bug that makes us want to rewind the data we can easily read all the messages from it and re-send them to SNS, whereas Kinesis only provides 7 days of retention.
In conclusion, SNS+SQSs is much easier and provides most capabilities. IMO you need a really strong case to choose Kinesis over it.
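To illustrate the fan-out idea, here is a toy in-memory version. `Topic` is a hypothetical stand-in for an SNS topic and each `deque` stands in for one SQS queue; the consumer names are invented. The rewind at the end copies the archive straight into one consumer's queue, which is a simplification of re-sending the messages through SNS.

```python
from collections import deque


class Topic:
    """In-memory stand-in for an SNS topic with subscribed queues."""

    def __init__(self):
        self.queues = []

    def subscribe(self, queue):
        self.queues.append(queue)

    def publish(self, message):
        # One publish call puts a copy of the message on every queue.
        for queue in self.queues:
            queue.append(message)


topic = Topic()
billing, dashboard, archive = deque(), deque(), deque()
for subscriber in (billing, dashboard, archive):
    topic.subscribe(subscriber)

for n in range(3):
    topic.publish(f"event-{n}")

# Each consumer works through its own queue at its own pace.
billing.popleft()
billing.popleft()
print(len(billing), len(dashboard), len(archive))  # 1 3 3

# Rewind: the dashboard lost its data, so refill it from the archive queue.
dashboard.clear()
dashboard.extend(archive)
print(list(dashboard))
```

The producer publishes once and never needs to know how many consumers exist, which is the main appeal of this setup.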

On the surface they are vaguely similar, but your use case will determine which tool is appropriate. IMO, if you can get by with SQS then you should - if it will do what you want, it will be simpler and cheaper, but here is a better explanation from the AWS FAQ which gives examples of appropriate use-cases for both tools to help you decide:
FAQ's

Semantics of these technologies are different because they were designed to support different scenarios:
SNS/SQS: the items in the stream are not related to each other
Kinesis: the items in the stream are related to each other
Let's understand the difference by example.
Suppose we have a stream of orders, for each order we need to reserve some stock and schedule a delivery. Once this is complete, we can safely remove the item from the stream and start processing the next order. We are fully done with the previous order before we start the next one.
Again, we have the same stream of orders, but now our goal is to group orders by destinations. Once we have, say, 10 orders to the same place, we want to deliver them together (delivery optimization). Now the story is different: when we get a new item from the stream, we cannot finish processing it; rather we "wait" for more items to come in order to meet our goal. Moreover, if the processor process crashes, we must "restore" the state (so no order will be lost).
Once processing of one item cannot be separated from processing another one, we must have Kinesis semantics in order to handle all the cases safely.
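A sketch of the second scenario may help show where the state comes from. The function below groups orders by destination and releases a delivery once a destination has collected 10 of them; the function name, the city names, and the order IDs are all invented for the example.

```python
from collections import defaultdict


def group_orders(orders, batch_size=10):
    """Return (deliveries, pending) after processing a stream of orders."""
    pending = defaultdict(list)  # state carried between items
    deliveries = []
    for order_id, destination in orders:
        pending[destination].append(order_id)
        if len(pending[destination]) == batch_size:
            # Enough orders for one place: ship them together.
            deliveries.append((destination, pending.pop(destination)))
    return deliveries, dict(pending)


# 25 hypothetical orders alternating between two destinations.
orders = [(n, "Berlin" if n % 2 == 0 else "Oslo") for n in range(25)]
deliveries, pending = group_orders(orders)

print([destination for destination, _ in deliveries])  # ['Berlin', 'Oslo']
print(pending)  # {'Berlin': [20, 22, 24], 'Oslo': [21, 23]}
```

The `pending` dict is exactly the state that would be lost if the processor crashed. With a replayable stream you can rebuild it by re-reading from the last checkpoint, whereas messages already deleted from a queue cannot be read again.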

Kinesis supports multiple consumers, which means the same data records can be processed at the same time, or at different times within 24 hours, by different consumers. Similar behavior can be achieved with SQS by writing to multiple queues so that consumers can read from multiple queues. However, writing to multiple queues adds sub-second latency (a few milliseconds) to the system.
Second, Kinesis provides a routing capability: it selectively routes data records to different shards using the partition key, so that they can be processed by particular EC2 instances. This enables micro-batch calculations (counting and aggregation).
Working with any AWS service is easy, but SQS is the easiest. With Kinesis, you need to provision enough shards ahead of time, and you also have to manage dynamically increasing the number of shards to handle load spikes and decreasing it to save cost. That is a pain in Kinesis. None of this is required with SQS. SQS is infinitely scalable.

Excerpt from AWS Documentation:
We recommend Amazon Kinesis Streams for use cases with requirements that are similar to the following:
Routing related records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.
Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis Streams stores data for up to 7 days, you can run the audit application up to 7 days behind the billing application.
We recommend Amazon SQS for use cases with requirements that are similar to the following:
Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout.
Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes.
Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared. With Amazon Kinesis Streams, you can scale up to a sufficient number of shards (note, however, that you'll need to provision enough shards ahead of time).
Leveraging Amazon SQS’s ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load without any provisioning instructions from you.

The biggest advantage for me is the fact that Kinesis is a replayable queue, and SQS is not. So you can have multiple consumers of the same messages in Kinesis (or the same consumer at different times), whereas with SQS, once a message has been ack'd, it's gone from that queue.
SQS is better for worker queues because of that.
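The difference can be shown with two tiny in-memory classes. Both are hypothetical simplifications: `ReplayableLog` ignores shards and retention, and `WorkQueue` ignores visibility timeouts and redelivery.

```python
from collections import deque


class ReplayableLog:
    """Kinesis-like: records stay put, each reader tracks its own position."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read_from(self, position):
        return self.records[position:]


class WorkQueue:
    """SQS-like: a message is gone once a worker has received and acked it."""

    def __init__(self):
        self.messages = deque()

    def send(self, message):
        self.messages.append(message)

    def receive_and_ack(self):
        return self.messages.popleft() if self.messages else None


log, queue = ReplayableLog(), WorkQueue()
for n in range(3):
    log.append(f"event-{n}")
    queue.send(f"job-{n}")

# Two applications read the same records; a third joins later at position 2.
dashboard_view = log.read_from(0)
audit_view = log.read_from(0)
late_view = log.read_from(2)
print(dashboard_view == audit_view, late_view)

# Two workers split the jobs; no job is handed out twice.
worker_a = queue.receive_and_ack()
worker_b = queue.receive_and_ack()
print(worker_a, worker_b, len(queue.messages))
```

With the log, the reader owns the position and can move it back. With the queue, the service tracks completion for you, which is what makes it convenient for spreading jobs over workers.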

Another thing: Kinesis can trigger a Lambda, while SQS cannot. So with SQS you either have to provide an EC2 instance to process SQS messages (and deal with it if it fails), or you have to have a scheduled Lambda (which doesn't scale up or down - you get just one per minute).
Edit: This answer is no longer correct. SQS can directly trigger Lambda as of June 2018
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

The pricing models are different, so depending on your use case one or the other may be cheaper. Using the simplest case (not including SNS):
SQS charges per message (each 64 KB counts as one request).
Kinesis charges per shard per hour (1 shard can handle up to 1000 messages or 1 MB/second) and also for the amount of data you put in (every 25 KB).
Plugging in the current prices and not taking into account the free tier, if you send 1 GB of messages per day at the maximum message size, Kinesis will cost much more than SQS ($10.82/month for Kinesis vs. $0.20/month for SQS). But if you send 1 TB per day, Kinesis is somewhat cheaper ($158/month vs. $201/month for SQS).
Details: SQS charges $0.40 per million requests (64 KB each), so $0.00655 per GB. At 1 GB per day, this is just under $0.20 per month; at 1 TB per day, it comes to a little over $201 per month.
Kinesis charges $0.014 per million requests (25 KB each), so $0.00059 per GB. At 1 GB per day, this is less than $0.02 per month; at 1 TB per day, it is about $18 per month. However, Kinesis also charges $0.015 per shard-hour. You need at least 1 shard per 1 MB per second. At 1 GB per day, 1 shard will be plenty, so that will add another $0.36 per day, for a total cost of $10.82 per month. At 1 TB per day, you will need at least 13 shards, which adds another $4.68 per day, for a total cost of $158 per month.
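The arithmetic above can be reproduced with a few lines of Python. The prices are the ones quoted in this answer, not necessarily current ones, and the 30-day month and the function names are assumptions of the sketch.

```python
import math

KB_PER_GB = 1024 * 1024
DAYS_PER_MONTH = 30  # assumption used for all monthly figures


def sqs_monthly(gb_per_day, price_per_million=0.40, request_kb=64):
    """Monthly SQS cost when every request carries the maximum 64 KB."""
    requests_per_gb = KB_PER_GB / request_kb
    daily = gb_per_day * requests_per_gb * price_per_million / 1e6
    return daily * DAYS_PER_MONTH


def kinesis_monthly(gb_per_day, price_per_million=0.014, unit_kb=25,
                    shard_hour_price=0.015):
    """Monthly Kinesis cost: per-25-KB put charges plus shard-hours."""
    units_per_gb = KB_PER_GB / unit_kb
    put_cost = gb_per_day * units_per_gb * price_per_million / 1e6
    put_cost *= DAYS_PER_MONTH
    # One shard per 1 MB/s of sustained ingest, and at least one shard.
    mb_per_second = gb_per_day * 1024 / 86_400
    shards = max(1, math.ceil(mb_per_second))
    shard_cost = shards * shard_hour_price * 24 * DAYS_PER_MONTH
    return put_cost + shard_cost


for gb_per_day in (1, 1024):  # 1 GB/day and 1 TB/day
    print(gb_per_day,
          round(sqs_monthly(gb_per_day), 2),
          round(kinesis_monthly(gb_per_day), 2))
```

The crossover happens because the shard-hour charge is a fixed floor at low volume, while the per-GB price of Kinesis is roughly a tenth of the SQS one.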

Kinesis solves the problem of the map part in a typical map-reduce scenario for streaming data, while SQS does not guarantee that. If you have streaming data that needs to be aggregated on a key, Kinesis makes sure that all the data for that key goes to a specific shard, and the shard can be consumed on a single host, which makes aggregation on the key easier than with SQS.

Kinesis Use Cases
Log and Event Data Collection
Real-time Analytics
Mobile Data Capture
“Internet of Things” Data Feed
SQS Use Cases
Application integration
Decoupling microservices
Allocate tasks to multiple worker nodes
Decouple live user requests from intensive background work
Batch messages for future processing

I'll add one more thing nobody else has mentioned -- SQS is several orders of magnitude more expensive.

In very simple terms, and keeping costs out of the picture, the real intention of SNS-SQS is to make services loosely coupled. This is the primary reason to use SQS when the order of the messages is not so important and you want more control over the messages. If you want a job queue pattern, SQS is again much better. Kinesis shouldn't be used in such cases because it is difficult to remove messages from Kinesis, and Kinesis replays the whole batch on error. You can also use SQS as a dead letter queue for more control. With Kinesis all of this is possible, but it is unheard of unless you are really critical of SQS.
If you want nice partitioning, then SQS won't be useful.
