Does Kinesis Streams have a routing concept?

Is there something in Kinesis similar to Kafka topics to route different messages on a single endpoint? Let's say I receive the following messages
{
  machine: 353e62ad-255e-44d0-85df-768093bffacd
  origin: AWS
  payload: ...
},
{
  machine: 870f9e41-d033-466d-a0db-bad04db9303d
  origin: AZURE
  payload: ...
},
{
  machine: 353e62ad-255e-44d0-85df-768093bffacd
  origin: AWS
  payload: ...
},
{
  machine: f0c88d1d-dd73-40a6-b84e-91dd34328a46
  origin: GCP
  payload: ...
}
I now want to use different Kinesis streams as a high-volume FIFO queue for worker pools talking to the AWS, GCP and Azure REST APIs, with the partition key on machine for FIFO order, since payloads should be delivered in order.
Is there something that would route on kinesis level while maintaining FIFO?
The SNS FIFO limit is too low; there are 1000 req/sec per origin.

use different Kinesis streams as a high-volume FIFO queue
Use metrics to detect which partition keys have high volume, then use the PutRecord command to write those records to a new stream.
For what you call "routing", use the partition key: https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html#partition-key
Is there something that would route on the Kinesis level while maintaining FIFO? The SNS FIFO limit is too low, there are 1000 req/sec per origin.
You can configure a rate limit with the CollectionMaxCount parameter. Reference: https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-retries-rate-limiting.html
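For illustration, a minimal boto3 sketch of this producer-side routing, assuming one stream per origin (the stream names below are made up): each record goes to the stream for its origin, and the machine ID stays the partition key so records for one machine remain in order within a shard.

# Hypothetical sketch: route records to per-origin streams while keeping
# per-machine ordering via the partition key. Stream names are assumptions.
import json
import boto3

kinesis = boto3.client("kinesis")

# Assumed mapping of message origin -> stream name
STREAMS = {
    "AWS": "workers-aws",
    "AZURE": "workers-azure",
    "GCP": "workers-gcp",
}

def route(message):
    stream = STREAMS[message["origin"]]
    kinesis.put_record(
        StreamName=stream,
        Data=json.dumps(message).encode("utf-8"),
        # Same machine -> same partition key -> same shard -> in-order within the shard
        PartitionKey=message["machine"],
    )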

Related

What happens to the response sent by a lambda function back to the kinesis stream that triggers it?

In my architecture:
producer -> AWS Kinesis Stream -> lambda function -> MongoDB Atlas
What happens to the status code and data sent back from the Lambda function to the Kinesis trigger? Will Kinesis send it back to the producer?
No, this is not possible. The Kinesis trigger reads the response from the Lambda function and performs actions depending on what has been configured; the producer is not automatically notified. See the AWS documentation for details.
What you can do is define an on-failure destination, e.g. an SQS queue. The producer can listen on that queue and handle failures.
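As a sketch of that option (all ARNs and names here are placeholders, not from the question), the on-failure destination can be attached to the Kinesis event source mapping, for example with boto3:

# Sketch only: attach an on-failure destination (an SQS queue) to the
# Kinesis event source mapping of a Lambda function. ARNs and names are placeholders.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    FunctionName="my-consumer-function",
    StartingPosition="LATEST",
    # Metadata about failed batches is sent to this queue
    DestinationConfig={
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:consumer-failures"
        }
    },
)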

What is the low latency event sourcing service in AWS?

I am using EventBridge as the event bus in our application. Based on its doc (https://aws.amazon.com/eventbridge/faqs/), the latency between sending and receiving an event is about half a second, which is unacceptable in my application.
I am thinking about other alternatives. Kinesis has a problem with filtering events: once a consumer attaches to a stream, it needs to provide its own logic to filter out uninteresting events. Since I am using Lambda as the consumer, many uninteresting events would trigger my Lambda, which would lead to a high AWS bill.
AWS SNS only supports AWS services as targets.
Another option is Kafka, but I can't find what the latency is when using the AWS managed Kafka service.
What is the lowest-latency event sourcing solution on AWS?
Kinesis is probably the best way to go now, thanks to the newly released "event filtering" feature. This allows you to configure an event source mapping which filters Kinesis (or SQS, DynamoDB Streams) events.
Doing this means you can use Kinesis as an event bus without having to invoke a Lambda for every event.
See: https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-event-filtering-amazon-sqs-dynamodb-kinesis-sources/
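As a rough sketch (the filter pattern, stream and function names are illustrative, not from the question), such a filter is set on the event source mapping so the Lambda is only invoked for matching records:

# Sketch: only invoke the Lambda for Kinesis records whose JSON payload
# has origin == "AWS". Stream and function names are placeholders.
import json
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/events",
    FunctionName="aws-origin-consumer",
    StartingPosition="LATEST",
    FilterCriteria={
        "Filters": [
            # "data" matches against the deserialized record payload
            {"Pattern": json.dumps({"data": {"origin": ["AWS"]}})}
        ]
    },
)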

SQS or Kinesis which one is good for queuing?

I have a server which can only process 20 requests at a time. When lots of requests come in, I want to store the request data in some queue, then read a set of requests (i.e. 20) and process them as a batch. What would be the ideal way to do that? Using SQS, or Kinesis? I'm totally confused.
SQS = Simple Queue Service, for queuing messages 1:1 (once a message is consumed, it is removed from the queue).
Kinesis = low-latency, high-volume data streaming ... typically 1:N (many consumers of the same messages).
Because Kinesis also stores the data for a period of time, the two are often confused, but their architectural patterns are totally different.
Queue => SQS.
Data Streams => Kinesis.
Taken from https://aws.amazon.com/kinesis/data-streams/faqs/ :
Q: How does Amazon Kinesis Data Streams differ from Amazon SQS?
Amazon Kinesis Data Streams enables real-time processing of streaming
big data. It provides ordering of records, as well as the ability to
read and/or replay records in the same order to multiple Amazon
Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers
all records for a given partition key to the same record processor,
making it easier to build multiple applications reading from the same
Amazon Kinesis data stream (for example, to perform counting,
aggregation, and filtering).
Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly
scalable hosted queue for storing messages as they travel between
computers. Amazon SQS lets you easily move data between distributed
application components and helps you build applications in which
messages are processed independently (with message-level ack/fail
semantics), such as automated workflows.
Q: When should I use Amazon Kinesis Data Streams, and when should I
use Amazon SQS?
We recommend Amazon Kinesis Data Streams for use cases with
requirements that are similar to the following:
Routing related records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are
simpler when all records for a given key are routed to the same record
processor.
Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining
the order of log statements.
Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a
real-time dashboard and another that archives data to Amazon Redshift.
You want both applications to consume data from the same stream
concurrently and independently.
Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that
runs a few hours behind the billing application. Because Amazon
Kinesis Data Streams stores data for up to 7 days, you can run the
audit application up to 7 days behind the billing application.
We recommend Amazon SQS for use cases with requirements that are
similar to the following:
Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track
the successful completion of each item independently. Amazon SQS
tracks the ack/fail, so the application does not have to maintain a
persistent checkpoint/cursor. Amazon SQS will delete acked messages
and redeliver failed messages after a configured visibility timeout.
Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can
configure individual messages to have a delay of up to 15 minutes.
Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the
backlog is cleared. With Amazon Kinesis Data Streams, you can scale up
to a sufficient number of shards (note, however, that you'll need to
provision enough shards ahead of time).
Leveraging Amazon SQS’s ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional
load spikes or the natural growth of your business. Because each
buffered request can be processed independently, Amazon SQS can scale
transparently to handle the load without any provisioning instructions
from you.
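For the "queue of work, process 20 at a time" case, a minimal SQS sketch (the queue URL and the processing step are placeholders; note that ReceiveMessage returns at most 10 messages per call, so a batch of 20 takes two calls):

# Sketch: drain an SQS queue in batches of up to 20 requests.
# QUEUE_URL and the processing step are placeholders.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"

def read_batch(size=20):
    messages = []
    while len(messages) < size:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=min(10, size - len(messages)),  # API maximum is 10
            WaitTimeSeconds=5,  # long polling
        )
        batch = resp.get("Messages", [])
        if not batch:
            break
        messages.extend(batch)
    return messages

def process_and_ack(messages):
    # ... process the batch of up to 20 requests here ...
    for msg in messages:
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])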

Where to run my Kinesis Producer?

I need to build a Kinesis Producer app that simply puts data into a Kinesis Stream. The app will need to connect to a remote host and maintain a TCP socket to which data will be pushed from the remote host. There is very little data transformation, so the producer application will be very simple... I know I could set up an EC2 instance for this, but if there's a better way I'd like to explore that.
Examples:
You can build a producer on AWS Lambda, but since I have to maintain a long-running TCP connection, that wouldn't work.
You can maintain a connection to a WebSocket with AWS IoT and invoke a Lambda function on each message, but my connection is just a standard TCP connection.
Question: What other products in the AWS suite of products that I could use to build a producer?
There are no suitable managed options here. If your task is to...
originate and maintain a persistent TCP connection to a third-party remote device that you don't control,
consume whatever payload comes down the pipe,
process/transform it, and
feed it to code that serves as a Kinesis producer
...then you need a server, because there is not a service that does all of these things. EC2 is the product you are looking for.
The Producer code typically runs on the thing that is the source of the information you wish to capture.
For example:
When capturing network events, the Producer should be the networking equipment that is monitoring traffic.
When capturing retail purchases, the Producer is the system processing the transactions.
When capturing earth tremors, the Producer is the equipment that is monitoring vibrations.
In your case, the remote host should be the Producer, which sends the data to Kinesis. Rather than having the remote host push data to a Lambda function, simply have the remote host push directly to Kinesis.
Update
You mention Kinesis Agent:
Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Firehose.
If you are using Amazon Kinesis Firehose, then the Kinesis Agent can be your Producer. It sends the data to the Firehose. Or, you can write your own Producer for Firehose.
From Writing to a Kinesis Firehose Delivery Stream Using the AWS SDK:
You can use the Kinesis Firehose API to send data to a Kinesis Firehose delivery stream using the AWS SDK for Java, .NET, Node.js, Python, or Ruby.
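For instance, a minimal sketch of such a Firehose producer with boto3 (the delivery stream name is a placeholder):

# Sketch: write one record to a Kinesis Firehose delivery stream.
# The delivery stream name is a placeholder.
import boto3

firehose = boto3.client("firehose")

firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": b'{"event": "example"}\n'},  # newline-delimited records are common
)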
If you are using Amazon Kinesis Streams, you will need to write your own Producer. From Producers for Amazon Kinesis Streams:
A producer puts data records into Kinesis streams. For example, a web server sending log data to a Kinesis stream is a producer.
So, a Producer is just the term applied to whatever sends the data into Kinesis, and it is retrieved by a Consumer.
A couple of options:
You may be able to use IoT with a Kinesis action for your remote host to push into a Kinesis stream. In this case your remote app would be a device that talks directly to the AWS IoT infrastructure. You'd then set up a rule to forward all of the messages to a Kinesis stream for processing. See https://aws.amazon.com/iot-platform/how-it-works/.
A benefit of this is that you no longer have to host a producer app anywhere. But you would need to be able to modify the app running on the remote host.
You don't have to use the Kinesis Producer Library (KPL); your data source could simply make repeated calls to PutRecord or PutRecords (see the sketch after this list). Again, this would require modifications to the remote app.
Or, as you know, you could run your KPL app on an EC2 instance and talk to it over the network. This may give you more control over how the thing runs and would require fewer modifications to the remote app, but you now have a greater DevOps burden.
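A sketch of that direct PutRecord approach (the stream name, partition key, and the function that reads from the TCP socket are placeholders):

# Sketch: a bare-bones Kinesis Streams producer that calls PutRecord directly,
# without the KPL. Stream name and read_next_payload() are placeholders.
import boto3

kinesis = boto3.client("kinesis")

def read_next_payload():
    # Placeholder for whatever reads the next chunk off the long-lived TCP socket
    raise NotImplementedError

def run_producer(stream_name="my-stream"):
    while True:
        payload = read_next_payload()  # bytes
        kinesis.put_record(
            StreamName=stream_name,
            Data=payload,
            PartitionKey="remote-host-1",  # pick a key with enough cardinality to spread load across shards
        )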

AWS Kinesis read from past

How do we read from an AWS Kinesis stream going back in time?
Using an AWS Kinesis stream, one can send a stream of events and the consumer application can read them. The Kinesis Stream worker fetches the records and passes them to IRecordProcessor#processRecords from the last checkpoint.
However, if I need to read records going back in time, such as starting to process records from 2 hours ago, how do I configure my Kinesis worker to fetch such records?
You can start your Kinesis consumer again (or a different one) with different settings for the shard iterator.
See GetShardIterator.
The usual setting is LATEST or TRIM_HORIZON (oldest):
{
  "ShardId": "ShardId",
  "ShardIteratorType": "LATEST",
  "StreamName": "StreamName"
}
But you can change it to a specific time (from the last 24 hours)
{
  "ShardId": "ShardId",
  "ShardIteratorType": "AT_TIMESTAMP",
  "StreamName": "StreamName",
  "Timestamp": "2016-06-29T19:58:46.480-00:00"
}
Keep in mind that the Kinesis consumer usually saves its checkpoints in a DynamoDB table, so if you are reusing the same Kinesis application you need to delete those checkpoints first.
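A boto3 sketch of reading from two hours in the past with AT_TIMESTAMP (the stream name is a placeholder and only the first shard is read, to keep it short):

# Sketch: read records starting two hours in the past using AT_TIMESTAMP.
# Stream name is a placeholder; only the first shard is read for brevity.
from datetime import datetime, timedelta, timezone
import boto3

kinesis = boto3.client("kinesis")
STREAM = "my-stream"

shard_id = kinesis.list_shards(StreamName=STREAM)["Shards"][0]["ShardId"]

iterator = kinesis.get_shard_iterator(
    StreamName=STREAM,
    ShardId=shard_id,
    ShardIteratorType="AT_TIMESTAMP",
    Timestamp=datetime.now(timezone.utc) - timedelta(hours=2),
)["ShardIterator"]

while iterator:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        print(record["ApproximateArrivalTimestamp"], record["Data"])
    if resp.get("MillisBehindLatest") == 0:
        break  # caught up with the tip of the stream
    iterator = resp.get("NextShardIterator")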