I'm evaluating AWS Kinesis vs Amazon Managed Streaming for Apache Kafka (MSK). Our requirement is to send messages (JSON) to AWS from an on-prem system (developed in C++). We then need to persist those messages into a relational database such as PostgreSQL, and at the same time stream the same data to other microservices (Java) hosted in AWS.
I have the following queries:
i) How can I access (connect and send messages to) AWS Kinesis from my on-premise system? Is there a C++ API that supports this? (There is a Java client API, but our on-prem system is written in C++.)
ii) How can I access (connect and send messages to) AWS MSK from my on-premise system?
iii) Is it possible to integrate MSK with other AWS services (e.g. Lambda, Redshift, EMR, etc.)?
iv) To persist data into a database, can we use AWS Lambda? (AWS Kinesis supports that functionality; what about AWS MSK?)
v) Our message rate is 50 msg/second; what is the most cost-effective solution?
To be blunt, your use case sounds simple and 50 messages a second is a very low rate.
Kinesis is a firehose where you need a straw. Kinesis is meant to ingest, transform and process terabytes of moving data.
Have you considered looking at SQS or Amazon MQ instead? Both are considerably simpler to use and manage than Kafka or Kinesis. Just from your questions it's clear you have not interacted with Kafka at all, so you're going to have a steep learning curve. SQS is a simple API-based queueing system: you publish to an SQS queue, and you consume from the queue. If you don't need to worry about ordering, routing, etc. it is a persistent and reliable (if clunky) technology that lots of people use to great success.
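A minimal sketch of that publish/consume loop with boto3 (the queue name and message fields here are made up):

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="eu-west-1")
queue_url = sqs.get_queue_url(QueueName="orders")["QueueUrl"]  # made-up queue name

# Producer side: publish one JSON message.
sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps({"device_id": 42, "temp": 21.5}))

# Consumer side: long-poll, process, then delete each message.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    payload = json.loads(msg["Body"])
    print("processing", payload)
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

The on-prem producer could do exactly the same over HTTPS with the AWS SDK for C++; only the client object changes.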
To answer your actual questions:
Amazon publishes a C++ SDK for their services - I would be stunned if there wasn't a Kinesis client as part of it. You would either need a public Kinesis endpoint, or a private Kinesis endpoint accessible via some sort of tunnel or gateway between your on-prem network and your AWS VPC.
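Each message is a single PutRecord call. A rough sketch of the request shape, using boto3 for brevity (stream name and fields are placeholders); the AWS SDK for C++ exposes the same PutRecord operation through its Kinesis client:

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")

def send(message: dict) -> None:
    """Send one JSON message to a Kinesis stream ("onprem-events" is a made-up name)."""
    kinesis.put_record(
        StreamName="onprem-events",
        Data=json.dumps(message).encode("utf-8"),
        PartitionKey=str(message.get("device_id", "default")),  # drives shard placement
    )

send({"device_id": 42, "temp": 21.5})
```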
MSK is Kafka. You need an Apache Kafka C++ client (librdkafka is the usual choice), and similar to Kinesis above you will need some sort of tunnel or gateway from your on-prem network to the AWS VPC where you have provisioned MSK.
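librdkafka's wrappers all take the same configuration keys, so here is a sketch of the producer setup using its Python wrapper (confluent-kafka) just to show the shape; broker and topic names are placeholders, and MSK's TLS listener is typically on port 9094:

```python
from confluent_kafka import Producer  # thin wrapper around librdkafka

# Placeholder bootstrap brokers; real names come from the MSK console/CLI.
producer = Producer({
    "bootstrap.servers": "b-1.mycluster.abc123.kafka.eu-west-1.amazonaws.com:9094",
    "security.protocol": "ssl",
})

def delivery_report(err, msg):
    if err is not None:
        print("delivery failed:", err)

producer.produce("onprem-events", value=b'{"device_id": 42}', key=b"42",
                 on_delivery=delivery_report)
producer.flush(10)  # wait up to 10s for outstanding deliveries
```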
It's possible, but it's unlikely there are any turn-key solutions for this. You will have to write some sort of bridging software from Kafka to the other systems.
You can possibly use Lambda, so long as you cater for failures, timeouts, and other failure modes. To be honest, a stand-alone consumer running as a service in your VPC or on-prem is a better idea; a sketch of one follows below.
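A rough sketch of what such a stand-alone consumer could look like (confluent-kafka plus psycopg2; the topic, table and connection string are invented, and real code would add retries and dead-lettering):

```python
import json
import psycopg2
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "b-1.mycluster.abc123.kafka.eu-west-1.amazonaws.com:9094",
    "security.protocol": "ssl",
    "group.id": "postgres-sink",        # made-up consumer group
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,        # commit only after the row is written
})
consumer.subscribe(["onprem-events"])

conn = psycopg2.connect("dbname=events user=app host=db.internal")  # placeholder DSN

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    record = json.loads(msg.value())
    with conn, conn.cursor() as cur:    # commits the transaction on success
        cur.execute("INSERT INTO events (payload) VALUES (%s)", [json.dumps(record)])
    consumer.commit(msg)                # advance the offset only after the insert
```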
SQS or Amazon MQ as previously mentioned are likely to be simpler and more cost-effective than MSK, and will almost certainly be cheaper than Kinesis.
Related
Is there a serverless way I can consume the content of Kafka topics into an S3 bucket (with or without Kinesis)?
I have:
AWS MSK (Kafka), which gets data from multiple topic sources.
An S3 bucket.
I want to take the data generated by the MSK Kafka topics, and save it to S3 (for archiving).
One way to do it is to use Kinesis Firehose.
I succeeded in applying this workflow using the MSK Kafka-Kinesis connector.
(https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kafka-connector-msk/)
Problem is, I don't like this solution, because it is not serverless.
I have to use EC2 to run the connector and the Kafka client on it.
I find it odd that I have two serverless AWS services, but for them to talk to each other I need a server to run processes on (the Kafka-Kinesis connector plus a Kafka client).
For example, I thought of Filebeat (running on ECS Fargate) to take the data from MSK and put it in S3, but I'm afraid there will be performance issues with this solution.
Thanks in advance for your answers
I'm an AWS noob, and I'm trying to figure out the difference between Amazon's Kinesis Data Streams and EventBridge products. Can someone explain this for someone not familiar with the AWS tech stack?
Kinesis is a real-time stream processing service. It typically gets used for storing logs or end-user data coming from the browser.
EventBridge is typically used to reliably communicate between apps/microservices, so it's quite similar to SQS, but has some added features.
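To give a feel for the EventBridge side: publishing is one PutEvents call per event, and routing is done by rules matching on fields like Source and DetailType rather than by the consumer. All names below are made up:

```python
import json
import boto3

events = boto3.client("events", region_name="eu-west-1")

events.put_events(Entries=[{
    "EventBusName": "default",
    "Source": "myapp.orders",        # made-up source; rules match on fields like this
    "DetailType": "OrderPlaced",
    "Detail": json.dumps({"order_id": 123, "total": 9.99}),
}])
```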
Differences between SQS and EventBridge are explained in the post below:
https://www.reddit.com/r/aws/comments/cjnw2l/what_makes_eventbridge_different_than_sqs_and/
We have a third-party Apache Kafka producer which is on premises. We need to read messages from this third-party Kafka component into AWS using AWS services and trigger a Lambda function. What approach should be taken to consume messages from Kafka in AWS?
You have a few options. You could just schedule a Lambda invocation every few seconds and read from the Kafka topic using your favourite language. That is quite simple and, depending on the data volume you are getting, it might be good enough (a rough sketch follows after these options).
Alternatively, you can install a community-contributed Kafka connector for Lambda, and just invoke Lambda directly.
Or you can use the awslabs Kafka connector for Kinesis, which relays messages from Kafka into Kinesis Data Streams or Kinesis Firehose, where you can use Lambda natively.
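A rough sketch of the first option - a scheduled Lambda that drains whatever has accumulated since its last run. Broker, group and topic names are placeholders, and the Kafka client library would have to be bundled into the deployment package:

```python
import json
from confluent_kafka import Consumer  # bundle this library with the Lambda package

CONF = {
    "bootstrap.servers": "broker1.onprem.example:9092",  # placeholder on-prem brokers
    "group.id": "lambda-poller",                          # offsets persist between runs
    "auto.offset.reset": "earliest",
}

def handler(event, context):
    """Invoked every few minutes by an EventBridge/CloudWatch schedule."""
    consumer = Consumer(CONF)
    consumer.subscribe(["third-party-topic"])
    try:
        while True:
            msg = consumer.poll(2.0)
            if msg is None:          # nothing left to read this run
                break
            if msg.error():
                continue
            payload = json.loads(msg.value())
            print("got", payload)    # real code would hand this to business logic
    finally:
        consumer.close()
```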
I'd like to use AWS IoT to manage a grid of devices. Data from each device must be sent to a queue service (RabbitMQ) hosted on an EC2 instance, which is the starting point for a real-time control application. I read how to make a rule to write data to other services: Here
However, there isn't an example for EC2. Using the AWS IoT service, how can I connect to a service on EC2?
Edit:
I have a real-time application developed with Storm that consumes data from RabbitMQ and puts the result of the computation in another RabbitMQ queue. RabbitMQ and Storm are on EC2. I have devices producing data and connected to IoT. Data produced by the devices must be redirected to the queue on EC2 that is the starting point of my application.
I'm sorry if I was not clear.
AWS IoT supports pushing data directly to other AWS services. As you have probably figured out by now, publishing to third-party APIs isn't directly supported.
From the choices AWS offers, Lambda, SQS, SNS and Kinesis would probably work best for you.
With Lambda you could directly forward the incoming message using one of RabbitMQ's client libraries.
With SQS you would put it into an AWS queue first and then poll this queue, transferring the messages to RabbitMQ.
Kinesis would allow more sophisticated processing, but is probably too complex.
I suggest you write a Lambda in the programming language of your choice using one of the numerous RabbitMQ client libraries; a sketch follows below.
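As a sketch of that idea with pika (the host, queue name, and the assumption that the IoT rule passes the device message straight through as the event are all mine):

```python
import json
import pika  # RabbitMQ client; package it with the Lambda deployment

RABBIT_HOST = "10.0.1.25"   # private IP / DNS of the EC2 instance running RabbitMQ
QUEUE = "device-data"       # made-up queue name

def handler(event, context):
    """Forward the IoT rule payload (the device message) into a RabbitMQ queue."""
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=RABBIT_HOST))
    try:
        channel = connection.channel()
        channel.queue_declare(queue=QUEUE, durable=True)
        channel.basic_publish(exchange="", routing_key=QUEUE, body=json.dumps(event))
    finally:
        connection.close()
```

For the Lambda to reach the instance's private address it would need to be attached to the same VPC.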
We would like to stream data directly from an EC2 web server to Redshift. Do I need to use Kinesis? What is the best practice? I do not plan to do any special analysis on this data before storage. I would like a cost-effective solution (it might be costly to use DynamoDB as temporary storage before loading).
If cost is your primary concern, then the exact number of records/second combined with the record sizes can be important.
If you are talking about a very low volume of messages, a custom app running on a t2.micro instance to aggregate the data is about as cheap as you can go, but it won't scale. The bigger downside is that you are responsible for monitoring, maintaining, and managing that EC2 instance.
The modern approach would be to use a combination of Kinesis + Lambda + S3 + Redshift to have the data stream in with no EC2 instances to manage!
The approach is described in this blog post: A Zero-Administration Amazon Redshift Database Loader
What that blog post doesn't mention is that now, with API Gateway, if you do need any type of custom authentication or data transformation, you can do it without an EC2 instance by using Lambda to broker the data into Kinesis.
This would look like:
API Gateway -> Lambda -> Kinesis -> Lambda -> S3 -> Redshift
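The first Lambda in that chain is only a few lines; a sketch assuming an API Gateway proxy integration, with a made-up stream name and partition-key field:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    """Accept a JSON body from API Gateway (proxy integration) and push it into Kinesis."""
    body = json.loads(event["body"])                    # proxy integrations pass the raw body
    kinesis.put_record(
        StreamName="web-events",                        # made-up stream name
        Data=json.dumps(body).encode("utf-8"),
        PartitionKey=body.get("user_id", "anonymous"),  # assumed field; drives sharding
    )
    return {"statusCode": 202, "body": json.dumps({"accepted": True})}
```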
Redshift is best suited for batch loading using the COPY command. A typical pattern is to load data to either DynamoDB, S3, or Kinesis, then aggregate the events before using COPY to Redshift.
See also this useful SO Q&A.
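Since Redshift speaks the PostgreSQL wire protocol, the periodic load step can be as small as one COPY statement issued from any PostgreSQL client; the cluster endpoint, bucket, table and IAM role below are placeholders:

```python
import psycopg2

# Placeholder cluster endpoint and credentials.
conn = psycopg2.connect(
    host="mycluster.abc123.eu-west-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="loader", password="...",
)

COPY_SQL = """
    COPY events
    FROM 's3://my-bucket/staged/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS JSON 'auto';
"""

with conn, conn.cursor() as cur:   # batch-loads everything staged under the prefix
    cur.execute(COPY_SQL)
```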
I implemented such a system last year inside my company using Kinesis and the Kinesis connector. The Kinesis connector is just a standalone app released by AWS that we run on a bunch of Elastic Beanstalk servers as Kinesis consumers; the connector aggregates messages to S3 every so often or after a certain number of messages, then triggers the COPY command so Redshift loads the data periodically. Since it runs on Elastic Beanstalk, you can tune the auto-scaling conditions to make sure the cluster grows and shrinks with the volume of data coming from the Kinesis stream.
BTW, AWS just announced Kinesis Firehose yesterday. I haven't played with it, but it definitely looks like a managed version of the Kinesis connector.