IIDR CDC Kafka on AWS

We are trying to publish data from Db2 via IIDR (IBM CDC) to Kafka on AWS. The subscription fails with the error below:
An error occurred during the conversation with Kafka.
Error: org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for cdckafka-subsname-commitstream-0
Are there any prerequisites for publishing streams from IIDR to Kafka on AWS?
Some more details:
-The Kafka cluster is running on AWS.
-The IIDR CDC engines (both source and target) are on-premise.
-The on-premise IPs have been whitelisted, and I can ping/telnet the relevant ports from on-prem to AWS and vice versa.
Thanks!

You need to configure your Kafka brokers with listeners (and matching advertised.listeners) that work for external clients. You can see the details here.
Simply pinging from the on-premises client to AWS is not enough - you need to validate connectivity with a Kafka client such as kafkacat. A metadata timeout like this is a classic symptom of the brokers advertising addresses that the external client cannot reach.
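As a rough sketch (the hostnames and ports below are placeholders, not values from the question), the broker-side settings that typically matter are:
```
# server.properties on each broker - addresses are examples only
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9094
advertised.listeners=INTERNAL://broker1.internal:9092,EXTERNAL://broker1.public.example.com:9094
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```
You can then confirm from the on-prem side that the metadata resolves to reachable addresses:
```
kafkacat -b broker1.public.example.com:9094 -L
```
If the broker list returned by -L contains hostnames your on-prem client cannot resolve or reach, you will see exactly this kind of metadata timeout.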

Related

Is there a serverless way I can consume the MSK Kafka topics' content into an S3 bucket?

Is there a serverless way to consume Kafka topic content into an S3 bucket (with or without Kinesis)?
I have:
an AWS MSK Kafka cluster that receives data on multiple topics
an S3 bucket.
I want to take the data generated by the MSK Kafka topics and save it to S3 (for archiving).
One way to do it is to use Kinesis Firehose. I succeeded in applying this workflow using the MSK Kafka Kinesis connector
(https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kafka-connector-msk/).
The problem is that I don't like this solution, because it is not serverless:
I have to use EC2 to run the connector and the Kafka client on it.
I find it odd that I have two serverless AWS services, but for them to talk to each other I need a server running extra processes (the Kafka Kinesis connector plus a Kafka client).
For example, I thought of Filebeat (running on ECS Fargate) to take the data from MSK and put it in S3, but I'm afraid there would be performance issues with this solution.
Thanks in advance for your answers

How to deliver an AWS SNS message to all instances of a particular microservice

I've got a rather rare requirement: deliver an SNS topic message to all microservice instances.
Basically, it's a kind of notification that related data has changed,
and all microservice instances should reload their internals from the data source.
We are using Terraform to create our infrastructure, with the Kong API gateway.
Microservice instances can be created on the fly as system load increases,
so subscriptions to the topic cannot be created at the Terraform stage.
Each microservice is a standard Spring Boot app.
My first approach is:
each microservice exposes an HTTP endpoint that can be subscribed to the SNS topic
on startup, the microservice subscribes itself (the endpoint above) to the required SNS topic, and unsubscribes on shutdown (see the sketch below).
My problem is determining the individual microservice instance URLs to use in the subscription process.
An alternative approach would be to use SQS: create one SQS queue per microservice instance (and subscribe it to SNS).
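A minimal Spring Boot sketch of this self-subscription idea, assuming the AWS SDK for Java v2; the topic ARN and the instance URL are placeholders (resolving that URL is exactly the open problem above):
```java
import jakarta.annotation.PostConstruct;
import jakarta.annotation.PreDestroy;
import org.springframework.stereotype.Component;
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.SubscribeRequest;
import software.amazon.awssdk.services.sns.model.UnsubscribeRequest;

@Component
public class SnsSelfSubscriber {

    private static final String TOPIC_ARN =
            "arn:aws:sns:eu-west-1:123456789012:data-changed"; // placeholder

    private final SnsClient sns = SnsClient.create();
    private String subscriptionArn;

    @PostConstruct
    public void subscribe() {
        // The instance must know its own externally reachable URL; this is a placeholder.
        // Note: SNS first POSTs a SubscriptionConfirmation message that the endpoint
        // has to confirm before deliveries start.
        String endpoint = "https://this-instance.example.com/sns/data-changed";
        subscriptionArn = sns.subscribe(SubscribeRequest.builder()
                .topicArn(TOPIC_ARN)
                .protocol("https")
                .endpoint(endpoint)
                .build()).subscriptionArn();
    }

    @PreDestroy
    public void unsubscribe() {
        if (subscriptionArn != null) {
            sns.unsubscribe(UnsubscribeRequest.builder()
                    .subscriptionArn(subscriptionArn)
                    .build());
        }
    }
}
```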
Maybe I'm doing it wrong at a conceptual level?
Maybe a different architectural approach is required?
It might be easier for the microservices to check an object in Amazon S3 to "pull" the configuration updates (or at least call HeadObject to check whether the configuration has changed) rather than trying to "push" the configuration update to all servers.
Or, use AWS Systems Manager Parameter Store and have the servers cache the retrieved value for a period (eg 5 minutes) so they aren't constantly re-reading the configuration.
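A minimal sketch of the HeadObject variant with the AWS SDK for Java v2; the bucket and key are placeholders:
```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;

public class ConfigWatcher {

    private final S3Client s3 = S3Client.create();
    private String lastETag;

    /** Returns true when the config object has changed since the last check. */
    public boolean configChanged() {
        // HeadObject is cheap: it fetches metadata only, never the object body.
        String eTag = s3.headObject(HeadObjectRequest.builder()
                .bucket("my-config-bucket")  // placeholder
                .key("service-config.json")  // placeholder
                .build()).eTag();
        boolean changed = !eTag.equals(lastETag);
        lastETag = eTag;
        return changed;
    }
}
```
Each instance can call this on a schedule (or lazily, guarded by a short cache) and reload only when the ETag differs.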
Kinda old by now, but here is my solution:
create an SNS topic, subscribe an SQS queue to it, republish the SQS messages to Redis pub/sub, and have every instance subscribe to that pub/sub channel.
Now all your instances will get the event.
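A rough sketch of the bridge step, assuming the AWS SDK for Java v2 and the Jedis client; the queue URL, Redis host and channel name are placeholders:
```java
import redis.clients.jedis.Jedis;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class SqsToRedisBridge {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String queueUrl =
                "https://sqs.eu-west-1.amazonaws.com/123456789012/data-changed"; // placeholder
        try (Jedis redis = new Jedis("redis.internal", 6379)) {                  // placeholder
            while (true) {
                // Long-poll the SQS queue that is subscribed to the SNS topic.
                for (Message m : sqs.receiveMessage(ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl).waitTimeSeconds(20).build()).messages()) {
                    // Fan the event out to every instance via Redis pub/sub.
                    redis.publish("data-changed", m.body());
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(queueUrl).receiptHandle(m.receiptHandle()).build());
                }
            }
        }
    }
}
```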

Amazon Kinesis vs Amazon Managed Streaming for Kafka (MSK) - connecting from on-prem

I'm evaluating AWS Kinesis vs Managed Streaming for Kafka (MSK). Our requirement is to send messages (JSON) to AWS from an on-prem system (developed in C++). We then need to persist those messages in a relational database such as PostgreSQL, and at the same time stream the same data to other microservices (Java) hosted in AWS.
I have the following queries:
i) How can I access (connect to and send messages to) AWS Kinesis from my on-premise system? Is there a C++ API that supports this? (There is a Java client API, but our on-prem system is written in C++.)
ii) How can I access (connect to and send messages to) AWS MSK from my on-premise system?
iii) Is it possible to integrate MSK with other AWS services (e.g. Lambda, Redshift, EMR, etc.)?
iv) Can we use AWS Lambda to persist the data into a database? (AWS Kinesis supports that functionality; what about AWS MSK?)
v) Our message rate is 50 msg/second - what is the most cost-effective solution?
To be blunt, your use case sounds simple, and 50 messages a second is a very low rate.
Kinesis is a firehose where you need a straw: it is meant to ingest, transform and process terabytes of moving data.
Have you considered SQS or Amazon MQ instead? Both are considerably simpler to use and manage than Kafka or Kinesis. Just from your questions it's clear you have not interacted with Kafka at all, so you're going to have a steep learning curve. SQS is a simple API-based queueing system - you publish to an SQS queue, and you consume from the queue. If you don't need to worry about ordering, routing, etc., it is a persistent and reliable (if clunky) technology that lots of people use to great success.
To answer your actual questions:
i) Amazon publishes a C++ SDK for its services - I would be stunned if there wasn't a Kinesis client as part of it. You would either need a public Kinesis endpoint, or a private Kinesis endpoint accessible via some sort of tunnel or gateway between your on-prem network and your AWS VPC.
ii) MSK is Kafka. You need an Apache Kafka C++ client and, similar to Kinesis above, some sort of tunnel or gateway from your on-prem network to the AWS VPC where you have provisioned MSK.
iii) It's possible, but it's unlikely there are any turn-key solutions for this. You would have to write some sort of bridging software from Kafka to the other systems.
iv) You can possibly use Lambda, so long as you cater for failures, timeouts, and other failure modes. To be honest, a stand-alone consumer running as a service in your VPC or on-prem is a better idea.
v) SQS or Amazon MQ, as previously mentioned, are likely to be simpler and more cost-effective than MSK, and will almost certainly be cheaper than Kinesis.
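To illustrate how little code the SQS model needs, here is a minimal publish/consume round trip using the AWS SDK for Java v2 (the queue URL and message body are placeholders; the on-prem producer side would do the equivalent with the C++ SDK):
```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class SqsRoundTrip {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String queueUrl =
                "https://sqs.us-east-1.amazonaws.com/123456789012/messages"; // placeholder

        // Producer side: publish one JSON message to the queue.
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(queueUrl)
                .messageBody("{\"sensor\":\"a1\",\"value\":42}")
                .build());

        // Consumer side: long-poll, process (e.g. insert into PostgreSQL), delete.
        for (Message m : sqs.receiveMessage(ReceiveMessageRequest.builder()
                .queueUrl(queueUrl).waitTimeSeconds(20).maxNumberOfMessages(10).build())
                .messages()) {
            System.out.println("got: " + m.body());
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(queueUrl).receiptHandle(m.receiptHandle()).build());
        }
    }
}
```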

AWS IoT scaling issues and metrics for monitoring IoT

I am using the AWS IoT service.
When a device sends a registration message to the MQTT broker, a rule stores it in an SQS queue.
A Lambda function is triggered when the message is added to the queue; it creates a Thing for the device and registers the device's certificate.
While carrying out load testing, I observed that after some time the incoming messages are no longer received by the AWS MQTT broker and are not processed.
I have written some test clients, which run on EC2 instances, to simulate the MQTT clients.
If I restart the test clients after some time, I can again see the messages arriving at AWS IoT.
I am not sure whether this is an issue with the MQTT broker or with the clients running on EC2 instances.
I can think of possible issues due to the limits on AWS IoT:
https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_iot
I want to know which AWS IoT metrics I need to monitor for this, and which IoT-specific alarms I need to configure.
Could it be an issue on the EC2 side (maybe network out bytes per second, etc.)?
There is another load-testing scenario in which I do not register the devices but just capture the connect/disconnect events. In that case, I do not observe similar issues.
As you know, there are some limits on AWS IoT:

| API | Transactions per second |
| --- | --- |
| CreateCertificateFromCsr | 15 |
| CreateDynamicThingGroup | 5 |
| CreateJob | 10 |
| CreatePolicy | 10 |
| CreatePolicyVersion | 10 |
| CreateRoleAlias | 10 |
| CreateThing | 15 |

Generally, the AWS APIs throw an exception when you run over these limits.
How about catching that exception?
If you want to check for an EC2 network issue, use commands such as netstat or tcpdump.
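A hedged sketch of the catch-and-retry idea using the AWS SDK for Java v2; the retry policy, class and thing names are my assumptions, not anything from the question:
```java
import software.amazon.awssdk.services.iot.IotClient;
import software.amazon.awssdk.services.iot.model.CreateThingRequest;
import software.amazon.awssdk.services.iot.model.ThrottlingException;

public class ThingRegistrar {

    private final IotClient iot = IotClient.create();

    public void createThingWithRetry(String thingName) throws InterruptedException {
        long backoffMs = 200;
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                iot.createThing(CreateThingRequest.builder().thingName(thingName).build());
                return; // success
            } catch (ThrottlingException e) {
                // CreateThing is rate-limited (15 TPS per the table above),
                // so back off exponentially and retry instead of failing.
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
        throw new IllegalStateException("still throttled after retries: " + thingName);
    }
}
```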

AWS IoT: How to use an application service on EC2?

I'd like to use AWS IoT to manage a grid of devices. The data from each device must be sent to a queue service (RabbitMQ) hosted on an EC2 instance, which is the starting point of a real-time control application. I read how to create a rule to write data to another service here.
However, there isn't an example for EC2. Using the AWS IoT service, how can I connect to a service on EC2?
Edit:
I have a real-time application developed with Storm that consumes data from RabbitMQ and puts the result of the computation in another RabbitMQ queue. RabbitMQ and Storm are on EC2. I have devices that produce data and are connected to IoT. The data produced by the devices must be redirected to the queue on EC2 that is the starting point of my application.
I'm sorry if I was not clear.
AWS IoT supports pushing data directly to other AWS services. As you have probably figured out by now, publishing to third-party APIs isn't directly supported.
Of the choices AWS offers, Lambda, SQS, SNS and Kinesis would probably work best for you.
With Lambda you could directly forward the incoming message using one of RabbitMQ's APIs.
With SQS you would put it into an AWS queue first and then poll that queue, transferring the messages to RabbitMQ.
Kinesis would allow more sophisticated processing, but is probably too complex.
I suggest you program a Lambda in the programming language of your choice using one of the numerous RabbitMQ client libraries.
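As a rough illustration of the Lambda route (assuming a Java Lambda and the standard RabbitMQ Java client; the hostname and queue name are placeholders, and the Lambda must be able to reach the EC2 broker, e.g. by being attached to the same VPC):
```java
import java.util.Map;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class IotToRabbitHandler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // The IoT rule action "Invoke a Lambda function" delivers the MQTT
        // payload as the event; here we simply republish it to RabbitMQ.
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq.internal.example.com"); // placeholder EC2 hostname
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.queueDeclare("iot-ingest", true, false, false, null); // durable queue
            channel.basicPublish("", "iot-ingest", null, event.toString().getBytes());
            return "ok";
        } catch (Exception e) {
            throw new RuntimeException("failed to forward IoT message", e);
        }
    }
}
```
Opening a new connection per invocation is simple but not efficient; for real traffic you would hoist the connection out of the handler or batch through SQS as described above.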