Is there any way to send data from AWS Kinesis to Azure Event Hubs?

My company is doing a POC on some streaming data and one of the tasks is sending data from AWS Kinesis to Azure Event Hubs.
Has anyone tried to do something like this before?
I was thinking of a Lambda function listening to Kinesis Firehose and sending the data to Event Hubs, but I have no experience with Azure at all and I don't even know if this is possible.

Yes, this is very much possible.
An inter-cloud setup in which data is streamed between the two services can be achieved using AWS Kinesis and Azure Event Hubs.
You can stream data from Amazon Kinesis directly to Azure Event Hubs in real time, using a serverless model to process and transfer events without having to manage any application on an on-premises server.
You will need the connection string, SharedAccessKeyName, and SharedAccessKey from the Azure Event Hub in order to send data to it. Also, make sure the Event Hub can receive data from the IP address you are running the program from.
Refer to this third-party tutorial to accomplish the same.
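As a rough illustration, here is a minimal sketch of such a Lambda handler, assuming the function is wired to the Kinesis stream as an event source and that the azure-eventhub Python package is bundled with the deployment; the connection string and hub name below are placeholders.

```python
import base64
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder values; take these from the Event Hub's shared access policy.
CONNECTION_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<name>;SharedAccessKey=<key>"
EVENTHUB_NAME = "my-event-hub"

# Created outside the handler so warm Lambda invocations can reuse the client.
producer = EventHubProducerClient.from_connection_string(
    conn_str=CONNECTION_STR, eventhub_name=EVENTHUB_NAME
)

def lambda_handler(event, context):
    # Kinesis delivers records base64-encoded inside the Lambda event.
    batch = producer.create_batch()
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        batch.add(EventData(payload))
    producer.send_batch(batch)
```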

Related

How to create a topic in Amazon SQS/SNS

I have a process which publishes some data (JSON) onto a queue on AWS SQS. Another process reads from this queue. All this is working fine.
Now I want to create a topic which can be listened to by multiple processes, with the data delivered to all of them. For example, ActiveMQ and many other messaging servers have the capability to create a topic. I could not find any such thing on AWS. The closest I could find is AWS SNS.
From what I understand, AWS SNS allows multiple clients to subscribe to a topic. But the subscription types are Email, HTTP, SMS and so on ... This does not really serve my purpose. I want to receive JSON data in all my clients, just like with SQS.
Is that achievable? If so, how?
You can subscribe multiple SQS queues to a single SNS topic: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-subscribe-queue-sns-topic.html
Each message published to the topic will then be distributed to all of them.
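As a rough sketch of that fan-out in Python with boto3 (the topic and queue names here are made up, and AWS credentials are assumed to be configured):

```python
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Hypothetical names for illustration.
topic_arn = sns.create_topic(Name="orders-topic")["TopicArn"]
queue_url = sqs.create_queue(QueueName="orders-consumer-1")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Allow the topic to deliver messages to the queue.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
    }],
}
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})

# Subscribe the queue; raw delivery keeps the original JSON body intact.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    Attributes={"RawMessageDelivery": "true"},
)
```

Repeat the queue creation and subscription for each consuming process; every subscribed queue then receives its own copy of each published message.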
The other option is to use Kinesis - https://aws.amazon.com/kinesis/ but it is more difficult to set up. There, multiple clients can also read from the stream.
Amazon MQ is a managed ActiveMQ service. Maybe this will help with your needs?

Do I need SQS queues to store remote data in the Amazon Web Services (AWS) cloud?

My first question is, do I need SQS queues to receive my remote data, or can it go directly into an Amazon cloud storage solution like S3 or EC2?
Currently, my company uses a third-party vendor to gather and report on our remote data. By remote data, I mean data coming from our machines out in the wilderness. These data are uploaded a few times each day to Amazon Web Services SQS queues (set up by the third-party vendor), and then the third-party vendor polls the data from the queues, removing it and saving it in their own on-premises databases for one year only. This company only provides reporting services to us, so they don't need to store the data long-term.
Going forward, we want to own the data and store it permanently in Amazon Web Services (AWS). Then we want to use machine learning to monitor the data and report any potential problems with the machines.
To repeat my first question, do we need SQS queues to receive this data, or can it go directly into an Amazon cloud storage solution like S3 or EC2?
My second question is, can an SQS queue send data to two different places? That is, can the queue send the data to the third party vendor, and also to an Amazon Web Services database?
I am an analyst/data scientist, so I know how to use the data once it's in a database. I just don't know the best way of getting it into a database.
You don't really need to have a queue. However, whenever you push an item into a queue, a function can be triggered, and you can run your custom logic in it, whether you want to store the information in S3/EC2 or send it to any other HTTP service.
Your Lambda (function) can easily send the data on to any other third-party service.
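For instance, here is a minimal sketch of an SQS-triggered Lambda that archives each message to S3 and forwards a copy over HTTP; the bucket name and vendor endpoint are placeholders.

```python
import urllib.request
import boto3

s3 = boto3.client("s3")

BUCKET = "my-telemetry-archive"                   # placeholder bucket
VENDOR_URL = "https://vendor.example.com/ingest"  # placeholder endpoint

def lambda_handler(event, context):
    for record in event["Records"]:               # SQS delivers messages in batches
        body = record["body"]
        # Keep a permanent copy in S3, keyed by the SQS message id.
        s3.put_object(
            Bucket=BUCKET,
            Key=f"raw/{record['messageId']}.json",
            Body=body.encode("utf-8"),
        )
        # Forward the same payload to the third-party reporting service.
        req = urllib.request.Request(
            VENDOR_URL,
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=5)
```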

Pointing to Azure Event Hub instead of AWS Kinesis

My company currently uses Azure for our data warehousing infrastructure. In the past we have used Azure Event Hubs for streaming data. On previous projects this hasn't been an issue: we just provide the connection details and they start sending us data.
However, we have recently started working on a new project where most of their infrastructure is hosted on AWS, and we have been asked to set up an Amazon Kinesis endpoint instead, as they do not support Azure Event Hubs.
I don't know much about sending the data, but is it asking a lot to send to an Event Hub instead of Kinesis?
My suggestion is that you could introduce a middle layer which understands both Kinesis and Event Hubs. One such middle layer I know of is Spring Cloud Stream. It provides a binder abstraction that supports various messaging middleware such as Kafka, Kinesis and Event Hubs.

Where to run my Kinesis Producer?

I need to build a Kinesis Producer app that simply puts data into a Kinesis Stream. The app will need to connect to a remote host and maintain a TCP socket to which data will be pushed from the remote host. There is very little data transformation, so the producer application will be very simple... I know I could set up an EC2 instance for this, but if there's a better way I'd like to explore that.
Examples:
You can build a producer on AWS Lambda, but since I have to maintain a long-running TCP connection, that wouldn't work.
You can maintain a connection to a WebSocket with AWS IoT and invoke a Lambda function on each message, but my connection is just a standard TCP connection.
Question: What other products in the AWS suite of products that I could use to build a producer?
There are no suitable managed options here. If your task is to...
originate and maintain a persistent TCP connection to a third-party remote device that you don't control,
consume whatever payload comes down the pipe,
process/transform it, and
feed it to code that serves as a Kinesis producer
...then you need a server, because there is not a service that does all of these things. EC2 is the product you are looking for.
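To make the shape of that EC2 process concrete, here is a minimal sketch, assuming the remote host streams newline-delimited records over the socket; the host, port and stream name are placeholders.

```python
import socket
import boto3

kinesis = boto3.client("kinesis")

REMOTE_HOST = "device.example.com"   # placeholder remote endpoint
REMOTE_PORT = 9000                   # placeholder port
STREAM_NAME = "my-kinesis-stream"    # placeholder stream

def run():
    # Originate and hold the long-lived TCP connection.
    with socket.create_connection((REMOTE_HOST, REMOTE_PORT)) as sock:
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break                 # remote host closed the connection
            buffer += chunk
            # Assume one record per newline-terminated line.
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                if line:
                    kinesis.put_record(
                        StreamName=STREAM_NAME,
                        Data=line,
                        PartitionKey=REMOTE_HOST,
                    )

if __name__ == "__main__":
    run()
```

In practice you would add reconnect logic and possibly batch records with PutRecords, but the overall shape stays the same.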
The Producer code typically runs on the thing that is the source of the information you wish to capture.
For example:
When capturing network events, the Producer should be the networking equipment that is monitoring traffic.
When capturing retail purchases, the Producer is the system processing the transactions.
When capturing earth tremors, the Producer is the equipment that is monitoring vibrations.
In your case, the remote host should be the Producer, which sends the data to Kinesis. Rather than having the remote host push data to a Lambda function, simply have the remote host push directly to Kinesis.
Update
You mention Kinesis Agent:
Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Firehose.
If you are using Amazon Kinesis Firehose, then the Kinesis Agent can be your Producer. It sends the data to the Firehose. Or, you can write your own Producer for Firehose.
From Writing to a Kinesis Firehose Delivery Stream Using the AWS SDK:
You can use the Kinesis Firehose API to send data to a Kinesis Firehose delivery stream using the AWS SDK for Java, .NET, Node.js, Python, or Ruby.
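In Python, for example, a minimal Firehose producer could look like this (the delivery stream name and the payload are placeholders):

```python
import json
import boto3

firehose = boto3.client("firehose")

# Placeholder record and delivery stream name.
record = {"sensor_id": "pump-17", "temperature_c": 71.3}

firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```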
If you are using Amazon Kinesis Streams, you will need to write your own Producer. From Producers for Amazon Kinesis Streams:
A producer puts data records into Kinesis streams. For example, a web server sending log data to a Kinesis stream is a producer.
So, a Producer is just the term applied to whatever sends the data into Kinesis, and it is retrieved by a Consumer.
A couple of options:
You may be able to use IoT with a kinesis action for your remote host to push into a kinesis stream. In this case your remote app would be a device that talks directly to the AWS IoT infrastructure. You'd then setup a rule to forward all of the messages to a kinesis stream for processing. See https://aws.amazon.com/iot-platform/how-it-works/.
A benefit of this is that you no longer have to host a producer app anywhere. But you would need to be able to modify the app running on the remote host. (A minimal sketch of the AWS-side rule follows this list.)
You don't have to use the Kinesis Producer Library (KPL), your data source could simply make repeated calls to PutRecord or PutRecords. Again this would require modifications to the remote app.
Or as you know, you could run your KPL app on an EC2 instance and talk to it over the network. This may give you more control over how the thing runs and would require fewer modifications to the remote app. But you now have a greater DevOps burden.
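To illustrate the first option, here is a rough sketch of creating an IoT topic rule that forwards every message published on a device topic into a Kinesis stream; the rule name, topic filter, stream name and role ARN are all placeholders.

```python
import boto3

iot = boto3.client("iot")

# Placeholder names; the role must allow kinesis:PutRecord on the stream.
iot.create_topic_rule(
    ruleName="forward_device_data_to_kinesis",
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/+/data'",
        "actions": [{
            "kinesis": {
                "roleArn": "arn:aws:iam::123456789012:role/iot-to-kinesis",
                "streamName": "my-kinesis-stream",
                "partitionKey": "${topic()}",
            }
        }],
        "ruleDisabled": False,
    },
)
```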

AWS IoT: How to use an application service on EC2?

I'd like to use AWS IoT to manage a grid of devices. Data from each device must be sent to a queue service (RabbitMQ) hosted on an EC2 instance that is the starting point for a real-time control application. I read how to make a rule to write data to other services here.
However, there isn't an example for EC2. Using the AWS IoT service, how can I connect to a service on EC2?
Edit:
I have a real-time application developed with Storm that consumes data from RabbitMQ and puts the results of its computation into another RabbitMQ queue. RabbitMQ and Storm are on EC2. I have devices producing data and connected to IoT. Data produced by the devices must be redirected to the queue on EC2 that is the starting point of my application.
I'm sorry if I was not clear.
AWS IoT supports pushing data directly to other AWS services. As you have probably figured out by now, publishing to third-party APIs isn't directly supported.
Of the choices AWS offers, Lambda, SQS, SNS and Kinesis would probably work best for you.
With Lambda, you could directly forward the incoming message using one of RabbitMQ's client libraries.
With SQS, you would put it into an AWS queue first and then poll this queue, transferring the messages to RabbitMQ.
Kinesis would allow more sophisticated processing, but is probably too complex.
I suggest you program a Lambda in the programming language of your choice, using one of the numerous RabbitMQ client libraries.
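In Python, for example, such a Lambda could use the pika client. This is only a sketch: the broker address, credentials and queue name are placeholders, pika must be packaged with the deployment, and the function needs network access to the EC2 instance (for example by running in the same VPC).

```python
import json
import pika

# Placeholder broker details for the RabbitMQ instance on EC2.
RABBITMQ_HOST = "10.0.1.25"
QUEUE_NAME = "device-data"
CREDENTIALS = pika.PlainCredentials("iot_forwarder", "secret")

def lambda_handler(event, context):
    # 'event' is the device message selected by the IoT topic rule.
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=RABBITMQ_HOST, credentials=CREDENTIALS)
    )
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME, durable=True)
    channel.basic_publish(
        exchange="",
        routing_key=QUEUE_NAME,
        body=json.dumps(event),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )
    connection.close()
```

An IoT topic rule with a Lambda action would then invoke this function for each message the devices publish.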