Kafka cluster security for IoT - amazon-web-services

I am new to Kafka and want to deploy a production Kafka cluster for IoT. We will be receiving messages from Raspberry Pi devices over the internet to a Kafka cluster hosted on AWS.
Here is the concern: since we need to open the Kafka port to the public internet, we are exposing the system to threats; opening the port to the outside world compromises security.
Please let me know what can be done to prevent malicious access to the Kafka port over the internet.
Pardon me if the question is unclear; let me know if rephrasing is needed.

Consider using a REST Proxy in front of your Kafka brokers (such as the one from Confluent). Then you can secure your Kafka cluster just as you would secure any REST API exposed to the public internet. This architecture is proven in production for several very large IoT use cases.
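To make the device side concrete, here is a hedged sketch that builds (but does not send) a produce request against a hypothetical Confluent REST Proxy endpoint, using only the Python standard library. The URL, credentials, topic name, and payload are all illustrative; a real deployment would authenticate however your API gateway requires.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint and credentials -- substitute your own.
REST_PROXY_URL = "https://rest-proxy.example.com/topics/iot-telemetry"
USER, PASSWORD = "pi-device-01", "s3cret"

def build_produce_request(records):
    """Build (but do not send) a Confluent REST Proxy v2 produce request."""
    body = json.dumps({"records": [{"value": r} for r in records]}).encode()
    req = urllib.request.Request(REST_PROXY_URL, data=body, method="POST")
    # v2 JSON content type expected by the REST Proxy produce endpoint.
    req.add_header("Content-Type", "application/vnd.kafka.json.v2+json")
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_produce_request([{"sensor": "temp", "value": 21.5}])
```

The point of the architecture is exactly this: the device speaks plain HTTPS, and all the usual web-API controls (TLS, auth, rate limiting, WAF) sit between it and Kafka.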

There are two measures that are most effective for securing Kafka:
1. Implement SSL/TLS encryption.
2. Authenticate clients using SASL.
You can follow this guide: http://kafka.apache.org/documentation.html#security_sasl
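For reference, a broker-side sketch of what the linked guide configures (SASL/SCRAM over TLS); the hostnames, keystore paths, and passwords are illustrative and must be replaced with your own:

```properties
# server.properties -- illustrative values only
listeners=SASL_SSL://0.0.0.0:9093
advertised.listeners=SASL_SSL://kafka.example.com:9093
security.inter.broker.protocol=SASL_SSL
sasl.enabled.mechanisms=SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512
ssl.keystore.location=/etc/kafka/ssl/kafka.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/etc/kafka/ssl/kafka.truststore.jks
ssl.truststore.password=changeit
```

Clients then connect with security.protocol=SASL_SSL and matching SCRAM credentials created via kafka-configs.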

Related

Proxy in between device and Google IoT Core using MQTT?

I have a situation where I want to use Google IoT Core to support bi-directional communication between my devices and existing GCP stack. The trouble is, some of my devices cannot connect to GCP's MQTT bridge because they are blocked from reaching it directly. The communication must instead go through my own hosted server. In fact, some devices will not be allowed to trust traffic either inbound or outbound to anything but my own hosted server, and this is completely out of my control.
Basically all suggested solutions that I have found propose the use of MQTT over WebSockets. WebSockets consume too many system resources for the server I have available, and so MQTT proxy over WebSockets is extremely undesirable and likely is not even feasible for my use case. It also defeats the purpose of using a lightweight, low-bandwidth protocol like MQTT in the first place.
To make matters more complicated, Google IoT Core documentation explicitly says that it does not support bridging MQTT brokers with their MQTT bridge. So hosting my own MQTT server seems to be out of the question.
Is it even possible to create a proxy -- either forward or reverse -- for this use case that allows for native, encrypted, full-duplex MQTT traffic? If so, what would be the recommended way to achieve this?
If you have a hybrid set-up, meaning you have on-premise servers and a cloud server, and you want to bridge them to Google IoT Core using MQTT:
You can try the broker in this GitHub link; upon checking, that MQTT broker has been tested against Google IoT Core, since Google IoT Core does not support third-party MQTT brokers.
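If the devices merely need to reach the bridge through your server (rather than trust your server's own certificate), a plain TCP reverse proxy can pass the MQTT/TLS stream through unmodified. A sketch using nginx's stream module, with illustrative values; note this is passthrough, so the TLS session still terminates at Google's endpoint, and it does not help if your devices refuse to trust Google's certificate:

```nginx
# nginx.conf -- TCP-level passthrough of MQTT over TLS (port 8883)
stream {
    server {
        listen 8883;
        proxy_pass mqtt.googleapis.com:8883;
        proxy_connect_timeout 10s;
        proxy_timeout 300s;   # allow idle MQTT keepalive intervals
    }
}
```

Because nothing is decrypted, the proxy adds no per-connection protocol overhead, which fits the low-resource constraint better than a WebSocket bridge.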

Choosing AWS service for MQTT broker

I need to build an IoT MQTT broker that works over secure MQTT. I also need to manage the users that connect to this service and control their subscription access. I don't need MQTT over WebSockets.
At first glance I was planning to use the EC2 service to create an Ubuntu virtual machine and install Mosquitto on it. But later I found the Internet of Things section, which contains a set of services.
Is it possible to build an MQTT service matching my requirements using the Internet of Things services? By choosing them I hope to get more specialized functionality.
You can use AWS IoT for this instead; it provides a managed MQTT endpoint that you can add 'things' to.
https://docs.aws.amazon.com/iot/latest/developerguide/mqtt.html
You'll be able to easily connect the endpoint to other services, as it is part of their cloud offering.
https://docs.aws.amazon.com/iot/latest/developerguide/iot-gs.html
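Subscription access control in AWS IoT is done with IoT policies attached to each thing's certificate rather than with broker-level user accounts. A sketch restricting each device to its own topic subtree; the account ID, region, and topic naming are illustrative:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["iot:Connect"],
      "Resource": ["arn:aws:iot:us-east-1:123456789012:client/${iot:Connection.Thing.ThingName}"]
    },
    {
      "Effect": "Allow",
      "Action": ["iot:Subscribe", "iot:Receive"],
      "Resource": [
        "arn:aws:iot:us-east-1:123456789012:topicfilter/devices/${iot:Connection.Thing.ThingName}/*",
        "arn:aws:iot:us-east-1:123456789012:topic/devices/${iot:Connection.Thing.ThingName}/*"
      ]
    }
  ]
}
```

The `${iot:Connection.Thing.ThingName}` policy variable resolves per connection, so one policy document covers a whole fleet.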

How to receive messages/events from public internet in Kafka/Nifi cluster hosted in the private subnet of AWS?

I am working on a project where lots of machines/sensors will be sending messages directly to a Kafka/NiFi cluster. These machines/sensors will be pushing messages from the public internet, not from the corporate network. We are using a Hortonworks distribution on the AWS cloud.
My question is: what is the best architectural practice to set up a Kafka/NiFi cluster for such use cases? I don't want to put my cluster in a public subnet just to receive messages from the public internet.
Can you please help me with this?
Obviously you shouldn't expose your Kafka to the world; therefore "sensor data directly to Kafka" is the wrong approach, IMO, at least without some TLS channel.
You could allow a specific subnet of your external devices to reach the internal subnet, assuming you know that range. However, I think your better option is either MiNiFi or StreamSets SDC: event collectors that sit on the sensors and can encrypt traffic to an exposed NiFi or StreamSets cluster, which then forwards events to the internal Kafka cluster. You already have NiFi, apparently, and MiNiFi was built for exactly this purpose.
Another option is the Kafka REST Proxy, but you'll still need to set up authentication and security layers around it.
Use AWS IoT to receive the device communication; this option gives you a security layer and isolates your HDF sandbox from the internet.
AWS IoT Core provides mutual authentication and encryption at all points of connection, so that data is never exchanged between devices and AWS IoT Core without a proven identity.
Then import the data into your cluster with a NiFi processor.

Best architecture to deploy a TCP and UDP service on Amazon AWS (without EC2 instances)

I am trying to figure out the best way to deploy a TCP and UDP service on Amazon AWS.
I did some prior research on this question and could not find anything. I found other protocols like HTTP and MQTT, but no TCP or UDP.
I need to refactor a GPS tracking service currently running on Amazon EC2. The GPS devices send position data over UDP and TCP. Every time a message is received, the server has to respond with an ACKNOWLEDGE message, confirming reception to the GPS device.
The problem I am facing right now, and the motivation to refactor, is:
When traffic increases, the server cannot keep up with all the messages.
I tried to solve this with a load balancer and autoscaling, but UDP is not supported.
I was wondering if there is something like API Gateway that would give me a TCP or UDP endpoint, leave the message on an SQS queue, and process it with a Lambda function.
Thanks in advance!
Your question doesn't really make sense: you are asking how to run a service without running a server.
If you have reached the limits of a single instance and need to grow, look at using the AWS Network Load Balancer with an autoscaled group of EC2 instances. At the time of writing, NLB did not support UDP (UDP listeners have since been added); if you really need UDP, also look at third-party support in the AWS Marketplace.
Edit: Serverless architectures are designed for HTTP-based applications, where you send a request and get a response. Since your app is TCP-based and uses persistent connections, most existing serverless implementations simply won't support it. You will need to rewrite your app to use HTTP, or use traditional server-based infrastructure that can support persistent connections.
Edit #2: As of Dec. 2018, API Gateway supports WebSockets. This probably doesn't help with the original question, but it opens up alternatives if you need to run Lambda code behind a long-running connection.
If you want to go more serverless, the ECS container service can run tasks that accept TCP and UDP. Also take a look at running Docker containers with Kubernetes; I am not sure whether it supports those protocols, but I believe it does.
If not, EC2 instances behind load balancing are your best bet.
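For concreteness, the receive-then-ACK exchange the question describes can be sketched with Python's standard library alone. This is a single loopback round trip; a production server would loop over recvfrom (likely threaded or asyncio) behind a UDP-capable balancer:

```python
import socket

# Server socket: port 0 lets the OS pick a free port (loopback demo only).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
addr = server.getsockname()

# Simulated GPS device sends one position report.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"POS,12.97,77.59", addr)

# Server receives the datagram and acknowledges it back to the sender.
data, client_addr = server.recvfrom(1024)
server.sendto(b"ACK:" + data.split(b",")[0], client_addr)

# Device waits for the acknowledgement before sending the next report.
reply, _ = client.recvfrom(1024)

server.close()
client.close()
```

The key constraint this illustrates: the ACK must go back on the same UDP flow the report arrived on, which is why any balancer in front needs flow affinity rather than per-packet distribution.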

ELB for Websockets SSL

Does AWS support WebSockets with SSL?
Can AWS ELB be used for WebSockets over SSL?
What happens when an EC2 instance (machine) is added to or removed from this ELB? Especially removed: if a machine goes down, are the existing sockets routed to some other machine, or reset and reconnected?
Can ELB become a bottleneck at any point in time?
Any other alternatives? Let me know.
This link might prove partially helpful: it appears that you can do WebSockets over SSL, but I'm currently struggling to implement it.
StackOverflow - Websocket with Tomcat 7 on AWS Elastic Beanstalk
Currently AWS ELB doesn't support WebSocket balancing. There is a trick to do it via SSL, but it has limitations and depends on your app logic: if the WebSocket connection is used only for server-client communication, it will work. But if you have more advanced logic where clients must communicate with each other via the server, this solution won't work. For example, one client establishes a connection for a chatroom, then other clients connect to that chatroom and communicate with each other.
Then the only possible way is to use HAProxy: http://blog.haproxy.com/2012/11/07/websockets-load-balancing-with-haproxy/
But the example shown only covers configuring HAProxy with two fixed servers. If you do not use an Amazon Auto Scaling Group, that solution is fine; but if you need an ASG, keeping the HAProxy config in sync as instances are added and removed is a further challenge.
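A minimal sketch of the setup the linked post describes, with illustrative addresses and certificate path; the long tunnel timeout is what keeps idle WebSocket connections from being cut:

```haproxy
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    timeout tunnel 1h      # keep long-lived WebSocket tunnels open

frontend ws_in
    bind *:443 ssl crt /etc/haproxy/certs/site.pem
    default_backend ws_servers

backend ws_servers
    balance source         # pin a client to the same backend across reconnects
    server app1 10.0.1.10:8080 check
    server app2 10.0.1.11:8080 check
```

Source-IP affinity matters here for the chatroom-style logic above; with an ASG you would additionally need something to rewrite the server lines as instances come and go.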