How to receive messages/events from the public internet into a Kafka/NiFi cluster hosted in a private subnet of AWS?

I am working on a project where lots of machines/sensors will be sending messages to a Kafka/NiFi cluster directly. These machines/sensors will be pushing messages from the public internet, not from the corporate network. We are using a Hortonworks distribution on the AWS cloud.
My question is: what is the best architectural practice for setting up a Kafka/NiFi cluster for such use cases? I don't want to put my cluster in a public subnet just to receive messages from the public internet.
Can you please help me with this?

Obviously you shouldn't expose your Kafka cluster to the world, so "sensor data directly to Kafka" is the wrong approach, IMO, at least not without some SSL/TLS channel.
You could allow a specific subnet of your external devices to reach the internal subnet, assuming you know that range. However, I think your better option here is to use either MiNiFi or StreamSets SDC, which are event collectors that sit on the sensors and can encrypt traffic to an exposed NiFi or StreamSets cluster, which can then forward events to the internal Kafka cluster. You already have NiFi, and MiNiFi was built for exactly this purpose.
Another option could be the Kafka REST proxy, but you'll still need to set up authentication/security layers around it.
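For illustration, here is a minimal sketch of what a device-side call to a REST proxy might look like, assuming Confluent's REST Proxy v2 API and Java 11's built-in HTTP client; the proxy hostname and the sensor-events topic are placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SensorPublisher {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            // One record wrapped in the REST proxy's v2 JSON envelope.
            String body = "{\"records\":[{\"value\":{\"deviceId\":\"sensor-1\",\"temperature\":21.5}}]}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://rest-proxy.example.com/topics/sensor-events"))
                    .header("Content-Type", "application/vnd.kafka.json.v2+json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

Whatever authentication layer you put in front (API keys, mTLS, a gateway) would then be added to this request.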

Use AWS IoT to receive the devices' communication; this option gives you a security layer and isolates your HDF sandbox from the internet.
AWS IoT Core provides mutual authentication and encryption at all points of connection, so that data is never exchanged between devices and AWS IoT Core without a proven identity.
Then import the information with a NiFi processor.
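As a rough sketch of the device side, a publish over AWS IoT's mutually authenticated MQTT channel could look like the following with the AWS IoT Device SDK for Java; the endpoint, client id, topic and keystore details are all placeholders for your own provisioning:

    import java.io.FileInputStream;
    import java.security.KeyStore;
    import com.amazonaws.services.iot.client.AWSIotMqttClient;
    import com.amazonaws.services.iot.client.AWSIotQos;

    public class DevicePublisher {
        public static void main(String[] args) throws Exception {
            // X.509 device certificate + private key, packaged as a keystore.
            KeyStore keyStore = KeyStore.getInstance("PKCS12");
            try (FileInputStream in = new FileInputStream("device.p12")) {
                keyStore.load(in, "keystorePassword".toCharArray());
            }

            AWSIotMqttClient client = new AWSIotMqttClient(
                    "xxxxxxxx-ats.iot.us-east-1.amazonaws.com", // account endpoint (placeholder)
                    "sensor-1",                                  // MQTT client id
                    keyStore, "keystorePassword");

            client.connect();
            client.publish("sensors/sensor-1/telemetry",
                    AWSIotQos.QOS1, "{\"temperature\":21.5}");
            client.disconnect();
        }
    }

On the NiFi side, an IoT rule can route the messages into a service NiFi can read from (e.g. Kinesis or S3), or NiFi's ConsumeMQTT processor can subscribe directly where that fits your setup.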

Related

How to connect to Kafka in Kubernetes from outside?

The challenge is to create a Kafka producer that connects, from outside the cluster, to a Kafka cluster that lives within a Kubernetes cluster. We have several RDBMS databases that sit on premises, and we want to stream data directly to Kafka running in Kubernetes on AWS. We have tried a few things and deployed the Confluent Open Source Platform, but nothing has worked so far. Does anyone have a clear answer to this problem?
You might have a look at deploying Kafka Connect inside of Kubernetes. Since you want to replicate data from various RDBMS databases, you need to set up source connectors.
A source connector ingests entire databases and streams table updates to Kafka topics. It can also collect metrics from all of your application servers into Kafka topics, making the data available for stream processing with low latency.
Depending on your source databases, you'd have to configure the corresponding connectors.
If you are not familiar with Kafka Connect, this article might be quite helpful as it explains the key concepts.
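As a hedged sketch of how that looks in practice: connectors are registered by POSTing their JSON config to the Connect REST API (port 8083 by default). The example below assumes Confluent's JDBC source connector and uses placeholder names and connection details:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RegisterConnector {
        public static void main(String[] args) throws Exception {
            // Connector name, connection URL and topic prefix are placeholders.
            String config = """
                    {
                      "name": "mysql-source",
                      "config": {
                        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                        "connection.url": "jdbc:mysql://onprem-db:3306/inventory",
                        "connection.user": "connect",
                        "connection.password": "secret",
                        "mode": "incrementing",
                        "incrementing.column.name": "id",
                        "topic.prefix": "inventory-"
                      }
                    }""";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://connect.example.com:8083/connectors"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(config))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

Each table then streams into its own topic under the configured prefix.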
Kafka clients need to connect to a specific node to produce or consume messages. The Kafka protocol lets a client connect to any node to fetch metadata; the client then connects to the specific node that has been elected leader of the partition it wants to produce to or consume from.
Each Kafka pod has to be individually accessible, so you need an L4 load balancer per pod. The advertised listener config can be set in the Kafka config to advertise different IPs/hostnames to internal and external clients. Configure the EXTERNAL advertised listener to use the load balancer, and the INTERNAL one to use the pod IP. The ports have to be different for internal and external listeners.
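To make those mechanics concrete, here is a minimal producer sketch as seen from outside the cluster; the load balancer hostname, port and topic are placeholders. Only the bootstrap address is configured by you; every subsequent connection goes to whatever each broker advertises on its EXTERNAL listener (the per-pod load balancers):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ExternalProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Any reachable broker address works for bootstrapping; after the
            // metadata fetch the client follows the advertised listeners.
            props.put("bootstrap.servers", "kafka-0-lb.example.com:9094");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "key-1", "hello"));
            }
        }
    }

If the advertised EXTERNAL listener points at an address the client cannot reach (a pod IP or a headless service), the bootstrap succeeds but every produce/consume fails, which is exactly the failure mode described in the update below.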
Check out https://strimzi.io/, https://bitnami.com/stack/kafka, and https://github.com/confluentinc/cp-helm-charts
Update: I tried installing Kafka in k8s running on AWS EC2. Between confluent-operator, bitnami-kafka and strimzi, only strimzi configured the EXTERNAL listener in the Kafka settings to point at the load balancer. bitnami-kafka used a headless service, which is not useful outside the k8s network. confluent-operator configures the node's IP, which makes Kafka accessible outside k8s, but only to clients that can reach the EC2 instance via its private IP.

Is HTTPS->HTTP behind load balancer considered secure?

I have a secure web API in the AWS cloud and I'm trying to figure out the best way to put it behind a load balancer without compromising security.
Right now, all communications are encrypted end-to-end. The API server has a Let's Encrypt certificate, which is used to encrypt all messages exchanged with clients. Unless the encryption is broken, nobody besides the server and its clients can view the raw contents of messages.
If I start using a load balancer and allow multiple instances of my server to run concurrently, I'll have to give up on Let's Encrypt and use centralized certificate management (e.g. ACM). AWS conveniently supports attaching ACM-generated certificates to load balancer HTTPS listeners, which is especially useful for automatic renewal. However, the load balancer would then terminate the encryption layer, and all communication with the instances of my server would be decrypted from that point on.
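For reference, attaching an ACM certificate to an HTTPS listener can also be done programmatically; this is a minimal sketch using the AWS SDK for Java v2, with all ARNs as placeholders for your own resources:

    import software.amazon.awssdk.services.elasticloadbalancingv2.ElasticLoadBalancingV2Client;
    import software.amazon.awssdk.services.elasticloadbalancingv2.model.Action;
    import software.amazon.awssdk.services.elasticloadbalancingv2.model.ActionTypeEnum;
    import software.amazon.awssdk.services.elasticloadbalancingv2.model.Certificate;
    import software.amazon.awssdk.services.elasticloadbalancingv2.model.CreateListenerRequest;
    import software.amazon.awssdk.services.elasticloadbalancingv2.model.ProtocolEnum;

    public class AttachAcmCertificate {
        public static void main(String[] args) {
            try (ElasticLoadBalancingV2Client elb = ElasticLoadBalancingV2Client.create()) {
                // HTTPS listener on 443, terminating TLS with the ACM cert
                // and forwarding to the target group of server instances.
                elb.createListener(CreateListenerRequest.builder()
                        .loadBalancerArn("arn:aws:elasticloadbalancing:region:acct:loadbalancer/app/my-alb/abc")
                        .protocol(ProtocolEnum.HTTPS)
                        .port(443)
                        .certificates(Certificate.builder()
                                .certificateArn("arn:aws:acm:region:acct:certificate/xyz")
                                .build())
                        .defaultActions(Action.builder()
                                .type(ActionTypeEnum.FORWARD)
                                .targetGroupArn("arn:aws:elasticloadbalancing:region:acct:targetgroup/my-api/def")
                                .build())
                        .build());
            }
        }
    }

Note that the target group's own protocol can be HTTPS instead of HTTP if you want the load balancer to re-encrypt traffic to the instances.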
I'm not too comfortable having my raw data traveling unencrypted inside a public cloud. Still, I'd welcome a second opinion on this.
My question therefore is: is it considered secure to have the load balancer strip the HTTPS encryption layer and forward all traffic as plain HTTP to internal server instances?
Since I can guess the answer, I would appreciate any suggestions on how to deploy load balancing securely.
I consider it secure because each AWS VPC is isolated from the others: the traffic of one VPC cannot be captured in another VPC. Of course, whether AWS's VPC technology is itself secure is a separate question, as others have said.
Also check out the Elastic Load Balancing documentation about end-to-end encryption. It says that:
Terminating secure connections at the load balancer and using HTTP on the backend may be sufficient for your application. Network traffic between AWS resources cannot be listened to by instances that are not part of the connection, even if they are running under the same account.

Amazon GuardDuty for my Spring Boot application running on AWS

I have a Spring Boot application running on an EC2 instance in AWS. It basically exposes REST endpoints and APIs for other applications. Now I want to improve the security measures for my app, such as preventing DDoS attacks, blocking requests from malicious hosts, and using our own certificates for communication. I came across Amazon GuardDuty, but I don't understand how it will help secure my app, or what the alternatives are. Any suggestions and guidelines are welcome.
Amazon GuardDuty is simply a security monitoring tool, akin to an Intrusion Detection System you might run in a traditional data center. It analyzes logs generated by AWS (CloudTrail, VPC Flow Logs, etc.) and compares them with threat feeds, and it also uses machine learning to discover anomalies. It will alert you to traffic from known malicious hosts, but it will not block them; to do that you would need AWS Web Application Firewall or a 3rd-party network appliance.
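For completeness, turning GuardDuty on is a single API call per account and region; a minimal sketch with the AWS SDK for Java v2:

    import software.amazon.awssdk.services.guardduty.GuardDutyClient;
    import software.amazon.awssdk.services.guardduty.model.CreateDetectorRequest;
    import software.amazon.awssdk.services.guardduty.model.CreateDetectorResponse;

    public class EnableGuardDuty {
        public static void main(String[] args) {
            try (GuardDutyClient guardDuty = GuardDutyClient.create()) {
                // One detector per account per region; this just turns monitoring on.
                CreateDetectorResponse response = guardDuty.createDetector(
                        CreateDetectorRequest.builder().enable(true).build());
                System.out.println("Detector id: " + response.detectorId());
            }
        }
    }

Findings then surface in the GuardDuty console and via CloudWatch Events, but acting on them (blocking, for example) remains your job.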
You get some DDoS protection just by using AWS: all workloads running in AWS are protected against network and transport layer attacks by AWS Shield Standard. If you are also using CloudFront and Route 53, you get comprehensive protection against all known infrastructure (layer 3 and 4) attacks.
You should be able to use your own certificates in AWS in a similar manner to how you would use them anywhere else.

How to set up Tomcat session state in AWS EC2 for failover and security

I am setting up a Tomcat application in EC2. For reliability, I am running two or more instances. If one server goes down, my users should be redirected to the other instance. This suggests that session state should be kept in an external source, or mirrored between the servers.
AWS offers a hosted service, ElastiCache, which seems like it would work well. I even found a nice library, memcached-session-manager. However, I soon ran into some issues.
Unless someone can convince me otherwise, I need the session state to be encrypted in transit. Otherwise someone could intercept the network traffic and impersonate another user on my site. I don't see any built-in Amazon method to keep this traffic off the internet. (Is peering available here?)
The library mentioned earlier does have Redis support with SSL, but it does not support a Redis cluster. Someone submitted a pull request for this, but it has not been merged, and the library is a complex build. I may talk myself into living without the cluster, but that puts us back at a single point of failure.
Tomcat is running on EC2 in your VPC, and ElastiCache is in your VPC. An AWS VPC is an isolated network: nobody can intercept the traffic between the EC2 and ElastiCache servers unless your VPC network is compromised in some way.
If you want to use Redis instead, with SSL connections, then I believe at this time you would need a Tomcat session manager implementation that uses Jedis. This one uses Jedis, but you would need to upgrade the version of Jedis it uses in order to get SSL connections.
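If you do go the Jedis route, SSL is straightforward at the client level; here is a minimal sketch, assuming an ElastiCache for Redis endpoint with in-transit encryption enabled (the hostname is a placeholder):

    import redis.clients.jedis.Jedis;

    public class SessionStoreProbe {
        public static void main(String[] args) {
            // Third constructor argument enables SSL; ElastiCache with
            // in-transit encryption serves TLS on the standard port.
            try (Jedis jedis = new Jedis("my-redis.abc123.use1.cache.amazonaws.com", 6379, true)) {
                jedis.set("session:test", "ok");
                System.out.println(jedis.get("session:test"));
            }
        }
    }

A session manager built on Jedis would open its connections the same way once its Jedis dependency is new enough to expose the SSL flag.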

Kafka cluster security for IoT

I am new to Kafka and want to deploy a production Kafka cluster for IoT. We will be receiving messages from Raspberry Pi devices over the internet to our Kafka cluster, which we will be hosting on AWS.
Now the concern: since we need to open the Kafka port to the public internet, we are exposing the system to threats and compromising security.
Please let me know what can be done to prevent malicious access via the Kafka port over the internet.
Pardon me if I am not clear with the question; do let me know if rephrasing is needed.
Consider using a REST Proxy in front of your Kafka brokers (such as the one from Confluent). Then you can secure your Kafka cluster just as you would secure any REST API exposed to the public internet. This architecture is proven in production for several very large IoT use cases.
There are two measures that are most effective for Kafka security:
Implement SSL/TLS encryption for Kafka.
Implement authentication using SASL.
You can follow this guide: http://kafka.apache.org/documentation.html#security_sasl
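To make that concrete, here is a minimal client-side sketch of a producer using SASL_SSL with SCRAM; the broker address, credentials and topic are placeholders, and the brokers must expose a matching SASL_SSL listener configured per the guide above:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SecurePiProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka.example.com:9093"); // TLS listener (placeholder)
            props.put("security.protocol", "SASL_SSL");               // encrypt and authenticate
            props.put("sasl.mechanism", "SCRAM-SHA-256");
            props.put("sasl.jaas.config",
                    "org.apache.kafka.common.security.scram.ScramLoginModule required "
                    + "username=\"raspberry-pi\" password=\"device-secret\";");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("iot-events", "pi-1", "{\"temp\":21.5}"));
            }
        }
    }

With this in place, an attacker reaching the open port still cannot read traffic or produce messages without valid credentials, though per-device credential management is then the operational cost.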