Kafka consumer setup behind a proxy on-prem; producer is in AWS

We have a Kafka setup in AWS, and we publish messages from the producer there. We want to consume those messages with a Kafka consumer on on-prem boxes that access the internet through a proxy. Is there any setting in the Kafka consumer where we can supply the proxy details?
Note: We are able to connect and get messages from a local box (Kafka consumer) that has direct access to the internet (no proxy).

You need to set advertised.listeners to the external IP so that clients can connect to it correctly. Otherwise they'll try to connect to the internal IP, since advertised.listeners defaults to the value of listeners unless explicitly set.
Ref: https://kafka.apache.org/documentation/#brokerconfigs
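For illustration, a broker-side server.properties sketch (the hostname below is a placeholder; use whatever public address your on-prem boxes can actually reach):

```properties
# Bind on all interfaces inside AWS
listeners=PLAINTEXT://0.0.0.0:9092
# Address handed to clients in metadata responses; must be reachable
# from the consumer's network (hypothetical public DNS name here)
advertised.listeners=PLAINTEXT://ec2-public-dns.example.com:9092
```

With this in place, the consumer's bootstrap.servers should point at the advertised address, not the broker's internal IP.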

Related

Notify all EC2 instances running in ASG

I have a microservice application with multiple instances running in an ASG. Each instance maintains some internal state and exposes Actuator endpoints to refresh that state. I also have some applications running on-prem. The scenario: on some event, I want to call the Actuator endpoints of the applications running in AWS so that they refresh their state. The problem is that a call to the load-balanced URL goes to only one instance. So I'm considering the solutions below.
Use SQS and let the on-prem app publish and the AWS app consume the message. But here too, only one instance will receive the message.
Use SNS, but listeners are HTTP(S)-based, so the URL would remain the same and I think only one instance would receive the message. (AFAIK)
Any other solution? Please suggest.
Thanks
Use SNS but listeners are http/s based so URL would remain same so I think only one instance would receive the message. (AFAIK)
When using SNS each server would subscribe to the SNS topic, and when each server subscribes it would provide SNS with its direct HTTP(s) URL (not the load balancer URL). When SNS receives a message it would send it to each server that is currently subscribed. I'm not sure SNS will submit the request to the actuator endpoint in the correct format that your application needs though.
There are likely several solutions you could consider, including ones that don't require a code change, such as establishing a VPN connection between your on-premises applications and the VPC that contains your ASGs. That would allow you to invoke each machine's refresh endpoint by its unique private IP address.
More simply, if you're using an AWS Classic ELB or ALB, then repeated calls to the load balancer URL should eventually hit each machine running your application, provided enough calls to the refresh endpoint are made.
That may not meet your use case, though, for example if you must strictly limit refresh calls to one per endpoint. You'd have to experiment with your software and the load balancer's round-robin behavior.
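The per-instance approach over a VPN could be sketched like this (a sketch, not a definitive implementation: it assumes boto3 with configured credentials, an ASG name, and the Spring Boot Actuator defaults of port 8080 and path /actuator/refresh, all of which are assumptions):

```python
import urllib.request

def refresh_urls(private_ips, port=8080, path="/actuator/refresh"):
    """Build one refresh URL per instance so every box gets the call."""
    return [f"http://{ip}:{port}{path}" for ip in private_ips]

def asg_private_ips(asg_name):
    """Look up the private IPs of every instance currently in the ASG."""
    import boto3  # assumption: boto3 is installed and credentials are configured
    asg = boto3.client("autoscaling")
    ec2 = boto3.client("ec2")
    groups = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    instance_ids = [i["InstanceId"]
                    for g in groups["AutoScalingGroups"]
                    for i in g["Instances"]]
    if not instance_ids:
        return []
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    return [inst["PrivateIpAddress"]
            for r in reservations
            for inst in r["Instances"]]

def refresh_all(asg_name):
    """POST to each instance's refresh endpoint directly, bypassing the LB."""
    for url in refresh_urls(asg_private_ips(asg_name)):
        urllib.request.urlopen(urllib.request.Request(url, method="POST"))
```

Called as refresh_all("my-asg") from the on-prem side once the VPN is up; every instance receives exactly one call, which avoids the round-robin guesswork entirely.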

How to process messages outside GCP in a Kafka server running on GCP

I have been trying to run a consumer in my local machine connecting to a Kafka server running inside GCP.
Kafka and ZooKeeper are running on the same GCP VM instance.
Step 1: Start Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
Step 2: Start Kafka
bin/kafka-server-start.sh config/server.properties
If I run a consumer inside the GCP VM instance it works fine:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
I verified the firewall rules, and from my local machine I can reach both the public IP and the port the Kafka server is running on.
I tested many options, changing Kafka's server.properties, for example:
advertised.host.name=public-ip
or
advertised.listeners=public-ip
Following the answer on connecting-kafka-running-on-ec2-machine-from-my-local-machine without success.
From the official documentation:
advertised.listeners
Listeners to publish to ZooKeeper for clients to use. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, the value for listeners will be used. Unlike listeners, it is not valid to advertise the 0.0.0.0 meta-address.
After testing many different options, this solution worked for me:
Setting up two listeners, one EXTERNAL with the public IP, and one INTERNAL with the private IP:
# Configure protocol map
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
# Use plaintext for inter-broker communication
inter.broker.listener.name=INTERNAL
# Specify that Kafka listeners should bind to all local interfaces
listeners=INTERNAL://0.0.0.0:9027,EXTERNAL://0.0.0.0:9093
# Separately, specify externally visible address
advertised.listeners=INTERNAL://localhost:9027,EXTERNAL://kafkabroker-n.mydomain.com:9093
Explanation:
In many scenarios, such as when deploying on AWS, the externally advertised addresses of the Kafka brokers in the cluster differ from the internal network interfaces that Kafka uses.
Also remember to set up a firewall rule exposing the port on the EXTERNAL listener in order to connect to it from an external machine.
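As a quick sanity check from the external machine, a small Python sketch (equivalent to what telnet verifies) can confirm that the firewall rule actually exposes the EXTERNAL port:

```python
import socket

def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established.

    This checks only what telnet checks: that the firewall lets the
    connection through. It says nothing about whether advertised.listeners
    is correct, which the Kafka client also depends on.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_reachable("kafkabroker-n.mydomain.com", 9093) should return True from the external machine before you bother debugging the Kafka client itself.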
Note: It's important to restrict access to authorized clients only. You can use network firewall rules to restrict access. This guidance applies to scenarios involving both RFC 1918 and public IPs; however, when using public IP addresses it's even more important to secure your Kafka endpoint, because anyone can access it.
Taken from Google Cloud's solutions documentation.

Web hook listener in AWS Lambda

I am writing a simple monitoring system for one of our existing production systems. The system being monitored is an SMPP gateway. The basic requirement is to send a message to the SMPP gateway at a given frequency and receive it back via a web hook, to ensure that the SMPP gateway is functioning as expected; otherwise email alarms are triggered.
This is the flow of my program:
Connect to SMPP gateway
Start a web hook listener on a new thread (server)
Send a test message
Listen for incoming web hooks and notify the parent thread via events
If message web hook was received, exit gracefully, else trigger email alarm.
I have implemented this system in AWS Lambda and assigned an elastic IP by placing the Lambda function inside a VPC. I am able to send the message to the SMPP gateway, and the gateway attempts to respond via web hook. But unfortunately, the server can't reach the web hook listener via the specified elastic IP. I searched around and found that one way to implement a web hook listener in AWS Lambda is an API Gateway trigger. That is of no use here, because it will not guarantee that the same Lambda instance which sent the message via SMPP will receive the web hook request.
So my question is, is it possible to run a web hook listener in AWS Lambda and receive requests via an attached elastic IP?
No, it is not possible to run a web hook listener in AWS Lambda and receive requests via an attached elastic IP.
Lambda functions inside a VPC make outbound requests to the Internet using an Elastic IP attached to a NAT Gateway, via an ENI associated with the container host. Neither the ENI nor the EIP are exclusively bound to one single Lambda invocation. Lambda functions are technically allowed to listen for inbound connections... but they will never arrive via the ENI, and the NAT Gateway is also specifically designed not to allow connections initiated from outside to make their way back in. So there are at least two layers of the design that prevent what you are attempting from being done in this way.

Limit port 8080 access to SNS

I am using SNS to connect to a Tomcat server on port 8080. The server runs on AWS/EC2. It is not a public server, I use it only to execute my code (triggered by the Notification delivered to it.)
How can I set up the inbound rules on my EC2 box so that only SNS can reach it? When I block the port, messages are not delivered. If I restrict it to the EC2 internal or external IP, the same. Apparently notifications are delivered from "somewhere" that is not documented, and/or it is not a fixed IP [range]?
[I know how to secure the Tomcat server itself, but it would be nice if random port-scans can't even get to the server. I do see a number of (so far unsuccessful) access attempts in the Tomcat log.]

Diagnosing Kafka Connection Problems

I have tried to build as much diagnostics into my Kafka connection setup as possible, but it still leads to mystery problems. In particular, the first thing I do is use the Kafka Admin Client to get the clusterId, because if this operation fails, nothing else is likely to succeed.
def getKafkaClusterId(describeClusterResult: DescribeClusterResult): Try[String] = {
  try {
    val clusterId = describeClusterResult.clusterId().get(futureTimeout.length / 2, futureTimeout.unit)
    Success(clusterId)
  } catch {
    case cause: Exception =>
      Failure(cause)
  }
}
In testing, this usually works and everything is fine. It generally fails only when the endpoint is unreachable somehow. It fails because the future times out, so I have no other diagnostics to go by. To test these problems, I usually telnet to the endpoint, for example:
$ telnet blah 9094
Trying blah...
Connected to blah.
Escape character is '^]'.
Connection closed by foreign host.
Generally if I can telnet to a Kafka broker, I can connect to Kafka from my server. So my questions are:
What does it mean if I can reach the Kafka brokers via telnet, but I cannot connect via the Kafka Admin Client?
What other diagnostic techniques are there to troubleshoot Kafka broker connection problems?
In this particular case, I am running Kafka on AWS, via a Docker Swarm, and trying to figure out why my server cannot connect successfully. I can see in the broker logs when I try to telnet in, so I know the brokers are reachable. But when my server tries to connect to any of 3 brokers, the logs are completely silent.
This is a good article that explains the steps that happen when you first connect to a Kafka broker:
https://community.hortonworks.com/articles/72429/how-kafka-producer-work-internally.html
If you can telnet to the bootstrap server then it is listening for client connections and requests.
However, clients don't know which brokers are the leaders for each partition of a topic, so the first request they send to a bootstrap server is always a metadata request to get the full list of topic metadata. The client uses the metadata response from the bootstrap server to learn where it can then open new connections to each of the Kafka brokers hosting the active leaders for each partition of the topic it is trying to produce to.
That is where your misconfigured broker problem comes into play. When you misconfigure the advertised.listeners port, the results of the first metadata request redirect the client to connect to unreachable IP addresses or hostnames. It's that second connection that is timing out, not the first one on the port you are telnetting into.
Another way to think of it: you have to configure a Kafka server to work properly both as a bootstrap server and as a regular pub/sub message broker, since it provides both services to clients. Yours are configured correctly as pub/sub servers but incorrectly as bootstrap servers, because the internal and external IP addresses differ in AWS (as they do in Docker containers, or behind a NAT or a proxy).
It might seem counterintuitive in small clusters, where your bootstrap servers are often the same brokers the client eventually connects to, but it is actually a very helpful architectural design that allows Kafka to scale and to fail over seamlessly without needing to provide a static list of 20 or more brokers in your bootstrap server list, or to maintain extra load balancers and health checks to know which broker to redirect client requests to.
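One way to see exactly which addresses that first metadata response is advertising is to dump the broker metadata from the client side. A sketch using the confluent-kafka Python client (an assumption; the bootstrap address is a placeholder):

```python
def advertised_addresses(cluster_metadata):
    """Flatten broker metadata into the host:port strings clients will dial."""
    return sorted(f"{b.host}:{b.port}" for b in cluster_metadata.brokers.values())

def dump_advertised(bootstrap):
    """Fetch cluster metadata and print what each broker advertises."""
    # assumption: the confluent-kafka package is installed
    from confluent_kafka.admin import AdminClient
    metadata = AdminClient({"bootstrap.servers": bootstrap}).list_topics(timeout=10)
    for addr in advertised_addresses(metadata):
        print(addr)
```

If dump_advertised("blah:9094") prints internal Docker or VPC addresses that your server cannot reach, that is the unreachable second connection, which would explain why the broker logs stay silent.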
If you do not configure listeners and advertised.listeners correctly, Kafka clients simply cannot connect: even though telnet shows the broker listening on the ports you've configured, the Kafka client library fails silently.
I consider this a defect in the Kafka design that leads to unnecessary confusion.
Sharing Anand Immannavar's answer from another question:
Along with ADVERTISED_HOST_NAME, you need to add ADVERTISED_LISTENERS to the container environment.
ADVERTISED_LISTENERS: the broker registers this value in ZooKeeper, and when the outside world wants to connect to your Kafka cluster, it connects over the network address you provide in the ADVERTISED_LISTENERS property.
example:
environment:
  - ADVERTISED_HOST_NAME=<Host IP>
  - ADVERTISED_LISTENERS=PLAINTEXT://<Host IP>:9092