How to process messages outside GCP in a Kafka server running on GCP - google-cloud-platform

I have been trying to run a consumer on my local machine that connects to a Kafka server running inside GCP.
Kafka and ZooKeeper are running on the same GCP VM instance.
Step 1: Start Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
Step 2: Start Kafka
bin/kafka-server-start.sh config/server.properties
If I run a consumer inside the GCP VM instance it works fine:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
I verified the firewall rules, and I do have access from my local machine: I can reach both the public IP and the port the Kafka server is running on.
I tried many options by changing Kafka's server.properties, for example:
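Reaching the port is a necessary first check but not a sufficient one: the bootstrap connection only returns metadata, and the consumer then connects to whatever address the broker advertises. A minimal Python 3 sketch of the reachability check, using only the standard library (the host and port you pass in are placeholders):

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves the bootstrap port is open. It says nothing about
    the addresses the broker advertises in its metadata afterwards.
    """
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

If port_is_reachable("<public-ip>", 9092) returns True while the console consumer still hangs, that points at the advertised address rather than the firewall.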
advertised.host.name=public-ip
or
advertised.listeners=public-ip
I also followed the answer to connecting-kafka-running-on-ec2-machine-from-my-local-machine, without success.

From the official documentation:
advertised.listeners
Listeners to publish to ZooKeeper for clients to use. In IaaS environments, this may
need to be different from the interface to which the broker binds. If
this is not set, the value for listeners will be used. Unlike
listeners it is not valid to advertise the 0.0.0.0 meta-address.
After testing many different options, this solution worked for me:
Setting up two listeners, one EXTERNAL with the public IP, and one INTERNAL with the private IP:
# Configure protocol map
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
# Use plaintext for inter-broker communication
inter.broker.listener.name=INTERNAL
# Specify that Kafka listeners should bind to all local interfaces
listeners=INTERNAL://0.0.0.0:9027,EXTERNAL://0.0.0.0:9037
# Separately, specify externally visible address
advertised.listeners=INTERNAL://localhost:9027,EXTERNAL://kafkabroker-n.mydomain.com:9037
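A small sketch for sanity-checking a listener setup like the one above before restarting the broker. It is not a Kafka tool, just a few consistency rules written in Python under my own assumptions: every listener name needs a protocol-map entry, every advertised name must also be bound, 0.0.0.0 must not be advertised (per the documentation quoted above), and an advertised port that differs from the bound port is flagged, because clients would dial that port unless something such as NAT or a load balancer remaps it.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_listeners(value):
    """Turn 'NAME://host:port,...' into {NAME: (host, port)}."""
    listeners = {}
    for entry in value.split(","):
        name, _, host_port = entry.strip().partition("://")
        host, _, port = host_port.rpartition(":")
        listeners[name] = (host.strip("[]"), int(port))
    return listeners


def check_listener_config(props):
    """Return a list of human-readable problems (empty list = looks consistent)."""
    problems = []
    protocol_map = dict(
        pair.split(":", 1)
        for pair in props.get("listener.security.protocol.map", "").split(",")
        if pair
    )
    bound = parse_listeners(props["listeners"])
    # Per the docs, listeners is used when advertised.listeners is not set.
    advertised = parse_listeners(props.get("advertised.listeners", props["listeners"]))

    for name in sorted(set(bound) | set(advertised)):
        if name not in protocol_map:
            problems.append(f"{name}: not in listener.security.protocol.map")
    for name, (host, port) in advertised.items():
        if host == "0.0.0.0":
            problems.append(f"{name}: 0.0.0.0 cannot be advertised")
        if name not in bound:
            problems.append(f"{name}: advertised but not bound in listeners")
        elif port != bound[name][1]:
            # Only fine if NAT or a load balancer forwards the advertised port.
            problems.append(f"{name}: advertised port {port} != bound port {bound[name][1]}")
    inter = props.get("inter.broker.listener.name")
    if inter and inter not in bound:
        problems.append(f"inter.broker.listener.name {inter} is not a listener")
    return problems
```

Feed it the text of server.properties through parse_properties; any entry in the returned list points at a mismatch between listeners, advertised.listeners and the protocol map.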
Explanation:
In many scenarios, such as when deploying on AWS, the externally
advertised addresses of the Kafka brokers in the cluster differ from
the internal network interfaces that Kafka uses.
Also remember to set up a firewall rule that exposes the port of the EXTERNAL listener, in order to connect to it from an external machine.
Note: It's important to restrict access to authorized clients only.
You can use network firewall rules to restrict access. This guidance
applies to scenarios that involve both RFC 1918 and public IP;
however, when using public IP addresses, it's even more important to
secure your Kafka endpoint because anyone can access it.
Taken from Google Cloud solutions.

Related

How can 2 services talk to each other on AWS Fargate?

I set up a Fargate cluster on AWS. My cluster has the following services:
server-A (port 3000)
server-B (port 4000)
Both services are in the same VPC and have the same security group (any port, any source, any destination). The VPC is isolated from the internet.
Now, I want server-A to send an HTTP request to server-B. I assumed that, as in Docker Swarm, there is a private DNS that maps the service name to its private IP, and that it would be as simple as sending the request to http://server-B:4000. However, server-A gets a timeout, which means it can't reach server-B.
I've read in the documentation that I can put the 2 containers in the same service, each container listening on a different port, so that, thanks to the loopback interface, I could query http://127.0.0.1:4000 from server-A and server-B would respond, and vice versa.
However, I want to be able to scale server-A and server-B independently, so I think it makes sense to keep the servers independent of each other by having 2 services.
I've read that for 2 tasks to talk to each other, I need to set up a load balancer. Coming from the world of Docker Swarm, it was easy to query services by their service name; behind the scenes, the request was forwarded to one of the containers in that service. But it doesn't seem to work like that on AWS Fargate.
Questions:
How can server-A talk to server-B?
Services are sometimes redeployed and their private IPs change, so it makes no sense to query by IP; querying by hostname seems the most natural way.
Do I need to set up some kind of internal DNS?
Thanks for your help, I am really lost with this simple setup.
After searching, I found out that the cause was that I had not enabled "Service Discovery" when creating the service, so no private DNS records were created. This documentation explains the exact steps:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service-discovery.html
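With Service Discovery enabled, ECS registers each task's private IP in AWS Cloud Map, so server-A can call server-B by a DNS name made of the discovery service name and the private namespace you pick when creating the service. A Python sketch of building that URL and resolving it; "server-b" and the namespace "local" are hypothetical, so use whatever you configured. Resolving on each call, rather than caching the IP once at startup, matters because task IPs change on redeploy.

```python
import socket


def discovery_hostname(service: str, namespace: str) -> str:
    """DNS name of the form <discovery service name>.<namespace>."""
    return f"{service}.{namespace}"


def service_url(service: str, namespace: str, port: int, path: str = "/") -> str:
    return f"http://{discovery_hostname(service, namespace)}:{port}{path}"


def current_task_ips(hostname: str, port: int) -> list:
    """Resolve the name now; call this per request instead of caching the result."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # info[4] is the socket address tuple; its first element is the IP string.
    return sorted({info[4][0] for info in infos})
```

From server-A this would be used as service_url("server-b", "local", 4000), which yields http://server-b.local:4000/.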

How to connect to Kafka in Kubernetes from outside?

The challenge is to create a Kafka producer that connects, from outside, to a Kafka cluster that lives within a Kubernetes cluster. We have several RDBMS databases on premises, and we want to stream data directly to Kafka, which lives in Kubernetes on AWS. We have tried a few things and deployed the Confluent Open Source Platform, but nothing has worked so far. Does anyone have a clear answer to this problem?
You might have a look at deploying Kafka Connect inside Kubernetes. Since you want to replicate data from various RDBMS databases, you need to set up source connectors:
A source connector ingests entire databases and streams table updates
to Kafka topics. It can also collect metrics from all of your
application servers into Kafka topics, making the data available for
stream processing with low latency.
Depending on your source databases, you'd have to configure the corresponding connectors.
If you are not familiar with Kafka Connect, this article might be quite helpful as it explains the key concepts.
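Kafka Connect is driven through its REST API (port 8083 by default), where you create a connector by POSTing its name and config to /connectors. A Python sketch that builds such a request; it assumes the Confluent JDBC source connector is installed in your Connect image, and the Connect hostname, connection URL, column name and topic prefix are all hypothetical. The request is only built, not sent, so the sketch runs offline.

```python
import json
import urllib.request


def build_connector_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for the Connect REST API."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical JDBC source connector for one on-premises database.
jdbc_source = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db.example.internal:5432/sales",
    "mode": "incrementing",  # pick up new rows by a strictly increasing column
    "incrementing.column.name": "id",
    "topic.prefix": "sales-",  # each table goes to a topic named sales-<table>
    "tasks.max": "1",
}

request = build_connector_request("http://kafka-connect:8083", "sales-jdbc-source", jdbc_source)
# urllib.request.urlopen(request) would submit it to the Connect worker.
```

Because Connect runs inside the cluster, it talks to the brokers over the internal listeners; only the database connections have to leave the cluster.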
Kafka clients need to connect to a specific node to produce or consume messages.
A client can connect to any node to fetch metadata. It then connects to the specific node that has been elected leader of the partition it wants to produce to or consume from.
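A toy model of that two-step lookup, to show why every broker, not just the bootstrap one, must be reachable from the client. It is a simplified Python illustration, not Kafka client code; the broker hostnames, topic name and leader assignment are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broker:
    node_id: int
    host: str  # the address this broker advertises in metadata responses
    port: int


# Hypothetical metadata a client could receive from *any* bootstrap broker.
BROKERS = {
    0: Broker(0, "kafka-0.example.com", 9094),
    1: Broker(1, "kafka-1.example.com", 9094),
    2: Broker(2, "kafka-2.example.com", 9094),
}
# (topic, partition) -> node_id of the current partition leader
LEADERS = {("orders", 0): 1, ("orders", 1): 2, ("orders", 2): 0}


def leader_address(topic: str, partition: int) -> str:
    """Step 2: the host:port the client must connect to for this partition."""
    leader = BROKERS[LEADERS[(topic, partition)]]
    return f"{leader.host}:{leader.port}"


def addresses_client_needs(topic: str) -> set:
    """Every leader address for the topic has to be reachable by the client."""
    return {leader_address(t, p) for (t, p) in LEADERS if t == topic}
```

Producing to all three partitions of "orders" needs all three broker addresses, which is why a single load balancer in front of the whole cluster is not enough.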
Each Kafka pod therefore has to be individually accessible, so you need an L4 load balancer per pod. The advertised listeners can be set in the Kafka config to advertise different IPs/hostnames to internal and external clients. Configure the EXTERNAL entry of ADVERTISED_LISTENERS to use the load balancer, and the INTERNAL entry to use the pod IP. The ports have to be different for internal and external.
Check out https://strimzi.io/, https://bitnami.com/stack/kafka, https://github.com/confluentinc/cp-helm-charts
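A Python sketch of generating per-broker advertised.listeners values along those lines: the INTERNAL entry uses the pod IP, the EXTERNAL entry uses that pod's own load balancer, and the two use different ports. Pod names, IPs, hostnames and the ports 9092/9094 are hypothetical; operators such as Strimzi generate the equivalent configuration for you.

```python
def advertised_listeners(pod_ip, lb_host, internal_port=9092, external_port=9094):
    """advertised.listeners value for one broker pod."""
    if internal_port == external_port:
        raise ValueError("INTERNAL and EXTERNAL listeners need different ports")
    return f"INTERNAL://{pod_ip}:{internal_port},EXTERNAL://{lb_host}:{external_port}"


def broker_advertised_listeners(pods):
    """pods maps pod name -> (pod IP, hostname of that pod's own load balancer)."""
    lb_hosts = [lb for _, lb in pods.values()]
    if len(set(lb_hosts)) != len(lb_hosts):
        # A shared address could not route a client to one specific broker.
        raise ValueError("each broker pod needs its own load balancer address")
    return {name: advertised_listeners(ip, lb) for name, (ip, lb) in pods.items()}


PODS = {
    "kafka-0": ("10.1.0.10", "kafka-0-lb.example.com"),
    "kafka-1": ("10.1.0.11", "kafka-1-lb.example.com"),
    "kafka-2": ("10.1.0.12", "kafka-2-lb.example.com"),
}
```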
Update:
I tried installing Kafka in k8s running on AWS EC2. Of
confluent-operator, bitnami-kafka and strimzi, only strimzi configured
the EXTERNAL listener in the Kafka settings to use the load balancer.
bitnami-kafka used a
headless service, which is not useful outside the k8s network.
confluent-operator configures the node's IP, which makes it accessible
outside k8s, but only to clients that can reach the EC2 instance via its private
IP.

How to configure ActiveMQ replicated LevelDB instances to connect to a specific port of the master/slave

I'm new to ActiveMQ replicated LevelDB, so I may have assumed things wrongly based on my limited understanding.
I'm setting up 3 ActiveMQ instances in AWS with ZooKeeper, which determines which of the ActiveMQ instances is the master. ZooKeeper is deployed in a private subnet and ActiveMQ in a public subnet; communication between ZooKeeper and ActiveMQ works fine.
For security purposes:
Question/Issue: I can't find where to configure which port these ActiveMQ instances use to communicate with each other.
Why it matters: I need to restrict which ports are open on these ActiveMQ instances, and I cannot simply allow all traffic coming from the public subnet.
Example port restrictions:
port 22 should be open for SSH access
the ZooKeeper client port (2181) should be open only to traffic coming
from these ActiveMQ instances
port 8161 should be accessible only from specific sources
I am using security groups to restrict this access in AWS. I tried allowing all ports within the public subnet, which lets ActiveMQ know that the other ActiveMQ instances are alive, and they were able to elect a master and slaves. The port (45818 in the logs below) changes every time I set up from scratch, so I assume it is random.
Sample logs:
Promoted to master
Using the pure java LevelDB implementation.
Master started: tcp://**.*.*.**:45818
Once I removed that rule (allow all access), I got the stack trace below:
Not enough cluster members have reported their update positions yet.
org.apache.activemq.leveldb.replicated.MasterElector
If I understand the stack trace above correctly, it says that the current ActiveMQ instance does not know about the other ActiveMQ instances. So I need to know how to configure the port these ActiveMQ instances use when checking for each other, so I can restrict/allow access.
Here is the configuration of my ActiveMQ that points to the ZooKeeper addresses. All other settings are at their default values.
activemq version: 5.13.4
<persistenceAdapter>
<replicatedLevelDB directory="activemq-data"
replicas="3"
bind="tcp://0.0.0.0:0"
zkAddress="testzookeeperip1:2181,testzookeeperip2:2181,testzookeeperip3:2181"
hostname="testhostnameofactivemqinstance"
/>
</persistenceAdapter>
If any information is missing, I'll update this question as soon as possible. Thanks.
This is rather a hint than a qualified answer, but too large for comment.
You configured dynamic ports with bind="tcp://0.0.0.0:0". I haven't used a fixed port for this setting myself, but the configuration documentation says you can set one.
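A Python sketch of the same persistenceAdapter element with the bind port pinned, generated with xml.etree.ElementTree so it can be templated per instance. 61619 is only an example value; any free port works as long as the security groups allow it. The other attributes are copied from the question.

```python
import xml.etree.ElementTree as ET


def replicated_leveldb_xml(bind_port: int, zk_hosts: list, hostname: str) -> str:
    """Render <persistenceAdapter> with a fixed replication bind port."""
    adapter = ET.Element("persistenceAdapter")
    ET.SubElement(
        adapter,
        "replicatedLevelDB",
        {
            "directory": "activemq-data",
            "replicas": "3",
            # Fixed port instead of tcp://0.0.0.0:0 (dynamic), so it can be firewalled.
            "bind": f"tcp://0.0.0.0:{bind_port}",
            "zkAddress": ",".join(zk_hosts),
            "hostname": hostname,
        },
    )
    return ET.tostring(adapter, encoding="unicode")


xml_text = replicated_leveldb_xml(
    61619,
    ["testzookeeperip1:2181", "testzookeeperip2:2181", "testzookeeperip3:2181"],
    "testhostnameofactivemqinstance",
)
```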
The bind port is used for the replication protocol: the node that becomes master listens on it, and the slaves connect to it to replicate the store. So you cannot cut it off from the other brokers.
ZooKeeper decides which broker is the active master, but the replication traffic itself flows between the brokers on that port. That would also explain the "Not enough cluster members have reported their update positions yet" message once you blocked it.
The external broker address is configured on the transportConnectors element in the <broker> section of the config file, but I guess you already have that covered.
I suggest you configure bind with a fixed port and allow communication to that port from the other brokers in the cluster. Clients only need access to the transport ports. Allow the brokers to reach the ZooKeeper client port, and that should be it.
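To make the suggested rules concrete, a small Python model of the ingress rules, assuming a fixed replication bind port of 61619, the default OpenWire client port 61616, and hypothetical addresses for the three brokers, an admin host and the client subnet. In AWS you would express the same rules as security group ingress entries; this only models the allow/deny decision.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    port: int
    sources: tuple  # CIDR ranges allowed to connect to this port
    purpose: str


def allowed(rules, source_ip: str, port: int) -> bool:
    """True if some rule opens `port` to `source_ip`."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        rule.port == port and any(ip in ipaddress.ip_network(cidr) for cidr in rule.sources)
        for rule in rules
    )


# Hypothetical addresses.
BROKERS = ("10.0.1.11/32", "10.0.1.12/32", "10.0.1.13/32")
ADMIN = ("198.51.100.7/32",)
CLIENTS = ("10.0.2.0/24",)

BROKER_RULES = [
    IngressRule(22, ADMIN, "SSH"),
    IngressRule(61619, BROKERS, "replication bind port, other brokers only"),
    IngressRule(61616, CLIENTS, "client transport connector"),
    IngressRule(8161, ADMIN, "web console from specific sources"),
]
ZOOKEEPER_RULES = [IngressRule(2181, BROKERS, "ZooKeeper client port, brokers only")]
```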

ZooKeeper installation on multiple AWS EC2 instances

I am new to ZooKeeper and AWS EC2. I am trying to install ZooKeeper on 3 EC2 instances.
Following the ZooKeeper documentation, I installed ZooKeeper on all 3 instances, created zoo.conf and added the configuration below:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=localhost:2888:3888
server.2=<public ip of ec2 instance 2>:2889:3889
server.3=<public ip of ec2 instance 3>:2890:3890
I also created the myid file on all 3 instances as /opt/zookeeper/data/myid,
as per the guidelines.
I have a couple of questions:
Whenever I start the ZooKeeper server on an instance, it starts in standalone mode (according to the logs).
Will the above configuration actually connect the servers to each other? What are ports 2889:3889 and 2890:3890 for? Do I need to configure them on the EC2 machines, or should I use different ports?
Do I need to create a security group to open these connections? I am not sure how to do that for an EC2 instance.
How can I confirm that all 3 ZooKeeper servers have started and can communicate with each other?
The ZooKeeper configuration is designed such that you can install the exact same configuration file on all servers in the cluster without modification. This makes ops a bit simpler. The component that specifies the configuration for the local node is the myid file.
The configuration you've defined is not one that can be shared across all servers. All of the servers in your server list should be binding to a private IP address that is accessible to other nodes in the network. You're seeing your server start in standalone mode because you're binding to localhost. So, the problem is the other servers in the cluster can't see localhost.
Your configuration should look more like:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=<private ip of ec2 instance 1>:2888:3888
server.2=<private ip of ec2 instance 2>:2888:3888
server.3=<private ip of ec2 instance 3>:2888:3888
The two ports listed in each server definition are respectively the quorum and election ports used by ZooKeeper nodes to communicate with one another internally. There's usually no need to modify these ports, and you should try to keep them the same across servers for consistency.
Additionally, as I said you should be able to share that exact same configuration file across all instances. The only thing that should have to change is the myid file.
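A Python sketch of generating that shared file plus the per-node myid, which makes the "only myid differs" property easy to see. The private IPs are hypothetical; the settings are the ones from the configuration above.

```python
BASE_SETTINGS = {
    "tickTime": "2000",
    "initLimit": "10",
    "syncLimit": "5",
    "dataDir": "/opt/zookeeper/data",
    "clientPort": "2181",
}


def render_zoo_cfg(server_ips, quorum_port=2888, election_port=3888):
    """Same file for every node: base settings plus one server.N line per node."""
    lines = [f"{key}={value}" for key, value in BASE_SETTINGS.items()]
    for sid in sorted(server_ips):
        lines.append(f"server.{sid}={server_ips[sid]}:{quorum_port}:{election_port}")
    return "\n".join(lines) + "\n"


def node_files(server_ips, server_id):
    """Files for one node: the shared config and its own myid (placed in dataDir)."""
    if server_id not in server_ips:
        raise KeyError(f"server.{server_id} is not in the ensemble")
    return {"zoo.cfg": render_zoo_cfg(server_ips), "myid": f"{server_id}\n"}


# Hypothetical private IPs of the three EC2 instances.
SERVER_IPS = {1: "10.0.0.11", 2: "10.0.0.12", 3: "10.0.0.13"}
```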
You probably will need to create a security group and open up the client port to be available for clients and the quorum/election ports to be accessible by other ZooKeeper servers.
Finally, you might want to look into a UI to help manage the cluster. Netflix makes a decent UI that gives you a view of your cluster and also helps with cleaning up old logs and storing snapshots to S3 (ZooKeeper takes snapshots but does not delete old transaction logs, so your disk will eventually fill up if they're not properly removed). But once it's configured correctly, you should also be able to see the ZooKeeper servers connecting to each other in the logs.
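Besides the logs, each server reports its role through the srvr four-letter-word command on the client port (bin/zkServer.sh status shows the same Mode line). A healthy 3-node ensemble has one leader and two followers; standalone on every node means they never formed a quorum. A Python sketch, assuming the client port is reachable from where you run it; on ZooKeeper 3.5.3 and later, four-letter-word commands must be allowed by 4lw.commands.whitelist, so check the default for your version.

```python
import socket


def parse_mode(response: str) -> str:
    """Extract the value of the 'Mode:' line from a srvr response."""
    for line in response.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Mode line in response")


def zk_mode(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Send 'srvr' and return the reported mode (leader, follower, standalone...)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode())


def ensemble_healthy(modes: dict) -> bool:
    """Exactly one leader and every other node a follower."""
    values = list(modes.values())
    return values.count("leader") == 1 and values.count("follower") == len(values) - 1
```

Calling zk_mode on each of the three private IPs and passing the results to ensemble_healthy answers the "have they all joined?" question in one step.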
EDIT
@czerasz notes that starting from version 3.4.0 you can use the autopurge.snapRetainCount and autopurge.purgeInterval directives to keep your snapshots clean.
@chomp notes that some users have had to use 0.0.0.0 as the local server IP to get the ZooKeeper configuration to work on EC2. In other words, replace <private ip of ec2 instance 1> with 0.0.0.0 in the configuration file on instance 1. This runs counter to the way ZooKeeper configuration files are designed, but may be necessary on EC2.
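If you do need that workaround, the server lines are no longer identical on every node. A Python sketch that renders them per node, with hypothetical private IPs; only the node's own entry becomes 0.0.0.0.

```python
def render_server_lines(server_ips, my_id, quorum_port=2888, election_port=3888):
    """Per-node server.N lines: this node's own address is replaced by 0.0.0.0."""
    lines = []
    for sid in sorted(server_ips):
        host = "0.0.0.0" if sid == my_id else server_ips[sid]
        lines.append(f"server.{sid}={host}:{quorum_port}:{election_port}")
    return lines


# Hypothetical private IPs of the three EC2 instances.
SERVER_IPS = {1: "10.0.0.11", 2: "10.0.0.12", 3: "10.0.0.13"}
```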
Adding additional info regarding Zookeeper clustering inside Amazon's VPC.
Using the instances' VPC IP addresses should be the preferred solution for ZooKeeper; using '0.0.0.0' should be your last option.
If you are running Docker on your EC2 instance, '0.0.0.0' will not work properly with ZooKeeper 3.5.x after a node restart.
The issue lies in how '0.0.0.0' is resolved, how the ensemble shares node addresses, and the SID order (if you start your nodes in descending order, the issue may not occur).
So far, the only working solution is to upgrade to version 3.6.2 or later.

How to connect to HornetQ on AWS VPC from another VM on AWS

I have 2 VMs on AWS. On the first VM I have HornetQ and an application that sends messages to it. On the other VM I have an application that consumes from HornetQ.
The consumer fails to pull messages from HornetQ, and I can't understand why. HornetQ is running, and I opened the ports to any IP.
I tried to connect to HornetQ with jconsole from my local computer and failed, so I can't see whether HornetQ has any consumers/producers.
I tried changing the 'bind' settings to 0.0.0.0, but when I restarted HornetQ they were automatically changed back to the server IP set in config.properties.
Any suggestions as to why my application fails to connect to HornetQ?
Thanks!
These are the things you need to check for the connectivity between VMs in VPC.
The security group of a VPC instance has both ingress and egress settings, unlike the traditional EC2 security group (now EC2-Classic). Check egress from your consumer and ingress to the server.
If the instances are in different subnets, you need to check the network ACLs as well; however, the default setting allows traffic.
Check whether iptables or another OS-level firewall is blocking the traffic.
Regarding the failed connection from your local machine to HornetQ: you need to place the instance in a public subnet and configure its security group accordingly; only then will the app/VM be accessible from the public internet.
I have assumed that both instances are in the same VPC. However, the title of the post is slightly ambiguous: if they are in two different VPCs altogether, then VPC peering also comes into play.