Bootstrap IP to internal address conversion - amazon-web-services

I am new to Kafka and I setup an instance in aws. runs well.
then I created another aws instance and run the codes:
See image here
it can print out messages that I published to kafka
If I ran the same codes in the kafka server itself, I can also get messages.
However, if I run the same codes in my own laptop, I cant get anything.
I thought it might be the codes so I used kafka's own client in my laptop:
bin/kafka-console-consumer.sh --topic test22 --bootstrap-server 34.215.180.111:9092
Now I got an error:
2021-05-11 16:21:32,252] WARN [Consumer clientId=consumer-console-consumer-94326-1, groupId=console-consumer-94326] Error connecting to node ip-172-31-29-222.us-west-2.compute.internal:9092 (id: 0 rack: null) (org.apache.kafka.clients.NetworkClient)
ip-172-31-29-222.us-west-2.compute.internal
this piece of name is actually the AWS instance's internal address:
See image here
Then I thought it might be Amazon's issue so I repeated the whole process in Google Cloud and got the same results:
[2021-05-11 17:15:34,840] WARN [Consumer clientId=consumer-console-consumer-2377-1, groupId=console-consumer-2377] Error connecting to node instance-1.us-central1-a.c.seventh-seeker-267203.internal:9092 (id: 0 rack: null) (org.apache.kafka.clients.NetworkClient)
These internal addresses can not be accessed from external computers at all.
Can anybody help? thanks!

The logs are showing you the advertised.listeners of the brokers. If you want that to be different in order to connect, you'll need to modify that property such that the brokers have resolvable addresses for the clients
https://www.confluent.io/blog/kafka-listeners-explained/

Related

Kafka Multi broker setup with ec2 machine: Timed out waiting for a node assignment. Call: createTopics

I am trying to setup kafka with 3 broker nodes and 1 zookeeper node in AWS EC2 instances. I have following server.properties for every broker:
kafka-1:
broker.id=0
listeners=PLAINTEXT_1://ec2-**-***-**-17.eu-central-1.compute.amazonaws.com:9092
advertised.listeners=PLAINTEXT_1://ec2-**-***-**-17.eu-central-1.compute.amazonaws.com:9092
listener.security.protocol.map=,PLAINTEXT_1:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_1
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
kafka-2:
broker.id=1
listeners=PLAINTEXT_2://ec2-**-***-**-43.eu-central-1.compute.amazonaws.com:9093
advertised.listeners=PLAINTEXT_2://ec2-**-***-**-43.eu-central-1.compute.amazonaws.com:9093
listener.security.protocol.map=,PLAINTEXT_2:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_2
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
kafka-3:
broker.id=2
listeners=PLAINTEXT_3://ec2-**-***-**-27.eu-central-1.compute.amazonaws.com:9094
advertised.listeners=PLAINTEXT_3://ec2-**-***-**-27.eu-central-1.compute.amazonaws.com:9094
listener.security.protocol.map=,PLAINTEXT_3:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_3
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
zookeeper:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
When I ran following command in zookeeper I see that they are connected
I also telnetted from any broker to other ones with broker port they are all connected
However, when I try to create topic with 2 replication factor I get Timed out waiting for a node assignment
I cannot understand what is incorrect with my setup, I see 3 nodes running in zookeeper, but having problems when creating topic. BTW, when I make replication factor 1 I get the same error. How can I make sure that everything is alright with my cluster?
It's good that telnet checks the port is open, but it doesn't verify the Kafka protocol works. You could use kcat utility for that, but the fix includes
listeners are set to either PLAINTEXT://:9092 or PLAINTEXT://0.0.0.0:9092 for every broker, which means using the same port
Removing the number from the listener mapping and advertised listeners property such that each broker is the same
I'd also recommend looking at using Ansible/Terraform/Cloudformation to ensure you consistently modify the cluster rather than edit individual settings manually

Redislabs UI logging error when number of nodes more than one

I am new at Redis Enterprise and can't fix this problem:
I have a Redis Enterprise cluster (v.6.0) in AWS with two nodes. When I have only one node I can enter UI, but after adding other (second) nodes always throws me out to the login page after entering credentials. Meanwhile, the cluster works fine (information is taken from rladmin).
In what direction I should investigate the issue?
P.S.: Can this error from logs cause an issue?
ERROR redis_mgr MainThread: Connect failed: connect: connection failed: Error 2 connecting to unix socket: /var/opt/redislabs/run/ccs.sock. No such file or directory.: retrying
Possibly, this solution will help anybody:
the reason was that ALB before UI didn't use sticky sessions.
the solution was to enable a sticky session and it works.

Greengrass_HelloWorld lambda doesn't publish to Amazon IoT console

I have been following the documentation in every step, and I didn't face any errors. Configured, deployed and made a subscription to hello/world topic just as the documentation detailed. However, when I arrived at the testing step here: https://docs.aws.amazon.com/greengrass/latest/developerguide/lambda-check.html
No messages were showing up on the IoT console (subscription view hello/world)! I am using Greengrass core daemon which runs on my Ubuntu machine, it is active and listens to port 8000. I don't think there is anything wrong with my local device because the group was deployed successfully and because I see the communications going both ways on Wireshark.
I have these logs on my machine: /home/##/Desktop/greengrass/ggc/var/log/system/runtime.log:
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Version: 1.9.3-RC3
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Root: /home/##/Desktop/greengrass
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Write Directory: /home/##/Desktop/greengrass/ggc
[2019-09-28T06:57:42.492-07:00][INFO]-Group File Directory: /home/##/Desktop/greengrass/ggc/deployment/group
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda UID: 122
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda GID: 127
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.492-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.493-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.494-07:00][INFO]-No proxy URL found.
[2019-09-28T06:57:42.495-07:00][INFO]-Started Deployment Agent to listen for updates. [2019-09-28T06:57:42.495-07:00][INFO]-Connecting with MQTT. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.497-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection successful. {"attemptId": "GVko", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection established. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection connected. Start subscribing. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Deployment agent connected to cloud.
[2019-09-28T06:57:42.685-07:00][INFO]-Start subscribing. {"numOfTopics": 2, "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/update/delta
[2019-09-28T06:57:42.727-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/get/accepted
[2019-09-28T06:57:42.814-07:00][INFO]-All topics subscribed. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:58:57.888-07:00][INFO]-Daemon received signal: terminated. [2019-09-28T06:58:57.888-07:00][INFO]-Shutting down daemon.
[2019-09-28T06:58:57.888-07:00][INFO]-Stopping all workers.
[2019-09-28T06:58:57.888-07:00][INFO]-Lifecycle manager is stopped.
[2019-09-28T06:58:57.888-07:00][INFO]-IPC server stopped.
/home/##/Desktop/greengrass/ggc/var/log/system/localwatch/localwatch.log:
[2019-09-28T06:57:42.491-07:00][DEBUG]-will keep the log files for the following lambdas {"readingPath": "/home/##/Desktop/greengrass/ggc/var/log/user", "lambdas": "map[]"}
[2019-09-28T06:57:42.492-07:00][WARN]-failed to list the user log directory {"path": "/home/##/Desktop/greengrass/ggc/var/log/user"}
Thanks in advance.
I had a similar issue on another platform (Jetson Nano). I could not get a response after going through the AWS instructions for setting up a simple Lambda using IOT Greengrass. In my search for answers I discovered that AWS has a qualification test script for any device you connect.
It goes through an automated process of deploying and testing a lambda function(as well as other functionality) and reports results for each step and docs provide troubleshooting info for failures.
By going through those tests I was able to narrow down the issues with my setup, installation, and configuration. The testing docs give pointers to troubleshoot test results. Here is a link to the test: https://docs.aws.amazon.com/greengrass/latest/developerguide/device-tester-for-greengrass-ug.html
If you follow the 'Next Topic' links, it will take you through the complete test. Let me warn you that its extensive, and will take some time, but for me it gave a lot of detailed insight that a hello world does not.

How to deploy Kafka on Google cloud

I deployed Kafka on Google cloud, I changed listeners to
PLAINTEXT://[internal ip address]:9092
And when I try
sudo ./bin/kafka-topics.sh --list --zookeeper [external IP address]:2181
I can get the topic on the broker. However when I try to produce message to the Kafka broker
sudo ./bin/kafka-console-producer.sh --broker-list [external IP address]:9092
--topic test
following error shows up:
ERROR Error when sending message to topic test with key: null, value:
5 bytes with error:
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s)
for test-0: 1506 ms has passed since batch creation plus linger time
I wonder what properties did I set wrong and how to fix it?
You need to set advertised.listeners to the external IP so that clients can correctly connect to it. Otherwise they'll try to connect to the internal IP (since advertised.listeners will default to listeners unless explicitly set)
Ref: https://kafka.apache.org/documentation/#brokerconfigs

Spark - Remote Akka Client Disassociated

I am setting up Spark 0.9 on AWS and am finding that when launching the interactive Pyspark shell, my executors / remote workers are first being registered:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Registered executor:
Actor[akka.tcp://sparkExecutor#ip-xx-xx-xxx-xxx.ec2.internal:54110/user/
Executor#-862786598] with ID 0
and then disassociated almost immediately, before I have the chance to run anything:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Executor 0 disconnected,
so removing it
14/07/08 22:48:05 ERROR scheduler.TaskSchedulerImpl: Lost an executor 0 (already
removed): remote Akka client disassociated
Any idea what might be wrong? I've tried adjusting the JVM options spark.akka.frameSize and spark.akka.timeout, but I'm pretty sure this is not the issue since (1) I'm not running anything to begin with, and (2) my executors are disconnecting a few seconds after startup, which is well within the default 100s timeout.
Thanks!
Jack
I had a very similar problem, if not the same.
It started to work for me once the workers were connecting to master by using the very same name as the master thought it had.
My log messages were something like:
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker#idc1-hrm1.heylinux.com:7078] -> [akka.tcp://sparkMaster#vagrant-centos64.vagrantup.com:7077]: Error [Association failed with [akka.tcp://sparkMaster#vagrant-centos64.vagrantup.com:7077]].
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker#192.168.121.127:7078] -> [akka.tcp://sparkMaster#idc1-hrm1.heylinux.com:7077]: Error [Association failed with [akka.tcp://sparkMaster#idc1-hrm1.heylinux.com:7077]]
WARN util.Utils: Your hostname, idc1-hrm1 resolves to a loopback address: 127.0.0.1; using 192.168.121.187 instead (on interface eth0)
So check the log of the master and see what name it thinks it has.
Then use that very same name on the workers.