How to properly load balance between two Spark Controllers - Vora

We are attempting to load balance between two Spark Controllers that connect to Vora...
We can connect, and the query gets sent to the controller.
The problem occurs when the result is supposed to be passed back to HANA: the process hangs and never finishes.
The last lines in the logs state:
17/02/14 14:24:12 INFO CommandRouter$$anon$1: Created broadcast 7 from executeSelectTask at CommandRouter.scala:650
17/02/14 14:24:12 INFO CommandRouter$$anon$1: Starting job: executeSelectTask at CommandRouter.scala:650
17/02/14 14:24:12 INFO CommandRouter$$anon$1: Created broadcast 8 from broadcast at DAGScheduler.scala:1008
17/02/14 14:24:14 INFO CommandRouter$$anon$1: Created broadcast 9 from broadcast at DAGScheduler.scala:1008
Is there something specific that needs to be configured to allow load balancing between the two controllers?

The process hangs forever because the nodes where the Spark executor jobs run do not know the hostname of the HANA host, so they can never return the result set. The HANA hostname must be added to each node's /etc/hosts file.
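For example, assuming the HANA host is reachable at 10.0.0.5 and named hana01 (both hypothetical values), each node running Spark executors would need an entry like:

# /etc/hosts on every node running Spark executors (hypothetical values)
10.0.0.5   hana01.example.com   hana01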

Related

Kafka Multi broker setup with ec2 machine: Timed out waiting for a node assignment. Call: createTopics

I am trying to set up Kafka with 3 broker nodes and 1 ZooKeeper node on AWS EC2 instances. I have the following server.properties for each broker:
kafka-1:
broker.id=0
listeners=PLAINTEXT_1://ec2-**-***-**-17.eu-central-1.compute.amazonaws.com:9092
advertised.listeners=PLAINTEXT_1://ec2-**-***-**-17.eu-central-1.compute.amazonaws.com:9092
listener.security.protocol.map=,PLAINTEXT_1:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_1
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
kafka-2:
broker.id=1
listeners=PLAINTEXT_2://ec2-**-***-**-43.eu-central-1.compute.amazonaws.com:9093
advertised.listeners=PLAINTEXT_2://ec2-**-***-**-43.eu-central-1.compute.amazonaws.com:9093
listener.security.protocol.map=,PLAINTEXT_2:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_2
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
kafka-3:
broker.id=2
listeners=PLAINTEXT_3://ec2-**-***-**-27.eu-central-1.compute.amazonaws.com:9094
advertised.listeners=PLAINTEXT_3://ec2-**-***-**-27.eu-central-1.compute.amazonaws.com:9094
listener.security.protocol.map=,PLAINTEXT_3:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_3
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
zookeeper:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
When I ran the following command in ZooKeeper, I saw that they are connected.
I also telnetted from each broker to the others on the broker ports; they are all connected.
However, when I try to create a topic with a replication factor of 2, I get "Timed out waiting for a node assignment".
I cannot understand what is incorrect in my setup. I see 3 nodes running in ZooKeeper, but I have problems when creating a topic. By the way, when I use a replication factor of 1 I get the same error. How can I make sure that everything is alright with my cluster?
It's good that telnet checks that the port is open, but that doesn't verify the Kafka protocol works; you could use the kcat utility for that. The fix includes:
- setting listeners to either PLAINTEXT://:9092 or PLAINTEXT://0.0.0.0:9092 on every broker, which means using the same port everywhere
- removing the number from the listener name in the listener mapping and advertised listeners properties, so that each broker's listener configuration is the same
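For example, a corrected server.properties for the first broker might look like this (a sketch; hostnames are masked as in the question, and the default listener.security.protocol.map already covers PLAINTEXT, so that line can be dropped entirely):

broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://ec2-**-***-**-17.eu-central-1.compute.amazonaws.com:9092
inter.broker.listener.name=PLAINTEXT
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181

The other two brokers would differ only in broker.id and the advertised hostname.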
I'd also recommend looking at Ansible/Terraform/CloudFormation to ensure you modify the cluster consistently rather than editing individual settings manually.

Why does the Kafka consumer freeze on node failure while the producer stays unaffected?

I am new to Kafka and am trying to create a Kafka cluster with 3 nodes for high availability.
I followed this guide and did the setup on Google Compute Engine instances (GCP VMs).
I tried creating topics with different --replication-factor values.
Here is an example with a replication factor of 3; I tried the values 1 and 2 as well.
# With replication factor 3
bin/kafka-topics.sh --create \
--bootstrap-server xxx.xx.xx.xxx:9092,yy.yyy.yyy.yy:9092,zzz.zz.zz.zzz:9092 \
--replication-factor 3 --partitions 1 --topic sample-topic
This is what my consumer code looks like:
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'sample-topic',
    bootstrap_servers=[
        'yyy.yy.yy.yyy'
    ],
    client_id="sample-client-name",
    auto_offset_reset="earliest",
    group_id="sample-group-name")

for message in consumer:
    print(message)
This is what my producer code looks like:
from time import sleep
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=[
        'xxx.xx.xx.xxx', 'yyy.yy.yy.yyy', 'zzz.zz.zz.zzz'])

for i in range(1000):
    message = str.encode("Message: " + str(i))
    producer.send('sample-topic', value=message)
    print("sent: {}".format(i))
    sleep(5)
[xxx.xx.xx.xxx, yyy.yy.yy.yyy, zzz.zz.zz.zzz above are the IP addresses of the VMs]
Initially, the setup works pretty well. Then I start stopping VMs, always keeping two alive: whenever I want to stop another VM, I first make sure at least two VMs are up.
The producer code has the same bootstrap_servers as the consumer code. I have tried keeping only one or two servers in bootstrap_servers, but the consumer fails the same way whenever at least one VM goes down (I tried taking each of the three VMs down one after another, always keeping two VMs working).
The Kafka consumer freezes when a VM is stopped (only one VM is down at a time; the other two are brought back up before the next VM is stopped).
Is there any configuration I am missing? How can I make the consumer stay intact like the producer?
If I read that linked post correctly, it doesn't mention that offsets.topic.replication.factor has to be increased as well.
Otherwise, stopping the broker that holds the single replica of the offsets topic will cause consumers to fail to commit and look up offsets.
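A minimal sketch of the relevant broker setting, assuming a fresh 3-node cluster (the internal __consumer_offsets topic is created with this value on first use, so on an already-running cluster the existing topic's replicas would have to be reassigned instead):

# server.properties on every broker
offsets.topic.replication.factor=3

You can check the current replication of the offsets topic with the stock tooling, e.g.:

bin/kafka-topics.sh --describe --topic __consumer_offsets \
--bootstrap-server xxx.xx.xx.xxx:9092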

Can Akka Cluster Client Send Messages to Cluster Nodes Not in Initial Contacts?

Using Akka 2.3.14, I'm trying to create an Akka cluster of various services. Until now, I have had all my "services" in one artifact that was clustered across multiple nodes, but now I am trying to break this artifact into multiple services that all exist on the same cluster.
So in breaking this up, we've designed it so that any node in the cluster will first try to connect to the seed nodes. If there is no seed node, it checks whether it is a candidate to run as a seed node (i.e. whether it's on a host that a seed node can be on), in which case it grabs an open seed node port and becomes a seed node. So in this sense, any service in the cluster can become a seed node.
At least, that was the idea. Our API into this system, running as a separate service, implements a ClusterClient into the system. The initialContacts are set to the same addresses as the seed nodes. The problem is that the only receptionist actors I can send a message to through the ClusterClient are the ones on the seed nodes.
Here is an example, if it helps. Let's say I have a String Service and a Double Service, and the receptionist for each service is a StringActor and a DoubleActor respectively. Now let's say I have a Client Service which sends StringMessages and DoubleMessages to the StringActor and DoubleActor.
So for simplicity, let's say I have two nodes, server1 and server2 then:
seed-nodes = ["akka.tcp://system#server1:2773", "akka.tcp://system#server2:2773"]
My ClusterClient would be initialized like so:
system.actorOf(
  ClusterClient.props(
    Set(
      system.actorSelection("akka.tcp://system#server1:2773/user/receptionist"),
      system.actorSelection("akka.tcp://system#server2:2773/user/receptionist")
    )
  ),
  "clusterClient"
)
Here are the scenarios that are happening for me:
If the StringServices start up on both servers first, then DoubleMessages from the Client Service just disappear into the ether.
If the DoubleServices start up on both servers first, then StringMessages from the Client Service just disappear into the ether.
If the StringService starts up first on serverX and the DoubleService starts up first on serverY, then all StringMessages will be sent to serverX and all DoubleMessages will be sent to serverY, which is not as bad as the above case, but it means it's not really scaling.
This isn't what I expected; it's possible it's just a defect in my code, so I would like to know whether this IS expected behavior or not. And if not, is there another Akka concept that could help me with this?
Arguably, I could just make one service type my entry point, like a RoutingService that could accept StringMessages or DoubleMessages and forward them to the correct service. But if the Client Service can only send messages to the RoutingService instances that are in the initial contacts, then I can't dynamically scale the RoutingService: no matter how many nodes I add, the Client Service can only send to the initial contacts.
I'm also thinking about subscribing to ClusterEvents in my Client Service and seeing if I can add and remove initial contacts from my cluster client as nodes are started up in the cluster, but I'm not sure if this is possible, and it feels like there should be a better solution.
This is what I found out upon more troubleshooting, in case it helps anyone else:
The ClusterClient will attempt to connect to the initial contacts in order, and then only sends its messages across that connection. If you are deploying different services on each node, you will have problems, as the messages sent from the ClusterClient will only go to the node it established its connection to. In this way, you can think of the ClusterClient as a legitimate client: it connects to a URL that you give it and then continues to communicate with the server through that URL.
Reading the Distributed Workers example, I realized that my frontend, or in this case my routing service, should actually be part of the cluster rather than acting as a client. For this I used DistributedPubSub instead.
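As a rough sketch of that approach (assuming the Akka 2.3 contrib module, akka-contrib; the actor and message names are hypothetical), each service registers its receptionist with the local mediator, and any cluster member can then route to it by path:

import akka.actor.{Actor, ActorSystem, Props}
import akka.contrib.pattern.{DistributedPubSubExtension, DistributedPubSubMediator}

case class StringMessage(text: String)

class StringActor extends Actor {
  def receive = { case StringMessage(text) => println(s"received: $text") }
}

// The system must already be a member of the cluster
val system = ActorSystem("system")
val mediator = DistributedPubSubExtension(system).mediator

// On each String Service node: register the receptionist with the mediator
val stringActor = system.actorOf(Props[StringActor], "stringActor")
mediator ! DistributedPubSubMediator.Put(stringActor)

// From the routing service (any cluster member): route by registered path.
// localAffinity = false lets the mediator pick among all registered nodes,
// which is what spreads the load across service instances.
mediator ! DistributedPubSubMediator.Send("/user/stringActor", StringMessage("hello"), localAffinity = false)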

akka cluster slave node not joining seed node

I am working with the Akka distributed worker template available from Typesafe. I am using it to write a backend job which takes data from Siebel using SOAP calls and inserts it into Mongo. This job is supposed to run once a week for a few hours.
Based on the cluster usage and other documentation on the Akka website, I imported akka-cluster.jar and configured the application configuration file with seed nodes (akka.cluster.seed-nodes). But when I start the first node (the MASTER node) with the configuration I mentioned (seed nodes etc.), I get errors on the server console saying it failed to join the seed node, which is expected (as it is the first node and there is nothing to join). Then I start the second node with akka.cluster.seed-nodes configured with the IP address and port of the process where the master node is running. Once again I get errors on the server console.
What I do next is take the join address of the master actor from the MASTER node and set it dynamically in the slave node in code (construct an Address object and pass it to the actors on the slave node). THIS WORKS!!! If I take the same join address and configure it in akka.cluster.seed-nodes in the application configuration, it throws errors and the slave doesn't join the cluster.
So I have the following questions:
1. How do I configure akka.cluster.seed-nodes in the application configuration? I could never make it work.
2. Is there any way to pre-configure the seed nodes in the configuration? From what I tried, the configuration appears to be dynamic, i.e. I have to take the join address of the actor on the master node from the logs and configure the slave's seed-nodes setting with that address.
I've had similar problems which were the result of a mismatch between the actor system name in the seed nodes configuration and the actual actor system name created in my code.
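As a hypothetical illustration of that mismatch: if the configuration contains

akka.cluster.seed-nodes = ["akka.tcp://ClusterSystem@127.0.0.1:2551"]

but the code creates the system as

val system = ActorSystem("MyWorkerSystem")  // does not match "ClusterSystem" in the seed-node address

then joining fails, because the actor system name is part of the seed node address and must match exactly.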

NServiceBus Distributor Worker creates a new worker queue

I was playing with the Distributor/Worker setup, but every time I restarted the application it created a new worker queue with a unique ID.
Any clue? And what is the best place to learn more about the distributor/worker and their configuration?
When you start the endpoint, you'll see a warning logged which explains the reason:
2013-08-26 18:56:48,473 [1] WARN NServiceBus.ConfigureDistributor [(null)] <(null)> - 'MasterNodeConfig.Node' points to a local host name: [localhost]. Worker input address name is [Orders.Handler.810aa1ea-7eb4-47b3-b639-724c4498a999#SELENE]. It is randomly and uniquely generated to allow multiple workers working from the same machine as the Distributor.
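In other words, the random queue name appears because 'MasterNodeConfig.Node' points to localhost. A sketch of the relevant app.config fragment, pointing the worker at the actual distributor machine instead (the machine name is hypothetical, and the exact section registration may vary by NServiceBus version, so check the docs linked below):

<configSections>
  <section name="MasterNodeConfig" type="NServiceBus.Config.MasterNodeConfig, NServiceBus.Core" />
</configSections>
<MasterNodeConfig Node="DISTRIBUTOR-MACHINE" />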
Here's some documentation on the distributor and an explanation of the Scale Out sample as well:
http://particular.net/articles/load-balancing-with-the-distributor
http://particular.net/articles/scale-out-sample
Check out the Hands On Labs (the intermediate lab), which will walk you through the Scale Out lab as well. The Scale Out lab shows how to deploy your workers to different machines while keeping the code the same and changing only the configuration.
http://particular.net/HandsOnLabs
Hope this helps.