akka cluster slave node not joining seed node

I am working with the Akka distributed worker template available from Typesafe. I am using it to write a backend job which pulls data from Siebel via SOAP calls and inserts it into MongoDB. This job is supposed to run once a week for a few hours.
Based on the cluster usage and other documentation on the Akka website, I imported akka-cluster.jar and configured the application configuration file with seed nodes (akka.cluster.seed-nodes). But when I start the first node (MASTER NODE) with the configuration I mentioned (seed nodes etc.), I start getting errors on the server console saying it failed to join the seed node, which is expected (as it is the first node and there is nothing to join yet). I then start the second node with akka.cluster.seed-nodes configured with the IP address and port of the process where the master node is running. Once again I get errors on the server console.
What I do next is take the first join address of the master actor from the MASTER NODE and set it dynamically on the slave node in code (construct an Address object and pass it to the actors on the slave node). THIS WORKS!!! If I take the same join address and configure it in akka.cluster.seed-nodes in the application configuration, it throws errors and the slave doesn't join the cluster.
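For reference, the dynamic join that works looks roughly like this (a sketch; the system name, host, and port are placeholders):
import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

val system = ActorSystem("ClusterSystem")
// Join address copied from the master node's startup logs.
val masterAddress = Address("akka.tcp", "ClusterSystem", "10.0.0.1", 2551)
Cluster(system).join(masterAddress)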
So I have the following questions:
1. How do I configure akka.cluster.seed-nodes in the application configuration? I could never get it to work from the configuration alone.
2. Is there any way to pre-configure the seed nodes in the configuration? From my experiments, the only thing that works appears to be dynamic: take the join address of the actor on the master node from the logs and configure the slave's seed-node setting with that address.

I've had similar problems which were the result of a mismatch between the actor system name in the seed nodes configuration and the actual actor system name created in my code.
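For example, with classic remoting on Akka 2.3/2.4 (a minimal sketch; host, port, and the ClusterSystem name are placeholders), the name passed to ActorSystem(...) must match the name in every seed-node URI:
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// The system name in the seed-node URIs ("ClusterSystem") must be exactly the
// name passed to ActorSystem below; a mismatch makes every join attempt fail.
val config = ConfigFactory.parseString("""
  akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
  akka.remote.netty.tcp.hostname = "127.0.0.1"
  akka.remote.netty.tcp.port = 2551
  akka.cluster.seed-nodes = ["akka.tcp://ClusterSystem@127.0.0.1:2551"]
""")
val system = ActorSystem("ClusterSystem", config)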

Difference between the coordinator node and the contact point of the C++ driver in Cassandra?

When using the C++ driver to create APIs for interacting with Cassandra, the program has to be provided a comma-separated list containing the IP addresses of the nodes which the driver can use as contact points (cass_cluster_set_contact_points) to the database. I wanted to understand the role of these contact points, and whether a contact point plays a different role than the coordinator node, i.e. are the contact point and the coordinator node one and the same thing?
Also, when we are executing, say, a multi-threaded program that runs several queries, is the coordinator node/contact point selected for each query, or is it selected once at the beginning and then fixed as the coordinator node throughout the execution of the program?
The contact endpoints simply serve as a way for your driver to discover the cluster. You really only need to provide two or three, and the driver will figure out the remaining endpoints via gossip.
When you connect, it is a good idea to use the TokenAwareLoadBalancingPolicy. That will cause any query filtering on a partition key to bypass the need for a coordinator node and route directly to the node which is responsible for the required data.
If a query is not filtering on a partition key, or if it is a multi-key query, then an exact node cannot be determined. At that point, your backup load balancing policy (the TokenAwareLoadBalancingPolicy takes a backup policy as an argument) will be used to determine a coordinator node. If I remember right, the DCAwareRoundRobinLoadBalancingPolicy is the default.
In summary, the connection endpoints only serve for cluster discovery. The coordinator node is chosen at query-time, based on algorithms used in your load balancing policy.
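As a concrete illustration (a sketch using the DataStax Java driver 3.x from Scala, where the equivalent classes are named TokenAwarePolicy and DCAwareRoundRobinPolicy; the contact-point addresses are placeholders):
import com.datastax.driver.core.Cluster
import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, TokenAwarePolicy}

// Contact points are only used for discovery; the token-aware policy routes
// partition-key queries straight to a replica and falls back to the wrapped
// DC-aware round-robin policy to pick a coordinator otherwise.
val cluster = Cluster.builder()
  .addContactPoints("10.0.0.1", "10.0.0.2")
  .withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
  .build()
val session = cluster.connect()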
The contact points mentioned in the program are used to set up the cluster connection. Gossip then discovers the entire cluster based on those IP addresses. Gossip maintains the IP addresses and other properties of each node, continuously watches for changes in the topology, and updates its view whenever the configuration changes.
If a write or read request arrives at ip1, with the cluster connected to hosts ip1, ip2 and ip3, then ip1 is the coordinator for that particular operation. It acts as a proxy for the operation and redirects it to the responsible node, say ip4, which is in the cluster but not in the contact list, based on how the cluster is set up and on policies like TokenAwareLoadBalancingPolicy. You can have a look at this article by DataStax: https://docs.datastax.com/en/archived/cassandra/1.2/cassandra/architecture/architectureClientRequestsAbout_c.html

(AWS SWF) Is there a way to get a list of all activity workers listening on a particular tasklist?

In our beta stack, we have a single EC2 instance listening to a task list. Sometimes another developer on the team starts his own instance for testing purposes and forgets to turn it off. This creates problems for the next developer who tries to start an activity, only for it to be picked up by the previous developer's machine. Is there a way to get the hostnames of all activity workers listening to a particular task list?
It is not currently possible to get a list of pollers waiting on a task list through the SWF API. The workaround is to look at the identity field on the ActivityTaskStarted history event after the task was picked up by the wrong worker.
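For example, you could scan the execution history for those events (a sketch with the AWS SDK for Java v1 from Scala; the domain, workflow id, and run id are placeholders, and pagination is omitted):
import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClientBuilder
import com.amazonaws.services.simpleworkflow.model.{GetWorkflowExecutionHistoryRequest, WorkflowExecution}
import scala.collection.JavaConverters._

val swf = AmazonSimpleWorkflowClientBuilder.defaultClient()
val history = swf.getWorkflowExecutionHistory(
  new GetWorkflowExecutionHistoryRequest()
    .withDomain("my-domain")
    .withExecution(new WorkflowExecution()
      .withWorkflowId("my-workflow-id")
      .withRunId("my-run-id")))

// The identity is often of the form "pid@hostname", which is enough to spot
// the machine that grabbed the task.
history.getEvents.asScala
  .filter(_.getEventType == "ActivityTaskStarted")
  .foreach(e => println(e.getActivityTaskStartedEventAttributes.getIdentity))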
One way to avoid this issue is to always use a task list name that is specific to a machine or developer, to avoid collisions.
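A cheap way to do that (a sketch; the naming scheme is invented) is to bake the hostname into the task list name:
val taskList = s"my-activities-${java.net.InetAddress.getLocalHost.getHostName}"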

How to properly load balance between two Spark Controllers

We are attempting to load balance between two Spark Controllers that connect to Vora...
We can connect, and the query gets sent to the controller.
The problem occurs when the result is supposed to be passed back to HANA: the process hangs and never finishes.
The last lines in the logs state:
17/02/14 14:24:12 INFO CommandRouter$$anon$1: Created broadcast 7 from executeSelectTask at CommandRouter.scala:650
17/02/14 14:24:12 INFO CommandRouter$$anon$1: Starting job: executeSelectTask at CommandRouter.scala:650
17/02/14 14:24:12 INFO CommandRouter$$anon$1: Created broadcast 8 from broadcast at DAGScheduler.scala:1008
17/02/14 14:24:14 INFO CommandRouter$$anon$1: Created broadcast 9 from broadcast at DAGScheduler.scala:1008
Is there something specific that must be configured to allow load balancing between the two controllers?
The reason the process hangs forever is that the nodes where the Spark executor jobs run do not know the hostname of the HANA host, and are therefore never able to return the result set. The HANA host must be added to each such node's /etc/hosts file.
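The required entry is an ordinary hosts line, for example (hypothetical address and hostname):
10.20.30.40   hana01.example.com   hana01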

Can Akka Cluster Client Send Messages to Cluster Nodes Not in Initial Contacts?

Using Akka 2.3.14, I'm trying to create an Akka cluster of various services. Until now, I have had all my "services" in one artifact that was clustered across multiple nodes, but now I am trying to break this artifact into multiple services that all exist on the same cluster.
So in breaking this up, we've designed it so that any node in the cluster will first try to connect to the seed nodes. If there is no seed node, it will check whether it is a candidate to run as a seed node (i.e. whether it's on a host that a seed node can be on), in which case it will grab an open seed-node port and become a seed node. So in this sense, any service in the cluster can become a seed node.
At least, that was the idea. Our API into this system, running as a separate service, implements a ClusterClient into the system. The initialContacts are set to be the same as the seed nodes. The problem is that the only receptionist actors I can send a message to through the ClusterClient are the actors on the seed nodes.
Here is an example, if it helps. Let's say I have a String Service and a Double Service, and the receptionists for the services are a StringActor and a DoubleActor respectively. Now let's say I have a Client Service which sends StringMessages and DoubleMessages to the StringActor and DoubleActor.
So for simplicity, let's say I have two nodes, server1 and server2 then:
seed-nodes = ["akka.tcp://system#server1:2773", "akka.tcp://system#server2:2773"]
My ClusterClient would be initialized like so:
system.actorOf(
  ClusterClient.props(
    Set(
      system.actorSelection("akka.tcp://system#server1:2773/user/receptionist"),
      system.actorSelection("akka.tcp://system#server2:2773/user/receptionist")
    )
  ),
  "clusterClient"
)
Here are the scenarios that are happening for me:
If the StringServices start up on both servers first, then DoubleMessages from the Client Service just disappear into the ether.
If the DoubleServices start up on both servers first, then StringMessages from the Client Service just disappear into the ether.
If the StringService starts up first on serverX and the DoubleService starts up first on serverY, then all StringMessages will be sent to serverX and all DoubleMessages will be sent to serverY, which is not as bad as the above case, but it means it's not really scaling.
This isn't what I expected; it's possible it's just a defect in my code, so I would like to know whether this is expected behavior or not. And if not, is there another Akka concept that could help me with this?
Arguably, I could just make one service type my entry point, like a RoutingService that could accept StringMessages or DoubleMessages, and then send that to the correct service. But if the Client Service can only send messages to the RoutingService instances that are in the initial contacts, then I can't dynamically scale the RoutingService because no matter how many nodes I add the Client Service can only send to the initial contacts.
I'm also thinking about subscribing to ClusterEvents in my Client Service and seeing if I can add and remove initial contacts from my cluster client as nodes are started up in the cluster, but I'm not sure if this is possible, and it feels like there should be a better solution.
This is what I found out upon more troubleshooting, in case it helps anyone else:
The ClusterClient will attempt to connect to the initial contacts in order, and then only sends its messages across that connection. If you are deploying different services on each node, you will have problems, as the messages sent from the ClusterClient will only go to the node it has connected to. In this way, you can think of the ClusterClient as a legitimate client: it connects to a URL that you give it, and then continues to communicate with the server through that URL.
Reading the Distributed Workers example, I realized that my Frontend, or in this case my routing service, should actually be part of the cluster, rather than acting as a client. For this I used the DistributedPubSub method instead.
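For reference, in Akka 2.3.x DistributedPubSub lives in akka-contrib; a rough sketch of registering a receptionist and sending to it (the actor, its path, and the message handling are made up):
import akka.actor.{Actor, Props}
import akka.contrib.pattern.{DistributedPubSubExtension, DistributedPubSubMediator}

// Each service registers its receptionist with the local mediator, so any
// cluster member can reach it by path without going through a ClusterClient.
class StringActor extends Actor {
  val mediator = DistributedPubSubExtension(context.system).mediator
  mediator ! DistributedPubSubMediator.Put(self) // registers this actor under its path, e.g. /user/stringActor

  def receive = {
    case msg: String => println(s"StringActor got: $msg")
  }
}

// From the routing service (also a cluster member), send to any node that has
// a registered /user/stringActor; the mediator picks one:
// mediator ! DistributedPubSubMediator.Send("/user/stringActor", "hello", localAffinity = false)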

ZeroMQ get client connection info

Ok so I have the following case:
I am using ZeroMQ to pass messages around to other nodes in a cluster. I would like to have a master node that keeps track of who is in the cluster and tells the other nodes when a node connects to the cluster. So for instance:
New node wants to join cluster.
New node announces to master server intent to join
Master server tells other existing nodes about new node
Other existing nodes connect to the new node.
From what I can tell, I can't get information on the address of the new node when it tries to connect to a socket on the master server, so I was wondering whether there is any way I could forward information about the new node to the other nodes.
Edit: I just noticed functionality in the monitoring abilities that seems like it might be what I want. Is this the only way to do so? And will it even be what I really want?
You may consider using the Group Messaging pattern instead. In this pattern, instead of talking to a single master node, you talk to a group of nodes.
There are JOIN and LEAVE commands. When a node joins a group, it broadcasts a JOIN command to all its peers, thus telling them it has joined.
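A bare-bones version of the JOIN broadcast might look like this (a sketch using JeroMQ from Scala; the endpoints and the "JOIN <endpoint>" wire format are invented for illustration):
import org.zeromq.ZMQ

val ctx = ZMQ.context(1)

// Joining node: announce itself on a PUB socket. In practice you would
// re-announce periodically, since late SUB subscribers miss early messages.
val pub = ctx.socket(ZMQ.PUB)
pub.bind("tcp://*:5556")
pub.send("JOIN tcp://10.0.0.5:5557")

// Existing peer: listen for announcements and connect back to the newcomer.
val sub = ctx.socket(ZMQ.SUB)
sub.connect("tcp://10.0.0.5:5556")
sub.subscribe("JOIN".getBytes)
val endpoint = sub.recvStr().stripPrefix("JOIN ")
// ... connect a worker socket to `endpoint` here ...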