Spark 0.9.0 standalone connection refused - akka

I am using Spark 0.9.0 in standalone mode.
When I run a streaming application in standalone mode, I get a connection refused exception.
I added the hostname to /etc/hosts and also tried using the IP address alone. In both cases the worker registered with the master without any issues.
Is there a way to solve this issue?
14/02/28 07:15:01 INFO Master: akka.tcp://driverClient@127.0.0.1:55891 got disassociated, removing it.
14/02/28 07:15:04 INFO Master: Registering app Twitter Streaming
14/02/28 07:15:04 INFO Master: Registered app Twitter Streaming with ID app-20140228071504-0000
14/02/28 07:34:42 INFO Master: akka.tcp://spark@127.0.0.1:33688 got disassociated, removing it.
14/02/28 07:34:42 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.165.35.96%3A38903-6#-1146558090] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/28 07:34:42 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@10.165.35.96:8910] -> [akka.tcp://spark@127.0.0.1:33688]: Error [Association failed with [akka.tcp://spark@127.0.0.1:33688]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@127.0.0.1:33688]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /127.0.0.1:33688

I had a similar issue when running Spark in cluster mode. My problem was that the master was started with the short hostname ('fluentd:7077') and not the FQDN. I edited
/sbin/start-master.sh
so that the master is launched with an --ip value matching the name my remote nodes use to connect:
/usr/lib/jvm/jdk1.7.0_51/bin/java -cp :/home/vagrant/spark-0.9.0-incubating-bin-hadoop2/conf:/home/vagrant/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip fluentd.alex.dev --port 7077 --webui-port 8080
Hope this helps.
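If you want to check how a node's name resolves before touching the scripts, here is a minimal Python sketch. It is not part of Spark, and the helper name resolves_to_loopback is made up for illustration. The log in the question shows the master trying to reach the application at 127.0.0.1, which would be consistent with a hostname resolving to a loopback address, but that is an assumption about the cause, not something the log proves.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the distinct IP addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # info[4][0] is the address string; drop any IPv6 scope suffix ("%eth0").
    return sorted({info[4][0].split("%")[0] for info in infos})


def resolves_to_loopback(hostname):
    """True if any resolved address is a loopback address (127.0.0.0/8 or ::1)."""
    return any(
        ipaddress.ip_address(addr).is_loopback
        for addr in resolved_addresses(hostname)
    )


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, resolved_addresses(name), resolves_to_loopback(name))
```

If this prints True for the machine's own hostname, other machines that are handed that address will try to connect to themselves, so using an FQDN or a routable IP (as with --ip above) is worth trying.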

Related

Error when running cassandra on Google Cloud on external ip - Failed to bind port 9042 on 34.89.109.98

I am trying to run Cassandra on Google Cloud using the external IP of the VM, but I get the error "Failed to bind port 9042 on 34.89.109.98". As far as I can see, I have set up the firewall rules correctly, but I still cannot resolve the issue. I attached pictures of my configuration for reference.
[Screenshots not preserved: 1) the firewall rule, 2) the list of all the rules, 3) the VM]
More Information
I followed the steps in https://linuxize.com/post/how-to-install-apache-cassandra-on-debian-9/ to install Cassandra. This automatically started Cassandra. Then I killed Cassandra, changed the IP address to the external IP in the cassandra.yaml file, and started it again. It didn't work. Then I started experimenting with the VPN settings.
Part of the output after I start Cassandra with /usr/sbin/cassandra -f:
INFO [main] 2019-12-18 16:09:40,755 StorageService.java:1521 - JOINING: Finish joining ring
INFO [main] 2019-12-18 16:09:40,826 StorageService.java:2442 - Node localhost/127.0.0.1 state jump to NORMAL
INFO [main] 2019-12-18 16:09:41,027 NativeTransportService.java:68 - Netty using native Epoll event loop
INFO [main] 2019-12-18 16:09:41,071 Server.java:158 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO [main] 2019-12-18 16:09:41,071 Server.java:159 - Starting listening for CQL clients on /35.197.238.136:9042 (unencrypted)...
Exception (java.lang.IllegalStateException) encountered during startup: Failed to bind port 9042 on 35.197.238.136.
java.lang.IllegalStateException: Failed to bind port 9042 on 35.197.238.136.
at org.apache.cassandra.transport.Server.start(Server.java:163)
at java.util.Collections$SingletonSet.forEach(Collections.java:4769)
at org.apache.cassandra.service.NativeTransportService.start(NativeTransportService.java:124)
at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:696)
at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:546)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:635)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:742)
ERROR [main] 2019-12-18 16:09:41,100 CassandraDaemon.java:759 - Exception encountered during startup
java.lang.IllegalStateException: Failed to bind port 9042 on 35.197.238.136.
at org.apache.cassandra.transport.Server.start(Server.java:163) ~[apache-cassandra-3.11.5.jar:3.11.5]
at java.util.Collections$SingletonSet.forEach(Collections.java:4769) ~[na:1.8.0_232]
at org.apache.cassandra.service.NativeTransportService.start(NativeTransportService.java:124) ~[apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:696) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:546) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:635) [apache-cassandra-3.11.5.jar:3.11.5]
In the cassandra.yaml file you set the IP address that the Cassandra server binds to and listens on. The default is 127.0.0.1 (localhost), which is not suitable for external connections.
The addresses you can use are the ones assigned to the network interfaces of the Compute Engine instance. These can be discovered using:
ip addr
It is important to realize that a Compute Engine instance may appear to have a public IP address in the GCP Console, but that address is not assigned to a network interface on the instance, so a process on the instance cannot bind to it. In the example in your original question, the instance's IP address would be 10.154.0.4. This is the address you want to set in your configuration file.
See also this document which describes setting up Cassandra on GCP:
Spinning up a Cassandra Cluster on Google Cloud (for free) with just a browser
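To see the point about binding for yourself, here is a small Python sketch (nothing to do with Cassandra's own code; can_bind is a hypothetical helper) that tries to bind a TCP socket to a given IPv4 address. On a typical Linux VM, binding to an address that is not assigned to a local interface fails, which is the same class of failure as "Failed to bind port 9042". Note that a bind can also fail for other reasons, for example when the port is already in use.

```python
import socket


def can_bind(address, port=0):
    """Try to bind a TCP socket to (address, port) on this machine.

    port=0 asks the OS for any free port, so the check is only about
    whether the address itself belongs to a local interface.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # Addresses taken from the question and answer above.
    for candidate in ("127.0.0.1", "10.154.0.4", "35.197.238.136"):
        print(candidate, can_bind(candidate))
```

Run on the VM itself, I would expect the internal address to be bindable and the external one not to be, but that depends on the instance's network setup.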

Unable to connect to neo4j server on my aws ec2 instance - port 7474

After installing neo4j on my aws ec2 instance, the following seems to indicate that the server is up.
# bin/neo4j console
Active database: graph.db
Directories in use:
home: /usr/local/share/neo4j-community-3.3.1
config: /usr/local/share/neo4j-community-3.3.1/conf
logs: /usr/local/share/neo4j-community-3.3.1/logs
plugins: /usr/local/share/neo4j-community-3.3.1/plugins
import: /usr/local/share/neo4j-community-3.3.1/import
data: /usr/local/share/neo4j-community-3.3.1/data
certificates: /usr/local/share/neo4j-community-3.3.1/certificates
run: /usr/local/share/neo4j-community-3.3.1/run
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended.
See the Neo4j manual.
2017-12-01 16:03:04.380+0000 INFO ======== Neo4j 3.3.1 ========
2017-12-01 16:03:04.447+0000 INFO Starting...
2017-12-01 16:03:05.986+0000 INFO Bolt enabled on 127.0.0.1:7687.
2017-12-01 16:03:11.206+0000 INFO Started.
2017-12-01 16:03:12.860+0000 INFO Remote interface available at http://localhost:7474/
At this point I am not able to connect. I have opened up ports 7474 and 7687, and I can access port 80, ssh into the instance, etc.
Is this a Neo4j or an AWS problem?
Any help is appreciated.
Colin Goldberg
Set dbms.connectors.default_listen_address to 0.0.0.0, then open only the SSL port, 7473, using Amazon's EC2 security groups. Don't use 7474 if you don't have to.
It looks like Neo4j is only listening on the localhost interface. If you run netstat -a | grep 7474, you want to see something like *:7474. If you see something like localhost:7474, you won't be able to connect to the port from outside.
Take a look at Configuring Neo4j connectors. I believe you want dbms.connectors.default_listen_address set to 0.0.0.0.
And now a warning - you are opening your Neo4j to the entire planet if you do this. That may be ok but it seems unlikely that this is what you want to do. The defaults are there for a reason - you don't want the entire planet being able to try to hack into your database. Use caution if you enable this.
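The netstat check above can be written down as a tiny Python sketch. The function name listens_externally is made up, and it only interprets the "local address" column as text; it does not talk to Neo4j or run netstat for you. Treating an unknown hostname as non-loopback is an assumption.

```python
import ipaddress


def listens_externally(local_address):
    """Interpret a netstat-style local address such as '*:7474'.

    Returns True when the socket is bound to a wildcard or a
    non-loopback address, False when it is bound to loopback only.
    """
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]")
    if host in ("*", "0.0.0.0", "::"):
        return True  # wildcard: all interfaces
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Some other hostname; assumed here not to be a loopback alias.
        return True


if __name__ == "__main__":
    for sample in ("*:7474", "localhost:7474", "127.0.0.1:7687"):
        print(sample, listens_externally(sample))
```

With the startup log from the question ("Bolt enabled on 127.0.0.1:7687"), this kind of check returns False, which matches the diagnosis that only local clients can connect.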

Django Celery cannot connect to remote RabbitMQ on EC2

I created a RabbitMQ cluster on two EC2 instances. My Django app uses Celery for async tasks, which in turn uses RabbitMQ as the message queue.
Whenever I start celery with the command:
python manage.py celery worker --loglevel=INFO
OR
python manage.py celeryd --loglevel=INFO
I keep getting the following error message related to the remote RabbitMQ:
[2015-05-19 08:58:47,307: ERROR/MainProcess] consumer: Cannot connect to amqp://myuser:**@<ip-address>:25672/myvhost/: Socket closed.
Trying again in 2.00 seconds...
I set permissions using:
sudo rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"
and then restarted rabbitmq-server on both the cluster nodes. However, it didn't help.
In the log file, I see a few entries like the ones below:
=INFO REPORT==== 19-May-2015::08:14:41 ===
accepting AMQP connection <0.1981.0> (<ip-address>:38471 -> <ip-address>:5672)
=ERROR REPORT==== 19-May-2015::08:14:44 ===
closing AMQP connection <0.1981.0> (<ip-address>:38471 -> <ip-address>:5672):
{handshake_error,opening,0,
{amqp_error,access_refused,
"access to vhost 'myvhost' refused for user 'myuser'",
'connection.open'}}
The file /usr/local/etc/rabbitmq/rabbitmq-env.conf contains an entry for NODE_IP_ADDRESS that binds it only to localhost. Removing the NODE_IP_ADDRESS entry from the config binds the port to all network interfaces.
Source: https://superuser.com/questions/464311/open-port-5672-tcp-for-access-to-rabbitmq-on-mac
It turns out I had not created the appropriate configuration files. In my case (Ubuntu 14.04), I had to create the two configuration files below:
$ cat /etc/rabbitmq/rabbitmq-env.conf
RABBITMQ_NODE_IP_ADDRESS=<ip_of_ec2_instance>
<ip_of_ec2_instance> has to be the internal IP that EC2 uses, not the public IP that one uses to ssh into the instance. It can be obtained using the ip a command.
$ cat /etc/rabbitmq/rabbitmq.config
[
{mnesia, [{dump_log_write_threshold, 1000}]},
{rabbit, [{tcp_listeners, [25672]}]},
{rabbit, [{loopback_users, []}]}
].
I think the line {rabbit, [{tcp_listeners, [25672]}]}, was one of the most important pieces of configuration that I was missing.
Thanks @dgil for the initial troubleshooting help.
The question has been answered, but I am leaving notes on a similar issue I faced in case anybody else finds them useful.
I have a Flask app running on EC2 with AMQP as the broker on port 5672 and EC2 ElastiCache Memcached as the backend. The AMQP broker had trouble picking up tasks that were being fired, so I resolved it as follows.
Assuming you have rabbitmq-server installed (sudo apt-get install rabbitmq-server), add the user and set the permissions:
sudo rabbitmqctl add_user username password
sudo rabbitmqctl set_permissions username ".*" ".*" ".*"
Restart the server: sudo service rabbitmq-server restart
In your Flask app, for the Celery configuration:
broker_url=amqp://username:password@localhost:5672// (set as above)
backend=cache+memcached://(ec2 cache url):11211/
(The cache+memcached:// prefix tripped me up; without it I kept getting an import error: cannot import module.)
Open port 5672 on your EC2 instance in the security group.
Now if you fire up your Celery worker, it should pick up the tasks that get fired and store the results on your Memcached server.
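Since several of the problems in this thread come down to which host, port, and vhost the broker URL actually points at, here is a minimal Python sketch that splits an AMQP URL into its parts using only the standard library. describe_broker_url is a hypothetical helper, not a Celery or Kombu function. Defaulting to port 5672 and mapping an empty path or a path of "//" to the "/" vhost is my reading of the usual convention, so treat it as an assumption.

```python
from urllib.parse import unquote, urlsplit


def describe_broker_url(url):
    """Split an amqp:// URL into user, host, port and vhost."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "host": parts.hostname,
        # 5672 is the standard AMQP port; used here when none is given.
        "port": parts.port or 5672,
        # Drop the leading "/" of the path; what remains names the vhost.
        "vhost": unquote(parts.path[1:]) or "/",
    }


if __name__ == "__main__":
    print(describe_broker_url("amqp://username:password@localhost:5672//"))
    print(describe_broker_url("amqp://myuser:secret@10.0.0.5/myvhost"))
```

Comparing the printed port and vhost with what RabbitMQ is really listening on (tcp_listeners) and with the vhost the user has permissions for (set_permissions -p) is a quick way to spot a mismatch like the access_refused error in the question.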

Confd error: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

While debugging I realised that confd doesn't pick up the keys and my journal looks like this:
Sep 18 18:31:50 ip-10-171-54-76.ec2.internal docker[24891]: [nginx] waiting for confd to refresh nginx.conf
Sep 18 18:31:56 ip-10-171-54-76.ec2.internal docker[24891]: 2014-09-18T18:31:56Z 9122c7a54edc confd[9572]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I used nsenter to get into the running container and run some experiments for debugging purposes. I ran this command:
confd -onetime -node 172.17.42.1:4001 -config-file /etc/confd/conf.d/nginx.toml
and received the same error as above:
confd[12894]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I am totally clueless at this point. I am using EC2 with the stable version of CoreOS and I am sure that etcd is running on the host. Also, I can ping the host from inside the container successfully.
Any ideas on what's wrong?
Assistance will be much appreciated.
This error indicates that your etcd cluster isn't operating correctly, so confd has nothing to watch. It has probably lost quorum. The logs (journalctl -u etcd) should indicate what happened.
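One thing worth ruling out from inside the container: a successful ping only shows that the host answers ICMP, not that anything accepts TCP connections on port 4001. Below is a minimal Python sketch (unreachable_peers is a made-up helper, unrelated to confd's own code) that tries a TCP connection to each "host:port" peer and reports the ones that fail. A peer that accepts the connection can still be unhealthy, for example after losing quorum, so this only narrows things down.

```python
import socket


def unreachable_peers(peers, timeout=2.0):
    """Return (peer, reason) pairs for peers that refuse or time out.

    peers is a list of "host:port" strings, the same form passed to
    confd's -node option in the question.
    """
    failed = []
    for peer in peers:
        host, _, port = peer.rpartition(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                pass  # TCP handshake succeeded; something is listening
        except OSError as exc:
            failed.append((peer, str(exc)))
    return failed


if __name__ == "__main__":
    print(unreachable_peers(["172.17.42.1:4001"]))
```

An empty list means the port is reachable and the etcd logs are the next place to look; a non-empty list points at networking or at etcd not listening on that address.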

Problems getting RabbitMQ and Django-Celery Running: Target Machine actively refused connection

I am trying to get Django-Celery running on my Django App. I cannot get the worker server to run. When I try I get the message: No Connection could be made because the target machine actively refused it
Here is what I have done so far. First, I installed the django celery package: http://pypi.python.org/pypi/django-celery
I can load it into python without problems. I also installed the RabbitMQ server per the windows install instructions: http://www.rabbitmq.com/install.html#windows
Starting the Python tutorials on the RabbitMQ site, I saw the need to install pika: http://pypi.python.org/pypi/pika. It imports without any problems.
From there I start the RabbitMQ server by running this at the command line: rabbitmq-service start
I get the message back that Service RabbitMQ started
Here is where I start to have problems.
I attempted the first steps in django-celery: http://packages.python.org/django-celery/getting-started/first-steps-with-django.html and the "hello world" example on the rabbitMQ site: http://www.rabbitmq.com/tutorials/tutorial-one-python.html
In both cases I get the message: No Connection could be made because the target machine actively refused it
My first thought was that this sounded like a firewall problem. So I went into the windows 7 firewall and added inbound and outbound rules to open the local and remote ports 5672 and 5673 to TCP protocol, but I still get the same error message.
When I run rabbitmqctl status I get the message:
Error: unable to connect to node 'rabbit@hostname': nodedown
diagnostics:
- nodes and their ports on hostname: [{rabbitmqctl18856, 505031}]
Does that mean that it is trying to operate on those ports? What about the default 5672?
Any suggestions?
UPDATE: This was actually a problem resulting from several failed RabbitMQ installs conflicting with the latest installation. If you have to remove RabbitMQ, use the 'rabbitmq-service remove' command and not SC DELETE, which caused a lot of problems for me; I had to go in and clean up my Windows registry.
The nodedown error indicated by rabbitmqctl suggests that the server isn't running on that machine.
Try going through the steps in RabbitMQ's troubleshooting guide. In particular, pay close attention to the logs. Has the server crashed for some reason? Could you post the logs somewhere?
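On the "actively refused" wording: that is what Windows reports when the connection attempt is rejected outright, which typically means the machine was reached but nothing is listening on that port. A firewall that silently drops packets tends to show up as a timeout instead. A firewall configured to reject rather than drop can also produce a refusal, so treat the distinction as a hint rather than proof. Here is a minimal Python sketch that separates the two cases; classify_connection is a hypothetical helper.

```python
import socket


def classify_connection(host, port, timeout=3.0):
    """Return 'open', 'refused', 'timeout' or 'error' for a TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: usually no listener on the port.
        return "refused"
    except socket.timeout:
        # No answer at all: often a filtered port or an unreachable host.
        return "timeout"
    except OSError:
        return "error"


if __name__ == "__main__":
    # 5672 is the default AMQP port mentioned in the question.
    print(classify_connection("localhost", 5672))
```

A result of "refused" against localhost:5672 fits the nodedown diagnosis above: the broker is not running, so opening firewall ports will not change anything.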