Apache Spark - Connection refused for worker - akka

Hi, I am new to Apache Spark and I am trying to learn it.
While creating a new standalone cluster I ran into this error.
I started my master and it is active on port 7077; I can see it in the UI (port 8080).
While starting the worker using the command
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.56:7077
I get a connection refused error:
14/07/22 13:18:30 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@node-physical:55124] -> [akka.tcp://sparkMaster@192.168.0.56:7077]: Error [Association failed with [akka.tcp://sparkMaster@192.168.0.56:7077]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.0.56:7077]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.0.56:7077
Please help me with this error; I have been stuck here for a long time.
I hope the information is enough. Please help.

In my case, I went to /etc/hosts, removed the line with 127.0.1.1, added the line "MASTER_IP MACHINE_NAME", and it worked.

Try "./sbin/start-master -h ". It works, when I specify the host name as IP address.

Change SPARK_MASTER_HOST=<ip> in spark-env.sh on the master node.
Then restart the master. If you grep the process you will see it change from
java -cp /spark/conf/:/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host <HOST NAME> --port 7077 --webui-port 8080
to
java -cp /spark/conf/:/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host <HOST IP> --port 7077 --webui-port 8080
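A minimal sketch of the corresponding line, assuming the master IP from the question:
# conf/spark-env.sh on the master node
export SPARK_MASTER_HOST=192.168.0.56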

Check whether your firewall is blocking the worker connection. You can either turn the firewall off temporarily:
$ sudo service iptables stop
or permanently:
$ sudo chkconfig iptables off
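If you prefer to confirm that iptables is actually the culprit before disabling it, you can list the current rules first (this assumes the same iptables-based setup as the commands above):
$ sudo iptables -L -n --line-numbers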

It seems like Spark is very picky about IPs and machine names. When starting your master, it will use your machine name to register the Spark master. If that name is not reachable from your workers, the master will be almost impossible to reach.
One way to solve this is to start your master like this:
SPARK_MASTER_IP=YOUR_SPARK_MASTER_IP ${SPARK_HOME}/sbin/start-master.sh
Then you will be able to connect your slaves like this:
${SPARK_HOME}/sbin/start-slave.sh spark://YOUR_SPARK_MASTER_IP:PORT
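For example, using the master IP and default port from the question (purely illustrative; substitute your own values):
SPARK_MASTER_IP=192.168.0.56 ${SPARK_HOME}/sbin/start-master.sh
${SPARK_HOME}/sbin/start-slave.sh spark://192.168.0.56:7077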
I hope it helps!

Did you add the entries for the master and worker nodes in /etc/hosts? If not, add every machine's IP and hostname mapping on all the machines.

For Windows: spark-class org.apache.spark.deploy.master.Master -h [Interface IP to bind to]

I had a similar problem in a Docker container. I solved it by setting the IP for the master and driver to localhost, specifically:
set('spark.master.hostname', 'localhost')
set('spark.driver.hostname', 'localhost')

I do not have DNS, so I added entries in /etc/hosts on the master node for the IPs and hostnames of all master and worker nodes. On the worker nodes, I added the IP and hostname of the master node to /etc/hosts.

Unable to connect to Docker container: Connection Refused

I have a WAR file deployed as a Docker container on a Linux EC2 instance. But when I try to hit http://ec2-elastic-ip:8080/AppName, I don't get any response.
I have all the security group inbound rules set up for both HTTP and HTTPS, so that's not the problem.
Debugging
I tried debugging by SSH-ing into the Linux instance and running curl localhost:8080; this is the response:
curl: (7) Failed to connect to localhost port 8080: Connection refused
I tried with 127.0.0.1:8080 but got the same response.
The next thing I did was to list the Docker containers with docker ps. I get:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<ID> <ecr>.amazonaws.com/<my>-registry:2019-05-16.12-17-02 "catalina.sh run" 24 minutes ago Up 24 minutes 0.0.0.0:32772->8080/tcp ecs-app-24-name
Then I connected to this container using docker exec -it <name> /bin/bash and checked the Tomcat logs, which clearly show that my application WAR is there and Tomcat has started.
I even tried checking docker-machine ip default, but this gave me an error:
Docker machine "default" does not exist. Use "docker-machine ls" to list machines. Use "docker-machine create" to add a new one.
Now I am stuck and unable to debug further. The result I am expecting is to access the app through the URL above.
What should I do? Is there something I am doing wrong?
Also, to mention, the entire infrastructure is managed through Terraform. I first create the base image, copy the WAR to webapps using the Dockerfile, push the image to the registry, and finally do a terraform apply to apply any changes.
Make sure that Apache is listening on all IP addresses inside the Docker container, not just localhost. The bind address should be 0.0.0.0.
If a service running inside Docker listens only on localhost, it can only be accessed from inside that container, not from the host.
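A quick way to check this from inside the container is a sketch like the following (the container name is a placeholder, and ss may not be present in minimal images; netstat -tlnp is an alternative):
$ docker exec <container-name> ss -tlnp
# 127.0.0.1:8080 means the service only listens on localhost; 0.0.0.0:8080 or *:8080 means all interfaces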
You can also start Apache on port 8080 and bind the container's port 8080 to host port 8080:
docker run -p 8080:8080 apache
Currently your app is running on a random host port, i.e. 32772 (see the docker ps output). You should be able to access your app at http://ec2-ip:32772 once you allow port 32772 in the security group.
In order to make it work on host port 8080, you need to bind/expose the host port during docker run:
$ docker run -p 8080:8080 ......
If you are on ECS, ideally you should use an ALB and target group with your service.
However, if you are not using an ALB etc., you can try giving a static hostPort in the task definition, "hostPort": 8080 (I haven't tried this). If it works fine, you will need to make sure to change the deployment strategy to "minimum healthy percentage = 0", else you might face port conflict issues.
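For reference, a hedged sketch of what that static hostPort could look like inside the container definition's portMappings in the task definition (untested, as noted above; the values are illustrative):
"portMappings": [
  {
    "containerPort": 8080,
    "hostPort": 8080,
    "protocol": "tcp"
  }
]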
If the application needs a network port you must EXPOSE it in the Dockerfile:
EXPOSE <port> [<port>/<protocol>...]
In case you need that port to be mapped to a specific port on the host, you must define that when you spin up the new container:
docker run -p 8080:8080/tcp my_app
If you run each image separately you must bind the port every time.
If you don't want to do this every time, you can use docker-compose and add the ports directive to it:
ports:
  - "8080:8080/tcp"
Supposing you added EXPOSE in the Dockerfile, the full docker-compose.yml would look like this:
version: '3'
services:
  web:
    build: .
    ports:
      - "8080:8080"
  my_app:
    image: my_app

Unable to register AWS host to Ambari server

While registering a host to an Ambari server cluster, I am getting the following error:
"Host checks were skipped on 1 hosts that failed to register."
I'm trying to install HDP 2.5 on an AWS instance.
I have tried to follow the Hortonworks documentation:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-installation/content/set_the_hostname.html
I have added the public IP address and public hostname to the /etc/hosts file and changed the hostname in the /etc/hostname file on both the server and the host. I rebooted both and the hostnames were changed. Then I stopped iptables with
sudo service iptables stop
After doing all this, the host registration is still failing. Kindly help; I am stuck.
Background
From my experience with Ambari (Hortonworks), you have to explicitly set up your Hadoop nodes in each other's /etc/hosts files with the actual names/IPs that the Hadoop services will bind to. NOTE: the hostnames should also be FQDNs - fully qualified domain names.
For example if you're setting up the hosts as:
node01.mydom.com (10.0.0.2)
node02.mydom.com (10.0.0.3)
node03.mydom.com (10.0.0.4)
These entries should be in all 3 servers' /etc/hosts files, and these should be the names used when referencing the nodes within Ambari's installation/setup wizards.
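For example, a sketch of what each node's /etc/hosts could contain, using the example names and IPs above:
10.0.0.2   node01.mydom.com   node01
10.0.0.3   node02.mydom.com   node02
10.0.0.4   node03.mydom.com   node03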
If you do not pay special attention to this detail, the Ambari server will fail to find/manage any of the other nodes that you're telling it to manage.
hostname of ambari-agents
The other thing to look at is the ambari-agents and which hostnames they think they are running as:
$ ps -eaf|grep ambari_agent
root 3282 1 0 Jul30 ? 00:00:00 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start --expected-hostname=node01.mydom.com
root 3290 3282 1 Jul30 ? 08:24:29 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/main.py start --expected-hostname=node01.mydom.com
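On each node you can also cross-check which FQDN the OS itself reports (the output below just reuses the example name from above):
$ hostname -f
node01.mydom.com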
Debugging further
In the screen where you're attempting to register the other nodes as agents, there's a full log of what's happening. You can typically grab the commands from this area and run them directly; I've done this on a number of occasions. The commands will often be python ... commands, which you can copy/paste from the logs and run on the Ambari server where you're attempting to run the install.

Issue running apprtc on AWS

I am following the instructions and am able to build and run apprtc on my local Ubuntu machine.
I am trying to do the same on AWS. I have added ports 8000 and 8080 to the instance's security group. On AWS, when I execute
/dev_appserver.py ./out/app_engine
I get these console messages:
Starting API server at: http://localhost:45920
Starting module "default" running at: http://localhost:8080
Starting admin server at: http://localhost:8000
I check ec2...compute-1.amazonaws.com:8000 and ec2...compute-1.amazonaws.com:8080 and see nothing. Could you please point out what I am missing?
By default apprtc is bound to localhost; you need to specify --host 0.0.0.0 in order to expose it externally.
So use "/home/usertest/google_appengine/dev_appserver.py ./out/app_engine --host 0.0.0.0" to reach it from outside the machine.

Setting up JMeter for Distributed testing in AWS with connectivity issues

I have to do distributed testing using JMeter. The objective is to have multiple remote servers in AWS controlled by one local server send a file download request to another server in AWS.
How can I set up the different servers in AWS?
How can I connect to them remotely?
Can someone provide some step by step instructions on how to do it?
I have tried several things but keep running into connectivity issues across networks.
We had a similar task and we ran into a bunch of issues as well. Here are the details of the whole process and what we did to resolve the issues we encountered. Hope it helps.
We needed to send requests from 5 servers located in various regions of the world. So we launched 5 micro instances in AWS, each in a different region. We chose the regions to be as geographically apart as possible.
Remote (server) JMeter config
Here is how we set up each instance.
Installed java:
$ sudo apt-get update
$ sudo apt-get install default-jre
Installed JMeter:
$ mkdir jmeter
$ cd jmeter;
$ wget ftp://apache.mirrors.pair.com//jmeter/binaries/apache-jmeter-2.9.tgz
$ gunzip apache-jmeter-2.9.tgz;tar xvf apache-jmeter-2.9.tar
Edited the jmeter.properties file in the /bin folder of the JMeter installation and uncommented the line containing the server.rmi.localport setting. We changed the port to 50000.
server.rmi.localport=50000
Started the JMeter server. Make sure the address and port the server reports it is listening on are correct.
$ cd ~/jmeter/apache-jmeter-2.9/bin
$ ./jmeter-server
Local (client) JMeter config
Then, on our local client machine, we set up JMeter to run tests remotely on those instances:
Ensured to use the same version of JMeter as was running on the servers. Installed Java and JMeter as described above.
Enabled remote testing by editing the jmeter.properties file that can be found in the bin folder of the JMeter installation. The parameter remote_hosts needed to be set with the public DNS of the remote servers we were connecting to.
remote_hosts=54.x.x.x,54.x.x.x,54.x.x.x,54.x.x.x,54.x.x.x
We were now able to tell our client JMeter instance to run tests on any or all of our specified remote servers.
Issues and resolutions
Here are the issues we encountered and how we resolved them:
The client failed with:
ERROR - jmeter.engine.ClientJMeterEngine: java.rmi.ConnectException: Connection refused to host: 127.0.0.1
It was due to the server host returning the private IP address as its address because of Amazon NAT.
We fixed this by setting the RMI_HOST_DEF parameter that the /usr/local/jmeter/bin/jmeter-server script includes when starting the server:
RMI_HOST_DEF=-Djava.rmi.server.hostname=54.xx.xx.xx
Now, the AWS instance returned the server’s external IP, and we could start the test.
When the server node attempted to return the results to the client, it tried to connect to the external IP address of my local machine, but this threw a connection refused error:
2013/05/16 12:23:37 ERROR - jmeter.samplers.RemoteListenerWrapper: testStarted(host) java.rmi.ConnectException: Connection refused to host: xxx.xxx.xxx.xx;
We resolved this issue by setting up reverse tunnels at the client side.
First, we edited the jmeter.properties file in the /bin folder of the JMeter installation and uncommented the line containing the client.rmi.localport setting. We changed the port to 60000:
client.rmi.localport=60000
Then we connected to each of the servers using SSH, and setup a reverse tunnel to port 60000 on the client.
$ ssh -i ~/.ssh/54-x-x-x.us-east.pem -R 60000:localhost:60000 ubuntu@54.x.x.x
We kept each of these sessions open, as the JMeter server needs to be able to deliver the test results to the client.
Then we set up the JVM_ARGS environment variable on the client, in the jmeter.sh file in the /bin folder:
export JVM_ARGS="-Djava.rmi.server.hostname=localhost"
By doing this, JMeter will tell the servers to connect to localhost:60000 for delivering their results. This ends up being tunneled back to the client.
The SSH connections to the servers kept dropping after staying idle for a little while. To prevent that from happening, we added a parameter to each of the SSH tunnel setups directing the client to wait 60 seconds of inactivity before sending a null packet to the server to keep the connection alive:
$ ssh -i ~/.ssh/54-x-x-x.us-east.pem -o ServerAliveInterval=60 -R 60000:localhost:60000 ubuntu@54.x.x.x
(.ssh/config version of all required SSH settings:
Host 54.x.x.x
HostName 54.x.x.x
Port 22
User ubuntu
ServerAliveInterval 60
RemoteForward 127.0.0.1:60000 127.0.0.1:60000
IdentityFile ~/.ssh/54-x-x-x.us-east.pem
IdentitiesOnly yes
Just use ssh 54.x.x.x after setting this up.
)
I just went through this on OpenStack and found the same issues... no idea why the JMeter remoting documentation only covers half the required steps. You can do it without tunnels or touching the properties files.
You need
All nodes to advertise their public IP - on AWS/OS this defaults to the private IP
Ingress rules for the RMI port, which defaults to 1099 - I use this (see the security-group sketch after this list)
Ingress rules for the RMI "local" port, which defaults to dynamic. Below I use 4001 for the client and 4000 for the servers. The port can be the same, but note the properties are different.
If you are using your workstation as the client you probably still need tunnels. Above Archana Aggarwal has good tips for tunnels.
Remote servers
Set java.rmi.server.hostname and server.rmi.localport inline or in the properties file.
jmeter-server -Djava.rmi.server.hostname=publicip -Dserver.rmi.localport=4000
Sneaky server on client
You can also run one on the same machine as the client. For clarity I've set java.rmi.server.hostname but left server.rmi.localport as dynamic.
jmeter-server -Djava.rmi.server.hostname=localip
Client
Set java.rmi.server.hostname and client.rmi.localport inline or in the properties file. Use -R etc like so:
jmeter -n -t Test.jmx -Rremotepublicip1,remotepublicip2 -Djava.rmi.server.hostname=clientpublicip -Dclient.rmi.localport=4001 -GmypropA=1 -GmypropB=2 -lresults.jtl
When you do distributed testing with JMeter in AWS, I would suggest using Docker, which helps you bring up the JMeter test infrastructure very quickly. This way we can also ensure that the same versions of Java and JMeter are installed in all the Amazon instances, which is very important for JMeter distributed testing.
Ensure that you set the properties below and that the ports are open for jmeter-server. [They do not have to be exactly 1099 and 50000.]
server.rmi.localport=50000
server_port=1099
java.rmi.server.hostname=SERVER_IP
For the client:
client.rmi.localport=60000
java.rmi.server.hostname=SERVER_IP - this step is very important, as the containers in the AWS instances have their own IP addresses in the Docker network, so the master and slaves cannot communicate otherwise. So we explicitly set this property.
More info:
http://www.testautomationguru.com/jmeter-distributed-load-testing-using-docker-in-aws/

Connecting to EC2 Django development Server

I am new to EC2 and web development. Currently I have a Linux EC2 instance running and have installed Django. I am creating a test project before I start on my real project, and I tried running the Django test server.
This is my output in the shell:
python manage.py runserver ec2-###-##-##-##.compute-1.amazonaws.com:8000
Validating models...
0 errors found
Django version 1.3, using settings 'testsite.settings'
Development server is running at http://ec2-###-##-##-##.compute-1.amazonaws.com:8000/
Quit the server with CONTROL-C.
To test that it is working I have tried visiting ec2-###-##-##-##.compute-1.amazonaws.com:8000, but I always get a "Cannot connect" message from my browser.
Whenever I do this locally on my computer, however, I successfully get to the Django development home page at 127.0.0.1:8000. Could someone help me figure out what I am doing wrong / might be missing when I do this on my EC2 instance as opposed to my own laptop?
Using an EC2 instance with Ubuntu, I found that specifying 0.0.0.0:8000 worked:
$ python manage.py runserver 0.0.0.0:8000
Of course port 8000 does need to be opened for TCP in your security group settings.
You probably don't have port 8000 open in the firewall. Check which security group your instance is running in (probably "default") and check its rules. You will probably find that port 8000 is not listed.
1) You need to make sure port 8000 is added as a Custom TCP Rule to your security group's list of inbound ports.
2) Odds are that the IP you see listed on your AWS Console, which is associated with your instance, is a PUBLIC IP or a PUBLIC domain name (i.e. ec2-###-##-##-##.compute-1.amazonaws.com or 174.101.122.132) that Amazon assigns.
2.1) If it is a public IP, then your instance has no way of knowing what public IP is assigned to it; rather, it will only know its assigned local IP.
2.2) To get your local IP on a Linux system, type:
$ ifconfig
Then look at the eth0 data and you'll see an IP next to "inet addr" of the format xxx.xxx.xxx.xxx (e.g. 10.10.12.135). This is your local IP.
3) To successfully run the development server you can do one of the following two:
$ python manage.py runserver <LOCAL IP>:8000
or
$ python manage.py runserver 0.0.0.0:8000
** Option Two also works great as Ernest Ezis mentioned in his answer.
EDIT : From The Django Book : "The IP address 0.0.0.0 tells the server to listen on any network interface"
** My theory of Public IP could be wrong, since I'm not sure how Amazon assigns IPs. I'd appreciate being corrected.
I was having the same problem, but I was running RHEL on EC2. Besides adding a rule to the security group, I had to manually add the port to firewalld:
firewall-cmd --permanent --add-port=8000/tcp
firewall-cmd --reload
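To double-check that the rule took effect, you can list the currently open ports with the same tool (8000/tcp should appear after the reload):
firewall-cmd --list-ports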
That worked for me! (Although no idea why I had to do that)
Yes, if you use the quick launch EC2 option, you should add a new HTTP rule (just as it appears in the list) to run a development server.
Adding a security group with inbound rules as follows usually does the trick, unless you have something else misconfigured. The port range specifies which port you want to allow incoming traffic on.
HTTP access would need 80
HTTP access over port 8000 would need 8000
SSH to server would need 22
HTTPS would need 443