Host key verification failed in google compute engine based mpich cluster - google-cloud-platform

TLDR:
I have 2 Google Compute Engine instances, and I've installed MPICH on both.
When I try to run a sample I get Host key verification failed.
Detailed version:
I've followed this tutorial in order to get this task done: http://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/.
I have 2 Google Compute Engine VMs with Ubuntu 14.04 (the Google Cloud account is a trial one, btw). I've downloaded this version of MPICH on both instances: http://www.mpich.org/static/downloads/3.3rc1/mpich-3.3rc1.tar.gz and installed it using these steps:
./configure --disable-fortran
sudo make
sudo make install
This is the way the /etc/hosts file looks on the master-node:
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
169.254.169.254 metadata.google.internal metadata
10.128.0.3 client
10.128.0.2 master
10.128.0.2 linux1.us-central1-c.c.ultimate-triode-161918.internal linux1 # Added by Google
169.254.169.254 metadata.google.internal # Added by Google
And this is the way the /etc/hosts file looks on the client-node:
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
169.254.169.254 metadata.google.internal metadata
10.128.0.2 master
10.128.0.3 client
10.128.0.3 linux2.us-central1-c.c.ultimate-triode-161918.internal linux2 # Added by Google
169.254.169.254 metadata.google.internal # Added by Google
The rest of the steps involved adding a user named mpiuser on both nodes, configuring passwordless SSH authentication between the nodes, and setting up a shared cloud directory between them (roughly the steps sketched below).
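Roughly, the passwordless SSH part amounted to the following (a sketch based on the tutorial; run as mpiuser on the master, using the hostnames from /etc/hosts above):
ssh-keygen -t rsa              # accept the defaults, empty passphrase
ssh-copy-id mpiuser@client     # copy the public key to the client node
ssh mpiuser@client hostname    # should print the client's hostname without a password prompt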
The configuration worked till this point. I've downloaded this file https://raw.githubusercontent.com/pmodels/mpich/master/examples/cpi.c to /home/mpiuser/cloud/mpi_sample.c, compiled it this way:
mpicc -o mpi_sample mpi_sample.c
and issued this command on the master node while logged in as the mpiuser:
mpirun -np 2 -hosts client,master ./mpi_sample
and I got this error:
Host key verification failed.
What's wrong? I've tried to troubleshoot this problem for more than 2 days but I can't find a valid solution.

It turned out that my passwordless SSH wasn't configured properly. I created 2 new instances and did the following to get passwordless SSH working and, with it, a working version of that sample. The following steps were executed on Ubuntu Server 18.04.
First, by default, instances on Google Cloud have the PasswordAuthentication setting turned off. On the client server, do:
sudo vim /etc/ssh/sshd_config
and change PasswordAuthentication no to PasswordAuthentication yes. Then
sudo systemctl restart ssh
Generate an SSH key on the master server with:
ssh-keygen -t rsa -b 4096 -C "user.mail@server.com"
Copy the generated SSH key from the master server to the client:
ssh-copy-id client
Now you have a fully functional passwordless SSH login from master to client. However, mpich still failed.
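To confirm that the key-based login is truly non-interactive (a quick check, not part of the original steps), force batch mode:
ssh -o BatchMode=yes client hostname
If this prompts for anything or errors out, mpirun will fail in the same way, since mpich launches its remote processes over SSH.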
The additional step I did was to append the public key to the ~/.ssh/authorized_keys file, both on master and client. So execute this command on both servers:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then make sure the /etc/ssh/sshd_config files on both the client and the master have the following configuration:
PasswordAuthentication no
ChallengeResponseAuthentication no
UsePAM no
Restart the SSH service on both client and master:
sudo systemctl restart ssh
And that's it, mpich works smoothly now.
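As an optional sanity check (not part of the original steps), you can print sshd's effective configuration on each node and confirm the three settings above are active:
sudo sshd -T | grep -iE 'passwordauthentication|challengeresponseauthentication|usepam'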

Related

Running code server on gcp cloud shell gives error when previewing

I'm trying to run code-server on gcp cloud shell. I downloaded the following version
https://github.com/cdr/code-server/releases/download/v3.9.2/code-server-3.9.2-linux-amd64.tar.gz, which I think is the correct one, extracted the contents and ran
code-server --auth none
This gave the following output
[2021-04-06T00:53:21.728Z] info code-server 3.9.2 109d2ce3247869eaeab67aa7e5423503ec9eb859
[2021-04-06T00:53:21.730Z] info Using user-data-dir ~/.local/share/code-server
[2021-04-06T00:53:21.751Z] info Using config file ~/.config/code-server/config.yaml
[2021-04-06T00:53:21.751Z] info HTTP server listening on http://127.0.0.1:8080
[2021-04-06T00:53:21.751Z] info - Authentication is disabled
[2021-04-06T00:53:21.751Z] info - Not serving HTTPS
Now when I try Web Preview -> Preview on port 8080, nothing happens; I just get a blank screen, and in the code-server console I see the following error:
2021-04-06T00:50:04.470Z] error vscode Handshake timed out {"token":"e9b80ff7-10f9-4089-8497-b98688129452"}
I'm not sure what I need to do here.
In the Cloud Shell editor, create a file with a .sh extension (referred to as vscode.sh below) and put these code-server installation steps in it:
export VERSION=`curl -s https://api.github.com/repos/cdr/code-server/releases/latest | grep -oP '"tag_name": "\K(.*)(?=")'`
wget https://github.com/cdr/code-server/releases/download/v3.10.2/code-server-3.10.2-linux-amd64.tar.gz
tar -xvzf code-server-3.10.2-linux-amd64.tar.gz
cd code-server-3.10.2-linux-amd64
Run the vscode.sh file from the terminal:
./vscode.sh
If you get a "permission denied" warning, run chmod +x vscode.sh and then run the file again.
To navigate to the folder:
cd code-server-3.10.2-linux-amd64/
To navigate to the bin:
cd bin/
To start the server:
./code-server --auth none --port 8080
Now you can see the VS Code IDE in your browser, either by using the Web Preview -> Preview on port 8080 option or via the HTTP server link in your terminal.
My gut says that one must study this article (Expose code-server) in great detail. I think you will find that code-server is listening on IP address 127.0.0.1 at port 8080. Your thinking then is to access this server using Web Preview on port 8080. However, pay attention to the IP addresses of your virtual machine. The IP address 127.0.0.1 is known as the loopback address; it is ONLY accessible to applications running on the SAME machine. My belief is that when you run Web Preview, you are trying to access the IP address of your Cloud Shell machine, which is NOT 127.0.0.1.
If you read the above article, the story goes on to show how to use SSH forwarding to provide a front-end to whatever this application may be.
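As a minimal sketch of that forwarding (the user and host here are placeholders, not taken from the article), everything sent to port 8080 on your local machine gets relayed to 127.0.0.1:8080 on the remote machine:
ssh -L 8080:127.0.0.1:8080 user@remote-host
With the tunnel open, browsing to http://localhost:8080 locally reaches the code-server instance bound to the loopback address on the remote side.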

how to run apache superset in dev mode on aws ec2

I have developed a plugin for Apache Superset, for which I followed this tutorial: https://preset.io/blog/2020-07-02-hello-world/
The setup in this tutorial only works for a development environment, so I need to run the backend server and frontend server separately using these commands.
for backend:
superset run -p 8088 -h 0.0.0.0 --with-threads --reload --debugger
for frontend:
npm run dev-server
In the inbound rules of the security group for my EC2 instance, I have added custom TCP rules allowing traffic on ports 9000 and 8088.
However, I am unable to reach the public DNS of the EC2 instance on port 9000;
this is not the case when I use the public DNS of the EC2 instance with port 8088.
It turns out that editing the webpack.config.js file, adding an additional parameter for the host by defining devserverHost = '0.0.0.0', and then replacing 'localhost' with ${devserverHost} solves the problem. The same change also has to be made in the webpack.proxy.config.js file.
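A roughly equivalent shortcut (a sketch, assuming the host is hard-coded as 'localhost' in both files and that you are inside the superset-frontend directory) is to substitute the literal directly and keep backups of the originals:
sed -i.bak "s/'localhost'/'0.0.0.0'/g" webpack.config.js webpack.proxy.config.js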

Ansible deployment to windows host behind bastion

I am currently successfully using Ansible to run tasks on hosts that are in a private subnet in AWS, which the below group_vars is setting up:
ansible_ssh_common_args: '-o ProxyCommand="ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -W %h:%p -q ec2-user@bastionhost.example.com"'
This is working fine.
For Windows instances not in a private subnet the following group_vars works:
---
ansible_user: "AnsibleUser"
ansible_password: "Password"
ansible_port: 5986
ansible_connection: winrm
ansible_winrm_server_cert_validation: ignore
Now, trying to get Ansible to deploy to a Windows server behind the bastion by just using the ProxyCommand won't work - which I understand.
I believe though that there is a new protocol/module I can use called psrp.
I imagine that my group_vars for my Windows hosts needs to change to something like this:
---
ansible_user: "AnsibleUser"
ansible_password: "Password"
ansible_port: 5986
ansible_connection: psrp
ansible_psrp_cert_validation: ignore
If I run with just the above changes against instances that are publicly available (and not trying to connect via a bastion), my task seems to work fine:
Using module file /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/modules/windows/win_shell.ps1
<10.100.11.14> ESTABLISH PSRP CONNECTION FOR USER: Administrator ON PORT 5986 TO 10.100.11.14
PSRP: EXEC (via pipeline wrapper)
I know there must be more changes before I can try this on a Windows server behind a bastion, but I ran it anyway to see what errors I get, to give me clues about what to do next. Here is the result when running this against an instance behind a bastion server:
Using module file /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/modules/windows/setup.ps1
<10.100.11.14> ESTABLISH PSRP CONNECTION FOR USER: Administrator ON PORT 5986 TO 10.100.11.14
The full traceback is:
.
.
.
.
ConnectTimeout: HTTPSConnectionPool(host='10.100.11.14', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x110bbfbd0>, 'Connection to 10.100.11.14 timed out. (connect timeout=30)'))
It seems like Ansible is ignoring the ProxyCommand in my group_vars, though I'm not sure whether that's expected.
I'm also not sure on what the next steps are to enable Ansible to deploy to Windows servers behind a bastion.
What config am I missing?
The docs say the ansible_ssh_common_args setting is appended to sftp, scp, and ssh commands, so it sounds normal to me that it is not taken into account when ansible_connection is winrm or psrp.
As explained in the link provided by Pouyan in the comments, the ansible_psrp_proxy variable is used to provide proxy information:
ansible_connection: psrp
ansible_psrp_proxy: socks5h://localhost:1234
More info on the creation of the socks proxy can be found on: https://www.bloggingforlogging.com/2018/10/14/windows-host-through-ssh-bastion-on-ansible/
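For the proxy itself, a minimal sketch (the bastion address and local port are assumptions) is a dynamic SSH tunnel opened from the machine running Ansible:
ssh -D 1234 -N ec2-user@bastionhost.example.com   # opens a SOCKS5 proxy on localhost:1234 and keeps it in the foreground
Leave that running (or background it) while the playbook executes; PSRP traffic to the Windows host is then routed through the bastion.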

Cloudera manager while installation , get HostName

As students we were asked to implement a Cloudera cluster using VMs. I tried to install a Cloudera cluster using 3 VMs running Ubuntu 14.01.
My steps so far:
I set up passwordless SSH connections between all nodes (tested).
Configured /etc/hosts on all my VMs.
Installed Cloudera CDH.
The problem occurs when it tries to install everything on each node (screenshot of the error not included here).
What makes me so unconfident about this is that it identifies my host names and addresses perfectly in the pre-installation step (screenshots not included here).
But when I try to install, it fails and gives me the error mentioned.
So I tried to check whether Python doesn't recognize my hostnames; I ran some code to test, and everything worked perfectly:
>>> import socket
>>> socket.gethostbyname("master1")
'10.211.55.13'
>>> socket.gethostbyname("slave1")
'10.211.55.11'
>>> socket.gethostbyname("slave2")
'10.211.55.12'
>>>
My /etc/hosts file is similar on all hosts:
127.0.0.1 localhost
#127.0.1.1 ubuntu
10.211.55.13 master1.bdsas.com master1
10.211.55.11 slave1.bdsas.com slave1
10.211.55.12 slave2.bdsas.com slave2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
CDH does not support IPv6 (see this link and search for IPv6: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_ig_req_supported_versions.html). Turn off IPv6 in your network configuration and see if that resolves the issue.
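On Ubuntu, a quick way to do that (a sketch, not from the Cloudera docs; adjust for your distribution) is via sysctl:
sudo tee /etc/sysctl.d/99-disable-ipv6.conf >/dev/null <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
sudo sysctl --system    # reload sysctl settings so the change applies without a reboot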

Can't connect to VM running Django

Using VirtualBox, I have a NAT enabled VM running Centos 7. The host OS is Windows 7. I can't seem to access the Django web server running inside the VM. What am I missing?
I have two port forwarding rules set for the Virtual Machine:
I start the Django web server on the guest OS with:
python manage.py runserver 0.0.0.0:8000
And I try to visit the webpage on the host OS at:
http://localhost:8000
Google Chrome gives me the error code ERR_CONNECTION_RESET.
The result of curl on the host OS:
[user@win7 ~ ]$ curl http://localhost:8000
curl: (56) Recv failure: Connection reset by peer
Here is the result of a netstat performed on the guest OS:
[user@vm ~ ]$ netstat -na | grep 8000
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN
Here is the result of a netstat performed on the host OS (with Cygwin):
[user@win7 ~ ]$ netstat -na | grep 8000
TCP 0.0.0.0:8000 0.0.0.0:0 LISTENING
It is also worth mentioning that the SSH rule works. I can SSH into the machine with no problems.
This is not a solution, but a workaround for my problem. Maybe this will help anyone encountering a problem similar to mine who just wants to be able to connect to their VM's web server.
Since SSH was working, I figured I could access the webpage via an SSH tunnel. The syntax for doing so via the command line is:
ssh -L <local-port>:<remote-host>:<remote-port> <user>@<ssh-server>
So in my situation, if I wanted to open a tunnel via command line I would do:
ssh -L 8000:127.0.0.1:8000 <user>@<ssh-server>
This would allow me to browse to http://localhost:8000 and access the website.
You can also do this via PuTTY, but I won't explain that here, so just Google for a guide.
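For completeness, with the VirtualBox NAT setup above the SSH endpoint is the forwarded port on the host, so the full command would look something like this (the host port 2222 is an assumption; use whatever your SSH forwarding rule maps to the guest's port 22, and pick a different local port if 8000 is already taken by the VirtualBox forwarding rule):
ssh -L 8000:127.0.0.1:8000 -p 2222 user@localhost
Then browse to http://localhost:8000 on the host as before.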
The SSH tunnel is an OK workaround, but the problem is almost certainly CentOS 7, which now uses firewalld rather than iptables to manage access. And, unlike iptables, the default configuration is quite restrictive.
If
ps -ae | grep firewall
returns something like
602 ? 00:00:00 firewalld
your system is running firewalld, not iptables. They do not run together.
To correct your VM so you can access your Django site from the host, use the commands:
firewall-cmd --zone=public --add-port=8000/tcp --permanent
firewall-cmd --reload
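To confirm the rule is active (a quick check), list the ports that are now open in the public zone:
firewall-cmd --zone=public --list-ports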
Many thanks to pablo v in the post "Access django server on virtual Machine" for pointing this out.