I'm new to DevOps and I'm learning Ansible as a beginner, using AWS EC2 with Ubuntu 16.04 LTS.
Initially I launched two EC2 instances with SSH port 22 open in the security group, and named them Master and Slave.
Once everything was up and running, I SSH'ed into the Master instance.
I will list the steps one by one as follows:
1. I created a user called ansible and set a password:
ubuntu@ip-172-31-17-94:~$ sudo su
root@ip-172-31-17-94:/home/ubuntu# adduser ansible
Adding user `ansible' ...
Adding new group `ansible' (1001) ...
Adding new user `ansible' (1001) with group `ansible' ...
Creating home directory `/home/ansible' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for ansible
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n]
2. Uncommented PermitRootLogin yes and PasswordAuthentication yes in /etc/ssh/sshd_config and restarted SSH.
3. Edited the sudoers file with visudo to give the ansible user root access:
root@ip-172-31-17-94:/home/ubuntu# visudo
# User privilege specification
ansible ALL=(ALL:ALL) ALL
Saved and closed.
4. Generated an SSH key pair:
ansible@ip-172-31-17-94:~$ ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ansible/.ssh/id_rsa):
Created directory '/home/ansible/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ansible/.ssh/id_rsa.
Your public key has been saved in /home/ansible/.ssh/
The key fingerprint is:
SHA256:wah0yD9Ngf7hzLIihkEFvNYjPNrzcLubNxGnqFKYrik ansible@ip-172-31-17-94
5. Installed the Ansible packages:
$ sudo apt-get install software-properties-common
$ sudo apt-add-repository ppa:ansible/ansible
$ sudo apt-get update
$ sudo apt-get install ansible
Awesome! Everything was up and Ansible was installed on the Master server.
I ran a command to test Ansible:
ansible@ip-172-31-17-94:~$ ansible --version
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides
python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]
6. Edited /etc/ansible/hosts and added my Slave server's private IP (my EC2 instances are in the same subnet in the same availability zone, so I used the private IP). Saved and closed.
7. SSH'ed into the Slave server, repeated steps 1, 2 and 3, and logged out.
8. SSH'ed into the Master server:
local@host $ ssh ansible@<Master Ip>
9. Copied the public key from the Master server to the Slave server:
ansible@ip-172-31-17-94:~$ ssh-copy-id
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/ansible/.ssh/"
The authenticity of host ' (' can't be established.
ECDSA key fingerprint is SHA256:qOW0ZktetcpTNmxRsubxn1kcr8egyNmcA5Uk9+oWc7A.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
ansible#'s password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh ''"
and check to make sure that only the key(s) you wanted were added
ansible@ip-172-31-17-94:~$ ssh
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-1013-aws x86_64)
* Documentation:
* Management:
* Support:
Get cloud support with Ubuntu Advantage Cloud Guest:
14 packages can be updated.
12 updates are security updates.
Last login: Sat Apr 22 06:27:15 2017 from
ansible@ip-172-31-29-197:~$ logout
Connection to closed.
And so I successfully configured password-less SSH from Master to Slave.
Up to this point I hadn't faced any issues.
But when I ran the command ansible -m ping all
I got this error:
| FAILED! => {
"changed": false,
"failed": true,
"module_stderr": "Shared connection to closed.\r\n",
"module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n",
"rc": 0
}
Later, after googling, I found some pieces of a solution and followed the steps they listed.
The solution was to repeat step 5 from the list above on the Slave server. After that, running ansible -m ping from the Master server returned a success message.
My question is how to set up the Slave without installing Ansible on it, since the main feature of Ansible is that it is agent-less!
Please tell me if I missed any step.
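You need to install Python 2 on the managed (Slave) machine. Ansible itself is agent-less, but most of its modules, including ping, are executed by Python on the managed node; repeating step 5 on the Slave helped because installing the ansible package pulled in Python 2 as a dependency.
Or (in a less likely case), if Python is installed at a path other than /usr/bin/python, add the ansible_python_interpreter parameter to your inventory file, pointing to the right executable.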
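A minimal sketch of both options as Python helpers that only build the command line and the inventory text (nothing is executed here). It relies on Ansible's raw module, which sends a plain command over SSH and therefore works before Python exists on the target. The package name python-minimal is the Ubuntu 16.04 one; the helper names, the host address and the interpreter path are hypothetical placeholders.

```python
import shlex


def bootstrap_python_cmd(pattern="all", ask_become_pass=True):
    """Build an ad-hoc ansible command that installs Python 2 on the targets.

    The raw module runs the command over plain SSH, so it works even before
    /usr/bin/python exists. python-minimal is the Ubuntu 16.04 package that
    provides /usr/bin/python.
    """
    cmd = [
        "ansible", pattern,
        "-m", "raw",
        "-a", "apt-get update && apt-get install -y python-minimal",
        "--become",
    ]
    if ask_become_pass:
        # The ansible user's sudoers entry (ALL=(ALL:ALL) ALL) requires a password.
        cmd.append("--ask-become-pass")
    return cmd


def inventory_line(host, interpreter):
    """Render an INI inventory line that tells Ansible which Python to use."""
    return f"{host} ansible_python_interpreter={interpreter}"


if __name__ == "__main__":
    print(shlex.join(bootstrap_python_cmd()))
    # Hypothetical private IP and interpreter path for the Slave.
    print(inventory_line("172.31.0.10", "/usr/bin/python3"))
```

Running the printed command once from the Master should install Python on every inventory host, after which ansible -m ping all ought to succeed without Ansible being installed on the Slave.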
I followed the steps in the below links to set up my GCP dynamic inventory.
In short, it was the below steps
I installed the needed requisites.
$ pip install requests google-auth
I created a service account with sufficient privileges and set its
I added the following to the /etc/ansible/ansible.cfg file:
[inventory]
enable_plugins = gcp_compute
I created a file called hosts.gcp.yml, which holds the dynamic inventory setup (as shown below):
plugin: gcp_compute
projects:
  - my-project-id
hostnames:
  - name
filters: []
auth_kind: serviceaccount
service_account_file: my/credentials_path.json
keyed_groups:
  - key: zone
Then I ran the following command, which worked fine:
macbook@MacBooks-MacBook-Pro Ansible % ansible-inventory --graph -i hosts.gcp.yml
| |--test
But when running the following command, I got this error:
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -m ping
test | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname test: nodename nor servname provided, or not known",
"unreachable": true
}
I then commented out the - name option in the hosts.gcp.yml file, but got a different error:
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -m ping
34.X.X.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: macbook@34.X.X.8: Permission denied (publickey).",
"unreachable": true
}
This raises the following questions:
1- Is an SSH setup (creating users and copying SSH keys) needed on the host machines when using dynamic inventories? (I don't think so.)
2- Why is Ansible resorting to SSH even though a dynamic inventory is set? What if the host didn't expose SSH to the public or didn't have a public IP?
Your kind support is highly appreciated.
Here is a more verbose output of the test:
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -vvv -m ping
ansible [core 2.11.6]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/Users/macbook/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/4.7.0/libexec/lib/python3.9/site-packages/ansible
ansible collection location = /Users/macbook/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.9.7 (default, Oct 13 2021, 06:45:31) [Clang 13.0.0 (clang-1300.0.29.3)]
jinja version = 3.0.2
libyaml = True
Using /etc/ansible/ansible.cfg as config file
redirecting (type: inventory) ansible.builtin.gcp_compute to
Parsed /Users/macbook/xxxx/Projects/xxxx/Ansible/hosts.gcp.yml inventory source with plugin
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
META: ran handlers
<> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/Users/macbook/.ansible/cp/026bb454d7 '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''
<34.X.X.8> (255, b'', b'macbook@34.X.X.8: Permission denied (publickey).\r\n')
34.X.X.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: macbook@34.X.X.8: Permission denied (publickey).",
"unreachable": true
}
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -u ansible -vvv -m ping
ansible [core 2.11.6]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/Users/macbook/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/4.7.0/libexec/lib/python3.9/site-packages/ansible
ansible collection location = /Users/macbook/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.9.7 (default, Oct 13 2021, 06:45:31) [Clang 13.0.0 (clang-1300.0.29.3)]
jinja version = 3.0.2
libyaml = True
Using /etc/ansible/ansible.cfg as config file
redirecting (type: inventory) ansible.builtin.gcp_compute to
Parsed /Users/macbook/xxxx/Projects/xxx/Ansible/hosts.gcp.yml inventory source with plugin
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
META: ran handlers
<> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ansible"' -o ConnectTimeout=10 -o ControlPath=/Users/macbook/.ansible/cp/46d2477dfb '/bin/sh -c '"'"'echo ~ansible && sleep 0'"'"''
<34.X.X.8> (255, b'', b'ansible@34.X.X.8: Permission denied (publickey).\r\n')
34.X.X.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ansible@34.X.X.8: Permission denied (publickey).",
"unreachable": true
}
A dynamic inventory is only used to collect data about your machines. To actually access them, Ansible still uses SSH.
You must add your SSH public key to the VM's configuration and specify the username.
Add these lines to the [defaults] section of your ansible.cfg:
host_key_checking = false
remote_user = <username that you specify in VM's config>
private_key_file = <path to private ssh-key>
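If the control machine is provisioned by a script, the same [defaults] settings can be generated rather than edited by hand. A minimal sketch using Python's configparser; the function name, the user name and the key path are hypothetical values standing in for the placeholders above.

```python
import configparser
import io


def render_ansible_cfg(remote_user, private_key_file):
    """Return ansible.cfg text containing the [defaults] settings above."""
    cfg = configparser.ConfigParser()
    cfg["defaults"] = {
        # Skip the interactive "authenticity of host" prompt on first connect.
        "host_key_checking": "false",
        # The user whose public key was added to the VM's configuration.
        "remote_user": remote_user,
        # Private half of that key on the control machine.
        "private_key_file": private_key_file,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_ansible_cfg("ansible", "~/.ssh/id_rsa"))
```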
Most probably Ansible can't establish an SSH connection to the hosts listed in hosts.gcp.yml because they don't recognize the SSH key of the machine that tries to ping them.
Since you're using a MacBook, it's clearly not a GCP VM. This means your GCP VMs don't have its public SSH key by default.
You can add your MacBook's key (found in ~/.ssh/) to the project-wide list of authorized SSH keys, which all GCP VMs in the project will accept without any action on your side.
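Project-wide keys live in the ssh-keys metadata value, one USERNAME:KEY line per key. A minimal sketch of building that value; the helper names are hypothetical, and the username should match the user Ansible connects as (by default your local user, macbook in the -vvv output above).

```python
OPENSSH_KEY_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-")


def gcp_ssh_keys_entry(username, public_key):
    """Format one line of GCE 'ssh-keys' metadata: USERNAME:KEY_VALUE.

    On images running the Google guest environment, the username becomes
    the login account on the VMs, so it should match Ansible's remote user.
    """
    key = public_key.strip()
    if not key.startswith(OPENSSH_KEY_PREFIXES):
        raise ValueError("expected a single OpenSSH public key line")
    return f"{username}:{key}"


def merge_ssh_keys(existing_value, new_entry):
    """Append new_entry to an existing ssh-keys value, skipping duplicates."""
    lines = [line for line in existing_value.splitlines() if line.strip()]
    if new_entry not in lines:
        lines.append(new_entry)
    return "\n".join(lines) + "\n"
```

The merged value can then be written to a file and uploaded, for example with gcloud compute project-info add-metadata --metadata-from-file=ssh-keys=FILE. That command replaces the existing ssh-keys value rather than appending to it, which is why any existing entries are merged in first.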
As for the first question, it's clearly a DNS issue. I'm not versed enough in this tool, so you'd have to check whether you can ping all the VMs by their DNS names directly from your Mac's terminal. If you can, the issue is in the Ansible configuration; otherwise it's a DNS issue that prevents your computer from resolving your VMs' names.
Additionally, ansible-inventory --graph -i /file/path does not connect to the hosts: it only shows the structure of your inventory, regardless of whether the hosts exist or are reachable.
There are a couple of points in your question, one about inventory and one about connections.
Your hosts.gcp.yml file is for a dynamic inventory plugin, as you said. What that means is that Ansible will run the GCP inventory plugin using the settings in that file, and the plugin will call GCP's API and generate a list of hosts to use as inventory. What the ansible-inventory command returns is what the ansible command will use also. In the example bit of output you pasted into your question, it looks like "test" is the only host it sees.
When you run the ansible command it will run the module against each host. It will first get the hostname returned by inventory, and then connect to that host using the transport type you specified. This is true even for the ping module. From the ping module's doc page: "This is NOT ICMP ping, this is just a trivial test module that requires Python on the remote-node." Meaning, it makes a connection.
Potential Gotchas
Is inventory returning the correct hostname for your environment?
What is the connection type you're using?
As for hostname, you set "hostnames" to "name" in your inventory file. Just be sure that's right. It might not be in your case.
As for connection type, if you haven't configured it, then by default it will be "smart", which uses SSH. You can find what you're using by doing this:
ansible-config dump | grep DEFAULT_TRANSPORT
You can change the connection type with the --connection option to the ansible command, or any of the other ways ansible lets you specify config options. Connection type is set independently from inventory type. They are two separate steps. The connection type is set via config or the command line option and is not based on what inventory plugin you're using.
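To check the effective connection type from a script instead of grepping, the dump can be parsed. A minimal sketch, assuming the NAME(origin) = value line shape that ansible-config dump prints; the function names are hypothetical, and current_transport() needs ansible-config on PATH.

```python
import re
import subprocess

# Assumed line shape of `ansible-config dump`: NAME(origin) = value
_DUMP_LINE = re.compile(r"^(?P<name>[A-Z0-9_]+)\((?P<origin>[^)]*)\)\s*=\s*(?P<value>.*)$")


def parse_config_dump(text):
    """Map each setting name to (origin, value) from `ansible-config dump` output."""
    settings = {}
    for line in text.splitlines():
        m = _DUMP_LINE.match(line.strip())
        if m:
            settings[m["name"]] = (m["origin"], m["value"])
    return settings


def current_transport():
    """Run ansible-config dump and return the effective connection type."""
    out = subprocess.run(["ansible-config", "dump"], capture_output=True,
                         text=True, check=True).stdout
    return parse_config_dump(out)["DEFAULT_TRANSPORT"][1]
```

The origin part shows where a value came from (default, a config file path, or an environment variable), which helps when the transport is not what you expect.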
Your Problem
To resolve your problem, figure out what hostnames ansible-inventory is actually returning, and what connection type you're using. Then see if you can connect to that hostname using that connection type. If the hostname being returned is "test" and your connection type is "smart" or "ssh", then try actually connecting with ssh to "test". From the command line, literally do ssh test. If that succeeds, then ansible should successfully connect to that host when it's run. If that doesn't succeed, then you have to do whatever you need to do to fix it in order for ansible to run successfully. Likewise, if you set a connection plugin different from SSH, then you should try to connect to your host using whatever that connection method uses in order to ensure that those types of connections are actually working.
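The check described above can be scripted. A minimal sketch, assuming ansible-inventory and ssh are on PATH; the helper names are hypothetical, and the user argument stands in for whatever remote user you configure. It reads the --list JSON, works out the address Ansible would actually connect to, and then tries a non-interactive ssh login to each one.

```python
import json
import subprocess


def inventory_targets(inventory_json):
    """Map each inventory host to the address Ansible will connect to.

    Ansible uses the ansible_host variable when it is set and otherwise the
    inventory name itself, which is why a name like "test" has to resolve.
    """
    data = json.loads(inventory_json)
    hostvars = data.get("_meta", {}).get("hostvars", {})
    names = set(hostvars)
    for group_name, group in data.items():
        if group_name != "_meta" and isinstance(group, dict):
            names.update(group.get("hosts", []))
    return {name: hostvars.get(name, {}).get("ansible_host", name)
            for name in sorted(names)}


def ssh_probe_cmd(address, user=None):
    """Build an ssh command that only checks whether a non-interactive login works."""
    target = f"{user}@{address}" if user else address
    # BatchMode=yes fails instead of prompting, similar to the
    # PasswordAuthentication=no that Ansible passes in the -vvv output.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", target, "true"]


def probe_inventory(inventory_path, user=None):
    """Return {host: True/False} for every host the inventory plugin reports."""
    listing = subprocess.run(
        ["ansible-inventory", "-i", inventory_path, "--list"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {
        name: subprocess.run(ssh_probe_cmd(addr, user)).returncode == 0
        for name, addr in inventory_targets(listing).items()
    }
```

For example, probe_inventory("hosts.gcp.yml", user="ansible") returning False for a host means plain ssh already fails there, so Ansible will fail for the same reason.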
More info about all this can be found in ansible's user guide. See, for example, "Connecting to remote nodes".
Given this Dockerfile:
FROM fedora:30
RUN dnf upgrade -y \
&& dnf install -y \
openssh-clients \
openvpn \
slirp4netns \
&& dnf clean all
CMD ["openvpn", "--config", "/vpn/ovpn.config", "--auth-user-pass", "/vpn/ovpn.auth"]
Building the image with:
podman build -t peque/vpn .
If I try to run it with (note $(pwd), where the VPN configuration and credentials are stored):
podman run -v $(pwd):/vpn:Z --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
I get the following error:
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Any ideas on how I could fix this? I would not mind changing the base image if that could help (i.e. to Alpine or anything else, as long as it allows me to use openvpn for the connection).
System information
Using Podman 1.4.4 (rootless) and Fedora 30 distribution with kernel 5.1.19.
/dev/net/tun permissions
Running the container with:
podman run -v $(pwd):/vpn:Z --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
Then, from the container, I can:
# ls -l /dev/ | grep net
drwxr-xr-x. 2 root root 60 Jul 23 07:31 net
I can also list /dev/net, but I get a "permission denied" error:
# ls -l /dev/net
ls: cannot access '/dev/net/tun': Permission denied
total 0
-????????? ? ? ? ? ? tun
Trying --privileged
If I try with --privileged:
podman run -v $(pwd):/vpn:Z --privileged --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
Then instead of the permission-denied error (errno=13), I get a no-such-file-or-directory error (errno=2):
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
I can effectively verify there is no /dev/net/ directory when using --privileged, even if I pass the --cap-add=NET_ADMIN --device=/dev/net/tun parameters.
Verbose log
This is the log I get when configuring the client with verb 3:
OpenVPN 2.4.7 x86_64-redhat-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
library versions: OpenSSL 1.1.1c FIPS 28 May 2019, LZO 2.08
Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
TCP/UDP: Preserving recently used remote address: [AF_INET]xx.xx.xx.xx:1194
Socket Buffers: R=[212992->212992] S=[212992->212992]
UDP link local (bound): [AF_INET][undef]:0
UDP link remote: [AF_INET]xx.xx.xx.xx:1194
TLS: Initial packet from [AF_INET]xx.xx.xx.xx:1194, sid=3ebc16fc 8cb6d6b1
WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
VERIFY OK: depth=1, C=ES, ST=XXX, L=XXX, O=XXXXX,, CN=internal-ca
Validating certificate extended key usage
++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
VERIFY OK: depth=0, C=ES, ST=XXX, L=XXX, O=XXXXX,, CN=ovpn.server.address
Control Channel: TLSv1.2, cipher TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA
[ovpn.server.address] Peer Connection Initiated with [AF_INET]xx.xx.xx.xx:1194
SENT CONTROL [ovpn.server.address]: 'PUSH_REQUEST' (status=1)
PUSH: Received control message: 'PUSH_REPLY,route xx.xx.xx.xx,route xx.xx.xx.0,dhcp-option DOMAIN,dhcp-option DNS xx.xx.xx.254,dhcp-option DNS xx.xx.xx.1,dhcp-option DNS xx.xx.xx.1,route-gateway xx.xx.xx.1,topology subnet,ping 10,ping-restart 60,ifconfig xx.xx.xx.24,peer-id 1'
OPTIONS IMPORT: timers and/or timeouts modified
OPTIONS IMPORT: --ifconfig/up options modified
OPTIONS IMPORT: route options modified
OPTIONS IMPORT: route-related options modified
OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
OPTIONS IMPORT: peer-id set
OPTIONS IMPORT: adjusting link_mtu to 1624
Outgoing Data Channel: Cipher 'AES-128-CBC' initialized with 128 bit key
Outgoing Data Channel: Using 160 bit message hash 'SHA1' for HMAC authentication
Incoming Data Channel: Cipher 'AES-128-CBC' initialized with 128 bit key
Incoming Data Channel: Using 160 bit message hash 'SHA1' for HMAC authentication
ROUTE_GATEWAY xx.xx.xx.xx/ IFACE=tap0 HWADDR=0a:38:ba:e6:4b:5f
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Exiting due to fatal error
The error number changes depending on whether I run the command with --privileged or not.
It turns out that you are blocked by SELinux: after running the client container and trying to access /dev/net/tun inside it, you will get the following AVC denial in the audit log:
type=AVC msg=audit(1563869264.270:833): avc: denied { getattr } for pid=11429 comm="ls" path="/dev/net/tun" dev="devtmpfs" ino=15236 scontext=system_u:system_r:container_t:s0:c502,c803 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=0
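When several denials pile up, it helps to pull out the fields that matter for a policy rule: the permissions, the source and target types, and the object class. A minimal regex-based sketch with a hypothetical function name; it only handles single-line avc: denied records like the one above, and ausearch/audit2allow remain the proper tools for real policy work.

```python
import re

_AVC = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)


def parse_avc(line):
    """Extract permissions, source/target SELinux types and class from an AVC denial."""
    m = _AVC.search(line)
    if not m:
        return None
    return {
        "perms": m["perms"].split(),
        # Contexts look like user:role:type:level; the type is the third field.
        "source_type": m["scontext"].split(":")[2],
        "target_type": m["tcontext"].split(":")[2],
        "tclass": m["tclass"],
    }
```

For the denial above this yields source type container_t, target type tun_tap_device_t and class chr_file, which is exactly the combination the container policy has to allow.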
To allow your container to configure the tunnel while staying not fully privileged and with SELinux enforcing, you need to customize the SELinux policy a bit. However, I did not find an easy way to do this properly.
Luckily, there is a tool called udica, which can generate SELinux policies from container configurations. It does not provide the desired policy on its own and requires some manual intervention, so I will describe how I got the openvpn container working step-by-step.
First, install the required tools:
$ sudo dnf install policycoreutils-python-utils policycoreutils udica
Create the container with required privileges, then generate the policy for this container:
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --name ovpn peque/vpn
$ podman inspect ovpn | sudo udica -j - ovpn_container
Policy ovpn_container created!
Please load these modules using:
# semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
Restart the container with: "--security-opt label=type:ovpn_container.process" parameter
Here is the policy which was generated by udica:
$ cat ovpn_container.cil
(block ovpn_container
(blockinherit container)
(allow process process ( capability ( chown dac_override fsetid fowner mknod net_raw setgid setuid setfcap setpcap net_bind_service sys_chroot kill audit_write net_admin )))
(allow process default_t ( dir ( open read getattr lock search ioctl add_name remove_name write )))
(allow process default_t ( file ( getattr read write append ioctl lock map open create )))
(allow process default_t ( sock_file ( getattr read write append open )))
Let's try this policy (note the --security-opt option, which tells podman to run the container in the newly created domain):
$ sudo semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Ugh. Here is the problem: the policy generated by udica still does not know about the specific requirements of our container, as they are not reflected in its configuration (well, it is probably possible to infer that you want to allow operations on tun_tap_device_t from the fact that you requested --device /dev/net/tun, but...). So we need to customize the policy by extending it with a few more statements.
Let's temporarily switch SELinux to permissive mode, in which denials are logged but not enforced, and run the container to collect the expected denials:
$ sudo setenforce 0
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
These are:
$ sudo grep denied /var/log/audit/audit.log
type=AVC msg=audit(1563889218.937:839): avc: denied { read write } for pid=3272 comm="openvpn" name="tun" dev="devtmpfs" ino=15178 scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.937:840): avc: denied { open } for pid=3272 comm="openvpn" path="/dev/net/tun" dev="devtmpfs" ino=15178 scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.937:841): avc: denied { ioctl } for pid=3272 comm="openvpn" path="/dev/net/tun" dev="devtmpfs" ino=15178 ioctlcmd=0x54ca scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.947:842): avc: denied { nlmsg_write } for pid=3273 comm="ip" scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tclass=netlink_route_socket permissive=1
Or more human-readable:
$ sudo grep denied /var/log/audit/audit.log | audit2allow
#============= ovpn_container.process ==============
allow ovpn_container.process self:netlink_route_socket nlmsg_write;
allow ovpn_container.process tun_tap_device_t:chr_file { ioctl open read write };
OK, let's modify the udica-generated policy by adding the advised allow rules to it (note that here I manually translated the syntax to CIL):
(block ovpn_container
(blockinherit container)
(allow process process ( capability ( chown dac_override fsetid fowner mknod net_raw setgid setuid setfcap setpcap net_bind_service sys_chroot kill audit_write net_admin )))
(allow process default_t ( dir ( open read getattr lock search ioctl add_name remove_name write )))
(allow process default_t ( file ( getattr read write append ioctl lock map open create )))
(allow process default_t ( sock_file ( getattr read write append open )))
; This is our new stuff.
(allow process tun_tap_device_t ( chr_file ( ioctl open read write )))
(allow process self ( netlink_route_socket ( nlmsg_write )))
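The manual translation from audit2allow output to CIL is mechanical enough to script. A minimal sketch with a hypothetical function name that handles only the two rule shapes shown above (a single permission, or a { ... } permission set). It rewrites the source domain to process, as in the udica-generated block, keeps self as-is, and is not a general TE-to-CIL converter.

```python
import re

# Matches audit2allow rules such as:
#   allow ovpn_container.process tun_tap_device_t:chr_file { ioctl open read write };
#   allow ovpn_container.process self:netlink_route_socket nlmsg_write;
_RULE = re.compile(
    r"^allow\s+(?P<src>\S+)\s+(?P<tgt>[^\s:]+):(?P<cls>\S+)\s+"
    r"(?:\{\s*(?P<perms>[^}]*?)\s*\}|(?P<perm>[^\s;]+))\s*;$"
)


def te_to_cil(rule, domain):
    """Translate one audit2allow 'allow' rule into a CIL statement for a udica block.

    Inside (block <name> ...), the container domain is referred to as
    `process`, so the rule's source type must be that domain.
    """
    m = _RULE.match(rule.strip())
    if not m:
        raise ValueError(f"not an allow rule: {rule!r}")
    if m["src"] != domain:
        raise ValueError(f"rule is for {m['src']}, not {domain}")
    perms = m["perms"] if m["perms"] is not None else m["perm"]
    return f"(allow process {m['tgt']} ( {m['cls']} ( {' '.join(perms.split())} )))"
```

Feeding it the two audit2allow lines with domain="ovpn_container.process" reproduces the two statements added under "This is our new stuff."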
Now we switch SELinux back to enforcing mode, reload the module and check that the container works correctly when we specify our custom domain:
$ sudo setenforce 1
$ sudo semodule -r ovpn_container
$ sudo semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
Initialization Sequence Completed
Finally, check that other containers still don't have these privileges:
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z peque/vpn
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Yay! SELinux stays on, and only our specific container is allowed to configure the tunnel.
I have two Google Compute Engine instances and I've installed MPICH on both.
When I try to run a sample, I get Host key verification failed.
Detailed version:
I followed this tutorial to get this done:
I have two Google Compute Engine VMs with Ubuntu 14.04 (the Google Cloud account is a trial one, by the way). I downloaded this version of MPICH on both instances:
/mpich-3.3rc1.tar.gz and I installed it using these steps:
./configure --disable-fortran
sudo make
sudo make install
This is what the /etc/hosts file looks like on the master node:
localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts metadata client master linux
1 # Added by Google # Added by Google
And this is what the /etc/hosts file looks like on the client node:
localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts metadata master client linux
2 # Added by Google # Added by Google
The remaining steps involved adding a user named mpiuser on both nodes, configuring passwordless SSH authentication between the nodes, and configuring a shared cloud directory between them.
The configuration worked up to this point. I downloaded this file to /home/mpiuser/cloud/mpi_sample.c and compiled it like this:
mpicc -o mpi_sample mpi_sample.c
and ran this command on the master node while logged in as mpiuser:
mpirun -np 2 -hosts client,master ./mpi_sample
and I got this error:
Host key verification failed.
What's wrong? I've been trying to troubleshoot this problem for more than two days, but I can't find a working solution.
in ".gcloudignore file".
And deploy it again.
It turned out that my passwordless SSH wasn't configured properly. I created two new instances and did the following to get working passwordless SSH, and thus a working version of that sample. The following steps were executed on Ubuntu Server 18.04.
First, by default, instances on Google Cloud have the PasswordAuthentication setting turned off. On the client server, run:
sudo vim /etc/ssh/sshd_config
and change PasswordAuthentication no to PasswordAuthentication yes. Then
sudo systemctl restart ssh
Generate an SSH key on the master server with:
ssh-keygen -t rsa -b 4096 -C ""
Copy the generated SSH key from the master server to the client:
ssh-copy-id client
Now you have fully functional passwordless SSH from master to client. However, MPICH still failed.
The additional step I took was to append the public key to the ~/.ssh/authorized_keys file on both master and client. So run this command on both servers:
sudo cat .ssh/ >> .ssh/authorized_keys
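A minimal sketch of that step in Python, assuming it runs as the login user on each node (files in your own home directory need no sudo). Unlike a bare >> append, it skips a key that is already present and sets the permissions sshd expects; the function name is hypothetical, and the key text would come from the public-key file elided in the command above.

```python
import os
from pathlib import Path


def authorize_key(public_key, ssh_dir=None):
    """Append a public key to authorized_keys unless it is already present.

    With its default StrictModes, sshd ignores authorized_keys when the file
    or ~/.ssh is writable by others, so permissions are tightened as well.
    """
    ssh_dir = Path(ssh_dir) if ssh_dir else Path.home() / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    auth = ssh_dir / "authorized_keys"
    key = public_key.strip()
    text = auth.read_text() if auth.exists() else ""
    if key not in {line.strip() for line in text.splitlines()}:
        with auth.open("a") as fh:
            if text and not text.endswith("\n"):
                fh.write("\n")  # avoid gluing the new key onto the last line
            fh.write(key + "\n")
    os.chmod(ssh_dir, 0o700)
    os.chmod(auth, 0o600)
    return auth
```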
Then make sure the /etc/ssh/sshd_config files on both the client and the master have the following settings:
PasswordAuthentication no
ChallengeResponseAuthentication no
UsePAM no
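Editing these options by hand on several machines is error-prone. A minimal sketch of an idempotent edit with a hypothetical function name. Since sshd uses the first value it finds for each keyword, it rewrites the first occurrence (commented or not) in place and appends the option only if it is missing. It assumes the file does not end with a Match block, because options appended after a Match line would apply only to that block.

```python
import re


def set_sshd_option(config_text, key, value):
    """Set `key value` in sshd_config text.

    The first occurrence (even a commented-out one) is replaced in place;
    if the keyword does not appear at all, it is appended at the end.
    """
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}\s+\S.*$",
                         re.IGNORECASE | re.MULTILINE)
    line = f"{key} {value}"
    new_text, n = pattern.subn(lambda _m: line, config_text, count=1)
    if n == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += line + "\n"
    return new_text
```

After writing the result back (as root), running sudo sshd -t before restarting lets sshd validate the file, so a typo does not lock you out.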
Restart the SSH service on both the client and the master:
sudo systemctl restart ssh
And that's it, MPICH works smoothly now.
I have several Salt states (base and pillars) already written and stored in Amazon S3. I want to reuse these states instead of writing them again. I want to create an AMI with Packer and apply the Salt states downloaded from S3 to the Packer builder EC2 instance. Besides salt-minion, I also installed the salt-master service on the CentOS 7 machine, and started both salt-minion and salt-master with the following commands:
cat > /etc/salt/minion.d/minion_id.conf <<'EOT'
id: ${host} # id salt-minion id
EOT
# Generate the name of the master to connect to
cat > /etc/salt/minion.d/master_name.conf <<'EOT'
master: localhost
EOT
systemctl enable salt-minion
systemctl start salt-minion
systemctl enable salt-master
systemctl start salt-master
When I run the following command, it doesn't list any minions:
salt-key -L
Accepted Keys:
Denied Keys:
Unaccepted Keys:
Rejected Keys:
So salt 'localhost-*' state.sls state.high_state
fails with these errors:
"No minions matched the target. No command was sent, no jid was assigned.
ERROR: No return received"
This is because no minion key shows up in salt-key.
Does anybody have an idea why salt-key doesn't show the salt-minion, and how I can resolve this so that the existing Salt states downloaded from S3 run successfully while building the AMI?
What could be happening is that your minions can't find (resolve via DNS) the master, which by default is the host named salt.
What you could do is add the IP of your master to your minions' /etc/salt/minion, something like this:
master: <master IP>  # Replace with the IP of your master
Then restart your minion and check the master again for key requests.
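To check from a script whether the restarted minion has reached the master, the salt-key -L listing can be parsed. A minimal sketch with hypothetical function names; it assumes the four headings shown in the question, and pending_minions() must run as root, like salt-key itself. If the minion's ID appears under Unaccepted Keys, accept it with salt-key -a <id> (or salt-key -A for all pending keys).

```python
import subprocess

_SECTIONS = ("Accepted Keys:", "Denied Keys:", "Unaccepted Keys:", "Rejected Keys:")


def parse_salt_key_list(text):
    """Group minion IDs under the headings printed by `salt-key -L`."""
    result = {heading.rstrip(":"): [] for heading in _SECTIONS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in _SECTIONS:
            current = line.rstrip(":")
        elif line and current:
            result[current].append(line)
    return result


def pending_minions():
    """Return minion IDs that have contacted the master but are not yet accepted."""
    out = subprocess.run(["salt-key", "-L"], capture_output=True,
                         text=True, check=True).stdout
    return parse_salt_key_list(out)["Unaccepted Keys"]
```

An empty result under every heading, as in the question, means the minion has not contacted the master at all, which points at the master address rather than key acceptance.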
I have a Django site that I would like to deploy to a DigitalOcean server every time a branch is merged to master. I have it mostly working and have followed this tutorial.
language: python
- '2.7'
depth: false
- openssl aes-256-cbc -K *removed encryption details* -in travis_rsa.enc -out travis_rsa -d
- chmod 600 travis_rsa
- pip install -r backend/requirements.txt
- pip install -q Django==$DJANGO_VERSION
- cp backend/local.env backend/.env
script: python test
skip_cleanup: true
provider: script
script: "./"
all_branches: true
# runs when the travis 'deploy' task calls it
# print outputs and exit on first failure
set -xe
if [ "$TRAVIS_BRANCH" = "master" ]; then
# setup ssh agent, git config and remote
echo -e "Host\n\tStrictHostKeyChecking no\n" >> ~/.ssh/config
eval "$(ssh-agent -s)"
ssh-add travis_rsa
git remote add deploy ""
git config "Travis CI"
git config ""
git add .
git status # debug
git commit -m "Deploy compressed files"
git push -f deploy HEAD:master
echo "Git Push Done"
ssh -i travis_rsa -o UserKnownHostsFile=/dev/null 'cd /home/dean/se_dockets/backend; echo hello; ./'
else
echo "No deploy script for branch '$TRAVIS_BRANCH'"
fi
Everything works fine until the 'deploy' stage. I keep getting error messages like:
The ECDSA host key for has changed,
and the key for the corresponding IP address *REDACTED FOR STACK OVERFLOW*
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
Please contact your system administrator.
Add correct host key in /home/travis/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/travis/.ssh/known_hosts:11
remove with: ssh-keygen -f "/home/travis/.ssh/known_hosts" -R
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Permission denied (publickey,password).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Script failed with status 128
INTERESTINGLY, if I re-run this job, the git push command succeeds in pushing to the deploy remote (my server). However, the next step in the deploy script, which SSHes into the server and runs some post-update commands, fails for the same reason (host fingerprint change or something). Or it asks for a password (there is none) and hangs at the input prompt.
Additionally, when I debug the Travis CI build and use the SSH URL provided to SSH into the machine Travis CI runs on, I can SSH into my own server from it. However, it takes multiple tries to get past the errors.
So this seems to be a fluid problem, with state persisting from one build into the next on retries and causing different errors and outcomes.
As you can see in my .yml file and the deploy script, I have attempted to disable various host key checks and added the domain to known hosts, etc., all to no avail.
I know I have things 99% set up correctly, as things do mostly succeed when I retry the job a few times.
Anyone seen this before?