I am using openshift-ansible (https://github.com/openshift/openshift-ansible), partially customized for our needs. The part launching the instances was modified to set the group_id; nothing else was changed in it.
When creating an OpenShift master all works fine. However, when creating 2 OpenShift nodes I can see the 2 instances being created in the "Running instances" panel of the EC2 Dashboard. The instances stay in the "Initializing" state for a few seconds and then automatically switch to "Shutting down".
Ansible, on its side, was still stuck in the task launching the instances. So my question is:
Is there a way to analyze the logs of AWS instances while new instances are being created?
Log of the last ansible task:
TASK: [Launch instance(s)] ****************************************************
REMOTE_MODULE ec2 region=eu-west-1 keypair=ggkey1-eu-west state=present instance_type=m3.large user_data='#cloud-config mounts: - [ xvdb ] - [ ephemeral0 ] write_files: - content: | DEVS=/dev/xvdb VG=docker_vg path: /etc/sysconfig/docker-storage-setup owner: root:root permissions: '"'"'0644'"'"' ' vpc_subnet_id=subnet-60cf1205 image=ami-33ba2a44 count=2
EXEC ['/bin/sh', '-c', 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1441977401.88-262307796372076 && echo $HOME/.ansible/tmp/ansible-tmp-1441977401.88-262307796372076']
PUT /tmp/tmp4r8qve TO /root/.ansible/tmp/ansible-tmp-1441977401.88-262307796372076/ec2
EXEC ['/bin/sh', '-c', u'LANG=C LC_CTYPE=C /usr/bin/env python2 /root/.ansible/tmp/ansible-tmp-1441977401.88-262307796372076/ec2; rm -rf /root/.ansible/tmp/ansible-tmp-1441977401.88-262307796372076/ >/dev/null 2>&1']
failed: [localhost] => {"failed": true}
msg: wait for instances running timeout on Fri Sep 11 13:21:43 2015
$ ansible --version
ansible 1.9.2
configured module search path = None
$ uname -a
Linux ip-172-31-42-45 3.10.0-123.8.1.el7.x86_64 #1 SMP Mon Sep 22 19:06:58 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@ip-172-31-42-45 : ~/uha-rbox-spawner$
Thanks,
Is there a way to analyze the logs of AWS instances while new instances are being created?
You are looking for "get console output". You can see it in the AWS (http) console, or you can fetch it from awscli or the API of your choice.
"get console output" is slightly confusing since the AWS Console is also a "console". Think of it as "system logs" (as the Console does), or simply "what would show on a screen in a datacenter".
Configuration
I followed the steps in the below links to set up my GCP dynamic inventory.
https://docs.ansible.com/ansible/latest/scenario_guides/guide_gce.html
http://matthieure.me/2018/12/31/ansible_inventory_plugin.html
In short, these were the steps:
I installed the needed prerequisites.
$ pip install requests google-auth
I created a service account with sufficient privileges and set its credentials.
I added the below to the /etc/ansible/ansible.cfg file
[inventory]
enable_plugins = gcp_compute
I created a file called hosts.gcp.yml which holds the dynamic inventory setup (as shown below):
projects:
- my-project-id
hostnames:
- name
filters: []
auth_kind: serviceaccount
service_account_file: my/credentials_path.json
keyed_groups:
- key: zone
and tried to run the below command, which worked fine:
macbook@MacBooks-MacBook-Pro Ansible % ansible-inventory --graph -i hosts.gcp.yml
@all:
  |--@_us_central1_a:
  |  |--test
  |--@ungrouped:
but when running the below command I got the following errors
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -m ping
test | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname test: nodename nor servname provided, or not known",
"unreachable": true
}
I then commented out the - name option from the hosts.gcp.yml file but got another error.
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -m ping
34.X.X.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: macbook#34.X.X.8: Permission denied (publickey).",
"unreachable": true
}
This raises the following questions
1- Is an SSH setup (creating users and copying SSH keys) needed on the host machines when using dynamic inventories (I don't think so)?
2- Why is Ansible resorting to SSH even though a dynamic inventory is set? What if the host doesn't expose SSH to the public or doesn't have a public IP?
Your kind support is highly appreciated.
Thanks.
A more verbose output of the test
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -vvv -m ping
ansible [core 2.11.6]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/Users/macbook/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/4.7.0/libexec/lib/python3.9/site-packages/ansible
ansible collection location = /Users/macbook/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.9.7 (default, Oct 13 2021, 06:45:31) [Clang 13.0.0 (clang-1300.0.29.3)]
jinja version = 3.0.2
libyaml = True
Using /etc/ansible/ansible.cfg as config file
redirecting (type: inventory) ansible.builtin.gcp_compute to google.cloud.gcp_compute
Parsed /Users/macbook/xxxx/Projects/xxxx/Ansible/hosts.gcp.yml inventory source with ansible_collections.google.cloud.plugins.inventory.gcp_compute plugin
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
META: ran handlers
<34.132.201.8> ESTABLISH SSH CONNECTION FOR USER: None
<34.132.201.8> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/Users/macbook/.ansible/cp/026bb454d7 34.132.201.8 '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''
<34.X.X.8> (255, b'', b'macbook@34.X.X.8: Permission denied (publickey).\r\n')
34.X.X.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: macbook#34.X.X.8: Permission denied (publickey).",
"unreachable": true
}
macbook@MacBooks-MacBook-Pro Ansible % ansible -i hosts.gcp.yml all -u ansible -vvv -m ping
ansible [core 2.11.6]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/Users/macbook/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/Cellar/ansible/4.7.0/libexec/lib/python3.9/site-packages/ansible
ansible collection location = /Users/macbook/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.9.7 (default, Oct 13 2021, 06:45:31) [Clang 13.0.0 (clang-1300.0.29.3)]
jinja version = 3.0.2
libyaml = True
Using /etc/ansible/ansible.cfg as config file
redirecting (type: inventory) ansible.builtin.gcp_compute to google.cloud.gcp_compute
Parsed /Users/macbook/xxxx/Projects/xxx/Ansible/hosts.gcp.yml inventory source with ansible_collections.google.cloud.plugins.inventory.gcp_compute plugin
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
META: ran handlers
<34.132.201.8> ESTABLISH SSH CONNECTION FOR USER: ansible
<34.132.201.8> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ansible"' -o ConnectTimeout=10 -o ControlPath=/Users/macbook/.ansible/cp/46d2477dfb 34.132.201.8 '/bin/sh -c '"'"'echo ~ansible && sleep 0'"'"''
<34.X.X.8> (255, b'', b'ansible@34.X.X.8: Permission denied (publickey).\r\n')
34.X.X.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ansible#34.X.X.8: Permission denied (publickey).",
"unreachable": true
}
A dynamic inventory is used only to collect data about your machines. If you want to access them, you still need SSH.
You must add your SSH public key to the VM's config and specify a username.
Add these lines to your ansible.cfg in the [defaults] section:
host_key_checking = false
remote_user = <username that you specified in the VM's config>
private_key_file = <path to the private ssh-key>
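For reference, the resulting /etc/ansible/ansible.cfg could look something like this (a sketch; the user name ansible and the key path are placeholders for whatever you actually configured on the VMs):

[defaults]
host_key_checking = false
# placeholder: the user whose public key is installed on the VMs
remote_user = ansible
# placeholder: the matching private key on the control machine
private_key_file = ~/.ssh/google_compute_engine

[inventory]
enable_plugins = gcp_compute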
Most probably Ansible can't establish an SSH connection to the hosts (listed in hosts.gcp.yml) because they don't recognize the SSH key of the machine that tries to ping them.
Since you're using a MacBook, it's clearly not a GCP VM. This means your GCP VMs don't have its public SSH key by default.
You can add your MacBook's key (found in ~/.ssh/id_rsa.pub) to the list of authorized keys that all GCP VMs will accept without any further action on your side.
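One way to do that is through the project-wide ssh-keys metadata (a sketch, assuming gcloud is authenticated against the same project; the username macbook is a placeholder, and note that this replaces the current ssh-keys metadata value, so merge in any keys you already rely on before running it):

# the format of each entry is USERNAME:PUBLIC_KEY, one per line
echo "macbook:$(cat ~/.ssh/id_rsa.pub)" > /tmp/gcp-ssh-keys
# sets the project-wide ssh-keys metadata from that file (replacing its previous value)
gcloud compute project-info add-metadata --metadata-from-file ssh-keys=/tmp/gcp-ssh-keys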
As for the first question - it's clearly a DNS issue - however, I'm not versed enough in this tool, so you'd have to check whether you can ping all the VMs using their DNS names directly from your Mac's terminal. If so, the issue lies in the Ansible configuration - otherwise it's a DNS issue that prevents your computer from resolving the DNS names of your VMs.
Additionally - ansible-inventory --graph -i /file/path works "offline" and will only show the structure of your inventory, regardless of whether it exists or works.
There are a couple of points in your question, one about inventory and one about connections.
Inventory
Your hosts.gcp.yml file is for a dynamic inventory plugin, as you said. What that means is that Ansible will run the GCP inventory plugin using the settings in that file, and the plugin will call GCP's API and generate a list of hosts to use as inventory. What the ansible-inventory command returns is what the ansible command will use also. In the example bit of output you pasted into your question, it looks like "test" is the only host it sees.
Connections
When you run the ansible command it will run the module against each host. It will first get the hostname returned by inventory, and then connect to that host using the transport type you specified. This is true even for the ping module. From the ping module's doc page: "This is NOT ICMP ping, this is just a trivial test module that requires Python on the remote-node." Meaning, it makes a connection.
Potential Gotchas
Is inventory returning the correct hostname for your environment?
What is the connection type you're using?
As for the hostname, you set "hostnames" to "name" in your inventory file. Just be sure that's right; it might not be in your case - for example, you could have the plugin return public IPs instead, as sketched below.
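A hosts.gcp.yml variant that uses public IPs as inventory hostnames could look like this (a sketch; it also spells out the plugin key the gcp_compute plugin expects, and whether public_ip is the right choice depends on how you reach the VMs):

plugin: gcp_compute
projects:
  - my-project-id
auth_kind: serviceaccount
service_account_file: my/credentials_path.json
hostnames:
  # or private_ip, if you connect over an internal network
  - public_ip
keyed_groups:
  - key: zone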
As for connection type, if you haven't configured it, then by default it will be "smart", which uses SSH. You can find what you're using by doing this:
ansible-config dump | grep DEFAULT_TRANSPORT
You can change the connection type with the --connection option to the ansible command, or any of the other ways ansible lets you specify config options. Connection type is set independently from inventory type. They are two separate steps. The connection type is set via config or the command line option and is not based on what inventory plugin you're using.
Your Problem
To resolve your problem, figure out what hostnames ansible-inventory is actually returning, and what connection type you're using. Then see if you can connect to that hostname using that connection type. If the hostname being returned is "test" and your connection type is "smart" or "ssh", then try actually connecting with ssh to "test". From the command line, literally do ssh test. If that succeeds, then ansible should successfully connect to that host when it's run. If that doesn't succeed, then you have to do whatever you need to do to fix it in order for ansible to run successfully. Likewise, if you set a connection plugin different from SSH, then you should try to connect to your host using whatever that connection method uses in order to ensure that those types of connections are actually working.
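Concretely, the checks could look something like this (a sketch; the remote user ansible and the key path are placeholders for whatever your VMs are configured with):

# what hostnames does the inventory actually return?
ansible-inventory -i hosts.gcp.yml --list
# what transport will ansible use?
ansible-config dump | grep DEFAULT_TRANSPORT
# can you reach the host with plain ssh under the same name/IP?
ssh test
# then retry ansible, being explicit about the user and key
ansible -i hosts.gcp.yml all -m ping -u ansible --private-key ~/.ssh/google_compute_engine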
More info about all this can be found in ansible's user guide. See, for example, "Connecting to remote nodes".
I'm attempting to create an SSH tunnel when deploying an application to AWS Beanstalk. I want to put the tunnel as a background process that is always connected on application deploy. The script is hanging forever during the deployment and I can't see why.
"/home/ec2-user/eclair-ssh-tunnel.sh":
mode: "000500" # u+rx
owner: root
group: root
content: |
cd /root
eval $(ssh-agent -s)
DISPLAY=":0.0" SSH_ASKPASS="./askpass_script" ssh-add eclair-test-key </dev/null
# we want this command to keep running in the background
# so we add & at the end
nohup ssh -L 48682:localhost:8080 ubuntu@[host...] -N &
and here is the output I'm getting from /var/log/eb-activity.log:
[2019-06-14T14:53:23.268Z] INFO [15615] - [Application update suredbits-api-root-0.37.0-testnet-ssh-tunnel-fix-port-9#30/AppDeployStage1/AppDeployPostHook/01_eclair-ssh-tunnel.sh] : Starting activity...
The ssh tunnel is spawned, and I can find it by doing:
[ec2-user#ip-172-31-25-154 ~]$ ps aux | grep 48682
root 16047 0.0 0.0 175560 6704 ? S 14:53 0:00 ssh -L 48682:localhost:8080 ubuntu@ec2-34-221-186-19.us-west-2.compute.amazonaws.com -N
If I kill that process, the deployment continues as expected, which indicates that the bug is in the tunnel script. I can't seem to find out where though.
You need to add the -n option to ssh when running it in the background, to stop it from reading from stdin.
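Applied to the script above, the last line becomes:

# -n redirects stdin from /dev/null, so the backgrounded ssh no longer blocks the deploy hook
nohup ssh -n -L 48682:localhost:8080 ubuntu@[host...] -N &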
I'm trying to deploy a Django application on Elastic Beanstalk. It had been working fine, then it suddenly stopped, and I cannot figure out why.
When I do eb deploy I get
INFO: Environment update is starting.
INFO: Deploying new version to instance(s).
INFO: New application version was deployed to running EC2 instances.
INFO: Environment update completed successfully.
Alert: An update to the EB CLI is available. Run "pip install --upgrade awsebcli" to get the latest version.
INFO: Attempting to open port 22.
INFO: SSH port 22 open.
INFO: Running ssh -i /home/ubuntu/.ssh/web-cdi_011017.pem ec2-user@54.188.214.227 if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' /etc/httpd/conf.d/wsgi.conf ; then echo -e 'WSGIApplicationGroup %{GLOBAL}' | sudo tee -a /etc/httpd/conf.d/wsgi.conf; fi;
INFO: Attempting to open port 22.
INFO: SSH port 22 open.
INFO: Running ssh -i /home/ubuntu/.ssh/web-cdi_011017.pem ec2-user@54.188.214.227 sudo /etc/init.d/httpd reload
Reloading httpd: [ OK ]
When I then run eb health, I get
Incorrect application version found on all instances. Expected version
"app-c56a-190604_135423" (deployment 300).
If I eb ssh and look in /opt/python/current, there is nothing there, so nothing is being copied across.
I think something may be wrong with .elasticbeanstalk/config.yml. Somehow the directory was deleted and set up again. This is the config.yml:
branch-defaults:
master:
environment: app-prod
scoring-dev:
environment: app-dev
environment-defaults:
app-prod:
branch: null
repository: null
global:
application_name: my-app
default_ec2_keyname: am-app_011017
default_platform: arn:aws:elasticbeanstalk:us-west-2::platform/Python 2.7 running on 64bit Amazon Linux/2.3.1
default_region: us-west-2
include_git_submodules: true
instance_profile: null
platform_name: null
platform_version: null
profile: null
sc: git
workspace_type: Application
Please, any ideas about how to troubleshoot?
I upgraded to the latest AWS stack for python 2.7 and that sorted it
I faced the same problem, and the cause was the command timeout.
The default max deployment time (Command timeout) is 600 seconds (10 minutes).
Your Environment → Configuration → Deployment preferences → Command timeout
Increase the command timeout, for example to 1800 seconds, or upgrade the instance type so the deployment runs faster.
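If you prefer to keep that setting in source control, the same option can be set via .ebextensions (a sketch; the file name timeout.config is just a placeholder):

# .ebextensions/timeout.config
option_settings:
  aws:elasticbeanstalk:command:
    Timeout: 1800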
I'm using Ansible: 2.2.0.0
I have 3 machines:
Two Vagrant boxes (one CentOS 7.x and one Ubuntu 14.04), and
a 3rd box that is an EC2 Amazon Linux instance (Amazon Linux AMI release 2016.03).
On these boxes, I'm running the following command and getting valid output (as listed below):
CentOS:
[vagrant@myvagrant ~] $ ansible all -m setup -i "`hostname`," --connection=local -a "filter=ansible_distribution*"
myvagrant | SUCCESS => {
"ansible_facts": {
"ansible_distribution": "CentOS",
"ansible_distribution_major_version": "7",
"ansible_distribution_release": "Core",
"ansible_distribution_version": "7.2.1511"
},
"changed": false
}
Ubuntu:
vagrant@myubuntuvagrant:~$ ansible all -m setup -i "`hostname`," --connection=local -a "filter=ansible_distribution*"
myubuntuvagrant | SUCCESS => {
"ansible_facts": {
"ansible_distribution": "Ubuntu",
"ansible_distribution_major_version": "14",
"ansible_distribution_release": "trusty",
"ansible_distribution_version": "14.04"
},
"changed": false
}
vagrant@myubuntuvagrant:~$
Amazon EC2 instance/box:
$ ansible all -m setup -i "`hostname`," --connection=local -a "filter=ansible_distribution*"
ip-10-200-1-145 | SUCCESS => {
"ansible_facts": {
"ansible_distribution": "Amazon",
"ansible_distribution_major_version": "NA",
"ansible_distribution_release": "NA",
"ansible_distribution_version": "2016.03"
},
"changed": false
}
In one of my Ansible playbooks, in a templates/yum.repos.d.file.j2 file, I'm using the {{ ansible_distribution_major_version }} variable and using its value in a .repo file for the baseurl property, for the CentOS/Amazon EC2 instances only, i.e. when: ansible_distribution == "CentOS" or ansible_distribution == "Amazon".
baseurl=https://packagecloud.io/company/packages/telegraf/el/6/$basearch
PS: I'm not looking for Ubuntu (as that part is working fine with using apt-get in my playbook for both setting the apt-get source list and installing the package).
My question:
Why is the Ansible fact variable not setting any valid value for ansible_distribution_major_version on the Amazon EC2 instance? Which facter_*/ansible_* variable can I use that will work on all 3 OS types?
PS: When I used a baseurl value with ../el/6/.. in it (inside the yum.repos.d/target-package.amazon-os.repo file), yum install worked fine for installing the package on the Amazon Linux box (though using ../el/7/.. in the baseurl didn't work). See here for more details: https://packagecloud.io/docs#os_distro_version (under the heading: Enterprise Linux (CentOS, RedHat, Amazon Linux))
If you use the following set_fact, you don't have to specifically handle ansible_distribution_major_version in tasks for all three OS types.
pre_tasks:
  - set_fact: ansible_distribution_major_version=6
    when: ansible_distribution == "Amazon" and ansible_distribution_major_version == "NA"
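With that fact normalized, the .repo template can use it unconditionally. A minimal sketch of the Jinja2 template (the repo id telegraf is a placeholder; the packagecloud path is the one from the question's baseurl):

[telegraf]
name=Telegraf
baseurl=https://packagecloud.io/company/packages/telegraf/el/{{ ansible_distribution_major_version }}/$basearch
enabled=1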
Here are some relevant values from an Amazon Linux 2 Docker image:
# docker images|grep amazon
amazonlinux 2 d656eea421ba 4 weeks ago 162MB
amazonlinux latest d656eea421ba 4 weeks ago 162MB
# docker run --init --rm -it amazonlinux:2 cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
After installing ansible within the docker image:
# ansible --version
ansible 2.7.2
# Relevant items from output of 'ansible -m setup'
ansible_distribution: Amazon
ansible_distribution_file_parsed: true
ansible_distribution_file_path: /etc/system-release
ansible_distribution_file_variety: Amazon
ansible_distribution_major_version: NA
ansible_distribution_release: NA
ansible_distribution_version: 2
ansible_os_family: RedHat
ansible_pkg_mgr: yum
ansible_service_mgr: sysvinit # should this be 'systemd'?
ansible_system_vendor: NA
ansible_virtualization_role: guest
ansible_virtualization_type: docker
and on an actual EC2 Amazon Linux 2 instance:
[root#ip-xxx ~]# ansible --version
ansible 2.7.2
ansible_os_family: RedHat
ansible_pkg_mgr: yum
ansible_service_mgr: systemd
ansible_system: Linux
ansible_system_vendor: Xen
ansible_virtualization_role: guest
ansible_virtualization_type: xen
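Given those fact values, one way to branch on Amazon Linux 2 versus Amazon Linux 1 without relying on ansible_distribution_major_version is to test ansible_distribution_version directly (a sketch; the debug task is a placeholder):

- name: Placeholder task that should only run on Amazon Linux 2
  debug:
    msg: "Running on Amazon Linux 2"
  when:
    - ansible_distribution == "Amazon"
    # Amazon Linux 1 reports date-style versions such as "2016.03"
    - ansible_distribution_version == "2"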
I'm trying to deploy my application with docker and elastic beanstalk. My Dockerrun.aws.json file looks like
{
"AWSEBDockerrunVersion": "1",
"Image": {
"Name": "jvans/maven_weekly",
"Update": "true"
},
"Ports": [
{
"ContainerPort": "5000"
}],
"Volumes": [
{
"HostDirectory": "/Users/jamesvanneman/Code/maven_weekly/maven_weekly",
"ContainerDirectory": "/maven_weekly"
}
],
"Logging": "/var/log/nginx"
}
I created this application with eb create and when I run eb deploy I get
Docker container quit unexpectedly after launch: Docker container quit
unexpectedly on Mon Sep 21 01:15:12 UTC 2015:. Check snapshot logs for details.
Hook /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
In /var/log/eb-activity.log I see the following errors:
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Mon Sep 21 01:08:52 UTC 2015:. Check snapshot logs for details. (ElasticBeanstalk::ExternalInvocationError)
caused by: 83ea9b7f9a069eeb8351fef7aaedb8374f7dfe300a5e0aaeba0fe17600583175
[2015-09-21T01:08:52.205Z] INFO [2246] - [Application deployment/StartupStage1/AppDeployEnactHook/00run.sh] : Activity failed.
So it seems like there's a problem with a startup script. If I ssh into the instance and try to run it manually, I don't really get any extra help from the error messages.
eb ssh
sudo /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Mon Sep 21 01:34:52 UTC 2015:. Check snapshot logs for details.
Msg: Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Mon Sep 21 01:34:52 UTC 2015:. Check snapshot logs for details.
Are snapshot logs something different from what's in /var/log/eb-activity.log? Any idea what is going on / how I can debug this further?
Docker dumps are stored on the host box at /var/log/eb-docker/containers/.
Go there and you'll find the Docker startup crash log, which should indicate the root cause of your problem.
You want to look at
/var/log/eb-docker/containers/eb-current-app/unexpected-quit.log
in the bundle downloaded by eb logs --all or using eb ssh. This log file will have the stdout and stderr of your application before it crashed.
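In practice that might look like this (a sketch; paths can differ slightly between platform versions):

# download the full log bundle locally
eb logs --all
# or inspect the file directly on the instance
eb ssh
sudo cat /var/log/eb-docker/containers/eb-current-app/unexpected-quit.log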
In my case, unexpected-quit.log was empty and the other log files in /var/log/ didn't help.
So I found out why the container was crashing by manually running the image on the server.
[ec2-user#ip-xxx-xxx-xxx-xxx ~]$ nano Dockerfile # or clone the entire project
[ec2-user#ip-xxx-xxx-xxx-xxx ~]$ sudo docker build -t test .
[ec2-user#ip-xxx-xxx-xxx-xxx ~]$ sudo docker run --rm test
Exception in thread "main" …
I fixed that exception, redeployed a new version and now the container starts without issues.
In my case I did an eb ssh to connect to the EC2 instance and looked at my server logs. It was an application error, nothing to do with AWS. Fixed it, tried again, and it's working!