I've taken over maintenance of a live web project that uses Docker containers. I immediately noticed that the web app goes down after a couple of hours, and docker ps -a shows me:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9b02f1352f15 nginx:latest "nginx -g 'daemon off" 9 weeks ago Exited (1) 14 hours ago 80/tcp, 443/tcp, 0.0.0.0:80->8000/tcp ng01
8079b3d3b398 webapp_web "gunicorn --error-log" 9 weeks ago Exited (1) 14 hours ago 8000/tcp webapp_web_1
564fe0b72fa6 d0f5f9c3d3a6 "/bin/sh -c 'apt-get " 12 weeks ago Exited (0) 12 weeks ago modest_perlman
6cddbfcfa8f6 d0f5f9c3d3a6 "/bin/sh -c 'apt-get " 12 weeks ago Exited (0) 12 weeks ago backstabbing_goldwasser
7460be4f4451 postgres "/docker-entrypoint.s" 4 months ago Exited (1) 14 hours ago 5432/tcp webapp_db_1
Notice the three containers that exited 14 hours ago; those belong to the web app. How do I diagnose and fix this problem? I'm a beginner and struggling here. Thanks in advance! Below are some diagnostics I have tried.
I used docker logs on the errant containers to see what could be going wrong.
docker logs 9b02f1352f15 (nginx) is empty.
docker logs 8079b3d3b398 (application server - gunicorn) shows many instances of:
Exception in thread Thread-725265:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/site-packages/unirest/__init__.py", line 97, in __request
response = urllib2.urlopen(req, timeout=_timeout)
File "/usr/local/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/usr/local/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/usr/local/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/local/lib/python2.7/site-packages/poster/streaminghttp.py", line 142, in http_open
return self.do_open(StreamingHTTPConnection, req)
File "/usr/local/lib/python2.7/urllib2.py", line 1198, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 101] Network is unreachable>
docker logs 7460be4f4451 (postgresql backend) shows many instances of:
LOG: database system was interrupted; last known up at 2017-01-22 12:42:46 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: invalid record length at 0/17AAD28
LOG: redo is not required
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
In case it matters, running tail -f /var/run/upstart/docker.log gives the following output:
INFO[0000] Firewalld running: false
time="2017-01-23T14:38:47.142345718Z" level=error msg="devmapper: Error unmounting device 3da5c7e87cc8969249d7ed8b15c9cea9296feaefeba62fde534b4c183e4edbd4: Device is Busy"
time="2017-01-23T14:38:47.142542018Z" level=error msg="Error unmounting container 8079b3d3b3988a793537c4116bd12c70823415cb84068021d25658a316d8f568: Device is Busy"
INFO[0000] Firewalld running: false
INFO[0000] Firewalld running: false
time="2017-01-23T14:40:20.694963580Z" level=error msg="devmapper: Error unmounting device 6fd51632808dede3fea81b4f19fb84f1a13d93c38917e2845d0776de6a2ef941: Device is Busy"
time="2017-01-23T14:40:20.695010680Z" level=error msg="Error unmounting container 9b02f1352f15447acb7669bff918db0eeed58dc832fff565f4b2a4236474db1f: Device is Busy"
INFO[0000] Firewalld running: false
time="2017-01-23T14:41:27.307003457Z" level=error msg="devmapper: Error unmounting device 7c5a35dc9fc1929e57de43f5efc8e05a9442782c7496713d313c95fe62910f7b: Device is Busy"
time="2017-01-23T14:41:27.307059457Z" level=error msg="Error unmounting container 7460be4f445102274dd4aba4f113db23c17f58b9f771fc5c474d174766ae593c: Device is Busy"
I also ran docker inspect on all three. Below are the State sections of the results; nothing there looks obviously wrong to me.
docker inspect 9b02f1352f15(nginx):
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 1,
"Error": "",
"StartedAt": "2017-01-22T12:42:47.142236155Z",
"FinishedAt": "2017-01-22T23:38:46.7038628Z"
},
docker inspect 8079b3d3b398 (application server - gunicorn):
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 1,
"Error": "",
"StartedAt": "2017-01-22T12:42:42.602662338Z",
"FinishedAt": "2017-01-22T23:38:46.5945186Z"
},
docker inspect 7460be4f4451 (postgresql backend):
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 1,
"Error": "",
"StartedAt": "2017-01-22T12:42:34.413283342Z",
"FinishedAt": "2017-01-22T23:38:46.5102334Z"
},
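One thing that stands out from the inspect output is that all three FinishedAt timestamps fall within the same second (23:38:46), so my next step is to check whether something happened on the host itself at that moment. A rough sketch of what I plan to run (the log paths are the ones on this Ubuntu/upstart host and may differ elsewhere):
# look for daemon-level errors or restarts in the Docker log around that time
grep -iE 'error|restart|shutdown' /var/run/upstart/docker.log | tail -n 50
# check whether the kernel OOM killer or another host-level event fired then
dmesg -T | grep -iE 'killed process|out of memory' | tail -n 20
grep -iE 'oom|docker' /var/log/syslog | tail -n 50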
docker-compose.yml is simply:
version: '2'
services:
db:
image: postgres
web:
build: .
command: gunicorn --error-logfile err.log myapp.wsgi:application -b 0.0.0.0:8000
volumes:
- .:/code
expose:
- "8000"
depends_on:
- db
nginx:
image: nginx:latest
container_name: ng01
ports:
- "80:8000"
volumes:
- .:/src
- ./config/nginx:/etc/nginx/conf.d
depends_on:
- web
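Independent of the root cause, I'm also considering adding a restart policy so that Compose brings the services back up automatically after a crash. A minimal sketch of the additions, assuming Compose file version 2 syntax and that these lines are merged into the existing services above:
services:
  db:
    restart: always    # restart postgres if it exits
  web:
    restart: always    # restart gunicorn if it exits
  nginx:
    restart: always    # restart nginx if it exits
That wouldn't fix whatever is crashing the containers, but it should at least bring the site back up until I find the root cause.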
Related
I have three Docker containers: my_django_app, rabbitmq, and celery_worker. I run them on my local system using the following docker-compose.yml:
version: '3'
services:
web: &my_django_app
build: .
command: python3 manage.py runserver 0.0.0.0:8000
ports:
- "80:8000"
depends_on:
- rabbitmq
rabbitmq:
image: rabbitmq:latest
celery_worker:
<<: *my_django_app
command: celery -A MyDjangoApp worker --autoscale=10,1 --loglevel=info
ports: []
depends_on:
- rabbitmq
When I run this on my local system, it works perfectly fine. I then deployed these images to AWS Elastic Beanstalk (multi-container environment) using the following Dockerrun.aws.json:
{
"AWSEBDockerrunVersion": 2,
"Authentication": {
"Bucket": "cred-keeper",
"Key": "index.docker.io/.dockercfg"
},
"containerDefinitions": [{
"Authentication": {
"Bucket": "cred-keeper",
"Key": "index.docker.io/.dockercfg"
},
"command": [
"celery",
"-A",
"MyDjangoApp",
"worker",
"--autoscale=10,1",
"--loglevel=info"
],
"essential": true,
"image": "myName/my_django_app:latest",
"name": "celery_worker",
"memory": 150
},
{
"essential": true,
"image": "rabbitmq:latest",
"name": "rabbitmq",
"memory": 256,
},
{
"Authentication": {
"Bucket": "cred-keeper",
"Key": "index.docker.io/.dockercfg"
},
"command": [
"python3",
"manage.py",
"runserver",
"0.0.0.0:8000"
],
"essential": true,
"image": "myName/my_django_app:latest",
"memory": 256,
"name": "web",
"portMappings": [{
"containerPort": 8000,
"hostPort": 80
}]
}
],
"family": "",
"volumes": []
}
I checked the logs for the three containers by downloading them from AWS Elastic Beanstalk; the web and rabbitmq containers are working just fine, but celery_worker shows logs like:
[2020-06-30 20:17:22,885: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@rabbitmq:5672//: failed to resolve broker hostname.
Trying again in 2.00 seconds... (1/100)
[2020-06-30 20:17:24,898: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@rabbitmq:5672//: failed to resolve broker hostname.
Trying again in 4.00 seconds... (2/100)
[2020-06-30 20:17:28,914: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@rabbitmq:5672//: failed to resolve broker hostname.
Trying again in 6.00 seconds... (3/100)
.
.
.
[2020-06-30 20:16:45,662: CRITICAL/MainProcess] Unrecoverable error: OperationalError('failed to resolve broker hostname')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/amqp/transport.py", line 137, in _connect
host, port, family, socket.SOCK_STREAM, SOL_TCP)
File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 439, in _reraise_as_library_errors
yield
File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 430, in ensure_connection
callback, timeout=timeout)
File "/usr/local/lib/python3.7/site-packages/kombu/utils/functional.py", line 344, in retry_over_time
return fun(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 283, in connect
return self.connection
File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 839, in connection
self._connection = self._establish_connection()
File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 794, in _establish_connection
conn = self.transport.establish_connection()
File "/usr/local/lib/python3.7/site-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
conn.connect()
File "/usr/local/lib/python3.7/site-packages/amqp/connection.py", line 311, in connect
self.transport.connect()
File "/usr/local/lib/python3.7/site-packages/amqp/transport.py", line 77, in connect
self._connect(self.host, self.port, self.connect_timeout)
File "/usr/local/lib/python3.7/site-packages/amqp/transport.py", line 148, in _connect
"failed to resolve broker hostname"))
OSError: failed to resolve broker hostname
My CELERY_BROKER_URL in my Django settings is "amqp://rabbitmq". Also, my celery.py is as follows:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'Speeve.settings')
app = Celery('MyDjangoApp')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
@app.task(bind=True)
def debug_task(self):
print('Request: {0!r}'.format(self.request))
What do I need to do in order for my celery container to work properly on AWS Elastic Beanstalk? Please help!
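One thing I'm considering trying, since a multi-container Dockerrun.aws.json is essentially an ECS task definition, is adding explicit container links so that the rabbitmq hostname resolves inside the other containers. A sketch of what the celery_worker definition might look like with that change (I haven't confirmed this is the whole fix):
{
    "Authentication": {
        "Bucket": "cred-keeper",
        "Key": "index.docker.io/.dockercfg"
    },
    "command": ["celery", "-A", "MyDjangoApp", "worker", "--autoscale=10,1", "--loglevel=info"],
    "essential": true,
    "image": "myName/my_django_app:latest",
    "name": "celery_worker",
    "memory": 150,
    "links": ["rabbitmq"]
}
Presumably the web container definition would need the same "links": ["rabbitmq"] entry so that Django can reach the broker as well.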
I want to build an AMI with Packer and Ansible.
I have tried many configurations, but I still have a problem connecting to the instance.
Here is my packer conf:
{
"variables": {
"aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
"aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
"region": "us-east-1"
},
"builders": [
{
"type": "amazon-ebs",
"access_key": "{{ user `aws_access_key` }}",
"secret_key": "{{ user `aws_secret_key` }}",
"region": "{{ user `region` }}",
"instance_type": "t2.micro",
"source_ami_filter": {
"filters": {
"virtualization-type": "hvm",
"name": "*Windows_Server-2012-R2*English-64Bit-Base*",
"root-device-type": "ebs"
},
"most_recent": true,
"owners": "amazon"
},
"ami_name": "packer-demo-{{timestamp}}",
"user_data_file": "userdata/windows-aws.txt",
"communicator": "winrm",
"winrm_username": "Administrator"
}],
"provisioners": [{
"type": "powershell",
"inline": [
"dir c:\\"
]
},
{
"type": "ansible",
"playbook_file": "./win-playbook.yml",
"extra_arguments": [
"--connection", "packer", "-vvv",
"--extra-vars", "ansible_shell_type=powershell ansible_shell_executable=None"
]
}]
}
The user data script activates WinRM on the AWS instance:
<powershell>
winrm quickconfig -q
winrm set winrm/config/winrs '@{MaxMemoryPerShellMB="300"}'
winrm set winrm/config '@{MaxTimeoutms="1800000"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'
netsh advfirewall firewall add rule name="WinRM 5985" protocol=TCP dir=in localport=5985 action=allow
netsh advfirewall firewall add rule name="WinRM 5986" protocol=TCP dir=in localport=5986 action=allow
net stop winrm
sc config winrm start=auto
net start winrm
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope LocalMachine
</powershell>
Here is the win-playbook.yml file:
---
- hosts: all
tasks:
- win_ping:
I do have packer.py installed in the ~/.ansible/plugins/connection_plugins/ directory and the connection plugin path configured in ansible.cfg:
root@ip-172-31-30-11:~/demo# grep connection_plugins /etc/ansible/ansible.cfg
connection_plugins = /root/.ansible/plugins/connection_plugins
root@ip-172-31-30-11:~/demo# ll /root/.ansible/plugins/connection_plugins
total 16
drwx------ 2 root root 4096 May 2 16:58 ./
drwx------ 4 root root 4096 May 2 17:11 ../
-rwx--x--x 1 root root 511 May 2 16:53 packer.py*
and this is the resulting error output:
==> amazon-ebs: Provisioning with Ansible...
==> amazon-ebs: Executing Ansible: ansible-playbook --extra-vars packer_build_name=amazon-ebs packer_builder_type=amazon-ebs -i /tmp/packer-provisioner-ansible962278842 /root/demo/win-playbook.yml -e ansible_ssh_private_key_file=/tmp/ansible-key842946567 --connection packer -vvv --extra-vars ansible_shell_type=powershell ansible_shell_executable=None
amazon-ebs: ansible-playbook 2.5.2
amazon-ebs: config file = /etc/ansible/ansible.cfg
amazon-ebs: configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
amazon-ebs: ansible python module location = /usr/lib/python2.7/dist-packages/ansible
amazon-ebs: executable location = /usr/bin/ansible-playbook
amazon-ebs: python version = 2.7.12 (default, Dec 4 2017, 14:50:18) [GCC 5.4.0 20160609]
amazon-ebs: Using /etc/ansible/ansible.cfg as config file
amazon-ebs: Parsed /tmp/packer-provisioner-ansible962278842 inventory source with ini plugin
amazon-ebs:
amazon-ebs: PLAYBOOK: win-playbook.yml *****************************************************
amazon-ebs: 1 plays in /root/demo/win-playbook.yml
amazon-ebs:
amazon-ebs: PLAY [all] *********************************************************************
amazon-ebs:
amazon-ebs: TASK [Gathering Facts] *********************************************************
amazon-ebs: task path: /root/demo/win-playbook.yml:2
amazon-ebs: Using module file /usr/lib/python2.7/dist-packages/ansible/modules/windows/setup.ps1
amazon-ebs: <127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: root
amazon-ebs: The full traceback is:
amazon-ebs: Traceback (most recent call last):
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 138, in run
amazon-ebs: res = self._execute()
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 558, in _execute
amazon-ebs: result = self._handler.run(task_vars=variables)
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/plugins/action/normal.py", line 46, in run
amazon-ebs: result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/plugins/action/__init__.py", line 705, in _execute_module
amazon-ebs: self._make_tmp_path()
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/plugins/action/__init__.py", line 251, in _make_tmp_path
amazon-ebs: result = self._low_level_execute_command(cmd, sudoable=False)
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/plugins/action/__init__.py", line 902, in _low_level_execute_command
amazon-ebs: rc, stdout, stderr = self._connection.exec_command(cmd, in_data=in_data, sudoable=sudoable)
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/plugins/connection/ssh.py", line 976, in exec_command
amazon-ebs: use_tty = self.get_option('use_tty')
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/plugins/__init__.py", line 58, in get_option
amazon-ebs: option_value = C.config.get_config_value(option, plugin_type=get_plugin_class(self), plugin_name=self._load_name, variables=hostvars)
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/config/manager.py", line 284, in get_config_value
amazon-ebs: value, _drop = self.get_config_value_and_origin(config, cfile=cfile, plugin_type=plugin_type, plugin_name=plugin_name, keys=keys, variables=variables)
amazon-ebs: File "/usr/lib/python2.7/dist-packages/ansible/config/manager.py", line 304, in get_config_value_and_origin
amazon-ebs: defs = self._plugins[plugin_type][plugin_name]
amazon-ebs: KeyError: 'connection'
amazon-ebs: fatal: [default]: FAILED! => {
amazon-ebs: "msg": "Unexpected failure during module execution.",
amazon-ebs: "stdout": ""
amazon-ebs: }
amazon-ebs: to retry, use: --limit @/root/demo/win-playbook.retry
amazon-ebs:
amazon-ebs: PLAY RECAP *********************************************************************
amazon-ebs: default : ok=0 changed=0 unreachable=0 failed=1
packer version: 1.2.3
ansible version: 2.5.2
It looks like this issue is common with Ansible 2.5.x and Packer. Adarobin commented on the Packer issue https://github.com/hashicorp/packer/issues/5845; we ran into the same issue, tested the solution, and it worked for us.
I was hitting the KeyError: 'connection' issue with Ansible 2.5 on
Packer 1.2.2 with the AWS builder and I think I have discovered the
issue. It looks like Ansible now requires plugins to have a
documentation string. I copied the documentation string from the SSH
connection plugin (since that is what the packer plugin is based on),
made a few changes, and my packer.py now looks like this.
https://gist.github.com/adarobin/2f02b8b993936233e15d76f6cddb9e00
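For illustration only, the general shape of the change is below; this is a hedged sketch rather than the actual gist contents, with the option definitions elided:
# packer.py -- connection plugin used by Packer's ansible provisioner.
# Ansible 2.5+ looks up plugin options from the DOCUMENTATION string, so one
# is added at module level; in practice the options section is copied from
# Ansible's own ssh connection plugin, which this plugin subclasses.
from ansible.plugins.connection.ssh import Connection as SSHConnection

DOCUMENTATION = '''
    connection: packer
    short_description: ssh-based connection used by Packer's ansible provisioner
    description:
        - Delegates to the standard ssh connection plugin. The options section
          (omitted here) is copied from the ssh plugin so that calls such as
          get_option('use_tty') can be resolved without the KeyError.
    author: adapted from Ansible's ssh plugin
'''

class Connection(SSHConnection):
    ''' ssh-based connection for Packer builds '''
    transport = 'packer'
    has_pipelining = True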
I'm trying to run a simple gather_facts playbook using Ansible. I can connect via SSH using the user credentials with no issues, but for a reason I cannot get my head around, the playbook fails with the following message:
2017-10-07 22:57:44,248 ncclient.transport.ssh Unknown exception: cannot import name aead
OS: Ubuntu (Ubuntu 16.04.3 LTS)
Destination Router: Virtualbox JunOS Olive [12.1R1.9]
Ansible Version: 2.4.0.0
My hosts file:
[all:vars]
ansible_python_interpreter=/usr/bin/python
ansible_connection = local
[junos]
lab.r1
Playbook:
---
- hosts: junos
gather_facts: no
tasks:
- name: obtain login credentials
include_vars: ../auth/secrets.yml
- name: Checking NETCONF connectivity
wait_for: host={{ inventory_hostname }} port=830 timeout=5
- name: Gather Facts
junos_facts:
host: "{{ inventory_hostname }}"
username: "{{ creds['username'] }}"
password: "{{ creds['password'] }}"
register: junos
- name: version
debug: msg="{{ junos.facts.version }}"
Playbook output:
$ ansible-playbook -vvvv junos-get_facts.yml
ansible-playbook 2.4.0.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/usr/local/lib/python2.7/dist-packages/ansible/modules']
ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
executable location = /usr/local/bin/ansible-playbook
python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]
Using /etc/ansible/ansible.cfg as config file
setting up inventory plugins
Parsed /etc/ansible/hosts inventory source with ini plugin
Loading callback plugin default of type stdout, v2.0 from /usr/local/lib/python2.7/dist-packages/ansible/plugins/callback/__init__.pyc
PLAYBOOK: junos-get_facts.yml ******************************************************************************************************************
1 plays in junos-get_facts.yml
PLAY [junos] ***********************************************************************************************************************************
META: ran handlers
TASK [obtain login credentials] ****************************************************************************************************************
task path: /usr/local/share/ansible/junos/junos-get_facts.yml:6
Trying secret FileVaultSecret(filename='/usr/local/share/ansible/auth/vault/vault_pass.py') for vault_id=default
ok: [lab.r1] => {
"ansible_facts": {
"creds": {
"password": "*******",
"username": "ansible"
}
},
"ansible_included_var_files": [
"/usr/local/share/ansible/junos/../auth/secrets.yml"
],
"changed": false,
"failed": false
}
TASK [Checking NETCONF connectivity] ***********************************************************************************************************
task path: /usr/local/share/ansible/junos/junos-get_facts.yml:9
Using module file /usr/local/lib/python2.7/dist-packages/ansible/modules/utilities/logic/wait_for.py
<lab.r1> ESTABLISH LOCAL CONNECTION FOR USER: ansible
<lab.r1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1507431462.1-117888621897412 `" && echo ansible-tmp-1507431462.1-117888621897412="` echo $HOME/.ansible/tmp/ansible-tmp-1507431462.1-117888621897412 `" ) && sleep 0'
<lab.r1> PUT /tmp/tmpW193y0 TO /usr/local/share/ansible/.ansible/tmp/ansible-tmp-1507431462.1-117888621897412/wait_for.py
<lab.r1> EXEC /bin/sh -c 'chmod u+x /usr/local/share/ansible/.ansible/tmp/ansible-tmp-1507431462.1-117888621897412/ /usr/local/share/ansible/.ansible/tmp/ansible-tmp-1507431462.1-117888621897412/wait_for.py && sleep 0'
<lab.r1> EXEC /bin/sh -c '/usr/bin/python /usr/local/share/ansible/.ansible/tmp/ansible-tmp-1507431462.1-117888621897412/wait_for.py; rm -rf "/usr/local/share/ansible/.ansible/tmp/ansible-tmp-1507431462.1-117888621897412/" > /dev/null 2>&1 && sleep 0'
ok: [lab.r1] => {
"changed": false,
"elapsed": 0,
"failed": false,
"invocation": {
"module_args": {
"active_connection_states": [
"ESTABLISHED",
"FIN_WAIT1",
"FIN_WAIT2",
"SYN_RECV",
"SYN_SENT",
"TIME_WAIT"
],
"connect_timeout": 5,
"delay": 0,
"exclude_hosts": null,
"host": "lab.r1",
"msg": null,
"path": null,
"port": 830,
"search_regex": null,
"sleep": 1,
"state": "started",
"timeout": 5
}
},
"path": null,
"port": 830,
"search_regex": null,
"state": "started"
}
TASK [Gather Facts] ****************************************************************************************************************************
task path: /usr/local/share/ansible/junos/junos-get_facts.yml:12
<lab.r1> using connection plugin netconf
<lab.r1> socket_path: None
fatal: [lab.r1]: FAILED! => {
"changed": false,
"failed": true,
"msg": "unable to open shell. Please see: https://docs.ansible.com/ansible/network_debug_troubleshooting.html#unable-to-open-shell"
}
to retry, use: --limit @/usr/local/share/ansible/junos/junos-get_facts.retry
PLAY RECAP *************************************************************************************************************************************
lab.r1 : ok=2 changed=0 unreachable=0 failed=1
The detailed log output shows the following:
2017-10-07 23:19:51,177 p=2906 u=ansible | TASK [Gather Facts] ****************************************************************************************************************************
2017-10-07 23:19:51,180 p=2906 u=ansible | task path: /usr/local/share/ansible/junos/junos-get_facts.yml:12
2017-10-07 23:19:52,739 p=2937 u=ansible | creating new control socket for host lab.r1:830 as user ansible
2017-10-07 23:19:52,740 p=2937 u=ansible | control socket path is /usr/local/share/ansible/.ansible/pc/b52ae79c72
2017-10-07 23:19:52,740 p=2937 u=ansible | current working directory is /usr/local/share/ansible/junos
2017-10-07 23:19:52,741 p=2937 u=ansible | using connection plugin netconf
2017-10-07 23:19:52,937 p=2937 u=ansible | network_os is set to junos
2017-10-07 23:19:52,951 p=2937 u=ansible | ssh connection done, stating ncclient
2017-10-07 23:19:52,982 p=2937 u=ansible | failed to create control socket for host lab.r1
2017-10-07 23:19:52,985 p=2937 u=ansible | Traceback (most recent call last):
File "/usr/local/bin/ansible-connection", line 316, in main
server = Server(socket_path, pc)
File "/usr/local/bin/ansible-connection", line 112, in __init__
self.connection._connect()
File "/usr/local/lib/python2.7/dist-packages/ansible/plugins/connection/netconf.py", line 158, in _connect
ssh_config=ssh_config
File "/usr/local/lib/python2.7/dist-packages/ncclient/manager.py", line 154, in connect
return connect_ssh(*args, **kwds)
File "/usr/local/lib/python2.7/dist-packages/ncclient/manager.py", line 116, in connect_ssh
session.load_known_hosts()
File "/usr/local/lib/python2.7/dist-packages/ncclient/transport/ssh.py", line 299, in load_known_hosts
self._host_keys.load(filename)
File "/usr/local/lib/python2.7/dist-packages/paramiko/hostkeys.py", line 97, in load
e = HostKeyEntry.from_line(line, lineno)
File "/usr/local/lib/python2.7/dist-packages/paramiko/hostkeys.py", line 358, in from_line
key = ECDSAKey(data=decodebytes(key), validate_point=False)
File "/usr/local/lib/python2.7/dist-packages/paramiko/ecdsakey.py", line 156, in __init__
self.verifying_key = numbers.public_key(backend=default_backend())
File "/usr/local/lib/python2.7/dist-packages/cryptography/hazmat/backends/__init__.py", line 15, in default_backend
from cryptography.hazmat.backends.openssl.backend import backend
File "/usr/local/lib/python2.7/dist-packages/cryptography/hazmat/backends/openssl/__init__.py", line 7, in <module>
from cryptography.hazmat.backends.openssl.backend import backend
File "/usr/local/lib/python2.7/dist-packages/cryptography/hazmat/backends/openssl/backend.py", line 23, in <module>
from cryptography.hazmat.backends.openssl import aead
ImportError: cannot import name aead
2017-10-07 23:20:02,775 p=2906 u=ansible | fatal: [lab.r1]: FAILED! => {
"changed": false,
"failed": true,
"msg": "unable to open shell. Please see: https://docs.ansible.com/ansible/network_debug_troubleshooting.html#unable-to-open-shell"
}
Any help is appreciated.
The answer, from Paul Kehrer, was:
aead is being imported by the backend, but also can't be found. This sounds like it may be trying to import two different versions of cryptography. pycrypto is irrelevant here (it is an unrelated package). First I'd suggest upgrading cryptography, but since aead was added in cryptography 2.0 you may need to make sure you don't have cryptography installed both via pip and also via your distribution package manager.
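In practice that means checking where cryptography is coming from and removing the duplicate pip-installed copies, roughly along these lines (exact commands may differ depending on how the packages were installed):
# see which copies pip knows about and where they live
pip list | grep -iE 'cryptography|pycrypto'
pip show cryptography
# remove the pip-installed copies so only one installation remains
sudo pip uninstall -y pycrypto cryptography
# confirm whether a distribution package is still providing the module
dpkg -l | grep -i cryptography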
Once I removed pycrypto and cryptography via pip, the playbook ran as expected:
TASK [version] *************************************************************************************************************************************************
task path: /usr/local/share/ansible/junos/junos-get_facts.yml:25
ok: [lab.r1] => {
"msg": "olive"
}
META: ran handlers
META: ran handlers
PLAY RECAP *****************************************************************************************************************************************************
lab.r1 : ok=5 changed=0 unreachable=0 failed=0
I'm trying to deploy a very simple Rails app to a DigitalOcean droplet. Unfortunately, I'm unable to continue the deployment: I get stuck with a very elusive error message:
info: [agent] get agent status
info: [agent] agent is running: true
info: [agent] get agent status
info: [agent] agent is running: true
info: [agent] get agent status
info: [agent] agent is running: true
info: [agent] get agent status
info: [agent] agent is running: true
info: Connecting to http://192.168.50.4:2375
PLAY [all] *********************************************************************
TASK [setup] *******************************************************************
fatal: [default]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n", "msg": "MODULE FAILURE", "parsed": false}
NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit @playbooks/setup.retry
PLAY RECAP *********************************************************************
default : ok=0 changed=0 unreachable=0 failed=1
Am I the only one who has ever encountered this problem?
Here is my Azkfile too:
/**
* Documentation: http://docs.azk.io/Azkfile.js
*/
// Adds the systems that shape your system
systems({
'apptelier-website': {
// Dependent systems
depends: [],
// More images: http://images.azk.io
image: {docker: 'azukiapp/ruby:2.3.0'},
// Steps to execute before running instances
provision: [
"bundle install --path /azk/bundler"
],
workdir: "/azk/#{manifest.dir}",
shell: "/bin/bash",
command: ["bundle", "exec", "rackup", "config.ru", "--pid", "/tmp/ruby.pid", "--port", "$HTTP_PORT", "--host", "0.0.0.0"],
wait: 20,
mounts: {
'/azk/#{manifest.dir}': sync("."),
'/azk/bundler': persistent("./bundler"),
'/azk/#{manifest.dir}/tmp': persistent("./tmp"),
'/azk/#{manifest.dir}/log': path("./log"),
'/azk/#{manifest.dir}/.bundle': path("./.bundle")
},
scalable: {"default": 1},
http: {
domains: [
'#{env.HOST_DOMAIN}', // used if deployed
'#{env.HOST_IP}', // used if deployed
'#{system.name}.#{azk.default_domain}' // default azk domain
]
},
ports: {
// exports global variables
http: "3000/tcp"
},
envs: {
// Make sure that the PORT value is the same as the one
// in ports/http below, and that it's also the same
// if you're setting it in a .env file
RUBY_ENV: "production",
RAILS_ENV: "production",
RACK_ENV: 'production',
WORKER_RETRY: 1,
BUNDLE_APP_CONFIG: '/azk/bundler',
APP_URL: '#{system.name}.#{azk.default_domain}'
}
},
deploy: {
image: {docker: 'azukiapp/deploy-digitalocean'},
mounts: {
'/azk/deploy/src': path('.'),
'/azk/deploy/.ssh': path('#{env.HOME}/.ssh'), // Required to connect with the remote server
'/azk/deploy/.config': persistent('deploy-config')
},
// This is not a server. Just run it with `azk deploy`
scalable: {default: 0, limit: 0},
envs: {
GIT_REF: 'master',
AZK_RESTART_COMMAND: 'azk restart -Rvv',
BOX_SIZE: '512mb'
}
}
});
Thanks for the help!
Edouard, nice to meet you. I'm from the azk core team, and I'm not sure you're using the latest deployment image.
Please follow these steps to update it:
adocker pull azukiapp/deploy-digitalocean:0.0.7;
Edit the deploy system in your Azkfile and add the tag 0.0.7 to the deployment image, to be sure you're using the latest one. It should look like: image: {docker: 'azukiapp/deploy-digitalocean:0.0.7'};
Next, be sure you have the env DEPLOY_API_TOKEN set in your .env file. If you don't have it set yet, take a look on the Step 7 of the article we've published on DigitalOcean Community Tutorials: https://www.digitalocean.com/community/tutorials/how-to-deploy-a-rails-app-with-azk#step-7-%E2%80%94-obtaining-a-digitalocean-api-token
Finally, re-run the deploy command:
azk deploy clear-cache;
azk deploy
Please let me know if this is enough to solve your problem.
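For reference, with the tag pinned, the deploy system from your Azkfile above would look like the block below (everything except the image tag is unchanged from your file):
deploy: {
  // pin the deployment image so azk pulls the updated version
  image: {docker: 'azukiapp/deploy-digitalocean:0.0.7'},
  mounts: {
    '/azk/deploy/src': path('.'),
    '/azk/deploy/.ssh': path('#{env.HOME}/.ssh'), // Required to connect with the remote server
    '/azk/deploy/.config': persistent('deploy-config')
  },
  // This is not a server. Just run it with `azk deploy`
  scalable: {default: 0, limit: 0},
  envs: {
    GIT_REF: 'master',
    AZK_RESTART_COMMAND: 'azk restart -Rvv',
    BOX_SIZE: '512mb'
  }
}
And in your .env file, with a placeholder value: DEPLOY_API_TOKEN=your_digitalocean_api_token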
I am attempting to deploy a StrongLoop app to a DigitalOcean remote box running StrongLoop Process Manager. I have gotten as far as successfully running the deploy command, as follows:
USER ~/projects/loopback/places-api $ slc deploy http://IPADDRESS deploy
Counting objects: 5215, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4781/4781), done.
Writing objects: 100% (5215/5215), 7.06 MiB | 4.27 MiB/s, done.
Total 5215 (delta 1130), reused 0 (delta 0)
To http://104.131.66.124:8701/api/services/1/deploy/default
* [new branch] deploy -> deploy
Deployed `deploy` as `placesAPI` to `http://IPADDRESS:8701/`
Next, I check the status of my Strongloop app by running the following command:
slc ctl -C http://IPADDRESS:8701
Service ID: 1
Service Name: placesAPI
Environment variables:
No environment variables defined
Instances:
Version Agent version Debugger version Cluster size Driver metadata
5.1.0 2.0.2 n/a 1 N/A
Processes:
ID PID WID Listening Ports Tracking objects? CPU profiling? Tracing? Debugging?
1.1.1050 1050 0
1.1.2065 2065 49
At this point, I am not able to access my app by visiting IPADDRESS:3001 as the StrongLoop documentation suggests, and the app status above shows no processes listening on port 3001.
Comparing my app status with the status shown at this stage of deployment in the StrongLoop documentation, it appears I should have several processes listening on port 3001 that are not running in my app.
Here is the app status shown in the Strongloop documentation:
$ slc ctl -C http://prod.foo.com:7777
Service ID: 1
Service Name: appone
Environment variables:
No environment variables defined
Instances:
Version Agent version Cluster size
4.0.30 1.4.15 4
Processes:
ID PID WID Listening Ports Tracking objects? CPU profiling?
1.1.22555 22555 0
1.1.22741 22741 5 prod.foo.com:3001
1.1.22748 22748 6 prod.foo.com:3001
1.1.22773 22773 7 prod.foo.com:3001
1.1.22793 22793 8 prod.foo.com:3001
Notice the additional processes listening to port 3001.
My question is: how do I get my StrongLoop app to run and listen on these ports?
If it helps here are my package.json and config.json files:
:::::::::::::::::::::::package.json::::::::::::::::
{
"name": "placesAPI",
"version": "1.0.0",
"main": "server/server.js",
"scripts": {
"start": "node .",
"pretest": "jshint ."
},
"dependencies": {
"body-parser": "^1.9.0",
"compression": "^1.0.3",
"connect-ensure-login": "^0.1.1",
"cookie-parser": "^1.3.2",
"cors": "^2.5.2",
"errorhandler": "^1.1.1",
"express-flash": "0.0.2",
"express-session": "^1.7.6",
"jade": "^1.7.0",
"loopback": "^2.22.0",
"loopback-boot": "^2.6.5",
"loopback-component-explorer": "^2.1.0",
"loopback-component-passport": "^1.5.0",
"loopback-connector-postgresql": "^2.4.0",
"loopback-datasource-juggler": "^2.39.0",
"passport": "^0.3.2",
"passport-facebook": "^1.0.3",
"passport-google-oauth": "^0.2.0",
"passport-local": "^1.0.0",
"passport-oauth2": "^1.1.2",
"passport-twitter": "^1.0.3",
"serve-favicon": "^2.0.1"
},
"devDependencies": {
"jshint": "^2.5.6"
},
"repository": {
"type": "",
"url": ""
},
"description": "placesAPI",
"bundleDependencies": [
"body-parser",
"compression",
"connect-ensure-login",
"cookie-parser",
"cors",
"errorhandler",
"express-flash",
"express-session",
"jade",
"loopback",
"loopback-boot",
"loopback-component-explorer",
"loopback-component-passport",
"loopback-connector-postgresql",
"loopback-datasource-juggler",
"passport",
"passport-facebook",
"passport-oauth2",
"serve-favicon"
]
}
:::::::::::::::::::::::config.json::::::::::::::::
{
"restApiRoot": "/api",
"host": "0.0.0.0",
"port": 3000,
"cookieSecret": "REDACTED",
"remoting": {
"context": {
"enableHttpContext": false
},
"rest": {
"normalizeHttpPath": false,
"xml": false
},
"json": {
"strict": false,
"limit": "100kb"
},
"urlencoded": {
"extended": true,
"limit": "100kb"
},
"cors": false,
"errorHandler": {
"disableStackTrace": false
}
},
"legacyExplorer": false
}
There is also this error in the logs from log-dump:
2015-12-23T22:13:35.876Z pid:2720 worker:84 events.js:142
2015-12-23T22:13:35.882Z pid:2720 worker:84 throw er; // Unhandled 'error' event
2015-12-23T22:13:35.882Z pid:2720 worker:84 ^
2015-12-23T22:13:35.882Z pid:2720 worker:84 Error: connect ECONNREFUSED 127.0.0.1:5432
2015-12-23T22:13:35.882Z pid:2720 worker:84 at Object.exports._errnoException (util.js:856:11)
2015-12-23T22:13:35.882Z pid:2720 worker:84 at exports._exceptionWithHostPort (util.js:879:20)
2015-12-23T22:13:35.883Z pid:2720 worker:84 at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1064:14)
2015-12-23T22:13:35.919Z pid:1106 worker:0 ERROR supervisor worker id 84 (pid 2720) accidental exit with 1
2015-12-23T22:13:38.253Z pid:1106 worker:0 INFO supervisor started worker 85 (pid 2738)
2015-12-23T22:13:38.253Z pid:1106 worker:0 INFO supervisor resized to 1
2015-12-23T22:13:39.858Z pid:2738 worker:85 INFO strong-agent native addon missing, install a compiler
2015-12-23T22:13:39.859Z pid:2738 worker:85 INFO strong-agent v2.0.2 profiling app 'placesAPI' pid '2738'
2015-12-23T22:13:39.890Z pid:2738 worker:85 INFO strong-agent[2738] started profiling agent
2015-12-23T22:13:44.943Z pid:2738 worker:85 INFO strong-agent not profiling, agent metrics requires a valid license.
2015-12-23T22:13:44.944Z pid:2738 worker:85 Please contact sales@strongloop.com for assistance.
2015-12-23T22:13:44.992Z pid:2738 worker:85 Web server listening at: http://0.0.0.0:3001
2015-12-23T22:13:44.997Z pid:2738 worker:85 Browse your REST API at http://0.0.0.0:3001/explorer
2015-12-23T22:13:45.103Z pid:2738 worker:85 Connection fails: { [Error: connect ECONNREFUSED 127.0.0.1:5432]
2015-12-23T22:13:45.104Z pid:2738 worker:85 code: 'ECONNREFUSED',
2015-12-23T22:13:45.104Z pid:2738 worker:85 errno: 'ECONNREFUSED',
2015-12-23T22:13:45.104Z pid:2738 worker:85 syscall: 'connect',
2015-12-23T22:13:45.104Z pid:2738 worker:85 address: '127.0.0.1',
2015-12-23T22:13:45.104Z pid:2738 worker:85 port: 5432 }
2015-12-23T22:13:45.104Z pid:2738 worker:85 It will be retried for the next request.
Just to put this in an answer: from the package.json we can see that you have included loopback-connector-postgresql, and in the log we see an attempted connection to port 5432, the default PostgreSQL port. It's trying to connect to localhost (127.0.0.1), and my guess is that Postgres is either not installed on your DigitalOcean box or not running. You'll need to update the config for your DB, or install (and run) the DB on your DO droplet.
If you have different configs for dev vs production then you can set up an environment-specific datasources config file: datasources.production.json for example. In that file you would put your prod config, and in datasources.json you would have your dev (local) config. When using this method, be sure to set the NODE_ENV variable on your DO droplet to production (to match the name of the prod datasources config file).
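For illustration, a datasources.production.json for the PostgreSQL connector might look like the following; the datasource name must match the one your models use, and the host, database, and credentials here are placeholders rather than values from your project:
{
  "db": {
    "name": "db",
    "connector": "postgresql",
    "host": "your-db-host-or-droplet-ip",
    "port": 5432,
    "database": "placesdb",
    "user": "dbuser",
    "password": "dbpassword"
  }
}
With NODE_ENV=production set on the droplet, loopback-boot merges this file over datasources.json at startup, so your local dev settings can stay in datasources.json.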