Running Docker container randomly disappears on AWS EC2 Ubuntu

I'm running Docker on a t2.micro AWS EC2 instance with Ubuntu.
I'm running several containers. One of my long-running containers (always the same one) has now disappeared for the third time, each time after running for about 2-5 days. It is simply gone, with no sign of a crash.
The machine has not been restarted (uptime says 15 days).
I do not use the --rm flag: docker run -d --name mycontainer myimage.
There is no exited zombie of this container when running docker ps -a.
There is no log; docker logs mycontainer reports that no such container exists.
There is no log entry in journalctl -u docker.service within the time frame
in which the container disappeared. However, there are some other log entries
regarding another container (let's call it othercontainer) which recur
about every 6 minutes (it is started by a cronjob; I don't know if that's relevant):
could not remove cluster networks: This node is not a swarm manager. Use
"docker swarm init" or "docker swarm join" to connect this node to swarm
and try again
Handler for GET /v1.24/networks/othercontainer_default returned error:
network othercontainer_default not found
Firewalld running: false
Even if there were, say, an out-of-memory issue, or if my application simply exited, I would still have an exited Docker container zombie in the ps -a overview, probably with exit status 0 or != 0, right?
I also don't want to --restart automatically; I just want to see the exited container.
Where can I look for more details to trace the issue?
Versions:
OS: Ubuntu 16.04.2 LTS (Kernel: 4.4.0-1013-aws)
Docker: Docker version 17.03.1-ce, build c6d412e
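One way to catch whatever is removing the container is to watch the Docker event stream, since a removal shows up as a destroy event. A sketch (the container name matches the question; the 24h window is arbitrary):

docker events --since 24h \
  --filter container=mycontainer \
  --filter event=destroy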

Thanks to a hint to look at dmesg and the general journalctl output, I think I finally found the issue.
It turns out one of the cronjobs had been running docker system prune -f at its end, every 5 minutes. This command removes everything unused and non-running: stopped containers, unused networks, and dangling images.
I didn't know about this command before, but this is certainly how my exited containers got removed without me knowing how it happened.
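For anyone else tracing this: a quick way to confirm the behavior (the container name shortlived and the alpine image are just for illustration):

docker run --name shortlived alpine true   # container exits immediately
docker ps -a --filter name=shortlived      # the Exited container is still listed
docker system prune -f                     # removes all stopped containers (and more)
docker ps -a --filter name=shortlived      # no output: the zombie is gone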

Related

How to keep a container running in ECS

We are deploying an application in ECS which exits due to some error.
We need to log in to the container and check the logs, but the container stops when the application exits after the error.
How can I keep the container running so that I can SSH into it?
I tried using tail -f /dev/null in the startup.sh script, which runs at container startup.
I need to run the startup.sh script to configure SSH, etc.
However, executing tail -f /dev/null at the end of the script does not seem to keep the container running.
Appreciate any advice on how to keep an ECS container running.
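For what it's worth, the usual pattern is to make tail the container's foreground process with exec, and to ensure nothing before it ends the script early. A sketch, assuming startup.sh is the image's ENTRYPOINT (the sshd line is a stand-in for the real SSH setup):

#!/bin/bash
# startup.sh - configure SSH, then hold the container open
/usr/sbin/sshd                 # stand-in for whatever SSH configuration is needed

# exec replaces the shell with tail so the container's PID 1 never exits;
# without it, an error in an earlier command (or a script that finishes
# before reaching this line) ends the container immediately.
exec tail -f /dev/null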

Cloud Composer GKE Node upgrade results in Airflow task randomly failing

The problem:
I have a managed Cloud composer environment, under a 1.9.7-gke.6 Kubernetes cluster master.
I tried to upgrade it (as well as the default-pool nodes) to 1.10.7-gke.1, since an upgrade was available.
Since then, Airflow has been acting randomly. Tasks that were working properly are failing for no given reason. This makes Airflow unusable, since the scheduling becomes unreliable.
Here is an example of a task that runs every 15 minutes and for which the behavior is very visible right after the upgrade:
[screenshot: Airflow tree view]
Hovering over a failing task only shows an Operator: null message (null_operator). Also, there is no log at all for that task.
I have been able to reproduce the situation with another Composer environment in order to ensure that the upgrade is the cause of the dysfunction.
What I have tried so far:
I assumed the upgrade might have screwed up either the scheduler or Celery (Cloud composer defaults to CeleryExecutor).
I tried restarting the scheduler with the following command:
kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
I also tried to restart Celery from inside the workers, with
kubectl exec -it airflow-worker-799dc94759-7vck4 -- sudo celery multi restart 1
Celery restarts, but it doesn't fix the issue.
So I tried restarting Airflow completely, the same way I did with airflow-scheduler.
None of these fixed the issue.
Side note: I can't access Flower to monitor Celery when following this tutorial (Google Cloud - Connecting to Flower). Connecting to localhost:5555 stays in a 'waiting' state forever. I don't know if it is related.
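For the Flower side note, a minimal sketch of the usual approach, assuming Flower listens on port 5555 inside the worker pod named above (add -n <namespace> for the Composer namespace if needed):

# start Flower inside a worker (Airflow 1.x provides an airflow flower command)
kubectl exec -it airflow-worker-799dc94759-7vck4 -- airflow flower &
# forward local port 5555 to the pod, then browse to localhost:5555
kubectl port-forward airflow-worker-799dc94759-7vck4 5555:5555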
Let me know if I'm missing something!
1.10.7-gke.2 is available now [1]. Can you further upgrade to 1.10.7-gke.2 to see if the issue persists?
[1] https://cloud.google.com/kubernetes-engine/release-notes

gdb in docker container returns "ptrace: Operation not permitted."

I've checked /proc/sys/kernel/yama/ptrace_scope in the container and on the host; both report the value as zero. But when attached to PID 1, gdb reports:
Reading symbols from /opt/my-web-proxy/bin/my-web-proxy...done.
Attaching to program: /opt/my-web-proxy/bin/my-web-proxy, process 1
ptrace: Operation not permitted.
I've also tried attaching to the container with the --privileged flag:
docker exec --privileged -it mywebproxy_my-proxy_1 /bin/bash
The host OS is Fedora 25 with Docker from their repos, and the container is an official centos:6.8 image.
I discovered the answer: the container needs to be started with the SYS_PTRACE capability.
Adding this to my docker-compose.yml file allows GDB to work
cap_add:
  - SYS_PTRACE
Or it can also be passed on the docker command line with --cap-add=SYS_PTRACE
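For context, a minimal compose sketch showing where the key sits (the service name and image are placeholders matching the question):

version: '2'
services:
  my-proxy:
    image: centos:6.8
    cap_add:
      - SYS_PTRACE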

Getting ember to run under docker on Windows Quickstart

Working through this tutorial on setting up ember-cli in a Docker container:
http://www.rkblog.rk.edu.pl/w/p/setting-ember-cli-development-environment-ember-21/
Here are my steps:
Created docker-compose.yml in an empty folder on the host machine
Launched Docker Quickstart to get a terminal
Changed to the folder with the .yml
Ran the two docker-compose commands below from the terminal (added -d because without that you get a message that interactive mode is not supported)
Ran docker ps -a to verify that the container was running
Ran docker inspect CONTAINER_ID to find the ip address of the running container
Found the IP address at an odd location (172.17.0.2)
Attempted to access port 4200 on that IP from the host Windows machine's browser and also from the Docker command line via curl, but without success.
Ran docker ps -a and found that both containers that had been instantiated had exited.
Now if I try to start the container again, it just exits immediately:
docker-compose run -d --rm ember init
docker-compose run -d --rm ember server
What am I missing to get up and running? Do I need to open ports on the Default VM running in Virtualbox? How do I diagnose why the container keeps exiting?
First, I would suggest using docker-compose up; that is most likely what you want.
To see the logs for a detached container you can run docker logs <container name>. If there are any errors you'll see them there.
A likely cause of the "container exit" is that the process goes into the background. Docker requires a process to stay in the foreground, but many serve commands background themselves by default. To keep the process in the foreground you can sometimes use a flag like --foreground or --no-daemon, but I'm not sure one exists for ember.
If that flag doesn't exist, it's likely that ember server is just checking if stdin/stdout are connected to a tty. By default they are not. You can add these lines to your docker-compose.yml to fix it:
stdin_open: true
tty: true
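In context, those keys go under the service definition, e.g. (the service name ember is assumed from the docker-compose run commands; the rest of the service stays as in the tutorial):

services:
  ember:
    # image, ports, volumes as in the tutorial's compose file
    stdin_open: true
    tty: true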
OK, finally resolved it. The issue with the module resolution may have been long-file-name resolution on Windows, because after I moved the source folder to the root of the host I was able to get ember serve running under Windows.
Then from the terminal window I ran the commands to init and launch ember-server
docker-compose run -d --rm ember init
docker-compose run -d --rm ember server
Then did:
docker-compose up -d
which launched the containers successfully, and then I was able to access the Ember page served at the IP and port specified earlier in the comments:
http://192.168.99.100:4200/

docker dead but pidfile exists

I was working with Docker on an AWS instance and it was working fine. Then one day, Docker stopped working. When I restarted it with service docker start, it started, but service docker status returned a "docker dead but pidfile exists" message and docker commands did not execute. When I inspected the log file, it showed the following messages:
msg="+job serveapi(unix:///var/run/docker.sock)"
msg="Listening for HTTP on unix (/var/run/docker.sock)"
msg="There are no more loopback devices available."
msg="loopback mounting failed"
To start Docker, I removed the PID file /var/run/docker.pid and the socket /var/run/docker.sock, and also removed /var/lock/subsys/docker, then restarted Docker. But no gain; it still gives the same "docker dead but pidfile exists" error on start.
Please help.
This ticket could be related to the loopback issue:
https://github.com/docker/docker/issues/7058
So, please check the output of losetup -l and ls -l /dev/loop*.
EDIT: If ls -l /dev/loop* returns an error, the most likely cause is the GitHub ticket I mentioned, and then you would need something like:
#!/bin/bash
# Recreate the loopback device nodes /dev/loop0 through /dev/loop6
# (block devices with major number 7, minor numbers 0-6).
for i in {0..6}
do
    mknod -m0660 /dev/loop$i b 7 $i
done
(taken from the mentioned issue)
Also, if you only want to restart, you may need to unmount /var/lib/docker/devicemapper or any mounted volumes of type aufs.
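A sketch of that cleanup, assuming the default /var/lib/docker location (the second line finds and unmounts any aufs mounts listed in /proc/mounts):

# unmount the devicemapper directory, if mounted
umount /var/lib/docker/devicemapper
# unmount any remaining aufs-type mounts
grep aufs /proc/mounts | awk '{ print $2 }' | xargs -r umount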