We are deploying an application in ECS, which is exiting due to some error.
We need to log in to the container and check the logs, however the container stops when the application exits after the error.
How can I keep the container running so that I can ssh in to it?
I tried using tail -f /dev/null in the startup.sh script, which is run at the container startup.
I need to run the startup.sh script to configure the SSH, etc.
However, executing tail -f /dev/null at the end of the script does not seem to keep the container running.
Appreciate any advice on how to keep an ECS container running.
I have a few Docker containers running on EC2 instances in AWS. In the past I have had situations where the Docker containers simply exited due to errors in the Docker daemon and never started up again, even though restart policies are in place (the daemon is not running, so of course I don't expect them to come back up).
Since I am going on holiday, I want to implement a quick and easy solution that notifies me if any containers have exited unexpectedly. The only quick solution I could find was an Amazon EventBridge rule that runs a scheduled task every X minutes and executes a Systems Manager RunDockerAction command (docker ps) on the instances, but this gives me no output beyond the fact that the command executed successfully on the instance.
Is there any way to get the output of such an EventBridge task and send the results to an SNS topic if things go wrong?
If you are running Linux on your AWS EC2 instance, then one solution is to use e-mail as a notification system. In that case, I would suggest the following:
On the AWS EC2 instance, create a Bash script that runs docker ps -a and combine that with a grep statement to filter on the docker container IDs that you want to monitor.
In the same Bash script, using echo and mail, you can e-mail yourself the statistics gathered in the previous step. For example:
echo "${container} is not running" | mail -s "Alert! Docker container ${container} is not running!" "first.last@domain.com"
(The above relies on $container to be set appropriately. Use grep to filter out data of interest.)
Create a system crontab job (/etc/crontab) and schedule the Bash script to run at the interval you want.
This is only one possible solution, one that I use myself for quick checks at times.
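The steps above can be sketched as a single script. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the ALERT_TO variable, and the container names passed as arguments are hypothetical, and a working mail command on the instance is presumed. Instead of filtering docker ps -a with grep, it lists only running containers by name with docker ps --format and reports every expected name that is absent.

```shell
#!/usr/bin/env bash
# Sketch: e-mail an alert for each expected container that is not running.
# ALERT_TO and the container names given as arguments are placeholders.

ALERT_TO="${ALERT_TO:-first.last@domain.com}"

# Reads running container names on stdin (one per line) and prints every
# expected name (given as arguments) that is not among them.
missing_containers() {
  running="$(cat)"
  for name in "$@"; do
    # -x matches whole lines only, -F treats the name as a literal string.
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
    fi
  done
}

# Asks Docker for the running containers and sends one mail per missing name.
check_and_alert() {
  docker ps --format '{{.Names}}' | missing_containers "$@" |
    while IFS= read -r container; do
      echo "${container} is not running" |
        mail -s "Alert! Docker container ${container} is not running!" "$ALERT_TO"
    done
}

# Only contact Docker when container names are passed on the command line.
if [ "$#" -gt 0 ]; then
  check_and_alert "$@"
fi
```

A system crontab line such as */5 * * * * root /usr/local/bin/check-containers.sh web db (path and container names are hypothetical) would run the check every five minutes.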
I have a Docker container running in a small AWS instance with limited disk space. The logs were getting bigger, so I used the commands below to delete the ever-growing log files:
sudo -s -H
find /var -name "*json.log" | grep docker | xargs -r rm
journalctl --vacuum-size=50M
Now I want to see what's the behaviour of one of the running docker containers, but it claims the log file has disappeared (from the rm command above):
ubuntu@x-y-z:~$ docker logs --follow name_of_running_docker_1
error from daemon in stream: Error grabbing logs: open /var/lib/docker/containers/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3-json.log: no such file or directory
I would like to be able to see again what's going on in the running container, so I tried:
sudo touch /var/lib/docker/containers/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3-json.log
Then I ran docker logs --follow again, but while interacting with the software that should produce logs, I can see that nothing is being written.
Is there any way to rescue the printing into the log file again without killing (rebooting) the containers?
Yes, but it's more of a trick than a real solution. You should never interact with /var/lib/docker data directly. As per Docker docs:
part of the host filesystem [which] is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem.
For this trick to work, you need to configure the Docker daemon to keep containers alive while the daemon is down, and you need to do so before first running the container. For example, by setting the following in /etc/docker/daemon.json:
{
"live-restore": true
}
This requires a daemon restart, for example sudo systemctl restart docker.
Then create a container and delete its .log file:
$ docker run --name myhttpd -d httpd:alpine
$ sudo rm $(docker inspect myhttpd -f '{{ .LogPath }}')
# Docker is not happy
$ docker logs myhttpd
error from daemon in stream: Error grabbing logs: open /var/lib/docker/containers/xxx-json.log: no such file or directory
Restart the daemon (with live restore enabled). This causes Docker to somehow take over management of the container again and re-create the log file. However, any logs generated before the log file was deleted are lost.
$ sudo systemctl restart docker
$ docker logs myhttpd # works! and log file is created back
Note: this is not a documented or official Docker feature, simply a behavior I observed in my own experiments with Docker 19.03. It may not work with other Docker versions.
With live restore enabled, the container process keeps running even though the Docker daemon is stopped. When the daemon restarts, it probably re-attaches to the stdout and stderr of the still-alive process and redirects the output to the log file (hence re-creating it).
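Before restarting the daemon, it may be worth checking that both preconditions of the trick hold. The sketch below is an illustration only: the function names are hypothetical, and it assumes docker info exposes the LiveRestoreEnabled field and docker inspect exposes LogPath, as recent Docker versions do.

```shell
#!/usr/bin/env bash
# Sketch: check whether the daemon-restart trick described above applies.

# Prints "true" or "false" depending on whether the daemon reports live restore.
live_restore_enabled() {
  docker info --format '{{ .LiveRestoreEnabled }}'
}

# Prints the path of the JSON log file Docker records for a container.
container_log_path() {
  docker inspect -f '{{ .LogPath }}' "$1"
}

# Succeeds only when live restore is on and the container's log file is gone.
# Run as root: /var/lib/docker is normally unreadable for other users, so the
# file test would otherwise report every log file as missing.
restart_may_help() {
  [ "$(live_restore_enabled)" = "true" ] || return 1
  log_path="$(container_log_path "$1")" || return 1
  [ -n "$log_path" ] && [ ! -e "$log_path" ]
}

# Example (myhttpd is the container name used above):
# restart_may_help myhttpd && sudo systemctl restart docker
```

If live restore is not enabled, restarting the daemon stops the running containers, so the check guards against making things worse.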
I'm running into issues with our Kubernetes deployment. Recently, one of the pods has been restarting frequently.
The service inside is using C++, with Google Logging and should dump a stacktrace on a crash (it does do that when run locally).
Unfortunately, the only log message I was able to find, related to the pod restart is from containerd, just saying "shim reaped".
Do I need to turn on some extra logging/monitoring to have the reasons for restart retained?
You can check the log of the crashed pod by running
$ kubectl logs -f <pod name> -n <namespace> --previous
The pod could have been terminated for a reason such as running out of memory. kubectl describe pod <podname> shows this information.
There should be output like this (could also be a different reason than OOM):
Last State: Terminated
Reason: OOMKilled
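The same field can be read without scanning the describe output by hand. The helper below is a small sketch, not part of the original answer: the function name is hypothetical, and it assumes kubectl is configured for the cluster. It uses a jsonpath query on the pod's containerStatuses.

```shell
#!/usr/bin/env bash
# Sketch: print why the container(s) of a pod last terminated, e.g. OOMKilled.
# $1 = pod name, $2 = namespace (falls back to "default").
last_termination_reason() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Example with placeholder names:
# last_termination_reason <pod name> <namespace>
```

An empty result means that no container in the pod has a recorded previous termination.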
I made a task definition in AWS ECS as shown in the screenshots:
When I run the task definition in a cluster, it starts successfully, but the container status remains unhealthy forever. When I run the same health check command (the curl command) inside the container, it succeeds. I also tried CMD instead of CMD-SHELL, but nothing works. Inside the container, Apache is running on port 80.
Note: I am building the Docker image by committing a Docker container, not from a Dockerfile.
I don't understand why the health check is not working, and I found nothing significant online. Please help if you have faced this issue before.
You have made a mistake in the command in the HEALTHCHECK section (double pipe).
I suppose you want to use CMD-SHELL,curl --fail http://localhost/ || exit 1 instead of ...|exit 1|
Remove the double quotes around the command; it should then work.
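To see how a given command string behaves as a health check, it can be run through sh -c inside the container, which is how a CMD-SHELL check is executed as far as I understand it: exit status 0 counts as healthy, anything else as unhealthy. The helper below is a hypothetical sketch for that experiment and assumes curl is installed in the image.

```shell
#!/usr/bin/env bash
# Sketch: run a health check command string through sh -c and report the
# result based on its exit status (0 = healthy, non-zero = unhealthy).
run_health_check() {
  if sh -c "$1" >/dev/null 2>&1; then
    echo HEALTHY
  else
    echo UNHEALTHY
  fi
}

# Example, to be run inside the container (assumes curl is installed there):
# run_health_check 'curl --fail http://localhost/ || exit 1'
```

Passing the exact string from the task definition, including any stray pipes or quotes, shows whether the shell can parse it at all.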
I'm running Docker on a t2.micro AWS EC2 instance with Ubuntu.
I'm running several containers. One of my long-running containers (always the same one) has just disappeared for the third time, each time after running for about 2-5 days. It is simply gone, with no sign of a crash.
The machine has not been restarted (uptime says 15 days).
I do not use the --rm flag: docker run -d --name mycontainer myimage.
There is no exited zombie of this container when running docker ps -a.
There is no log, i.e. docker logs mycontainer does not find any container.
There is no log entry in journalctl -u docker.service within the time frame
where the container disappears. However, there are some other log entries
regarding another container (let's call it othercontainer) which are
occurring repeatedly about every 6 minutes (it's a cronjob; I don't know if that is relevant):
could not remove cluster networks: This node is not a swarm manager. Use
"docker swarm init" or "docker swarm join" to connect this node to swarm
and try again
Handler for GET /v1.24/networks/othercontainer_default returned error:
network othercontainer_default not found
Firewalld running: false
Even if there were, for example, an out-of-memory issue, or if my application simply exited, I would still have an exited Docker container zombie in the ps -a overview, probably with exit status 0 or != 0, right?
I also don't want to --restart automatically, I just want to see the exited container.
Where can I look for more details to trace the issue?
Versions:
OS: Ubuntu 16.04.2 LTS (Kernel: 4.4.0-1013-aws)
Docker: Docker version 17.03.1-ce, build c6d412e
Thanks to a hint to look at dmesg or maybe the general journalctl I think I finally found the issue.
Somehow, one of the cronjobs has been running docker system prune -f at its end every 5 minutes. This command basically seems to remove everything unused and non-running.
I didn't know about this command before, but this must be how my exited containers were removed without my noticing.
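For anyone tracing a similar disappearance, a quick search for the prune command in cron files and in the scripts they call can save time. This is a minimal sketch: the function name is hypothetical, the listed cron locations are the usual ones on Ubuntu and may differ elsewhere, and reading other users' crontabs requires root.

```shell
#!/usr/bin/env bash
# Sketch: list every line mentioning "docker system prune" under the given
# files or directories, as path:line-number:text. Missing or unreadable paths
# are skipped silently.
find_prune_jobs() {
  grep -rn -- 'docker system prune' "$@" 2>/dev/null
}

# Example (typical Ubuntu cron locations; add the directory holding your scripts):
# find_prune_jobs /etc/crontab /etc/cron.d /var/spool/cron/crontabs
```

Because the prune call may sit at the end of a script rather than in the crontab itself, the script directories referenced by the cron entries should be searched as well.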