Google Cloud container orchestration and cleanup

In short, I am looking to see if it is possible to run multiple Docker containers on the same machine via gcloud's create-with-container functionality (or similar). The idea is that there will be some "worker" container (which does some arbitrary work) which runs and completes, followed by a "cleanup" container which subsequently runs performing the same task each time.
Longer explanation:
I currently have an application that launches tasks that run inside Docker containers on Google Cloud. I use gcloud beta compute instances create-with-container <...other args...> to launch the VM, which runs the specified container. I will call that the "worker" container, and the tasks it performs are not relevant to my question. However, regardless of the "worker" container, I would like to run a second, "cleanup" container upon completion of the first. In this way, developers can independently write Docker containers that do not have to repeat the work done by the "cleanup" container.
Side note:
I know that I could alternatively specify a startup script (e.g. a bash script) which starts the Docker containers as described above. However, when I first tried that, I kept running into issues where the docker pull <image> command would time out or otherwise fail when communicating with Docker Hub. The gcloud beta compute instances create-with-container <...args...> approach seemed to have error handling/retries built in, which seemed ideal. Does anyone have a working snippet that would provide relatively robust error handling in the startup script?
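For reference, the kind of retry wrapper I have in mind for the startup script would be something along these lines (just a sketch; image names, retry counts and delays are placeholders):

#!/bin/bash
# Sketch of a startup script: retry docker pull a few times before giving up,
# then run the worker container to completion, followed by the cleanup container.
set -euo pipefail

WORKER_IMAGE="myrepo/worker:latest"    # placeholder
CLEANUP_IMAGE="myrepo/cleanup:latest"  # placeholder

pull_with_retries() {
  local image="$1" attempts=5 delay=10
  for ((i = 1; i <= attempts; i++)); do
    docker pull "$image" && return 0
    echo "Pull of ${image} failed (attempt ${i}/${attempts}), retrying in ${delay}s..." >&2
    sleep "$delay"
  done
  return 1
}

pull_with_retries "$WORKER_IMAGE"
pull_with_retries "$CLEANUP_IMAGE"

docker run --rm "$WORKER_IMAGE"   # worker runs and completes
docker run --rm "$CLEANUP_IMAGE"  # cleanup always follows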

As far as I know, the limitation is one container per VM instance. See limitations.

Answer: It is currently not possible to launch multiple containers with the create-with-container functionality.
Alternative: You mentioned that you have already tried launching your containers with a startup script. Another option would be to specify a cloud-init config through instance metadata. Cloud-init is built into Container-Optimized OS (the same OS that you would use with create-with-container).
It works by adding and starting a systemd service, which means that you can:
specify that your service should run after other units: network-online.target and docker.socket,
specify a Restart policy for the service so that it retries on failure,
add an ExecStopPost command to run your cleanup (or add a separate service for that in the cloud-init config).
This is a snippet that could serve as a starting point (you would need to add it under the user-data metadata key):
#cloud-config

users:
- name: cloudservice
  uid: 2000

write_files:
- path: /etc/systemd/system/cloudservice.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Start a simple docker container
    Wants=network-online.target docker.socket
    After=network-online.target docker.socket

    [Service]
    ExecStart=/usr/bin/docker run --rm -u 2000 --name=cloudservice busybox:latest /bin/sleep 180
    ExecStopPost=/bin/echo Finished!
    Restart=on-failure
    RestartSec=30

runcmd:
- systemctl daemon-reload
- systemctl start cloudservice.service
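In your case, the ExecStopPost line could point at a docker run of your cleanup image instead of the echo. As a sketch of how to attach the config (instance name, zone and file name are placeholders), you would pass it when creating a Container-Optimized OS instance:

# Attach the cloud-init config above as user-data when creating a COS VM
gcloud compute instances create my-cleanup-test \
    --image-family cos-stable \
    --image-project cos-cloud \
    --zone us-central1-a \
    --metadata-from-file user-data=cloud-init.yaml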

Related

How to run a docker image from within a docker image?

I run a dockerized Django-celery app which takes some user input/data from a webpage and (is supposed to) run a unix binary on the host system for subsequent data analysis. The data analysis takes a bit of time, so I use celery to run it asynchronously. The data analysis software is dockerized as well, so my django-celery worker should do os.system('docker run ...'). However, celery says docker: command not found, obviously because docker is not installed within my Django docker image. What is the best solution to this problem? I don't want to run docker within docker, because my analysis software should be allowed to use all system resources and not just the resources assigned to the Django image.
I don't want to run docker within docker, because my analysis software should be allowed to use all system resources and not just the resources assigned to the Django image.
I didn't catch the causal relationship here. In fact, you just need to add two steps to your Django image:
Follow Install client binaries on Linux to download the prebuilt docker client binary; your Django image will then have the docker command.
When starting the Django container, add a /var/run/docker.sock bind mount. This allows the Django container to talk directly to the docker daemon on the host machine and start the data-analysis container on the host. Since the analysis container does not run inside the Django container, it can have its own system resources; in other words, the analysis container's resources do not depend on those of the Django container.
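As a rough sketch, those two steps could look something like this (the client version and image names are placeholders):

# Step 1 (e.g. as a RUN step in the Django image's Dockerfile):
# download a prebuilt static docker client binary so the `docker` command exists in the image
curl -fsSL https://download.docker.com/linux/static/stable/x86_64/docker-24.0.7.tgz \
  | tar -xz --strip-components=1 -C /usr/local/bin docker/docker

# Step 2: start the Django container with the host's docker socket bind-mounted,
# so `docker run ...` issued inside it goes straight to the host daemon
docker run -d --name django-app \
  -v /var/run/docker.sock:/var/run/docker.sock \
  my-django-image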
Here is a sample using the docker image, which already has the docker client in it:
root@pie:~# ls /dev/fuse
/dev/fuse
root@pie:~# docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock docker /bin/sh
/ # ls /dev/fuse
ls: /dev/fuse: No such file or directory
/ # docker run --rm -it -v /dev:/dev alpine ls /dev/fuse
/dev/fuse
As you can see, although the first container does not have access to the host's /dev/fuse, the container it starts through the host daemon does; the two containers really do have separate resources.
If the above is what you need, then it's the right solution for you. Otherwise, you will have to install the analysis tool in your Django image.

What is a quick and simple way to know if Docker containers are running on an EC2 instance?

I have a few Docker containers running on EC2 instances in AWS. In the past I have had situations where the Docker containers simply exit due to errors on the docker daemon, and they never start up again even though restart policies are in place (the daemon is not running, so of course I don't expect them to come back up).
Since I am going on holiday, I want to implement a quick and easy solution that would allow me to be notified if any containers have exited unexpectedly. The only quick solution I could find was an Amazon EventBridge rule that runs a scheduled task every X minutes and executes a Systems Manager RunDockerAction command (docker ps) on the instances, but this gives me no output other than the fact that the command executed successfully on the instance.
Is there any way that I can get the output of such an Event Bridge task to send the results over an SNS topic if things go wrong?
If you are running Linux on your AWS EC2 instance, then one solution is to use e-mail as a notification system. In that case, I would suggest the following:
On the AWS EC2 instance, create a Bash script that runs docker ps -a and combine that with a grep statement to filter on the docker container IDs that you want to monitor.
In the same Bash script, using echo and mail, you can e-mail yourself with the statistics seen in the previous step. For example:
echo "${container} is not running" | mail -s "Alert! Docker container ${container} is not running!" "first.last#domain.com"
(The above relies on $container being set appropriately. Use grep to filter out the data of interest.)
Create a system crontab job (/etc/crontab) and schedule the Bash script to run at your desired interval.
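A minimal sketch of such a script could look like this (container names and the e-mail address are placeholders, and a mail command such as the one provided by mailutils must be installed):

#!/bin/bash
# Alert by e-mail if a monitored container is not running.
MONITORED_CONTAINERS="webapp worker"      # placeholder container names
ALERT_ADDRESS="first.last@domain.com"     # placeholder address

for container in $MONITORED_CONTAINERS; do
  # list only running containers by name and check for an exact match
  if ! docker ps --format '{{.Names}}' | grep -qx "$container"; then
    echo "${container} is not running" | mail -s "Alert! Docker container ${container} is not running!" "$ALERT_ADDRESS"
  fi
done

# Example /etc/crontab entry to run this every 5 minutes as root:
# */5 * * * * root /usr/local/bin/check-containers.sh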
This is only one possible solution, one that I use myself for quick checks at times.

Same Application in a docker compose configuration mapping different ports on AWS EC2 instance

The app has the following containers
php-fpm
nginx
local mysql
app's API
datadog container
In the dev process, many feature branches are created to add new features, such as:
app-feature1
app-feature2
app-feature3
...
I have one AWS EC2 instance per feature branch, running Docker Engine v18 and Docker Compose to build and run the Docker stack that makes up the PHP app.
To save operating costs, one AWS EC2 instance could host 3 feature branches at the same time. I was thinking that there should be a custom docker-compose file with dedicated port mappings and docker image tags for each feature branch.
The goal of this configuration is to be able to test 3 feature branches and access the app through different ports while saving money.
I also thought about using docker networks, keeping the same ports and using nginx to redirect traffic to the different docker network ports.
What recommendations do you give?
One straightforward way I can think of in this case is to use an .env file for your docker-compose.
Your docker-compose.yaml file will look something like this:
...
  ports:
    - ${NGINX_PORT}:80
...
  ports:
    - ${API_PORT}:80
The .env file for each stack will look something like this:
NGINX_PORT=30000
API_PORT=30001
and
NGINX_PORT=30100
API_PORT=30101
for different projects.
Note:
.env must be in the same folder as your docker-compose.yaml.
Make sure that the ports inside the different .env files do not conflict with each other. You can adopt some kind of convention, such as a port prefix per feature, e.g. feature1 gets ports starting with 301, i.e. 301xx.
In this way, your docker-compose.yaml can be as generic as you may like.
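For instance, each feature branch could be checked out into its own directory with its own .env file and launched under a distinct Compose project name, along these lines (directory and project names are placeholders):

# Hypothetical layout: one checkout per feature branch, each with its own .env
cd /srv/app-feature1 && docker-compose --project-name app-feature1 up -d --build
cd /srv/app-feature2 && docker-compose --project-name app-feature2 up -d --build
cd /srv/app-feature3 && docker-compose --project-name app-feature3 up -d --build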
You're making things harder than they have to be. Your app is containerized, so use a container system.
ECS is very easy to get going with. It's a JSON file that defines your deployment, basically analogous to docker-compose (they actually supported compose files at some point; not sure if that feature stayed around). You can deploy an arbitrary number of services with different container images. We like to use a Terraform module with the image tag as a parameter, but it's easy enough to write a shell script or whatever.
Since you're trying to save money, create a single Application Load Balancer: each app gets a hostname, and each container gets a subpath. For short-lived feature branch deployments, you can even deploy on Fargate and avoid an ongoing server cost.
It turns out the solution involved capabilities of docker-compose. In the Docker docs the concept is called Multiple Isolated Environments on a Single Host.
To achieve this:
I used an .env file with a number of env vars. The main one is CONTAINER_IMAGE_TAG, which holds the git branch ID used to identify the stack.
A separate docker-compose-dev file defines ports, image tags and other dev-related metadata.
Finally, the use of --project-name in the docker-compose command allows different stacks to coexist.
An example docker_compose Bash function that wraps the docker-compose command:
docker_compose() {
  docker-compose -f docker/docker-compose.yaml -f docker/docker-compose-dev.yaml --project-name "project${CONTAINER_IMAGE_TAG}" --project-directory . "$@"
}
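For example, the wrapper can then be called per feature branch (the tag value below is just an illustration):

# bring up (and build) the stack for one feature branch
export CONTAINER_IMAGE_TAG=feature1
docker_compose up -d --build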
The separation should be done in the image tags, container names, network names, volume names and project name.

Running Docker container randomly disappears on AWS EC2 Ubuntu

I'm running Docker on a t2.micro AWS EC2 instance with Ubuntu.
I'm running several containers. One of my long-running containers (always the same one) has just disappeared after running for about 2-5 days; this is the third time now. It is simply gone, with no sign of a crash.
The machine has not been restarted (uptime says 15 days).
I do not use the --rm flag: docker run -d --name mycontainer myimage.
There is no exited zombie of this container when running docker ps -a.
There is no log, i.e. docker logs mycontainer does not find any container.
There is no log entry in journalctl -u docker.service within the time frame in which the container disappeared. However, there are some other log entries regarding another container (let's call it othercontainer) which occur repeatedly about every 6 minutes (it's a cronjob, I don't know if that's relevant):
could not remove cluster networks: This node is not a swarm manager. Use
"docker swarm init" or "docker swarm join" to connect this node to swarm
and try again
Handler for GET /v1.24/networks/othercontainer_default returned error:
network othercontainer_default not found
Firewalld running: false
Even if there were, e.g., an out-of-memory issue or if my application just exited, I would still have an exited Docker container zombie in the docker ps -a overview, probably with exit status 0 or != 0, right?
I also don't want to --restart automatically, I just want to see the exited container.
Where can I look for more details to trace the issue?
Versions:
OS: Ubuntu 16.04.2 LTS (Kernel: 4.4.0-1013-aws)
Docker: Docker version 17.03.1-ce, build c6d412e
Thanks to a hint to look at dmesg, or maybe the general journalctl, I think I finally found the issue.
Somehow, one of the cronjobs had been running docker system prune -f at its end, every 5 minutes. This command removes basically everything that is unused and not running.
I didn't know about this command before, but this has to be how my exited containers got removed without me knowing how it happened.
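If you suspect something similar, a quick way to hunt for such a cron entry could be (standard cron locations on Ubuntu):

# Search the system-wide cron locations for a prune job
grep -rn "docker system prune" /etc/crontab /etc/cron.d /etc/cron.hourly /etc/cron.daily 2>/dev/null
# Check the root user's crontab as well
sudo crontab -l | grep "docker system prune"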

AWS: CodeDeploy for a Docker Compose project?

My current objective is to have Travis deploy our Django+Docker-Compose project upon successful merge of a pull request to our Git master branch. I have done some work setting up our AWS CodeDeploy since Travis has built-in support for it. When I got to the AppSpec and actual deployment part, I first tried to have an AfterInstall script run docker-compose build and then have an ApplicationStart script run docker-compose up. The containers that have images pulled from the web are our PostgreSQL container (named db, image aidanlister/postgres-hstore, which is the usual postgres image plus the hstore extension), the Redis container (which uses the redis image), and the Selenium container (image selenium/standalone-firefox). The other two containers, web and worker, which are the Django server and Celery worker respectively, use the same Dockerfile to build their image. The main command is:
CMD paver docker_run
which uses a pavement.py file:
from paver.easy import task
from paver.easy import sh

@task
def docker_run():
    migrate()
    collectStatic()
    updateRequirements()
    startServer()

@task
def migrate():
    sh('./manage.py makemigrations --noinput')
    sh('./manage.py migrate --noinput')

@task
def collectStatic():
    sh('./manage.py collectstatic --noinput')

# find any updates to existing packages, install any new packages
@task
def updateRequirements():
    sh('pip install --upgrade -r requirements.txt')

@task
def startServer():
    sh('./manage.py runserver 0.0.0.0:8000')
Here is what I (think I) need to make happen each time a pull request is merged:
Have Travis deploy changes using CodeDeploy, based on deploy section in .travis.yml tailored to our CodeDeploy setup
Start our Docker containers on AWS after successful deployment using our docker-compose.yml
How do I get this second step to happen? I'm pretty sure ECS is not what is needed here. My current status is that I can get Docker started with sudo service docker start, but I cannot get docker-compose up to succeed. Though deployments are reported as "successful", this is only because the docker-compose up command is run in the background in the ValidateService section script. In fact, when I try to run docker-compose up manually while ssh'd into the EC2 instance, I get stuck building one of the containers, right before the CMD paver docker_run part of the Dockerfile.
This took a long time to work out, but I finally figured out a way to deploy a Django+Docker-Compose project with CodeDeploy without Docker-Machine or ECS.
One thing that was important was to make an alternate docker-compose.yml that excluded the selenium container; all it did was cause problems, and it was only useful for local testing. In addition, it was important to choose an instance type that could handle building containers. The reason the containers couldn't be built from our Dockerfile was that the instance simply did not have enough memory to complete the build. Instead of a t1.micro instance, an m3.medium is what worked. It is also important to have sufficient disk space; 8GB is far too small. To be safe, 256GB would be ideal.
It is important to have an AfterInstall script run service docker start as part of the necessary Docker installation and setup (including installing Docker-Compose). This explicitly starts the Docker daemon; without this command, you will get the error Could not connect to Docker daemon. When installing Docker-Compose, it is important to place it in /opt/bin/ so that the binary is used via /opt/bin/docker-compose. There are problems with placing it in /usr/local/bin (I don't remember exactly what problems, but they are related to the particular Linux distribution of the Amazon Linux AMI). The AfterInstall script needs to be run as root (runas: root in the appspec.yml AfterInstall section).
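A rough sketch of such an AfterInstall script, assuming an Amazon Linux instance (the Docker-Compose version below is a placeholder):

#!/bin/bash
# AfterInstall hook sketch: install Docker and Docker-Compose, then start the daemon.
# Must run as root (runas: root in appspec.yml).
set -e

# Install Docker on Amazon Linux
yum install -y docker

# Install Docker-Compose into /opt/bin (version is a placeholder)
mkdir -p /opt/bin
curl -fsSL "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /opt/bin/docker-compose
chmod +x /opt/bin/docker-compose

# Explicitly start the Docker daemon; otherwise docker-compose cannot connect to it
service docker start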
Additionally, the final phase of deployment, which is starting up the containers with docker-compose up (more specifically /opt/bin/docker-compose -f docker-compose-aws.yml up), needs to be run in the background with stdin and stdout redirected to /dev/null:
/opt/bin/docker-compose -f docker-compose-aws.yml up -d > /dev/null 2> /dev/null < /dev/null &
Otherwise, once the server is started, the deployment will hang because the final script command (in the ApplicationStart section of my appspec.yml in my case) doesn't exit. This will probably result in a deployment failure after the default deployment timeout of 1 hour.
If all goes well, then the site can finally be accessed at the instance's public DNS and port in your browser.