AKS Container failed to start

AKS Container failed to start - dockerfile

I updated my docker file to upgrade ubuntu but it started failing and I'm unsure why...
dockerfile:
# using digest for version 20.04 as there is multiple digest that used this tag#
FROM ubuntu#sha256:82becede498899ec668628e7cb0ad87b6e1c371cb8a1e597d83a47fac21d6af3
ENV DEBIAN_FRONTEND=noninteractive
RUN echo "APT::Get::Assume-Yes \"true\";" > /etc/apt/apt.conf.d/90assumeyes
#install tools
#removed for clarity
WORKDIR /azp
COPY ./start.sh .
RUN chmod +x start.sh
CMD ["./start.sh"]
my evens from the pod
Successfully assigned se-agents/agent-se-linux-5c9f647768-25p7v to aks-linag-56790600-vmss000002
Pulling image "compregistrynp.azurecr.io/agent-se-linux:25319"
Successfully pulled image "comregistrynp.azurecr.io/agent-se-linux:25319"
Created container agent-se-linux
Started container agent-se-linux
Back-off restarting failed container
When I check the error in the pod, I see the following message:
standard_init_linux.go:228: exec user process caused: no such file or directory
Not even sure where to look anymore. The only difference in the dockerfile was the ubuntu tag and I added 1 tool to install. I tried to deploy what was in Prod to dev and it's failing with the same error. I'm convinced there's something in my AKS...

So the issue was that someone on my team had modified the shell script and didn't set the proper End of Line characters to Lf.
I will be running a script to convert the file to Linux to ensure this doesn't happen again in my pipeline!

Related

Why does google cloud build run differently for these two commands?

We run these two commands (the first one is async and the other runs synchronously)
#async BUT does something funky and doesn't run the Dockerfile image as-is
gcloud alpha builds triggers run staging-deploy --branch master
# sync BUT runs the image the way it's supposed to run!!!
gcloud builds submit --config cloudbuild.yaml
both are using our cloudbuild.yaml
steps:
- name: gcr.io/$PROJECT_ID/continuous-deploy
args: ['${_SERVICE}', '${_DOWNLOAD_URL}']
timeout: 1000s
substitutions:
_SERVICE: none
_DOWNLOAD_URL: none
timeout: 1100s
Our Dockerfile is very very simple
FROM gcr.io/google.com/cloudsdktool/cloud-sdk:alpine
RUN mkdir -p ./monobuild
COPY . ./monobuild/
WORKDIR "/monobuild"
#NOTE: This file in google cloud build trigger MUST be in root of monorepo BUT I don't know why
#NOTE: This command receives any arguments to docker
#ie. for "docker run {image} {args}", it receives the args
ENTRYPOINT ["./downloadAndExtract.sh"]
Sooo, when I run the SECOND command, it completely uses the docker image obeying the Dockerfile. When I run the first command, it's ignoring all my Dockerfile stuff and trying to run scripts in my git repo(which is very frustrating and not what I want).
We HAD this directory structure
- gitroot
- stagingDeploy
- Dockerfile
- deployStaging.sh # part of Dockerfile
- cloudbuild.yaml
- prodDeploy
- Dockerfile
- prodDeploy.sh #part of Docker file
- cloudbuild.yaml
Of course, only the second command works with this directory structure. The first command CANNOT find deployStaging.sh UNTIL we ln -s stagingDeploy/deployStaging.sh from our gitrepo root and we have around 5 deploy directories and now our git repo root is fully polluted.
It is to say the least very frustrating and we are not sure how to clean this up so prodDeploy contains all the prod deploy scripts and staging, the staging ones and get rid of all root files.
Of course, we now have a corrupted git repo directory structure with a whole slew of files in the root directory from various different builds(sometimes conflicting on accident as files get the same names sometimes).
EDIT: Not really much to share on configuration of twitter as each one just points to the yaml file is all like so
thanks,
Dean

Perform actions on server after CircleCI deployment

I have a Django project that I deploy on a server using CircleCI. The server is a basic cloud server, and I can SSH into it.
I set up the deployment section of my circle.yml file, and everything is working fine. I would like to automatically perform some actions on the server after the deployment (such as migrating the database or reloading gunicorn).
I there a way to do that with CircleCI? I looked in the docs but couldn't find anything related to this particular problem. I also tried to put ssh user#my_server_ip after my deployment step, but then I get stuck and cannot perform any action. I can successfully SSH in, but the rest of the commands is not called.
Here is what my ideal circle.yml file would look like:
deployment:
staging:
branch: develop
commands:
- rsync --update ./requirements.txt user#server:/home/user/requirements.txt
- rsync -r --update ./myapp/ user#server:/home/user/myapp/
- ssh user#server
- workon myapp_venv
- cd /home/user/
- pip install -r requirements.txt

I solved the problem by putting a post_deploy.sh file on the server, and putting this line on the circle.yml:
ssh -i ~/.ssh/id_myhost user#server 'post_deploy.sh'
It executes the instructions in the post_deploy.sh file, which is exactly what I wanted.

AWS Elastic Beanstalk Docker deployment failed

Has anyone encountered failed deployment when deploying docker app to aws eb?
Here's a piece of log
time="2016-09-20T09:36:42.802106539Z" level=error msg="Handler for DELETE /v1.23/containers/c7bc72d9ccec returned error: You cannot remove a running container c7bc72d9ccec6557ddca8e90c7c77b350cb0c80be9a90921478adccd70a2b97a. Stop the container before attempting removal or use -f"
time="2016-09-20T09:36:42.924322201Z" level=error msg="Handler for DELETE /v1.23/images/9daab71ad3c0 returned error: conflict: unable to delete 9daab71ad3c0 (cannot be forced) - image is being used by running container c7bc72d9ccec"
time="2016-09-20T09:36:42.924865908Z" level=error msg="Handler for DELETE /v1.23/images/dbcc41959b55 returned error: conflict: unable to delete dbcc41959b55 (cannot be forced) - image has dependent child images"
For the first time of the environment deployment, it works well. However, every time I deploy a new version of the app, it fails.
Running on 64bit Amazon Linux 2016.03 v2.1.6 | Docker 1.11.2
My Dockerfile is rather simple:
# Get Node Latest
FROM node:6.5.0
# Create working directory
WORKDIR /app
ADD . /app
# Install depencencies
RUN npm install
# Expost 3000 port
EXPOSE 3000
# Start app
CMD ["node", "server.js"]

It turns out that npm install might take too long to run, cuz once I put node_modules into the zip and remove npm install from Dockerfile, it takes 3-5 minutes for deploying now.

AWS: CodeDeploy for a Docker Compose project?

My current objective is to have Travis deploy our Django+Docker-Compose project upon successful merge of a pull request to our Git master branch. I have done some work setting up our AWS CodeDeploy since Travis has builtin support for it. When I got to the AppSpec and actual deployment part, at first I tried to have an AfterInstall script do docker-compose build and then have an ApplicationStart script do docker-compose up. The containers that have images pulled from the web are our PostgreSQL container (named db, image aidanlister/postgres-hstore which is the usual postgres image plus the hstore extension), the Redis container (uses the redis image), and the Selenium container (image selenium/standalone-firefox). The other two containers, web and worker, which are the Django server and Celery worker respectively, use the same Dockerfile to build an image. The main command is:
CMD paver docker_run
which uses a pavement.py file:
from paver.easy import task
from paver.easy import sh
#task
def docker_run():
migrate()
collectStatic()
updateRequirements()
startServer()
#task
def migrate():
sh('./manage.py makemigrations --noinput')
sh('./manage.py migrate --noinput')
#task
def collectStatic():
sh('./manage.py collectstatic --noinput')
# find any updates to existing packages, install any new packages
#task
def updateRequirements():
sh('pip install --upgrade -r requirements.txt')
#task
def startServer():
sh('./manage.py runserver 0.0.0.0:8000')
Here is what I (think I) need to make happen each time a pull request is merged:
Have Travis deploy changes using CodeDeploy, based on deploy section in .travis.yml tailored to our CodeDeploy setup
Start our Docker containers on AWS after successful deployment using our docker-compose.yml
How do I get this second step to happen? I'm pretty sure ECS is actually not what is needed here. My current status right now is that I can get Docker started with sudo service docker start but I cannot get docker-compose up to be successful. Though deployments are reported as "successful", this is only because the docker-compose up command is run in the background in the Validate Service section script. In fact, when I try to do docker-compose up manually when ssh'd into the EC2 instance, I get stuck building one of the containers, right before the CMD paver docker_run part of the Dockerfile.

This took a long time to work out, but I finally figured out a way to deploy a Django+Docker-Compose project with CodeDeploy without Docker-Machine or ECS.
One thing that was important was to make an alternate docker-compose.yml that excluded the selenium container--all it did was cause problems and was only useful for local testing. In addition, it was important to choose an instance type that could handle building containers. The reason why containers couldn't be built from our Dockerfile was that the instance simply did not have the memory to complete the build. Instead of a t1.micro instance, an m3.medium is what worked. It is also important to have sufficient disk space--8GB is far too small. To be safe, 256GB would be ideal.
It is important to have an After Install script run service docker start when doing the necessary Docker installation and setup (including installing Docker-Compose). This is to explicitly start running the Docker daemon--without this command, you will get the error Could not connect to Docker daemon. When installing Docker-Compose, it is important to place it in /opt/bin/ so that the binary is used via /opt/bin/docker-compose. There are problems with placing it in /usr/local/bin (I don't exactly remember what problems, but it's related to the particular Linux distribution for the Amazon Linux AMI). The After Install script needs to be run as root (runas: root in the appspec.yml AfterInstall section).
Additionally, the final phase of deployment, which is starting up the containers with docker-compose up (more specifically /opt/bin/docker-compose -f docker-compose-aws.yml up), needs to be run in the background with stdin and stdout redirected to /dev/null:
/opt/bin/docker-compose -f docker-compose-aws.yml up -d > /dev/null 2> /dev/null < /dev/null &
Otherwise, once the server is started, the deployment will hang because the final script command (in the ApplicationStart section of my appspec.yml in my case) doesn't exit. This will probably result in a deployment failure after the default deployment timeout of 1 hour.
If all goes well, then the site can finally be accessed at the instance's public DNS and port in your browser.

AWS EB, Play Framework and Docker: Application Already running

I am running a Play 2.2.3 web application on AWS Elastic Beanstalk, using SBTs ability to generate Docker images. Uploading the image from the EB administration interface usually works, but sometimes it gets into a state where I consistently get the following error:
Docker container quit unexpectedly on Thu Nov 27 10:05:37 UTC 2014:
Play server process ID is 1 This application is already running (Or
delete /opt/docker/RUNNING_PID file).
And deployment fails. I cannot get out of this by doing anything else than terminating the environment and setting it up again. How can I avoid that the environment gets into this state?

Sounds like you may be running into the infamous Pid 1 issue. Docker uses a new pid namespace for each container, which means first process gets PID 1. PID 1 is a special ID which should be used only by processes designed to use it. Could you try using Supervisord instead of having playframework running as the primary processes and see if that resolves your issue? Hopefully, supervisord handles Amazon's termination commands better than the play framework.

#dkm was having the same issue with my dockerized play app. I package my apps as standalone for production using '$ sbt clean dist` commands. This produces a .zip file that you can deploy to some folder in your docker container like /var/www/xxxx.
Get a bash shell into your container: $ docker run -it <your image name> /bin/bash
Example: docker run -it centos/myapp /bin/bash
Once the app is there you'll have to create an executable bash script I called mine startapp and the contents should be something like this:
Create the script file in the docker container:
$ touch startapp && chmod +x startapp
$ vi startapp
Add the execute command & any required configurations:
#!/bin/bash
/var/www/<your app name>/bin/<your app name> -Dhttp.port=80 -Dconfig.file=/var/www/pointflow/conf/<your app conf. file>
Save the startapp script then from a new terminal and then you must commit your changes to your container's image so it will be available from here on out:
Get the running container's current ID:
$ docker ps
Commit/Save the changes
$ docker commit <your running containerID> <your image's name>
Example: docker commit 1bce234 centos/myappsname
Now for the grand finale you can docker stop or exit out of the running container's bash. Next start the play app using the following docker command:
$ docker run -d -p 80:80 <your image's name> /bin/sh startapp
Example: docker run -d -p 80:80 centos/myapp /bin/sh startapp
Run docker ps to see if your app is running. You see something similar to this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
19eae9bc8371 centos/myapp:latest "/bin/sh startapp" 13 seconds ago Up 11 seconds 0.0.0.0:80->80/tcp suspicious_heisenberg
Open a browser and visit your new dockerized app
Hope this helps...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js