cloud run build and building docker images while building a docker image - google-cloud-platform

Currently, I found out a google cloud build happens during building a docker image time(not as I thought in that it would build my image and then execute my image to do all the building). That was in this post
quick start in google cloud build
soooo, I have a Dockerfile that is real simple now like so
FROM gcr.io/google.com/cloudsdktool/cloud-sdk:alpine
RUN mkdir -p ./monobuild
COPY . ./monobuild/
WORKDIR "/monobuild"
RUN ["/bin/bash", "./downloadAndExtract.sh"]
and I have a single downloadAndExtract that downloads any artifacts(zip files) from the last monobuild run that were built(only modified servers are built OR servers that dependend on changes in the last CI build are built like downstream libraries may be changed). This first line just lists urls of the zip files I need in a file...
curl "https://circleci.com/api/v1.1/project/butbucket/Twilio/orderly/latest/artifacts?circle-token=$token" | grep -o 'https://[^"]*zip' > artifacts.txt
while read url; do
echo "Downloading url=$url"
zipFile=${url/*\//}
projectName=${zipFile/.zip/}
echo "Zip filename=$zipFile"
echo "projectName=$projectName"
wget "$url?circle-token=$token"
mv "$zipFile?circle-token=$token" $zipFile
unzip $zipFile
rm $zipFile
cd $projectName
./deployGcloud.sh
cd ..
done <artifacts.txt
echo "DONE"
Of course, the deployGcloud script has these commands in it sooooo this means we are building docker images WHILE building the google cloud build docker image(which still seems funny to me)....
docker build . --tag gcr.io/twix/authservice
docker push gcr.io/twix/authservice
gcloud run deploy staging-admin --region us-west1 --image gcr.io/twix/authservice --platform managed
BOTH docker commands seem to be erroring out on this..
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
while the gcloud command seems to be very happy doing a deploy but just using a previous image we deployed at that location.
So, how to get around this error so my build will work and build N images and deploy them all to cloud run?

oh, I finally figured it out. Google has this weird thing in it's config.yaml files of use this docker image to run a curl command and then on next step use this OTHER dockerr image to run some other command and so on using 5 different images. This is all very confusing so instead, I realized I had to figure out how to create my ONE docker image and just run it as a command so I modify the above to have an ENTRYPOINT instead and then docker build and docker push my image into google. Then, I have a cloudbuild.yaml with a single step image command to run.
In this way, we can tweak our builds easily within our docker image that is just run. This is now way simpler than the complex model that google had setup as it becomes basic do your build in the container however you like and install whatever tools you need in the one docker image.
ie. beware the google quick starts which honestly IMHO are really overcomplicating it compared to circleCI and other systems. (of course, that is just an opinion and each to their own).

Related

Tagging multi-platform images in ECR creates untagged manifests

I've started using docker buildx to tag and push mutli-platform images to ECR. However, ECR appears to apply the tag to the parent manifest, and leaves each related manifest as untagged. ECR does appear to prevent deletion of the child manifests, but it makes managing cleanup of orphaned untagged images complicated.
Is there a way to tag these child manifests in some way?
For example, consider this push:
docker buildx build --platform "linux/amd64,linux/arm64" --tag 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0 --push .
Inspecting the image:
docker buildx imagetools inspect 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0
Shows:
Name: 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest: sha256:4221ad469d6a18abda617a0041fd7c87234ebb1a9f4ee952232a1287de73e12e
Manifests:
Name: 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0#sha256:c1b0c04c84b025357052eb513427c8b22606445cbd2840d904613b56fa8283f3
MediaType: application/vnd.docker.distribution.manifest.v2+json
Platform: linux/amd64
Name: 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0#sha256:828414cad2266836d9025e9a6af58d6bf3e6212e2095993070977909ee8aee4b
MediaType: application/vnd.docker.distribution.manifest.v2+json
Platform: linux/arm64
However, ECR shows the 2 child images as untagged
I'm running into the same problem. So far my solution seems a little easier than some of the other suggestions, but I still don't like it.
After doing the initial:
docker buildx build --platform "linux/amd64,linux/arm64" --tag 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0 --push .
I follow up with:
docker buildx build --platform "linux/amd64" --tag 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0-amd --push .
docker buildx build --platform "linux/arm64" --tag 1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image:1.0-arm --push .
This gets me the parallel build speed of building multiple platforms at the same time, and gets me the images tagged in ECR. Thanks to having the build info cached it is pretty quick, it appears to just push the tags and that is it. In a test I just did the buildx time for the first command was 0.5 seconds. and the second one took 0.7 seconds.
That said, I'm not wild about this solution, and found this question while looking for a better one.
There are several ways to tag the image, but they all involve pushing the platform specific manifest with the desired tag. With docker, you can pull the image, retag it, and push it, but the downside is you'll have to pull every layer.
A much faster option is to only transfer the manifest json with registry API calls. You could do this with curl, but auth becomes complicated. There are several tools for working directly with registries, including Googles crane, RedHat's skopeo, and my own regclient. Regclient includes the regctl command which would implement this like:
image=1234567890.dkr.ecr.eu-west-1.amazonaws.com/my-service/my-image
tag=1.0
regctl image copy \
${image}#$(regctl image digest --platform linux/amd64 $image:$tag) \
${image}:${tag}-linux-amd64
regctl image copy \
${image}#$(regctl image digest --platform linux/arm64 $image:$tag) \
${image}:${tag}-linux-arm64
You could also script an automated fix to this, listing all tags in the registry, pulling the manifest list for the tags that don't already have the platform, and running the image copy to retag each platform's manifest. But it's probably easier and faster to script your buildx job to include something like regctl after buildx pushes the image.
Note if you use a cred helper for logging into ECR, regctl can use this with the local command. If want to run regctl as a container, and you are specifically using ecr-login, use the alpine version of the images since they include the helper binary.
In addition to what Brandon mentioned above on using regctl, here's the command for skopeo if you're looking to use it with ECR credential helper. https://github.com/awslabs/amazon-ecr-credential-helper
skopeo copy \
docker://1234567890.dkr.ecr.us-west-2.amazonaws.com/stackoverflow#sha256:1badbc699ed4a1785295baa110a125b0cdee8d854312fe462d996452b41e7755 \
docker://1234567890.dkr.ecr.us-west-2.amazonaws.com/stackoverflow:1.0-linux-arm64
https://github.com/containers/skopeo
Paavan Mistry, AWS Containers DA

Why container-optimized compute instance uses cached image instead of the latest one?

I'm running a container-optimized compute instance with this startup-script:
#!/bin/bash
mkdir /home/my-app
cd /home/my-app
export HOME=/home/my-app
docker-credential-gcr configure-docker
docker run --rm --security-opt seccomp=./config.json gcr.io/my-project/my-app:latest
This scripts works well when creating a new instance. But when I restart an existing instance it doesnt't pull the latest image.
I've tried to delete all images from the gcr, the instance was able to start anyways, which proves that it doesn't even try to pull the latest image from gcr.
Also, for some reason startup-script logs are not showing up in Cloud Logger.
As per kubernetes, With Docker, if the image already exists, the pull attempt is fast because all image layers are cached and no image download is needed.
As a workaround you can add step 1 and 2 in to your script:
1- docker images // will show you the list of images including (gcr.io/my-project/my-app:latest)
2- docker rmi --force gcr.io/my-project/my-app:latest // will delete local image
3- docker run (rest of your command, it will download the latest image again from the gcr.io)

Why does google cloud build run differently for these two commands?

We run these two commands (the first one is async and the other runs synchronously)
#async BUT does something funky and doesn't run the Dockerfile image as-is
gcloud alpha builds triggers run staging-deploy --branch master
# sync BUT runs the image the way it's supposed to run!!!
gcloud builds submit --config cloudbuild.yaml
both are using our cloudbuild.yaml
steps:
- name: gcr.io/$PROJECT_ID/continuous-deploy
args: ['${_SERVICE}', '${_DOWNLOAD_URL}']
timeout: 1000s
substitutions:
_SERVICE: none
_DOWNLOAD_URL: none
timeout: 1100s
Our Dockerfile is very very simple
FROM gcr.io/google.com/cloudsdktool/cloud-sdk:alpine
RUN mkdir -p ./monobuild
COPY . ./monobuild/
WORKDIR "/monobuild"
#NOTE: This file in google cloud build trigger MUST be in root of monorepo BUT I don't know why
#NOTE: This command receives any arguments to docker
#ie. for "docker run {image} {args}", it receives the args
ENTRYPOINT ["./downloadAndExtract.sh"]
Sooo, when I run the SECOND command, it completely uses the docker image obeying the Dockerfile. When I run the first command, it's ignoring all my Dockerfile stuff and trying to run scripts in my git repo(which is very frustrating and not what I want).
We HAD this directory structure
- gitroot
- stagingDeploy
- Dockerfile
- deployStaging.sh # part of Dockerfile
- cloudbuild.yaml
- prodDeploy
- Dockerfile
- prodDeploy.sh #part of Docker file
- cloudbuild.yaml
Of course, only the second command works with this directory structure. The first command CANNOT find deployStaging.sh UNTIL we ln -s stagingDeploy/deployStaging.sh from our gitrepo root and we have around 5 deploy directories and now our git repo root is fully polluted.
It is to say the least very frustrating and we are not sure how to clean this up so prodDeploy contains all the prod deploy scripts and staging, the staging ones and get rid of all root files.
Of course, we now have a corrupted git repo directory structure with a whole slew of files in the root directory from various different builds(sometimes conflicting on accident as files get the same names sometimes).
EDIT: Not really much to share on configuration of twitter as each one just points to the yaml file is all like so
thanks,
Dean

quick start in google cloud build

I ran the quick start
https://cloud.google.com/cloud-build/docs/quickstart-build
and in the section "View the build details", I don't see the output of the quickstart.sh file anywhere. Where is the logs from actually running the quickstart.sh file?
Without any output from quickstart.sh, I am unsure how to log what is going on in docker so I can fix broken builds that build in docker.
In this official tutorial, a docker container is built via Cloud Build, with only one executable bash script which is displaying the current date.
#!/bin/sh
echo "Hello, world! The time is $(date)."
Here is the Dockerfile :
FROM alpine
COPY quickstart.sh /
CMD ["/quickstart.sh"]
It means quickstart.sh is never executed during build phase but only at the execution step of container.
To see the output of script, you should run container (either locally on your computer, or via Cloud Shell) :
$ docker run gcr.io/[PROJECT-ID]/quickstart-image:latest
Hello, world! The time is Sat Jun 13 05:10:41 UTC 2020.
If you want to execute a script during build phase of container, you should use RUN command.
For example, let's create a second executable script called build.sh in the same directory:
#!/bin/sh
echo "Hello, build at $(date)."
Then, add it on Dockerfile file description :
FROM alpine
COPY quickstart.sh /
COPY build.sh /
RUN /build.sh
CMD ["/quickstart.sh"]
Now, we can build a new version of container image :
gcloud builds submit --tag gcr.io/[PROJECT-ID]/quickstart-image
This time, output of build.sh could be seen in the details output log in Cloud Build console:
Of course, here it's just a simple example to give you a quick answer. You may check all other possible options to write a correct and clean Dockerfile. But it's not really linked with Cloud Build.

AWS: CodeDeploy for a Docker Compose project?

My current objective is to have Travis deploy our Django+Docker-Compose project upon successful merge of a pull request to our Git master branch. I have done some work setting up our AWS CodeDeploy since Travis has builtin support for it. When I got to the AppSpec and actual deployment part, at first I tried to have an AfterInstall script do docker-compose build and then have an ApplicationStart script do docker-compose up. The containers that have images pulled from the web are our PostgreSQL container (named db, image aidanlister/postgres-hstore which is the usual postgres image plus the hstore extension), the Redis container (uses the redis image), and the Selenium container (image selenium/standalone-firefox). The other two containers, web and worker, which are the Django server and Celery worker respectively, use the same Dockerfile to build an image. The main command is:
CMD paver docker_run
which uses a pavement.py file:
from paver.easy import task
from paver.easy import sh
#task
def docker_run():
migrate()
collectStatic()
updateRequirements()
startServer()
#task
def migrate():
sh('./manage.py makemigrations --noinput')
sh('./manage.py migrate --noinput')
#task
def collectStatic():
sh('./manage.py collectstatic --noinput')
# find any updates to existing packages, install any new packages
#task
def updateRequirements():
sh('pip install --upgrade -r requirements.txt')
#task
def startServer():
sh('./manage.py runserver 0.0.0.0:8000')
Here is what I (think I) need to make happen each time a pull request is merged:
Have Travis deploy changes using CodeDeploy, based on deploy section in .travis.yml tailored to our CodeDeploy setup
Start our Docker containers on AWS after successful deployment using our docker-compose.yml
How do I get this second step to happen? I'm pretty sure ECS is actually not what is needed here. My current status right now is that I can get Docker started with sudo service docker start but I cannot get docker-compose up to be successful. Though deployments are reported as "successful", this is only because the docker-compose up command is run in the background in the Validate Service section script. In fact, when I try to do docker-compose up manually when ssh'd into the EC2 instance, I get stuck building one of the containers, right before the CMD paver docker_run part of the Dockerfile.
This took a long time to work out, but I finally figured out a way to deploy a Django+Docker-Compose project with CodeDeploy without Docker-Machine or ECS.
One thing that was important was to make an alternate docker-compose.yml that excluded the selenium container--all it did was cause problems and was only useful for local testing. In addition, it was important to choose an instance type that could handle building containers. The reason why containers couldn't be built from our Dockerfile was that the instance simply did not have the memory to complete the build. Instead of a t1.micro instance, an m3.medium is what worked. It is also important to have sufficient disk space--8GB is far too small. To be safe, 256GB would be ideal.
It is important to have an After Install script run service docker start when doing the necessary Docker installation and setup (including installing Docker-Compose). This is to explicitly start running the Docker daemon--without this command, you will get the error Could not connect to Docker daemon. When installing Docker-Compose, it is important to place it in /opt/bin/ so that the binary is used via /opt/bin/docker-compose. There are problems with placing it in /usr/local/bin (I don't exactly remember what problems, but it's related to the particular Linux distribution for the Amazon Linux AMI). The After Install script needs to be run as root (runas: root in the appspec.yml AfterInstall section).
Additionally, the final phase of deployment, which is starting up the containers with docker-compose up (more specifically /opt/bin/docker-compose -f docker-compose-aws.yml up), needs to be run in the background with stdin and stdout redirected to /dev/null:
/opt/bin/docker-compose -f docker-compose-aws.yml up -d > /dev/null 2> /dev/null < /dev/null &
Otherwise, once the server is started, the deployment will hang because the final script command (in the ApplicationStart section of my appspec.yml in my case) doesn't exit. This will probably result in a deployment failure after the default deployment timeout of 1 hour.
If all goes well, then the site can finally be accessed at the instance's public DNS and port in your browser.