When you have a complicated RUN apt-get install section that you reuse over multiple docker images, what is the best way to reuse it?
The options that I think we have are
copy-paste the RUN command n times across your Dockerfiles (this is what I do today)
make a docker image and use it as a build step + COPY --from=builder... (this is what I wan't, but I don't konw how to do it).
I am thinking of something like this:
Dockerfile with reusable apt install command, tagged as my-builder-img:
FROM debian:buster
RUN ... apt-get install ...
Dockerfile that reuses that complicated install:
FROM my-builder-img as builder
#nothing here
FROM debian:buster
COPY --from=builder /usr/bin:/usr/bin # (...???)
TL;DR how to reuse apt-get install from a previus image onto a new image.
You just use the image you put all the packages in directly.
Multi-stage builds shine when you are creating an artifact and copying that to a new image. If you are just installing packages those will exist in the image.
Dockerfile with packages you want:
FROM debian:buster
RUN ... apt-get install ...
Tag it as my-image.
Now, just use that image in other Dockerfiles and the packages installed will be available.
FROM my-image:latest
# other directives...
Related
Background
I have a CI pipeline for a C++ library I've been developing. So far, I can distribute this lib to Linux and Windows systems. Since I use GitLab to build, test and package my lib, I'd like to have my Windows builds running faster and I have no clue on how to do that.
Currently, I use the following script for my Windows builds:
.windows_template:
tags:
- windows
before_script:
- choco install cmake.install -y --installargs '"ADD_CMAKE_TO_PATH=System"'
- choco install python --pre -y
- choco install git -y
- $env:ChocolateyInstall = Convert-Path "$((Get-Command choco).Path)\..\.."; Import-Module "$env:ChocolateyInstall\helpers\chocolateyProfile.psm1"; refreshenv
- python -m pip install --upgrade pip
- pip install conan monotonic
The problem
Any build with the script above can take up to 10 minutes; worse: I have two stages, each one taking the same amount of time. This means that my whole CI pipeline will take 20 minutes to finish because of slowness in Windows builds.
Ideal solution
EVERYTHING in my before_script can be cached or stored as an image. I only need some hints on how to do it properly.
Additional information
I use the following tools for my builds:
CMake: to support my building process;
Python3: to test and build packages;
Conan (requires Python3): to support the creation of packages with several features, as well as distribute them;
Git: to download Googletest in CMake configuration step This is already provided in the cookbooks - I might just remove this extra installation step in my before_script;
Googletest (requires Python3): testing library;
Visual Studio DEV Tools: to compile the library This is already in the cookbooks.
Installing packages like this (whether it's OS packages though apt-get install... or pip, or anything else) is generally against best practices for CI/CD jobs because every job that runs will have to do the same thing, costing a lot of time as you run more pipelines, as you've seen already.
A few alternatives are to search for an existing image that has everything you need (possible but not likely with more dependencies), split up your job into pieces that might be solved by an image with just one or two dependencies, or create a custom docker image to use in your jobs. I answered a similar question with an example a few weeks ago here: "Unable to locate package git" when running GitLab CI/CD pipeline
But here's an example Dockerfile with Windows:
# Dockerfile
FROM mcr.microsoft.com/windows
RUN ./install_chocolatey.sh
RUN choco install cmake.install -y --installargs '"ADD_CMAKE_TO_PATH=System"'
RUN choco install python --pre -y
RUN choco install git -y
...
The FROM line says that our new image extends the mcr.microsoft.com/windows base image. You can extend any image you have access to, even if it already extends another image (in fact, that's how most images work: they start with something small, like a base OS installation, then add things needed for that package. PHP for example starts on an Ubuntu image, then installs the necessary PHP packages).
The first RUN line is just an example. I'm not a Windows user and don't have experience installing Chocolatey, but you'd do here whatever you'd normally do to install it locally. The rest are for installing whatever else you need.
Then run
docker build /path/to/dockerfile-dir -t mygroup/mytag:version
The path you supply needs to be the directory that contains the Dockerfile, not the Dockerfile itself. The -t flag sets the image's tag after it's built (though you can do that with a separate command, docker tag too).
Next, you'll have to log into whichever registry you're using (Docker Hub (https://docs.docker.com/docker-hub/repos/), Gitlab Container Registry (https://docs.gitlab.com/ee/user/packages/container_registry/), a private registry your employer may support, or any other option.
docker login my.docker.hub.com
Now you can push the image to the registry:
docker push my.docker.hub.com/mygroup/mytag:version
You'll have to review the information in the docs about telling your Gitlab runner or pipelines how to authenticate with the registry (unless it's Public on Docker Hub or you use the Gitlab Container Registry) https://docs.gitlab.com/ee/ci/docker/using_docker_images.html#define-an-image-from-a-private-container-registry
Once all that's done, you can use your new image in your CI jobs, and everything we put into the image will be ready to use:
.windows_template:
image: my.docker.hub.com/mygroup/mytag:version
tags:
- windows
...
I’m looking to understand how to properly structure my .gitlab-ci.yml and Dockerfile such that I can build a C++ application into a Docker container.
I’m struggling with where the actual compilation and link of the C++ application should take place within the CI workflow.
What I’ve done:
My current in approach is to use Docker in Docker with a private gitlab docker registry.
My gitlab-ci.yml uses a dind docker image service I created based on the the docker:19.03.1-dind image but includes my certificates to talk securely to my private gitlab docker registry.
I also have a custom base image referenced by my gitlab-ci.yml based on docker:19.03.1 that includes what I need for building, eg cmake, build-base mariadb-dev, etc.
Have my build script added to the gitlab-ci.yml to build the application, cmake … && cmake --build .
The dockerfile then copies the final binary produced in my build step.
Having done all of this it doesn’t feel quite right to me and I’m wondering if I’m missing the intent. I’ve tried to find a C++ example online to follow as example but have been unsuccessful.
What I’m not fully understanding is the role of each player in the docker-in-docker setup: docker image, dind image, and finally the container I’m producing…
What I’d like to know…
Who should perform the build and contain the build environment, the base image specified in my .gitlab-ci.yml or my Dockerfile?
If I build with the dockerfile, how to i get the contents of the source into the docker container? Do I copy the /builds dir? Should I mount it?
Where to divide who performs work, gitlab-ci.yml or Docker file?
Reference to a working example of a C++ docker application built with Docker-in-Docker Gitlab CI.
.gitlab-ci.yml
image: $CI_REGISTRY/building-blocks/dev-mysql-cpp:latest
#image: docker:19.03.1
services:
- name: $CI_REGISTRY/building-blocks/my-dind:latest
alias: docker
stages:
- build
- release
variables:
# Use TLS https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#tls-enabled
DOCKER_TLS_CERTDIR: "/certs"
CONTAINER_TEST_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
CONTAINER_RELEASE_IMAGE: $CI_REGISTRY_IMAGE:latest
before_script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
build:
stage: build
script:
- mkdir build
Both approaches are equally valid. If you look at other SO questions, one thing you'll probably notice is that Java/Docker images almost universally build a jar file on their host and then COPY it into an image, but Go/Docker images tend to use a multi-stage Dockerfile starting from sources.
If you already have a fairly mature build system and your developers already have a very consistent setup, it makes sense to do more work in the CI environment (in your .gitlab.yml file). Build your application the same way you already do, then COPY it into a minimal Docker image. This approach is also helpful if you need to ship both Docker and non-Docker artifacts. If you have a make dist style tar file and want to get a Docker image out of it, you could use a very straightforward Dockerfile like
FROM ubuntu
RUN apt-get update && apt-get install ...
ADD dist/myapp.tar.gz /usr/local # unpacking it
EXPOSE 12345
CMD ["myapp"] # /usr/local/bin/myapp
On the other hand, if your developers have a variety of desktop environments and you're really trying to standardize things, and you only need to ship the Docker image, it could make sense to centralize most things in the Dockerfile. This would have the advantage that every developer could run the exact build sequence themselves locally, rather than depending on the CI system to try simple changes. Something built around GNU Autoconf might look more like
FROM ubuntu AS build
RUN apt-get update \
&& apt-get install --no-install-recommends --assume-yes \
build-essential \
lib...-dev
WORKDIR /app
COPY . .
RUN ./configure --prefix=/usr/local \
&& make \
&& make install
FROM ubuntu
RUN apt-get update \
&& apt-get install --no-install-recommends --assume-yes \
lib...
COPY --from=build /usr/local /usr/local
CMD ["myapp"]
If you do the primary build in a Dockerfile, you need to COPY the source code in. Volume mounts don't work at this point in the sequence. CI systems should avoid bind-mounting source code into a container in any case: you want to run tests against the actual artifact you've built, and not a hybrid of a built Docker image but with all of its source code replaced.
In the dockerfiles I have seen, and the in the best practices for writing a docker file: https://docs.docker.com/engine/reference/builder/#copy, when apt-get is used to install some packages, apt-get update is always run first. I have a concern on this because the app we build in the corresponding docker container would depend on these installed packages, if there is some inconsistency in the newest version of the installed packages, the software we build will not work right any more. Why we do not specify a version of the packages, but use apt-get update instead?
From the man page for apt-get:
update is used to resynchronize the package index files from their
sources. The indexes
of available packages are fetched from the location(s) specified in
/etc/apt/sources.list. For example, when using a Debian archive, this command retrieves
and scans the Packages.gz files, so that information about new and updated packages is
available. An update should always be performed before an upgrade or dist-upgrade.
Please be aware that the overall progress meter will be incorrect as the size of the
package files cannot be known in advance.
You can try running apt-get install without running update on a docker image but you'll probably find that a lot of things will fail to install because the package indexes are out of date.
Once you update the package data, then you can specify a specific version for packages when you run install e.g.
apt update && apt install -y \
git=1:2.7.4-0ubuntu1.4
Example with docker container:
> sudo docker run -it ubuntu:16.04 /bin/bash
# root#513eb786d86d:/# apt install git
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package git
root#513eb786d86d:/# apt install git=1:2.7.4-0ubuntu1.4
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package git
root#513eb786d86d:/# apt update
...
root#513eb786d86d:/# apt install git=1:2.7.4-0ubuntu1.4
# works this time!
I have an application that is running some processes and exposes them through WebAPI. Part of these processes need to execute Python scripts through the IronPython library. For this to happen though, Python 2.7 must also be installed on the system.
Has anyone solved this problem by figuring out how to install Python in the ASPNET Core Docker image (or by any other means). The only other hack I can think of it putting the Python executable into a dependency directory for the API.
Our current Docker File contents:
FROM microsoft/aspnetcore:2.0
ARG source
WORKDIR /app
EXPOSE 80
COPY ${source:-obj/Docker/publish} .
ENTRYPOINT ["dotnet", "Api.dll"]
You can use the RUN command to install it on the image. Simply add the following to your Dockerfile.
The image I pulled from Dockerhub seemed to be running Debian Linux as the base OS so the following should work. If it's another linux distro as the base in your instance, try yum instead or for windows OS chocolatey.
FROM microsoft/aspnetcore:2.0
RUN apt-get update -y && apt-get install python2.7 -y
ARG source
WORKDIR /app
EXPOSE 80
COPY ${source:-obj/Docker/publish} .
ENTRYPOINT ["dotnet", "AIA.Vietnam.dll"]
Now the python executable should be available in /usr/bin/python2.7
I'm using azk and my system depends on extra packages. I'd be able to install them using (since I'm using an Ubuntu-based image):
apt-get -yq update && apt-get install -y libqtwebkit-dev qt4-qmake
Can I add this steps to provision? In the Azkfile.js, it would look like:
// ...
provision: [
"apt-get -yq update",
"apt-get install -y libqtwebkit-dev qt4-qmake",
"bundle install --path /azk/bundler",
"bundle exec rake db:create",
"bundle exec rake db:migrate",
]
Or it's better to create a new Docker image?
Provision steps are run in a separated container, so all the data generated inside of it is lost after the provision step, unless you persist them. That's why you probably have bundle folders as persistent folders.
Since that, you should use a Dockerfile in this case. It'll look like this:
FROM azukiapp/ruby:2.2.2 # or the image you were using previously
RUN apt-get -yq update && \
apt-get install -y libqtwebkit-dev qt4-qmake && \
apt-get clean -qq && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* # Keeping the image as small as possible
After that, you should edit your Azkfile.js and replace the image property of your main system to use the created Dockerfile (you can check azk docs here):
image: { dockerfile: './PATH_TO_DOCKERFILE' },
Finally, when you run azk start, azk will build this Dockerfile and use it with all your dependencies installed.
Tip: If you want to force azk to rebuild your Dockerfile, just pass -B flag to azk start.
As it looks like you're using a Debian-based Linux distribution, you could create (https://wiki.debian.org/Packaging) your own Debian virtual package (https://www.debian.org/doc/manuals/debian-faq/ch-pkg_basics.en.html#s-virtual) that lists all the packages it depends on. If you just do that one thing, you can dpkg -i (or apt-get install if you host a custom debian repository yourself) your custom package and it will install all the dependencies you need via apt.
You can then move on to learning about postinst and prerm scripts in Debian packages (https://www.debian.org/doc/manuals/debian-faq/ch-pkg_basics.en.html#s-maintscripts). This will allow you to run commands like bundle and gem as the last step of the package installation and the first step of package removal.
There are a few advantages to doing it this way:
1. If you host a package repository somewhere you can use a pull method of dependency installation in a dynamic scaling environment by simply having the host apt-get update && apt-get install custom-dependencies-diego
2. Versioning your dependency list - Using dpkg -l you can tell what version everything is on a given host, including the version of your dependency virtual package.
3. With prerm scripts, you can ensure that removing your virtual package will also have the effect of removing the changes your installation scripts made so you can get a host back to a "clean" state".
The disadvantage of doing it this way is that it's debian/apt specific. If you wanted to deploy to Slack or RHEL you'd have to change things a bit. Changing to a new distro wouldn't be particularly hard, but it's definitely not as portable as using Bash, for example.