Dockerfile: Does it matter how I clean my build?

Is there a difference (when it comes to image size) between
RUN make && make clean
and
RUN make
RUN make clean
Not sure how Docker works, but will the latter create an unnecessary layer?

RUN make clean as a separate step will never make your image smaller than skipping it entirely. If you're going to run it, run it in the same RUN command as the build.
More generally, each RUN command creates a new Docker image layer. Creating the layer in itself isn't expensive, but the resulting image is all of the old layers unmodified, plus the results of this step. In your case that translates to the entire build tree, plus a record that the object files should be deleted.
If you RUN make && make clean in a single step, then the layer isn't created until the entire shell command completes, so the intermediate state isn't persisted.
(Also consider Docker multi-stage builds as a way to build an application from source, but only include the built application in the final image, not any of the build tree or build tools.)
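A minimal multi-stage sketch, assuming a make-based project that produces a binary named app (the base images, paths, and binary name are illustrative, not taken from the question):
# build stage: compile with the full toolchain available
FROM gcc:12 AS build
WORKDIR /src
COPY . .
RUN make
# final stage: copy only the built binary; the build tree and toolchain stay behind
FROM debian:bookworm-slim
COPY --from=build /src/app /usr/local/bin/app
CMD ["app"]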

Related

How to use docker to test multiple compiler versions

What is the idiomatic way to write a docker file for building against many different versions of the same compiler?
I have a project which tests against a wide range of versions of different compilers like gcc and clang as part of a CI job. At some point, the agents for the CI tasks were updated/changed, resulting in newer jobs failing -- and so I've started looking into dockerizing these builds to try to guarantee better reliability and stability.
However, I'm having some difficulty understanding what a proper and idiomatic approach is to producing build images like this without a large amount of duplication across layers.
For example, let's say I want to build using the following toolset:
gcc 4.8, 4.9, 5.1, ... (various versions)
cmake (latest)
ninja-build
I could write something like:
# syntax=docker/dockerfile:1.3-labs
# Parameterizing here possible, but would cause bloat from duplicated
# layers defined after this
FROM gcc:4.8
ENV DEBIAN_FRONTEND noninteractive
# Set the work directory
WORKDIR /home/dev
COPY . /home/dev/
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Default command is to use CMake
CMD ["cmake"]
However, the installation of tools like ninja-build and cmake occurs after the base image, which changes per compiler version. Since these layers are built off of a different parent layer, this would (as far as I'm aware) result in layer duplication for each different compiler version that is used.
One alternative to avoid this duplication could hypothetically be using a smaller base image like alpine with separate installations of the compiler instead. The tools could be installed first so the layers remain shared, and only the compiler changes as the last layer -- however this presents its own difficulties, since it's often the case that certain compiler versions may require custom steps, such as installing certain keyrings.
What is the idiomatic way of accomplishing this? Would this typically be done through multiple docker files, or a single docker file with parameters? Any examples would be greatly appreciated.
I would separate the parts of preparing the compiler and running the build, so the source doesn't become part of the Docker image.
Prepare Compiler
For preparing the compiler I would take the ARG approach, but without copying the source into the container. If you want fast retries and have enough resources, you can spin up multiple instances at the same time.
# syntax=docker/dockerfile:1.3-labs
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
ENV DEBIAN_FRONTEND noninteractive
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Set the work directory
VOLUME /src
WORKDIR /src
CMD ["cmake"]
Build it
Here you have a few options. You could either prepare a volume with the sources or use bind mounts together with docker exec like this:
#bash style
for compiler in gcc:4.9 gcc:4.8 gcc:5.1
do
docker build -t mytag-${compiler} --build-arg COMPILER=${compiler} .
# place to clean the target folder
docker run -v $(pwd)/src:/src mytag-${compiler}
done
And because the source is not part of the docker image you don't have bloat. You can also have two mounts, one for a readonly source tree and one for the output files.
Note: If you remove the CMake command you can also spin up the docker containers in parallel and use docker exec to start the builds, as sketched below. The downside is that you have to take care of out-of-source builds to avoid clashes on the output folder.
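A rough sketch of that parallel variant, reusing the image built above (the container name, mount path, and per-compiler build directory are illustrative, and the image's default command is simply overridden here):
# keep one long-running container per compiler image
docker run -d --name build-gcc48 -v "$(pwd)/src:/src" mytag-gcc:4.8 sleep infinity
# start an out-of-source build inside it, in a per-compiler build directory
docker exec -d build-gcc48 sh -c 'mkdir -p /src/build-gcc48 && cd /src/build-gcc48 && cmake -G Ninja .. && ninja'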
Put an ARG before the FROM and then reference that ARG in the FROM line,
so:
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
# rest goes here
then you
docker build . -t test/clang-8 --build-arg COMPILER=clang-8
or similar.
If you want to automate it, make a file listing the compilers and a bash script that loops over its lines, passing each line as the image tag and the COMPILER build arg.
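A minimal sketch of such a loop, assuming a compilers.txt with one image reference per line (the file name and tag scheme are illustrative):
while read -r compiler; do
  # replace the colon so the result is a valid image tag, e.g. test/gcc-4.8
  docker build . -t "test/${compiler/:/-}" --build-arg COMPILER="${compiler}"
done < compilers.txt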
As for CMake, I'd just do:
RUN wget -qO- "https://cmake.org/files/v3.23/cmake-3.23.1-linux-"$(uname -m)".tar.gz" | tar --strip-components=1 -xz -C /usr/local
When copying, I find it cleaner to do
WORKDIR /app/build
COPY . .
As far as I know, there is no way to do that easily and safely. You could use a RUN --mount=type=cache, but the documentation clearly says that:
Contents of the cache directories persist between builder invocations without invalidating the instruction cache. Cache mounts should only be used for better performance. Your build should work with any contents of the cache directory as another build may overwrite the files or GC may clean it if more storage space is needed.
I have not tried it, but I guess the layers are duplicated anyway; you just save time, assuming the cache is not emptied.
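For reference, a hedged sketch of what such a cache mount could look like for the apt step (it only speeds up package downloads; the resulting layers are still per base image):
# syntax=docker/dockerfile:1.3-labs
FROM gcc:4.8
# the default docker-clean config would empty the apt cache, so disable it first
RUN rm -f /etc/apt/apt.conf.d/docker-clean
RUN --mount=type=cache,target=/var/cache/apt \
    apt update && apt install -y cmake ninja-build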
The other possible solution you have is similar to the one you mention in the question: starting with the tools installation and then customizing it with the gcc image. Instead of starting with an alpine image, you could start FROM scratch. scratch is basically the empty image, you could COPY the files generated by
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
Then you COPY the entire gcc filesystem. However, I am not sure it will work because the order of the initial layers is now reversed. This means that some files that were in the upper layer (coming from tools) now are in the lower layer and could be overwritten. In the comments, I asked you for a working Dockerfile because I wanted to try this out before answering. If you want, you can try this method and let us know. Anyway, the first step is extracting the files created from the tools layer.
How to extract changes from a layer?
Let's consider this Dockerfile and build it with docker build -t test .:
FROM debian:10
RUN apt update && apt install -y cmake && ( echo "test" > test.txt )
RUN echo "new test" > test.txt
Now that we have built the test image, we should find 3 layers in it: the debian base plus one for each RUN instruction. You mainly have 2 ways to extract the changes from each layer:
the first is to docker inspect the image and then find the IDs of the layers in the /var/lib/docker folder, assuming you are on Linux. Each layer has a diff subfolder containing the changes. Actually, I think it is more complex than this, which is why I would opt for...
skopeo: you can install it with apt install skopeo and it is a very useful tool for operating on docker images. The command you are interested in is copy, which extracts the layers of an image and exports them as .tar files:
skopeo copy docker-daemon:{image_name}:latest "dir:/home/test_img"
where image_name is test in this case.
Extracting layer content with Skopeo
In the specified folder, you should find some tar files and a configuration file (look at the skopeo copy command output and you will know which one that is). Then extract each {layer}.tar into a different folder and you are done.
Note: to find the layer containing your tools, just open the configuration file (maybe using jq, because it is JSON) and take the diff_id that corresponds to the RUN instruction you find in the history property. You should understand it once you open the JSON configuration. This is unnecessary if you have a small image that has, for example, debian as parent image and a single RUN instruction installing the tools you want.
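A rough sketch of that lookup, assuming the dir: layout written by the skopeo copy command above (the exact blob naming may differ between skopeo versions):
# the image config is the blob referenced by manifest.json
config=$(jq -r '.config.digest' /home/test_img/manifest.json | cut -d: -f2)
# history has one entry per Dockerfile instruction; rootfs.diff_ids lists the
# layer digests in the same order (entries marked empty_layer add no layer)
jq '{history: .history, diff_ids: .rootfs.diff_ids}' "/home/test_img/$config"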
Get GCC image content
Now that we have the tools layer content, we need to extract the gcc filesystem. We don't need skopeo for this one; docker export is enough:
create a container from gcc (with the tag you need):
docker create --name gcc4.8 gcc:4.8
export it as tar:
docker export -o gcc4.8.tar gcc4.8
finally extract the tar file.
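For example, extracting into a folder whose name matches the COPY in the Dockerfile below:
mkdir gcc_4.x && tar -xf gcc4.8.tar -C gcc_4.x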
Putting it all together
The final Dockerfile could be something like:
FROM scratch
COPY ./tools_layer/ /
COPY ./gcc_4.x/ /
In this way, the tools layer is always reused (unless you change the content of that folder, of course), but you can parameterize the gcc_4.x with the ARG instruction for example.
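A hedged sketch of that parameterization, where COMPILER_DIR is a hypothetical build arg naming the folder with the extracted compiler filesystem:
FROM scratch
# which extracted compiler tree to copy in; override per build
ARG COMPILER_DIR=gcc_4.8
COPY ./tools_layer/ /
COPY ./${COMPILER_DIR}/ /
Built with something like docker build --build-arg COMPILER_DIR=gcc_4.9 -t mytag-gcc4.9 ., so the tools_layer COPY stays cached across compilers.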
Read carefully: all of this is not tested but you might encounter 2 issues:
the gcc image overwrites some files you have changed in the tools layer. You can check whether this happens by computing the diff between the gcc layer folder and the tools layer folder. If it happens, your only option is to keep track of those files and add them in the Dockerfile after the COPY ./gcc ... with another COPY.
When a file is removed in an upper layer, docker marks it with a .wh. whiteout prefix (not sure if it is different with skopeo). If the tools layer deletes a file that exists in the gcc layer, then that file will not be deleted using the above Dockerfile (the COPY ./gcc ... instruction would overwrite the whiteout). In this case too, you would need to add an additional RUN rm ... instruction.
This is probably not the correct approach if you have a more complex image than the one you are showing us. In my opinion, you could give this a try and just see if it works out with a single Dockerfile. Obviously, if you have many compilers, each with its own tool set, the maintainability of this approach could become a real burden. Instead, if the Dockerfile is more or less linear for all the compilers, this might be good (after all, you do not do this every day).
Now the question is: is avoiding layer replication so important that you are willing to complicate the image-building process this much?

Why is `config.status --recheck` being used at all? – because it doesn't *save* anything

I've just run ./config.status --recheck and it didn't take into account the changes that I've made to the configure script – i.e., the Makefiles haven't been regenerated.
This puzzles me… What is the use of this script, then? It automatically detects changes on make so that it then re-runs ./configure with all the options recalled and reused from the disk, but that's all that it does – the result of this operation isn't saved to the disk… What is the use of detecting changes to the build scripts, then?
It automatically detects changes on make so that it then re-runs ./configure with all the options recalled and reused from the disk
Which seems to be a very good use case.
If you fixed something in the build system, and want to rebuild, chances are you want to keep all the options passed to configure when you last ran it.
the result of this operation isn't saved to the disk
This is not really true.
./config.status --recheck does run configure with the --no-create option, which says to "not create output files", but that's only half-true: It does update the config.status script itself.
Typically you do not run config.status manually, but it gets invoked automatically by make. And make will then typically also invoke the just updated config.status (without the --recheck flag), which in turn will update your Makefile.
And then it will build the project using the updated Makefile.
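Simplified, the automake-generated Makefile contains regeneration rules roughly along these lines (a sketch, not the exact generated text):
# the Makefile is regenerated from Makefile.in by config.status
Makefile: Makefile.in config.status
	./config.status Makefile
# config.status regenerates itself when configure has changed
config.status: configure
	./config.status --recheck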

How to add/include altered package into final image? [Yocto Project]

I have recently started using Yocto. I'm looking for an option to include an altered package in the final build image. Below I have described the scenario.
I'm working on RDK, which is a Yocto-based system for an STB (set-top box) emulator. I have already built the complete system once. Now I'm making changes in a particular module, and to see their effect in the final build/image I rebuilt that particular module (at this point I learned that bitbake doesn't work like the makefile utility, where you make changes, it takes care of the rest, and your package is compiled and included in the final image/binary). I used bitbake -c cleansstate <module_name>, then bitbake <module_name> to rebuild the package.
The next thing was to get it into the final image, but there I had to go through the same pain again: bitbake -c cleansstate <image_name>, then bitbake <image_name> to rebuild the image.
Basically, even when only one package has changed, I have to create the complete image again to include it, which is a very time-consuming process!
I'm wondering: is there any way to reduce this build time and include the altered package in the final image?
NOTE: I'm not looking for optimization options; I know about BB_NUMBER_THREADS and PARALLEL_MAKE in local.conf. The question is whether we can add a package to the final image without regenerating all the dependencies of the final image, as described in the scenario.
Assuming by "making changes" you mean modifying the underlying code, I would suggest using devtool modify - this will set up a local source tree for the recipe where you can make your changes, and each time you make a change and then run bitbake on the recipe or something that depends upon it (such as your image) it will rebuild it including your changes. Basic steps:
devtool modify <recipe>
Make your changes within the source tree that is set up
bitbake <recipe> or bitbake <image>
Test the result; loop back to step 2 if you need to make further changes
devtool finish <recipe> to write your changes back as patches against the recipe
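A sketch of that loop with a hypothetical recipe name and destination layer (substitute your own):
devtool modify busybox                  # sets up workspace/sources/busybox
# ...edit the sources there...
bitbake core-image-minimal              # rebuilds the recipe with your changes, then the image
devtool finish busybox meta-mylayer     # writes the changes back as patches into meta-mylayer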
It happened to me that after adding a recipe at meta/recipes-extended/myrecipe_0.0.1.bb,
I was able to build my new recipe with the command
bitbake myrecipe
but the binaries never got included in the rootfs image when running
bitbake core-image-minimal
To add the output of my recipe to the output images, I've added the following to my ${BUILDDIR}/conf/local.conf file:
IMAGE_INSTALL_append = " myrecipe"

How do I add a pre-unit-test step to my top-level Makefile.am?

What should I add to my top-level Makefile.am to cause make check to run a custom command before entering any subdirectory?
Here's some context:
When our make check process enters each subdirectory in our source code tree, it builds that subdirectory's unit test binary, copies the binary to a target hardware board using scp, and runs the binary remotely on the target using ssh. We have more developers than we have boards, so we're sharing, and the target platform only has one user ID set up, so if two or more of us run make check at the same time, we clobber each other's unit test binaries on the target.
What I'd like is for make check to use a unique subdirectory on the target for each developer, probably in the home directory of the target's only user. Before anything else is done during a make check run, I'll need to ensure that this subdirectory exists.
I'm hoping I can add a command along the lines of ssh <board-ip> mkdir -p <unique-dirname-based-on-user> to a variable or target in my top-level Makefile.am which will cause that command to be run at the start of a make check run, before any subdirectory is entered and any copying happens.
Simply make check-recursive depend on a target you want to execute.
e.g. add the following to your Makefile.am:
check-recursive: pre-check-recursive
.PHONY: pre-check-recursive
pre-check-recursive:
	@echo "called before running check recursively (YOUR CODE HERE)"
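For the scenario in the question, the recipe could run the remote directory setup instead of the echo; a sketch, where BOARD_IP and the per-developer path are hypothetical placeholders:
check-recursive: pre-check-recursive
.PHONY: pre-check-recursive
# create a per-developer directory on the target before any subdirectory checks run
pre-check-recursive:
	ssh $(BOARD_IP) mkdir -p /home/targetuser/$$USER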

Hudson project with multiple triggers

I am building a continuous integration system with Hudson, and have a project split into two parts. The first is the main build and unit test element, which I want to run in the traditional CI fashion, triggered off SVN. The second is the functional tests, which take a long time to run, and so I want them to run overnight.
Is there any way of setting up a Hudson project with multiple triggers, i.e. so the functional tests run each night if and only if the main project has changed and has built successfully?
I've seen this question: Hudson - different build targets for different triggers, but that simply runs each night regardless of the state of the main project.
I have exactly the same situation that you do: a build with some quick sanity tests tied to SVN, but a nightly regression test that takes longer.
Our solution was to use the DOS Build Trigger Plugin. On that build trigger, we attach a schedule that triggers once a night. The Trigger Script is a series of simple commands like this:
set CAUSE=
curl http://localhost:8080/job/THEBUILDJOB/lastSuccessfulBuild/artifact/fingerprint.txt -o current.txt
if not exist current.txt exit 0
fc /B current.txt last.txt
if ERRORLEVEL 1 set CAUSE=New build available
copy /y current.txt last.txt
This gets a particular file (fingerprint.txt) from the last successful build and compares it (via fc) to a copy we've stored in the workspace. If they match - no build occurs. If they're different, we trigger a build via the DOS Build Trigger by setting the CAUSE variable, then store the new file in the trigger's workspace.