Why Multistage Image build? Why not Basic images? - dockerfile

Below is a sample Docker multi-stage build file I am working on for some documentation.
'''
# Build stage: copy the project file and restore first so the restore layer can be cached, then publish
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS dev
WORKDIR /crud/
COPY *.csproj ./
RUN dotnet restore
COPY . .
RUN dotnet publish -c Release -o output

# Final stage: only the published output is copied in
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS runtime
WORKDIR /app
COPY --from=dev /crud/output .
ENTRYPOINT ["dotnet", "Asp.netCoreMvcCrud.dll"]
'''
So stage one copies the sources and builds them, and the final stage copies the published binaries from stage one to create a slimmed-down image.
We should be able to build the project manually in a dev environment, copy the published binaries into a basic image, and achieve the same slimmed-down size, right?
Like this:
'''
FROM mcr.microsoft.com/dotnet/sdk:7.0
WORKDIR /crud
COPY publishedfolder ./
ENTRYPOINT ["dotnet", "Asp.netCoreMvcCrud.dll"]
'''
What other advantages does a multi-stage build give beyond a basic image build and the builder pattern?

What I see as the main benefit of building within an image is consistency. You could build the application 'outside' the image, but building inside an image gives you a consistent environment every time. For example, if each dev on a team has a different machine, they may have different versions of the required tooling (JDKs, Node, Maven, .NET, etc.), so the artifact they build could be subtly different each time, meaning the artifact copied into the final image differs each time. The same applies to building in a CI environment: you may not have the same control over the tooling on the build machine that you have over the build-stage image. When selecting a build image you can specify the exact image you want, or build your own with exactly the tooling required in it.
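As a hedged illustration of that last point (the digest below is only a placeholder and the paths are assumptions), the build stage can be pinned to one exact image so every developer and CI machine compiles with identical tooling:
'''
# Hypothetical sketch: pin the build stage to an exact image.
# Replace <digest> with the digest of a known-good SDK image;
# a digest pins the toolchain even more precisely than a tag.
FROM mcr.microsoft.com/dotnet/sdk:7.0@sha256:<digest> AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /out
'''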

Related

Can CMake be used to build a Docker container and then build C++ code within that container?

I have a Dockerfile I use to pull down images for my C++ code. I currently use a Makefile-based system but could use CMake in theory. The layout looks like this:
'''
root
├── dockerFile
├── Makefile
├── dockerMakefile
├── CMakeLists.txt
├── target1
│   ├── CMakeLists.txt
│   ├── Makefile
│   ├── src
│   └── inc
└── target2
    ├── CMakeLists.txt
    ├── Makefile
    ├── src
    └── inc
'''
The dockerMakefile is just a shell script that builds and tags a Docker image, then runs a container from that image and runs make inside it. I suppose I can modify my script to do that with CMake, but is there a more elegant way to do this directly with CMake?
You can add a custom target that runs the docker commands. The only problem is that CMake is by nature a generator, i.e. it will generate a makefile first and then you'll have to run make anyway, so to me it seems a bit pointless.
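For completeness, a rough sketch of what such custom targets could look like (the image name cpp-builder and the /work mount point are illustrative, not from the question):
'''
# Rough sketch: custom targets that wrap the docker build/run/make steps described in the question.
add_custom_target(docker_image
  COMMAND docker build -t cpp-builder -f ${CMAKE_SOURCE_DIR}/dockerFile ${CMAKE_SOURCE_DIR}
  COMMENT "Build and tag the builder image"
)

add_custom_target(docker_make
  COMMAND docker run --rm -v ${CMAKE_SOURCE_DIR}:/work -w /work cpp-builder make
  DEPENDS docker_image
  COMMENT "Run make inside the container"
)
'''
Running `cmake --build . --target docker_make` would then do the whole image-build-then-make dance, but as noted above, CMake still has to generate its own build files first.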

Why does the AWS CodeBuild buildspec.yml support multiple build phases?

Why does the buildspec file support multiple build phases (install, pre_build, build, post_build)? Am I doing something wrong if I put all of my build steps into a single phase? Is there anything genuinely useful about these phases, other than keeping some kind of structure?
Failures in some phases will cause the build to exit, whilst others will not. This article spells it out pretty well: https://docs.aws.amazon.com/codebuild/latest/userguide/view-build-details.html#view-build-details-phases
Yes, you can do everything in one phase (i.e. the build phase), but it is not recommended: if anything fails it will be hard to debug, especially when there is a lot going on in your build stage.
install - install or upgrade tooling such as the Java version, Node version, Gatsby, etc.
pre_build - fetch node modules, Composer packages, etc.
build - your actual build commands.
post_build - things to do after the build, like uploading to S3.
If your build is just one or two commands, just go with the build phase alone (a minimal buildspec is sketched after this list).
Note: uploading to S3 can also be done using the artifacts section.
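A minimal buildspec split along those lines (the runtime version, npm commands, and bucket name are assumptions, not from the question):
'''
version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 16            # pin/upgrade tooling here
  pre_build:
    commands:
      - npm ci              # fetch dependencies
  build:
    commands:
      - npm run build       # the actual build
  post_build:
    commands:
      - aws s3 sync public/ s3://my-bucket   # or hand this to the artifacts section instead
artifacts:
  files:
    - '**/*'
  base-directory: public
'''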

Ideas for speeding up installing dependencies

I have a project that depends on many external libraries: GLFW3, GLEW, GLM, FreeType2, zlib, etc. It would be best to store/share the installed dependencies between jobs so they don't have to be downloaded and installed every time, which takes about half of the build time. I can see a couple of ideas for handling this:
a) for each job, for each build, download the dependencies and install them
b) put the dependency sources inside my repo for a small speedup, because I will no longer have to download them from outside servers (I still have to compile and install them)
c) compile them by hand, put them on some server, and just download the right package for each build
a) leaves the least work for me to keep the dependencies up to date and lets me build my project against the newest versions, but it takes the most time (both compiling and downloading)
b) bloats the repository with extra code (not mine), gives little speedup (downloading is usually not that slow), and adds manual work to update the dependencies; I guess it's worse than a)
c) is fastest but requires the most work from me to constantly keep the built dependencies up to date and upload them to a fast server (and a different package per build task, compiler, etc.), but it allows for the fastest builds (just download and copy/install)
So, how are you managing your external dependencies and keeping them up to date for your Travis builds?
Note that I use the free version of Travis and kind of need sudo for updating cmake, gcc, etc. and installing dependencies... I could somehow trick CMake into using local versions of the dependencies instead of /usr/..., but that bloats the CMake files, which I believe should stay very simple and clear.
Let's call the entire set of dependencies your build requires at some point in time the dependency "lineup".
I'd separate the use of (a certain version of) the lineup in your builds from the task of updating the lineup version (when a new version of a dependency is needed) - mixing them unnecessarily complicates the picture IMHO (I'm assuming that many of your builds would use the same dependency lineup version).
Let's look at the use of the lineup first.
From your description the installation of the (already built) dependencies is common to all 3 options and executed at every build, so let's put that aside for now.
The difference between your approaches remains only in how you obtain the built dependencies:
a - download them from outside servers - at every build
b - build them from repo sources - at every build
c - download them from a local fast server - at every build
It seems a and c are fundamentally the same except c would be faster due to local access. Most likely b would be the slowest.
So it looks like c is the winner from a build-speed perspective in this context.
Now the question would be if/how you can get your built dependency lineup onto the local server faster for the c approach. I see 2 options (unsure if they're both possible in your context):
1. download the dependencies already built (as you would in your a approach) from outside servers, effectively just caching them on the local server
2. build the dependencies locally from source, except you neither have to place those sources in the same repo as your project nor build them for every project build (as in your b approach) - you only need to do this when you update your lineup (and only for the new versions of the respective dependencies)
You could also look into mixing of the 2 options: some dependencies using option 1, others using option 2, as you see fit.
Now, if you're using VM (or Docker or similar) images for your build machines and you have control over those images, it might be possible to significantly speed up your builds by customizing them: install the dependency lineup on the image itself, making it immediately available to any build on a machine running that customized image.
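For example, if the build machines run Docker, a customized builder image with the lineup preinstalled could look roughly like this (the base image and package names are assumptions based on the libraries mentioned in the question):
'''
# Hedged sketch of a custom build image with the dependency lineup baked in.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake \
        libglfw3-dev libglew-dev libglm-dev libfreetype6-dev zlib1g-dev \
    && rm -rf /var/lib/apt/lists/*
'''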
Finally, when the time comes to update your dependency lineup (which, BTW, would be needed in your a and b approaches too, not only in c), you'd just have to download the new dependency versions, build them if needed, store the built dependencies on the local fast server and, if the customized VM image solution works for you, update/re-create the customized image with the new dependency lineup installed.
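On Travis specifically, a variant of the "fast local copy" idea is the built-in cache: build the lineup once into a prefix, cache that directory, and point CMake at it via CMAKE_PREFIX_PATH (the paths and the build-deps.sh helper below are hypothetical):
'''
language: cpp
cache:
  directories:
    - $HOME/deps   # the built dependency lineup, reused by later builds
before_install:
  # rebuild the lineup only when the cache is empty (hypothetical helper script)
  - test -d "$HOME/deps/lib" || ./scripts/build-deps.sh --prefix="$HOME/deps"
script:
  - cmake -DCMAKE_PREFIX_PATH="$HOME/deps" -B build -S .
  - cmake --build build
'''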

How should I provide library binaries to developers?

I want to make it easy for others to work on my repository. However, since some of the compiled dependencies are over 100 MB in size, I cannot include them in the repository; GitHub rejects files that large.
What is the best way to handle large dependency binaries? Building the libraries from source is not easy under Windows and takes hours, and I don't want every developer to have to struggle with that process.
I've recently been working on using Ivy (http://ant.apache.org/ivy/) with C++ binaries. The basic idea is that you build the binaries for every build combination. You will then zip each build combination into a file with a name like mypackage-windows-vs12-x86-debug.zip. In your ivy.xml, you will associate each zip file with exactly one configuration (ex: windows-vs12-x86-debug). Then you publish this package of multiple zip files to an Ivy repo. You can either host the repo yourself or you can try to upload to an existing Ivy repo. You would create a package of zip files for each dependency, and the ivy.xml files will describe the dependency chain among all the packages.
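For illustration, the publisher side's ivy.xml might look roughly like this (the organisation, module, and configuration names are made up to match the example above):
'''
<ivy-module version="2.0">
  <info organisation="com.example" module="mypackage" revision="1.0.0"/>
  <configurations>
    <conf name="windows-vs12-x86-debug"/>
    <conf name="windows-vs12-x86-release"/>
  </configurations>
  <publications>
    <!-- one zip artifact per build combination, tied to exactly one configuration -->
    <artifact name="mypackage-windows-vs12-x86-debug" type="zip" ext="zip" conf="windows-vs12-x86-debug"/>
    <artifact name="mypackage-windows-vs12-x86-release" type="zip" ext="zip" conf="windows-vs12-x86-release"/>
  </publications>
</ivy-module>
'''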
Then, your developers must set up Ivy. In their ivy.xml files, they will list your package as a dependency, along with the configuration they need (ex: windows-vs12-x86-debug). They will also need to add an ivy resolve/retrieve step to their build. Ivy will download the zip files for your package and everything that your package depends on. Then they will need to set up unzip & move tasks in their builds to extract the binaries you are providing, and put them in places their build is expecting.
Ivy's a cool tool but it is definitely streamlined for Java and not for C++. When it's all set up, it's pretty great. However, in my experience as a person who is not really familiar with DevOps at all, integrating it into a C++ build has been challenging. I found that it was easiest to create simple ant tasks that do the required ivy actions, then use my "regular" build system (make) to call those ant tasks when needed.
So I should also mention that the reason I looked into using Ivy was that I was implementing this in a corporate environment where I couldn't change system files. If you and your developers can do that, you may be better off with a RPM/APT system. You'd set up a repo and get your developers to add your repo to the appropriate RPM/APT config file. Then they would run commands like sudo apt-get install mypackage and apt-get would do all the work of downloading and installing the right files in the right places. I don't know how this would work on Windows, maybe someone has created a windows RPM/APT client.

Should I provide ./autogen.sh to build for end users?

I've started a new project in git and use ./autogen.sh and xfce4-dev-tools to generate the configure script and other files.
I'm wondering whether it's a bad idea to only provide the git release, or whether I should also create a dist tarball.
A distribution tarball is easier to use but possibly limited to a certain Linux distribution and sometimes even to a certain version of said distribution.
autogen.sh makes things more flexible but at the cost of needing a more complex setup before you can use your project.
My approach to this problem is to have a script (sketched below) which
installs all dependencies or at least directs people to where they can get them
creates all system specific files
builds the whole project
runs the tests
creates distribution tarballs
I use the same script to build the dist tarballs, so the script is a) useful and b) executed often to keep it healthy.
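A rough sketch of such a script for an autotools/xfce4-dev-tools style project (the exact targets depend on the project, so treat this as an outline rather than a drop-in):
'''
#!/bin/sh
set -e

# 1. check for (or point people at) the required tooling
command -v xdt-autogen >/dev/null 2>&1 || { echo "please install xfce4-dev-tools" >&2; exit 1; }

# 2. create all the system-specific files (configure, Makefile.in, ...)
./autogen.sh

# 3. build the whole project
make

# 4. run the tests
make check

# 5. create (and sanity-check) the distribution tarballs
make distcheck
'''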