Ideas for speeding up installing dependencies - C++

I have a project that depends on many external libraries like GLFW3, GLEW, GLM, FreeType2, zlib etc. It would be best to store/share the installed dependencies between jobs so they wouldn't have to be downloaded and installed every time, which takes about half of the total build time. I can see a couple of ideas for handling it:
a) for each job of each build, download the dependencies and install them
b) put the dependencies (sources) inside my repo and get a little speedup, because I will no longer have to download them from outside servers (I still have to compile and install them)
c) compile them by hand, put them on some server and just download the right package for each build
a) leaves the least work for me when updating dependencies for building and testing, and lets me build my project against the newest versions, but it takes the most time (both compiling and downloading)
b) bloats the repository with extra code (not mine), gives little speedup (downloading is usually not that slow), and adds manual work to update dependencies; I guess it's worse than a)
c) requires the most work from me to constantly keep the built dependencies up to date and upload them to a fast server (and a different package per build task, compiler, etc.), but allows for the fastest builds (just download & copy/install).
So, how are you managing your external dependencies and keeping them up to date for your Travis builds?
Note that I use the free version of Travis and kind of need sudo for updating cmake, gcc etc. and installing dependencies... I could somehow trick CMake into using local versions of the dependencies instead of /usr/..., but that bloats the CMake files, which I believe should stay simple and clear.
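For the last point, CMake can usually be pointed at locally installed dependencies without touching the CMakeLists.txt at all, by setting CMAKE_PREFIX_PATH at configure time. A minimal sketch, assuming the pre-built dependencies were installed under a hypothetical $HOME/deps prefix:
# location of the pre-built dependencies (hypothetical path)
DEPS_PREFIX="$HOME/deps"
# find_package()/find_library() search this prefix before the system paths,
# so the project's CMakeLists.txt stays unchanged
cmake -DCMAKE_PREFIX_PATH="$DEPS_PREFIX" -DCMAKE_BUILD_TYPE=Release ..
make -j"$(nproc)"
This also sidesteps the need for sudo when installing the dependencies themselves, since they live under the home directory.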

Let's call the entire set of dependencies your build requires at some point in time the dependency "lineup".
I'd separate the use of (a certain version of) the lineup in your builds from the task of updating the lineup version (when a new version of a dependency is needed) - mixing them unnecessarily complicates the picture IMHO (I'm assuming that many of your builds would use the same dependency lineup version).
Let's look at the use of the lineup first.
From your description the installation of the (already built) dependencies is common to all 3 options and executed at every build, so let's put that aside for now.
The difference between your approaches remains only in how you obtain the built dependencies:
a - download them from outside servers - at every build
b - build them from repo sources - at every build
c - download them from a local fast server - at every build
It seems a and c are fundamentally the same except c would be faster due to local access. Most likely b would be the slowest.
So it looks like c is the winner from the build speed perspective in this context.
Now the question would be if/how you can get your built dependency lineup onto the local server for the c approach. I see 2 options (unsure if both are possible in your context):
Option 1: download the dependencies already built (as you would in your a approach) from outside servers, effectively just caching them on the local server.
Option 2: build the dependencies locally from source, except you neither have to place those sources in the same repo as your project nor build them for every project build (as in your b approach) - you only need to do this when you update your lineup, and only for the new versions of the respective dependencies (a rough sketch follows below).
You could also mix the 2 options: some dependencies using option 1, others using option 2, as you see fit.
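A rough sketch of what option 2 could look like in practice, once a dependency lineup has been built and installed into a local prefix (the server name, paths and lineup version are hypothetical, and the upload mechanism depends on what your local server offers):
# run only when the lineup is updated: package the already built+installed prefix
tar czf deps-lineup-42.tar.gz -C "$HOME/deps" .
scp deps-lineup-42.tar.gz builduser@deps.internal:/srv/deps/
# run at every build: fetch and unpack the pre-built lineup
# (assuming the server serves /srv/deps over HTTP as /deps/)
curl -fsSL -o deps.tar.gz http://deps.internal/deps/deps-lineup-42.tar.gz
mkdir -p "$HOME/deps" && tar xzf deps.tar.gz -C "$HOME/deps"
Option 1 would look the same on the build side; only the packages would come from the upstream projects instead of your own builds.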
Now, if you're using VM (or Docker or similar) images for your build machines and you have control over such images, it might be possible to significantly speed up your builds by customizing these images - have the dependency lineup installed on them, making it immediately available to any build on a machine running such a customized image.
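For example, if Docker is available, the lineup could be baked into a build image once and reused by every build. A minimal sketch, assuming an Ubuntu base image, that the distro packages are recent enough for your needs, and a made-up image name (the Dockerfile is passed inline on stdin just to keep the example self-contained):
docker build -t my-cpp-build-image - <<'EOF'
FROM ubuntu:22.04
# tool chain plus the dependency lineup, preinstalled in the image
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential cmake \
    libglfw3-dev libglew-dev libglm-dev libfreetype6-dev zlib1g-dev \
 && rm -rf /var/lib/apt/lists/*
EOF
Builds running in containers started from this image then have nothing to download or compile for the dependencies themselves.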
Finally, when the time comes to update your dependency lineup (which, BTW, should be present in your a and b approaches too, not only in the c one) you'd just have to download the new dependency versions, build them if needed, store the built dependencies on the local fast server and, if the customized VM image solution works for you, update/re-create the customized VM image with the new dependency lineup installed.

Related

How should intermediate library dependencies be handled in C++?

I'm building a c++ repo that depends on external company repos that exist to support this repo. I have to build these to target certain versions of boost and other libraries specific to my system. At the end of the (long) build process, I have several static libraries and my finished executable. I use Docker for these builds.
I'm trying to decide what the cleanest approach is for managing these dependencies.
git submodules and build binaries from source each time (longest build)
build libraries individually and store them as artifacts/releases for each repo (most work, across several repos)
make a README on how to rebuild and commit the binaries to the main repo (feels dirty for some reason)
What is the common practice in C++ for dealing with these intermediate binaries?
It's probably impossible to give a generally valid answer. You have to think about who the users of your code (I mean those who compile it) are and what their workflow is.
I personally tend more and more to favor option three in your list - yes, the README file. The reason is that in many cases the users don't need to (and should not) bother with dependencies at all. Very often there is a higher-level build process in place that makes sure all dependencies are properly prepared (downloaded, optionally patched, compiled and installed) as your application expects. With Docker, I have the feeling this is becoming the norm now. I always provide a Dockerfile (or sometimes even a complete Docker image) where all dependencies are in place and the user can compile without even thinking of those.
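As an illustration of that workflow, once such a Dockerfile is provided, the user only needs something along these lines (the image name is made up, and the exact cmake invocation depends on the project):
# build the provided image, then compile the project inside it
# with the source tree mounted from the host
docker build -t myapp-dev .
docker run --rm -v "$PWD":/src -w /src myapp-dev \
    sh -c 'cmake -B build -S . && cmake --build build -j'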
If it's not Docker, there may be another higher-level build process that handles dependencies. As I'm in the embedded industry, we mostly use Yocto, but there are others as well. The users don't even need to use Yocto themselves, as I provide them with an SDK that contains all dependencies.
And for the very few who refuse to use all these options and insist on compiling natively on their main machine, I write a list of dependencies in the README file, with a few lines on each, describing how to fetch, compile and install them.
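The per-dependency instructions in such a README can stay quite short. A sketch of what one entry might look like, with the repository URL, version and install prefix as placeholders rather than a concrete recommendation:
# fetch, compile and install one dependency into a local prefix
git clone --branch v1.2.3 https://example.com/some-dependency.git
cmake -S some-dependency -B some-dependency/build -DCMAKE_INSTALL_PREFIX="$HOME/deps"
cmake --build some-dependency/build -j
cmake --build some-dependency/build --target install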
As for the other two options you mentioned:
Option one (git submodules) - I must say that I never really got comfortable with that. IMO they are a bit poorly implemented in git (e.g. it's really cumbersome to find out exactly at which version each submodule is currently checked out). Also, with a higher-level build process in place, it might mean double-fetching each dependency, which is inefficient. Then it's hard to apply patches to the dependency, if you must. You would have to write some extra script and include it in your build process. Lastly, your dependencies may have dependencies by themselves, and then it gets really ugly.
Option two (storing binary artifacts in the repo) is an absolute emergency solution that I would always try to avoid.
And because somebody mentioned git subtrees - we tried that in our team for about half a year. It was an absolute disaster. Nobody really understood it, and about once a week someone messed up the entire repository. Never would I use that again.

Is there any way to build TensorFlow from source without having internet?

Currently, building TensorFlow from source requires an internet connection to download some dependencies. Every time I rebuild it, Bazel deletes what has already been downloaded and re-downloads it.
I wonder if there is any way to avoid this by pre-downloading all the dependencies and then building without internet access?
It is possible to pre-download 3rd party dependencies, as explained here.
In tensorflow v0.11.0, they are listed in "tensorflow/workspace.bzl". After downloading the files you need, replace links like this:
url = "http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz",
to
url = "file:////mnt/a/usr/bzip2-1.0.6.tar.gz",
Note that there are about 20 dependencies to download.
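For example, the pre-download step for the bzip2 entry above could simply be the following, with the target directory taken from the file:// URL in the snippet:
# mirror one dependency locally so the file:// URL in workspace.bzl resolves offline
mkdir -p /mnt/a/usr
curl -L -o /mnt/a/usr/bzip2-1.0.6.tar.gz http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz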
It seems to be a case not well supported by TensorFlow; apparently one way is to build a Docker or other VM environment and distribute that --
https://github.com/tensorflow/tensorflow/issues/3194#issuecomment-231326381
Bazel automatically caches the external dependencies it downloads. Is it possible you are:
Moving the tensorflow source around
Changing the BUILD files it uses for external repositories
Building different targets (that might require other dependencies) each time?
If none of those seem likely, can you add the output for running identical bazel builds twice in a row where you're seeing re-downloading behavior, using --explain?
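Something like the following, run twice in a row, would show why Bazel decided to re-execute anything on the second run (the target label is just an example):
# write an explanation of why each action was (re)executed to explain.log
bazel build --explain=explain.log --verbose_explanations \
    //tensorflow/tools/pip_package:build_pip_package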

TeamCity dependency without copying files?

Is there a way to have the artifacts of a TeamCity build referenced (but not copied) as a dependency for another TeamCity build?
Some background: I've been trying to reduce the build times in a couple of our TeamCity configurations. It's a C++ program that depends on several 3rd party libraries, which our Sys Admin has been loath to install on the build machine.
Our first run had the libraries zipped up and uncompressed / compiled as a build step within the configurations. This takes a while, so the Sys Admin suggested moving the 3rd party lib decompression / compilation into a separate configuration and setting the artifacts of that build as a dependency for the build I'm trying to speed up.
Things are worse under this build configuration, however. The size of the expanded / compiled 3rd party libs (over 1GB) actually makes the original configuration speedier by over 10 minutes. If there was a way to just reference the artifact directory without copying stuff over, that would be fantastic.
Do not use artefact dependencies.
Instead create two or more build configurations (one for your main application, one or more for the 3rd party libraries) then create snapshot dependencies between them, configuring it to Run build on the same agent.
Doing this will ensure the binaries from your 3rd party libraries are always available on the local file system and always up-to-date (yet without being constantly rebuilt - assuming no source changes).
You should be able to locate the 3rd party binaries easy enough in the checkout directory.
The reason artefacts are slow is that they get uploaded to the central server, then downloaded by agents. Obviously not a good fit for 1GB of 3rd party libraries.
As far as I know there is no way to prevent the artifact copy from server to agent: otherwise it would be impossible for the compiler / linker to find the dependencies...
In my opinion you can take the best of both configurations by publishing zipped artifacts (just append ".zip" to the destination path in the artifact rules) and fetching them from the "last successful build".
This way you will trigger the lib recompile only on the respective source code changes (decreasing overall build time), and the artifacts will be transferred as a compressed archive (decreasing transfer time).
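For reference, the artifact path rule would be something along these lines (the paths are made up); as far as I recall, TeamCity packs the matched files into the named archive:
3rdparty/install/** => thirdparty-libs.zip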
Maybe you can optimize further by building each lib separately from others: only libs with pending changes will be recompiled.

Is there a build system for C++ which can manage release dependencies?

A little background: we have a fairly large code base which builds into a set of libraries that are then distributed for internal use in various binaries. At the moment, the build process for this is haphazard and everything is built off the trunk.
We would like to explore whether there is a build system which will allow us to manage releases and automatically pull in dependencies. Such a tool exists for Java: Maven. I like its package, repository and dependency mechanism, and I know that with either the maven-native or maven-nar plugin we could get this. However, the problem is that we cannot restructure the source trees to fit the "maven way" - and unfortunately the plugins (at least maven-nar) don't seem to like code that is not structured this way...
So my question is: is there a tool which satisfies the following for C++
build
package (for example libraries with all headers, something like the .nar)
upload package to a "repository"
automatically pull in the required dependencies from said repository, extract headers and include them in the build, extract libraries and link against them. The dependencies would be described in the "release" for that binary - so if we were to use a CI server to build that "release", the build script would have the necessary dependencies listed (like the pom.xml files).
I could roll my own by modifying either make+shell scripts or waf/scons with extra python modules for the packaging and dependency management - however I would have thought that this is a common problem and someone somewhere has a tool for this? Or does everyone roll their own? Or have I missed a significant feature of waf/scons or CMake?
EDIT: I should add, OS is preferred, and non-MS...
Most of the Linux distributions, for example, contain dependency tracking for their packages. Of all the things that I've tried to cobble together myself to take on your problem, in the end they all were "not quite perfect". The best thing to do, IMHO, is to create a local yum/deb repository or something (continuing my Linux example) and then pull stuff from there as needed.
Many of the source-packages also quickly tell you the minimum components that must be installed to do a self-build (as opposed to installing a binary pre-compiled package).
Unfortunately, none of these methods is that much easier, though it's still better than trying to do it all yourself. In the end, to support multiple platforms, you need one of these systems per OS as well. Fun!
I am not sure if I understand correctly what you want to do, but I will tell you what we use and hope it helps.
We use cmake for our build. It has to be noted that cmake is quite powerful. Among other things, you can "make install" into custom directories to collect headers and binaries there to build your release. We combine this with some Python scripting to build our releases. YMMV, but some things might just be too specific for a generic tool, and a custom script may be the simpler solution.
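A small illustration of that collection step (the staging prefix is arbitrary):
# stage the headers and libraries of one component into a release directory
cmake -DCMAKE_INSTALL_PREFIX="$PWD/release/stage" ..
make -j"$(nproc)"
make install    # copies headers/libs into release/stage instead of /usr/local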
Our build tool builds releases directly from an svn repository (checkout, build, ...), which I can really recommend to avoid some local state polluting the release in some unforeseen way. It also enforces reproducibility.
It depends a lot on the platforms you're targeting. I can only really speak for Linux, but there it also depends on the distributions you're targeting, packages being a distribution-level concept. To make things a bit simpler, there are families of distributions using similar packaging mechanisms and package names, meaning that the same recipe for making a Debian package will probably make an Ubuntu package too.
I'd definitely say that if you're willing to target a subset of all known Linux distros using a manageable set of packaging mechanisms, you will benefit in the long run from not rolling your own and building packages the way the distribution creators intended. These systems allow you to specify run- and build-time dependencies, and automatic CI environments also exist (like OBS for rpm-based distros).

Source code dependency manager for C++

There are already some questions about dependency managers here, but it seems to me that they are mostly about build systems, while I am looking for something targeted purely at making dependency tracking and resolution simpler (and I'm not necessarily interested in learning a new build system).
So, typically we have a project and some common code shared with another project. This common code is organized as a library, so when I want to get the latest code version for a project, I should also go and get all the libraries from source control. To do this, I need a list of dependencies. Then, to build the project, I can reuse this list too.
I've looked at Maven and Ivy, but I'm not sure if they would be appropriate for C++, as they look quite heavily Java-targeted (even though there might be plugins for C++, I haven't found people recommending them).
I see it as a GUI tool producing some standardized dependency list which can then be parsed by different scripts etc. It would be nice if it could integrate with source control (tag, get a tagged version with dependencies etc), but that's optional.
Would you have any suggestions? Maybe I'm just missing something, and usually it's done some other way with no need for such a tool? Thanks.
You can use Maven with C++ in two ways. First, you can use it for dependency management between components. Second, you can use the maven-nar-plugin to create shared libraries and unit tests in combination with the Boost library (my experience). In the end you can create RPMs (maven-rpm-plugin) out of it to have an adequate installation medium. Furthermore, I have set up the installation for a CI environment via Maven (RPMs for Hudson, a Nexus installation as RPMs).
I'm not sure if you would see a version control system (VCS) as a build tool, but Mercurial and Git support sub-repositories. In your case a sub-repository would hold your dependencies (see the example after the links below):
Join multiple subrepos into one and preserve history in Mercurial
Multiple git repo in one project
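A minimal illustration of the Git side of this (the repository URL and path are hypothetical):
# record the shared library repo as a sub-repository of the project
git submodule add https://example.com/common-lib.git external/common-lib
git commit -m "Add common-lib as a submodule"
# what consumers run after cloning the project
git submodule update --init --recursive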
Use your VCS to archive the build results -- needed anyway for maintenance -- and refer to the libs and header files in your build environment.
If you are looking for a reference take a look at https://android.googlesource.com/platform/manifest.