Managing library dependencies using git - C++

I have a project which is built for multiple OSes (Linux and Windows for now, maybe OS X later) and processors. The project has a handful of library dependencies, which are mainly external, plus a couple of internal ones, in source form, which I compile (cross-compile) for each OS-processor combination relevant in my context.
Most of the external libraries do not change very often, only in the case of a local bugfix, or when a feature/bugfix in a newer version would benefit the project. The internal libraries change quite often (one-month cycles) and are provided by another team in my company in binary form, although I also have access to the source code; if I need a bug fixed, I can do that and generate new binaries for my own use until the next release cycle. The setup I have right now is the following (filesystem only):
dependencies
|-- library_A_v1.0
|   |-- include
|   `-- lib
|-- library_A_v1.2
|   |-- include
|   `-- lib
|-- library_B
|   |-- include
|   `-- lib
`-- ...
The libraries are kept on a server, and every time I make an update I have to copy any new binaries and header files to the server. Synchronization on the client side is done with a file-synchronization utility. Of course, any updates to the libraries need to be announced to the other developers, and everyone has to remember to synchronize their "dependencies" folder.
Needless to say, I don't like this scheme very much. So I was thinking of putting my libraries under version control (git): build them, pack them in a tgz/zip, and push them to the repo. Each library would have its own git repository so that I could easily tag/branch the versions already in use and test-drive new versions; a "stream" of data for each library that I could easily get, combine, and update. I would like to have the following:
- get rid of the plain-filesystem way of keeping the libraries; right now completely separate folders are kept and managed for each OS and each version, and sometimes they get out of sync, resulting in a mess
- more control over the dependencies: a clear history of which versions of the libs we used for which version of our project, much like what we get from git (VCS) with our source code
- the ability to tag/branch the version of each and every dependency I'm using; for example, I have my v2.0.0 tag/branch for library_A, from which I normally take it for my project, but I would like to test-drive version 2.1.0, so I just build it, push it to the server on a different branch, and call my build script with this particular dependency pointing to the new branch (see the sketch after this list)
- simpler build scripts: just pull the sources from the server, pull the dependencies, and build; that would also allow using different versions of the same library for different processor-OS combinations (more often than not we need that)
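Roughly, the workflow I have in mind for the test-drive case looks like this (the build-script option is hypothetical, just to illustrate the idea):

# fetch the experimental branch of library_A (repo URL omitted, layout as above)
cd dependencies/library_A
git fetch origin
git checkout v2.1.0-test        # branch carrying the 2.1.0 build
cd ../..

# build against it; --dep-branch is a made-up option of my build script
./build.sh --dep-branch library_A=v2.1.0-test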
I tried to find alternatives to a direct git-based solution, but without much success; git-annex, for example, seems overly complicated for what I'm trying to do.
What I'm facing right now is that there seems to be a very strong opinion against putting binary files under git or any VCS (although technically I would also have header files; I could also push the folder structure described above directly to git instead of tgz/zip archives, but I would still have the library binaries), and some of my colleagues, driven by that shared strong opinion, are against this scheme. I perfectly understand that git tracks content rather than files, but to some extent I would be tracking content too, and I believe it would definitely be an improvement over the scheme we have right now.
What would be a better solution to this situation? Do you know of any alternatives to the git (VCS) based scheme? Would it be such a monstrous thing to keep my scheme under git :)? Please share your opinions and especially your experience in handling these kinds of situations.
Thanks

An alternative, which would still fit your project, would be to use git-annex, which would allow you to track header files while keeping the binaries stored elsewhere.
Then each git repo can be added as a submodule to your main project.
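For example, with one repository per library (server URL and repository names invented for illustration), the setup could be:

# add each library repo as a submodule of the main project
git submodule add ssh://git.example.com/libs/library_A.git dependencies/library_A
git submodule add ssh://git.example.com/libs/library_B.git dependencies/library_B
git commit -m "Track library_A and library_B as submodules"

# pin library_A to the tag or branch under test
cd dependencies/library_A && git checkout v2.1.0 && cd ../..
git add dependencies/library_A
git commit -m "Test-drive library_A v2.1.0"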

Related

How should intermediate library dependencies be handled in C++?

I'm building a C++ repo that depends on external company repos that exist to support this repo. I have to build these to target certain versions of Boost and other libraries specific to my system. At the end of the (long) build process, I have several static libraries and my finished executable. I use Docker for these builds.
I'm trying to decide what the cleanest approach is for managing these dependencies.
1. git submodules and build binaries from source each time (longest build)
2. build libraries individually and store them as artifacts/releases for each repo (most work, across several repos)
3. make a README on how to rebuild and commit the binaries to the main repo (feels dirty for some reason)
What is the common practice in C++ for dealing with these intermediate binaries?
It's probably impossible to give a generally valid answer. You have to think about who the users of your code are (I mean those who compile it) and what their workflow is.
I personally tend more and more to favor option three in your list - yes, the README file. The reason is that in many cases the users don't need to (and should not) bother with dependencies at all. Very often there is a higher-level build process in place that makes sure all dependencies are properly prepared (downloaded, optionally patched, compiled, and installed) the way your application expects. With Docker, I have the feeling this is becoming the norm now: I always provide a Dockerfile (or sometimes even a complete Docker image) where all dependencies are in place, so the user can compile without even thinking about them.
If it's not Docker, there may be another higher-level build process that handles dependencies. As I'm in the embedded industry, we mostly use Yocto, but there are others as well. The users don't even need to use Yocto themselves, as I provide them with an SDK that contains all dependencies.
And for the very few who refuse all of these options and insist on compiling natively on their main machine, I write a list of dependencies in the README file, with a few lines on each, describing how to fetch, compile, and install them.
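A typical entry in such a README often boils down to a handful of shell lines; for instance (library name and URL are invented):

# libfoo 2.3 - fetch, build and install
git clone --branch v2.3 https://git.example.com/libfoo.git
cd libfoo
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr/local ..
make -j"$(nproc)"
sudo make install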
As for the other two options you mentioned:
Option one (git submodules) - I must say that I never really got comfortable with them. IMO they are a bit poorly implemented in git (e.g. it's really cumbersome to find out exactly which version each submodule is currently checked out at). Also, with a higher-level build process in place, it might mean fetching each dependency twice, which is inefficient. Then it's hard to apply patches to a dependency if you must; you would have to write some extra script and include it in your build process. Lastly, your dependencies may have dependencies of their own, and then it gets really ugly.
Option two (storing binary artifacts in the repo) is an absolute emergency solution that I would always try to avoid.
And because somebody mentioned git subtrees - we tried that in our team for about half a year. It was an absolute disaster: nobody really understood it, and about once a week someone messed up the entire repository. I would never use it again.

g++: Use ZIP files as input

We use the Boost library on our side. It consists of a huge number of files which never change, and only a tiny portion of it is used. We swap the whole boost directory when we change versions. Currently we have the Boost sources in our SVN, file by file, which makes checkout operations very slow, especially on Windows.
It would be nice if there were a notation / plugin to address C++ files inside ZIP files, something like:
// #ZIPFS ASSIGN 'boost' 'boost.zip/boost'
#include <boost/smart_ptr/shared_ptr.hpp>
Is there any support for compiler hooks in g++? Are there any efforts regarding ZIP support? Other ideas?
I assume that make or a similar build system is involved in building your software. I'd put the zip file in the repository and add a rule to the Makefile to extract it before the actual build starts.
For example, suppose your zip file is in the source tree at "external/boost.zip", that it is to be extracted to "external/boost", and that it contains a file "boost_version.h" at its top level.
# external/Makefile
unpack_boost: boost/boost_version.h

boost/boost_version.h: boost.zip
	unzip $<
I don't know the exact syntax of the unzip call; ask your manpage about this.
Then in other Makefiles, you can let your source files depend on the unpack_boost target in order to have make unpack Boost before a source file is compiled.
# src/Makefile (excerpt)
unpack_boost:
	$(MAKE) -C ../external unpack_boost

source_file.cpp: unpack_boost
If you're using a Makefile generator (or an entirely different buildsystem), please check the documentation for these programs for how to create something like the custom target unpack_boost. For example, in CMake, you can use the add_custom_command directive.
The fine print: the boost/boost_version.h file is not strictly necessary for the Makefile to work. You could just put the unzip command into the unpack_boost target, but then the target would effectively be phony, that is, it would be executed during each build. The file in between (which of course you need to replace with a file that is actually present in the zip archive) ensures that unzip only runs when necessary.
A year ago I was in the same position as you. We kept our source in SVN and, even worse, included Boost in the same repository (same branch) as our own code. Trying to work on multiple branches was impossible, as it would take most of a day to check out a fresh working copy. Moving Boost into a separate vendor repository helped, but it would still take hours to check out.
I switched the team over to git. To give you an idea of how much better it is than SVN, I have just created a repository containing the boost 1.45.0 release, then cloned it over the network. (Cloning copies all of the repository history, which in this case is a single commit, and creates a working copy.)
That clone took six minutes.
In the first six seconds a compressed copy of the repository was copied to my machine. The rest of the time was spent writing all of those tiny files.
I heartily recommend that you try git. The learning curve is steep, but I doubt you'll get much pre-compiler hacking done in the time it would take to clone a copy of boost.
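For reference, the experiment above amounts to something like this (server name and paths invented):

# on the server: turn the boost release into a one-commit git repository
tar xjf boost_1_45_0.tar.bz2
cd boost_1_45_0
git init
git add .
git commit -m "Import boost 1.45.0"

# on a developer machine: clone it over the network
git clone ssh://buildserver/srv/git/boost_1_45_0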
We've been facing similar issues in our company. Managing Boost versions in build environments is never going to be easy. With 10+ developers, all coding on their own system(s), you will need some kind of automation.
First, I don't think it's a good idea to store copies of big libraries like Boost in SVN or any SCM system for that matter; that's not what those systems are designed for, unless you plan to modify the Boost code yourself. But let's assume you're not doing that.
Here's how we manage it now. After trying lots of different methods, this works best for us:
For every version of Boost that we use, we put the whole tree (unzipped) on a file server, and we add extra subdirectories, one for each architecture/compiler combination, where we put the compiled libraries.
We keep copies of these trees on every build system and in the global system environment we add variables like:
BOOST_1_48=C:\boost\1.48 # Windows environment var
or
BOOST_1_48=/usr/local/boost/1.48 # Linux environment var, e.g. in /etc/profile.d/boost.sh
This directory contains the boost tree (boost/*.hpp) and the added precompiled libs (e.g. lib/win/x64/msvc2010/libboost_system*.lib, ...)
All build configurations (VS solutions, VS property files, GNU makefiles, ...) define an internal variable that imports the environment vars, like:
BOOSTROOT=$(BOOST_1_48) # e.g. in a Makefile, or an included Makefile
and further build rules all use the BOOSTROOT setting for defining include paths and library search paths, e.g.
CXXFLAGS += -I$(BOOSTROOT)
LFLAGS += -L$(BOOSTROOT)/lib/linux/x64/ubuntu/precise
LFLAGS += -lboost_date_time
The reason for keeping local copies of boost is compilation speed. It takes up quite a bit of disk space, especially the compiled libs, but storage is cheap and a developer losing lots of time compiling code is not. Plus, this only needs to be copied once.
The reason for using global environment vars is that build configurations are transferable from one system to another and can thus be safely checked in to your SCM system.
To smooth things out a bit, we've developed a little tool that takes care of the copying and of setting the global environment variables. With a CLI, it can even be included in the build process.
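Such a helper doesn't need to be fancy. A rough sketch of the idea (all paths, names, and the file-server layout are made up; ours differs in the details):

#!/bin/bash
# sync-boost.sh <version> - copy one boost tree locally and publish its variable
VERSION="$1"                              # e.g. 1.48
SRC="fileserver:/exports/boost/$VERSION"  # hypothetical file-server location
DST="/usr/local/boost/$VERSION"

mkdir -p "$DST"
rsync -a --delete "$SRC/" "$DST/"         # copy only what changed

# publish e.g. BOOST_1_48=/usr/local/boost/1.48 system-wide
VAR="BOOST_${VERSION//./_}"
echo "export $VAR=$DST" | sudo tee /etc/profile.d/boost.sh > /dev/null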
Different working environments mean different rules and cultures, but believe me, we've tried lots of things and finally, we decided to define some kind of convention. Maybe ours can inspire you...
This is something you would not do in g++, because any other application that wanted to do the same would also have to be modified.
Store the files on a compressed filesystem instead; then every application gets the benefit automatically.
It should be possible for an OS to allow transparent access to files inside a ZIP file. I know I put it in the design of my own OS a long time ago (2004 or so) but never got it to a usable state. The downside is that seeking backwards in a file inside a ZIP is slower because the data is compressed (you can't rewind the compressor state, so you have to seek from the start instead). This also makes a zip-inside-a-zip slow for rewinding and reading. Fortunately, most software just reads files sequentially.
It should also be retrofittable to current OSes, at least in user space. You can hook the filesystem access functions (fopen, open, ...) and add a set of virtual file descriptors that your own software returns for a given filename. If it's a real file, just pass the call through; if it isn't, open the underlying zip file (possibly again via this very function) and return a virtual handle. When the file contents are accessed, read directly from the zip file without caching.
On Linux you would use LD_PRELOAD to inject it into existing software (at run time); on Windows you can hook the system calls or inject a DLL into the process space of the software to hook the same functions.
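Assuming such an interposer were built as a shared library (zipfs.c and its behaviour are hypothetical here), using it on Linux would look roughly like:

# build the hook library, then preload it into an unmodified compiler
gcc -shared -fPIC -o zipfs.so zipfs.c -ldl
LD_PRELOAD="$PWD/zipfs.so" g++ -I external/boost.zip/boost -c source_file.cpp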
Does anybody know if this already exists? I can't see any clear reason it wouldn't...

Organizing solutions, projects and SVN

I would like some help setting up a project in SVN with regard to directory structure. I have read several answers about this on SO, but as I am new to this, most of them are difficult to understand.
I am building a single library, on which several other distinct projects depend:
- I need the ability to export MyLibrary (headers and .lib only) easily, for use by third parties
- MyLibrary1: depends on external libraries; I should be able to manage different versions of these libraries!
- MyLibrary2: depends on the external libraries fmod, glew, ...
- Project 1, 2, 4, 5, 6, ...: each depends on MyLibrary1, MyLibrary2, or both
- Each project could need versions for multiple platforms (OS X, Windows, ...)
I would like to know of a good way to organize this. Do keep in mind that I am rather new to this; a more pedantic answer would be helpful. For example, if you write something like /src, do explain what is supposed to go into it! I would be able to guess, but I won't be sure =)
EDIT:
I can't put this into a comment, so here goes.
@J.N., thanks for the extensive reply. I would like to clarify some things; I hope I understood what you meant properly:
root
  library foo
    /branches            // old versions of foo
    /tags                // releases of foo
    /trunk               // current version
      /build             // stuff required by makefiles
      /tools             // scripts to launch tests etc.
      /data              // test data needed when running
      /output            // binaries, .exe files
      /dependencies      // libraries that foo needs
        /<lib name>
          /include
          /lib
      /docs              // documentation
      /releases          // generated archives
      /sample            // sample project that shows how to use foo
      /source            // *.h, *.cpp
  program bar
    /branches            // old versions of bar
    /tags                // releases of bar
    /trunk               // current version
      /build             // stuff required by makefiles
      /tools             // scripts to launch tests etc.
      /data              // test data needed when running
      /output            // binaries, .exe files
      /dependencies      // libraries that bar needs
        /<lib name>
          /include
          /lib
      /docs              // documentation
      /releases          // generated archives
      /sample            // sample project that shows how to use bar
      /source            // *.h, *.cpp
1) Where do the *.sln files go? In /build?
2) Do I need to copy foo/source into bar/dependencies/foo/include? After all, bar depends on foo.
3) Where do the *.dll files go? If foo depends on DLL files, then all programs using foo need access to the same DLLs. Should these go into root/dlls?
There are several levels to your question: how to organize a single project's source tree, how to maintain the different projects together, how to maintain the dependencies of those projects, how to maintain different variants of each project, and how to package them.
Please keep in mind that whatever you do, your project will eventually grow large enough to make the structure unsuitable. It's normal to change the structure several times in the lifetime of a project. You'll get the feeling that it isn't right anymore when that happens: it's usually when the setup bothers you more than it helps.
1 - Maintaining the different variants of each project
Don't keep per-platform variants of each project: you won't manage several variants by maintaining parallel versions or branches. Have a single source tree for every project/library that can be used for all variants. Don't manage different "OSes"; manage different features. That is, have variants on things like "support POSIX sockets" or "support UI". That means that if a new OS comes along, you just need to choose the set of features it supports rather than starting a new version.
When platform-specific code is needed, create an interface (an abstract class in C++) and implement the behaviour against it. That will isolate the problematic code and will help with adding new variants in the future. Use a macro to choose the proper implementation at compile time.
2 - Maintaining the dependencies of each project
Have a specific "dependencies" folder in which each subfolder contains everything needed for one dependency (that is, includes and sub-dependencies). At the beginning, when the codebase is not too large, don't worry too much about automatically ensuring that all the dependencies are compatible with each other; save that for later.
Don't try to merge the dependencies from their root location higher in the SVN hierarchy. Formally deliver each new version to the teams that need it; it's up to them to update their own part of the SVN with it.
Don't try to use several versions of the same dependency at once; that will end badly. If you really need to (but try to avoid it as much as you can), branch your project for each version.
3 - Maintaining the different projects
I'd advise maintaining each project's repository independently (with SVN they could still be the same repo, but in separate folders). Branches and tags should be specific to one project, not all of them. Try to keep the number of branches to a minimum; they don't solve problems (even with git). Use branches when you have to maintain different chronological versions in parallel (not variants), and fight back as much as you can before you actually do it; everybody will benefit from using the newer code.
That will also allow you to impose security restrictions (I'm not sure whether that's feasible with vanilla SVN, but there are some freely available servers that support it).
I'd recommend sending email notifications to everybody potentially interested whenever someone commits on a project.
4 - Project source tree organization
Each project should have the following SVN structures:
- trunk (current version)
- branches (older versions, still in use)
- tags (releases, used to create branches without thinking too much when patches are required)
When the project gets bigger, organize branches and tags in sub folders (for instance branches/V1.0/V1.1 and branches/V2.0/V2.1).
Have a root folder with the following subfolders: (some of this may be created by VC itself)
- Build system (stuff required by your makefiles or other build tools)
- Tools (if any, like an XSLT tool or SOAP compiler, scripts to launch the tests)
- Data (test data you need while running)
- Output (where the build system puts the binaries)
- Temp Output (temporary files created by the compilation, optional)
- Dependencies
- Docs (if any ;) or generated docs)
- Releases (the generated archives, see later)
- Sample (a small project that demonstrates how to use the project / library)
- Source (I don't like to split headers and .cpp, but that's my way)
- Avoid too many levels of subfolders; it's hard to search trees, lists are easier.
- Properly define the build order of each folder (less necessary for VC, but still).
- I make my namespaces match my folder names (old Java habits, but it works).
- Clearly define the "public" part that you need to export.
- If the project is large enough to hold several binaries / DLLs, each should have its own folder.
- Don't commit any binaries you generate, only the releases. Binaries like to conflict with each other and cause pain to the other people in the team.
5 - Packaging the projects
First, make sure to include a text file with the SVN revision and the date; there's an automated way to do that with auto-props.
You should have a script to generate releases (if time allows). It will check that everything is committed, generate a new version number, and create a zip/tar.gz archive that you must commit/archive, whose name contains the SVN revision, branch, and current date (the format should be normalized across projects). The archive should contain everything needed to run the app / use the library, in a clean file structure. Create a tag so that you can start from it for emergency bug fixing.
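A skeleton of such a release script might look like this (repository URL, project name, and staging layout are all assumptions):

#!/bin/sh
set -e

# refuse to release from a dirty working copy
if [ -n "$(svn status -q)" ]; then
    echo "error: uncommitted changes" >&2
    exit 1
fi

REV=$(svnversion .)
DATE=$(date +%Y%m%d)
NAME="myproject-r${REV}-${DATE}"

# tag the release so emergency fixes can branch from it later
svn copy . "https://svn.example.com/myproject/tags/${NAME}" -m "Release ${NAME}"

# stage everything needed to run the app, then archive it
make install DESTDIR="/tmp/${NAME}"
tar czf "${NAME}.tar.gz" -C /tmp "${NAME}"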

Is there a build system for C++ which can manage release dependencies?

A little background: we have a fairly large code base which builds into a set of libraries that are then distributed for internal use in various binaries. At the moment, the build process for this is haphazard and everything is built off the trunk.
We would like to explore whether there is a build system that would allow us to manage releases and automatically pull in dependencies. Such a tool exists for Java: Maven. I like its package, repository, and dependency mechanism, and I know that with either the maven-native or the maven-nar plugin we could get this. However, the problem is that we cannot restructure the source trees the "Maven way", and unfortunately the maven-nar plugin (at least) doesn't seem to like code that is not structured this way...
So my question is: is there a tool which satisfies the following for C++?
- build
- package (for example, libraries with all headers, something like the .nar)
- upload the package to a "repository"
- automatically pull in the required dependencies from said repository, extract the headers to include in the build, and extract the libraries to link against; the dependencies would be described in the "release" for that binary, so that if we used a CI server to build that "release", the build script would have the necessary dependencies listed (like the pom.xml files)
I could roll my own by modifying make + shell scripts, or waf/SCons with extra Python modules for the packaging and dependency management. However, I would have thought that this is a common problem and someone somewhere has a tool for it. Or does everyone roll their own? Or have I missed a significant feature of waf/SCons or CMake?
EDIT: I should add that open source is preferred, and non-MS...
Most Linux distributions, for example, include dependency tracking in their packaging systems. Of all the things I've tried to cobble together myself to address this problem, in the end they were all "not quite perfect". The best thing to do, IMHO, is to create a local yum/deb repository or something similar (continuing my Linux example) and then pull packages from there as needed.
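As a rough illustration of the deb flavour of that idea (package names and paths invented; the repo directory would be shared via NFS or HTTP), a minimal flat repository looks like:

# on the server: drop built packages into a directory and index them
mkdir -p /srv/local-repo
cp mylib_1.2.0_amd64.deb /srv/local-repo/
cd /srv/local-repo && dpkg-scanpackages . /dev/null | gzip > Packages.gz

# on each client: register the repo (apt may need to be told to trust it)
echo "deb file:/srv/local-repo ./" | sudo tee /etc/apt/sources.list.d/local.list
sudo apt-get update && sudo apt-get install mylib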
Many source packages also quickly tell you the minimum set of components that must be installed to do a self-build (as opposed to installing a pre-compiled binary package).
Unfortunately, none of these methods is that much easier, though it's still better than trying to do it all yourself. In the end, to support multiple platforms, you need one of these systems per OS as well. Fun!
I am not sure if I understand correctly what you want to do, but I will tell you what we use and hope it helps.
We use CMake for our build. It has to be noted that CMake is quite powerful. Among other things, you can "make install" into custom directories to collect headers and binaries there to build your release. We combine this with some Python scripting to build our releases. YMMV, but some things might just be too specific for a generic tool, and a custom script may be the simpler solution.
Our build tool builds releases directly from an SVN repository (checkout, build, ...), which I can really recommend, to avoid local state polluting the release in some unforeseen way. It also enforces reproducibility.
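In spirit, such a release build boils down to something like the following (URL, project name, and staging layout are assumptions):

# always release from a pristine checkout, never a developer working copy
svn checkout https://svn.example.com/mylib/tags/1.2.0 mylib-1.2.0
cd mylib-1.2.0 && mkdir build && cd build

# "make install" into a staging prefix to collect headers and libs there
cmake -DCMAKE_INSTALL_PREFIX="$PWD/stage/mylib-1.2.0" ..
make -j4 install

tar czf ../../mylib-1.2.0.tgz -C stage mylib-1.2.0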
It depends a lot on the platforms you're targeting. I can only really speak for Linux, but there it also depends on the distributions you're targeting, packages being a distribution-level concept. To make things a bit simpler, there are families of distributions using similar packaging mechanisms and package names, meaning that the same recipe for making a Debian package will probably make an Ubuntu package too.
I'd definitely say that if you're willing to target a subset of all known Linux distros using a manageable set of packaging mechanisms, you will benefit in the long run from not rolling your own and building packages the way the distribution creators intended. These systems allow you to specify run- and build-time dependencies, and automatic CI environments also exist (like OBS for rpm-based distros).

Source code dependency manager for C++

There are already some questions about dependency managers here, but it seems to me that they are mostly about build systems, while I am looking for something targeted purely at making dependency tracking and resolution simpler (and I'm not necessarily interested in learning a new build system).
So, typically we have a project and some code in common with another project. This common code is organized as a library, so when I want to get the latest version of the code for a project, I should also get all the libraries from source control. To do this, I need a list of dependencies. Then, to build the project, I can reuse this list too.
I've looked at Maven and Ivy, but I'm not sure whether they would be appropriate for C++, as they look quite heavily Java-targeted (even though there might be plugins for C++, I haven't found people recommending them).
I see it as a GUI tool producing some standardized dependency list which can then be parsed by different scripts etc. It would be nice if it could integrate with source control (tag, get a tagged version with dependencies etc), but that's optional.
Would you have any suggestions? Maybe I'm just missing something, and usually it's done some other way with no need for such a tool? Thanks.
You can use Maven with C++ in two ways. First, you can use it for dependency management of components between each other. Second, you can use the maven-nar-plugin for creating shared libraries and unit tests against the Boost library (my experience). In the end you can create RPMs (maven-rpm-plugin) out of it to have an adequate installation medium. Furthermore, I have created the installation for a CI environment via Maven (RPMs for Hudson, a Nexus installation as RPMs).
I'm not sure whether you would regard a version control system (VCS) as a build tool, but Mercurial and Git support sub-repositories. In your case, a sub-repository would hold your dependencies:
Join multiple subrepos into one and preserve history in Mercurial
Multiple git repo in one project
Use your VCS to archive the build results -- needed anyway for maintenance -- and refer to the libs and header files in your build environment.
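For the Mercurial flavour, sub-repositories are declared in an .hgsub file; a minimal sketch (URLs hypothetical):

# clone the common library into place, then declare it as a subrepo
hg clone https://hg.example.com/common-code libs/common
echo "libs/common = https://hg.example.com/common-code" >> .hgsub
hg add .hgsub
hg commit -m "Track common code as a subrepo"

# from then on, cloning the project pulls the subrepo automatically
hg clone https://hg.example.com/project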
If you are looking for a reference, take a look at https://android.googlesource.com/platform/manifest.